Thierry Carrez, who works in the Ubuntu Server team, has a great series of blog posts on how to run your own Ubuntu Enterprise Cloud. I haven't had a chance to try this yet, but it's high on my TODO list. Thierry uses the Ubuntu Enterprise Cloud product (which has been part of Ubuntu server starting with 9.04) together with Eucalyptus. Here are the links to Thierry's posts:
Tuesday, October 13, 2009
Friday, October 09, 2009
Compiling, installing and test-running Scribe
I went to the Hadoop World conference last week and one thing I took away was how Facebook and other companies handle the problem of scalable logging within their infrastructure. The solution found by Facebook was to write their own logging server software called Scribe (more details on the FB blog).
Scribe is mentioned in one of the best presentations I attended at the conference -- 'Hadoop and Hive Development at Facebook' by Dhruba Borthakur and Zheng Shao. If you look at page 4, you'll see the enormity of the situation they're facing: 4 TB of compressed data (mostly logs) handled every day, and 135 TB of compressed data scanned every day. All this goes through Scribe, so that gives me a warm fuzzy feeling that it's indeed scalable and robust. For more details on Scribe, see the wiki page of the project.
It's my intention here to detail the steps needed for compiling and installing it, since I found that to be a non-trivial process, to say the least. I'm glad Facebook open-sourced Scribe, but its packaging could have been a bit more straightforward. Anyway, here's what I did to get it to run. I followed roughly the same steps on Ubuntu and on Gentoo.
1) Install pre-requisite packages
On Ubuntu, I had to install the following packages via apt-get: g++, make, build-essential, flex, bison, libtool, mono-gmcs, libevent-dev.
2) Install the boost libraries
Very important: Scribe needs boost 1.36 or newer, so make sure you don't have older boost libraries already installed. If you install libboost-* on Ubuntu, apt pulls in 1.34 or 1.35, which will NOT work with Scribe. If you already have libboost-* installed, you need to uninstall them. Now. Trust me, I spent several hours pulling my hair out over this one.
- download the latest boost source code from SourceForge (I got boost 1.40 from here)
- untar it, then cd into the boost directory and run:
$ ./bootstrap.sh
$ ./bjam
$ sudo ./bjam install
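Since the boost version is what bit me, here is a minimal Python sketch for checking which boost headers a machine actually has. It reads the BOOST_VERSION macro from boost/version.hpp, which encodes the version as major * 100000 + minor * 100 + patch. The function names are my own, and the 1.36 minimum is simply the requirement stated above.

```python
import re

# Minimum boost version Scribe needs, per the note above.
MIN_BOOST = (1, 36)


def decode_boost_version(header_text):
    """Return (major, minor, patch) parsed from the text of boost/version.hpp."""
    match = re.search(
        r"^\s*#\s*define\s+BOOST_VERSION\s+(\d+)", header_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("BOOST_VERSION not found in header text")
    number = int(match.group(1))
    # BOOST_VERSION is major * 100000 + minor * 100 + patch.
    return number // 100000, (number // 100) % 1000, number % 100


def boost_is_new_enough(header_text, minimum=MIN_BOOST):
    """True if the header describes a boost release at or above `minimum`."""
    major, minor, _patch = decode_boost_version(header_text)
    return (major, minor) >= minimum
```

To use it, read the header and pass its text in, for example boost_is_new_enough(open(path).read()). Where the header lives depends on how boost was installed; /usr/local/include/boost/version.hpp is a common location for a source build and /usr/include/boost/version.hpp for distribution packages, but treat both paths as assumptions to verify on your own system.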
3) Install thrift and fb303
- get thrift source code with git, compile and install:
$ git clone git://git.thrift-rpc.org/thrift.git
$ cd thrift
$ ./bootstrap.sh
$ ./configure
$ make
$ sudo make install
- compile and install the Facebook fb303 library:
$ cd contrib/fb303
$ ./bootstrap.sh
$ make
$ sudo make install
- install the Python modules for thrift and fb303:
$ cd TOP_THRIFT_DIRECTORY
$ cd lib/py
$ sudo python setup.py install
$ cd TOP_THRIFT_DIRECTORY
$ cd contrib/fb303/py
$ sudo python setup.py install
To check that the python modules have been installed properly, run:
$ python -c 'import thrift' ; python -c 'import fb303'
4) Install Scribe
- download latest source code from SourceForge (I got it from here)
- untar, then run:
$ cd scribe
$ ./bootstrap.sh
$ make
$ sudo make install
$ sudo ldconfig   # this is necessary so that the boost shared libraries are loaded
- install Python modules for scribe:
$ cd lib/py
$ sudo python setup.py install
- to test that scribed (the scribe server process) was installed correctly, just run 'scribed' at a command line; you shouldn't get any errors
- to test that the scribe Python module was installed correctly, run
$ python -c 'import scribe'
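The three import checks (thrift, fb303, scribe) can also be rolled into one small helper, which is handy if you repeat the install on several machines. This is just a sketch; the function name is made up for illustration.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported, in order."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or one of its dependencies is broken).
            missing.append(name)
    return missing
```

Calling missing_modules(["thrift", "fb303", "scribe"]) should return an empty list once steps 3 and 4 have completed successfully.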
5) Initial Scribe configuration
- create configuration directory -- in my case I created /etc/scribe
- copy one of the example config files from TOP_SCRIBE_DIRECTORY/examples/example*conf to /etc/scribe/scribe.conf -- a good one to start with is example1.conf
- edit /etc/scribe/scribe.conf and change file_path (which points to /tmp) to a location more suitable for your system
- you may also want to replace max_size, which dictates how big the local files can be before they're rotated (by default it's 1 MB, which is too small -- I set it to 100 MB)
- run scribed either with nohup or in a screen session (it doesn't seem to have a daemon mode):
$ scribed -c /etc/scribe/scribe.conf
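If you'd rather script the two config edits than do them by hand, something like the following sketch would work. It assumes the config file consists of plain key=value lines (with # comments and <store>-style section tags left untouched), which is how the example files look to me. The function name and the sample values in the usage note are illustrative, not anything Scribe prescribes.

```python
def set_config_values(config_text, new_values):
    """Rewrite every `key=value` line whose key appears in `new_values`.

    Comment lines (starting with '#') and lines without '=' are left alone.
    """
    out = []
    for line in config_text.splitlines():
        stripped = line.strip()
        key, sep, _old_value = stripped.partition("=")
        key = key.strip()
        if sep and not stripped.startswith("#") and key in new_values:
            # Preserve whatever indentation the original line had.
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}{key}={new_values[key]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, set_config_values(text, {"file_path": "/var/log/scribe", "max_size": 100 * 1000 * 1000}) would point the store at a hypothetical /var/log/scribe directory and raise the rotation threshold to roughly 100 MB, assuming max_size is expressed in bytes.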
6) Test run
To test Scribe, you can install it on a remote machine, configure scribed on that machine to use a configuration file similar to examples/example2client.conf, then change remote_host in the config file to point to the central scribe server configured in step 5.
Once scribed is configured and running on the remote machine, you can test it with a nice utility written by Silas Sewell, called scribe_pipe. For example, you can pipe an Apache log file from the remote machine to the central scribe server by running:
$ cat apache_access_log | ./scribe_pipe apache.access
On the scribe server, you should now see a directory called apache.access under the main file_path directory, containing files called apache.access_00000, apache.access_00001, etc. (in chunks of max_size bytes).
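For the curious, here is a rough Python sketch of what a client in the spirit of scribe_pipe has to do. It is not Silas Sewell's code; it follows the pattern of the scribe_cat example that ships with Scribe and relies on the thrift and scribe Python modules installed in steps 3 and 4. The function names are my own, and the default host and port (1463, the port I believe the example configs use) are assumptions to adjust for your setup.

```python
def iter_messages(stream):
    """Yield each non-empty line of `stream`, normalized to end in a newline."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            yield line + "\n"


def send_to_scribe(category, messages, host="127.0.0.1", port=1463):
    """Send `messages` to scribed under `category`; return True on ResultCode.OK."""
    # Imported lazily so iter_messages works even without thrift/scribe installed.
    from scribe import scribe
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport

    socket = TSocket.TSocket(host=host, port=port)
    transport = TTransport.TFramedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(
        trans=transport, strictRead=False, strictWrite=False
    )
    client = scribe.Client(iprot=protocol, oprot=protocol)

    entries = [scribe.LogEntry(category=category, message=m) for m in messages]
    transport.open()
    try:
        result = client.Log(messages=entries)
    finally:
        transport.close()
    return result == scribe.ResultCode.OK
```

A call such as send_to_scribe("apache.access", iter_messages(sys.stdin)) (after importing sys) would then play the same role as the pipe command above, talking to the local scribed, which forwards to the central server.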
I'll post separately about actually using Scribe in production. I hope this post will at least get you started on using Scribe and save you some headaches during its installation process.
Tuesday, October 06, 2009
Brandon Burton on 'Automation is the cloud'
Great post from Brandon Burton, my ex-colleague at RIS/Reliam, on why automation is the foundation of cloud computing. Brandon discusses automation at various levels, starting with virtualization and networking, then moving up the layers and covering OS, configuration management and application deployment. Highly recommended.