Monday, July 09, 2012

Installing Python scientific and statistics packages on Ubuntu

I tried to install the pandas Python library a while ago using easy_install/pip, and I hit some roadblocks when installing all the dependencies. So I tried again, this time installing most of the required packages from source. Here are my notes; hopefully they'll be useful to somebody out there.

This is on an Ubuntu 12.04 machine.

Install NumPy

# wget http://downloads.sourceforge.net/project/numpy/NumPy/1.6.2/numpy-1.6.2.tar.gz
# tar xvfz numpy-1.6.2.tar.gz; cd numpy-1.6.2
# cat INSTALL.txt
# apt-get install libatlas-base-dev libatlas3gf-base
# apt-get install python-dev
# python setup.py install



Install SciPy


# wget http://downloads.sourceforge.net/project/scipy/scipy/0.11.0b1/scipy-0.11.0b1.tar.gz
# tar xvfz scipy-0.11.0b1.tar.gz; cd scipy-0.11.0b1/
# cat INSTALL.txt
# apt-get install gfortran g++
# python setup.py install


Install pandas


Prereq #1: NumPy 

- already installed (see above)

Prereq #2: python-dateutil

# wget http://labix.org/download/python-dateutil/python-dateutil-1.5.tar.gz
# tar xvfz python-dateutil-1.5.tar.gz; cd python-dateutil-1.5/
# python setup.py install



Prereq #3: pyTables (optional, needed for HDF5 support)

pyTables was the hardest package to install, since it has several dependencies of its own:

numexpr

# wget http://numexpr.googlecode.com/files/numexpr-1.4.2.tar.gz
# tar xvfz numexpr-1.4.2.tar.gz; cd numexpr-1.4.2/
# python setup.py install


Cython

# wget http://www.cython.org/release/Cython-0.16.tar.gz
# tar xvfz Cython-0.16.tar.gz; cd Cython-0.16/
# python setup.py install


HDF5

# wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.9.tar.gz
# tar xvfz hdf5-1.8.9.tar.gz; cd hdf5-1.8.9/
# ./configure --prefix=/usr/local
# make && make install
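
Because HDF5 went into /usr/local here, it can help to confirm that the dynamic loader actually sees libhdf5 before building pyTables. Below is a minimal sketch using ctypes.util.find_library from the standard library. The helper name shared_lib_visible is made up for illustration, and the suggestion to run ldconfig when the library is not found is an assumption about a typical Linux setup, not something taken from the HDF5 or pyTables documentation.

```python
from ctypes.util import find_library


def shared_lib_visible(name):
    """Return the file name the loader resolves for library `name`, or None.

    `name` is given without the "lib" prefix or the ".so" suffix,
    e.g. "hdf5" for libhdf5.so.
    """
    return find_library(name)


if __name__ == "__main__":
    lib = shared_lib_visible("hdf5")
    if lib is None:
        # Assumption: on a typical Linux box, running `ldconfig` as root
        # after `make install` refreshes the loader cache.
        print("libhdf5 not found by the loader; try running ldconfig")
    else:
        print("found %s" % lib)
```

If the library is reported as missing even though the files are in /usr/local/lib, the loader cache is the first thing to check.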


pyTables itself

# wget http://downloads.sourceforge.net/project/pytables/pytables/2.4.0b1/tables-2.4.0b1.tar.gz
# tar xvfz tables-2.4.0b1.tar.gz; cd tables-2.4.0b1/
# python setup.py install


Edit 07/10/12: statsmodels is not a prereq, see below.

Prereq #4: statsmodels


I wasn't able to install it at this point: setup.py said it requires pandas. This is what I tried:


# wget http://pypi.python.org/packages/source/s/statsmodels/statsmodels-0.4.3.tar.gz
# tar xvfz statsmodels-0.4.3.tar.gz; cd statsmodels-0.4.3/
# python setup.py install --> requires pandas?


Prereq #4: pytz

# wget http://pypi.python.org/packages/source/p/pytz/pytz-2012c.tar.gz
# tar xvfz pytz-2012c.tar.gz; cd pytz-2012c/
# python setup.py install


Prereq #5: matplotlib

This was already installed on my target host during the EC2 instance bootstrap via Chef: 

# apt-get install python-matplotlib

pandas itself

# git clone git://github.com/pydata/pandas.git
# cd pandas
# python setup.py install

NOTE: Ralf Gommers added a comment that statsmodels is not a prerequisite for pandas; instead, it needs to be installed after pandas is in place. So I did this:

Install statsmodels

# wget http://pypi.python.org/packages/source/s/statsmodels/statsmodels-0.4.3.tar.gz
# tar xvfz statsmodels-0.4.3.tar.gz; cd statsmodels-0.4.3/
# python setup.py install
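
The statsmodels mix-up above comes down to install order: each package has to come after the things it depends on. Here is a small sketch that derives such an order with a depth-first topological sort. The DEPS map encodes the prerequisites as listed in this post (optional ones included); the scipy -> numpy entry is my own addition, and the function name install_order is hypothetical.

```python
# Prerequisites as listed in this post. Keys are packages, values are the
# packages that must be installed first.
DEPS = {
    "numpy": [],
    "scipy": ["numpy"],  # assumption: not spelled out in the post
    "python-dateutil": [],
    "numexpr": [],
    "cython": [],
    "hdf5": [],
    "pytables": ["numexpr", "cython", "hdf5"],
    "pytz": [],
    "matplotlib": [],
    "pandas": ["numpy", "python-dateutil", "pytables", "pytz", "matplotlib"],
    "statsmodels": ["pandas"],
}


def install_order(deps):
    """Return the package names so that each one follows its dependencies."""
    order = []
    state = {}  # name -> "visiting" while on the stack, "done" afterwards

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("dependency cycle at %s" % name)
        state[name] = "visiting"
        for dep in deps.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    # Sorting the keys only makes the result deterministic.
    for name in sorted(deps):
        visit(name)
    return order


if __name__ == "__main__":
    for package in install_order(DEPS):
        print(package)
```

Treating statsmodels as a pandas prerequisite would add pandas -> statsmodels to the map, and the function would then raise ValueError because of the cycle.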


Finally, if you also want to dabble in machine learning algorithms:

Install scikit-learn

# wget http://pypi.python.org/packages/source/s/scikit-learn/scikit-learn-0.11.tar.gz
# tar xvfz scikit-learn-0.11.tar.gz; cd scikit-learn-0.11/
# python setup.py install
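
Once everything is installed, a quick sanity check is to import each package and print its version. This is a rough sketch, not part of any of these projects: the helper names installed_version and report are made up, and the MODULES list assumes the usual import names (dateutil for python-dateutil, tables for pyTables, sklearn for scikit-learn).

```python
import importlib

# Import names, which differ from the package names in a few cases.
MODULES = ["numpy", "scipy", "dateutil", "numexpr", "Cython", "tables",
           "pytz", "matplotlib", "pandas", "statsmodels", "sklearn"]


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A broken compiled extension can fail with errors other than
        # ImportError, so anything raised here counts as "not usable".
        return None
    return getattr(module, "__version__", "unknown")


def report(module_names):
    """Map each module name to its version string (or None)."""
    return dict((name, installed_version(name)) for name in module_names)


if __name__ == "__main__":
    for name, version in sorted(report(MODULES).items()):
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```

Run it with the same python binary that was used for the setup.py install commands, otherwise it may report on a different site-packages directory.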
