Monday, July 09, 2012

Installing Python scientific and statistics packages on Ubuntu

I tried to install the pandas Python library a while ago using easy_install/pip, and I hit some roadblocks when it came to installing all the dependencies. So I tried again, this time installing most of the required packages from source. Here are my notes; hopefully they'll be useful to somebody out there.

This is on an Ubuntu 12.04 machine.

Install NumPy

# wget http://downloads.sourceforge.net/project/numpy/NumPy/1.6.2/numpy-1.6.2.tar.gz
# tar xvfz numpy-1.6.2.tar.gz; cd numpy-1.6.2
# cat INSTALL.txt
# apt-get install libatlas-base-dev libatlas3gf-base
# apt-get install python-dev
# python setup.py install
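
One way to sanity-check the build afterwards is to ask NumPy which BLAS/LAPACK libraries it was compiled against. The snippet below is only a sketch: the helper name numpy_summary is made up for illustration, it assumes a Python 3 interpreter, and the exact text printed by show_config() varies between NumPy releases. Run it from outside the numpy-1.6.2 directory, since importing NumPy from its own source tree fails.

```python
def numpy_summary():
    """Return NumPy's version string, or None if NumPy cannot be imported.

    numpy_summary is a hypothetical helper name, not part of NumPy.
    """
    try:
        import numpy
    except ImportError:
        return None
    # show_config() prints the BLAS/LAPACK libraries NumPy was built against.
    numpy.show_config()
    return numpy.__version__


if __name__ == "__main__":
    version = numpy_summary()
    if version is None:
        print("numpy is not importable; re-check the setup.py output")
    else:
        print("numpy version: %s" % version)
```

If the ATLAS packages were picked up at build time, they would be expected to appear somewhere in that output.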



Install SciPy


# wget http://downloads.sourceforge.net/project/scipy/scipy/0.11.0b1/scipy-0.11.0b1.tar.gz
# tar xvfz scipy-0.11.0b1.tar.gz; cd scipy-0.11.0b1/
# cat INSTALL.txt
# apt-get install gfortran g++
# python setup.py install
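
Since SciPy's linear algebra routines are compiled code (hence the gfortran requirement), a small solve is a reasonable smoke test that the compiled parts actually load. This is a sketch under the assumption of a Python 3 interpreter; solve_residual is a hypothetical helper name, and it returns None rather than failing when SciPy or NumPy is missing.

```python
def solve_residual():
    """Solve a 2x2 linear system with scipy.linalg and return the largest residual.

    Returns None when SciPy or NumPy cannot be imported.
    """
    try:
        import numpy as np
        from scipy import linalg
    except ImportError:
        return None
    # 3x + y = 9 and x + 2y = 8, whose exact solution is x = 2, y = 3.
    a = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([9.0, 8.0])
    x = linalg.solve(a, b)  # calls into the compiled LAPACK routines
    return float(np.max(np.abs(a.dot(x) - b)))


if __name__ == "__main__":
    residual = solve_residual()
    if residual is None:
        print("scipy is not importable")
    else:
        print("largest residual: %g" % residual)
```

A residual close to zero suggests the compiled linear algebra layer is working.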


Install pandas


Prereq #1: NumPy 

- already installed (see above)

Prereq #2: python-dateutil

# wget http://labix.org/download/python-dateutil/python-dateutil-1.5.tar.gz
# tar xvfz python-dateutil-1.5.tar.gz; cd python-dateutil-1.5/
# python setup.py install



Prereq #3: pyTables (optional, needed for HDF5 support)

pyTables was the hardest package to install, since it has many dependencies of its own:

numexpr

# wget http://numexpr.googlecode.com/files/numexpr-1.4.2.tar.gz
# tar xvfz numexpr-1.4.2.tar.gz; cd numexpr-1.4.2/
# python setup.py install


Cython

# wget http://www.cython.org/release/Cython-0.16.tar.gz
# tar xvfz Cython-0.16.tar.gz; cd Cython-0.16/
# python setup.py install


HDF5

# wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.9.tar.gz
# tar xvfz hdf5-1.8.9.tar.gz; cd hdf5-1.8.9/
# ./configure --prefix=/usr/local
# make; make install
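
Before moving on to pyTables, it can help to confirm that the freshly built HDF5 shared library is visible to the dynamic linker. The sketch below uses ctypes from the standard library and the HDF5 C function H5get_libversion; the helper name hdf5_version is made up for illustration, and a Python 3 interpreter is assumed. If it returns None right after the install, running ldconfig to refresh the linker cache may be all that is missing.

```python
import ctypes
import ctypes.util


def hdf5_version():
    """Return the HDF5 library version as (major, minor, release), or None.

    None means the shared library could not be found or loaded.
    """
    name = ctypes.util.find_library("hdf5")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        major = ctypes.c_uint()
        minor = ctypes.c_uint()
        release = ctypes.c_uint()
        # H5get_libversion fills in the three numbers and returns a
        # negative value on failure.
        status = lib.H5get_libversion(
            ctypes.byref(major), ctypes.byref(minor), ctypes.byref(release)
        )
    except (OSError, AttributeError):
        return None
    if status < 0:
        return None
    return (major.value, minor.value, release.value)


if __name__ == "__main__":
    print("HDF5 version: %s" % (hdf5_version(),))
```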


pyTables itself

# wget http://downloads.sourceforge.net/project/pytables/pytables/2.4.0b1/tables-2.4.0b1.tar.gz
# tar xvfz tables-2.4.0b1.tar.gz; cd tables-2.4.0b1/
# python setup.py install


Edit 07/10/12: statsmodels is not a prereq; see below.

Prereq #4: statsmodels


I wasn't able to install it at this point, because setup.py complained that it 'requires pandas'. This is what I tried:


# wget http://pypi.python.org/packages/source/s/statsmodels/statsmodels-0.4.3.tar.gz
# tar xvfz statsmodels-0.4.3.tar.gz; cd statsmodels-0.4.3/
# python setup.py install    (fails, apparently because it requires pandas)


Prereq #4: pytz

# wget http://pypi.python.org/packages/source/p/pytz/pytz-2012c.tar.gz
# tar xvfz pytz-2012c.tar.gz; cd pytz-2012c/
# python setup.py install


Prereq #5: matplotlib

This was already installed on my target host during the EC2 instance bootstrap via Chef: 

# apt-get install python-matplotlib

pandas itself

# git clone git://github.com/pydata/pandas.git
# cd pandas
# python setup.py install
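
With everything in place, a short script can report which of the prerequisites above actually import, and at what version. This is a sketch, assuming a Python 3 interpreter; version_report and PREREQS are names made up for illustration. Note that the import name sometimes differs from the package name (python-dateutil is imported as dateutil, pyTables as tables).

```python
import importlib

# Import names for the prerequisites listed above, plus pandas itself.
PREREQS = ["numpy", "dateutil", "tables", "pytz", "matplotlib", "pandas"]


def version_report(module_names=PREREQS):
    """Map each module name to its version string, or None if it fails to import."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken compiled extension can raise something other than
            # ImportError, so any failure counts as "not usable".
            report[name] = None
        else:
            # Not every package defines __version__.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in version_report().items():
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```

As with NumPy, run it from a directory other than the source checkouts so that the installed copies are the ones being imported.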

NOTE: Ralf Gommers added a comment that statsmodels is not a prerequisite for pandas; instead, it needs to be installed after pandas is in place. So I did this:

Install statsmodels

# wget http://pypi.python.org/packages/source/s/statsmodels/statsmodels-0.4.3.tar.gz
# tar xvfz statsmodels-0.4.3.tar.gz; cd statsmodels-0.4.3/
# python setup.py install
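
The ordering problem above (statsmodels has to come after pandas, not before) is really a dependency-graph problem, and a topological sort gives a valid install order. The sketch below encodes the dependencies mentioned in this post; edges marked as assumptions are not stated in the post. It relies on graphlib from the standard library, which needs Python 3.9 or later, and the names DEPENDS_ON and install_order are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each package maps to the set of packages that must be installed first.
DEPENDS_ON = {
    "numpy": set(),
    "scipy": {"numpy"},                # assumption: not stated in the post
    "python-dateutil": set(),
    "numexpr": {"numpy"},              # assumption: not stated in the post
    "cython": set(),
    "hdf5": set(),
    "pytables": {"numexpr", "cython", "hdf5"},
    "pytz": set(),
    "matplotlib": set(),
    "pandas": {"numpy", "python-dateutil", "pytables", "pytz", "matplotlib"},
    "statsmodels": {"pandas"},         # per Ralf Gommers' comment
    "scikit-learn": {"numpy", "scipy"},  # assumption: not stated in the post
}


def install_order(depends_on=DEPENDS_ON):
    """Return one valid install order in which every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, package in enumerate(install_order(), start=1):
        print("%2d. %s" % (step, package))
```

More than one order is valid, so the exact sequence printed may differ from the order used in this post; what matters is that each package appears after everything it depends on.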


Finally, if you also want to dabble in machine learning algorithms:

Install scikit-learn

# wget http://pypi.python.org/packages/source/s/scikit-learn/scikit-learn-0.11.tar.gz
# tar xvfz scikit-learn-0.11.tar.gz; cd scikit-learn-0.11/
# python setup.py install

11 comments:

Ralf Gommers said...

Statsmodels is not a prereq for pandas, but pandas is for statsmodels. So you should be able to install it now.

Anonymous said...

You can install the ubuntu packages and update everything with pip in a virtualenv. I think there is no need to install it directly from source.

Grig Gheorghiu said...

Ralf -- thanks for the comment, I updated the post.

Anonymous -- I tried the pip/easy_install route, hit a lot of roadblocks, had to give up. If you get it to work I'd appreciate some pointers on how you did it.

Grig

matt harrison said...

On my 10.04 machine, I was able to

pip install -U numpy; pip install -U pandas


But I needed to get scipy installed by a non-pip mechanism. Somewhat annoying, but it could be worse (MacOS...)

Grig Gheorghiu said...

Thanks, Matt! Yes, you're right, we need to count our blessings ;-)

Anonymous said...

I wasn't able to install pandas 0.8.1 by this route, on Ubuntu 12.04. Fortunately the Neurodebian folks are maintaining a compatible version here:
http://neuro.debian.net/index.html#how-to-use-this-repository

After adding the repo, it's just an apt-get install python-pandas away :)

(Thanks Neurodebian!)

Anonymous said...

Thanks very much for this post. I was struggling a lot to install scipy. Your post REALLY helped

Anonymous said...

Thank you very much for your clear instruction of installation which helped me to install scipy 0.11.0 which otherwise could have been difficult!

dbv said...

Great post. Couple of questions:

a. When installing with source files from sourceforge, what directory are numpy and scipy installed in?
b. Can the latest releases of numpy (1.7) and scipy (0.11) be installed with this method or will Ubuntu complain as it is usually a few releases behind?

Many thanks!

chris grijalva said...

A couple of minor updates:
HDF5 is 1.8.10
pytables is out of beta, 2.4.0

Thanks Grig, a nice post.
