sparkplot: creating sparklines with matplotlib
Edward Tufte introduced sparklines in a sample chapter of his upcoming book "Beautiful Evidence". In his words, sparklines are "small, high-resolution graphics embedded in a context of words, numbers, images. Sparklines are data-intense, design-simple, word-sized graphics." Various implementations of sparklines based on different graphics packages have already been published. Probably the best known is a PHP implementation that uses the GD backend. In what follows, I'll show some examples of sparklines created with matplotlib by sparkplot, a Python module I wrote.
Example 1
Since a picture is worth a thousand words, here is the Los Angeles Lakers' road to their NBA title in 2002. Wins are shown as blue bars and losses as red bars. Note how easy it is to spot winning and losing streaks.
The Lakers' 2004 season was their last with Shaq, when they reached the NBA Finals and lost to Detroit (note the last 3 losses, which sealed their fate in the Finals).
Compare those days of glory with their abysmal 2005 performance, with only 2 wins in the last 21 games. Also note that the last graphic is narrower than the previous two, because the Lakers did not make the playoffs this year.
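A minimal matplotlib sketch of a win/loss bar sparkline like the ones above, assuming each game is encoded as +1 for a win and -1 for a loss. The function name and sizing parameters are made up for illustration; this is not sparkplot's own code. Making the figure width proportional to the number of games reproduces the narrower 2005 graphic.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def winloss_sparkline(results, height=0.2, width_per_game=0.03):
    """Draw wins (+1) as blue bars pointing up and losses (-1) as red bars pointing down.

    Returns the PNG bytes and the Axes, so the bars can be inspected.
    """
    n = len(results)
    # The figure width grows with the number of games, so a season
    # without playoff games yields a narrower graphic.
    fig = plt.figure(figsize=(max(n * width_per_game, 0.5), height), dpi=100)
    ax = fig.add_axes([0, 0, 1, 1])  # Axes fill the whole figure: no margins
    colors = ["blue" if r > 0 else "red" for r in results]
    ax.bar(range(n), results, width=0.8, color=colors, linewidth=0)
    ax.set_xlim(-0.5, n - 0.5)
    ax.set_ylim(-1, 1)
    ax.axis("off")  # no ticks, labels or frame
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue(), ax


png, ax = winloss_sparkline([1, 1, -1, 1, -1, -1])
```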
Example 2
The southern oscillation is defined as the difference in sea-level barometric pressure between Tahiti and Darwin, Australia. The southern oscillation is a predictor of El Nino, which in turn is thought to be a driver of world-wide weather. Specifically, repeated southern oscillation values below -1 typically define an El Nino.
Here is a sparkline for the southern oscillation from 1955 to 1992 (456 monthly data points obtained from NIST). The sparkline is drawn with a shaded horizontal band covering data values between -1 and 0, so that values below -1 stand out more clearly.
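The shaded band can be drawn with matplotlib's axhspan(). A minimal sketch, assuming the readings are already loaded into a Python list; band_sparkline and its parameters are hypothetical names, not sparkplot options.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def band_sparkline(values, low=-1.0, high=0.0, width=4.0, height=0.25):
    """Gray line sparkline over a light band spanning low..high on the y axis.

    Returns the PNG bytes and the indices of the points below `low`,
    i.e. the ones the band is meant to make stand out.
    """
    fig = plt.figure(figsize=(width, height), dpi=100)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.axhspan(low, high, color="0.85", linewidth=0)  # shaded band, drawn behind the line
    ax.plot(range(len(values)), values, color="0.3", linewidth=0.7)
    ax.set_xlim(0, len(values) - 1)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    below = [i for i, v in enumerate(values) if v < low]
    return buf.getvalue(), below


png, below = band_sparkline([0.5, -1.2, -0.4, -1.5, 0.2])
```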
Example 3
Here is the per capita income in California from 1959 to 2003.
And here is the "real" per capita income (adjusted for inflation) in California, from 1959 to 2003.
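The post does not say how the "real" series was derived. A common approach rescales each year's nominal value by a price index such as the CPI, relative to a chosen base year: real = nominal * CPI[base year] / CPI[year]. The sketch below applies that formula; the function name is hypothetical and the index values in the example are invented for illustration, not actual CPI data.

```python
def to_real(nominal, price_index, base_year):
    """Express nominal values in constant base-year dollars.

    nominal and price_index both map year -> value;
    real[year] = nominal[year] * price_index[base_year] / price_index[year].
    """
    base = price_index[base_year]
    return {year: value * base / price_index[year] for year, value in nominal.items()}


# Invented numbers: prices and income both rose 10%, so real income stayed flat.
print(to_real({2000: 100.0, 2001: 110.0}, {2000: 100.0, 2001: 110.0}, base_year=2000))
# {2000: 100.0, 2001: 100.0}
```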
Example 4
Here is the monthly distribution of messages posted to comp.lang.python from 1994 to 2004, with one sparkline per year. Minimum and maximum values are shown as blue dots and labeled in each graphic.
| Year | Total messages |
|------|---------------:|
| 1994 | 3,018 |
| 1995 | 4,026 |
| 1996 | 8,378 |
| 1997 | 12,910 |
| 1998 | 19,533 |
| 1999 | 24,725 |
| 2000 | 42,961 |
| 2001 | 55,271 |
| 2002 | 56,750 |
| 2003 | 64,548 |
| 2004 | 56,184 |
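The trend described next can be checked directly against the yearly totals in the table. This small sketch (decreasing_years is a hypothetical helper) lists the years whose total fell below the previous year's:

```python
# Yearly totals of comp.lang.python messages, copied from the table.
totals = {
    1994: 3018, 1995: 4026, 1996: 8378, 1997: 12910, 1998: 19533, 1999: 24725,
    2000: 42961, 2001: 55271, 2002: 56750, 2003: 64548, 2004: 56184,
}


def decreasing_years(series):
    """Return the years whose total is lower than the previous year's total."""
    years = sorted(series)
    return [y for prev, y in zip(years, years[1:]) if series[y] < series[prev]]


print(decreasing_years(totals))  # [2004]
```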
There was an almost constant increase in the number of messages per year from 1994 to 2004, the only exception being 2004, when there were fewer messages than in 2002 and 2003.
Details on using sparkplot
1) Install the Numeric Python module (required by matplotlib)
2) Install matplotlib
3) Prepare data files: sparkplot simplistically assumes that its input data file contains a single column of numbers
4) Run sparkplot.py. Here are some command-line examples to get you going:
- given only the input file and no other option, sparkplot.py will generate a gray sparkline with the first and last data points plotted in red:
sparkplot.py -i CA_real_percapita_income.txt
produces:

The output is a PNG file with a default name, which can be changed with the -o option.
The plotting of the first and last data points can be disabled with the --noplot_first and --noplot_last options.
- given the input file and the label_first, label_last, format=currency options, sparkplot.py will generate a gray sparkline with the first and last data points plotted in red and with the first and last data values displayed in a currency format:
sparkplot.py -i CA_real_percapita_income.txt --label_first --label_last --format=currency
produces:

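The currency symbol is $ by default, but it can be changed with the --currency option.
- given the input file and the plot_min, plot_max, label_min, label_max, format=comma options, sparkplot.py will generate a gray sparkline with the first and last data points plotted in red, the minimum and maximum data points plotted in blue, and the minimum and maximum data values displayed in a 'comma' format (e.g. 23,456,789):
sparkplot.py -i clpy_1997.txt --plot_min --plot_max --label_min --label_max --format=comma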
produces:

- given the input file and the type=bars option, sparkplot.py will draw blue bars for the positive data values and red bars for the negative data values:
sparkplot.py -i lakers2005.txt --type=bars
produces:

As a side note, I think bar plots look better when the data file contains a relatively large number of data points and the variation of the data is relatively small. This type of plot works especially well for sports-related graphics, where wins are represented as +1 and losses as -1.
- for other options, run sparkplot.py -h
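For readers who would rather call matplotlib directly, here is a rough sketch of the line-sparkline variants shown above: a gray line, red first/last points, blue min/max points, and labels in 'comma' or currency format. It is not sparkplot's source. The function and keyword names are hypothetical and only loosely mirror the command-line options (for instance, --noplot_first corresponds to plot_first=False), and rounding labels to whole units is an assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def read_column(path):
    """Read a data file holding one number per line; blank lines are skipped."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def format_value(value, style=None, currency="$"):
    """'comma' -> 23,456,789; 'currency' -> $23,457 (whole-unit rounding is assumed)."""
    if style == "comma":
        return f"{value:,.0f}"
    if style == "currency":
        return f"{currency}{value:,.0f}"
    return f"{value:g}"


def line_sparkline(values, plot_first=True, plot_last=True, plot_min=False,
                   plot_max=False, label_first=False, label_last=False,
                   label_min=False, label_max=False, style=None, currency="$"):
    """Gray line with red first/last points and, optionally, blue min/max points.

    Returns the PNG bytes and a dict {index: label text} of the labels drawn.
    """
    n = len(values)
    i_min = min(range(n), key=values.__getitem__)
    i_max = max(range(n), key=values.__getitem__)
    fig = plt.figure(figsize=(2.0, 0.3), dpi=100)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.plot(range(n), values, color="gray", linewidth=0.8)
    labels = {}
    # (index, draw a marker?, add a label?, marker color) for each special point
    for idx, draw, label, color in [
        (0, plot_first, label_first, "red"),
        (n - 1, plot_last, label_last, "red"),
        (i_min, plot_min, label_min, "blue"),
        (i_max, plot_max, label_max, "blue"),
    ]:
        if draw:
            ax.plot([idx], [values[idx]], "o", color=color, markersize=2)
        if label:
            labels[idx] = format_value(values[idx], style, currency)
            ax.annotate(labels[idx], (idx, values[idx]), fontsize=5)
    ax.margins(x=0.2, y=0.3)  # leave some room for the label text
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue(), labels
```

With a one-column file such as clpy_1997.txt, calling line_sparkline(read_column("clpy_1997.txt"), plot_min=True, plot_max=True, label_min=True, label_max=True, style="comma") is roughly the counterpart of the --format=comma command shown earlier.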
I hope the sparkplot module will prove to be useful when you need to include sparkline graphics in your Web pages. All the caveats associated with alpha-level software apply :-) Let me know if you find it useful. I'm very much a beginner at using matplotlib, and as I become more acquainted with it I'll add more functionality to sparkplot.
Finally, kudos to John Hunter, the creator of matplotlib. I found this module extremely powerful and versatile. For a nice introduction to matplotlib, see also John's talk at PyCon05.
Note: the Blogger template system might have something to do with the fact that the graphics are shown with a border; when included in a "normal", white-background HTML page, there is no border and they integrate more seamlessly into the text.
Update 5/2/05: Thanks to Kragen Sitaker for pointing out a really simple solution to the "borders around images" problem -- just comment out the CSS definition for .post img in the Blogger template.
18 Comments:
Where is your email address?
By Anonymous, at 9:34 AM
My email address is grig at gheorghiu dot net.
By Grig Gheorghiu, at 9:38 AM
Grig,
Thank you for the script! I think data visualization is really neat, and combined with data blogging it can be powerful.
By T, at 2:59 PM
T, I'm glad you found my script useful. I followed the link to your blog and I saw your entry (http://t.clant2k.com/?p=74). Trading seems to be the "killer app" for sparklines :-)
By Grig Gheorghiu, at 8:40 AM
Sparkplot is very nice.
I have a suggestion:
Matplotlib allows for the control of the transparency of the background:
from pylab import subplot
ax = subplot(111)
fr = ax.patch  # older matplotlib releases: fr = ax.get_frame()
fr.set_alpha(0.5)
This might be nice for non-white background pages.
One question:
Do you plan to add support to other plot types?
By usagi, at 11:14 AM
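usagi's suggestion relies on get_frame(), which has since been deprecated in matplotlib; current releases expose the same Axes background rectangle as ax.patch. A minimal sketch of a semi-transparent background, assuming a recent matplotlib (sparkline_png is a hypothetical name). The figure background is made fully transparent when saving, so the page color shows through the Axes area.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def sparkline_png(values, background_alpha=0.5):
    """Gray line sparkline whose Axes background is only partly opaque."""
    fig = plt.figure(figsize=(2.0, 0.3), dpi=100)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.plot(range(len(values)), values, color="gray", linewidth=0.8)
    ax.set_xticks([])
    ax.set_yticks([])
    # ax.patch is the Axes background rectangle (what get_frame() returned in old releases).
    ax.patch.set_alpha(background_alpha)
    buf = io.BytesIO()
    # Save with a fully transparent figure background, so only the Axes alpha
    # determines how much of the page color shows through.
    fig.savefig(buf, format="png", facecolor=(1, 1, 1, 0))
    plt.close(fig)
    return buf.getvalue(), ax


png, ax = sparkline_png([1, 3, 2, 5, 4], background_alpha=0.25)
```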
It puts things in perspective that's for sure! :)
By T, at 11:41 AM
> I have a suggestion:
> Matplotlib allows for the control of the transparency of the background:
usagi -- that's a great idea, and I'll surely include it in the next version.
As for other types of plots, I haven't seen any others so far that are useful for sparklines. Do you have anything in mind?
By Grig Gheorghiu, at 3:09 PM
> Do you have anything in mind?
I haven't given much thought to this subject, but it might be interesting to add (as an option) dotted 95% interval lines.
Another interesting type of plot would be stacked bar plots, which matplotlib also provides.
keep up the good work,
cheers,
By usagi, at 9:33 AM
very cool concept .. great work!
By moo, at 12:40 PM
Very nice library!
Are there other plotting libraries or scripts inspired by Edward Tufte's books?
By Anonymous, at 10:12 PM
moo is right.
excellent work!!!
By Packservice, at 2:27 PM
Really nice... do you have a download to offer? Just for a trial, of course, to get a better overview of usage.
Thanks in advance,
p.t.
By progressive_trance, at 11:45 AM
very cool concept .. great work!
By mooo man, at 8:21 AM
Thank you, very interesting!
By tom, at 7:55 PM
I've loved these little graphics since I first saw them in a Tufte seminar in '99.
FWIW, I'd like to see the LA Lakers sparklines stacked like you did in example 4. It would be far easier to do visual comparisons of their seasons.
By Glenn, at 9:14 AM
As for another graph type, I'm rather fond of the bar-with-underline type: it's a bar graph similar to the Lakers results, but it also has a second set of binary data that draws a horizontal line in the center of the graph when the data value is 1.
The Hardball Times uses the bar display much as you did, to show team wins and losses. The underline component is used to indicate home games versus away games.
Of the libraries I've checked, the PHP implementation is the only one I've found that supports this version, which is what The Hardball Times looks to be using as well.
By Eric, at 11:30 PM
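A rough sketch of the bar-with-underline variant Eric describes, assuming the second binary series (for example, 1 for a home game) is passed as a separate list. The function and parameter names are hypothetical, and this is not how the PHP library or The Hardball Times actually draws it.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def winloss_underline(results, flags, width_per_game=0.03, height=0.2):
    """Win/loss bars plus an 'underline' segment on the center line for every game whose flag is 1."""
    n = len(results)
    fig = plt.figure(figsize=(max(n * width_per_game, 0.5), height), dpi=100)
    ax = fig.add_axes([0, 0, 1, 1])
    colors = ["blue" if r > 0 else "red" for r in results]
    ax.bar(range(n), results, width=0.8, color=colors, linewidth=0)
    flagged = [i for i, f in enumerate(flags) if f == 1]
    if flagged:
        # One short horizontal segment per flagged game, spanning its bar at y = 0
        ax.hlines([0] * len(flagged), [i - 0.4 for i in flagged],
                  [i + 0.4 for i in flagged], colors="black", linewidth=1)
    ax.set_xlim(-0.5, n - 0.5)
    ax.set_ylim(-1, 1)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue(), flagged


png, flagged = winloss_underline([1, -1, 1, 1], [1, 0, 1, 0])
```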
Sparklines for Excel:
* Line, column, pie and bar charts
* Sparklines with rich formatting
* Bullet graphs for dashboard gauges
* Costing as low as $49
http://www.bonavistasystems.com
By Anonymous, at 10:21 AM
Note that if you import sparkplot and call it multiple times from Python, each new plot is drawn on top of the previous ones, which isn't what you want. You can fix this by adding clf() after the line fig = figure(...)
By Adrian Skilling, at 2:19 AM
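One likely cause of the overplotting: pyplot.figure(num) returns the already-open figure when a figure with that number exists, so a script that reuses a fixed figure number keeps drawing on the same canvas. A minimal sketch of the clf() fix, assuming sparkplot reuses a figure this way (render is a hypothetical helper, not sparkplot's code):

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def render(values, num=1):
    """Hypothetical helper that, like a script calling figure(1, ...), reuses figure number `num`."""
    fig = plt.figure(num)          # returns the already-open figure on later calls
    fig.set_size_inches(2.0, 0.3)
    fig.clf()                      # the fix: clear whatever the previous call drew
    ax = fig.add_subplot(111)
    ax.plot(values, color="gray")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue(), fig


render([1, 2, 3])
_, fig = render([3, 2, 1])
print(len(fig.axes), len(fig.axes[0].lines))  # 1 1: only the second plot is on the figure
plt.close("all")  # release the figure once you are done with it
```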