Today I will tell you a bit about my trip into the world of Python for scientific use, and about two of the libraries that I found most amazing: Matplotlib and Pandas.
Matplotlib is an amazing 2D and 3D graphics library for generating mathematical plots. It includes:
- Support for LaTeX-formatted labels and text
- Great control of every element in a figure, including figure size and DPI.
- High-quality output in several formats, including PNG, PDF, SVG, EPS, and PGF
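To make these three points concrete, here is a minimal sketch, not taken from any official example. The function being plotted, the file name `sin_squared` and the temporary output directory are my own assumptions for illustration. The labels use matplotlib's built-in mathtext, so no LaTeX installation is needed, and the `Agg` line is only there so that the sketch also runs outside a notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# Explicit control of the figure size (in inches) and of the resolution
fig, ax = plt.subplots(figsize=(6, 3), dpi=100)

# LaTeX-style labels, rendered by matplotlib's mathtext
ax.plot(x, np.sin(x) ** 2, label=r"$\sin^2(x)$")
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$\sin^2(x)$")
ax.legend()

# The output format is picked from the file extension
out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, "sin_squared." + ext)
    fig.savefig(path)
    saved.append(path)
plt.close(fig)
```

The same `savefig` call handles all three formats; only the extension changes.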
Pandas is the reference library for the data structures used in scientific data analysis. It has three major types of data structures:
- 1D: the Series (numerically indexed 0, 1, 2… by default)
- 2D: the DataFrame, which you can think of as a spreadsheet
- 3D: the Panel, which you can think of as a set of spreadsheets
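As a rough illustration of the first two structures, here is a small sketch. The numbers and column names are made up for the example. I leave the Panel out, because it has been removed from recent pandas releases and the code would not run there.

```python
import pandas as pd

# 1D: a Series gets the default integer index 0, 1, 2, ...
values = pd.Series([1.5, 2.5, 3.5])  # illustrative values only
print(list(values.index))

# 2D: a DataFrame is a table with named columns and a shared row index
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})  # illustrative values only
print(frame.shape)  # (rows, columns)

# Each column of a DataFrame is itself a Series
column = frame["b"]
print(type(column).__name__)
```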
Note: I will use IPython notebooks and I will talk later about that. But I use the crayon plugin to present the code snippets in this blog, so this does not look like the real thing. More on that later.
A quick look at matplotlib first; later we will see some magic in pandas. The first thing that I like to do in my IPython notebooks is to ask for all generated plots to be shown inline in the notebook. To do this, use the following magic:
```python
%matplotlib inline
```
So, let's have a bit of simple fun by plotting a cosine…
```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(-np.pi, np.pi, 0.1)
plt.figure(figsize=(13, 3), dpi=100)
plt.plot(np.cos(t), 'g')  # plotted against the sample index, not against t
plt.grid()
# Well, I want to show this to my readers, so let us save it as an image...
```
Another quick example, slightly different: this time the x values (still a numpy.ndarray) are built with np.linspace, and we add a title and change the colour of the line being plotted.
```python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3 * np.pi, 500)
plt.plot(x, np.sin(x**2), '-r')
plt.title('What a nice one')
plt.savefig("WhatANiceOne.jpg")
```
And this is the result:
Let us now have a look at Pandas and see what it offers. Among other things, it has a function called DataReader that I did not mention before but that I believe you will like a lot.
```python
# Note: in later pandas releases this moved to the separate pandas-datareader package
from pandas.io.data import DataReader
from datetime import datetime

start = datetime(2015, 5, 1)  # Some date manipulation...
end = datetime.now()
fiat_chrysler = DataReader("FCAU", "yahoo", start, end)
print(type(fiat_chrysler))
print(fiat_chrysler)  # Just to see what we get...
```
This is what the code snippet returns…
```
<class 'pandas.core.frame.DataFrame'>
             Open   High    Low  Close   Volume  Adj Close
Date
2015-05-01  14.76  14.93  14.40  14.65  7622400      14.65
2015-05-04  14.98  15.00  14.73  14.80  5167800      14.80
2015-05-05  15.01  15.06  14.63  14.70  6005900      14.70
2015-05-06  14.73  14.78  14.19  14.33  9511800      14.33
2015-05-07  14.66  14.85  14.62  14.83  9338600      14.83
2015-05-08  14.99  15.25  14.95  15.13  3712600      15.13
2015-05-11  14.94  15.07  14.90  14.98  4128300      14.98
2015-05-12  15.13  15.13  14.75  14.80  5867000      14.80
2015-05-13  15.14  15.24  14.99  15.04  3652000      15.04
2015-05-14  15.23  15.35  15.19  15.31  4696700      15.31
2015-05-15  15.52  15.57  15.39  15.45  2852600      15.45
2015-05-18  15.40  15.63  15.34  15.47  4846400      15.47
2015-05-19  15.53  15.63  15.47  15.60  3292100      15.60
2015-05-20  15.63  15.82  15.58  15.75  5311000      15.75

[14 rows x 6 columns]
```
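If you want to play with this kind of structure without a network connection, here is a small sketch that retypes the first three rows of the output above by hand. The name `sample` is my own, and I only keep three of the six columns to keep it short.

```python
import pandas as pd

# First three rows of the output above, retyped by hand (three columns only)
sample = pd.DataFrame(
    {
        "High": [14.93, 15.00, 15.06],
        "Low": [14.40, 14.73, 14.63],
        "Close": [14.65, 14.80, 14.70],
    },
    index=pd.to_datetime(["2015-05-01", "2015-05-04", "2015-05-05"]),
)
sample.index.name = "Date"

# A column can be reached as an attribute or by name; both give the same Series
same_column = sample.High.equals(sample["High"])

# Look up one cell by its date label and column name
close_on_4th = sample.loc[pd.Timestamp("2015-05-04"), "Close"]

# Arithmetic between columns is element-wise: here, the daily trading range
daily_range = (sample.High - sample.Low).round(2)
print(daily_range)
```

The attribute form (`sample.High`) is the one used for plotting in this post; the bracket form is the safer choice when a column name contains a space, like `Adj Close`.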
Wow. DataReader connects to Yahoo Finance and gets me the stock data for my selected security in the time interval that I specified! Furthermore, I get the data nicely organized in a 2D structure. Let us see what we can do with this and with what we have seen previously in matplotlib. Guess what, I will be plotting directly from the fiat_chrysler structure. And this time I will also add a legend to my plot…
```python
plt.plot(fiat_chrysler.High, 'g', label="High")
plt.plot(fiat_chrysler.Low, 'r', label="Low")
plt.plot(fiat_chrysler.Close, 'b', label="Closing")
plt.title("High, Low and Closing for FCAU")
plt.legend(loc=4)
plt.savefig("FCAU.png")
```
This is what it gives:
The loc parameter I passed to the legend selects the corner where I want the legend to appear (4 is the lower right). We may not like so many lines on the same graph and prefer them split into columns, like this:
```python
fig, axes = plt.subplots(nrows=1, ncols=3)
for ax in axes:
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title('title')

axes[0].plot(fiat_chrysler.High, 'g')
axes[1].plot(fiat_chrysler.Low, 'r')
axes[2].plot(fiat_chrysler.Close, 'b')
plt.savefig("3Columns.png")
```
This is the result. No fancy stuff, this is just an example of the power of these libraries.
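A small aside on the loc parameter of the legend used earlier: the numeric code 4 can also be written as the string "lower right", which I find easier to read. Here is a minimal sketch with made-up values, using the same side-by-side layout as above. The `Agg` line is only there so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# Illustrative values only, not real stock data
left.plot([0, 1, 2], [0, 1, 4], 'g', label="loc=4")
left.legend(loc=4)  # numeric code for the lower right corner

right.plot([0, 1, 2], [0, 1, 4], 'r', label="loc='lower right'")
right.legend(loc="lower right")  # the same corner, spelled out

# Collect the legend labels of both subplots
labels = [ax.get_legend().get_texts()[0].get_text() for ax in (left, right)]
plt.close(fig)
```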
A final word of thanks to Olivier Ricou, of EPITA, whose MOOC about Scientific Python I mention in the useful links, and also to Kevin Sheppard from Oxford University, for his Python for Econometrics course. Another extremely interesting resource is J.R. Johansson's site, Introduction to scientific programming with Python.