The marvels of Python’s Matplotlib and Pandas

Today I will tell you a bit about my trip into the world of Python for scientific use, and about the two libraries that I found most amazing: Matplotlib and Pandas.

Matplotlib is an amazing 2D and 3D graphics library for generating mathematical plots. It includes:

  • Support for LaTeX-formatted labels and text
  • Great control of every element in a figure, including figure size and DPI
  • High-quality output in several formats, including PNG, PDF, SVG, EPS, and PGF
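
As a small sketch of the three points above (the data, labels and sizes are arbitrary choices for illustration), the following sets the figure size and DPI, writes the axis labels in LaTeX-style math through matplotlib's built-in mathtext, and saves the same figure in several formats:

```python
import io

import matplotlib.pyplot as plt

# Figure size is given in inches; dpi is dots per inch (illustrative values).
fig, ax = plt.subplots(figsize=(6, 4), dpi=100)
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

# Text between dollar signs is rendered as math, no LaTeX installation needed.
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$x^2$")

# Save the same figure in several formats. In-memory buffers are used here;
# a file name such as "plot.pdf" would work as well.
buffers = {}
for fmt in ("png", "pdf", "svg"):
    buffers[fmt] = io.BytesIO()
    fig.savefig(buffers[fmt], format=fmt)
```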

Pandas is the reference library of data structures for scientific data analysis. It has three major types of data structure:

  • 1D: the Series (numerically indexed 0, 1, 2… by default)
  • 2D: the DataFrame, which you can see as a spreadsheet
  • 3D: the Panel, which you can see as a set of spreadsheets (deprecated in pandas 0.20 and removed in 0.25)
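
To make the first two structures concrete, here is a minimal sketch with made-up values (the variable and column names are only for illustration). The Panel is left out because it no longer exists in recent pandas versions.

```python
import pandas as pd

# 1D: a Series gets the default integer index 0, 1, 2 when none is given.
prices = pd.Series([10.5, 11.0, 10.8])

# 2D: a DataFrame is a table of named columns sharing one index,
# much like a sheet in a spreadsheet.
table = pd.DataFrame(
    {
        "open": [10.5, 11.0, 10.8],
        "close": [10.9, 10.7, 11.2],
    }
)

# Each column of a DataFrame is itself a Series.
closing = table["close"]
```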

Note: I work in IPython notebooks, which I will talk about later. The code snippets in this blog are rendered with the Crayon plugin, so they do not look like the real thing. More on that later.

A quick look at matplotlib first; later we will see some magic in pandas. The first thing that I like to do in my IPython notebooks is to ask for all generated plots to be displayed inline in the notebook. This is done with a magic command at the top of the notebook (typically %matplotlib inline).

So, let’s have a bit of simple fun by plotting a cosine…
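
The original snippet is not reproduced here, so what follows is only a sketch of what such a first plot might look like. The range of x values, the step and the axis labels are assumptions.

```python
import math

import matplotlib.pyplot as plt

# In a notebook, run the %matplotlib inline magic in an earlier cell so that
# the figure is rendered right below the code (assumes IPython/Jupyter).

# Plain Python lists are enough for a first plot (illustrative values).
x = [i * 0.1 for i in range(100)]      # 0.0, 0.1, ..., 9.9
y = [math.cos(value) for value in x]   # cosine of each point

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("cos(x)")
```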

Another quick example: use a slightly different structure (numpy.ndarray), add a title and change the colour of the plotted line.
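
A possible version of this second example is sketched below. The number of points, the title and the colour are arbitrary choices, not the ones from the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)   # numpy.ndarray of 200 evenly spaced points
y = np.cos(x)                        # vectorised cosine, no explicit loop

fig, ax = plt.subplots()
(line,) = ax.plot(x, y, color="red")  # colour chosen for illustration
ax.set_title("A cosine curve")        # title chosen for illustration
```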

And this is the result:

Let us now have a look at Pandas and see what it has to offer. Among other things, it has a function called DataReader (since moved to the separate pandas-datareader package) that I did not mention before but that I believe you will like a lot.
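
The original call is not shown here, so the following is only a sketch of how DataReader can be used today. It assumes the separate pandas-datareader package is installed. The load_prices helper, the ticker symbol and the dates are hypothetical, and the Yahoo data source has not always been reliable in recent releases.

```python
from datetime import datetime


def load_prices(ticker, start, end, source="yahoo"):
    """Download daily price data for one security as a DataFrame."""
    # Imported here because pandas-datareader is a separate, optional package
    # (pip install pandas-datareader); older pandas shipped it as pandas.io.data.
    from pandas_datareader.data import DataReader

    return DataReader(ticker, source, start, end)


# Placeholder time interval (not the one used in the original post).
start = datetime(2015, 1, 1)
end = datetime(2015, 6, 30)

# Hypothetical usage; it needs network access, and the symbol is assumed:
# fiat_chrysler = load_prices("FCAU", start, end)
# fiat_chrysler.head()
```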

This is what the code snippet returns…

Wow. DataReader connects to Yahoo Finance and gets me the stock data for my selected security in the time interval that I specified! Furthermore, I get the data nicely organized in a 2D structure. Let us see what we can do with this and with what we have seen previously in matplotlib. Guess what, I will be plotting directly from the fiat_chrysler structure. And this time I will also add a legend to my plot…
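
To keep the sketch below runnable without a network connection, fiat_chrysler is replaced by a stand-in DataFrame filled with synthetic numbers (not real quotes). The column names, the title and the legend position are assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in for the downloaded data: synthetic values, NOT real quotes.
dates = pd.date_range("2015-01-01", periods=60, freq="B")  # business days
rng = np.random.default_rng(0)
close = 10 + rng.normal(0, 0.1, len(dates)).cumsum()
fiat_chrysler = pd.DataFrame(
    {"Open": close - 0.05, "High": close + 0.10, "Low": close - 0.10, "Close": close},
    index=dates,
)

# DataFrame.plot draws one line per column and returns the matplotlib Axes.
ax = fiat_chrysler.plot(title="Synthetic price series")

# loc selects the corner of the plot in which the legend is placed.
ax.legend(loc="upper left")
```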

This is what it gives:
The loc parameter I passed to the plot legend is the corner where I want the legend to appear. We may not like having so many lines on the same graph and may prefer it split into columns, like this:
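
One way to get this, sketched here on a synthetic stand-in for fiat_chrysler (not real quotes), is the subplots option of DataFrame.plot. The side-by-side layout, the figure size and the shared y axis are assumptions about what the original looked like.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded data: NOT real market data.
dates = pd.date_range("2015-01-01", periods=60, freq="B")
close = 10 + np.random.default_rng(0).normal(0, 0.1, len(dates)).cumsum()
fiat_chrysler = pd.DataFrame(
    {"Open": close - 0.05, "High": close + 0.10, "Low": close - 0.10, "Close": close},
    index=dates,
)

# subplots=True draws one panel per DataFrame column;
# layout=(rows, columns) arranges the panels, here side by side.
axes = fiat_chrysler.plot(
    subplots=True, layout=(1, 4), figsize=(14, 3), sharey=True
)

# Place the legend of every panel in the same corner.
for ax in np.ravel(axes):
    ax.legend(loc="upper left")
```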

This is the result. No fancy stuff, this is just an example of the power of these libraries.

A final word of thanks to Olivier Ricou, of EPITA, whose MOOC about Scientific Python I mention in the useful links, and also to Kevin Sheppard from Oxford University, for his Python for Econometrics course. Another extremely interesting resource is J.R. Johansson's site, Introduction to scientific programming with Python.