Book Review: Data Science from Scratch (Joel Grus)

This book by Joel Grus gives you an insight into the main topics of a Data Scientist's daily work, in 25 chapters and around 300 pages. The book syllabus is available here, and you can explore it. If you buy from O’Reilly, you can use the discount code on his web page (too bad I did not use it, as I bought the book directly from Amazon; I liked the idea of this book too much to wait). Materials from the book are available on GitHub and on Joel’s page, along with links to various resources.

After the initial and mandatory introduction, Joel takes you through your days at Datasciencester, where you have just been hired as a Data Scientist. Datasciencester is, in fact, THE social network for data scientists. Joel walks you through your tasks on your first day at work, using Python to explain the algorithms you will need in a simple and clear way, and he leads you through the code. If you do not know Python, he has designed Chapter 2 (A Crash Course in Python) for you. Even so, by the end of Chapter 1 you will already have performed some not-so-trivial tasks with the language. Chapter 3 is a quick introduction to Data Visualization. I find it a bit too brief, and I would recommend visiting the matplotlib web page for very detailed information and examples. (Version 1.3.1 of the library should work with the Python 2.7 used in the book; the latest version, 1.4.3, has some changes and slight incompatibilities.)

Joel then tackles Linear Algebra (Chapter 4) before diving into Statistics (Chapter 5) and Probability (Chapter 6). All of these are extremely well presented, and I think they form the launch pad for the rest of the book. From that point on, the book becomes a growing catalogue of recipes, tools and concepts that culminates in Chapter 25: Go Forth and Do Data Science. I do not want to repeat what is already listed in the table of contents, but I must say that all the major concepts, the Python libraries and the ways to implement them find a place in this book, a must-have for any aspiring Data Scientist. This book is great stuff. And if for any reason you write to Joel, please remember that he does not like R so much 🙂
