The Analytics Edge MOOC on edX

After the MOOC “Data Visualization”, I joined a discussion on “What MOOC are you going to take next”. A participant in the discussion mentioned MIT’s “The Analytics Edge” course, on edx.org, as one of the nicest he had taken. The name was intriguing, and so was the description, so I decided to have a look.

Dates: The session I joined was due to finish only a few days later, on 19 August 2015. However, the material remained free to access, except for the assignments. A re-run of the course is expected in Spring 2016, and I am really looking forward to it. In the meantime I could not help but watch the videos and answer the quick questions, because my curiosity kept growing.

The course is indeed a very nice one. It is structured in units, and every unit contains videos interspersed with sessions of quick questions. At the end of each unit there is a recitation, this time with no quick questions. The last item in each unit is the assignment, which is accessible only during active sessions.

The modules are structured around real-life examples. The theory is explained, and then all the development of the lesson is performed in R. I cannot help but mention Unit 3, where you learn about Logistic Regression by exploring “The Framingham Heart Study”, which revolutionised the prevention of cardiovascular disease. The lesson on “Moneyball” is also memorable. All the data used in a lesson is downloadable, and all the R commands are collected in a downloadable script. I prefer to type the commands along with the instructor as they are explained.

Unit 4 is about Classification Trees, and it is also very interesting. Another great point of the course is the “Kaggle Competition”, where students enrolled in the course compete to build the best logistic regression model on data provided for this purpose. I have participated in similar competitions in the “Text Mining and Analytics” and “Text Retrieval and Search Engines” courses, and I am not particularly fond of them: they often turn into a frantic series of script submissions where the only difference is the tuning of one or more parameters, repeated until you get close enough to the top of the table. Even so, it is a way to learn how particular parameters affect the performance of the models (learning the hard way).

There are 9 units, with Unit 8 covering Data Visualization and Unit 9 covering Integer Optimization. I believe this course has been put together with exceptional quality, and it is of great value to anyone who wants to approach the world of Data Science.