This MOOC is conceptually one of the most interesting ones that I have taken to date. It is built around an implementation of “Literate Programming”, a concept introduced by Donald Knuth in 1984 and collected in his 1992 book of the same name: basically, a system in which documentation and “live” source code are presented in the same document. In the course, this concept is presented through “knitr”, an R package that is really well integrated into RStudio. Another implementation of this concept, which I have explored in this blog (and outside), is the Jupyter project, which lets you do similar things in a web environment and offers a vast choice of programming languages. In knitr the document is written in R Markdown (with the possibility of HTML or LaTeX) and the code snippets are in R. In Jupyter the document parts, which in this case are called cells, are written in Markdown (I have not explored anything else), and the code cells depend on the selected kernel (the script engine).
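To make the idea concrete, here is a toy sketch in Python of what “weaving” a literate document means: the prose is copied through, each code chunk is executed, and its output is placed right under the code. This is not how knitr or Jupyter work internally. The `@code` / `@end` markers, the `weave` function and the sample document are all made up for this illustration.

```python
import contextlib
import io

# A tiny literate document: prose mixed with one executable chunk.
# The "@code" / "@end" markers are hypothetical, invented for this sketch.
DOCUMENT = """\
The sample has three measurements.
@code
values = [4.0, 5.0, 6.0]
print(sum(values) / len(values))
@end
The mean printed above is recomputed on every build.
"""


def weave(source):
    """Return the document with each chunk followed by its captured output."""
    namespace = {}  # shared by all chunks, so later chunks see earlier variables
    woven = []
    chunk = None  # None while reading prose, a list of lines inside a chunk
    for line in source.splitlines():
        if line == "@code":
            chunk = []
        elif line == "@end" and chunk is not None:
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)  # run the chunk "live"
            woven.extend("    " + code_line for code_line in chunk)
            woven.extend("    #> " + out for out in buffer.getvalue().splitlines())
            chunk = None
        elif chunk is not None:
            chunk.append(line)
        else:
            woven.append(line)
    return "\n".join(woven)


if __name__ == "__main__":
    print(weave(DOCUMENT))
```

The point of the exercise is that the number in the final text is never typed by hand: change the data in the chunk and the document changes with it.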
The concept of reproducible research is of fundamental importance in today’s world. The impact of an error in a study can be devastating in economic terms (economic policies: see this) and even for human health or human lives (drug trials: see this). Documenting your research and making it available to other research teams not only lets independent teams verify the findings, but also means that, if the findings are important, they move on to the next stage faster.
In short, reproducible research is the process in which you:
- Describe your purpose
- Document your data sources (what, where, when)
- Document your data transformations
- Document your research
- Describe your findings
- And provide the code for all of these
So that another team (or even yourself some time later) can easily pick up where you left off.
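The checklist above can be sketched as a small script. This is only a minimal illustration in Python, not the course's method (the course itself works in R with knitr). The inline data, the `where` and `when` values and all function names are assumptions invented for the example.

```python
import hashlib
import json
import statistics

# 1. Describe your purpose.
PURPOSE = "Estimate the mean of a small sample of measurements."

# Hypothetical inline data standing in for a downloaded file.
RAW_DATA = "value\n4.0\n5.0\nNA\n6.0\n"


def describe_source(raw, where, when):
    """2. Document the data source: where, when, and a checksum of what."""
    checksum = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return {"where": where, "when": when, "sha256": checksum}


def transform(raw):
    """3. Apply the transformations and keep a log of each one."""
    log = []
    rows = raw.strip().splitlines()[1:]  # skip the header line
    log.append(f"read {len(rows)} rows after dropping the header")
    kept = [float(row) for row in rows if row != "NA"]
    log.append(f"removed {len(rows) - len(kept)} missing values")
    return kept, log


def analyse(values):
    """4. The research step itself, here just a mean."""
    return {"n": len(values), "mean": statistics.mean(values)}


def build_report():
    """5. Collect purpose, source, transformations and findings together."""
    values, log = transform(RAW_DATA)
    return {
        "purpose": PURPOSE,
        # Placeholder provenance values, assumed for the example.
        "source": describe_source(RAW_DATA, where="inline example", when="2020-01-01"),
        "transformations": log,
        "findings": analyse(values),
    }


if __name__ == "__main__":
    # 6. The code is the script itself, so publishing it covers the last bullet.
    print(json.dumps(build_report(), indent=2))
```

Recording a checksum of the raw data is a design choice rather than a requirement: it lets another team confirm they are starting from exactly the same file before they compare results.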
I do not see any reason for a research paper published today not to have these characteristics, unless the data must be protected for copyright, property, or security/safety reasons.