A Great R Resource

Bookdown is a free and open-source R package built on top of R Markdown to make it really easy to write books and articles/reports. The website bookdown.org is a service to host books. It is free to publish the static output files of the books, and the author retains the copyright on the books. As …

More

Statistical Learning

This resource is totally free and consists of a course based on a book that is itself freely available. I got to know it while browsing the forum discussions on Coursera Data Science Discussions. Somebody in a discussion compared this course to the Machine Learning course by Andrew Ng, and added that this …

More

Forecasting using R

Rob J. Hyndman is Professor of Statistics in the Department of Econometrics and Business Statistics at Monash University. Together with George Athanasopoulos, he has published the freely available book “Forecasting: Principles and Practice”, which can be found here or bought in its paper version at Amazon (amazon.com, amazon.co.uk, amazon.fr) or in its electronic version at …

More

Working with SQLite in R

In the words of its creators, SQLite is a self-contained, high-reliability, embedded, full-featured, public-domain SQL database engine. It is also apparently the most widely used in the world. Libraries exist for interfacing R with SQLite, the minimum requirement being DBI (A Common Database Interface) and RSQLite (SQLite interface for R). The keyword here is “embedded”. You do …

More

Data Science Specialization it is!

I have just completed the Johns Hopkins University Data Science Specialization on Coursera. The last course of this specialization was the Capstone project, which basically consists of learning about a new subject, Natural Language Processing (or NLP for short), and producing a Shiny application hosted on Shinyapps.io that predicts the next word a …

More

R Rants…

This JH Data Science capstone project is turning into a nightmare, especially because of R and tm, which do not do what they are supposed to do. True, I have changed architecture and PC in the middle, but this is not the problem. R and R packages like tm evolve fast, and sometimes too fast …

More

Tabular output in R

R provides several libraries to format tabular output. As I have had trouble finding one that works well on all occasions and without too much hassle, I would like to compare them in terms of aesthetics and ease of use. I will not use any of the options and parameters. Just …

More

NLP: Language Detection in R

I have played a bit with two language detection libraries in R, without going too much into the details of how they work. These are textcat and cldr. The second package does not seem to be actively maintained, as the last update (version 1.1.0) is now over three years old. It can, however, be obtained and installed …

More

Speak Like a Doctor: Basic NLP in R

I am now working towards the Capstone project in Coursera’s Data Science Specialization from Johns Hopkins, in order to finish the Specialization. The project is focused on predicting the next word that somebody is going to type, based on several databases to be used to build the prediction algorithm. There is a lot of previous …

More