Tabular output in R

R provides several libraries for formatting tabular output. Since I have struggled to find one that works well on every occasion without too much hassle, I would like to compare them from the point of view of aesthetics and ease of use. I will not use any of the options and parameters. Just …

NLP: Language Detection in R

I have played a bit with two language detection libraries in R, without going too much into the details of how they work. These are textcat and cldr. The second package does not seem to be actively maintained, as its last update (version 1.1.0) is now over three years old. However, it can still be obtained and installed …

Speak Like a Doctor: Basic NLP in R

I am now aiming at the Capstone project of Coursera’s Data Science Specialization from Johns Hopkins, the last step to finish the Specialization. The project focuses on predicting the next word someone is going to type, using several datasets to build the prediction algorithm. There is a lot of previous …
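
The core idea of next-word prediction can be illustrated with a minimal bigram model: count which word follows which in the training text, then suggest the most frequent followers. This is a rough sketch in Python under my own assumptions (the toy corpus and the helper names `tokenize`, `train_bigrams` and `predict_next` are hypothetical), not the Capstone's reference solution, which is meant to be built in R on much larger datasets.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_bigrams(corpus):
    """Count, for every word, how often each other word follows it."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_next(followers, word, k=3):
    """Return up to k most frequent followers of `word` (empty list if unseen)."""
    counts = followers.get(word.lower())
    if not counts:
        return []
    return [w for w, _ in counts.most_common(k)]


# Hypothetical toy corpus standing in for the real training data.
corpus = [
    "I am going to the store",
    "I am going to bed",
    "you are going to love it",
    "I am happy",
]
model = train_bigrams(corpus)
print(predict_next(model, "going"))  # ['to']
print(predict_next(model, "am"))     # ['going', 'happy']
```

A usable model would also need some form of smoothing or back-off (for example, falling back from trigrams to bigrams to single-word frequencies) to cope with word sequences never seen during training.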

Tutorial Review: How to Build a Text Mining, Machine Learning Document Classification System in R!

This tutorial by Tim D’Auria on YouTube is shorter than 30 minutes. Without assuming much background, it gives you the basic tools and knowledge to build a document classification system. The classifier uses a simple KNN (k-nearest neighbours) classification algorithm and text mining techniques to learn to identify which candidate delivered the speeches of …
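
The general approach, a KNN classifier over text features, can be sketched in a few lines. This Python version is my own illustration, not the tutorial's R code: it assumes bag-of-words term counts, cosine similarity and a majority vote among the k most similar training documents, and the speech snippets and labels (`candidate_a`, `candidate_b`) are hypothetical placeholders.

```python
from collections import Counter
import math
import re


def vectorize(text):
    """Bag-of-words term counts for a document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(train, text, k=3):
    """Label `text` by majority vote among its k most similar training documents."""
    query = vectorize(text)
    scored = sorted(
        ((cosine(query, vectorize(doc)), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training snippets; a real system would use full speeches.
train = [
    ("we will cut taxes and grow the economy", "candidate_a"),
    ("lower taxes mean more jobs for the economy", "candidate_a"),
    ("small business and tax cuts create jobs", "candidate_a"),
    ("health care is a right for every family", "candidate_b"),
    ("we must protect health care and families", "candidate_b"),
    ("affordable care for every working family", "candidate_b"),
]
print(knn_classify(train, "taxes and jobs for the economy"))  # candidate_a
print(knn_classify(train, "care for every family"))           # candidate_b
```

The similarity measure and voting rule here are assumptions of mine and may differ from the choices made in the tutorial; with real speeches, removing stop words and weighting terms (e.g. TF-IDF) usually matters more than the choice of k.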

A bit of rest…

Here it comes: summertime. Last year I was particularly active with this blog and my quest to learn as much as possible about Data Science, Python, R, etc. Many MOOCs and books later, with only the Capstone project separating me from completing the Data Science Specialization, I feel the need to slow down …

NLP – Natural Language Processing

The Coursera JH (Johns Hopkins) Data Science Specialization closes with a Capstone Project based on Natural Language Processing. The NLP course listed among its references still has its lessons available for preview, but only until 30 June, at the following URL: https://class.coursera.org/nlp/lecture. The lessons in PDF format are also still available from Dan Jurafsky at the …

Happy First Birthday!

I almost did not notice, but on 15 May my site turned one year old! Wow! Here is the basic traffic data after one year: number of visitors as of now: 3730; number of visits: 26085. My free hosting profile does not let me see where you visit from; I choose to …

GoogleVis and R – Tutorial

During the first week of the “Developing Data Products” MOOC on Coursera, one of the lessons deals with googleVis, one of the ways to publish and animate your R charts. In practice, googleVis provides an interface between R and the Google Chart Tools, allowing you to create interactive web charts from R without …