Scraping a Wordcloud with Python

This article aims to scrape a website in order to generate a wordcloud from its text. The wordcloud visualization of text is becoming more and more popular, and it increasingly shows up on mainstream TV shows too, where some presenters have had the idea to …
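As a rough illustration of the first half of that pipeline, here is a minimal sketch that pulls the visible text out of an HTML page and counts word frequencies, which is the input a wordcloud renderer needs. It uses only the standard library; the names `TextExtractor`, `word_frequencies` and `sample_html` are made up for this example, and the inline HTML string stands in for a page you would actually download.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of script/style elements we are inside

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def word_frequencies(html):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    # Naive tokenization (an assumption): runs of ASCII letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", text))


# Hypothetical sample page; a real run would fetch the HTML from a website.
sample_html = (
    "<html><body><h1>Word clouds</h1>"
    "<p>Clouds of words, words, words.</p>"
    "<script>var x = 1;</script></body></html>"
)
freqs = word_frequencies(sample_html)
print(freqs.most_common(3))
```

The resulting counts could then be handed to a renderer, for instance the third-party `wordcloud` package, which scales each word by its frequency.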

TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a method for measuring how important a term or a set of terms is to a document in a collection of documents or, as Wikipedia calls it, a “corpus”. I first met it in the MOOC Text Retrieval and Search Engines, but I have also retrieved …
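To make the idea concrete, here is a minimal sketch of one common TF-IDF variant, assuming term frequency is the term's count divided by the document length and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. Other weighting schemes exist, the whitespace tokenization is deliberately naive, and the names `tf_idf` and `corpus` are chosen only for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document in docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = count / document length, idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
scores = tf_idf(corpus)
for term, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In the first document, "cat" and "mat" end up with a higher weight than "the": "the" occurs more often, but it also appears in another document, so its inverse document frequency is lower. A term that appears in every document gets a weight of zero under this variant.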

How to begin…

This is a lucky time in human history. Sure, there are many bad things out there, but there are tons of opportunities as well. The web is full of resources about Data Mining, Search Engines, Retrieval Systems and the like. However, for a total newbie I would recommend taking …
