OK, this is not Data Science. This is just for fun. I have to admit, I tend to mix things up. Here I am mixing the German class I am taking with one of my favorite visualizations, the wordcloud. Not because it conveys a message about data; in this case it is purely for aesthetics, because it looks nice. I just needed a cover for my essay on Marlene Dietrich. (However, there seem to be businesses built on this visualization nowadays.)
So, this is how I built a Marlene Dietrich wordcloud, based on the text of two of her most famous songs (“Lili Marleen” and “Sag mir wo die Blumen sind”).
I used the concepts that I explained in two old articles that you can find on this site.
In the first one I scraped this site, shaped the wordcloud with a mask, and used a custom font, all thanks to the excellent wordcloud generator by amueller. In the second post I used a text file, a custom mask, and modified stop words to suit an ancient language. This time the language is modern German, so I do not touch the stop words at all. Instead, I use GIMP to merge two layers: the wordcloud and the original styled image that I used to build the mask. A couple of final touches in GIMP add the title of Marlene’s biography, “Ich bin, Gott sey Dank, eine Berlinerin”.
The styled portrait as found on the Internet
The mask used to generate the wordcloud
The code used to read the wordcloud text from a file in which the lyrics of the two songs have been merged (sequentially):
    # Read the merged lyrics of the two songs into a single string.
    # UTF-8 is assumed here so that umlauts and ß are decoded correctly.
    with open('dietrich.txt', 'r', encoding='utf-8') as f:
        words = f.read()

    print("Done joining..")
This second bit of code actually generates the first version of the wordcloud.
    import matplotlib.pyplot as plt
    import numpy as np
    from PIL import Image
    from stop_words import get_stop_words
    from wordcloud import WordCloud

    # Jupyter magic: render figures inside the notebook.
    %matplotlib inline

    # German stop words from the stop_words package, used unmodified.
    stop_words = set(get_stop_words('de'))

    # Load the mask image as a NumPy array with Pillow
    # (scipy.misc.imread was removed from SciPy).
    marlene_mask = np.array(Image.open("MarleneDietrich.png"))

    wordcloud = WordCloud(
        font_path='CabinSketch-Bold.ttf',
        stopwords=stop_words,
        background_color='black',
        mask=marlene_mask,
        max_words=500,
        # width and height are ignored when a mask is given;
        # the shape of the mask is used instead.
        width=670,
        height=1000
    ).generate(words)

    plt.imshow(wordcloud)
    plt.axis('off')
    plt.savefig('./wordcloud_1.png', dpi=300)
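As a rough illustration of what happens to the text before anything is drawn: the size of each word in the cloud reflects how often it appears once the stop words are dropped. The snippet below is only a sketch of that idea using the standard library, not the wordcloud library's actual implementation. The tiny stop-word set and the `word_frequencies` helper are made up for the example, and the sample text is just the two song titles.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set: an assumption standing in for the
# much longer list returned by get_stop_words('de').
GERMAN_STOP_WORDS = {"der", "die", "das", "und", "wo", "sind", "mir"}


def word_frequencies(text, stop_words):
    """Count words in `text`, case-insensitively, skipping stop words."""
    # In Python 3, \w matches Unicode letters, so umlauts and ß stay inside words.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


# Sample text built from the two song titles only.
sample = "Sag mir wo die Blumen sind. Lili Marleen. Sag mir wo die Blumen sind."
freqs = word_frequencies(sample, GERMAN_STOP_WORDS)

# Most frequent words first; these would be drawn largest in a cloud.
print(freqs.most_common())
```

Words such as "die" and "wo" never reach the counter, which is why a sensible stop-word list matters: without it, the largest words in a German wordcloud would be articles and pronouns.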
The first version of the wordcloud looks as follows:
As I explained before, I did the rest with GIMP.
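For anyone who would rather stay in Python, the layer merge can be approximated with Pillow. This is a sketch under assumptions, not what I did in GIMP: it assumes a "lighten" style merge (keep the brighter pixel of the two layers), which suits a wordcloud drawn on a black background, and the `merge_layers` helper is a hypothetical name. Two small synthetic images stand in for the real portrait and wordcloud so that the example runs on its own.

```python
from PIL import Image, ImageChops


def merge_layers(portrait, cloud):
    """Overlay a wordcloud rendered on black onto a portrait.

    Keeping the lighter pixel of the two layers lets the black
    background of the cloud fall away so the portrait shows through.
    """
    portrait = portrait.convert("RGB")
    # Resize the cloud to the portrait's size so the layers line up.
    cloud = cloud.convert("RGB").resize(portrait.size)
    return ImageChops.lighter(portrait, cloud)


# Synthetic stand-ins for the real images (assumption for this sketch).
portrait = Image.new("RGB", (4, 4), (120, 80, 40))
cloud = Image.new("RGB", (4, 4), (0, 0, 0))
cloud.putpixel((1, 1), (255, 255, 255))  # a single white "word" pixel

merged = merge_layers(portrait, cloud)
print(merged.getpixel((0, 0)), merged.getpixel((1, 1)))
```

With real files, the two layers would come from `Image.open(...)` and the result could be written out with `merged.save(...)`. Adding the title text is still easier in GIMP.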