This JHU Data Science capstone project is turning into a nightmare, mostly because of R and the tm package, which simply do not do what they are supposed to do. True, I switched architecture and PC midway through, but that is not the problem. R and R packages like tm evolve fast, sometimes too fast to stay coordinated with each other. Furthermore, R seems overly sensitive to the size of the objects you feed it, and it can become extremely picky about the contents of the text objects you pass to the transformation functions. Things that work on a corpus of 10,000 documents start misbehaving at around 100,000, even when total memory allocation stays well below the 4 GB addressable by R.

Many of the examples you find on the net are academic: they work perfectly when pasted as-is into RStudio. But as soon as you apply them to real-life data, everything breaks down and you have to go hunting for remedies online. The result is that you cannot let things flow from mind to code the way they normally should; the tool slows you down so much that it becomes a real pain. And there aren't always answers on Stack Overflow! My next Data Science course will definitely be one that uses something other than R!