Working with SQLite in R

In the words of its creators, SQLite is a self-contained, high-reliability, embedded, full-featured, public-domain, SQL database engine. And apparently is the most used in the world.

Libraries exist for interfacing R with SLQLite, the minimum requirement being DBI (A Common Database Interface) and RSQLite (SQLite interface for R).

The keyword here is “embedded”. You do not need any other external library or program to access your data. The library is the database engine and that becomes part of your program.

Database Creation

I will not enter in the discussion of why in some cases we would prefer a database instead of a R save files, this is beyond the scope of this article.  I will also not discuss the theory of relational databases, there is plenty of literature about that. What I would like to show instead is a specific use case, using R. In particular, I would like to focus on the simple tasks of Selecting data, Inserting or updating data. To create my database, I use a tool, the excellent and free DB Browser for SQLite, which not only allows you to create databases, tables and indexes but also allows to save the database “schema” in a text file, like for instance the following, which is taken from my Data Science Capstone project.

Another way of creating a RSQLite table is to write a R dataframe to a RSQLite table. An operation that looks like this :

This is taken directly from the DBI package initial manual page.

Disadvantages:

  • No primary key defined
  • No indexes defined

In practice this approach works, but for any big size / real life application database, some indexes to optimize query speed are needed and some primary key should be defined, and I prefer to do this with an external tool. Here it is a screenshot of the tool, after the creation of the table with the dbWriteTable command. Notice the “Create Table” statement by the side of the table name.

Querying a database and inserting rows from R

The SQL command SELECT behaves as expected. As in other languages, you can build up text strings that you can then send to the database via the dbGetQuery command. Attention must be put on the quotes (single and double) especially when querying text fields. Examples:

Now, res is the return value from the database query and it is a list of lists (easily seen using typeof(res)). To know how many records you have received back from your query, the list can be interrogated in many ways. Here below they are resumed:

The insertion and update of a record into a SQLite database have been improved. Beyond the INSERT and the UPDATE commands, the REPLACE command (which did not exist in my database times), works as follows: If the record being updated exists, then is simply replaced, otherwise it is inserted. This is very useful when updating fields, you can use a single SQL command and either use update counts or simply the new values, so in the simplest cases you can use it as an INSERT command. Here a few examples of how you can build a query. Notice that the queries can be tested for correctness with the DB Browser for SQLite tool before implementation in your script.

And that is all. As usual, very basic stuff, but I would like to say that the methodology is here the key, to know when to work from within R or RStudio and when to craft the characteristics of the database from outside with and external tool. If you want a bit more complex example of a shiny application entirely based on this technique (and not more than what I just explained), you can visit this page, which is the final submission for my Data Science Capstone project. The source code is available on the third tab of the application.