Why Querying Using R Studio

Jyoti Kumar
2 min readMar 27, 2020

I have written three posts with some tricks to use R Studio while querying a database.

  1. R for Querying Database — SQL not required — Part I
  2. R for Querying Database — Tricks — SQL not required — Part II
  3. R for Querying Database — Tricks — SQL not required — Part III

But why should we use R Studio to query? Are there any specific benefits to use R for querying a database. I can think of a few reasons as I use it regularly for my work.

Pros of R Studio for querying a database

  1. No need to learn SQL — dplyr skills can be used to query a database. Same five verbs can be used to query a database.
  2. There is no need to install different querying tools for different type of database — I have connected MySQL, PostgreSQL, Redshift, SQLite using R Studio. The list of databases and dplyr support can be found here.
  3. Initial data exploration is easy — After connecting to a database any initial data exploration is easy.
    e.g. i) The name of all the tables in the database can be found using function like dbListTables, src_tbls etc. for different connections.
    ii) To get column names for a table a common function colnames() can be used.
    For PostgreSQL database
    Using R Studio — tablename %>% colnames() (After the table reference has been made)
    Using Other SQL tool— From StackExchange
    SELECT *
    FROM information_schema.columns
    WHERE table_schema = 'your_schema'
    AND table_name = 'your_table'
    ;
  4. Similar syntax for the database types I have encountered — The syntax to query database is similar across the types I have encountered.

5. Simple syntax — Same dplyr verb filter is used for both WHERE and HAVING.

e.g i) table %>% filter(column ≥ 100) (WHERE is used in SQL)
ii) table %>% group_by(column1) %>%
summarise(col2_sum = sum(column2)) %>%
filter(col2_sum ≥ 100) (HAVING is used in SQL)

6. Calculated column names can be used in next line of the query — While querying calculated columns can be used in next line to calculate a new column.

e.g. table %>% mutate(calcol1 = col1 + col2) %>% 
mutate(calcol2 = calcol1*2)

In the code above, calcol1 is used to calculate next column calcol2. To achieve the same in SQL following code is required.

SELECT col1+col2 as calcol1, (col1+col2)*2 as calcol2 
FROM table

In the code above, to calculate calcol2 the expression to calculate calcol1 is again written.

7. Additional analysis on the data can be done using other libraries in R e.g Regression, Machine Learning, Shiny Apps etc.

Cons of R Studio for querying a database
1. In interviews SQL will be asked even if you can query a database using R Studio
2. Advanced SQL query using window functions may not be possible to replicate using R Studio.

I prefer R Studio for querying and it serves my purpose. After collecting data from a query I can do other transformations as well required for analysis, plotting and modelling.

--

--

Jyoti Kumar

I have experience in Predictive Modelling and Dashboards. I have rich working experience on various tools and software like Python, R, Tableau, Power BI and SQL