R for Querying Database — Tricks — SQL not required — Part III

Jyoti Kumar
3 min readFeb 23, 2020

In the previous posts R for Querying Database — SQL not required — Part I and R for Querying Database -Tricks — SQL not required — Part II, I have discussed how R can be used to query a MySQL database and some tricks to write a complex query.

In this blog I will discuss some more tips that will help in writing queries. I will cover the following sub topics in this blog.

4. Adding columns from date column

5. Filtering rows using %like%

6. Adding columns using %like%

Database: The database is financial and is available for download using MySQL Workbench

Requirements — Familiarity with the dplyr

Trick 4: Adding columns from date column

A column for month or year can be added from date column in a database. In the code below I will add a date and a year column to the table while querying and then I will filter using these columns.
Let us assume that account_ids are required for the the accounts that have done transactions of amount ≥ 40000 in Nov, 1996 and amount ≥ 60000 in Nov, 1997.

The result can be obtained by adding month and year column while querying and using semi join for two data frames for the period Nov, 1996 and Nov, 1997.

The R code for the query is as under:

d1 <- trans %>% 
# Filter for amount >= 40000
filter(amount >= 40000) %>%
# Creating month and year columns
mutate(month = month(date),
year = year(date)) %>%
# Filter for Nov, 1996
filter(month == 11 & year == 1996) %>%
distinct(account_id) %>%
# Semi join to select account ids from Nov, 1996 that are present in Nov, 1997
semi_join(trans %>%
# Filter for amount >= 60000
filter(amount >= 60000) %>%
# Creating month and year columns
mutate(month = month(date),
year = year(date)) %>%
# Filter for Nov, 1997
filter(month == 11 & year == 1997) %>%
distinct(account_id), by = “account_id”) %>%
collect()

The result should have 6 account_ids 390, 2219, 2930, 3038, 8533, 9833

Trick 5: Filtering rows using %like%

Rows can be filtered while querying using %like% on a column with character values. In the code below I will filter rows using %like% on operation column.
Let us assume that a count of records for each year for operation = VYBER or VYBER KARTOU is required.

The result can be obtained by using filter to filter rows where operation column has values that start with VYBER. Count is used as shortcut to mutate and count for each combination of year and operation.

The R code for the query is as under:

d2 <- trans %>% 
# Filter rows that start with VYBER
# to get rows for operation = VYBER or VYBER KARTOU
filter(operation %like% ‘VYBER%’) %>%
# Using count to mutate and count
count(year = year(date), operation) %>%
collect()

The result should have a data frame with 11 rows and 3 columns.

Trick 6: Adding columns using %like%

Columns can be added while querying using %like% with ifelse on a column with character values. In the code below I will add column using %like% with ifelse.
The result for the count in Trick 5 is obtained using the mutate, filter and count function here.

The result can be obtained by using mutate to add a column vyber with value 1 for rows where operation starts with VYBER. Rows with vyber == 1 is filtered and count is used to get the final result.

The R code for the query is as under:

d3 <- trans %>% 
# Mutate column vyber
# for rows where operation = VYBER or VYBER KARTOU
mutate(vyber = ifelse(operation %like% ‘VYBER%’, 1, 0)) %>%
# Filter vyber == 1
filter(vyber == 1) %>%
# Using count to mutate and count
count(year = year(date), operation) %>%
collect()

The result above will be same as d2.

In the next part of the blog I will discuss pros of using dplyr and R for querying.

Also, if you are familiar with SQL please write the SQL versions of the query as comment to the post below.

--

--

Jyoti Kumar

I have experience in Predictive Modelling and Dashboards. I have rich working experience on various tools and software like Python, R, Tableau, Power BI and SQL