Logical Joins in R

Jyoti Kumar
3 min readAug 1, 2021

--

Joins are an essential part of analytics. Joins can be achieved in R using many ways. I use dplyr mostly for joining data frames in R. It serves the purpose all the time.

Recently, I came across a SQL question where the solution was obtained using join with an where clause. I tried the same using R by creating two data frame and I was not successful using dpyr.

Campaign Table

Data: There is a campaign table where start and end date of each campaign is stored in different columns.

Question: List the dates from 1st Jan, 2020 to 31st May, 2020 when there were no campaigns.

The SQL server code to solve this problem is as under:

--- Declaring start and end date
declare @startdate date = '2020-01-01',
@enddate date = '2020-05-31';
--- Recursive CTE to generate a table with dates from start and end --- datewith datetab as
(
select @startdate d
union all
select DATEADD(day, 1, d)
from datetab
where dateadd(Day,1,d) <= @enddate
)
--- Query to get the resultselect datetab.d as non_camp, c.camp_namefrom datetableft join Campaign c
--- Logical Join
on datetab.d between c.start_d and c.end_dWHERE camp_name is null;

The result will be like this

non_camp camp_name2020-01-01 NULL2020-01-02 NULL2020-01-03 NULL2020-01-04 NULL2020-01-05 NULL2020-01-06 NULL2020-01-07 NULL2020-01-08 NULL2020-01-09 NULL2020-01-10 NULL2020-01-11 NULL2020-01-12 NULL2020-01-13 NULL2020-01-14 NULL2020-01-26 NULL2020-01-27 NULL2020-01-28 NULL2020-01-29 NULL...

To achieve the same in R, Let’s create the tables first

Campaign table is created using the code below

campaign <- data.frame(camp_name = c("A", "B", "C"), 
start_d = as.Date(c("2020/1/15", "2020/2/10", "2020/03/11")),
end_d = as.Date(c("2020/1/25", "2020/2/11", "2020/03/21")))

The dates table is created as under with dates from 1st Jan, 2020 to 31st May, 2020

period <- seq(as.Date("2020/1/1"), as.Date("2020/5/31"), "days")period_df <- data.frame(d = period)

Let’s try dplyr package to do logical joins

date_camp <- period_df %>% 
left_join(campaign, by = c("d" >= "start_d", "d" <= "end_d"))

The code above does not work and throws the error below

Error: `by` must be a (named) character vector, list, or NULL, not a logical vector. Run `rlang::last_error()` to see where the error occurred.

Let’s try with data.table package now

date_camp_DT <- setDT(campaign)[period_df, on = .(start_d <= d, end_d >= d)][is.na(camp_name),]

The screenshot of the result is attached.

List of dates with no campaign

For normal joins dplyr is good. For logical joins data.table does the work.

If there is a better solution in R to the problem please share it in comments.

--

--

Jyoti Kumar
Jyoti Kumar

Written by Jyoti Kumar

I have experience in Predictive Modelling and Dashboards. YouTube : LearnToPlayWithData (https://www.youtube.com/channel/UChAOvvDYCu7zHcQq4O3Ov3Q)

Responses (1)