Group by and Summarize: R and Python

Jyoti Kumar
2 min read · Feb 24, 2022

One of the most frequently used steps in data analysis is grouping the data by one or more variables and then summarizing it with functions such as count, distinct count, sum, and mean. This can be done in R or Python in a few lines of code. In this blog I will create a data frame and summarize it in both R and Python.

Using R: The data frame is created and summarized using the code below

R code to create data frame and summarize

The data is summarized using the dplyr package. The functions used inside summarise() are as follows:
- n_distinct() — to get distinct count
- n() — to get count
- sum() — to get sum

The summarized result is:

R Output

Using Python: The data frame is created and summarized using the code below

The data is summarised using pd.NamedAgg from the pandas package. For each output column, the column to be summarised is selected and then the aggregate function is specified. The aggregate functions used are as follows:
- count — to get count
- lambda x: x.nunique() — to get distinct count
- sum — to get sum
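The following is a minimal sketch of the approach described above, not the original code from this post. The data frame, its column names (`region`, `customer`, `amount`), and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data (not the data used in the original post)
df = pd.DataFrame({
    "region":   ["East", "East", "East", "West", "West"],
    "customer": ["A", "A", "B", "C", "D"],
    "amount":   [100, 150, 200, 50, 300],
})

# One pd.NamedAgg per output column:
# choose the input column, then the aggregate function to apply to it
summary = (
    df.groupby("region")
      .agg(
          orders=pd.NamedAgg(column="customer", aggfunc="count"),
          customers=pd.NamedAgg(column="customer", aggfunc=lambda x: x.nunique()),
          total_amount=pd.NamedAgg(column="amount", aggfunc="sum"),
      )
      .reset_index()
)

print(summary)
```

The keyword names passed to `agg` (`orders`, `customers`, `total_amount`) become the column names of the summarised data frame.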

The summarized result is:

python Output

It can be observed that in R, the dplyr package can summarise the data and create new columns from the summarised columns in the same summarise() call.

In Python, summarisation is done in one step, and new columns are then created from the summarised columns in a separate step.
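The two-step pattern in pandas can be sketched as below. This is an illustrative example with made-up data and hypothetical column names (`region`, `customer`, `amount`, `avg_amount_per_order`), not the code from the original post.

```python
import pandas as pd

# Hypothetical sample data (not the data used in the original post)
df = pd.DataFrame({
    "region":   ["East", "East", "East", "West", "West"],
    "customer": ["A", "A", "B", "C", "D"],
    "amount":   [100, 150, 200, 50, 300],
})

# Step 1: summarise by group
summary = (
    df.groupby("region")
      .agg(
          orders=pd.NamedAgg(column="customer", aggfunc="count"),
          total_amount=pd.NamedAgg(column="amount", aggfunc="sum"),
      )
      .reset_index()
)

# Step 2: create a new column from the summarised columns
summary["avg_amount_per_order"] = summary["total_amount"] / summary["orders"]

print(summary)
```

The second step can also be chained onto the first with `DataFrame.assign`, but it still runs after the aggregation has finished.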

Please share the blog if you like it.

Jyoti Kumar

I have experience in Predictive Modelling and Dashboards. I have rich working experience with various tools and software such as Python, R, Tableau, Power BI and SQL.