Group By and Summarize: R and Python
One of the most frequently used steps in data analysis is grouping the data by one or more variables and then summarizing each group with functions such as count, distinct count, sum, or mean. This can be done in R or Python in a few lines of code. In this blog I will create a data frame and summarize it in both R and Python.
Using R: The data frame is created and summarized using the code below.
The data is summarized using the dplyr package. The functions used inside summarise() are as follows:
- n_distinct() — to get distinct count
- n() — to get count
- sum() — to get sum
The summarized result is:
Using Python: The data frame is created and summarized using the code below.
The data is summarized using the pd.NamedAgg function from the pandas package. For each output column, the input column to be summarized is selected and then the aggregate function is specified. The aggregate functions used are as follows:
- count — to get count
- lambda x: x.nunique() — to get distinct count
- sum — to get sum
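As a minimal sketch of how these three aggregate functions fit together with pd.NamedAgg, the example below groups a small data frame and summarizes it. The data and the column names (region, customer, sales, orders, customers, total_sales) are made up for illustration and are not taken from the original post.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "customer": ["A", "B", "C", "C", "D"],
    "sales": [100, 150, 200, 50, 300],
})

# Group by one variable, then name each output column and give it
# an input column plus an aggregate function.
summary = (
    df.groupby("region")
    .agg(
        orders=pd.NamedAgg(column="customer", aggfunc="count"),
        customers=pd.NamedAgg(column="customer", aggfunc=lambda x: x.nunique()),
        total_sales=pd.NamedAgg(column="sales", aggfunc="sum"),
    )
    .reset_index()  # turn the group key back into a regular column
)

print(summary)
```

With this sample data, East has 2 rows, 2 distinct customers and total sales of 250, while West has 3 rows, 2 distinct customers and total sales of 550.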
The summarized result is:
It can be observed that in R, the dplyr package can summarize the data and create new columns from the summarized columns in the same summarise() call.
In Python, the summarization is done in one step, and the new columns are then created from the result in a separate step.
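The two-step pattern in Python can be sketched as follows. This is an illustrative example under assumed data: the column names and the derived column sales_per_customer are hypothetical and not part of the original post.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "customer": ["A", "B", "C", "C", "D"],
    "sales": [100, 150, 200, 50, 300],
})

# Step 1: summarize each group.
summary = (
    df.groupby("region")
    .agg(
        customers=pd.NamedAgg(column="customer", aggfunc=lambda x: x.nunique()),
        total_sales=pd.NamedAgg(column="sales", aggfunc="sum"),
    )
    .reset_index()
)

# Step 2: create a new column from the summarized columns.
summary["sales_per_customer"] = summary["total_sales"] / summary["customers"]

print(summary)
```

The derived column is computed after agg() because the aggregate functions in agg() each see a single input column, so a ratio of two summarized columns has to be built from the aggregated result.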
Please share the blog if you like it.