Python-Pandas cheat sheet: 30 functions-methods

4 min readAug 5, 2022

In this blog, I have mentioned 30 useful python-pandas functions/methods used frequently in data analysis. The following activities can be done using the functions/methods mentioned in the blog:

Loading and understanding data
Selecting columns/rows or both
Filtering data using one or multiple columns
Sorting data and dropping a column
Summarizing data

The data used is Airlines.csv from Kaggle.

We will start by loading the pandas package

import pandas as pd

The functions/methods are as follows:

Loading a csv file

# 1. Loading the data from a csv filedf = pd.read_csv("Airlines.csv")

2. Getting the count of rows and columns in the dataframe

# 2. Shape of a dataframedf.shape

3. Getting first/last few rows of the dataframe

# 3. Head and Tail of the data framedf.head(n=10)df.tail(n=10)

4. Getting data types of the columns to understand if the columns are loaded properly with correct types

# 4. data types of the columnsdf.dtypes

5. Getting the column names as list

# 5. Getting column namesdf.columns.tolist()

6. Getting the summary stats for numeric columns

# 6. Summary statsdf.describe()

7. Getting the count of NA values in the columns

# 7. Checking NA values in columnsdf.isna().sum()

8. Selecting columns with data type as object/int64

# 8. Selecting columns with data type as objectdf.select_dtypes(include = 'object').columns# Selecting columns with data type as int64df.select_dtypes(include = 'int64').columns

9. Getting count of values in a column

# 9. Getting value counts from the columnsdf['Airline'].value_counts(ascending=True)

10. Getting unique values in a column

# 10. Getting unique names of values in a columndf['Airline'].unique()

11. Selecting a few columns

# 11. Select a few columns from dfdf[['id', 'Airline', 'Flight']]

12. Selecting a few rows using .iloc

# 12. Select a few rowsdf.iloc[:10,]

13. Selecting a few rows and columns together using .loc

# 13. Select a few rows and columnsdf.loc[:5, ['id', 'Airline', 'Flight']]

14. Filtering the data using a column

# 14. Filter the data using a columndf[df['Airline'] == 'US']

15. Filtering the data using multiple columns

# 15. Filter the data using multiple columnsdf[(df['Airline'] == 'US') & (df['AirportFrom'] == 'PHX') & (df['DayOfWeek'] == 1)]

16. Filtering the data using OR conditions

# 16. Filter data using OR conditionsdf[(df['Airline'] == 'US') | (df['AirportFrom'] == 'PHX')]

17. Filtering the data using a list

# 17. Filter data using a listairline_list = ['DL','US']df[df['Airline'].isin(airline_list)]

18. Filtering for the data not in the list

# 18. Filter data not in listairline_list = ['DL','US']df[~df['Airline'].isin(airline_list)]

19. Sorting the data using a column

# 19. Sort the datadf.sort_values(by='Airline',ascending=False)

20. Rename columns

# 20. Rename a columndf.rename(columns={"Airline": "Airline_Code", "AirportFrom":"Airport_From"})

21. Summarise data using groupby

# 21. Summarise using groupbydf.groupby(['Airline','AirportFrom','AirportTo'], as_index=False)['id'].agg('count')

22. Summarise and sort

# 22. Summarise and sortdf_summ = df.groupby(['Airline','AirportFrom','AirportTo'], as_index=False)['id'].agg('count')df_summ.sort_values(by='id', ascending = False)

23. Summarise for multiple values

# 23. Summarise for multiple valuesdf.groupby(['Airline','AirportFrom','AirportTo'])['Time'].agg(['sum','count']).reset_index()

24. Summarise for multiple columns and values

# 24. Summarise for multiple columns and valuesdf.groupby(['Airline','AirportFrom','AirportTo']).aggregate({'id':'count','Time':'sum'}).reset_index()

groupby multiple columns and values ouput

25. Adding a new column

# 25. Adding a new columndf['Country'] = 'USA'

26. Adding a column using existing columns

# 26. Adding a column using existing columnsdf['CO_SFO'] = (df['Airline'] =='CO') & (df['AirportFrom'] == 'SFO')

27. Dropping a column

# 27. Dropping a Columndf.drop(['CO_SFO'], axis = 1)

28. Summarise using pivot_table with aggregation

# 28. Summarise Using pivot_tabledf.pivot_table(index = ['Airline','AirportFrom','AirportTo'],values = ['Time'], aggfunc=['sum','count']).reset_index(col_level=1)

29. Pivot data using pivot_table where unique rows are not available for index columns

# 29. Pivot data using pivot_table when unique rows are not available for index columnsdf.pivot_table(index = 'Airline', columns='DayOfWeek', values='id',aggfunc='count', fill_value = 0).reset_index()

30. Summarise using groupby and pivot

# 30. Summarise using groupby and pivotdf1 = df.groupby(['Airline','DayOfWeek'], as_index=False)['id'].agg('count').reset_index()df1.pivot(index = ['Airline'], columns = 'DayOfWeek', values = 'id').reset_index()

The corresponding notebook is published at JetBrains Datalore: A powerful environment for Jupyter notebooks.

Please share the blog if you like it.

Python-Pandas cheat sheet: 30 functions-methods

Written by Jyoti Kumar

Responses (2)