Python-Pandas cheat sheet: 30 functions-methods

In this blog, I have mentioned 30 useful python-pandas functions/methods used frequently in data analysis. The following activities can be done using the functions/methods mentioned in the blog:

  1. Loading and understanding data
  2. Selecting columns/rows or both
  3. Filtering data using one or multiple columns
  4. Sorting data and dropping a column
  5. Summarizing data

The data used is Airlines.csv from Kaggle.

We will start by loading the pandas package

import pandas as pd

The functions/methods are as follows:

  1. Loading a csv file
# 1. Loading the data from a csv filedf = pd.read_csv("Airlines.csv")

2. Getting the count of rows and columns in the dataframe

# 2. Shape of a dataframedf.shape

3. Getting first/last few rows of the dataframe

# 3. Head and Tail of the data framedf.head(n=10)df.tail(n=10)

4. Getting data types of the columns to understand if the columns are loaded properly with correct types

# 4. data types of the columnsdf.dtypes

5. Getting the column names as list

# 5. Getting column namesdf.columns.tolist()

6. Getting the summary stats for numeric columns

# 6. Summary statsdf.describe()
describe output

7. Getting the count of NA values in the columns

# 7. Checking NA values in columnsdf.isna().sum()
isna().sum() output

8. Selecting columns with data type as object/int64

# 8. Selecting columns with data type as objectdf.select_dtypes(include = 'object').columns# Selecting columns with data type as int64df.select_dtypes(include = 'int64').columns

9. Getting count of values in a column

# 9. Getting value counts from the columnsdf['Airline'].value_counts(ascending=True)

10. Getting unique values in a column

# 10. Getting unique names of values in a columndf['Airline'].unique()

11. Selecting a few columns

# 11. Select a few columns from dfdf[['id', 'Airline', 'Flight']]

12. Selecting a few rows using .iloc

# 12. Select a few rowsdf.iloc[:10,]

13. Selecting a few rows and columns together using .loc

# 13. Select a few rows and columnsdf.loc[:5, ['id', 'Airline', 'Flight']]

14. Filtering the data using a column

# 14. Filter the data using a columndf[df['Airline'] == 'US']

15. Filtering the data using multiple columns

# 15. Filter the data using multiple columnsdf[(df['Airline'] == 'US') & (df['AirportFrom'] == 'PHX') & (df['DayOfWeek'] == 1)]

16. Filtering the data using OR conditions

# 16. Filter data using OR conditionsdf[(df['Airline'] == 'US') | (df['AirportFrom'] == 'PHX')]

17. Filtering the data using a list

# 17. Filter data using a listairline_list = ['DL','US']df[df['Airline'].isin(airline_list)]

18. Filtering for the data not in the list

# 18. Filter data not in listairline_list = ['DL','US']df[~df['Airline'].isin(airline_list)]

19. Sorting the data using a column

# 19. Sort the datadf.sort_values(by='Airline',ascending=False)

20. Rename columns

# 20. Rename a columndf.rename(columns={"Airline": "Airline_Code", "AirportFrom":"Airport_From"})

21. Summarise data using groupby

# 21. Summarise using groupbydf.groupby(['Airline','AirportFrom','AirportTo'], as_index=False)['id'].agg('count')
groupby output

22. Summarise and sort

# 22. Summarise and sortdf_summ = df.groupby(['Airline','AirportFrom','AirportTo'], as_index=False)['id'].agg('count')df_summ.sort_values(by='id', ascending = False)
groupby and sort output

23. Summarise for multiple values

# 23. Summarise for multiple valuesdf.groupby(['Airline','AirportFrom','AirportTo'])['Time'].agg(['sum','count']).reset_index()
groupby multiple values output

24. Summarise for multiple columns and values

# 24. Summarise for multiple columns and valuesdf.groupby(['Airline','AirportFrom','AirportTo']).aggregate({'id':'count','Time':'sum'}).reset_index()
groupby multiple columns and values ouput

25. Adding a new column

# 25. Adding a new columndf['Country'] = 'USA'

26. Adding a column using existing columns

# 26. Adding a column using existing columnsdf['CO_SFO'] = (df['Airline'] =='CO') & (df['AirportFrom'] == 'SFO')

27. Dropping a column

# 27. Dropping a Columndf.drop(['CO_SFO'], axis = 1)

28. Summarise using pivot_table with aggregation

# 28. Summarise Using pivot_tabledf.pivot_table(index = ['Airline','AirportFrom','AirportTo'],values = ['Time'], aggfunc=['sum','count']).reset_index(col_level=1)
pivot_table to summarise output

29. Pivot data using pivot_table where unique rows are not available for index columns

# 29. Pivot data using pivot_table when unique rows are not available for index columnsdf.pivot_table(index = 'Airline', columns='DayOfWeek', values='id',aggfunc='count', fill_value = 0).reset_index()
pivot_table to pivot output

30. Summarise using groupby and pivot

# 30. Summarise using groupby and pivotdf1 = df.groupby(['Airline','DayOfWeek'], as_index=False)['id'].agg('count').reset_index()df1.pivot(index = ['Airline'], columns = 'DayOfWeek', values = 'id').reset_index()
groupby and pivot output

The corresponding notebook is published at JetBrains Datalore: A powerful environment for Jupyter notebooks.

Please share the blog if you like it.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jyoti Kumar

Jyoti Kumar

I have experience in Predictive Modelling and Dashboards. I have rich working experience on various tools and software like Python, R, Tableau, Power BI and SQL