Python Seaborn for plots
In this blog, I will discuss plots that are frequently used in exploratory data analysis, such as the scatterplot, bar plot, box plot and histogram, using the Seaborn library.
Dataset — Brain Stroke Dataset from Kaggle
Resource — Seaborn Documentation
First, we will import the libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Loading the data
df = pd.read_csv("brain_stroke.csv")
df.head(n=10)
1. Creating a scatterplot
- x : variable on the x-axis
- y : variable on the y-axis
- data : dataframe
- hue : variable to be used for color
- style : variable to be used for marker style
- legend : how the legend is drawn ('auto', 'brief', 'full' or False); its position is set separately with plt.legend
sns.scatterplot(x = 'age', y = 'avg_glucose_level', data = df, color = 'magenta')
Scatterplot with hue, style and legend adjustment
sns.scatterplot(x = 'age', y = 'avg_glucose_level', data = df, alpha = 0.2, hue='gender', style='work_type')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
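As a minimal sketch of the legend adjustment above: the small `demo` dataframe is a hypothetical stand-in for a few rows of brain_stroke.csv, and the `Agg` backend is only an assumption so that the snippet runs without a display. The legend entries come from hue and style, while the position is handled by matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a few rows of brain_stroke.csv
demo = pd.DataFrame({
    "age": [25, 40, 58, 67, 33, 71],
    "avg_glucose_level": [85.2, 110.4, 190.1, 228.7, 92.3, 105.9],
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "work_type": ["Private", "Govt_job", "Private",
                  "Self-employed", "Private", "Self-employed"],
})

fig, ax = plt.subplots()
sns.scatterplot(x="age", y="avg_glucose_level", data=demo,
                hue="gender", style="work_type", ax=ax)

# Anchor the legend's upper-left corner just outside the right edge of the axes
ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0)

# Collect the legend text to see which entries hue and style produced
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Passing legend=False to sns.scatterplot instead would drop the legend altogether.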
2. Creating a barplot
- x : categorical variable
- y : numeric value (mean is calculated by default)
- data : dataframe
- color : bar color
- hue : variable to be used for color
sns.barplot(x = 'work_type', y = 'bmi', data = df, color = 'skyblue')
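To see that the bar height is the group mean, the sketch below compares the drawn bars with a plain groupby. The `demo` dataframe is made-up data with the same column names as the dataset, and the `Agg` backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data with the same column names as the dataset
demo = pd.DataFrame({
    "work_type": ["Private", "Private", "Govt_job", "Govt_job",
                  "Self-employed", "Self-employed"],
    "bmi": [24.0, 30.0, 27.5, 31.5, 22.0, 26.0],
})

fig, ax = plt.subplots()
sns.barplot(x="work_type", y="bmi", data=demo, color="skyblue", ax=ax)

# Each bar is a rectangle patch; its height is the aggregated value
bar_heights = sorted(round(float(bar.get_height()), 2) for bar in ax.patches)

# The same numbers computed directly with pandas
group_means = sorted(round(float(m), 2)
                     for m in demo.groupby("work_type")["bmi"].mean())

print(bar_heights)
print(group_means)
```

The error bars drawn on top of each bar show the uncertainty around that mean.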
Horizontal barplot with the x and y variables swapped and orient = 'h'
sns.barplot(x = 'bmi', y = 'work_type', data = df, color = 'blue', orient = 'h')
Barplot with variable as color using hue
sns.barplot(x = 'work_type', y = 'bmi', data = df, hue = 'gender')
Barplot with label and title
- plt.xlabel : text for the x-axis label
- plt.ylabel : text for the y-axis label
- plt.title : text for the title
sns.barplot(x = 'gender', y = 'hypertension', data = df, hue = 'smoking_status')
plt.legend(bbox_to_anchor=(1.4, 1), loc='upper right', title='Smoking Status')
plt.xlabel('Gender')
plt.title('Hypertension % in Gender by Smoking Status')
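The same labels can also be set on the Axes object that sns.barplot returns, which helps when a figure holds more than one plot. This is a small sketch on made-up data (the `demo` dataframe and the label texts are illustrative assumptions), using matplotlib's `Axes.set`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data with the same column names as the dataset
demo = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "hypertension": [0, 1, 0, 0, 1, 1],
})

fig, ax = plt.subplots()
ax = sns.barplot(x="gender", y="hypertension", data=demo, ax=ax)

# One call sets both axis labels and the title on this specific Axes
ax.set(xlabel="Gender",
       ylabel="Share with hypertension",
       title="Hypertension by gender")
```

Because hypertension is coded as 0 and 1, the mean shown by each bar is a proportion between 0 and 1.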
Plotting barplot with summary other than mean
- Summarise the data using groupby
- Plot the summarised data
df_summ = df.groupby(['work_type', 'gender'], as_index=False)['avg_glucose_level'].agg('sum').sort_values(by='avg_glucose_level', ascending = False)
sns.barplot(x = 'work_type', y = 'avg_glucose_level', data = df_summ, hue = 'gender')
plt.title('Sum of avg_glucose_level by work_type and gender')
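As an alternative to summarising first, sns.barplot also takes an `estimator` argument that replaces the default mean. The sketch below is not the approach used in this blog, only a comparison on made-up data: it passes `np.sum` and checks the bars against a groupby sum. It leaves out hue to keep the comparison simple.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with the same column names as the dataset
demo = pd.DataFrame({
    "work_type": ["Private", "Private", "Govt_job", "Govt_job",
                  "Self-employed", "Self-employed"],
    "avg_glucose_level": [85.0, 110.0, 190.0, 95.0, 105.0, 120.0],
})

fig, ax = plt.subplots()
# estimator=np.sum makes each bar the group total instead of the group mean
sns.barplot(x="work_type", y="avg_glucose_level", data=demo,
            estimator=np.sum, ax=ax)

bar_heights = sorted(round(float(bar.get_height()), 2) for bar in ax.patches)
group_sums = sorted(
    round(float(s), 2)
    for s in demo.groupby("work_type")["avg_glucose_level"].sum()
)
print(bar_heights)
print(group_sums)
```

The groupby route keeps the summarised table around for sorting or reuse, while `estimator` keeps everything in one call.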
3. Creating a box plot
- x : categorical variable
- y : numeric variable
- kind : 'box', 'swarm' etc.
- data : dataframe
sns.catplot(x='hypertension', y='avg_glucose_level', kind = 'box', data=df)
plt.title('Distribution of avg_glucose_level by hypertension')
Box plot with facets
- col : facets in columns
- row : facets in rows
sns.catplot(x='hypertension', y='avg_glucose_level', kind = 'box', data=df, col = 'stroke', row = 'gender')
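With facets, sns.catplot returns a FacetGrid rather than a single Axes, so plt.title would only label one panel. A minimal sketch, on made-up data, of titling and labelling the whole grid instead (the `demo` dataframe and the title text are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import pandas as pd
import seaborn as sns

# Hypothetical data covering every gender x stroke combination
demo = pd.DataFrame({
    "gender": ["Male"] * 4 + ["Female"] * 4,
    "stroke": [0, 0, 1, 1, 0, 0, 1, 1],
    "hypertension": [0, 1, 0, 1, 0, 1, 0, 1],
    "avg_glucose_level": [88.0, 120.5, 150.2, 210.9, 79.4, 101.3, 170.8, 195.6],
})

g = sns.catplot(x="hypertension", y="avg_glucose_level", kind="box",
                data=demo, col="stroke", row="gender")

# Shared axis labels for all panels
g.set_axis_labels("Hypertension", "Average glucose level")

# One title for the whole figure, nudged above the panel titles
g.figure.suptitle("avg_glucose_level by hypertension, stroke and gender", y=1.03)

print(g.axes.shape)  # rows follow 'gender', columns follow 'stroke'
```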
4. Histogram
- sns.histplot(data = df, x = 'variable', bins = n)
- sns.displot(data = df, x = 'variable', kind = 'hist', bins = n)
- hue : variable for color
- binwidth : width of each bin (use either bins or binwidth; if both are given, binwidth takes precedence)
sns.histplot(data = df, x = 'avg_glucose_level', bins=20)
Histogram using sns.displot
sns.displot(data = df, x = 'avg_glucose_level', kind = 'hist', bins = 20)
Histogram with variable as color using hue
sns.displot(data = df, x = 'avg_glucose_level', kind = 'hist', hue = 'stroke', alpha = 0.3)
5. Density plot
- sns.kdeplot(data = df, x = 'variable')
- sns.displot(data = df, x = 'variable', kind = 'kde')
- hue : variable for color
sns.displot(data = df, x = 'avg_glucose_level', kind = 'kde', hue='gender', alpha = 0.5)
6. Jointplot
Scatterplot and histogram both in the same plot
sns.jointplot(data = df, x = 'avg_glucose_level', y = 'age')
7. Pairplot
Pairwise scatterplots and the individual distribution of every numerical column in the same plot
sns.pairplot(data = df)
To recreate the plots, the notebook can be downloaded from Python-seaborn (Static report) on jetbrains.com and the data from the Brain Stroke Dataset on Kaggle.
Please share the blog if you like it.