Python Seaborn for plots

Jyoti Kumar
4 min read · Aug 7, 2022

In this post, I walk through plots that come up often in exploratory data analysis (scatterplot, bar plot, box plot, histogram, density plot, joint plot and pair plot) using the Seaborn library.

Dataset — Brain Stroke Dataset from Kaggle

Resource — Seaborn Documentation

First, we will import the libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Loading the data

df = pd.read_csv("brain_stroke.csv")
df.head(n=10)
Raw data
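Before plotting, it can help to check which columns are numeric, which are categorical, and whether anything is missing, since that decides which plot fits which column. The sketch below is only an illustration: it builds a small synthetic stand-in for brain_stroke.csv (the column names follow this post, the values are made up) so that it runs without the Kaggle file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for brain_stroke.csv: column names follow the post, values are made up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], size=n),
    "age": rng.integers(18, 90, size=n),
    "hypertension": rng.integers(0, 2, size=n),
    "work_type": rng.choice(["Private", "Self-employed", "Govt_job"], size=n),
    "avg_glucose_level": rng.normal(105, 30, size=n).round(2),
    "bmi": rng.normal(28, 5, size=n).round(1),
    "stroke": rng.integers(0, 2, size=n),
})

# Numeric columns suit scatterplots and histograms; the rest suit the x-axis of bar and box plots.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Count missing values per column before trusting any summary plot.
missing = df.isna().sum()

print(numeric_cols)
print(categorical_cols)
print(missing[missing > 0])
```

Note that 0/1 flags such as hypertension and stroke show up as numeric here even though they behave like categories in most plots.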

1. Creating a scatterplot

  • x : variable on the x-axis
  • y : variable on the y-axis
  • data : dataframe
  • hue : variable to be used for color
  • style : variable to be used for marker style
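  • legend : how to draw the legend ('auto', 'brief' or 'full'), or False to hide it; the legend position is adjusted separately with plt.legend
sns.scatterplot(x = 'age', y = 'avg_glucose_level', data = df, color = 'magenta')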
Scatterplot

Scatterplot with hue, style and legend adjustment

sns.scatterplot(x = 'age', y = 'avg_glucose_level', data = df, alpha = 0.2, hue='gender', style='work_type')
# Move the legend outside the plot area, to the right of the axes
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
Scatterplot with hue, style
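To make the split between drawing the legend and placing it concrete, here is a minimal sketch. It assumes synthetic data with the same column names as the post (the values are made up) and uses the non-interactive Agg backend so it runs without a display. The left panel keeps the legend and moves it with matplotlib; the right panel turns it off with legend=False.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; column names follow the post, values are made up.
rng = np.random.default_rng(1)
n = 150
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "avg_glucose_level": rng.normal(105, 30, size=n),
    "gender": rng.choice(["Male", "Female"], size=n),
})

fig, (ax_with, ax_without) = plt.subplots(1, 2, figsize=(10, 4))

# hue adds a legend by default; its position is then set through matplotlib.
sns.scatterplot(x="age", y="avg_glucose_level", hue="gender", data=df, ax=ax_with)
ax_with.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0)

# legend=False suppresses the legend entirely.
sns.scatterplot(x="age", y="avg_glucose_level", hue="gender", data=df,
                legend=False, ax=ax_without)

fig.tight_layout()
```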

2. Creating a barplot

  • x : categorical variable
  • y : numeric value (mean is calculated by default)
  • data : dataframe
  • color : bar color
  • hue : variable to be used for color
sns.barplot(x = 'work_type', y = 'bmi', data = df, color = 'skyblue')
Barplot
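One way to convince yourself that the bar height is the group mean is to compare it with a pandas groupby. The sketch below uses a tiny made-up table (not the real dataset) and fixes the category order with the order parameter so that bars and group means line up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tiny made-up table; column names follow the post.
df = pd.DataFrame({
    "work_type": ["Private", "Private", "Govt_job", "Govt_job",
                  "Self-employed", "Self-employed"],
    "bmi": [24.0, 30.0, 27.0, 29.0, 22.0, 26.0],
})

# Fix the bar order explicitly so it can be compared with the groupby result.
order = ["Govt_job", "Private", "Self-employed"]

fig, ax = plt.subplots()
sns.barplot(x="work_type", y="bmi", data=df, order=order, color="skyblue", ax=ax)

# Each bar is a rectangle patch; its height is the plotted estimate.
bar_heights = [patch.get_height() for patch in ax.patches]
group_means = df.groupby("work_type")["bmi"].mean().reindex(order).tolist()

print(bar_heights)
print(group_means)
```

The thin line on top of each bar is the error bar that Seaborn draws around the estimate by default.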

Horizontal barplot with the x and y variables swapped and orient = 'h'

sns.barplot(x = 'bmi', y = 'work_type', data = df, color = 'blue', orient = 'h')
Horizontal barplot

Barplot with variable as color using hue

sns.barplot(x = 'work_type', y = 'bmi', data = df, hue = 'gender')
Barplot with hue

Barplot with label and title

  • plt.xlabel : text for the x-axis label
  • plt.ylabel : text for the y-axis label
  • plt.title : text for the title
sns.barplot(x = 'gender', y = 'hypertension', data = df, hue = 'smoking_status')
plt.legend(bbox_to_anchor=(1.4, 1), loc='upper right', title='Smoking Status')
plt.xlabel('Gender')
plt.title('Hypertension % in Gender by Smoking Status')
Barplot with label and title
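Because hypertension is a 0/1 flag, its mean is a proportion between 0 and 1. If you want the axis to read as a percentage, one option is to scale the column first. The sketch below does that on made-up data and sets all three labels in one call on the Axes object; hypertension_pct is a hypothetical helper column, not part of the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; column names follow the post, values are made up.
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "gender": np.tile(["Male", "Female"], n // 2),
    "hypertension": rng.integers(0, 2, size=n),
})

# Hypothetical helper column: scale the 0/1 flag so the group mean reads as a percentage.
df["hypertension_pct"] = df["hypertension"] * 100

fig, ax = plt.subplots()
sns.barplot(x="gender", y="hypertension_pct", data=df, ax=ax)

# ax.set is the Axes-level counterpart of plt.xlabel, plt.ylabel and plt.title.
ax.set(xlabel="Gender", ylabel="Hypertension (%)", title="Hypertension % by gender")
```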

Plotting barplot with summary other than mean

  • Summarise the data using groupby
  • plot the summarised data
# Sum avg_glucose_level within each work_type / gender group
df_summ = (
    df.groupby(['work_type', 'gender'], as_index=False)['avg_glucose_level']
    .agg('sum')
    .sort_values(by='avg_glucose_level', ascending = False)
)
sns.barplot(x = 'work_type', y = 'avg_glucose_level', data = df_summ, hue = 'gender')
plt.title('Sum of avg_glucose_level by work_type and gender')
Barplot using summarised data
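An alternative to summarising first is to let barplot aggregate for you through its estimator parameter. This is a different route from the groupby approach above, shown here only as a sketch on a tiny made-up table; passing np.sum as the estimator is my choice for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Tiny made-up table; column names follow the post.
df = pd.DataFrame({
    "work_type": ["Private", "Private", "Private", "Govt_job", "Govt_job"],
    "avg_glucose_level": [90.0, 110.0, 100.0, 95.0, 105.0],
})
order = ["Govt_job", "Private"]

fig, ax = plt.subplots()
# estimator replaces the default mean with another aggregate, here the sum.
sns.barplot(x="work_type", y="avg_glucose_level", data=df,
            order=order, estimator=np.sum, ax=ax)

bar_heights = [patch.get_height() for patch in ax.patches]
group_sums = df.groupby("work_type")["avg_glucose_level"].sum().reindex(order).tolist()

print(bar_heights)
print(group_sums)
```

With the raw rows passed in, Seaborn still draws error bars around the estimate, which the pre-summarised version does not have because each group is a single row.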

3. Creating a box plot

  • x : categorical variable
  • y : numeric variable
  • kind : 'box', 'swarm' etc.
  • data : dataframe
sns.catplot(x='hypertension', y='avg_glucose_level', kind = 'box', data=df)
plt.title('Distribution of avg_glucose_level by hypertension')
Boxplot

Box plot with facets

  • col : facets in columns
  • row : facets in rows
sns.catplot(x='hypertension', y='avg_glucose_level', kind = 'box', data=df, col = 'stroke', row = 'gender')
Boxplot with facet
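With facets, catplot returns a grid object holding one Axes per row and column combination, so a single plt.title call only labels one panel. The sketch below, on made-up data with the post's column names, shows one way to title each facet and the figure as a whole. The title template is my own wording, and g.figure assumes a reasonably recent Seaborn release.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; column names follow the post, values are made up.
rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({
    "gender": np.tile(["Male", "Female"], n // 2),
    "hypertension": rng.integers(0, 2, size=n),
    "stroke": rng.integers(0, 2, size=n),
    "avg_glucose_level": rng.normal(105, 30, size=n),
})

# catplot is figure-level: it returns a grid with one Axes per facet.
g = sns.catplot(x="hypertension", y="avg_glucose_level", kind="box",
                data=df, col="stroke", row="gender")

# Label every facet from its row and column values, then add one overall title.
g.set_titles(template="{row_name} | stroke = {col_name}")
g.set_axis_labels("Hypertension", "Average glucose level")
g.figure.suptitle("avg_glucose_level by hypertension", y=1.02)

print(g.axes.shape)
```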

4. Histogram

  • sns.histplot(data = df, x = 'variable', bins = n)
  • sns.displot(data = df, x = 'variable', kind = 'hist', bins = n)
  • hue : variable for color
  • binwidth : width of each bin (use either bins or binwidth, not both)
sns.histplot(data = df, x = 'avg_glucose_level', bins=20)
Histogram

Histogram using sns.displot

sns.displot(data = df, x = 'avg_glucose_level', kind = 'hist', bins = 20)
Histogram using sns.displot

Histogram with variable as color using hue

sns.displot(data = df, x = 'avg_glucose_level', kind = 'hist', hue = 'stroke', alpha = 0.3)
Histogram with hue

5. Density plot

  • sns.kdeplot(data = df, x = 'variable')
  • sns.displot(data = df, x = 'variable', kind = 'kde')
  • hue : variable for color
sns.displot(data = df, x = 'avg_glucose_level', kind = 'kde', hue='gender', alpha = 0.5)
Density plot

6. Jointplot

A scatterplot of two variables with a histogram of each variable in the margins, all in the same figure

sns.jointplot(data = df, x = 'avg_glucose_level', y = 'age')
Jointplot

7. Pairplot

Pairwise scatterplots, plus the distribution of each variable on the diagonal, for all the numerical columns in the same figure

sns.pairplot(data = df)
Pairplot

To recreate the plots, the notebook can be downloaded from Python-seaborn (Static report) (jetbrains.com) and the data from the Brain Stroke Dataset on Kaggle.

Please share the blog if you like it.


Jyoti Kumar

I have experience in predictive modelling and dashboards, and have worked extensively with tools such as Python, R, Tableau, Power BI and SQL.