Python: Add a column quickly
Adding a column is an integral part of any Analytics/ Data Science project. A new column can be added in any of the following cases:
- Adding a column using a single column
- Adding a column using multiple columns
- Adding a column after group by
There are many ways to achieve the same in python. I will discuss a few cases using the following:
- comparisons
- apply and lambda function (simple, if else and nested if else)
- map with dictionary and map with lambda function
- np.where
- np.select
- row function
- group by and transform
Loading the libraries
import pandas as pdimport numpy as np
Creating a sample dataframe
df
1. Adding a column using one column
# Adding a flag column using one columndf['one_col_flag'] = df['type'] == 'AA'df
Using np.where
It is similar to if else:
- first argument is condition,
- second is value if condition is True and
- third is value when condition is False
df['np_where'] = np.where(df.value > 10, df.value*2, df.value**2)df
Using Lambda Function to add a column
Learn more about the lambda function from the documentation
# Adding a new column using lambda functiondf['value_lambda'] = df['value'].apply(lambda x: x**2)df
# Adding a new column using if else in lambda functiondf['value_lambda2'] = df['value'].apply(lambda x: x*2 if x >= 20 else x**2)df
# Adding a new column using multiple if else in lambda functiondf['value_lambda3'] = df['value'].apply(lambda x: x*10 if x >= 20 else (x**2 if x >10 else x))df
Using map function
# Using map to map all values in a column to another value using a dictionarydf['map_dict'] = df['value'].map({10:'ten', 20:'twenty'})df
# Using map and lambda functiondf['map_lambda'] = df['value'].map(lambda x: x*10 if x >= 20 else (x**2 if x >10 else x))df
Dropping the previously created columns
# Dropping all the previously created columndf.drop(['one_col_flag', 'np_where', 'value_lambda', 'value_lambda2', 'value_lambda3', 'map_dict', 'map_lambda'], axis = 1, inplace = True)
2. Adding a column using two or more columns
# Adding a flag column using two columndf['two_col_flag'] = (df['type'] == 'AA') & (df['value'] >=20)df
# Adding a flag column using two column and OR conditiondf['two_col_flag2'] = ((df['type'] == 'AA') | (df['value'] <20))*1df
Using np.where on two columns
df['np_where2'] = np.where((df.value > 10)&(df.type=='AA'), 1, df.value**2)df
Using np.select
np.select takes two list as first two arguments
- conditions list — A list with conditions where conditions using one or multiple columns can be added
- choice list — A list with choices for the conditions mentioned in the conditions list
- default — Default value for all other cases
conditions = [(df['type'] == 'AA') & (df['value'] > 10), (df['type'] == 'BB') & (df['value'] >= 20)]choice = [1,2]df['np_select'] = np.select(conditions, choice, default = -1)df
Using apply and row function
It can be used in all the cases and almost any complex column addition can be achieved using row function
Defining a sample row function
def type_event(row): if (row['type'] == 'AA') & (row['value'] > 10):
val = 1
elif (row['type'] == 'BB') & (row['value'] >= 20):
val = 2
else:
val = 0 return val
Using apply to apply row function type_event
df['row_fun'] = df.apply(type_event, axis = 1)df
Additional: Adding a column after groupby
Transform is used with function sum to add a column with sum of values for each type group
# sum can be replaced with min, maxdf['value_sum'] = df.groupby(['type'])['value'].transform(sum)df.sort_values(by='type')
I have tried to cover all the cases that I have encountered during my projects.
Please share the blog if you like it. Please follow as it motivates me to write more.