Pandas: compare – Checking differences between DataFrames

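When you need to find the differences between two DataFrames, DataFrame.compare is here to help. It was added in pandas 1.1.0 and requires both DataFrames to have the same row and column labels.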

Imagine you computed the same table with two different methods and want to check where the results differ.

import pandas as pd
import numpy as np

# Let's create two DataFrames

df1 = pd.DataFrame(
    np.array([[101, 102, 103], [201, 202, 203], [301, 302, 303]]),
    columns=['Value1', 'Value2', 'Value3'],
    index=['A1', 'A2', 'A3'],
)

df1
df2 = pd.DataFrame(
    np.array([[101, 102.5, 103], [201, 202, 203], [301, 304, 303]]),
    columns=['Value1', 'Value2', 'Value3'],
    index=['A1', 'A2', 'A3'],
)

df2

Now let's use compare:

df1.compare(df2)

The result shows that only Value2 differs, and only in rows A1 and A3. In the output, self holds the values from df1 and other holds the values from df2.
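To inspect that result programmatically rather than by eye, something like the following sketch may help. It rebuilds the two frames from above so it runs on its own; the names diff, changed_rows and changed_columns are only illustrative. It assumes pandas 1.1.0 or later, where compare is available.

```python
import numpy as np
import pandas as pd

labels = ["A1", "A2", "A3"]
cols = ["Value1", "Value2", "Value3"]

df1 = pd.DataFrame(
    np.array([[101, 102, 103], [201, 202, 203], [301, 302, 303]]),
    columns=cols, index=labels,
)
df2 = pd.DataFrame(
    np.array([[101, 102.5, 103], [201, 202, 203], [301, 304, 303]]),
    columns=cols, index=labels,
)

# compare returns a DataFrame with two-level columns:
# the original column name on top, then "self" (df1) and "other" (df2).
diff = df1.compare(df2)
print(diff)

# Rows and columns that contain at least one difference
changed_rows = list(diff.index)
changed_columns = list(diff.columns.get_level_values(0).unique())
print(changed_rows, changed_columns)

# A single cell is addressed with a (column, "self"/"other") tuple
print(diff.loc["A1", ("Value2", "other")])
```

Because the columns are a MultiIndex, diff["Value2"] gives just the self/other pair for that column.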

The keep_equal and keep_shape arguments change how the result is displayed:

keep_shape - "If true, all rows and columns are kept. Otherwise, only the ones with different values are kept."

keep_equal - "If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs."


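# Keep every row and column, and show equal values as well
df1.compare(df2, keep_equal=True, keep_shape=True)
# Keep every row and column, but show equal values as NaN
df1.compare(df2, keep_shape=True, keep_equal=False)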
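As a rough check of what these two arguments do, the sketch below runs both calls on the same example data and looks at the shape and the NaN pattern of each result. The variable names full and masked are just illustrative, and the frames are rebuilt here so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

labels = ["A1", "A2", "A3"]
cols = ["Value1", "Value2", "Value3"]

df1 = pd.DataFrame(
    np.array([[101, 102, 103], [201, 202, 203], [301, 302, 303]]),
    columns=cols, index=labels,
)
df2 = pd.DataFrame(
    np.array([[101, 102.5, 103], [201, 202, 203], [301, 304, 303]]),
    columns=cols, index=labels,
)

# All rows and columns kept, equal values shown as they are
full = df1.compare(df2, keep_shape=True, keep_equal=True)

# All rows and columns kept, equal values replaced by NaN
masked = df1.compare(df2, keep_shape=True, keep_equal=False)

# Both results have 3 rows and 6 columns (self/other for each original column)
print(full.shape, masked.shape)

# Row A2 is identical in both frames, so it is entirely NaN in `masked`
print(masked.loc["A2"].isna().all())

# Only the differing cells survive in `masked`: 2 cells, each with self and other
print(masked.notna().sum().sum())
```

With keep_shape=True the result lines up with the original tables, which can make it easier to see where in the table the differences sit.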
