DataComPy
DataComPy is a package for comparing two DataFrames (or tables) across several
backends: Pandas, Spark, Polars, and Snowflake. It was originally created as
something of a replacement for SAS’s PROC COMPARE for Pandas DataFrames, with more
functionality than just Pandas.DataFrame.equals(Pandas.DataFrame): it prints out
some stats and lets you tweak how accurate matches have to be. Supported types include:
Pandas
Polars
Spark
Snowflake
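To make "how accurate matches have to be" concrete: numeric values can be compared within a tolerance instead of requiring exact equality. The plain-Python sketch below is an illustration only. It applies an absolute and a relative tolerance to two columns of numbers. The names values_match and compare_columns are hypothetical, chosen for this example. The sketch does not use or reproduce DataComPy's API or implementation.

```python
import math


def values_match(a, b, abs_tol=0.0, rel_tol=0.0):
    """Return True if two numbers agree within the given tolerances.

    Illustrative rule (an assumption of this sketch):
        |a - b| <= abs_tol + rel_tol * |b|
    """
    # Treat two missing values (NaN) as a match; this is a choice made
    # for the sketch, not a statement about any particular library.
    if math.isnan(a) and math.isnan(b):
        return True
    return abs(a - b) <= abs_tol + rel_tol * abs(b)


def compare_columns(left, right, abs_tol=0.0, rel_tol=0.0):
    """Summarize how many paired values match within tolerance."""
    if len(left) != len(right):
        raise ValueError("columns must have the same length")
    flags = [values_match(a, b, abs_tol, rel_tol) for a, b in zip(left, right)]
    return {
        "rows": len(flags),
        "matched": sum(flags),
        "all_match": all(flags),
    }


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0]
    new = [1.0, 2.0001, 3.5]
    # Exact comparison versus a small absolute tolerance.
    print(compare_columns(original, new))
    print(compare_columns(original, new, abs_tol=0.001))
```

With zero tolerances only identical values count as matches. Raising abs_tol or rel_tol lets small numeric differences through while still flagging larger ones.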
[!IMPORTANT] datacompy has released v1. The v0.19.x line is no longer supported; users should upgrade to v1 going forward. The support/0.19.x branch is archived and will only receive critical security fixes on a best-effort basis; no new features or regular maintenance will be provided. All active development targets the v1 branches (develop and main).
Quick Installation
pip install datacompy
or
conda install datacompy
Installing extras
If you would like to use Spark or any other backend, please make sure you install it via extras:
pip install datacompy[spark]
pip install datacompy[snowflake]
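After installing an extra, it can be useful to confirm that the backend's underlying library is importable in the current environment. The sketch below uses only the standard library. The mapping from backend name to module name (pyspark for Spark, snowflake.snowpark for Snowflake) is an assumption about what the extras provide, and is_installed and available_backends are hypothetical helper names, not part of datacompy.

```python
from importlib.util import find_spec

# Assumed mapping of backend name -> importable module name.
# Adjust it if your environment uses different packages.
BACKEND_MODULES = {
    "pandas": "pandas",
    "polars": "polars",
    "spark": "pyspark",
    "snowflake": "snowflake.snowpark",
}


def is_installed(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return False


def available_backends(modules=BACKEND_MODULES):
    """Map each backend name to whether its module appears to be installed."""
    return {backend: is_installed(mod) for backend, mod in modules.items()}


if __name__ == "__main__":
    for backend, ok in available_backends().items():
        print(f"{backend}: {'available' if ok else 'missing'}")
```

A backend reported as missing usually means the corresponding extra was not installed in the active environment.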
Supported backends
Pandas: (See documentation)
Spark: (See documentation)
Polars: (See documentation)
Snowflake/Snowpark: (See documentation)
Contributors
We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to sign the Contributor License Agreement (CLA).
This project adheres to the Open Source Code of Conduct. By participating, you are expected to honor this code.
Roadmap
Roadmap details can be found here