
Merge List of Profiles

This example demonstrates merge_profile_list, a utility function in the dataprofiler package for merging profile objects, for instance profiles that were built separately in a distributed setting. The user passes a list of profile objects to the function, which merges them all into a single profile.

Imports

Let’s start by importing the necessary packages…

[ ]:
import os
import sys
import json

import pandas as pd
import tensorflow as tf

try:
    sys.path.insert(0, '..')
    import dataprofiler as dp
    from dataprofiler.profilers.profiler_utils import merge_profile_list
except ImportError:
    import dataprofiler as dp
    from dataprofiler.profilers.profiler_utils import merge_profile_list

# remove extra tf logging
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

Set Up the Data and Profiler

This section walks through a basic Data Profiler setup:

  1. Instantiate a Pandas dataframe with dummy data

  2. Pass the dataframe to the Profiler twice to create two separate profiles, and collect them in a list

[ ]:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

list_of_profiles = [dp.Profiler(df), dp.Profiler(df)]

Take a look at the list of profiles…

[ ]:
list_of_profiles

Run Merge on List of Profiles

Now let’s merge the list of profiles into a single profile, single_profile:

[ ]:
single_profile = merge_profile_list(list_of_profiles=list_of_profiles)

And check out the report for the merged profile by calling .report():

[ ]:
single_profile.report()
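
To build some intuition for what merging a list of profiles means, here is a minimal, standard-library-only sketch. It does not use dataprofiler and makes no claim about how merge_profile_list is implemented internally. ToyProfile and merge_toy_profiles are hypothetical names introduced purely for illustration. The idea shown is that each profile holds summary statistics that can be combined pairwise, so a whole list can be folded into one profile that matches profiling all of the data at once.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ToyProfile:
    """Hypothetical stand-in for a profile: a few mergeable statistics."""
    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        # Profile one chunk of data independently of all other chunks.
        return cls(len(values), sum(values), min(values), max(values))

    def __add__(self, other):
        # Counts and sums add up; min and max combine directly.
        return ToyProfile(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        # The mean is derived from merged count and total, not averaged.
        return self.total / self.count


def merge_toy_profiles(profiles):
    """Fold a non-empty list of ToyProfile objects into a single one."""
    if not profiles:
        raise ValueError("need at least one profile to merge")
    return reduce(lambda left, right: left + right, profiles)


# Three chunks, as if each had been profiled on a different worker.
chunks = [[1, 2], [3, 4], [5]]
merged = merge_toy_profiles([ToyProfile.from_values(c) for c in chunks])
print(merged.count, merged.mean, merged.minimum, merged.maximum)
```

In this toy version the merged result equals the profile of the concatenated data, which is the property that makes it safe to profile chunks separately and merge afterwards. Real profiles track far more than four numbers, and some statistics can only be merged approximately, so treat this as a conceptual picture only.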