.. _install: Install ******* To install the full package from pypi: .. code-block:: console pip install DataProfiler[ml] If the ML requirements are too strict (say, you don't want to install tensorflow), you can install a slimmer package. The slimmer package disables the default sensitive data detection / entity recognition (labler) Install from pypi: .. code-block:: console pip install DataProfiler Snappy Installation =================== This is required to profile parquet/avro datasets MacOS with homebrew: .. code-block:: console brew install snappy Linux install: .. code-block:: console sudo apt-get -y install libsnappy-dev Build From Scratch ================== NOTE: Installation for python3 virtualenv install: .. code-block:: console python3 -m pip install virtualenv Setup virtual env: .. code-block:: console python3 -m virtualenv --python=python3 venv3 source venv3/bin/activate Install requirements: .. code-block:: console pip3 install -r requirements.txt Install labeler dependencies: .. code-block:: console pip3 install -r requirements-ml.txt Install via the repo -- Build setup.py and install locally: .. code-block:: console python3 setup.py sdist bdist bdist_wheel pip3 install dist/DataProfiler*-py3-none-any.whl If you see: .. code-block:: console ERROR: Double requirement given:dataprofiler==X.Y.Z from dataprofiler/dist/DataProfiler-X.Y.Z-py3-none-any.whl (already in dataprofiler==X2.Y2.Z2 from dataprofiler/dist/DataProfiler-X2.Y2.Z2-py3-none-any.whl, name='dataprofiler') This means that you have multiple versions of the DataProfiler distribution in the dist folder. To resolve, either remove the older one or delete the folder and rerun the steps above. Install via github: .. code-block:: console pip3 install git+https://github.com/capitalone/dataprofiler.git#egg=dataprofiler Testing ======= For testing, install test requirements: .. code-block:: console pip3 install -r requirements-test.txt To run all unit tests, use: .. code-block:: console DATAPROFILER_SEED=0 python3 -m unittest discover -p "test*.py" To run file of unit tests, use form: .. code-block:: console DATAPROFILER_SEED=0 python3 -m unittest discover -p test_profile_builder.py To run a file with Pytest use: .. code-block:: console DATAPROFILER_SEED=0 pytest dataprofiler/tests/data_readers/test_csv_data.py -v To run individual of unit test, use form: .. code-block:: console DATAPROFILER_SEED=0 python3 -m unittest dataprofiler.tests.profilers.test_profile_builder.TestProfiler