dataprofiler.data_readers.json_data module

class dataprofiler.data_readers.json_data.JSONData(input_file_path=None, data=None, options=None)

Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData

JSONData class to save and load JSON data.

Data class for loading datasets of type JSON. Data can be supplied either as in-memory data or via a file path. Options pertaining to JSON may also be specified using the options dict parameter. Possible options:

options = dict(
    data_format= type: str, choices: "dataframe", "records", "json"
    selected_keys= type: list(str)
)

data_format: user-selected format in which to return the data; must be one of the choices listed above.
selected_keys: keys to select from the entire dataset.
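To make the option types above concrete, here is a minimal sketch of a validator for the options dict. The helper `validate_json_options` and the constant `VALID_DATA_FORMATS` are hypothetical illustrations, not part of the dataprofiler API; the library may validate options differently, or not at all, before using them.

```python
# Hypothetical helper, NOT part of dataprofiler: checks an options dict
# against the choices documented for JSONData.
VALID_DATA_FORMATS = ("dataframe", "records", "json")


def validate_json_options(options=None):
    """Return a copy of ``options`` after checking the documented keys."""
    options = dict(options or {})

    # data_format is optional, but if present it must be a documented choice.
    data_format = options.get("data_format")
    if data_format is not None and data_format not in VALID_DATA_FORMATS:
        raise ValueError(
            f"data_format must be one of {VALID_DATA_FORMATS}, got {data_format!r}"
        )

    # selected_keys is optional, but if present it must be a list of strings.
    selected_keys = options.get("selected_keys")
    if selected_keys is not None:
        if not isinstance(selected_keys, list) or not all(
            isinstance(key, str) for key in selected_keys
        ):
            raise ValueError("selected_keys must be a list of str")

    return options


# Usage: a valid dict passes through unchanged.
print(validate_json_options({"data_format": "records", "selected_keys": ["id", "name"]}))
```

A dict such as `{"data_format": "csv"}` or `{"selected_keys": "id"}` raises `ValueError` in this sketch, which catches typos before any data is read.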

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None

data_type = 'json'
property selected_keys
classmethod is_match(file_path, options=None)

Test the first 1000 lines of a given file to check whether the file is in valid JSON format. At least 60 percent of the lines examined must be valid JSON.

Parameters
  • file_path (str) – path to the file to be examined

  • options (dict) – json read options

Returns

True if the file is a JSON file, False otherwise.

Return type

bool
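The line-sampling rule described for is_match can be sketched as follows. `looks_like_json_lines` is a hypothetical function that implements only the stated rule (first 1000 lines, at least 60 percent must parse); it is not dataprofiler's implementation, which takes a file path and read options and may apply further checks. Note that `json.loads` accepts any JSON value, so a line such as `42` counts as valid here.

```python
import itertools
import json


def looks_like_json_lines(lines, max_lines=1000, min_ratio=0.6):
    """Hypothetical sketch of the documented rule, not dataprofiler's code.

    Returns True when at least ``min_ratio`` of the first ``max_lines``
    lines each parse as a standalone JSON value.
    """
    # Only the first max_lines lines are examined; islice works on lists
    # and on open text file objects alike.
    sample = list(itertools.islice(lines, max_lines))
    if not sample:
        return False

    valid = 0
    for line in sample:
        try:
            json.loads(line)
            valid += 1
        except json.JSONDecodeError:
            # A line that does not parse simply counts against the ratio.
            pass
    return valid / len(sample) >= min_ratio


lines = ['{"id": 1}\n', '{"id": 2}\n', "not json\n"]
print(looks_like_json_lines(lines))  # 2 of 3 lines parse (about 67%) -> True
```

To apply the sketch to a file, pass an open text file object, for example one returned by `open(path, encoding="utf-8")`; reading lines lazily means only the sampled lines are pulled from disk.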

reload(input_file_path=None, data=None, options=None)

Reload the data class with a new dataset. This erases all existing data and options and replaces them with the input data and options.

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None
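The effect of reload can be pictured with a toy class. `ToyDataReader` is a hypothetical stand-in written only for illustration; it does not reproduce how JSONData parses files or detects encodings. It shows the documented contract: state from the earlier load, including options, is discarded rather than merged with the new input.

```python
class ToyDataReader:
    """Hypothetical stand-in illustrating reload semantics; not dataprofiler code."""

    def __init__(self, input_file_path=None, data=None, options=None):
        self.reload(input_file_path, data, options)

    def reload(self, input_file_path=None, data=None, options=None):
        # Overwrite every piece of previous state instead of merging into it,
        # so anything not passed again falls back to its default.
        self.input_file_path = input_file_path
        self.data = data
        self.options = dict(options) if options else {}


reader = ToyDataReader(data=[{"id": 1}], options={"data_format": "records"})
reader.reload(data=[{"id": 2}])
print(reader.data, reader.options)  # [{'id': 2}] {}
```

Because the earlier options are dropped, a caller that wants to keep a setting such as data_format across a reload has to pass it again.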

property data
property data_format
property file_encoding
get_batch_generator(batch_size)
info = None