JSON Data

Contains the class to save and load JSON data.

class dataprofiler.data_readers.json_data.JSONData(input_file_path=None, data=None, options=None)

Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData

JSONData class to save and load JSON data.

Initialize Data class for loading datasets of type JSON.

The dataset can be specified by passing in-memory data or a file path. Options pertaining to the JSON data may also be specified using the options dict parameter. Possible options:

options = dict(
    data_format= type: str, choices: "dataframe", "records", "json",
     "flattened_dataframe"
    selected_keys= type: list(str)
    payload_keys= type: Union[str, list(str)]
)

  • data_format – user-selected format in which to return data; must be one of the choices listed above

  • selected_keys – keys to select from the entire dataset

  • payload_keys – list of dictionary keys that determine the payload

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None
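
As a hedged illustration of the constructor and options described above, the sketch below builds an options dict and passes it to JSONData. The helper names build_json_options and load_json, the placeholder path, and the validation logic are assumptions made for this example; only the option names, the data_format choices, and the constructor signature come from the documentation above.

```python
from typing import List, Optional, Union

# data_format choices listed in the options description above.
ALLOWED_FORMATS = ("dataframe", "records", "json", "flattened_dataframe")


def build_json_options(
    data_format: str = "dataframe",
    selected_keys: Optional[List[str]] = None,
    payload_keys: Optional[Union[str, List[str]]] = None,
) -> dict:
    """Hypothetical helper: assemble an options dict for JSONData."""
    if data_format not in ALLOWED_FORMATS:
        raise ValueError(f"data_format must be one of {ALLOWED_FORMATS}")
    options = {"data_format": data_format}
    # Only include the optional keys when the caller supplied them.
    if selected_keys is not None:
        options["selected_keys"] = list(selected_keys)
    if payload_keys is not None:
        options["payload_keys"] = payload_keys
    return options


def load_json(input_file_path: str, options: dict):
    """Hypothetical wrapper around the documented constructor."""
    # Imported lazily so build_json_options works without the package installed.
    from dataprofiler.data_readers.json_data import JSONData

    return JSONData(input_file_path=input_file_path, options=options)


options = build_json_options(data_format="records", selected_keys=["name"])
# "example.json" is a placeholder path, not a file shipped with the library:
# json_data = load_json("example.json", options)
```

Keeping the options in a plain dict mirrors the documented parameter. The up-front check on data_format is a convenience of this sketch and is not claimed to match how the library validates its input.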

data_type = 'json'

property selected_keys

Return selected keys.

property metadata

Return a data frame that contains the metadata.

property data_and_metadata

Return a data frame that joins the data and the metadata.

property is_structured

Determine compatibility with StructuredProfiler.

classmethod is_match(file_path, options=None)

Test whether the first 1000 lines of the file are in valid JSON format.

At least 60 percent of the first 1000 lines must be valid JSON.

Parameters
  • file_path (str) – path to the file to be examined

  • options (dict) – json read options

Returns

whether the file is a JSON file

Return type

bool
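
The 60 percent rule described for is_match can be illustrated with a standard-library sketch. This is not the library's implementation: the function name looks_like_json_lines, its parameters, and the treatment of empty input are assumptions chosen for the example, and the real method may handle blank lines, encodings, or multi-line JSON differently.

```python
import json
from itertools import islice
from typing import Iterable


def looks_like_json_lines(
    lines: Iterable[str], max_lines: int = 1000, threshold: float = 0.6
) -> bool:
    """Return True if at least `threshold` of the sampled lines parse as JSON."""
    valid = 0
    total = 0
    # Only the first `max_lines` lines are inspected.
    for line in islice(lines, max_lines):
        total += 1
        try:
            json.loads(line)
            valid += 1
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    # An empty sample is treated as "not JSON" in this sketch.
    return total > 0 and valid / total >= threshold


sample = ['{"a": 1}', '{"b": 2}', "not json", '{"c": 3}', "[1, 2]"]
print(looks_like_json_lines(sample))
```

In this sample, four of the five lines parse, so the ratio is above the threshold. Note that json.loads also accepts bare scalars such as numbers, which a stricter check might reject.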

property data

Return data.

property data_format

Return data format.

property file_encoding

Return file encoding.

get_batch_generator(batch_size)

Get batch generator.
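
The documentation above does not describe the ordering of the batches or the type of each batch, so the following is only a sketch of the general batch-generator pattern: yielding fixed-size chunks of a dataset. The function name batch_generator and the choice to yield lists in their original order are assumptions for illustration, not the library's behavior.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(records: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        # The final chunk may be shorter than batch_size.
        yield list(records[start:start + batch_size])


records = [{"id": i} for i in range(5)]
batches = list(batch_generator(records, batch_size=2))
print([len(batch) for batch in batches])
```

A generator keeps only one batch in memory at a time, which is the usual reason to expose this kind of method on a data class.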

info = None

property length

Return the length of the dataset which is loaded.

Returns

length of the dataset

reload(input_file_path=None, data=None, options=None)

Reload the data class with a new dataset.

This erases all existing data and options and replaces them with the input data and options.

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None