JSON Data

class dataprofiler.data_readers.json_data.JSONData(input_file_path=None, data=None, options=None)

Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData

SpreadsheetData class to save and load spreadsheet data

Data class for loading datasets of type JSON. The data can be supplied either in memory or via a file path. Options pertaining to the JSON may also be specified using the options dict parameter. Possible options:

options = dict(
    data_format= type: str, choices: "dataframe", "records", "json",
     "flattened_dataframe"
    selected_keys= type: list(str)
    payload_keys= type: Union[str, list(str)]
)

data_format: user-selected format in which to return data; can only be one of the specified types
selected_keys: keys being selected from the entire dataset
payload_keys: list of dictionary keys that determine the payload
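As a rough illustration of the option types listed above, the following sketch builds an options dict and checks it against the documented choices. The helper check_json_options and the key names "id", "name" and "data" are hypothetical and are not part of dataprofiler; the library performs its own validation, which may differ.

```python
from typing import Any, Dict

# Choices documented above for data_format.
ALLOWED_DATA_FORMATS = {"dataframe", "records", "json", "flattened_dataframe"}


def check_json_options(options: Dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if options do not match
    the documented types. Not part of dataprofiler."""
    data_format = options.get("data_format")
    if data_format is not None and data_format not in ALLOWED_DATA_FORMATS:
        raise ValueError(f"unsupported data_format: {data_format!r}")

    selected_keys = options.get("selected_keys")
    if selected_keys is not None and not (
        isinstance(selected_keys, list)
        and all(isinstance(key, str) for key in selected_keys)
    ):
        raise ValueError("selected_keys must be a list of str")

    payload_keys = options.get("payload_keys")
    if payload_keys is not None:
        # A single str is accepted and treated like a one-element list.
        keys = [payload_keys] if isinstance(payload_keys, str) else payload_keys
        if not (isinstance(keys, list) and all(isinstance(k, str) for k in keys)):
            raise ValueError("payload_keys must be a str or a list of str")


options = {
    "data_format": "records",
    "selected_keys": ["id", "name"],  # hypothetical key names
    "payload_keys": "data",           # hypothetical payload key
}
check_json_options(options)  # raises nothing for a well-formed dict
```

A dict shaped like this is what the options parameter of JSONData expects; unknown data_format values are rejected by the sketch with a ValueError.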

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None

data_type = 'json'
property selected_keys
property metadata

Returns a data frame that contains the metadata

property data_and_metadata

Returns a data frame that joins the data and the metadata.

property is_structured

Determines compatibility with StructuredProfiler

classmethod is_match(file_path, options=None)

Tests the first 1000 lines of a given file to check whether the file has valid JSON format. At least 60 percent of those lines must be valid JSON.

Parameters
  • file_path (str) – path to the file to be examined

  • options (dict) – json read options

Returns

True if the file is a JSON file, False otherwise

Return type

bool
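The 60 percent rule described for is_match can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: the function name looks_like_json_lines and its parameters are hypothetical, it treats every line as a standalone JSON document, and it counts blank lines as invalid, which the real method may not do.

```python
import json
from itertools import islice
from typing import Iterable


def looks_like_json_lines(
    lines: Iterable[str], max_lines: int = 1000, min_ratio: float = 0.6
) -> bool:
    """Hypothetical sketch: return True if at least min_ratio of the
    first max_lines lines parse as JSON on their own."""
    checked = 0
    valid = 0
    for line in islice(lines, max_lines):
        checked += 1
        try:
            json.loads(line)
            valid += 1
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    # An empty input is treated as "not JSON".
    return checked > 0 and valid / checked >= min_ratio


sample = ['{"a": 1}', '{"b": 2}', "not json", "[1, 2]", "oops"]
is_json = looks_like_json_lines(sample)  # 3 of 5 lines parse, which meets the 0.6 threshold
```

An open file object can be passed in place of the list, since iterating over a text file yields its lines.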

property data
property data_format
property file_encoding
get_batch_generator(batch_size)
info = None
property length

Returns the length of the dataset which is loaded.

Returns

length of the dataset

reload(input_file_path=None, data=None, options=None)

Reload the data class with a new dataset. This erases all existing data/options and replaces it with the input data/options.

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None