dataprofiler.data_readers.json_data module

class dataprofiler.data_readers.json_data.JSONData(input_file_path=None, data=None, options=None)

Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData

JSONData class to save and load JSON data

Data class for loading datasets of type JSON. The data can be specified either by passing in-memory data or via a file path. Options pertaining to the JSON data may also be specified using the options dict parameter. Possible options:

options = dict(
    data_format= type: str, choices: "dataframe", "records", "json"
    selected_keys= type: list(str)
)

data_format: user-selected format in which to return the data; must be one of the choices listed above
selected_keys: keys to select from the entire dataset
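The option schema above can be checked and applied with a small sketch. `validate_json_options` and `apply_selected_keys` are hypothetical helpers written for illustration, not part of dataprofiler. The sketch assumes `selected_keys` names top-level keys of each record; the library may resolve keys differently.

```python
from typing import Any, Dict, List, Optional

# Choices listed in the documented options schema.
VALID_DATA_FORMATS = ("dataframe", "records", "json")


def validate_json_options(options: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: check an options dict against the documented schema."""
    options = dict(options or {})
    data_format = options.get("data_format")
    if data_format is not None and data_format not in VALID_DATA_FORMATS:
        raise ValueError(
            f"data_format must be one of {VALID_DATA_FORMATS}, got {data_format!r}"
        )
    selected_keys = options.get("selected_keys")
    if selected_keys is not None and not (
        isinstance(selected_keys, list)
        and all(isinstance(k, str) for k in selected_keys)
    ):
        raise ValueError("selected_keys must be a list of str")
    return options


def apply_selected_keys(
    records: List[Dict[str, Any]], selected_keys: Optional[List[str]]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only the selected top-level keys of each record."""
    # No (or empty) selection: return the records unchanged.
    if not selected_keys:
        return records
    # Keys missing from a record are skipped rather than raising KeyError.
    return [{k: r[k] for k in selected_keys if k in r} for r in records]


opts = validate_json_options({"data_format": "records", "selected_keys": ["id", "name"]})
rows = [{"id": 1, "name": "a", "extra": True}, {"id": 2, "name": "b"}]
print(apply_selected_keys(rows, opts["selected_keys"]))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

Validating the dict up front means a typo such as `data_format="csv"` fails immediately with a clear message instead of surfacing later during loading.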

Parameters

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns

None

data_type = 'json'

property selected_keys

classmethod is_match(file_path, options=None)

Test the first 1000 lines of the given file to check whether the file is in valid JSON format. At least 60 percent of those first 1000 lines must be valid JSON for the file to match.

Parameters

file_path (str) – path to the file to be examined
options (dict) – json read options

Returns

Whether the file is a JSON file

Return type

bool
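The line-based heuristic described above can be sketched as follows. `looks_like_json_lines` is a hypothetical function, not the library's `is_match`; it reads from any text stream (such as an open file or `io.StringIO`) and uses the documented 1000-line window and 60 percent threshold as defaults. Because it checks line by line only, a single JSON document pretty-printed across many lines would not pass this sketch, and bare JSON scalars such as `42` count as valid lines here.

```python
import io
import json
from typing import TextIO


def looks_like_json_lines(
    stream: TextIO, max_lines: int = 1000, min_ratio: float = 0.6
) -> bool:
    """Sketch of the documented heuristic: at least `min_ratio` of the first
    `max_lines` lines must parse as JSON."""
    total = 0
    valid = 0
    for line in stream:
        # Only the first `max_lines` lines are examined.
        if total >= max_lines:
            break
        total += 1
        try:
            json.loads(line)
            valid += 1
        except json.JSONDecodeError:
            pass  # blank or non-JSON line: counted in total, not in valid
    # An empty stream has no valid lines, so it is not treated as JSON.
    return total > 0 and valid / total >= min_ratio


sample = io.StringIO('{"a": 1}\n{"a": 2}\nnot json\n')
print(looks_like_json_lines(sample))  # True: 2 of 3 lines (about 67%) are valid
```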

reload(input_file_path=None, data=None, options=None)

Reload the data class with a new dataset. This erases all existing data and options and replaces them with the input data and options.

Parameters

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns

None
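The key point of reload is that previous state is replaced, not merged. `ReloadableReader` below is a hypothetical minimal class illustrating that contract under the documented description; it is not the JSONData implementation and does no parsing.

```python
from typing import Any, Dict, Optional


class ReloadableReader:
    """Hypothetical minimal reader illustrating reload semantics:
    all previous state is discarded, not merged."""

    def __init__(self, input_file_path: Optional[str] = None,
                 data: Any = None, options: Optional[Dict[str, Any]] = None):
        # Construction and reload share one code path, so both start clean.
        self.reload(input_file_path, data, options)

    def reload(self, input_file_path: Optional[str] = None,
               data: Any = None, options: Optional[Dict[str, Any]] = None) -> None:
        # Replace every attribute; nothing from the previous dataset survives.
        self.input_file_path = input_file_path
        self.data = data
        self.options = dict(options or {})


reader = ReloadableReader(data=[{"a": 1}], options={"data_format": "records"})
reader.reload(data=[{"b": 2}])
print(reader.options)  # {} : the earlier data_format option was erased, not kept
```

A caller that wants to keep an option across reloads therefore has to pass it again explicitly.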

property data

property data_format

property file_encoding

get_batch_generator(batch_size)
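The reference gives no description of get_batch_generator. Assuming it splits the loaded data into batches of at most `batch_size` rows, a hypothetical generator over a list of records might look like this. The real method may order rows differently (for example, shuffled) or yield DataFrame slices when `data_format` is "dataframe"; this sketch only covers consecutive slices of a sequence.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical sketch: yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        # The final batch may be shorter than batch_size.
        yield list(rows[start:start + batch_size])


print(list(batch_generator([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Using a generator keeps only one batch materialized at a time, which is the usual reason to expose batching as an iterator rather than returning a list of batches.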

info = None