JSON Data
Contains the class for saving and loading JSON data.
- class dataprofiler.data_readers.json_data.JSONData(input_file_path: Optional[str] = None, data: Optional[Union[pandas.core.frame.DataFrame, str]] = None, options: Optional[Dict] = None)
Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData
JSONData class to save and load JSON data.
Initialize Data class for loading datasets of type JSON.
Can be specified by passing in-memory data or a file path. Options pertaining to the JSON may also be specified using the options dict parameter. Possible options:
options = dict(
    data_format= type: str, choices: "dataframe", "records", "json", "flattened_dataframe"
    selected_keys= type: list(str)
    payload_keys= type: Union[str, list(str)]
)
- data_format: user-selected format in which to return the data; must be one of the listed choices
- selected_keys: keys to select from the entire dataset
- payload_keys: list of dictionary keys that determine the payload
- Parameters
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns
None
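The options dict described above can be assembled and sanity-checked as a plain dictionary before it is passed to the constructor. The following is a minimal sketch rather than the library's own validation: `check_json_options` is a hypothetical helper, and the key names `"id"`, `"name"` and `"data"` are made-up examples.

```python
from typing import Dict

# Choices listed in the documentation above for ``data_format``.
DATA_FORMATS = ("dataframe", "records", "json", "flattened_dataframe")


def check_json_options(options: Dict) -> Dict:
    """Hypothetical helper: sanity-check an options dict of the documented shape."""
    data_format = options.get("data_format")
    if data_format is not None and data_format not in DATA_FORMATS:
        raise ValueError(f"data_format must be one of {DATA_FORMATS}")

    selected_keys = options.get("selected_keys")
    if selected_keys is not None and not (
        isinstance(selected_keys, list)
        and all(isinstance(k, str) for k in selected_keys)
    ):
        raise ValueError("selected_keys must be a list of strings")

    payload_keys = options.get("payload_keys")
    if payload_keys is not None:
        # The docs allow either a single string or a list of strings.
        keys = [payload_keys] if isinstance(payload_keys, str) else payload_keys
        if not (isinstance(keys, list) and all(isinstance(k, str) for k in keys)):
            raise ValueError("payload_keys must be a string or a list of strings")

    return options


options = check_json_options(
    {
        "data_format": "records",
        "selected_keys": ["id", "name"],  # example key names, not from the docs
        "payload_keys": "data",           # example key name, not from the docs
    }
)
```

With the package installed, a dict of this shape would be passed as the `options` argument of `JSONData`, together with either `input_file_path` or `data`.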
- data_type: str = 'json'
- property selected_keys: Optional[List[str]]
Return selected keys.
- property metadata: Optional[pandas.core.frame.DataFrame]
Return a data frame that contains the metadata.
- property data_and_metadata: Optional[pandas.core.frame.DataFrame]
Return a data frame that joins the data and the metadata.
- property is_structured
Determine compatibility with StructuredProfiler.
- classmethod is_match(file_path: Union[str, _io.StringIO], options: Optional[Dict] = None) → bool
Test whether the first 1000 lines of the file are in valid JSON format.
At least 60 percent of the first 1000 lines must be valid JSON.
- Parameters
file_path (str) – path to the file to be examined
options (dict) – json read options
- Returns
whether the file is a JSON file
- Return type
bool
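The 60 percent rule described for is_match can be illustrated with the standard library alone. This is a simplified sketch under stated assumptions, not the library's implementation: `looks_like_json_lines` is a hypothetical function, each line is parsed independently with `json.loads`, and blank lines are skipped.

```python
import io
import json
from typing import Iterable


def looks_like_json_lines(
    lines: Iterable[str], max_lines: int = 1000, threshold: float = 0.6
) -> bool:
    """Hypothetical stand-in for the documented check, applied line by line."""
    checked = 0
    valid = 0
    for line in lines:
        if checked >= max_lines:
            break
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        checked += 1
        try:
            json.loads(line)
            valid += 1
        except json.JSONDecodeError:
            pass
    # An empty input never counts as JSON in this sketch.
    return checked > 0 and valid / checked >= threshold


sample = io.StringIO('{"a": 1}\n{"a": 2}\nnot json\n{"a": 3}\n')
result = looks_like_json_lines(sample)  # 3 of 4 lines parse, so the ratio is 0.75
```

A file-like object is used here because the documented signature accepts either a path or an `io.StringIO`; any iterable of strings works with the sketch.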
- property data
Return data.
- property data_format: Optional[str]
Return data format.
- property file_encoding: Optional[str]
Return file encoding.
- get_batch_generator(batch_size: int) → Generator[Union[pandas.core.frame.DataFrame, List], None, None]
Get batch generator.
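The general idea of a batch generator can be sketched over a plain list. This is an illustration only: `batch_generator` is a hypothetical function, and the library's actual batching, including the order in which records are returned, may differ.

```python
from typing import Generator, List


def batch_generator(records: List, batch_size: int) -> Generator[List, None, None]:
    """Hypothetical sketch: yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


batches = list(batch_generator([1, 2, 3, 4, 5], batch_size=2))
```

The last batch may be shorter than `batch_size` when the number of records is not an exact multiple of it.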
- info: Optional[str] = None
- property length: int
Return the length of the dataset which is loaded.
- Returns
length of the dataset
- reload(input_file_path: Optional[str] = None, data: Optional[Union[pandas.core.frame.DataFrame, str]] = None, options: Optional[Dict] = None) → None
Reload the data class with a new dataset.
This erases all existing data and options and replaces them with the input data and options.
- Parameters
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns
None
- options: Optional[Dict]