dataprofiler.data_readers.avro_data module
Contains class for saving and loading AVRO data.
- class dataprofiler.data_readers.avro_data.AVROData(input_file_path: str | None = None, data: Any | None = None, options: Dict | None = None)
AVROData class to save and load AVRO data.
Initialize Data class for loading datasets of type AVRO.
Data can be supplied either in memory or via a file path. Options pertaining to AVRO may also be specified using the options dict parameter. Possible options:
options = dict(
    data_format= type: str, choices: "dataframe", "records", "avro"
    selected_keys= type: list(str)
)
data_format: user-selected format; must be one of the listed choices
selected_keys: keys to select from the entire dataset
- Parameters:
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns:
None
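As a minimal sketch of how the options described above might be assembled, the hypothetical helper `build_avro_options` below (not part of dataprofiler) checks `data_format` against the three documented choices before the dict is handed to `AVROData`. The `load_avro` wrapper is also an illustrative assumption; its import is deferred so the helper can be used without the library installed.

```python
from typing import Any, Dict, List, Optional

# Choices for data_format as listed in the class documentation above.
ALLOWED_FORMATS = ("dataframe", "records", "avro")


def build_avro_options(
    data_format: str,
    selected_keys: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of dataprofiler): build an options dict."""
    if data_format not in ALLOWED_FORMATS:
        raise ValueError(f"data_format must be one of {ALLOWED_FORMATS}")
    options: Dict[str, Any] = {"data_format": data_format}
    if selected_keys is not None:
        # Copy the list so later changes by the caller do not leak in.
        options["selected_keys"] = list(selected_keys)
    return options


def load_avro(path: str, options: Dict[str, Any]):
    """Illustrative wrapper around the documented constructor."""
    # Imported lazily so this sketch stays importable without dataprofiler.
    from dataprofiler.data_readers.avro_data import AVROData

    return AVROData(input_file_path=path, options=options)


opts = build_avro_options("records", selected_keys=["id", "name"])
print(opts)
```

Validating the dict up front keeps a mistyped `data_format` from surfacing only once the file is being read.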
- data_type: str = 'avro'
- property file_encoding: str | None
Return None, since file encoding is not detected for AVRO files.
- classmethod is_match(file_path: str | StringIO | BytesIO, options: Dict | None = None) bool
Check whether the given file is in valid AVRO format.
- Parameters:
file_path (str | StringIO | BytesIO) – path to the file, or a buffer holding its contents, to be examined
options (dict) – avro read options
- Returns:
whether the file is an AVRO file
- Return type:
bool
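This section does not say how `is_match` decides whether a file is AVRO. As a rough illustration of the idea only, the hypothetical function `looks_like_avro` below checks for the four magic bytes that open an Avro object container file. It is a simplified stand-in, not the library's implementation, and it handles only file paths and binary buffers.

```python
import io
from typing import BinaryIO, Union

# Avro object container files begin with these four bytes.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(source: Union[str, BinaryIO]) -> bool:
    """Hypothetical header check; a stand-in, not dataprofiler's is_match."""
    if isinstance(source, str):
        try:
            with open(source, "rb") as handle:
                return handle.read(len(AVRO_MAGIC)) == AVRO_MAGIC
        except OSError:
            # Missing or unreadable files are treated as non-matches.
            return False
    # For in-memory buffers, peek at the header and restore the position.
    position = source.tell()
    header = source.read(len(AVRO_MAGIC))
    source.seek(position)
    return header == AVRO_MAGIC


print(looks_like_avro(io.BytesIO(b"Obj\x01rest-of-file")))  # True
print(looks_like_avro(io.BytesIO(b"id,name\n1,a\n")))       # False
```

A header check like this is cheap but weaker than parsing the file: it cannot tell a truncated or corrupt container from a valid one.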
- property data
Return data.
- property data_and_metadata: DataFrame | None
Return a data frame that joins the data and the metadata.
- property data_format: str | None
Return data format.
- get_batch_generator(batch_size: int) Generator[DataFrame | List, None, None]
Get batch generator.
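The general pattern behind a batch generator can be sketched as follows. The function `batch_records` is a hypothetical illustration over a plain list of records, assuming fixed-size slices with a shorter final batch; the library's own batching behavior is not described in this section.

```python
from typing import Any, Generator, List


def batch_records(
    records: List[Any], batch_size: int
) -> Generator[List[Any], None, None]:
    """Hypothetical sketch: yield consecutive slices of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


rows = [{"id": i} for i in range(5)]
sizes = [len(batch) for batch in batch_records(rows, 2)]
print(sizes)  # [2, 2, 1]
```

Because the function is a generator, only one batch is materialized at a time, which is the usual reason to batch a large dataset.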
- info: str | None = None
- property is_structured
Determine compatibility with StructuredProfiler.
- property length: int
Return the length of the dataset which is loaded.
- Returns:
length of the dataset
- property metadata: DataFrame | None
Return a data frame that contains the metadata.
- reload(input_file_path: str | None = None, data: DataFrame | str | None = None, options: Dict | None = None) None
Reload the data class with a new dataset.
This erases all existing data and options and replaces them with the input data and options.
- Parameters:
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns:
None
- property selected_keys: List[str] | None
Return selected keys.