Avro Data

Contains the class for saving and loading AVRO data.

class dataprofiler.data_readers.avro_data.AVROData(input_file_path: Optional[str] = None, data: Optional[Any] = None, options: Optional[Dict] = None)

Bases: dataprofiler.data_readers.json_data.JSONData, dataprofiler.data_readers.base_data.BaseData

AVROData class to save and load AVRO data.

Initialize Data class for loading datasets of type AVRO.

The data can be supplied either in memory or via a file path. Options pertaining to AVRO may also be specified using the options dict parameter. Possible options:

options = dict(
    data_format= type: str, choices: "dataframe", "records", "avro"
    selected_keys= type: list(str)
)

data_format: the user-selected format; it must be one of the choices listed above
selected_keys: the keys to select from the entire dataset

Parameters

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns

None
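
The options described above can be assembled and checked before they are passed to the constructor. The following is a minimal sketch: build_avro_options is a hypothetical helper, not part of dataprofiler, and it only mirrors the choices listed in this section. The commented usage at the end assumes that dataprofiler is installed and that "example.avro" is a placeholder path to a real file.

```python
from typing import Any, Dict, List, Optional

# Choices for data_format, copied from the options listing in this section.
ALLOWED_DATA_FORMATS = ("dataframe", "records", "avro")


def build_avro_options(
    data_format: str,
    selected_keys: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an options dict for AVROData."""
    if data_format not in ALLOWED_DATA_FORMATS:
        raise ValueError(
            f"data_format must be one of {ALLOWED_DATA_FORMATS}, got {data_format!r}"
        )
    options: Dict[str, Any] = {"data_format": data_format}
    if selected_keys is not None:
        # selected_keys is documented as a list of strings.
        if not all(isinstance(key, str) for key in selected_keys):
            raise ValueError("selected_keys must contain only strings")
        options["selected_keys"] = list(selected_keys)
    return options


# Usage (assumes dataprofiler is installed; the path and key are placeholders):
# from dataprofiler.data_readers.avro_data import AVROData
# avro = AVROData(
#     input_file_path="example.avro",
#     options=build_avro_options("records", selected_keys=["id"]),
# )
# print(avro.data_format)
```

Validating the dict up front is a design choice of this sketch; it surfaces a misspelled format name before any file is read.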

data_type: str = 'avro'

property file_encoding: Optional[str]: Return the file encoding, which is always None because encoding is not detected for AVRO.

classmethod is_match(file_path: Union[str, _io.StringIO, _io.BytesIO], options: Optional[Dict] = None) → bool

Test the given file to check if the file has valid AVRO format or not.

Parameters

file_path (str) – path to the file to be examined
options (dict) – avro read options

Returns

whether the file is an AVRO file

Return type

bool
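
As a rough illustration of what a format check can look like, the sketch below tests for the four magic bytes that open every Avro object container file (ASCII "Obj" followed by the byte 1). This is not how is_match is implemented; looks_like_avro is a hypothetical stand-in that handles only paths and binary streams, and a header match alone does not prove the rest of the file is valid.

```python
import io
from typing import BinaryIO, Union

# Avro object container files begin with these four bytes.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(source: Union[str, BinaryIO]) -> bool:
    """Hypothetical check: does the source start with the Avro magic bytes?"""
    if isinstance(source, str):
        # Treat a string as a file path and read only the header bytes.
        with open(source, "rb") as handle:
            return handle.read(len(AVRO_MAGIC)) == AVRO_MAGIC
    # For an open binary stream, restore the read position afterwards.
    position = source.tell()
    try:
        return source.read(len(AVRO_MAGIC)) == AVRO_MAGIC
    finally:
        source.seek(position)


# Usage with in-memory streams:
avro_like = io.BytesIO(b"Obj\x01" + b"remaining bytes")
not_avro = io.BytesIO(b'{"key": "value"}')
print(looks_like_avro(avro_like), looks_like_avro(not_avro))
```

A header check like this is cheap, which makes it useful as a first filter before attempting a full parse.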

property data: Return data.

property data_and_metadata: Optional[pandas.core.frame.DataFrame]: Return a data frame that joins the data and the metadata.

property data_format: Optional[str]: Return data format.

get_batch_generator(batch_size: int) → Generator[Union[pandas.core.frame.DataFrame, List], None, None]: Get batch generator.
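
To make the return type concrete, the sketch below shows the general shape of a batch generator over a list of records. It is an illustration only: batch_records is a hypothetical function, and the ordering of records and the handling of DataFrames in get_batch_generator may differ from what is shown here.

```python
from typing import Generator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_records(
    records: Sequence[T], batch_size: int
) -> Generator[List[T], None, None]:
    """Hypothetical sketch: yield consecutive batches of at most batch_size items."""
    # Because this is a generator, the check runs when iteration starts.
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        # The final batch may be shorter than batch_size.
        yield list(records[start:start + batch_size])


# Usage: consume the batches one at a time.
for batch in batch_records(["a", "b", "c", "d", "e"], batch_size=2):
    print(batch)
```

With the library itself, the equivalent loop would iterate over the generator returned by get_batch_generator(batch_size) on a loaded AVROData instance.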

info: Optional[str] = None

property is_structured: Determine compatibility with StructuredProfiler.

property length: int

Return the length of the dataset which is loaded.

Returns: length of the dataset

property metadata: Optional[pandas.core.frame.DataFrame]: Return a data frame that contains the metadata.

reload(input_file_path: Optional[str] = None, data: Optional[Union[pandas.core.frame.DataFrame, str]] = None, options: Optional[Dict] = None) → None

Reload the data class with a new dataset.

This erases all existing data and options and replaces them with the input data and options.

Parameters

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns

None

property selected_keys: Optional[List[str]]: Return selected keys.