dataprofiler.data_readers.csv_data module

Contains a class that saves and loads spreadsheet data.

class dataprofiler.data_readers.csv_data.CSVData(input_file_path: str | None = None, data: DataFrame | None = None, options: Dict | None = None)

Bases: SpreadSheetDataMixin, BaseData

SpreadsheetData class to save and load spreadsheet data.

Initialize Data class for loading datasets of type CSV.

The dataset can be specified either by passing in-memory data or via a file path. Options pertaining to CSV may also be specified using the options dict parameter. Possible options:

options = dict(
    delimiter= type: str
    data_format= type: str, choices: "dataframe", "records"
    record_samples_per_line= type: int (only for "records")
    selected_columns= type: list(str)
    header= type: any
)

delimiter: delimiter used to decipher the csv input file
data_format: user selected format in which to return data; can only be of specified types:
    dataframe - (default) loads the dataset as a pandas.DataFrame
    records - loads the data as rows of text values; the extra parameter "record_samples_per_line" determines how many rows are combined into a single line
selected_columns: columns being selected from the entire dataset
header: location of the header in the file
quotechar: quote character used in the delimited file

Parameters:

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns:

None
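As an illustration of the options dict described above, the following is a minimal sketch of a helper that assembles it from a handful of the documented keys. The helper name `make_csv_options`, its validation checks, and the placeholder file path are assumptions made for this example; they are not part of dataprofiler, and the library may validate options differently.

```python
from typing import Any, Dict, List, Optional

VALID_DATA_FORMATS = ("dataframe", "records")


def make_csv_options(
    delimiter: Optional[str] = None,
    data_format: str = "dataframe",
    record_samples_per_line: Optional[int] = None,
    selected_columns: Optional[List[str]] = None,
    header: Any = None,
) -> Dict[str, Any]:
    """Build an options dict from some of the documented CSV option keys.

    Hypothetical helper, not part of dataprofiler; the checks below belong
    to this sketch and are not the library's own validation.
    """
    if data_format not in VALID_DATA_FORMATS:
        raise ValueError(f"data_format must be one of {VALID_DATA_FORMATS}")
    if record_samples_per_line is not None and data_format != "records":
        raise ValueError('record_samples_per_line only applies to "records"')

    candidates = {
        "delimiter": delimiter,
        "data_format": data_format,
        "record_samples_per_line": record_samples_per_line,
        "selected_columns": selected_columns,
        "header": header,
    }
    # Drop unset keys so they are simply absent from the dict
    # (this sketch treats None as "not specified").
    return {key: value for key, value in candidates.items() if value is not None}


opts = make_csv_options(delimiter=",", selected_columns=["id", "name"], header=0)

# With dataprofiler installed, the dict would be passed to the constructor
# (the file path is a placeholder):
#   from dataprofiler.data_readers.csv_data import CSVData
#   data = CSVData("path/to/file.csv", options=opts)
```

Because the helper drops `None` values, it cannot express an explicit `header=None`; in that case the options dict would need to be written by hand.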

data_type: str = 'csv'

property selected_columns: List[str]: Return selected columns.

property delimiter: str | None: Return delimiter.

property quotechar: str | None: Return quotechar.

property header: str | int | None: Return header.

property sample_nrows: int | None: Return sample_nrows.

property is_structured: bool: Determine compatibility with StructuredProfiler.

property data: Return data.

property data_format: str | None: Return data format.

property file_encoding: str | None: Return file encoding.

get_batch_generator(batch_size: int) → Generator[DataFrame | List, None, None]: Get batch generator.
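The signature of get_batch_generator suggests a generator that yields the loaded data in chunks of at most batch_size items. The following is a minimal sketch of that pattern over a plain list; the function name `batch_generator` and the choice to raise on a non-positive batch size are assumptions of this example, and the library's own behaviour (for instance with DataFrame data) may differ.

```python
from typing import Generator, List, Sequence


def batch_generator(rows: Sequence, batch_size: int) -> Generator[List, None, None]:
    """Yield consecutive slices of `rows`, each holding at most `batch_size` items.

    Illustrative stand-in, not dataprofiler's implementation.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        # The final batch may be shorter than batch_size.
        yield list(rows[start:start + batch_size])


batches = list(batch_generator(["r1", "r2", "r3", "r4", "r5"], batch_size=2))
```

Here `batches` holds three lists: two full batches of two rows and a final batch with the remaining row.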

info: str | None = None

classmethod is_match(file_path: str, options: Dict | None = None) → bool

Check whether the first 1000 lines of the given file have a valid delimited format.

Parameters:

file_path (str) – path to the file to be examined
options (dict) – delimiter read options, e.g. dict(delimiter=",")

Returns:

whether the file is a CSV file

Return type:

bool
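To give a rough idea of what a delimited-format check on the first lines of a file can look like, here is a simplified heuristic built on the standard library's csv module. It is a sketch only: the function name `looks_delimited` and the rule it applies (every non-empty row among the first max_lines must have the same number of fields, and more than one) are assumptions of this example, not the algorithm is_match actually uses.

```python
import csv
import io
from itertools import islice


def looks_delimited(text: str, delimiter: str = ",", max_lines: int = 1000) -> bool:
    """Return True if the first `max_lines` lines parse into rows of equal width.

    Simplified heuristic for illustration; not dataprofiler's is_match logic.
    """
    # Only inspect the beginning of the input, mirroring the 1000-line limit.
    lines = list(islice(io.StringIO(text), max_lines))
    # csv.reader yields an empty list for blank lines; skip those.
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    if not rows:
        return False
    width = len(rows[0])
    # A single column gives no evidence that the delimiter is really in use.
    return width > 1 and all(len(row) == width for row in rows)


is_csv = looks_delimited("id,name\n1,Ada\n2,Grace\n")
is_plain_text = looks_delimited("just a sentence\nand another one\n")
```

A check this strict would reject files with ragged rows or a single column, which a more tolerant detector might still accept.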

property length: int

Return the length of the dataset which is loaded.

Returns: length of the dataset

options: Dict | None

reload(input_file_path: str | None = None, data: DataFrame | None = None, options: Dict | None = None)

Reload the data class with a new dataset.

This erases all existing data/options and replaces them with the input data/options.

Parameters:

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns:

None