dataprofiler.data_readers.csv_data module
Contains the class that saves and loads spreadsheet data.
- class dataprofiler.data_readers.csv_data.CSVData(input_file_path: str | None = None, data: DataFrame | None = None, options: Dict | None = None)
Bases: SpreadSheetDataMixin, BaseData
SpreadsheetData class to save and load spreadsheet data.
Initialize Data class for loading datasets of type CSV.
The dataset can be specified either by passing in-memory data or via a file path. Options pertaining to CSV may also be specified using the options dict parameter. Possible options:
options = dict(
    delimiter= type: str
    data_format= type: str, choices: "dataframe", "records"
    record_samples_per_line= type: int (only for "records")
    selected_columns= type: list(str)
    header= type: any
    quotechar= type: str
)
delimiter: delimiter used to parse the CSV input file
data_format: user-selected format in which to return the data; must be one of:
    dataframe - (default) loads the dataset as a pandas.DataFrame
    records - loads the data as rows of text values; the extra parameter "record_samples_per_line" determines how many rows are combined into a single line
selected_columns: columns to select from the entire dataset
header: location of the header in the file
quotechar: quote character used in the delimited file
- Parameters:
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns:
None
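A minimal sketch of how these options could shape parsing, using only the standard-library csv module. read_delimited is a hypothetical helper, not part of DataProfiler: it returns a list of row dicts where CSVData would return a pandas.DataFrame, it assumes header is an integer row index or None (the real option accepts other values), and the newline used to combine rows in "records" mode is an assumption.

```python
import csv
import io
from typing import Dict, List, Optional, Union


def read_delimited(
    text: str, options: Optional[Dict] = None
) -> Union[List[Dict[str, str]], List[str]]:
    """Hypothetical sketch: apply CSVData-style options to delimited text."""
    options = options or {}
    delimiter = options.get("delimiter", ",")  # assumed default
    quotechar = options.get("quotechar", '"')  # assumed default
    header = options.get("header", 0)  # assumption: int row index or None only
    selected_columns = options.get("selected_columns")
    data_format = options.get("data_format", "dataframe")

    if data_format == "records":
        # Combine every `record_samples_per_line` raw rows into one record
        # (joining them with "\n" is an assumption of this sketch).
        per_line = options.get("record_samples_per_line", 1)
        lines = text.splitlines()
        return ["\n".join(lines[i:i + per_line]) for i in range(0, len(lines), per_line)]
    if data_format != "dataframe":
        raise ValueError('data_format must be "dataframe" or "records"')

    # The csv module honours both the delimiter and the quote character.
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar))
    if header is None:
        # No header row: name the columns by position.
        names = [str(i) for i in range(len(rows[0]))] if rows else []
        body = rows
    else:
        names, body = rows[header], rows[header + 1:]
    table = [dict(zip(names, row)) for row in body]
    if selected_columns is not None:
        # Keep only the requested columns, in the requested order.
        table = [{col: row[col] for col in selected_columns} for row in table]
    return table
```

For example, read_delimited('id;name\n1;"a;b"\n', {"delimiter": ";"}) keeps the quoted "a;b" as a single field because the quote character protects the embedded delimiter.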
- data_type: str = 'csv'
- property selected_columns: List[str]
Return selected columns.
- property delimiter: str | None
Return delimiter.
- property quotechar: str | None
Return quotechar.
- property header: str | int | None
Return header.
- property sample_nrows: int | None
Return sample_nrows.
- property is_structured: bool
Determine compatibility with StructuredProfiler.
- property data
Return data.
- property data_format: str | None
Return data format.
- property file_encoding: str | None
Return file encoding.
- get_batch_generator(batch_size: int) → Generator[DataFrame | List, None, None]
Get a generator that yields the loaded data in batches of batch_size rows.
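A minimal sketch of fixed-size batching over an in-memory sequence. batch_generator is a hypothetical helper; whether CSVData preserves row order, and whether a given batch is a DataFrame slice or a list (the signature allows both), is not specified here, so treat this as illustrative only.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical sketch: yield consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        # The final batch may be shorter when len(rows) is not a multiple of batch_size.
        yield list(rows[start:start + batch_size])
```

Because it is a generator, only one batch is materialized at a time, which is the usual reason to prefer this shape over building a list of all batches up front.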
- info: str | None = None
- classmethod is_match(file_path: str, options: Dict | None = None) → bool
Check whether the first 1000 lines of the given file have a valid delimited format.
- Parameters:
file_path (str) – path to the file to be examined
options (dict) – delimiter read options, e.g. dict(delimiter=",")
- Returns:
whether the file is a CSV file
- Return type:
bool
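One way a check like this could work, sketched under assumptions: the input counts as delimited if every non-blank line among the first 1000 parses into the same number of fields, greater than one, using either the given delimiter or the first matching candidate. The candidate set, the consistency rule, and the name looks_delimited are all hypothetical; DataProfiler's actual heuristic may differ.

```python
import csv
from itertools import islice
from typing import Iterable, Optional

# Assumed candidate delimiters, tried in order when none is supplied.
CANDIDATE_DELIMITERS = (",", "\t", ";", "|")


def looks_delimited(
    lines: Iterable[str], delimiter: Optional[str] = None, max_lines: int = 1000
) -> bool:
    """Hypothetical sketch: consistent multi-column rows => delimited format."""
    # Only inspect the first max_lines lines, ignoring blank ones.
    sample = [line for line in islice(lines, max_lines) if line.strip()]
    if not sample:
        return False
    candidates = (delimiter,) if delimiter else CANDIDATE_DELIMITERS
    for cand in candidates:
        # Count fields per row; csv.reader handles quoted delimiters.
        widths = {len(row) for row in csv.reader(sample, delimiter=cand)}
        if len(widths) == 1 and min(widths) > 1:
            return True
    return False
```

Passing an open file object as lines works too, since it iterates line by line and islice stops reading after max_lines.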
- property length: int
Return the length of the loaded dataset.
- Returns:
length of the dataset
- options: Dict | None
- reload(input_file_path: str | None = None, data: DataFrame | None = None, options: Dict | None = None)
Reload the data class with a new dataset.
This erases all existing data and options and replaces them with the input data and options.
- Parameters:
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns:
None
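A minimal sketch of the "erase everything, then load the new input" behaviour described above. ReloadableData is a hypothetical stand-in, not CSVData, and the _cache attribute is an assumed placeholder for derived state such as a computed length; re-running __init__ is one simple way to guarantee that nothing from the previous dataset or its options survives.

```python
from typing import Any, Dict, Optional


class ReloadableData:
    """Hypothetical stand-in illustrating reload() semantics."""

    def __init__(
        self,
        input_file_path: Optional[str] = None,
        data: Any = None,
        options: Optional[Dict] = None,
    ) -> None:
        self.input_file_path = input_file_path
        self._data = data
        self.options = dict(options or {})
        self._cache: Dict[str, Any] = {}  # placeholder for derived state

    def reload(
        self,
        input_file_path: Optional[str] = None,
        data: Any = None,
        options: Optional[Dict] = None,
    ) -> None:
        # Re-initialize from scratch: omitted arguments reset to their defaults
        # instead of keeping the old values, and derived state is discarded.
        self.__init__(input_file_path=input_file_path, data=data, options=options)
```

Note the consequence of this design: reloading with new data but no options drops the previous options entirely, so callers must pass them again if they still apply.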