CSV Data

class dataprofiler.data_readers.csv_data.CSVData(input_file_path=None, data=None, options=None)

Bases: dataprofiler.data_readers.structured_mixins.SpreadSheetDataMixin, dataprofiler.data_readers.base_data.BaseData

SpreadsheetData class to save and load spreadsheet data

Data class for loading datasets of type CSV. The dataset can be specified by passing in-memory data or a file path. Options pertaining to the CSV may also be specified using the options dict parameter. Possible options:

options = dict(
    delimiter= type: str
    data_format= type: str, choices: "dataframe", "records"
    record_samples_per_line= type: int (only for "records")
    selected_columns= type: list(str)
    header= type: any
)

delimiter: delimiter used to decipher the CSV input file

data_format: user-selected format in which to return data; can only be one of the specified types:

    dataframe - (default) loads the dataset as a pandas.DataFrame
    records - loads the data as rows of text values; the extra parameter "record_samples_per_line" determines how many rows are combined into a single line

selected_columns: columns being selected from the entire dataset

header: location of the header in the file

quotechar: quote character used in the delimited file
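The options above can be illustrated with a minimal standard-library sketch. It is not the library's implementation: read_delimited and to_records are hypothetical helpers, header is assumed to be a zero-based row index, and joining rows with a newline in "records" mode is an assumption made for illustration.

```python
import csv
import io


def read_delimited(text, delimiter=",", quotechar='"', header=0,
                   selected_columns=None):
    """Parse delimited text into a list of dicts, one per data row.

    Hypothetical helper: `header` is assumed to be the zero-based index of
    the header row, and rows before it are skipped.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter,
                           quotechar=quotechar))
    names = rows[header]
    # Keep every column unless a subset was requested.
    keep = selected_columns or names
    return [
        {name: value for name, value in zip(names, row) if name in keep}
        for row in rows[header + 1:]
    ]


def to_records(lines, record_samples_per_line=1):
    """Join every `record_samples_per_line` rows into one text record.

    Assumption: rows within a record are separated by a newline.
    """
    step = record_samples_per_line
    return ["\n".join(lines[i:i + step]) for i in range(0, len(lines), step)]
```

With these helpers, read_delimited(text, delimiter=";", header=1, selected_columns=["id"]) skips one leading line and keeps only the id column, while to_records(lines, 2) packs two rows into each record and leaves any remainder in a shorter final record.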

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None

data_type = 'csv'
property selected_columns
property delimiter
property quotechar
property header
property is_structured

Determines compatibility with StructuredProfiler

property data
property data_format
property file_encoding
get_batch_generator(batch_size)
info = None
classmethod is_match(file_path, options=None)

Test the first 1000 lines of a given file to check whether the file has a valid delimited format.

Parameters
  • file_path (str) – path to the file to be examined

  • options (dict) – delimiter read options, e.g. dict(delimiter=",")

Returns

whether the file is a CSV file

Return type

bool
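
The idea of sampling the start of a file to decide whether it is delimited can be sketched as follows. This is a simplified heuristic, not the check that is_match actually performs: looks_delimited is a hypothetical function, and the rule "every sampled row has the same number of fields, and more than one" is an assumption chosen for illustration.

```python
import csv
import itertools


def looks_delimited(file_path, delimiter=",", max_lines=1000):
    """Return True if the sampled rows split into a consistent layout.

    Hypothetical heuristic: the first `max_lines` parsed rows must all
    have the same field count, and that count must be greater than one.
    Note that a quoted field spanning several lines counts as one row.
    """
    with open(file_path, newline="", encoding="utf-8",
              errors="replace") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        sample = itertools.islice(reader, max_lines)
        # Blank lines parse to empty rows and are ignored.
        widths = {len(row) for row in sample if row}
    return len(widths) == 1 and widths.pop() > 1
```

Passing a different delimiter changes the result for the same file: a comma-separated file checked with delimiter=";" yields single-field rows and is rejected.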

property length

Returns the length of the dataset which is loaded.

Returns

length of the dataset

reload(input_file_path=None, data=None, options=None)

Reload the data class with a new dataset. This erases all existing data and options and replaces them with the input data and options.

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None
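
The "erase and replace" behaviour described for reload can be pictured with a toy class. ReloadableData is a hypothetical stand-in, not part of dataprofiler, and implementing reload by re-running the constructor is an assumption made for this sketch.

```python
class ReloadableData:
    """Toy stand-in for a data class; not part of dataprofiler."""

    def __init__(self, input_file_path=None, data=None, options=None):
        self.input_file_path = input_file_path
        self.data = data
        # Copy so later changes to the caller's dict do not leak in.
        self.options = dict(options or {})

    def reload(self, input_file_path=None, data=None, options=None):
        # Re-running __init__ overwrites every attribute, so options from
        # the earlier load are discarded rather than merged.
        self.__init__(input_file_path=input_file_path, data=data,
                      options=options)


loaded = ReloadableData(data="a;b", options={"delimiter": ";"})
loaded.reload(input_file_path="other.csv")
```

After the reload call in this sketch, the earlier delimiter option and in-memory data are gone; only the new file path remains.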