Graph Data

Contains the class for identifying, reading, and loading graph data.

class dataprofiler.data_readers.graph_data.GraphData(input_file_path: Optional[str] = None, data: Optional[networkx.classes.graph.Graph] = None, options: Optional[Dict] = None)

Bases: dataprofiler.data_readers.base_data.BaseData

GraphData class to identify, read, and load graph data.

Initialize Data class for identifying, reading, and loading graph data.

The current implementation only accepts a file path as input. The options parameter specifies properties of the input file.

Possible Options:

options = dict(
    delimiter= type: str
    column_names= type: list(str)
    source_node= type: int
    destination_node= type: int
    target_keywords= type: list(str)
    source_keywords= type: list(str)
    header= type: any
    quotechar= type: str
)

delimiter: delimiter used to parse the csv input file
column_names: list of column names of the csv
source_node: index of the source node column, in the range 0 to n-1
destination_node: index of the target (destination) node column, in the range 0 to n-1
target_keywords: list of keywords used to identify the target/destination node column
source_keywords: list of keywords used to identify the source node column
graph_keywords: list of keywords used to identify whether the data is graph data
header: location of the header in the file
quotechar: quote character used in the delimited file
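As a rough illustration of how these options fit together, the sketch below builds an options dict for a comma-delimited edge list and passes it to GraphData. The option keys and the constructor arguments are the ones listed above. The helper names build_edge_list_options and load_graph, the column layout (source, target, weight), and the chosen option values are assumptions made for this example only.

```python
from typing import Any, Dict


def build_edge_list_options() -> Dict[str, Any]:
    """Options for a hypothetical CSV with columns: source, target, weight."""
    return {
        "delimiter": ",",       # column separator of the file
        "header": 0,            # assumption: the header is on the first row
        "quotechar": '"',       # quote character of the delimited file
        "source_node": 0,       # index of the source node column
        "destination_node": 1,  # index of the destination node column
    }


def load_graph(input_file_path: str, options: Dict[str, Any]):
    """Load a file with GraphData and return the value of its data property."""
    # Imported lazily so the options helper works without dataprofiler installed.
    from dataprofiler.data_readers.graph_data import GraphData

    graph_data = GraphData(input_file_path=input_file_path, options=options)
    return graph_data.data
```

A call such as load_graph("edges.csv", build_edge_list_options()) would then return the loaded data, where edges.csv is a placeholder for a real edge-list file.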

Parameters

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns

None

data_type: str = 'graph'

classmethod csv_column_names(file_path: str, header: Optional[int], delimiter: Optional[str], encoding: str = 'utf-8') → List[str]

Fetch a list of column names from the csv file.
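The idea of fetching column names from a header row can be sketched with the standard csv module. This is not the library's implementation: the function name read_column_names is hypothetical, and it reads from an in-memory string rather than a file path so that the example stays self-contained.

```python
import csv
import io
from typing import List, Optional


def read_column_names(
    text: str, header: Optional[int] = 0, delimiter: Optional[str] = ","
) -> List[str]:
    """Return the fields of the header row, or [] when there is no header."""
    if header is None:
        return []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter or ",")
    for row_index, row in enumerate(reader):
        if row_index == header:
            # Strip surrounding whitespace from each column name.
            return [name.strip() for name in row]
    return []  # the header index lies beyond the last row


sample = "source,target,weight\n1,2,0.5\n2,3,1.5\n"
column_names = read_column_names(sample)
```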

classmethod is_match(file_path: str, options: Optional[Dict] = None) → bool

Determine whether the file is a graph.

Current formats checked:

attributed edge list

The check looks for both a target node and a source node in the file.
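One way to picture this check is a keyword search over the column names. The sketch below is an assumption about the general approach, not the library's code: the keyword tuples, the substring matching rule, and the function names find_keyword_column and looks_like_edge_list are all hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical keyword lists; the library's defaults may differ.
SOURCE_KEYWORDS = ("source", "src", "origin")
TARGET_KEYWORDS = ("target", "destination", "dst")


def find_keyword_column(
    column_names: Sequence[str], keywords: Sequence[str]
) -> Optional[int]:
    """Return the index of the first column whose name contains a keyword."""
    for index, name in enumerate(column_names):
        lowered = name.lower()
        if any(keyword in lowered for keyword in keywords):
            return index
    return None


def looks_like_edge_list(column_names: Sequence[str]) -> bool:
    """True when distinct source and target columns are both present."""
    source_index = find_keyword_column(column_names, SOURCE_KEYWORDS)
    target_index = find_keyword_column(column_names, TARGET_KEYWORDS)
    return (
        source_index is not None
        and target_index is not None
        and source_index != target_index
    )
```

Plain substring matching is the simplest rule that fits the description; a similarity score over column names would be a reasonable alternative.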

check_integer(string: str) → Union[int, str]

Check whether the string is an integer and, if so, output it as an integer.
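The Union[int, str] return type suggests that a non-integer string is handed back unchanged. A minimal sketch of that behavior follows, under that assumption; the name to_int_if_possible is hypothetical, and Python's int() accepts surrounding whitespace and underscores, so the library's own check may be stricter.

```python
from typing import Union


def to_int_if_possible(string: str) -> Union[int, str]:
    """Return int(string) when it parses as an integer, else the string itself."""
    try:
        return int(string)
    except ValueError:
        # Not an integer literal: keep the original value (e.g. a node label).
        return string
```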

property data

Return data.

property data_format: Optional[str]

Return data format.

property file_encoding: Optional[str]

Return file encoding.

get_batch_generator(batch_size: int) → Generator[Union[pandas.core.frame.DataFrame, List], None, None]

Get batch generator.
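The general shape of a batch generator can be sketched as follows. This is a generic illustration over a plain list, not the library's implementation: the name batch_generator and the sample edges are hypothetical, and whether the library preserves order or shuffles the data is not stated here.

```python
from typing import Generator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(
    items: Sequence[T], batch_size: int
) -> Generator[List[T], None, None]:
    """Yield consecutive slices of items, each with at most batch_size entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
batches = list(batch_generator(edges, batch_size=2))
```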

info: Optional[str] = None

property is_structured: bool

Determine compatibility with StructuredProfiler.

property length: int

Return the length of the dataset which is loaded.

Returns: length of the dataset

reload(input_file_path: Optional[str], data: Any, options: Optional[Dict]) → None

Reload the data class with a new dataset.

This erases all existing data and options and replaces them with the input data and options.

Parameters

input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type

Returns

None

options: Optional[Dict]