Graph Data

Contains the class for identifying, reading, and loading graph data.

class dataprofiler.data_readers.graph_data.GraphData(input_file_path: Optional[str] = None, data: Optional[networkx.classes.graph.Graph] = None, options: Optional[Dict] = None)

Bases: dataprofiler.data_readers.base_data.BaseData

GraphData class to identify, read, and load graph data.

Initialize Data class for identifying, reading, and loading graph data.

The current implementation only accepts a file path as input. An options parameter can also be passed in to specify properties of the input file.

Possible Options:

options = dict(
    delimiter= type: str
    column_names= type: list(str)
    source_node= type: int
    destination_node= type: int
    target_keywords= type: list(str)
    source_keywords= type: list(str)
    header= type: any
    quotechar= type: str
)

  • delimiter – delimiter used to parse the CSV input file

  • column_names – list of column names of the CSV

  • source_node – index of the source node column, in the range 0 to n-1, where n is the number of columns

  • destination_node – index of the destination (target) node column, in the range 0 to n-1, where n is the number of columns

  • target_keywords – list of keywords used to identify the target/destination node column

  • source_keywords – list of keywords used to identify the source node column

  • graph_keywords – list of keywords used to identify whether the data contains graph data

  • header – location of the header in the file

  • quotechar – quote character used in the delimited file
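As an illustration of the options above, the following sketch builds an options dict for a hypothetical comma-delimited edge list `edges.csv` whose first row is a header, then checks the documented constraint that each node index lies in the range 0 to n-1. The helper `check_node_indices`, the file name, the column names, and the keyword values are assumptions for illustration only; they are not part of DataProfiler.

```python
from typing import Dict, List, Optional


def check_node_indices(options: Dict, num_columns: int) -> List[str]:
    """Return a list of problems with the source/destination node indices.

    Hypothetical helper, not part of DataProfiler: it only illustrates the
    documented constraint that each index lies in the range 0..n-1.
    """
    problems = []
    for key in ("source_node", "destination_node"):
        index: Optional[int] = options.get(key)
        if index is None:
            continue  # option not provided; nothing to check
        if not 0 <= index <= num_columns - 1:
            problems.append(f"{key}={index} is outside 0..{num_columns - 1}")
    return problems


# Example options for a hypothetical comma-delimited edge list with a header row.
options = dict(
    delimiter=",",
    column_names=["node_id_src", "node_id_dst", "weight"],
    source_node=0,        # first column holds the source node
    destination_node=1,   # second column holds the destination node
    target_keywords=["dst"],
    source_keywords=["src"],
    header=0,
    quotechar='"',
)

print(check_node_indices(options, num_columns=len(options["column_names"])))  # []

# With DataProfiler installed, the dict would be passed to the documented constructor:
# GraphData(input_file_path="edges.csv", options=options)
```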

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None

data_type: str = 'graph'
classmethod csv_column_names(file_path: str, header: Optional[int], delimiter: Optional[str], encoding: str = 'utf-8') → List[str]

Fetch a list of column names from the csv file.
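A minimal sketch of fetching column names from a delimited file with Python's standard `csv` module. It assumes `header` is the 0-based row index of the header line and that a missing delimiter means a comma; the function name `read_column_names` is hypothetical, and this is not DataProfiler's implementation of `csv_column_names`.

```python
import csv
from typing import List, Optional


def read_column_names(
    file_path: str,
    header: Optional[int],
    delimiter: Optional[str],
    encoding: str = "utf-8",
) -> List[str]:
    """Return the fields of row `header` (0-based) as column names.

    Sketch only: mirrors the documented signature of csv_column_names,
    but the behaviour here is an assumption.
    """
    if header is None:
        return []  # assumption: no header row means there are no names to fetch
    with open(file_path, newline="", encoding=encoding) as f:
        # csv.reader needs a one-character delimiter; default to a comma.
        reader = csv.reader(f, delimiter=delimiter or ",")
        for row_number, row in enumerate(reader):
            if row_number == header:
                return [name.strip() for name in row]
    return []  # the file has fewer rows than the header index
```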

classmethod is_match(file_path: str, options: Optional[Dict] = None) → bool

Determine whether the file is a graph.

Current formats checked:
  • attributed edge list

This works by checking whether the file contains both a source node column and a target node column.
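The keyword check can be approximated as follows, assuming the column names have already been read from the file. The function `looks_like_edge_list` and the example keywords `src`/`dst` are hypothetical; the library's `is_match` takes a file path and options rather than a list of names, so this only sketches the matching idea.

```python
from typing import Iterable, List


def looks_like_edge_list(
    column_names: Iterable[str],
    source_keywords: List[str],
    target_keywords: List[str],
) -> bool:
    """Return True if one column name contains a source keyword and a
    different column name contains a target keyword (case-insensitive)."""
    names = [name.lower() for name in column_names]
    source_cols = {i for i, n in enumerate(names) if any(k in n for k in source_keywords)}
    target_cols = {i for i, n in enumerate(names) if any(k in n for k in target_keywords)}
    # Require a source column and a target column that are not the same column.
    return any(s != t for s in source_cols for t in target_cols)


print(looks_like_edge_list(["node_src", "node_dst", "weight"], ["src"], ["dst"]))  # True
print(looks_like_edge_list(["name", "age"], ["src"], ["dst"]))                     # False
```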

check_integer(string: str) → Union[int, str]

Check whether the string represents an integer; if so, return it as an int, otherwise return the original string.
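The behaviour implied by the return type `Union[int, str]` can be sketched as below. The function name `parse_int_or_keep` is a hypothetical stand-in; the library's `check_integer` is a method, and its exact parsing rules are not documented here.

```python
from typing import Union


def parse_int_or_keep(string: str) -> Union[int, str]:
    """Return int(string) if the string parses as an integer, else the string unchanged."""
    try:
        return int(string)
    except ValueError:
        return string


print(parse_int_or_keep("3"))      # 3
print(parse_int_or_keep("three"))  # three
print(parse_int_or_keep("3.5"))    # 3.5 (not an integer, so returned as a str)
```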

property data

Return data.

property data_format: Optional[str]

Return data format.

property file_encoding: Optional[str]

Return file encoding.

get_batch_generator(batch_size: int) → Generator[Union[pandas.core.frame.DataFrame, List], None, None]

Get batch generator.
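A generic sketch of the batching pattern behind a batch generator, shown on a plain list of edges. The function `batch_generator` is hypothetical; per its signature, the library's `get_batch_generator` may yield pandas DataFrames or lists rather than the plain lists yielded here.

```python
from typing import Generator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(rows: Sequence[T], batch_size: int) -> Generator[List[T], None, None]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
for batch in batch_generator(edges, batch_size=2):
    print(batch)
# [(0, 1), (1, 2)]
# [(2, 0), (2, 3)]
# [(3, 4)]
```

Yielding lazily means only one batch needs to be materialized at a time, which is the usual reason to expose a generator instead of a list of batches.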

info: Optional[str] = None
property is_structured: bool

Determine compatibility with StructuredProfiler.

property length: int

Return the length of the dataset which is loaded.

Returns

length of the dataset

reload(input_file_path: Optional[str], data: Any, options: Optional[Dict]) → None

Reload the data class with a new dataset.

This erases all existing data/options and replaces them with the input data/options.

Parameters
  • input_file_path (str) – path to the file being loaded or None

  • data (multiple types) – data being loaded into the class instead of an input file

  • options (dict) – options pertaining to the data type

Returns

None
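To illustrate the replace-everything semantics of `reload`, here is a toy class (hypothetical `ReloadableReader`, not part of DataProfiler). Its constructor delegates to `reload` as a design choice of the sketch, so a second call discards the earlier data and options instead of merging them.

```python
from typing import Any, Dict, Optional


class ReloadableReader:
    """Toy reader illustrating reload semantics: each call replaces the
    previous path, data, and options wholesale."""

    def __init__(self, input_file_path: Optional[str] = None,
                 data: Any = None, options: Optional[Dict] = None) -> None:
        self.reload(input_file_path, data, options)

    def reload(self, input_file_path: Optional[str], data: Any,
               options: Optional[Dict]) -> None:
        # Nothing from the previous state is kept: options not passed again are dropped.
        self.input_file_path = input_file_path
        self._data = data
        self.options = dict(options) if options else {}


reader = ReloadableReader(data=[(0, 1)], options={"delimiter": ","})
reader.reload(None, [(1, 2), (2, 3)], {"header": 0})
print(reader._data, reader.options)  # [(1, 2), (2, 3)] {'header': 0}
```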

options: Optional[Dict]