Graph Data
Contains class for identifying, reading, and loading graph data.
- class dataprofiler.data_readers.graph_data.GraphData(input_file_path: Optional[str] = None, data: Optional[networkx.classes.graph.Graph] = None, options: Optional[Dict] = None)
Bases: dataprofiler.data_readers.base_data.BaseData
GraphData class to identify, read, and load graph data.
Initialize Data class for identifying, reading, and loading graph data.
The current implementation accepts only a file path as input. An options dictionary can also be passed in to specify properties of the input file.
Possible Options:
options = dict(
    delimiter= type: str
    column_names= type: list(str)
    source_node= type: int
    destination_node= type: int
    target_keywords= type: list(str)
    source_keywords= type: list(str)
    header= type: any
    quotechar= type: str
)
delimiter: delimiter used to parse the csv input file
column_names: list of column names of the csv
source_node: index of the source node column, range of (0,n-1)
target_node: index of the target node column, range of (0,n-1)
target_keywords: list of keywords to identify the target/destination node column
source_keywords: list of keywords to identify the source node column
graph_keywords: list of keywords to identify whether the data is graph data
header: location of the header in the file
quotechar: quote character used in the delimited file
- Parameters
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns
None
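As a minimal sketch of how these options might be assembled, the snippet below builds an options dictionary from a subset of the keys listed under Possible Options and passes it to the constructor by keyword. The option values, the file name `edges.csv`, and the helper `load_graph` are hypothetical, and the text does not state which keys may be combined. The import is deferred into the helper so the dictionary can be inspected without the package installed.

```python
from typing import Any, Dict

# Hypothetical values for a comma-delimited edge list whose first row is a
# header and whose first two columns hold the source and destination nodes.
options: Dict[str, Any] = {
    "delimiter": ",",
    "source_node": 0,
    "destination_node": 1,
    "header": 0,
    "quotechar": '"',
}


def load_graph(path: str, opts: Dict[str, Any]):
    """Construct a GraphData reader for `path` (illustrative helper)."""
    # Imported lazily so the sketch can be read without dataprofiler installed.
    from dataprofiler.data_readers.graph_data import GraphData

    return GraphData(input_file_path=path, options=opts)


# Example call (assumes a file named edges.csv exists):
# graph = load_graph("edges.csv", options)
```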
- data_type: str = 'graph'
- classmethod csv_column_names(file_path: str, header: Optional[int], delimiter: Optional[str], encoding: str = 'utf-8') -> List[str]
Fetch a list of column names from the csv file.
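The idea of fetching column names from a header row can be sketched with the standard `csv` module. This is not the library's implementation: the function `read_column_names`, its handling of a missing header (returning an empty list), and the fallback to a comma when no delimiter is given are all assumptions made for illustration.

```python
import csv
import io
from typing import List, Optional, TextIO


def read_column_names(
    handle: TextIO, header: Optional[int], delimiter: Optional[str]
) -> List[str]:
    """Return the cells of the row at index `header`, or [] if there is none."""
    if header is None:
        return []
    # Assumption: fall back to a comma when no delimiter is supplied.
    reader = csv.reader(handle, delimiter=delimiter or ",")
    for row_index, row in enumerate(reader):
        if row_index == header:
            return [cell.strip() for cell in row]
    return []


sample = io.StringIO("src,dst,weight\n1,2,0.5\n2,3,0.7\n")
print(read_column_names(sample, header=0, delimiter=","))
# prints ['src', 'dst', 'weight']
```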
- classmethod is_match(file_path: str, options: Optional[Dict] = None) -> bool
Determine whether the file is a graph.
- Current formats checked:
attributed edge list
The check works by determining whether the file contains both a source node and a target node.
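One way to picture this check is a keyword search over the column names. The sketch below is an assumption about the general approach, not the library's code: the helpers `find_column` and `looks_like_edge_list` and the default keyword tuples are hypothetical.

```python
from typing import Optional, Sequence


def find_column(column_names: Sequence[str], keywords: Sequence[str]) -> Optional[int]:
    """Return the index of the first column whose name contains a keyword."""
    for index, name in enumerate(column_names):
        lowered = name.lower()
        if any(keyword in lowered for keyword in keywords):
            return index
    return None


def looks_like_edge_list(
    column_names: Sequence[str],
    source_keywords: Sequence[str] = ("source", "src"),  # hypothetical defaults
    target_keywords: Sequence[str] = ("target", "destination", "dst"),
) -> bool:
    """Report whether distinct source and target columns can both be found."""
    source = find_column(column_names, source_keywords)
    target = find_column(column_names, target_keywords)
    return source is not None and target is not None and source != target


print(looks_like_edge_list(["node_src", "node_dst", "weight"]))  # prints True
print(looks_like_edge_list(["name", "age", "city"]))  # prints False
```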
- check_integer(string: str) -> Union[int, str]
Check whether the string is an integer and output the integer.
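A conversion with the same `Union[int, str]` return type could look like the sketch below. The function `to_int_if_possible` is hypothetical, and returning the input unchanged when it is not an integer is an assumption; the text does not say what the method returns in that case.

```python
from typing import Union


def to_int_if_possible(string: str) -> Union[int, str]:
    """Return int(string) when it parses as an integer, else the string itself."""
    try:
        return int(string)
    except ValueError:
        # Assumption: non-integer input is handed back unchanged.
        return string


print(to_int_if_possible("42"))  # prints 42
print(to_int_if_possible("-7"))  # prints -7
print(to_int_if_possible("n1"))  # prints n1
```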
- property data
Return data.
- property data_format: Optional[str]
Return data format.
- property file_encoding: Optional[str]
Return file encoding.
- get_batch_generator(batch_size: int) -> Generator[Union[pandas.core.frame.DataFrame, List], None, None]
Get batch generator.
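The general batching pattern behind a method like this can be sketched over a plain list. The function `batch_generator` below is a hypothetical stand-in written for illustration; it does not reproduce how the class slices its own data.

```python
from typing import Generator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_generator(
    items: Sequence[T], batch_size: int
) -> Generator[List[T], None, None]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
for batch in batch_generator(edges, batch_size=2):
    print(batch)
```

With five items and a batch size of two, the generator yields two full batches followed by a final batch holding the single remaining item.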
- info: Optional[str] = None
- property is_structured: bool
Determine compatibility with StructuredProfiler.
- property length: int
Return the length of the loaded dataset.
- Returns
length of the dataset
- reload(input_file_path: Optional[str], data: Any, options: Optional[Dict]) -> None
Reload the data class with a new dataset.
This erases all existing data and options and replaces them with the input data and options.
- Parameters
input_file_path (str) – path to the file being loaded or None
data (multiple types) – data being loaded into the class instead of an input file
options (dict) – options pertaining to the data type
- Returns
None
- options: Optional[Dict]