This repository contains the Big Exchange HIN data files as supplementary data to the CAA 2025 publication.
The dataset contains the files nodes.csv, node_types.csv, edges.csv, edge_types.csv, schema.csv and Geolocations_ESRI102017.csv. The format and content of these files are described in greater detail below, in that order.
In nodes.csv, each line contains a node type ID as an integer.
The node IDs are implicit: they start at 0, ascend line by line, and are contiguous, so the (0-based) line number is the node ID.
For example, if line 813 (assuming the first line is numbered with 0) contains 5\n, this signifies that the 814th node (node ID 813) has the sixth node type (as numbering of node type IDs also starts with 0), which is explained in node_types.csv.
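The lookup described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the file is plain text with one integer per line and no header row; the function name `parse_node_types` and the three-line sample are made up for illustration and are not taken from the dataset.

```python
def parse_node_types(text):
    """Return a list where index = node ID and value = node type ID."""
    # One integer per line; the 0-based line number is the node ID.
    return [int(line) for line in text.splitlines()]


# Hypothetical three-node sample, not real data.
sample = "0\n5\n5\n"
node_type_of = parse_node_types(sample)
print(node_type_of[1])  # node type ID of the node with ID 1
```

With the real file, the same function could be fed the file's contents, for instance `parse_node_types(open("nodes.csv").read())`.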
In node_types.csv, each line contains a node type label.
The node type IDs implicitly start with 0, are sorted in ascending order, and are continuous.
For example, line 5 Geolocation\n (assuming the first line is numbered with 0) signifies that the sixth node type (node type ID 5) has the label 'Geolocation'.
Hence, the node with ID 813 from the previous example is a Geolocation-type node.
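Resolving a node ID to its type label is a two-step list lookup. The following is a minimal sketch, assuming one label per line and no header row; the helper names and the miniature data (including the type IDs, which differ from the real dataset) are hypothetical.

```python
def parse_labels(text):
    """Return a list where index = type ID and value = label."""
    return text.splitlines()


def node_label(node_id, node_type_of, type_labels):
    """Look up the node's type ID, then that type's label."""
    return type_labels[node_type_of[node_id]]


# Hypothetical miniature data: three node types, four nodes.
type_labels = parse_labels("Find\nMaterial\nGeolocation\n")
node_type_of = [0, 1, 2, 2]  # index = node ID, value = node type ID
print(node_label(3, node_type_of, type_labels))  # Geolocation
```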
In edges.csv, each line contains a source node ID, an edge type ID, and a destination node ID, all encoded as integers and separated by commas.
The edge IDs are implicit: they start at 0, ascend line by line, and are contiguous. The lines are sorted primarily by source node ID and secondarily by destination node ID.
For example, line 1694 819,17,811\n (assuming the first line is numbered with 0) signifies that the 1695th edge (edge ID 1694) has the edge type ID 17 and points from the node with ID 819 to the node with ID 811.
In edge_types.csv, each line contains an edge type label.
The edge type IDs implicitly start with 0, are sorted in ascending order, and are continuous.
For example, line 17 find_material\n (assuming the first line is numbered with 0) signifies that the 18th edge type (edge type ID 17) has the label 'find_material'.
Hence, edge 1694 from the previous example is a find_material edge that connects Find 819 to Material 811.
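Reading the edge list and attaching edge type labels could look as follows. This is a sketch under the assumption that the file has no header row and exactly three integer fields per line; the `Edge` tuple, the function name, and the sample rows are illustrative only, and the label `other_relation` is invented.

```python
import csv
import io
from typing import NamedTuple


class Edge(NamedTuple):
    source: int       # source node ID
    edge_type: int    # edge type ID
    destination: int  # destination node ID


def parse_edges(text):
    """Return a list where index = edge ID and value = Edge."""
    reader = csv.reader(io.StringIO(text))
    return [Edge(*map(int, row)) for row in reader]


# Hypothetical labels and rows, not real data.
edge_labels = ["other_relation", "find_material"]
edges = parse_edges("0,1,2\n1,0,3\n")
for edge_id, e in enumerate(edges):
    print(edge_id, e.source, edge_labels[e.edge_type], e.destination)
```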
In schema.csv, each line contains a source node type ID and a destination node type ID, separated by a comma.
The lines are ordered by edge type ID, so the (0-based) line number is the edge type ID. Where a symmetric relation is split into two edge types, these are ordered by source node type ID.
For example, the first line (i.e. line 0) 1,2\n signifies that the first edge type (ID 0) points from nodes of type 1 to nodes of type 2.
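One use of the schema is a consistency check: for every edge, the node types of its endpoints should match the pair listed for its edge type. The sketch below assumes the data structures used for the other files (lists indexed by ID); the function names and the sample data are hypothetical, and the check itself is an illustration rather than a step the dataset authors describe.

```python
def parse_schema(text):
    """Index = edge type ID, value = (source type ID, destination type ID)."""
    return [tuple(int(v) for v in line.split(",")) for line in text.splitlines()]


def schema_violations(edges, node_type_of, schema):
    """Return the IDs of edges whose endpoint types disagree with the schema."""
    bad = []
    for edge_id, (src, etype, dst) in enumerate(edges):
        if (node_type_of[src], node_type_of[dst]) != schema[etype]:
            bad.append(edge_id)
    return bad


# Hypothetical miniature data; the last edge is deliberately inconsistent.
schema = parse_schema("1,2\n2,1\n")
node_type_of = [1, 2, 2]                    # index = node ID
edges = [(0, 0, 1), (1, 1, 0), (1, 0, 2)]   # (source, edge type, destination)
print(schema_violations(edges, node_type_of, schema))  # [2]
```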
In Geolocations_ESRI102017.csv, each line contains the ESRI:102017 (WGS 1984 Lambert Azimuthal EqArea North Pole) coordinates of a Geolocation node. The first line contains the coordinates of the first Geolocation node occurring in the nodes.csv file (i.e., the first node of node type 5). This coordinate information is used to compute the spatial proximity of pairs of Geolocations.
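Attaching coordinates to node IDs could be sketched as below. Several points here are assumptions, not facts stated above: that each line holds two comma-separated numbers (x, y), that the n-th line belongs to the n-th Geolocation node in node ID order, and that proximity is measured as planar Euclidean distance in the projected coordinate system. The function name and sample values are made up.

```python
import math

GEOLOCATION_TYPE_ID = 5  # node type ID of 'Geolocation' per node_types.csv


def geolocation_coordinates(node_type_of, coord_text):
    """Map each Geolocation node ID to its (x, y) coordinate pair."""
    geo_ids = [i for i, t in enumerate(node_type_of) if t == GEOLOCATION_TYPE_ID]
    # Assumption: two comma-separated floats per line.
    coords = [tuple(float(v) for v in line.split(","))
              for line in coord_text.splitlines()]
    if len(geo_ids) != len(coords):
        raise ValueError("number of coordinate lines and Geolocation nodes differ")
    return dict(zip(geo_ids, coords))


# Hypothetical sample: nodes 1 and 3 are Geolocations.
node_type_of = [0, 5, 1, 5]
coords = geolocation_coordinates(node_type_of, "0.0,0.0\n3000.0,4000.0\n")
# Assumption: planar Euclidean distance in projected units.
print(math.dist(coords[1], coords[3]))  # 5000.0
```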