Given a set of enterprises, their properties, and a link prediction function, derive a directed network of supplier-to-user links between the enterprises.
WORK IN PROGRESS / IN HIGH FLUX
```mermaid
flowchart TD
A[Enterprise Dataset<br>500k Enterprises + Features] --> B1
subgraph SG1[Candidate Pair Generation]
B1[Blocking Rules<br>Sector / Geography / Size]
%% B --> B2[ANN Search<br>Embeddings + FAISS/HNSW]
B1 --> C2[Derive Dyadic Features<br>From Enterprise Features]
C2 --> C1[Candidate Edge Set<br>~100M pairs]
%% B2 --> C
end
D[Load Link Prediction Model]
C1 --> D
D --> E[Raw Link Probabilities<br>p_ij]
E --> H[Expected Degree Estimation]
H --> H1[Expected Suppliers per Enterprise<br>k_in_i]
H --> H2[Expected Users per Enterprise<br>k_out_j]
H1 --> F[Probability Calibration]
%% H --> F[Probability Calibration]
%% F --> F1[Platt Scaling]
%% F --> F2[Calibration<br> _Beta Calibration_ ]
%% F --> F3[Isotonic Regression]
F --> G[Calibrated Probabilities]
%% F1 --> G[Calibrated Probabilities]
%% F2 --> G[Calibrated Probabilities]
%% F3 --> G
G --> J[Edge Sampling Procedure]
%% I --> I1[Fitness Scaling<br>Enterprise Size / Revenue]
%% I --> I2[Sector Block Matrix<br>Input-Output Constraints]
%% I --> I3[Geographic Decay, from model]
%% I1 --> J
%% I2 --> J
%% I3 --> J
J --> J1[Supplier Sampling per Enterprise]
%% J --> J2[Probabilistic Edge Sampling]
J1 --> K[Construct Network]
%% J2 --> K
K --> L[Structural Constraints]
H1 --> L
H2 --> L
L --> L1[Remove Impossible Sector Links]
L --> L2[Limit Supplier / User Degree]
L --> L3[Remove Reciprocal Loops]
L1 --> M[Reconstructed Enterprise Network]
L2 --> M
L3 --> M
```
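Blocking is what keeps the candidate set tractable: 500k enterprises give about 2.5 × 10^11 ordered pairs, and the ~100M candidate pairs in the diagram are roughly 0.04% of that. A minimal sketch of rule-based blocking follows. The `Enterprise` record, its `sector`/`region`/`size_class` fields and the concrete rules (same region, size classes at most one apart, optional whitelist of supplier/user sector combinations) are all hypothetical stand-ins for the sector, geography and size rules the diagram names.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Enterprise:
    # Hypothetical minimal record; the real enterprise features will differ.
    id: int
    sector: str
    region: str
    size_class: int  # e.g. 0 = micro, 1 = small, 2 = medium, 3 = large


def candidate_pairs(enterprises, max_size_gap=1, allowed_sector_pairs=None):
    """Yield directed (supplier_id, user_id) pairs that pass the blocking rules.

    Hypothetical rules: both enterprises are in the same region, their size
    classes differ by at most `max_size_gap`, and, if `allowed_sector_pairs`
    is given, (supplier sector, user sector) must be in that set.
    """
    blocks = defaultdict(list)
    for e in enterprises:
        blocks[e.region].append(e)  # geography acts as the blocking key
    for members in blocks.values():
        # permutations yields ordered pairs: (a, b) and (b, a) are both candidates
        for supplier, user in permutations(members, 2):
            if abs(supplier.size_class - user.size_class) > max_size_gap:
                continue
            if (allowed_sector_pairs is not None
                    and (supplier.sector, user.sector) not in allowed_sector_pairs):
                continue
            yield supplier.id, user.id


firms = [
    Enterprise(1, "steel", "north", 3),
    Enterprise(2, "cars", "north", 3),
    Enterprise(3, "bakery", "north", 0),
    Enterprise(4, "cars", "south", 2),
]
pairs = sorted(candidate_pairs(firms))
print(pairs)  # [(1, 2), (2, 1)]
```

Within a block the pairing is still quadratic, so at full scale the blocking keys would probably need to be finer (for example region × sector), or be complemented by the ANN search from the commented-out branch.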
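Dyadic features describe an ordered (supplier, user) pair rather than a single enterprise, and they are what the link prediction model scores to produce p_ij. The sketch below is illustrative only: the enterprise attributes (`log_revenue`, `lat`, `lon`, `sector`), the chosen features and the hand-set logistic weights are hypothetical, and the logistic function merely stands in for whatever model is actually loaded.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Enterprise:
    # Hypothetical attributes, for illustration only.
    id: int
    sector: str
    log_revenue: float
    lat: float
    lon: float


def haversine_km(a, b):
    """Great-circle distance between two enterprises in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def dyadic_features(supplier, user):
    """Turn two enterprise records into a feature vector for the ordered pair."""
    return [
        supplier.log_revenue,                            # supplier size
        user.log_revenue,                                # user size
        math.log1p(haversine_km(supplier, user)),        # geographic distance
        1.0 if supplier.sector == user.sector else 0.0,  # same-sector flag
    ]


# Stand-in for the loaded model: logistic regression with made-up weights.
WEIGHTS = [0.3, 0.2, -0.8, 0.5]
BIAS = -4.0


def link_probability(features):
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


a = Enterprise(1, "steel", 8.0, 52.37, 4.90)
b = Enterprise(2, "cars", 7.0, 51.92, 4.48)
p_ab = link_probability(dyadic_features(a, b))  # a supplies b
p_ba = link_probability(dyadic_features(b, a))  # b supplies a
```

Because the features are ordered, p_ij and p_ji generally differ, which is what makes the reconstructed network directed.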
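If p_ij is the probability that enterprise i supplies enterprise j, linearity of expectation gives the expected number of suppliers of j as k_in_j = Σ_i p_ij and the expected number of users of i as k_out_i = Σ_j p_ij, with no independence assumption needed. A minimal sketch over a sparse dictionary of candidate pairs (the dictionary layout is an assumption):

```python
from collections import defaultdict


def expected_degrees(probabilities):
    """Expected in- and out-degree per enterprise.

    `probabilities` maps an ordered pair (supplier, user) to p_ij, the
    probability that `supplier` supplies `user`.
    """
    k_in = defaultdict(float)   # expected suppliers per enterprise
    k_out = defaultdict(float)  # expected users per enterprise
    for (supplier, user), p in probabilities.items():
        k_in[user] += p
        k_out[supplier] += p
    return dict(k_in), dict(k_out)


p = {("A", "B"): 0.5, ("C", "B"): 0.25, ("A", "C"): 0.75}
k_in, k_out = expected_degrees(p)
print(k_in)   # {'B': 0.75, 'C': 0.75}
print(k_out)  # {'A': 1.25, 'C': 0.25}
```

Enterprises that appear in no candidate pair are absent from both dictionaries, i.e. they have expected degree 0. Summed over all enterprises, both totals equal the expected number of edges Σ_ij p_ij.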
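In the diagram the expected supplier counts feed the calibration step. One way to use them, sketched here as an assumption rather than the pipeline's actual method, is to compare them with an external target number of suppliers per enterprise (the hypothetical `target` argument, e.g. from survey data) and shift that enterprise's raw probabilities on the logit scale until they sum to the target. This per-enterprise intercept shift is a different technique from the Platt scaling, beta calibration and isotonic regression listed in the commented-out lines.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _logit(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # keep the logit finite
    return math.log(p / (1.0 - p))


def calibrate_to_degree(raw_probs, target, lo=-50.0, hi=50.0, iters=100):
    """Shift raw probabilities on the logit scale so they sum to `target`.

    `raw_probs` are the p_ij of all candidate suppliers i of one user j.
    sum(sigmoid(logit(p) + c)) increases monotonically in c, so the offset c
    is found by bisection. The target is clipped to the open range
    (0, len(raw_probs)), the only sums a set of probabilities can reach.
    """
    n = len(raw_probs)
    if n == 0:
        return []
    target = min(max(target, 1e-9), n - 1e-9)
    logits = [_logit(p) for p in raw_probs]

    def total(c):
        return sum(_sigmoid(z + c) for z in logits)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2.0
    return [_sigmoid(z + c) for z in logits]


raw = [0.02, 0.10, 0.30]  # raw p_ij for three candidate suppliers of one user
calibrated = calibrate_to_degree(raw, target=2.0)
```

Because the shift is monotone, each enterprise's ranking of candidate suppliers is unchanged; only the overall level of its probabilities moves.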
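A minimal sketch of supplier sampling per enterprise, assuming each user j draws round(k_in_j) distinct suppliers from its candidates with weights given by the calibrated p_ij. The weighted sampling without replacement uses the Efraimidis-Spirakis key trick; the input layout and the rounding rule are hypothetical choices, and independent Bernoulli draws per pair (the commented-out "Probabilistic Edge Sampling" branch) would be an alternative.

```python
import heapq
import random


def sample_suppliers(candidates, k, rng):
    """Draw up to k distinct suppliers for one user, without replacement.

    `candidates` maps supplier id -> calibrated p_ij. Each candidate with a
    positive weight w gets the key u ** (1 / w), u ~ Uniform[0, 1); keeping
    the k largest keys gives a weighted sample without replacement
    (Efraimidis-Spirakis).
    """
    keyed = [
        (rng.random() ** (1.0 / w), supplier)
        for supplier, w in candidates.items()
        if w > 0
    ]
    return [supplier for _, supplier in heapq.nlargest(k, keyed)]


def sample_network(calibrated, expected_in, seed=0):
    """Return a directed edge list [(supplier, user), ...].

    `calibrated` maps user -> {supplier: p_ij}; `expected_in` maps
    user -> expected number of suppliers k_in. Each user draws
    round(k_in) suppliers (Python's round() sends halves to the even integer).
    """
    rng = random.Random(seed)  # seeded for reproducible networks
    edges = []
    for user, candidates in calibrated.items():
        k = round(expected_in.get(user, 0.0))
        for supplier in sample_suppliers(candidates, k, rng):
            edges.append((supplier, user))
    return edges


calibrated = {
    "B": {"A": 0.9, "C": 0.5, "D": 0.1},
    "C": {"A": 0.6, "D": 0.4},
}
expected_in = {"B": 2.2, "C": 1.0}
edges = sample_network(calibrated, expected_in, seed=42)
```

With this scheme the in-degree of each enterprise is fixed, but the marginal probability that a particular pair is drawn is not exactly p_ij; if exact marginals matter more than degrees, independent Bernoulli draws are the simpler choice.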
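The structural constraints can be applied as filters on the sampled edge list. A minimal sketch, assuming a hypothetical set of forbidden (supplier sector, user sector) combinations, a hypothetical degree cap of ceil(`cap_factor` × expected degree) taken from k_in and k_out, and that of a reciprocal pair i → j, j → i only the more probable direction survives; dropping self-loops is a further assumption. The diagram draws the three constraints in parallel; running the reciprocal filter before the degree cap, as here, is a design choice so that removed edges do not use up degree budget.

```python
import math
from collections import defaultdict


def apply_constraints(edges, prob, sector, forbidden, k_in, k_out, cap_factor=2.0):
    """Filter a directed edge list of (supplier, user) pairs.

    prob: (supplier, user) -> calibrated probability, used to rank edges
    sector: enterprise -> sector label
    forbidden: set of (supplier_sector, user_sector) pairs that cannot trade
    k_in / k_out: expected number of suppliers / users per enterprise
    """
    # Remove impossible sector links and self-loops.
    kept = [
        (s, u) for s, u in edges
        if s != u and (sector[s], sector[u]) not in forbidden
    ]

    # Remove reciprocal loops: of i->j and j->i keep the more probable one
    # (ties broken by enterprise id so exactly one direction survives).
    present = set(kept)
    kept = [
        (s, u) for s, u in kept
        if (u, s) not in present or (prob[(s, u)], s) > (prob[(u, s)], u)
    ]

    # Limit supplier / user degree, admitting the most probable edges first.
    # Enterprises with no expected degree get a cap of 0.
    kept.sort(key=lambda e: prob[e], reverse=True)
    n_in, n_out = defaultdict(int), defaultdict(int)
    result = []
    for s, u in kept:
        cap_in = math.ceil(cap_factor * k_in.get(u, 0.0))
        cap_out = math.ceil(cap_factor * k_out.get(s, 0.0))
        if n_in[u] < cap_in and n_out[s] < cap_out:
            result.append((s, u))
            n_in[u] += 1
            n_out[s] += 1
    return result


sector = {"A": "steel", "B": "cars", "C": "cars", "D": "retail"}
forbidden = {("retail", "steel")}
edges = [("A", "B"), ("B", "A"), ("D", "A"), ("A", "C"), ("C", "B"), ("A", "A")]
prob = {("A", "B"): 0.9, ("B", "A"): 0.3, ("D", "A"): 0.8,
        ("A", "C"): 0.6, ("C", "B"): 0.4, ("A", "A"): 0.5}
k_in = {"A": 1.0, "B": 2.0, "C": 1.0}
k_out = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
result = apply_constraints(edges, prob, sector, forbidden, k_in, k_out, cap_factor=1.0)
print(result)  # [('A', 'B'), ('C', 'B')]
```

In this toy run the self-loop and the forbidden retail-to-steel link go first, B → A loses to the more probable A → B, and A → C is cut because A already has its one allowed user.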