SNStatComp/AIML4OS_WP11_reconstruction

AIML4OS_WP11_reconstruction

Given a set of enterprises, their properties, and a link prediction function, derive a network of enterprises.

WORK IN PROGRESS / IN HIGH FLUX

```mermaid
flowchart TD

A[Enterprise Dataset<br>500k Enterprises + Features] --> B1

subgraph SG1[Candidate Pair Generation]

B1[Blocking Rules<br>Sector / Geography / Size]
%% B --> B2[ANN Search<br>Embeddings + FAISS/HNSW]

C2 --> C1[Candidate Edge Set<br>~100M pairs]

B1 --> C2[Derive Dyadic Features<br>From Enterprise Features]
%% B2 --> C
end

D[Load Link Prediction Model]
C1 --> D
D --> E[Raw Link Probabilities<br>p_ij]

E --> H[Expected Degree Estimation]

H --> H1[Expected Suppliers per Enterprise<br>k_in_i]
H --> H2[Expected Users per Enterprise<br>k_out_j]

H1 --> F[Probability Calibration]

%% H --> F[Probability Calibration]

%% F --> F1[Platt Scaling]
%% F --> F2[Calibration<br> _Beta Calibration_ ]
%% F --> F3[Isotonic Regression]

F --> G[Calibrated Probabilities]
%% F1 --> G[Calibrated Probabilities]
%% F2 --> G[Calibrated Probabilities]
%% F3 --> G

G --> J[Edge Sampling Procedure]

%% I --> I1[Fitness Scaling<br>Enterprise Size / Revenue]
%% I --> I2[Sector Block Matrix<br>Input-Output Constraints]
%% I --> I3[Geographic Decay, from model]

%% I1 --> J
%% I2 --> J
%% I3 --> J

J --> J1[Supplier Sampling per Enterprise]
%% J --> J2[Probabilistic Edge Sampling]

J1 --> K[Construct Network]
%% J2 --> K

H1 --> L
H2 --> L

K --> L[Structural Constraints]

L --> L1[Remove Impossible Sector Links]
L --> L2[Limit Supplier / User Degree]
L --> L3[Remove Reciprocal Loops]

L1 --> M[Reconstructed Enterprise Network]
```
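The flowchart can be read as a sequence of small functions. The Python sketch below is a toy walk-through of that reading, not the repository's implementation: the data, the names `ENTERPRISES`, `FORBIDDEN`, `candidate_pairs`, `toy_link_probability` and `reconstruct`, the same-region blocking rule, and the scoring function are all hypothetical. It assumes edges are ordered (supplier, user) pairs, so the expected number of suppliers of an enterprise is the sum of `p_ij` over its candidate suppliers. The calibration step is skipped (raw probabilities are used as-is), and only the supplier degree is capped.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical toy input: enterprise id -> features.
ENTERPRISES = {
    "farm":   {"sector": "agri",   "region": "north"},
    "mill":   {"sector": "food",   "region": "north"},
    "bakery": {"sector": "food",   "region": "north"},
    "shop":   {"sector": "retail", "region": "north"},
    "mine":   {"sector": "mining", "region": "south"},
    "forge":  {"sector": "metal",  "region": "south"},
}
# Assumed example of an impossible link: (supplier sector, user sector).
FORBIDDEN = {("retail", "agri")}


def candidate_pairs(enterprises):
    """Blocking: pair only enterprises in the same region (assumed rule)."""
    blocks = defaultdict(list)
    for eid, feats in enterprises.items():
        blocks[feats["region"]].append(eid)
    for members in blocks.values():
        # Ordered (supplier, user) pairs; an id is never paired with itself.
        yield from itertools.permutations(members, 2)


def toy_link_probability(supplier, user):
    """Stand-in for the loaded link prediction model (returns p_ij)."""
    same = ENTERPRISES[supplier]["sector"] == ENTERPRISES[user]["sector"]
    return 0.2 if same else 0.6


def reconstruct(enterprises, predict, max_suppliers=2, seed=0):
    rng = random.Random(seed)
    probs = {(i, j): predict(i, j) for i, j in candidate_pairs(enterprises)}

    # Expected suppliers per enterprise: sum of p_ij over candidate suppliers i.
    k_in = defaultdict(float)
    for (_, j), p in probs.items():
        k_in[j] += p

    # Supplier sampling per enterprise, capped at max_suppliers.
    edges = {}
    for j in enterprises:
        suppliers = [(i, p) for (i, jj), p in probs.items() if jj == j and p > 0]
        k = min(round(k_in[j]), max_suppliers, len(suppliers))
        # Weighted sampling without replacement (Efraimidis-Spirakis keys).
        keyed = sorted(suppliers,
                       key=lambda ip: rng.random() ** (1.0 / ip[1]),
                       reverse=True)
        for i, p in keyed[:k]:
            edges[(i, j)] = p

    # Structural constraint: remove impossible sector links.
    edges = {
        (i, j): p for (i, j), p in edges.items()
        if (enterprises[i]["sector"], enterprises[j]["sector"]) not in FORBIDDEN
    }
    # Structural constraint: remove reciprocal loops, keeping the likelier edge.
    for i, j in list(edges):
        if (i, j) in edges and (j, i) in edges:
            drop = (i, j) if edges[(i, j)] < edges[(j, i)] else (j, i)
            del edges[drop]
    return set(edges)


network = reconstruct(ENTERPRISES, toy_link_probability)
print(sorted(network))
```

At the scale named in the flowchart (500k enterprises, roughly 100M candidate pairs), the per-enterprise scan over `probs` would need to be replaced by grouped or vectorized operations. The sketch only illustrates the order of the steps.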
