This project provides a fuzzy-matching engine for tables that combines several data structures and techniques: hash maps, tokenization, Jaccard similarity, Damerau-Levenshtein distance, and small-group sorting. With these, the engine builds its own fuzzy-lookup database from the data and then queries that database to find the best mapping between the entries of two different tables.
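The similarity measures named above can be sketched in Java as follows. This is a minimal illustration, not the engine's implementation: the class FuzzySimilarity and its methods are hypothetical, tokens are assumed to be lower-cased alphanumeric runs, and the Damerau distance shown is the optimal string alignment (OSA) variant, which may differ from the variant the project uses.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; names are illustrative, not the project's API.
final class FuzzySimilarity {
    private FuzzySimilarity() {}

    // Lower-cases the text and splits it on runs of non-alphanumeric characters.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        if (text == null) return tokens;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) tokens.add(t); // split() can yield a leading empty string
        }
        return tokens;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0; // two empty sets are treated as identical
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Optimal string alignment (restricted Damerau-Levenshtein) distance:
    // insertions, deletions, substitutions, and transpositions of adjacent characters.
    static int damerau(String s, String t) {
        int n = s.length(), m = t.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1
                        && s.charAt(i - 1) == t.charAt(j - 2)
                        && s.charAt(i - 2) == t.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
                }
            }
        }
        return d[n][m];
    }

    // Normalizes the distance into a similarity in [0, 1].
    static double damerauSimilarity(String s, String t) {
        int max = Math.max(s.length(), t.length());
        return max == 0 ? 1.0 : 1.0 - (double) damerau(s, t) / max;
    }
}
```

Token-level Jaccard tolerates reordered or extra words ("Corp Acme" vs "Acme Corp"), while the character-level Damerau distance tolerates typos such as the transposition in "amce" vs "acme" (distance 1).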
Suppose we have two tables: the Fuzzy_Table and the Reference_Table. The first is the table we want to "clean" or fill in; the second provides the expected, valid values for each entry. In this example, we try to find the ID of each entry from the values in its other columns. The challenge is that, for a given entry, no single column may hold enough data to perform an ordinary VLOOKUP by itself; and even when a column does contain data, the text may not exactly match the valid value. Therefore, for each entry (row), our objective is to use all of its non-empty columns as reference points.
Without going into deeper detail (see the documentation for more), the program performs a many-to-many (with respect to columns) weighted fuzzy match of each incomplete entry against a complete, valid reference table.
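As a rough illustration of this per-row, many-to-many weighted matching, the sketch below assumes that each edge of the column mapping (fuzzy column to reference column) carries a weight, that empty fuzzy cells are skipped, and that each column pair is scored with token-level Jaccard similarity. The HashMap from tokens to reference rows plays the role of a lookup database used to retrieve candidates. WeightedMatcher, Edge, and bestMatch are hypothetical names; the project's actual scoring (including its use of Damerau distance and small-group sorting) is not reproduced here.

```java
import java.util.*;

// Hypothetical sketch of per-row weighted fuzzy matching; not the project's API.
final class WeightedMatcher {
    // One edge of the column-mapping graph: fuzzy column -> reference column, with a weight.
    static final class Edge {
        final String fuzzyColumn;
        final String referenceColumn;
        final double weight;

        Edge(String fuzzyColumn, String referenceColumn, double weight) {
            this.fuzzyColumn = fuzzyColumn;
            this.referenceColumn = referenceColumn;
            this.weight = weight;
        }
    }

    private final List<Map<String, String>> reference;
    private final List<Edge> edges;
    // Lookup "database": token -> indices of reference rows containing it in a mapped column.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    WeightedMatcher(List<Map<String, String>> reference, List<Edge> edges) {
        this.reference = reference;
        this.edges = edges;
        for (int row = 0; row < reference.size(); row++) {
            for (Edge e : edges) {
                for (String token : tokenize(reference.get(row).get(e.referenceColumn))) {
                    index.computeIfAbsent(token, k -> new HashSet<>()).add(row);
                }
            }
        }
    }

    // Lower-cased alphanumeric tokens; a missing (null) cell yields no tokens.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        if (text == null) return tokens;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Jaccard similarity; an empty side scores 0 so missing data adds nothing.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // Returns the index of the best-scoring reference row, or -1 if no row scores above zero.
    int bestMatch(Map<String, String> fuzzyRow) {
        // 1. Retrieve candidate rows through the token index (TreeSet keeps the order deterministic).
        Set<Integer> candidates = new TreeSet<>();
        for (Edge e : edges) {
            for (String token : tokenize(fuzzyRow.get(e.fuzzyColumn))) {
                candidates.addAll(index.getOrDefault(token, Collections.<Integer>emptySet()));
            }
        }
        // 2. Score each candidate as the weighted sum of per-edge similarities.
        int best = -1;
        double bestScore = 0.0;
        for (int row : candidates) {
            double score = 0.0;
            for (Edge e : edges) {
                Set<String> fuzzyTokens = tokenize(fuzzyRow.get(e.fuzzyColumn));
                if (fuzzyTokens.isEmpty()) continue; // skip empty fuzzy cells
                score += e.weight * jaccard(fuzzyTokens, tokenize(reference.get(row).get(e.referenceColumn)));
            }
            if (score > bestScore) { // strict '>' keeps the lowest row index on ties
                bestScore = score;
                best = row;
            }
        }
        return best;
    }
}
```

For example, with edges Name to Name, City to City, and City to Name, a fuzzy row whose City cell holds "globex" can still be matched to a reference row named "Globex Inc" through the cross-column edge.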
Verify that your Java version is above 1.8:
java -version
Install Maven (shown here with Homebrew):
brew install maven
The package dependencies are listed in Engine/pom.xml.
You can build the program (JAR) from source as a Maven project.
Continuing the example from the use case, suppose we want to find the Customer_ID for each entry in the Fuzzy_Table. Looking at the table, however, some values may appear in the wrong columns. Therefore, we plot the mapping between the columns of both tables, as shown below.
Graph 1: Default mapping configuration
For a more detailed guide, see the User Manual.
git clone https://github.com/luislascano01/AI_VLookUp
cd AI_VLookUp
cd Engine
mvn clean install
Use the sample configuration as a reference and edit it to match the Excel tables to be processed.
This configuration includes the header mappings (as shown in Graph 1), the paths to the Excel workbooks, the operating directory path, and a set of column-wise regular expressions for secondary data.
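The README does not spell out how the column-wise RegEx set is applied. One plausible use, shown here purely as an assumption, is to extract a canonical value from a noisy cell before matching (for example, a five-digit ZIP code from a free-text address). The class ColumnRegexCleaner, the column name "Zip", and the pattern are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical column-wise RegEx cleaner; column names and patterns are illustrative only.
final class ColumnRegexCleaner {
    private final Map<String, Pattern> patterns = new HashMap<>();

    // Associates a regular expression with a column name.
    void register(String column, String regex) {
        patterns.put(column, Pattern.compile(regex));
    }

    // Returns the first match of the column's pattern, or the trimmed input if
    // the column has no pattern or the pattern does not match.
    String clean(String column, String value) {
        if (value == null) return "";
        Pattern p = patterns.get(column);
        if (p == null) return value.trim();
        Matcher m = p.matcher(value);
        return m.find() ? m.group() : value.trim();
    }
}
```

Falling back to the trimmed input keeps cells that do not fit the pattern available for fuzzy matching instead of discarding them.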
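To view the sample configuration from the terminal (the paths below are relative to the repository root):
cat Engine/src/main/resources/header_configuration.yaml
To run the program with the sample configuration:
java -jar Engine/target/ai_vlookup-0.0.1-SNAPSHOT.jar Engine/src/main/resources/header_configuration.yaml
Find the output at: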
./OperatingDir/results.csv
Modify header_configuration.yaml according to your needs. Refer to Graph 1 to understand the mapping; note that softmax is applied to the weights. Graph 1 corresponds to the mapping defined in the sample YAML configuration, specifically in its "BackboneConfiguration".
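Applying softmax turns raw edge weights into positive values that sum to 1. The README does not say over which group of edges the normalization runs; this minimal sketch assumes it is applied to the weights of the edges leaving a single fuzzy column, and the class SoftmaxWeights is hypothetical. Subtracting the maximum before exponentiating keeps large weights from overflowing.

```java
// Hypothetical helper: numerically stable softmax over a group of raw mapping weights.
final class SoftmaxWeights {
    private SoftmaxWeights() {}

    // Computes exp(w_i - max) / sum_j exp(w_j - max) for each raw weight w_i.
    static double[] softmax(double[] raw) {
        double max = Double.NEGATIVE_INFINITY;
        for (double w : raw) max = Math.max(max, w);
        double[] out = new double[raw.length];
        double sum = 0.0;
        for (int i = 0; i < raw.length; i++) {
            out[i] = Math.exp(raw[i] - max); // shifting by max avoids overflow
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }
}
```

For instance, raw weights of 2 and 1 become roughly 0.73 and 0.27, so a relative preference between columns is kept while the weights of the group always sum to 1.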
Then run the program with your custom configuration:
java -jar Engine/target/ai_vlookup-0.0.1-SNAPSHOT.jar custom_config.yaml
© 2025 Luis Lascano. All rights reserved.
Open to use as is, for personal use, when installed as instructed. No permission is granted to copy, modify, or distribute this software (or any part of it) and its documentation for any purpose without the express written permission of the copyright holder.
