AI VLookUp

This project combines several data structures to perform efficient fuzzy matching between tables. Its engine is built on HashMaps, tokenization, the Jaccard and Damerau similarity measures, and small-group sorting. With these it builds its own database for fuzzy lookup, which it then queries to find the best mapping between the entries of two different tables.
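To make the two similarity measures named above concrete, here is a minimal Java sketch of token-level Jaccard similarity and of the optimal string alignment variant of the Damerau-Levenshtein distance. It is an illustration under assumptions, not the engine's actual code: the class name `SimilaritySketch`, the tokenization rule, and the choice of the optimal string alignment variant are all hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not taken from the AI VLookUp engine.
final class SimilaritySketch {

    // Lowercase the text and split it on runs of characters other than a-z and 0-9.
    // (Assumed tokenization rule; non-ASCII letters act as separators here.)
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Jaccard similarity: |A intersect B| / |A union B|.
    // Defined here as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Optimal string alignment distance: insertions, deletions, substitutions,
    // and transpositions of two adjacent characters each cost 1.
    static int osaDistance(String s, String t) {
        int n = s.length();
        int m = t.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1);
                best = Math.min(best, d[i - 1][j - 1] + cost);
                boolean swapped = i > 1 && j > 1
                        && s.charAt(i - 1) == t.charAt(j - 2)
                        && s.charAt(i - 2) == t.charAt(j - 1);
                if (swapped) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        // Two of the four distinct tokens are shared, so this prints 0.5.
        System.out.println(jaccard(tokenize("Acme Corp Ltd"), tokenize("ACME Corporation Ltd")));
        // One adjacent transposition ("ie" vs "ei"), so this prints 1.
        System.out.println(osaDistance("recieve", "receive"));
    }
}
```

Jaccard works on whole tokens and tolerates reordered words, while the edit distance catches character-level typos inside a token, which is one plausible reason to combine the two.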

Use case

Suppose we have two tables: 1. Fuzzy_Table and 2. Reference_Table. The first is the table we want to "clean" or fill in, based on the second, which provides the expected actual values for each entry. In this case, we attempt to find the ID of each entry based on the values of its other columns. The challenge is that, for any given entry, a column may or may not hold the data needed to perform an "ordinary" VLOOKUP on its own; and when it does hold data, the text may not exactly match the valid value. Our objective is therefore to use all the non-empty columns of each entry (row) as reference points.

Without going into further detail (see the documentation for more), the program performs a weighted fuzzy match, many-to-many with respect to columns, of each incomplete entry against a complete, valid reference table.
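The idea of a weighted, many-to-many column match can be sketched as follows. This is a simplified illustration under assumptions, not the engine's algorithm: the class `WeightedMatchSketch`, the column names `Name` and `City`, the weights, and the use of plain token Jaccard as the similarity are all hypothetical. Each non-empty fuzzy cell is compared against every reference column it is mapped to, and the weighted similarities are summed per reference row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not taken from the AI VLookUp engine.
final class WeightedMatchSketch {

    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Token-set Jaccard similarity, used as a stand-in similarity measure.
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() || tb.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) inter.size() / union.size();
    }

    // Builds a row from alternating column/value arguments.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) r.put(kv[i], kv[i + 1]);
        return r;
    }

    // mapping: fuzzy column -> (reference column -> weight).
    static double score(Map<String, String> fuzzyRow, Map<String, String> refRow,
                        Map<String, Map<String, Double>> mapping) {
        double total = 0.0;
        for (Map.Entry<String, Map<String, Double>> source : mapping.entrySet()) {
            String fuzzyValue = fuzzyRow.getOrDefault(source.getKey(), "");
            if (fuzzyValue.trim().isEmpty()) continue; // empty cells give no evidence
            for (Map.Entry<String, Double> target : source.getValue().entrySet()) {
                String refValue = refRow.getOrDefault(target.getKey(), "");
                total += target.getValue() * similarity(fuzzyValue, refValue);
            }
        }
        return total;
    }

    // Index of the best-scoring reference row, or -1 if no row scores above zero.
    static int bestMatch(Map<String, String> fuzzyRow, List<Map<String, String>> refs,
                         Map<String, Map<String, Double>> mapping) {
        int best = -1;
        double bestScore = 0.0;
        for (int i = 0; i < refs.size(); i++) {
            double s = score(fuzzyRow, refs.get(i), mapping);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }

    static List<Map<String, String>> sampleReference() {
        List<Map<String, String>> refs = new ArrayList<>();
        refs.add(row("Customer_ID", "C001", "Name", "Acme Corporation", "City", "Boston"));
        refs.add(row("Customer_ID", "C002", "Name", "Globex Industries", "City", "Austin"));
        return refs;
    }

    static Map<String, Map<String, Double>> sampleMapping() {
        Map<String, Double> fromName = new HashMap<>();
        fromName.put("Name", 0.7);
        fromName.put("City", 0.3);
        Map<String, Double> fromCity = new HashMap<>();
        fromCity.put("City", 0.8);
        fromCity.put("Name", 0.2);
        Map<String, Map<String, Double>> mapping = new HashMap<>();
        mapping.put("Name", fromName);
        mapping.put("City", fromCity);
        return mapping;
    }

    public static void main(String[] args) {
        // The city has ended up in the Name column and the City cell is empty.
        Map<String, String> fuzzy = row("Name", "globex austin", "City", "");
        int idx = bestMatch(fuzzy, sampleReference(), sampleMapping());
        System.out.println(sampleReference().get(idx).get("Customer_ID")); // prints C002
    }
}
```

Because the `Name` column is also mapped to the reference `City` column, the misplaced city still contributes to the score even though the fuzzy row's own `City` cell is empty.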

Dependencies and Installation

Verify that the installed Java version is greater than 1.8:

java -version

Install Maven (the command below uses Homebrew):

brew install maven

The package dependencies may be found under Engine/pom.xml

You may build the program (JAR) from source code as a Maven project.

Example usage

Continuing the example from the use case, suppose we want to find the Customer_ID for each entry in the Fuzzy_Table. Looking at the table, however, some values may appear in the wrong columns. We therefore define the mapping between the columns of both tables as shown in Graph 1.

Graph 1: Default mapping configuration

Set-up

For a more detailed guide please check the User Manual.

Download code

git clone https://github.com/luislascano01/AI_VLookUp

cd AI_VLookUp

Build JAR

cd Engine
mvn clean install

Custom run configuration

Review the sample configuration and edit it to match the Excel tables to be processed.

The configuration includes the header mappings (as seen in Graph 1), the paths to the Excel workbooks, the operating directory path, and a set of column-wise RegEx patterns for secondary data.

To view the sample from a terminal:

cat Engine/src/main/resources/header_configuration.yaml

Execution

Sample execution

java -jar Engine/target/ai_vlookup-0.0.1-SNAPSHOT.jar Engine/src/main/resources/header_configuration.yaml

The output is written to:

./OperatingDir/results.csv

Custom execution

Modify header_configuration.yaml according to your needs. Refer to Graph 1 to understand the mapping; soft-max is applied to the weights. That graph corresponds to the mapping in the sample YAML configuration, specifically the "BackboneConfiguration".

java -jar Engine/target/ai_vlookup-0.0.1-SNAPSHOT.jar custom_config.yaml
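Since soft-max is applied to the mapping weights, it may help to see what that normalization does to the raw numbers you put in the configuration. The following is a minimal sketch of a standard, numerically stable soft-max in Java. The class name `SoftmaxSketch` is hypothetical, and whether the engine normalizes the weights per source column or across the whole mapping is not stated here, so treat the grouping as an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration; not taken from the AI VLookUp engine.
final class SoftmaxSketch {

    // Standard soft-max: exp(w_i) / sum_j exp(w_j).
    // Subtracting the maximum first avoids overflow without changing the result.
    static double[] softmax(double[] weights) {
        double max = Double.NEGATIVE_INFINITY;
        for (double w : weights) {
            max = Math.max(max, w);
        }
        double[] out = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            out[i] = Math.exp(weights[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example: raw weights from one source column to three reference columns.
        double[] normalized = softmax(new double[] {2.0, 1.0, 1.0});
        System.out.println(Arrays.toString(normalized));
    }
}
```

The normalized weights are positive and sum to 1, so only the differences between raw weights matter: adding the same constant to every weight in a group leaves the result unchanged.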

Copyright Notice

Open to use as is through instructed installation for personal use. No permission authorized to copy, modify, or distribute this software (or part of it) and its documentation for any purpose without the express written permission of the copyright holder.

