cu-library/NZDedupeCheck

NZDedupeCheck

Checks the quality of matching between records in the CF NZ Dedupe Report.

Inputs:
•NZ Dedupe Report in .csv format, sorted by Identifier
NOTE: The report must include columns for Title, Publication Date, Language Of Cataloging, Author, ISBN (Normalized), Edition, and Publisher, along with the standard columns. Use the "KB - NZ Dedupe Report with Comparison Fields" template available in the NZ Analytics instance.
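
The input requirements above can be checked before running the comparison. The following is a minimal sketch, not part of the script itself: the function name `check_report` and the sample data are hypothetical, and it reads the file with the standard-library `csv` module instead of pandas so that it runs without extra installs. Rather than testing a particular sort order, it tests the property the script relies on, namely that rows sharing an Identifier are adjacent.

```python
import csv
import io

# Columns named in this README; "Identifier" is the column rows are matched on.
REQUIRED_COLUMNS = [
    "Identifier", "Title", "Publication Date", "Language Of Cataloging",
    "Author", "ISBN (Normalized)", "Edition", "Publisher",
]


def check_report(rows, fieldnames):
    """Return a list of problems found in the report (empty if none)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen = set()
    previous = None
    for row in rows:
        ident = row["Identifier"]
        if ident != previous:
            # A new run of identifiers starts here; it must not have
            # appeared earlier, otherwise equal values are not adjacent.
            if ident in seen:
                problems.append(f"Identifier {ident!r} appears in non-adjacent rows")
                break
            seen.add(ident)
        previous = ident
    return problems


# Tiny in-memory stand-in for an exported report (hypothetical data).
sample = io.StringIO(
    "Identifier,Title,Publication Date,Language Of Cataloging,Author,"
    "ISBN (Normalized),Edition,Publisher\n"
    "100,Moby Dick,1851,eng,Melville,9780000000001,1st,Harper\n"
    "100,Moby-Dick,1851,eng,Melville,9780000000001,1st,Harper\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
print(check_report(rows, reader.fieldnames))
```

An empty list means the file has the expected columns and is grouped by Identifier.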

Outputs:
•Copy of the report with a Similarity (confidence) score next to every record whose Identifier value matches that of an adjacent record

Process:
•Prompts for file using tkinter filedialog
•Compares adjacent rows on value in Identifier column (file must be sorted by Identifier to ensure matches are adjacent)
•If a match is found, compares key fields (Title, Publication Date, Language Of Cataloging, Author, ISBN (Normalized), Edition, and Publisher) using fuzz.WRatio
•Adds a column for Similarity, populated with the average of all comparison fields, or 0 if no matching record is found
•Prompts the user to select a directory for the output file
•Saves the output as a file with a unique name using date and time
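
The comparison and naming steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: rows are plain dictionaries rather than a pandas DataFrame, each row is scored against the preceding row when the Identifier matches and otherwise against the following row, and the names `field_score`, `pair_similarity`, `add_similarity`, and `output_name`, as well as the file name pattern, are hypothetical. It calls `fuzz.WRatio` when FuzzyWuzzy is installed and otherwise falls back to `difflib.SequenceMatcher` from the standard library, which is a stand-in that produces different scores.

```python
from datetime import datetime
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz  # the library this README names

    def field_score(a, b):
        return fuzz.WRatio(a, b)
except ImportError:
    def field_score(a, b):
        # Stand-in only: a plain similarity ratio, not equivalent to WRatio.
        return round(100 * SequenceMatcher(None, a, b).ratio())

KEY_FIELDS = [
    "Title", "Publication Date", "Language Of Cataloging", "Author",
    "ISBN (Normalized)", "Edition", "Publisher",
]


def pair_similarity(row_a, row_b):
    """Average the per-field scores (0 to 100) over all key fields."""
    scores = [field_score(str(row_a[f]), str(row_b[f])) for f in KEY_FIELDS]
    return sum(scores) / len(scores)


def add_similarity(rows):
    """Add a Similarity value to each row; rows must be grouped by Identifier."""
    for i, row in enumerate(rows):
        neighbour = None
        # Assumption: prefer the previous row, then the next one.
        if i > 0 and rows[i - 1]["Identifier"] == row["Identifier"]:
            neighbour = rows[i - 1]
        elif i + 1 < len(rows) and rows[i + 1]["Identifier"] == row["Identifier"]:
            neighbour = rows[i + 1]
        row["Similarity"] = pair_similarity(row, neighbour) if neighbour else 0
    return rows


def output_name(now):
    """Build a file name from date and time (hypothetical pattern)."""
    return now.strftime("NZDedupeCheck_%Y%m%d_%H%M%S.csv")


# Hypothetical sample data: two matching records and one unmatched record.
record = {
    "Title": "Moby Dick", "Publication Date": "1851",
    "Language Of Cataloging": "eng", "Author": "Melville",
    "ISBN (Normalized)": "9780000000001", "Edition": "1st",
    "Publisher": "Harper",
}
rows = [
    {"Identifier": "100", **record},
    {"Identifier": "100", **record},
    {"Identifier": "200", **record},
]
for row in add_similarity(rows):
    print(row["Identifier"], row["Similarity"])
print(output_name(datetime.now()))
```

Low Similarity values flag record pairs that share an Identifier but differ in their descriptive fields, which are the ones worth reviewing by hand.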

Dependencies:
•pandas
•FuzzyWuzzy
•NumPy
•datetime (Python standard library)
•tkinter (Python standard library)
•time (Python standard library)
