Checks the quality of matching between records in the CF NZ Dedupe Report
Inputs:
•NZ Dedupe Report as a .csv file, sorted by Identifier
NOTE: Requires columns for Title, Publication Date, Language Of Cataloging, Author, ISBN (Normalized), Edition, and Publisher, along with the standard columns. Use the KB - NZ Dedupe Report with Comparison Fields template, available in the NZ Analytics instance.
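The input requirements above can be checked before running the comparison. The following is a minimal sketch, not part of the script itself: the function name `check_report` is hypothetical, only the columns named in the note are checked (the "standard columns" are not listed here, so they are left out), and the sort requirement is tested as "rows with the same Identifier are adjacent", which is what the comparison relies on.

```python
import csv
import io

# Columns named in the note above, plus the Identifier column used for matching.
REQUIRED_COLUMNS = [
    "Identifier",
    "Title",
    "Publication Date",
    "Language Of Cataloging",
    "Author",
    "ISBN (Normalized)",
    "Edition",
    "Publisher",
]


def check_report(csv_text):
    """Return (missing_columns, matches_adjacent) for a report given as CSV text.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]

    # Every Identifier value should appear in a single contiguous run of rows.
    matches_adjacent = True
    if "Identifier" in header:
        seen = set()
        previous = None
        for row in reader:
            current = row["Identifier"]
            if current != previous and current in seen:
                matches_adjacent = False
                break
            seen.add(current)
            previous = current
    return missing, matches_adjacent
```

An empty `missing` list together with `matches_adjacent` equal to `True` suggests the file is in the shape the script expects.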
Outputs:
•Report with a Similarity (confidence) score next to every record that shares its Identifier value with another record
Process:
•Prompts for file using tkinter filedialog
•Compares adjacent rows on value in Identifier column (file must be sorted by Identifier to ensure matches are adjacent)
•If a match is found, compares key fields (Title, Publication Date, Language Of Cataloging, Author, ISBN (Normalized), Edition, and Publisher) using fuzz.WRatio
•Adds a column for Similarity, populated with the average of all comparison fields or 0 if no matching record found
•Prompts the user to select a directory for the output file
•Saves the output to a file with a unique name based on the current date and time
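The comparison and naming steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: rows are plain dictionaries rather than a pandas DataFrame, the file dialogs are omitted, and the names `pair_similarity`, `add_similarity`, `output_name`, and the `NZDedupeCheck_` file prefix are hypothetical. Each row is compared with the previous row if the Identifier matches, otherwise with the next row; how the real script picks the partner row is an assumption. If FuzzyWuzzy is not installed, the sketch falls back to a `difflib` ratio, which is a stand-in and does not produce the same scores as `fuzz.WRatio`.

```python
from datetime import datetime
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz
    score = fuzz.WRatio  # scorer named in the Process list, returns 0-100
except ImportError:
    # Stand-in scorer (an assumption, NOT equivalent to WRatio), scaled to 0-100.
    def score(a, b):
        return 100 * SequenceMatcher(None, a, b).ratio()

KEY_FIELDS = [
    "Title",
    "Publication Date",
    "Language Of Cataloging",
    "Author",
    "ISBN (Normalized)",
    "Edition",
    "Publisher",
]


def pair_similarity(row_a, row_b):
    """Average the per-field scores across all key fields."""
    scores = [score(str(row_a[f]), str(row_b[f])) for f in KEY_FIELDS]
    return sum(scores) / len(scores)


def add_similarity(rows):
    """Return copies of the rows with a Similarity value added.

    Assumes rows are already ordered so that matching Identifiers are adjacent.
    """
    result = []
    for i, row in enumerate(rows):
        partner = None
        if i > 0 and rows[i - 1]["Identifier"] == row["Identifier"]:
            partner = rows[i - 1]
        elif i + 1 < len(rows) and rows[i + 1]["Identifier"] == row["Identifier"]:
            partner = rows[i + 1]
        # 0 when no adjacent record shares the Identifier.
        similarity = pair_similarity(row, partner) if partner is not None else 0
        result.append({**row, "Similarity": similarity})
    return result


def output_name(now=None):
    """Build a file name from the date and time (hypothetical prefix)."""
    now = now or datetime.now()
    return f"NZDedupeCheck_{now:%Y%m%d_%H%M%S}.csv"
```

Averaging the field scores keeps the result on the same 0 to 100 scale as the individual comparisons, so a single Similarity value can be read as an overall confidence for the pair.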
Dependencies:
•Pandas
•FuzzyWuzzy
•NumPy
•datetime
•tkinter
•time
cu-library/NZDedupeCheck