
Collide a vector of strings with itself to identify similarities (Record Linkage & Record Deduplication)


leerssej/RecCollideR

RecCollideR

Collide a Record Vector with Itself to Identify Internal Similarities; Record Linkage Deduplication

Code snippet that uses the Record Linkage package to rank how similar the records in a vector are to one another.
  • Thresholds can be set to speed up the linking of highly similar text values and to focus review quickly on the score range where edge cases are densest.
  • A good cutoff for separating uncertain matches from certain ones is often around 0.92.
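
The thresholding idea above can be sketched in a few lines. This is an illustrative Python sketch only, not the repository's own code: it swaps in the standard library's difflib.SequenceMatcher ratio as the similarity score in place of the Record Linkage package's comparison functions, and the function name collide, its parameters certain and floor, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def collide(records, certain=0.92, floor=0.0):
    """Compare every record with every other record and rank the pairs.

    `certain` is the cutoff above which a pair is treated as a match;
    `floor` discards pairs too dissimilar to be worth reviewing.
    Both names are assumptions made for this sketch.
    """
    scored = []
    for a, b in combinations(records, 2):
        # Stand-in similarity: difflib ratio on lower-cased text (0.0 to 1.0).
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= floor:
            scored.append((score, a, b))

    # Highest-scoring pairs first, so the most similar records lead the list.
    scored.sort(reverse=True)

    matches = [pair for pair in scored if pair[0] >= certain]
    review = [pair for pair in scored if pair[0] < certain]
    return matches, review


# Made-up sample data for illustration.
records = ["Acme Corporation", "Acme Corporation.", "Acme Corp", "Zenith Ltd"]
matches, review = collide(records, certain=0.92, floor=0.5)

for score, a, b in matches:
    print(f"match   {score:.3f}  {a!r} ~ {b!r}")
for score, a, b in review:
    print(f"review  {score:.3f}  {a!r} ~ {b!r}")
```

Pairs at or above the cutoff can be linked without inspection, while the review list holds the edge cases that need a closer look. A different similarity measure will shift where a sensible cutoff lies, so 0.92 should be re-checked against real data.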
