An R-Based GeoSemantic Approach to Master Data Management
A PowerPoint presentation details the challenge and the solution, coded in R with the tidyverse, for the record-linkage problems faced in deduplicating a major global pharmaceutical company's collection of clinical trial sites. The company had compiled this collection over many years from free-text entries made by thousands of individual contributors.
This code comes from an ad hoc proof-of-concept project, so production readiness was not a goal. Its main aim was to apply complex data science techniques as rapidly and effectively as possible.