- generate every (string) similarity measure score for the names of two subjects
- threshold the mean of every score (to reduce data)
- export duplicate candidates to a file in the following way:
  - for every new subject there should be a list of subjects in the knowledge base that can be linked
  - the grouped subjects should be ranked by the mean of the scores in descending order
  - every subject should contain all the subject data, the similarity measure scores and the mean of the scores
  - export format should be JSON
- a Python script should then use the exported data and enable manually annotating it
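
The steps above could be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation: the three similarity measures, the `name` field, the default threshold of `0.5`, the function names and the JSON layout are all assumptions chosen for the example.

```python
import json
from difflib import SequenceMatcher
from statistics import mean


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1] (1.0 means identical)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sequence_similarity(a: str, b: str) -> float:
    """Ratio from the standard library's difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


# Assumption: these three measures stand in for "every similarity measure".
MEASURES = {
    "levenshtein": levenshtein_similarity,
    "jaccard": jaccard_similarity,
    "sequence_matcher": sequence_similarity,
}


def score_pair(name_a: str, name_b: str):
    """Return all measure scores for two names together with their mean."""
    scores = {label: fn(name_a, name_b) for label, fn in MEASURES.items()}
    return scores, mean(scores.values())


def find_candidates(new_subjects, kb_subjects, threshold=0.5):
    """Group knowledge base candidates per new subject, best match first."""
    groups = []
    for new in new_subjects:
        candidates = []
        for kb in kb_subjects:
            # Assumption: every subject is a dict with a "name" key.
            scores, avg = score_pair(new["name"], kb["name"])
            if avg >= threshold:  # drop unlikely pairs to reduce data
                candidates.append({"subject": kb, "scores": scores, "mean": avg})
        candidates.sort(key=lambda c: c["mean"], reverse=True)
        groups.append({"subject": new, "candidates": candidates})
    return groups


def export_candidates(groups, path):
    """Write the grouped, ranked candidates to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(groups, fh, indent=2, ensure_ascii=False)
```

A call such as `export_candidates(find_candidates(new_subjects, kb_subjects), "candidates.json")` would produce one entry per new subject, each carrying its ranked candidates with the full subject data, the individual scores and their mean. An annotation script could then load that file with `json.load` and add a label to each candidate.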