OCR finds words that are not in the figure #18

@khanspers

Description

I found several results where PFOCR reports diseases that are not in the figure. Here is a list of PFOCR URLs, each followed by its false-positive diseases: diseases that are reported in the Disease mentions table and in the rds even though they do not appear in the figure.

On the PFOCR website, these diseases are listed in the "Word" column but are not mapped. In the rds, however, "doid" is non-blank for them. This may be related to #16.

https://pfocr.wikipathways.org/figures/PMC3093193__nihms289177f2.html: Melanoma. Possibly a misreading of "melanocyte", "melanogenesis" or "melanogenic".
https://pfocr.wikipathways.org/figures/PMC2744676__1478-811X-7-20-2.html: Cancer, Cardiomyopathy, Lung cancer, Melanoma, Noonan syndrome
https://pfocr.wikipathways.org/figures/PMC4851838__ajcr0006-0577-f1.html: Cancer, Cardiomyopathy, Lung cancer, Melanoma, Noonan syndrome, Cholangiocarcinoma
https://pfocr.wikipathways.org/figures/PMC4505740__nihms655432f1.html: Cancer, Lung cancer, Melanoma
https://pfocr.wikipathways.org/figures/PMC3251651__nihms339102f4.html: Cancer, Melanoma
https://pfocr.wikipathways.org/figures/PMC5385394__WJG-23-2276-g001.html: Cancer, Melanoma
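To triage cases like these, one quick check is whether each reported disease term actually occurs as a whole word in the figure's OCR text, and if not, which OCR word it most resembles (e.g. "melanocyte" for "Melanoma"). The sketch below is hypothetical: it assumes the OCR text is available as a plain string, and it uses Python's `difflib` similarity ratio as a stand-in for fuzzy matching. It does not reproduce PFOCR's actual matching logic, and the function names are illustrative only.

```python
import re
from difflib import SequenceMatcher


def term_in_text(term: str, ocr_text: str) -> bool:
    """Return True if `term` occurs in `ocr_text` as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, ocr_text, flags=re.IGNORECASE) is not None


def similarity(a: str, b: str) -> float:
    """Case-insensitive difflib similarity ratio in [0, 1] (stand-in for a fuzzy matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_word(term: str, ocr_text: str) -> tuple[str, float]:
    """Return the single OCR word most similar to `term`, with its similarity ratio."""
    words = re.findall(r"[A-Za-z]+", ocr_text)
    best = max(words, key=lambda w: similarity(term, w), default="")
    return best, similarity(term, best)


def audit(reported: list[str], ocr_text: str) -> list[tuple[str, str, float]]:
    """List reported disease terms absent from the OCR text, each with its nearest OCR word."""
    suspects = []
    for term in reported:
        if not term_in_text(term, ocr_text):
            word, score = closest_word(term, ocr_text)
            suspects.append((term, word, score))
    return suspects


# Hypothetical OCR text for a melanocyte figure; "Melanoma" never appears in it.
ocr_text = "melanocyte melanogenesis MITF tyrosinase"
print(audit(["Melanoma", "MITF"], ocr_text))
```

Here "MITF" passes the whole-word check, while "Melanoma" is flagged with "melanocyte" as its nearest OCR word (ratio of about 0.67). A term that matches only by similarity, never as a whole word, is a likely false positive of the kind listed above.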

I found these while reviewing PFOCR-augmented BTE results from two queries related to melanoma and lung cancer. If I find other cases, I will add them here.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
