@ZhaoqingCui

Hi,

I am an undergraduate at UW-Madison working on my honors thesis. I need to use DeepDive-Infrastructure to find publications that contain the words in my dictionary. I have updated the relevant files. Please let me know if you have any questions. Thank you very much!

Zhaoqing

@iross
Member

iross commented Feb 13, 2020 via email

@ZhaoqingCui
Author

Hi,

Could you explain the column structure of the CSV? Are all of the comma-separated terms within a line synonyms for one entity? Or do they carry some other kind of meaning?

Thanks,
Ian Ross
System Integration Developer
University of Wisconsin-Madison Computer Science Department
Center for High-Throughput Computing

On February 12, 2020 at 9:22:22 PM, Zachary Cui (notifications@github.com) wrote:

You can view, comment on, or merge this pull request online at: #3

Commit Summary
Update config.yaml
Add files via upload
Add files via upload
Add files via upload
Delete dictionary.csv
Add files via upload

File Changes
M config.yaml (8)
M dictionary.csv (123200)

Patch Links:
https://github.com/UW-Deepdive-Infrastructure/dictionary_example/pull/3.patch
https://github.com/UW-Deepdive-Infrastructure/dictionary_example/pull/3.diff

Hi,

Yes, all of the comma-separated terms within a line are synonyms for one entity.
Each line represents only one entity. On each line, the first term is the most commonly used name, and the terms that follow it on that line are its synonyms.

Zhaoqing

@iross
Member

iross commented Feb 13, 2020 via email

@ZhaoqingCui
Author

Hi Zhaoqing,

We don’t currently have any internal mechanism for handling synonyms this way. I think the best approach at the moment is to treat each synonym as its own term and scan the literature that way. So the first line would become 9 separate entries:

#15310-LN
15310-LN
TER461
TER-461
Ter 461
TER479
TER-479
Ter 479
Extract 519

If you update it that way, I can start the matching today.
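
The flattening suggested here (one entry per synonym) can be sketched in a few lines of Python. This is only an illustration, not the tooling used by the project: the function name `flatten_synonyms` is hypothetical, and the sample row is an assumed reconstruction of the first dictionary line from the nine entries listed in the message.

```python
import csv
import io


def flatten_synonyms(csv_text):
    """Return every term as its own entry, keeping first-seen order.

    Blank cells and repeated terms are dropped.
    """
    terms = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            term = cell.strip()
            if term and term not in seen:
                seen.add(term)
                terms.append(term)
    return terms


# Assumed layout of the first dictionary line: one entity, synonyms comma-separated.
sample = "#15310-LN,15310-LN,TER461,TER-461,Ter 461,TER479,TER-479,Ter 479,Extract 519\n"
flat = flatten_synonyms(sample)
print("\n".join(flat))  # one term per line
```

Using the `csv` module rather than a plain `split(",")` keeps quoted cells intact if a term itself contains a comma.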

Awesome! Updated. Thank you very much!

@ZhaoqingCui
Author

Hi Ian,

I am not sure if you have started the matching already. I am sorry that I may have to change my request.

Previously, I supplied a single dictionary containing both cell line terms and virus terms. However, what we ultimately want is publications that contain at least one keyword from the cell line terms AND at least one keyword from the virus terms. I have therefore split dictionary.csv into dictionary_cell_line.csv and dictionary_virus.csv. I think you might be able to pipeline the matching process by running one dictionary first and then running the other one on the results from the first. Please correct me if I am wrong.

Best,
Zhaoqing Cui
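
As a rough sketch of the request above, the snippet below shows that pipelining the two dictionaries gives the same documents as intersecting the two independent match sets. Everything in it is illustrative: `docs_matching`, the document ids, the texts, and the virus terms are hypothetical placeholders, and plain case-sensitive substring matching is an assumption, not a description of how the infrastructure actually matches terms.

```python
def docs_matching(terms, documents):
    """Return the ids of documents containing at least one term.

    Assumes simple case-sensitive substring matching.
    """
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(term in text for term in terms)
    }


# Placeholder data, not taken from the real dictionaries or corpus.
cell_line_terms = ["TER461", "Ter 479"]
virus_terms = ["influenza", "Zika virus"]
documents = {
    "doc1": "TER461 cells were infected with Zika virus.",
    "doc2": "Ter 479 cells were cultured for 48 hours.",
    "doc3": "A report on influenza surveillance.",
}

# Option 1: match each dictionary independently, then intersect the id sets.
both = docs_matching(cell_line_terms, documents) & docs_matching(virus_terms, documents)

# Option 2: pipeline, running the second dictionary only on the first result.
first_pass = docs_matching(cell_line_terms, documents)
pipelined = docs_matching(virus_terms, {d: documents[d] for d in first_pass})

print(sorted(both))
print(both == pipelined)
```

The pipelined form scans fewer documents in the second pass, while the intersection form lets the two scans run independently.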

@iross
Member

iross commented Feb 17, 2020 via email

@ZhaoqingCui
Author

Sure, we can do the overlap between the two sets of documents. I’ve started the process using the new definitions (separated “virus” and “cell_line” terms).

Hi Ian,
Thanks for starting the process. I am wondering when I can receive some results.
