Zhaoqing's Dictionary #3

ZhaoqingCui · 2020-02-13T03:22:17Z

Hi,

I am an undergraduate at UW-Madison working on my honors thesis. I need to use DeepDive-Infrastructure to filter for publications that contain the words in my dictionary. I have updated the relevant files. Please let me know if you have any questions. Thank you very much!

Zhaoqing

iross · 2020-02-13T16:22:31Z

Hi,

Could you explain the column structure of the CSV? Are all of the comma-separated terms within a line synonyms for one entity? Or do they carry some other kind of meaning?

Thanks,
Ian Ross
System Integration Developer
University of Wisconsin-Madison
Computer Science Department
Center for High-Throughput Computing

On February 12, 2020 at 9:22:22 PM, Zachary Cui (notifications@github.com) wrote:

Commit Summary
Update config.yaml
Add files via upload
Add files via upload
Add files via upload
Delete dictionary.csv
Add files via upload

File Changes
M config.yaml (8)
M dictionary.csv (123200)

Patch Links:
https://github.com/UW-Deepdive-Infrastructure/dictionary_example/pull/3.patch
https://github.com/UW-Deepdive-Infrastructure/dictionary_example/pull/3.diff

ZhaoqingCui · 2020-02-13T16:29:22Z

Hi,

Yes, all of the comma-separated terms within a line are synonyms for one entity.
Each line describes only one entity. The first term in a line is the most commonly used name, and the terms that follow it in that line are its synonyms.

Zhaoqing

iross · 2020-02-13T16:41:17Z

Hi Zhaoqing,

We don’t currently have any internal mechanism for handling synonyms this way. I think the best approach at the moment is to treat each synonym as its own term and scan the literature that way. So the first line would become 9 separate entries:

#15310-LN
15310-LN
TER461
TER-461
Ter 461
TER479
TER-479
Ter 479
Extract 519

If you update it that way, I can start the matching today.
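
The flattening described above could be scripted rather than done by hand. The following is only a sketch under stated assumptions: the helper name `flatten_synonyms` is hypothetical, and it assumes each row of the dictionary is a plain comma-separated list of synonyms, as confirmed earlier in the thread. The sample row is the first line quoted above, written out with commas for illustration.

```python
import csv
import io


def flatten_synonyms(rows):
    """Return every non-empty term from every row as its own entry.

    Duplicates are dropped and the original order is kept.
    (Hypothetical helper, not part of the repository.)
    """
    seen = set()
    terms = []
    for row in rows:
        for term in row:
            term = term.strip()
            if term and term not in seen:
                seen.add(term)
                terms.append(term)
    return terms


# First dictionary line from the thread, assumed to be comma-separated.
sample = "#15310-LN,15310-LN,TER461,TER-461,Ter 461,TER479,TER-479,Ter 479,Extract 519\n"
entries = flatten_synonyms(csv.reader(io.StringIO(sample)))

# One term per line, ready to be saved as the new dictionary file.
print("\n".join(entries))
```

For a real file, the `io.StringIO(sample)` argument would be replaced by an open file handle for the dictionary CSV.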

ZhaoqingCui · 2020-02-13T17:00:07Z

Awesome! Updated. Thank you very much!

ZhaoqingCui · 2020-02-16T20:33:59Z

Hi Ian,

I am not sure if you have started the matching already. I am sorry that I may have to change my request.

Previously, I supplied only a single dictionary containing both cell line terms and virus terms. However, what we really want in the end is publications that contain at least one keyword from BOTH the cell line terms AND the virus terms. Thus, I have split dictionary.csv into dictionary_cell_line.csv and dictionary_virus.csv. I think you might be able to pipeline the matching process by using one dictionary first and then applying the other one to the results obtained with the first. Please correct me if I am wrong.

Best,
Zhaoqing Cui

iross · 2020-02-17T15:03:59Z

Sure, we can do the overlap between the two sets of documents. I’ve started the process using the new definitions (separated “virus” and “cell_line” terms).
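
The overlap between the two sets of documents can be pictured as a set intersection. The sketch below is illustrative only: the function name `documents_matching_both`, the shape of the inputs (a mapping from each term to the set of document IDs where it was found), and all of the sample terms and IDs are assumptions, not the actual output format of the matching process.

```python
def documents_matching_both(cell_line_hits, virus_hits):
    """Return IDs of documents matched by at least one term from each dictionary.

    Each argument maps a dictionary term to the set of document IDs in
    which that term was found (a hypothetical shape for the match output).
    """
    cell_line_docs = set().union(*cell_line_hits.values())
    virus_docs = set().union(*virus_hits.values())
    return cell_line_docs & virus_docs


# Made-up hits, purely for illustration.
cell_line_hits = {"TER461": {"doc1", "doc2"}, "Extract 519": {"doc3"}}
virus_hits = {"virus A": {"doc2", "doc3"}, "virus B": {"doc4"}}

both = documents_matching_both(cell_line_hits, virus_hits)
print(sorted(both))  # prints ['doc2', 'doc3']
```

Because set intersection does not depend on order, this gives the same documents as running one dictionary first and then applying the other to those results.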

ZhaoqingCui · 2020-02-20T04:22:02Z

Hi Ian,
Thanks for starting the process. I am wondering when I can receive some results.

ZhaoqingCui added 6 commits February 12, 2020 21:14

Update config.yaml (d110221)
Add files via upload (5490b09)
Add files via upload (3015bd7)
Add files via upload (8ca243f)
Delete dictionary.csv (bde8db3)
Add files via upload (1986484)
Add files via upload (b3df63a)

ZhaoqingCui added 2 commits February 15, 2020 15:20

Delete dictionary.csv (e34762b)
Add files via upload (f5fb3d3)
Add files via upload (70d3ac6)
