Zhaoqing's Dictionary #3
|
Hi—
Could you explain the column structure of the CSV? Are all of the comma-separated terms within a line synonyms for one entity? Or do they carry some other kind of meaning?
Thanks,
Ian Ross
System Integration Developer
University of Wisconsin-Madison Computer Science Department
Center for High-Throughput Computing
On February 12, 2020 at 9:22:22 PM, Zachary Cui (notifications@github.com) wrote:
Hi,
I am an undergraduate at UW-Madison working on my honors thesis. I need to use DeepDive-Infrastructure to find publications that contain the words in my dictionary. I have updated the relevant files. Please let me know if you have any questions. Thank you very much!
Zhaoqing
You can view, comment on, or merge this pull request online at:
#3
Commit Summary
Update config.yaml
Add files via upload
Add files via upload
Add files via upload
Delete dictionary.csv
Add files via upload
File Changes
M config.yaml (8)
M dictionary.csv (123200)
Patch Links:
https://github.com/UW-Deepdive-Infrastructure/dictionary_example/pull/3.patch
https://github.com/UW-Deepdive-Infrastructure/dictionary_example/pull/3.diff
|
Hi, Yes, all of the comma-separated terms within a line are synonyms for one entity. Zhaoqing |
|
Hi Zhaoqing—
We don’t currently have any internal mechanism for handling synonyms this way. I think the best approach at the moment is to treat each synonym as its own term and scan the literature that way. So the first line would become 9 separate entries:
#15310-LN
15310-LN
TER461
TER-461
Ter 461
TER479
TER-479
Ter 479
Extract 519
If you update it that way, I can start the matching today.
On February 13, 2020 at 10:29:30 AM, Zachary Cui (notifications@github.com) wrote:
Hi,
Yes, all of the comma-separated terms within a line are synonyms for one entity.
Each line describes only one entity. The first term on a line is the most commonly used name; the terms that follow it on that line are its synonyms.
Zhaoqing
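
For reference, the flattening Ian describes (one entry per synonym) could be sketched roughly as below. This is only an illustration, not the project's own tooling: the function name `flatten_synonyms` is hypothetical, the sample row assumes the first line of the CSV is a plain comma-separated list of the nine terms quoted above, and dropping repeated terms is an extra assumption that was not requested in the thread.

```python
import csv
import io


def flatten_synonyms(rows):
    """Yield every non-empty term from rows of synonyms as its own entry.

    Terms are yielded in file order. A term that was already seen is
    skipped (an assumption of this sketch, not something the thread asks for).
    """
    seen = set()
    for row in rows:
        for term in row:
            term = term.strip()  # trim surrounding whitespace only
            if term and term not in seen:
                seen.add(term)
                yield term


# Assumed layout of the first dictionary line: entity name first, then synonyms.
sample = io.StringIO(
    "#15310-LN,15310-LN,TER461,TER-461,Ter 461,TER479,TER-479,Ter 479,Extract 519\n"
)
terms = list(flatten_synonyms(csv.reader(sample)))

# One term per line, ready to be written out as the new dictionary file.
print("\n".join(terms))
```

Using `csv.reader` rather than `str.split(",")` keeps quoted fields intact in case any term itself contains a comma.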
|
Awesome! Updated. Thank you very much! |
Hi Ian, I am not sure whether you have already started the matching. I am sorry, but I need to change my request. Previously, I supplied a single dictionary containing both cell line terms and virus terms. What we actually want, however, is publications that contain at least one keyword from the cell line terms AND at least one keyword from the virus terms. I have therefore split dictionary.csv into dictionary_cell_line.csv and dictionary_virus.csv. I think you might be able to pipeline the matching by applying one dictionary first and then applying the other to the documents returned by the first. Please correct me if I am wrong. Best, |
|
Sure, we can do the overlap between the two sets of documents. I’ve started the process using the new definitions (separated “virus” and “cell_line” terms).
On February 16, 2020 at 2:34:04 PM, Zachary Cui (notifications@github.com) wrote:
Hi Ian,
I am not sure whether you have already started the matching. I am sorry, but I need to change my request.
Previously, I supplied a single dictionary containing both cell line terms and virus terms. What we actually want, however, is publications that contain at least one keyword from the cell line terms AND at least one keyword from the virus terms. I have therefore split dictionary.csv into dictionary_cell_line.csv and dictionary_virus.csv. I think you might be able to pipeline the matching by applying one dictionary first and then applying the other to the documents returned by the first. Please correct me if I am wrong.
Best,
Zhaoqing Cui
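
As a rough illustration of the "at least one term from BOTH dictionaries" requirement, the sketch below computes the overlap of two match sets and checks that the pipelined approach gives the same result. Everything in it is assumed for the example: the `matching_docs` helper is hypothetical, the three documents and the two one-term dictionaries are made-up toy data, and case-insensitive substring matching is a stand-in that may differ from how the actual infrastructure matches terms.

```python
def matching_docs(documents, terms):
    """Return the ids of documents whose text contains at least one term.

    Matching here is a case-insensitive substring test (an assumption of
    this sketch, not necessarily what the real matcher does).
    """
    lowered = [t.lower() for t in terms]
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(t in text.lower() for t in lowered)
    }


# Made-up toy corpus and dictionaries, for illustration only.
documents = {
    "doc1": "HeLa cells were infected with influenza virus.",
    "doc2": "HeLa cells were cultured for 48 hours.",
    "doc3": "Influenza virus spread was modelled.",
}
cell_line_terms = ["HeLa"]
virus_terms = ["influenza"]

# Overlap: documents that match both dictionaries.
cell_hits = matching_docs(documents, cell_line_terms)
virus_hits = matching_docs(documents, virus_terms)
both = cell_hits & virus_hits

# Pipelined variant: apply the second dictionary only to the first result.
survivors = {doc_id: documents[doc_id] for doc_id in cell_hits}
pipelined = matching_docs(survivors, virus_terms)

print(sorted(both))
print(pipelined == both)
```

Both routes select the same documents; pipelining only reduces how much text the second pass has to scan.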
|
Hi Ian, |
Hi,
I am an undergraduate at UW-Madison working on my honors thesis. I need to use DeepDive-Infrastructure to find publications that contain the words in my dictionary. I have updated the relevant files. Please let me know if you have any questions. Thank you very much!
Zhaoqing