IN 1333 - limit article feed by date via env var #162
Merged
5 commits
65eb590 Update dependencies and some linting (ghukill)
10ae819 Add carbon run failure reason to logging (ghukill)
8e45d04 Move aws cli env var into command (ghukill)
4e22ce8 Support incremental articles run based on env var (ghukill)
1bbce7c Instantiate config when needed and support optional env vars (ghukill)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,4 +14,5 @@ vendor/ | |
| .env | ||
| *.log | ||
| output/ | ||
| .vscode/ | ||
| .vscode/ | ||
| .idea | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,9 +5,10 @@ | |
| from typing import IO, Any, ClassVar | ||
|
|
||
| from lxml import etree as ET | ||
| from sqlalchemy import func, select | ||
| from sqlalchemy import func, select, text | ||
| from sqlalchemy.sql.selectable import Select | ||
|
|
||
| from carbon.config import Config | ||
| from carbon.database import DatabaseEngine, aa_articles, dlcs, orcids, persons | ||
| from carbon.helpers import ( | ||
| get_group_name, | ||
|
|
@@ -107,13 +108,35 @@ class ArticlesXmlFeed(BaseXmlFeed): | |
| """Articles XML feed class.""" | ||
|
|
||
| root_element_name = "ARTICLES" | ||
| query = ( | ||
| select(aa_articles) | ||
| .where(aa_articles.c.ARTICLE_ID.is_not(None)) | ||
| .where(aa_articles.c.ARTICLE_TITLE.is_not(None)) | ||
| .where(aa_articles.c.DOI.is_not(None)) | ||
| .where(aa_articles.c.MIT_ID.is_not(None)) | ||
| ) | ||
|
|
||
| @property | ||
| def query(self) -> Select: # type: ignore[override] | ||
| """Build data warehouse query for Articles. | ||
|
|
||
| If the env var "ARTICLES_PUBLISH_DAYS_PAST" is set, filter the query to rows | ||
| where PUBLISH_DATE is on or after the current date minus this many days. Note that | ||
| the PUBLISH_DATE can be in the future, so an article may be included multiple times | ||
| in the XML output until its future date has passed. | ||
| """ | ||
| config = Config() | ||
|
|
||
| query_object = ( | ||
| select(aa_articles) | ||
| .where(aa_articles.c.ARTICLE_ID.is_not(None)) | ||
| .where(aa_articles.c.ARTICLE_TITLE.is_not(None)) | ||
| .where(aa_articles.c.DOI.is_not(None)) | ||
| .where(aa_articles.c.MIT_ID.is_not(None)) | ||
| ) | ||
|
|
||
| if config.ARTICLES_PUBLISH_DAYS_PAST: | ||
| query_object = query_object.where( | ||
| text( | ||
| "TO_DATE(PUBLISH_DATE, 'MM/DD/YYYY') >= " | ||
| f"SYSDATE - {int(config.ARTICLES_PUBLISH_DAYS_PAST)}" | ||
| ) | ||
| ) | ||
|
|
||
| return query_object | ||
|
|
||
| def _add_element(self, record: dict[str, Any]) -> ET._Element: | ||
| """Create an XML element representing an article. | ||
|
|
@@ -163,119 +186,121 @@ class PeopleXmlFeed(BaseXmlFeed): | |
| attribute of the root 'records' element when serialized. | ||
| """ | ||
|
|
||
| areas: tuple[str, ...] = ( | ||
| "ARCHITECTURE & PLANNING AREA", | ||
| "ENGINEERING AREA", | ||
| "HUMANITIES, ARTS, & SOCIAL SCIENCES AREA", | ||
| "SCIENCE AREA", | ||
| "SLOAN SCHOOL OF MANAGEMENT AREA", | ||
| "VP RESEARCH", | ||
| "CHANCELLOR'S AREA", | ||
| "OFFICE OF PROVOST AREA", | ||
| "PROVOST AREA", | ||
| ) | ||
| ps_codes: tuple[str, ...] = ( | ||
| "CFAN", | ||
| "CFAT", | ||
| "CFEL", | ||
| "CSRS", | ||
| "CSRR", | ||
| "COAC", | ||
| "COAR", | ||
| "L303", | ||
| ) | ||
| titles: tuple[str, ...] = ( | ||
| "ADJUNCT ASSOCIATE PROFESSOR", | ||
| "ADJUNCT PROFESSOR", | ||
| "AFFILIATED ARTIST", | ||
| "ASSISTANT PROFESSOR", | ||
| "ASSOCIATE PROFESSOR", | ||
| "ASSOCIATE PROFESSOR (NOTT)", | ||
| "ASSOCIATE PROFESSOR (WOT)", | ||
| "ASSOCIATE PROFESSOR OF THE PRACTICE", | ||
| "INSTITUTE OFFICIAL - EMERITUS", | ||
| "INSTITUTE PROFESSOR (WOT)", | ||
| "INSTITUTE PROFESSOR EMERITUS", | ||
| "INSTRUCTOR", | ||
| "LECTURER", | ||
| "LECTURER II", | ||
| "POSTDOCTORAL ASSOCIATE", | ||
| "POSTDOCTORAL FELLOW", | ||
| "PRINCIPAL RESEARCH ASSOCIATE", | ||
| "PRINCIPAL RESEARCH ENGINEER", | ||
| "PRINCIPAL RESEARCH SCIENTIST", | ||
| "PROFESSOR", | ||
| "PROFESSOR (NOTT)", | ||
| "PROFESSOR (WOT)", | ||
| "PROFESSOR EMERITUS", | ||
| "PROFESSOR OF THE PRACTICE", | ||
| "RESEARCH ASSOCIATE", | ||
| "RESEARCH ENGINEER", | ||
| "RESEARCH FELLOW", | ||
| "RESEARCH SCIENTIST", | ||
| "RESEARCH SPECIALIST", | ||
| "SENIOR LECTURER", | ||
| "SENIOR POSTDOCTORAL ASSOCIATE", | ||
| "SENIOR POSTDOCTORAL FELLOW", | ||
| "SENIOR RESEARCH ASSOCIATE", | ||
| "SENIOR RESEARCH ENGINEER", | ||
| "SENIOR RESEARCH SCIENTIST", | ||
| "SENIOR RESEARCH SCIENTIST (MAP)", | ||
| "SPONSORED RESEARCH TECHNICAL STAFF", | ||
| "SPONSORED RESEARCH TECHNICAL SUPERVISOR", | ||
| "STAFF AFFILIATE", | ||
| "TECHNICAL ASSISTANT", | ||
| "TECHNICAL ASSOCIATE", | ||
| "VISITING ASSISTANT PROFESSOR", | ||
| "VISITING ASSOCIATE PROFESSOR", | ||
| "VISITING ENGINEER", | ||
| "VISITING LECTURER", | ||
| "VISITING PROFESSOR", | ||
| "VISITING RESEARCH ASSOCIATE", | ||
| "VISITING SCHOLAR", | ||
| "VISITING SCIENTIST", | ||
| "VISITING SENIOR LECTURER", | ||
| "PART-TIME FLEXIBLE/LL", | ||
| ) | ||
|
|
||
| symplectic_elements_namespace: str = "http://www.symplectic.co.uk/hrimporter" | ||
| namespace_mapping: ClassVar[dict] = {None: symplectic_elements_namespace} | ||
|
|
||
| root_element_name: str = str(ET.QName(symplectic_elements_namespace, tag="records")) | ||
| query = ( | ||
| select( | ||
| persons.c.MIT_ID, | ||
| persons.c.KRB_NAME_UPPERCASE, | ||
| persons.c.FIRST_NAME, | ||
| persons.c.MIDDLE_NAME, | ||
| persons.c.LAST_NAME, | ||
| persons.c.EMAIL_ADDRESS, | ||
| persons.c.DATE_TO_FACULTY, | ||
| persons.c.ORIGINAL_HIRE_DATE, | ||
| dlcs.c.DLC_NAME, | ||
| persons.c.PERSONNEL_SUBAREA_CODE, | ||
| persons.c.APPOINTMENT_END_DATE, | ||
| orcids.c.ORCID, | ||
| dlcs.c.ORG_HIER_SCHOOL_AREA_NAME, | ||
| dlcs.c.HR_ORG_LEVEL5_NAME, | ||
|
|
||
| @property | ||
| def query(self) -> Select: # type: ignore[override] | ||
|
Comment on lines +193 to +194 (Contributor, Author):
Moved this into a
||
| areas: tuple[str, ...] = ( | ||
| "ARCHITECTURE & PLANNING AREA", | ||
| "ENGINEERING AREA", | ||
| "HUMANITIES, ARTS, & SOCIAL SCIENCES AREA", | ||
| "SCIENCE AREA", | ||
| "SLOAN SCHOOL OF MANAGEMENT AREA", | ||
| "VP RESEARCH", | ||
| "CHANCELLOR'S AREA", | ||
| "OFFICE OF PROVOST AREA", | ||
| "PROVOST AREA", | ||
| ) | ||
| .select_from(persons) | ||
| .outerjoin(orcids) | ||
| .join(dlcs) | ||
| .where(persons.c.EMAIL_ADDRESS.is_not(None)) | ||
| .where(persons.c.LAST_NAME.is_not(None)) | ||
| .where(persons.c.KRB_NAME_UPPERCASE.is_not(None)) | ||
| .where(persons.c.KRB_NAME_UPPERCASE != "UNKNOWN") | ||
| .where(persons.c.MIT_ID.is_not(None)) | ||
| .where(persons.c.ORIGINAL_HIRE_DATE.is_not(None)) | ||
| .where( | ||
| persons.c.APPOINTMENT_END_DATE # noqa: SIM300 | ||
| >= datetime(2009, 1, 1) # noqa: DTZ001 | ||
| ps_codes: tuple[str, ...] = ( | ||
| "CFAN", | ||
| "CFAT", | ||
| "CFEL", | ||
| "CSRS", | ||
| "CSRR", | ||
| "COAC", | ||
| "COAR", | ||
| "L303", | ||
| ) | ||
| titles: tuple[str, ...] = ( | ||
| "ADJUNCT ASSOCIATE PROFESSOR", | ||
| "ADJUNCT PROFESSOR", | ||
| "AFFILIATED ARTIST", | ||
| "ASSISTANT PROFESSOR", | ||
| "ASSOCIATE PROFESSOR", | ||
| "ASSOCIATE PROFESSOR (NOTT)", | ||
| "ASSOCIATE PROFESSOR (WOT)", | ||
| "ASSOCIATE PROFESSOR OF THE PRACTICE", | ||
| "INSTITUTE OFFICIAL - EMERITUS", | ||
| "INSTITUTE PROFESSOR (WOT)", | ||
| "INSTITUTE PROFESSOR EMERITUS", | ||
| "INSTRUCTOR", | ||
| "LECTURER", | ||
| "LECTURER II", | ||
| "POSTDOCTORAL ASSOCIATE", | ||
| "POSTDOCTORAL FELLOW", | ||
| "PRINCIPAL RESEARCH ASSOCIATE", | ||
| "PRINCIPAL RESEARCH ENGINEER", | ||
| "PRINCIPAL RESEARCH SCIENTIST", | ||
| "PROFESSOR", | ||
| "PROFESSOR (NOTT)", | ||
| "PROFESSOR (WOT)", | ||
| "PROFESSOR EMERITUS", | ||
| "PROFESSOR OF THE PRACTICE", | ||
| "RESEARCH ASSOCIATE", | ||
| "RESEARCH ENGINEER", | ||
| "RESEARCH FELLOW", | ||
| "RESEARCH SCIENTIST", | ||
| "RESEARCH SPECIALIST", | ||
| "SENIOR LECTURER", | ||
| "SENIOR POSTDOCTORAL ASSOCIATE", | ||
| "SENIOR POSTDOCTORAL FELLOW", | ||
| "SENIOR RESEARCH ASSOCIATE", | ||
| "SENIOR RESEARCH ENGINEER", | ||
| "SENIOR RESEARCH SCIENTIST", | ||
| "SENIOR RESEARCH SCIENTIST (MAP)", | ||
| "SPONSORED RESEARCH TECHNICAL STAFF", | ||
| "SPONSORED RESEARCH TECHNICAL SUPERVISOR", | ||
| "STAFF AFFILIATE", | ||
| "TECHNICAL ASSISTANT", | ||
| "TECHNICAL ASSOCIATE", | ||
| "VISITING ASSISTANT PROFESSOR", | ||
| "VISITING ASSOCIATE PROFESSOR", | ||
| "VISITING ENGINEER", | ||
| "VISITING LECTURER", | ||
| "VISITING PROFESSOR", | ||
| "VISITING RESEARCH ASSOCIATE", | ||
| "VISITING SCHOLAR", | ||
| "VISITING SCIENTIST", | ||
| "VISITING SENIOR LECTURER", | ||
| "PART-TIME FLEXIBLE/LL", | ||
| ) | ||
|
|
||
| return ( | ||
| select( | ||
| persons.c.MIT_ID, | ||
| persons.c.KRB_NAME_UPPERCASE, | ||
| persons.c.FIRST_NAME, | ||
| persons.c.MIDDLE_NAME, | ||
| persons.c.LAST_NAME, | ||
| persons.c.EMAIL_ADDRESS, | ||
| persons.c.DATE_TO_FACULTY, | ||
| persons.c.ORIGINAL_HIRE_DATE, | ||
| dlcs.c.DLC_NAME, | ||
| persons.c.PERSONNEL_SUBAREA_CODE, | ||
| persons.c.APPOINTMENT_END_DATE, | ||
| orcids.c.ORCID, | ||
| dlcs.c.ORG_HIER_SCHOOL_AREA_NAME, | ||
| dlcs.c.HR_ORG_LEVEL5_NAME, | ||
| ) | ||
| .select_from(persons) | ||
| .outerjoin(orcids) | ||
| .join(dlcs) | ||
| .where(persons.c.EMAIL_ADDRESS.is_not(None)) | ||
| .where(persons.c.LAST_NAME.is_not(None)) | ||
| .where(persons.c.KRB_NAME_UPPERCASE.is_not(None)) | ||
| .where(persons.c.KRB_NAME_UPPERCASE != "UNKNOWN") | ||
| .where(persons.c.MIT_ID.is_not(None)) | ||
| .where(persons.c.ORIGINAL_HIRE_DATE.is_not(None)) | ||
| .where( | ||
| persons.c.APPOINTMENT_END_DATE # noqa: SIM300 | ||
| >= datetime(2009, 1, 1) # noqa: DTZ001 | ||
| ) | ||
| .where(func.upper(dlcs.c.ORG_HIER_SCHOOL_AREA_NAME).in_(areas)) | ||
| .where(persons.c.PERSONNEL_SUBAREA_CODE.in_(ps_codes)) | ||
| .where(func.upper(persons.c.JOB_TITLE).in_(titles)) | ||
| ) | ||
| .where(func.upper(dlcs.c.ORG_HIER_SCHOOL_AREA_NAME).in_(areas)) | ||
| .where(persons.c.PERSONNEL_SUBAREA_CODE.in_(ps_codes)) | ||
| .where(func.upper(persons.c.JOB_TITLE).in_(titles)) | ||
| ) | ||
|
|
||
| def _add_element(self, record: dict[str, Any]) -> ET._Element: | ||
| """Create an XML element representing a person. | ||
|
|
||
Moving into a `@property` allows for two things:
`Config()` object after the testing harness and env vars are set up
Hmm, what do you mean by "testing harness"? 🤔
Sorry, could have been more clear.
We set env vars in `conftest.py` as part of the `_test_env` fixture, but these aren't set until after imports take place in our files.
Originally, I had a:
at the top of `feed.py`, but this failed because the "testing harness" -- which includes the fixtures and generally anything else you'd expect to be "ready" for testing -- was not fully ready, and so the required env vars weren't set.
It worked locally when I had env vars set, and it would have worked in prod where they are also set, but not for testing.
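The timing problem described above can be sketched in a few lines. This is a simplified illustration, not the project's code: the `Config` class here is a hypothetical stand-in for `carbon.config.Config`, `EXAMPLE_REQUIRED_VAR` and `ExampleFeed` are invented names, and the query is a plain string rather than a SQLAlchemy `Select`. Setting env vars partway through the script stands in for a pytest fixture that runs after imports.

```python
import os

# Start clean, mimicking a test run before any fixture has executed.
os.environ.pop("EXAMPLE_REQUIRED_VAR", None)
os.environ.pop("ARTICLES_PUBLISH_DAYS_PAST", None)


class Config:
    """Hypothetical stand-in: reads env vars at instantiation time."""

    def __init__(self) -> None:
        # Required variable: raises KeyError when it is not set yet.
        self.EXAMPLE_REQUIRED_VAR = os.environ["EXAMPLE_REQUIRED_VAR"]
        # Optional variable: None when unset.
        self.ARTICLES_PUBLISH_DAYS_PAST = os.environ.get("ARTICLES_PUBLISH_DAYS_PAST")


# Module-level instantiation runs at import time, before fixtures set env vars.
try:
    Config()
    import_time_failed = False
except KeyError:
    import_time_failed = True


class ExampleFeed:
    """Hypothetical feed that defers Config() until the query is requested."""

    @property
    def query(self) -> str:
        config = Config()  # instantiated on access, not on import
        sql = "SELECT * FROM articles"
        if config.ARTICLES_PUBLISH_DAYS_PAST:
            days = int(config.ARTICLES_PUBLISH_DAYS_PAST)
            sql += f" WHERE publish_date >= SYSDATE - {days}"
        return sql


# Stand-in for the fixture: env vars appear only after the class is defined.
os.environ["EXAMPLE_REQUIRED_VAR"] = "test"
os.environ["ARTICLES_PUBLISH_DAYS_PAST"] = "30"

print(import_time_failed)   # True
print(ExampleFeed().query)  # SELECT * FROM articles WHERE publish_date >= SYSDATE - 30
```

Because the property is evaluated on each access, the query also picks up changes to the optional env var between tests, at the cost of rebuilding the config object every time.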