-
Notifications
You must be signed in to change notification settings - Fork 0
05 participants #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
05 participants #26
Conversation
…ter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit. Updating branch to match latest state of project.
neverett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some feedback, and i assume you'll be adding to & cleaning up the testing code a bit more, but overall this looks great, ty!
tasks/05_participants/task.py
Outdated
| AND c.participants_raw <> '' | ||
| AND e.participants_parse_error IS NULL | ||
| OR e.participants_parse_error = true | ||
| limit 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this limit in place for testing? if so, you'll want to remove it before merging
tasks/05_participants/task.py
Outdated
| c = cnx.cursor() | ||
| c.execute("select count(*) from pages;") | ||
| t = time.time() - t1 | ||
| part_rate = round((n - c.rowcount) / t, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlite3 connection objects don't have a rowcount attribute, so you'll need to check the db type here, similar to line 33 above
| def html_raw_participants(html_str: str) -> list: | ||
| """ | ||
| Reads in an html string from the `raw_text` column in the `pages` table, | ||
| finds the participants table, collects the rows in the table, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor suggestion, but you may want to distinguish between database and HTML tables in comments for clarity, e.g. "Reads in an html string from the raw_text column in the pages database table, finds the participants HTML table in the string, collects the rows in the HTML table...."
| def add_participant_row(case_id: int, r: list): | ||
| # insert relevant info to participants table in the db | ||
| try: | ||
| if db_config.db_type == "sqlite": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this if/elif block that sets the query can be moved above and outside of the try/except block, since it's unlikely to cause an exception
tasks/05_participants/task.py
Outdated
|
|
||
| def main(): | ||
| participants_query = """ | ||
| SELECT c.id as case_id, c.case_number, c.participants_raw, e.participants_parse_error, p.raw_text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
putting a note in after our conversation today: we probably only need the raw_text column from the pages column, so references to the cases table and participants_raw column can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up changing this query. Instead of referencing the cases table, it now selects only from the pages table where the html in the raw_text column contains a participants table element.
|
I've made some revisions to most aspects of the task, especially in the testing process. I ran this on the full
Here's a subsample of the |
A draft for the task in #4 . Three main places need attention:
task.pyis not currently using the error table implemented in Move error logging out of cases table #22 .