Changes from all commits (102 commits)
62b9070
Create rvdss_historic.py
cchuong Sep 13, 2024
073aac9
Create rvdss_update.py
cchuong Sep 13, 2024
01af95f
create utils.py for common functions
cchuong Sep 14, 2024
6a002e0
create constants.py and update utils
cchuong Sep 16, 2024
714455c
Update rvdss_historic.py
cchuong Sep 16, 2024
6ee8bb7
Update rvdss_update.py
cchuong Sep 16, 2024
8814554
fix typo and add missing abbreviation to constants
cchuong Sep 17, 2024
d7905c8
fix typo
cchuong Sep 18, 2024
08f908a
add missing geo
cchuong Sep 18, 2024
07ed998
Update constants.py
cchuong Sep 18, 2024
fd5bf15
Revert "Update constants.py"
cchuong Sep 18, 2024
678b468
Revert "add missing geo"
cchuong Sep 18, 2024
4bfc933
fix geo and virus abbreviation
cchuong Sep 20, 2024
e8957c3
remove "province of" from geo_values
cchuong Sep 20, 2024
7720a24
construct urls automatically
nmdefries Sep 23, 2024
59f79bf
comment constants
nmdefries Sep 23, 2024
e70b0e9
note historic urls don't need to be updated
nmdefries Sep 23, 2024
72d1906
be stricter about importing local fns
nmdefries Sep 23, 2024
bf51bd3
move dashboard file names to constants
nmdefries Sep 23, 2024
ee3cadf
move run-the-whole-pipeline code into main()
nmdefries Sep 23, 2024
180e67f
add code to calculate number of positive tests back in
cchuong Sep 24, 2024
6bd6e24
update abbreviate_geo to remove periods and other spelling
cchuong Sep 24, 2024
a7666b8
fix lab name missing province
cchuong Sep 24, 2024
503165e
comment historic script
nmdefries Sep 24, 2024
256e697
move output file names to constants
nmdefries Sep 24, 2024
cd83087
replace boolean comparisons with pythonic "not"
nmdefries Sep 24, 2024
969295b
actually put csv names in constants
nmdefries Sep 25, 2024
00f3f9a
break more helper functions and add doctsrings
cchuong Oct 2, 2024
ecca542
add more comments
cchuong Oct 4, 2024
31ec961
calculate update dates in a new function
cchuong Oct 10, 2024
0be5f08
combine different spellings of labs
cchuong Oct 13, 2024
5696636
change slash to underscore in constants and move more processing code…
cchuong Oct 13, 2024
30f3df6
rvdss interface and new fn layout so current/historical data can be e…
nmdefries Nov 22, 2024
49e67a9
test outline
nmdefries Nov 22, 2024
b0fa747
add regex as dependency
nmdefries Nov 22, 2024
e6d9053
add unidecode as dependency
nmdefries Nov 22, 2024
c7a4203
import relative to delphi.epidata
nmdefries Nov 22, 2024
8c6b555
switch all rvdss tests from unittest to pytest; basic abbr virus tests
nmdefries Nov 22, 2024
c0742fc
Update test_utils.py
cchuong Jan 24, 2025
b2e5013
Add tests and testdata
cchuong Mar 7, 2025
6353b18
Add utils tests and move test data
cchuong Apr 2, 2025
e503e97
Add testdata of historic reports
cchuong Apr 4, 2025
5662a79
Add additional tests and testdata
cchuong Apr 24, 2025
22c8589
Update sql table definitions and add extra na values to historic data…
cchuong Apr 25, 2025
9d22911
Add extra values that should be read as NA and counts with spaces in …
cchuong Apr 25, 2025
901b2fe
update sql keys
cchuong Apr 25, 2025
1214e70
Update pull_historic.py
cchuong Apr 25, 2025
4e5528b
Update rvdss.sql
cchuong May 8, 2025
908df83
remove unused code and table definition
cchuong May 13, 2025
02de2eb
add extra edge cases, and duplicate checking
cchuong May 15, 2025
dc14c71
remove saving to csv
cchuong May 15, 2025
4d283a6
remove scripts for manual testing
cchuong May 15, 2025
58a1478
add extra duplication checks and check if tables exists
cchuong May 16, 2025
f96d926
add logger
cchuong May 25, 2025
7cf0116
add basic integration tests
cchuong May 30, 2025
38bdd50
Combine multiple tables into one
cchuong Jul 7, 2025
5ba4fd5
stop scraping unused table
cchuong Jul 8, 2025
277c44b
skeleton integration tests
cchuong Jul 18, 2025
4e5a051
Merge pull request #1561 from cmu-delphi/ndefries/rvdss-tests
cchuong Jul 18, 2025
f1fde56
Check for duplicate indexes and use merge to combine tables
cchuong Jul 19, 2025
914862f
test extra edge case
cchuong Jul 19, 2025
ffba810
Clean up old comments
cchuong Jul 19, 2025
6932653
cleanup old code to do with unused tables
cchuong Jul 19, 2025
a482227
update tests for new table structure
cchuong Jul 19, 2025
48a1b5a
Update unit tests and remove unused functions/tests
cchuong Aug 1, 2025
9641f02
Update functions to combine tables
cchuong Aug 1, 2025
d757ed5
Update unit tests to check dtype and column order
cchuong Aug 3, 2025
c2389bd
historic report data is a single dict not a list
cchuong Aug 7, 2025
cf49a54
pass both years in the season instead of the startyear
cchuong Aug 7, 2025
34a6a20
Remove unused variables and move logger
cchuong Aug 27, 2025
cf6c95a
Change http to https
cchuong Aug 27, 2025
df8c38d
Merge branch 'dev' into add_rvdss_indicator
nmdefries Aug 28, 2025
9368b63
add RVDSS endpoint and python client support
nmdefries Aug 28, 2025
d44a079
Fix 2012 being used instead of 2013 in the first two weeks of the 201…
cchuong Aug 29, 2025
47f5210
Update patch function to take multiple seasons
cchuong Sep 5, 2025
2563cd4
Add extra NA string and fix combining table function
cchuong Sep 5, 2025
7b70f5f
review feedback. simplify and clean up rvdss server code
nmdefries Sep 16, 2025
35f04ea
export rvdss endpoint
nmdefries Sep 16, 2025
7de9698
update actions/cache version
nmdefries Sep 16, 2025
39c3fc0
Add more edge cases, fix incorrect time_values
cchuong Sep 18, 2025
b2a8053
Convert epiweek and date types
cchuong Sep 18, 2025
a4dd6b1
Insert columns only if they are in the data being inserted
cchuong Sep 25, 2025
aeae17e
Fix geo_type typo
cchuong Sep 25, 2025
ecadfc4
Add time_type and fix typo
cchuong Sep 25, 2025
41a1f52
Combine different lab names that refer to the same lab
cchuong Oct 3, 2025
e39e671
Update categorization of geo_types
cchuong Oct 3, 2025
24c6b9f
Update test data and tests to reflect new geo_type labelling
cchuong Oct 3, 2025
f137320
Merge branch 'ndefries/rvdss-endpoint' into rvdss-integration-tests
cchuong Oct 3, 2025
9483093
add integration test skeletons
cchuong Oct 17, 2025
924e788
move alias to variable
nmdefries Oct 22, 2025
704bde0
remove not null constraint from val columns and switch time_value and…
nmdefries Oct 22, 2025
68f1735
typo in fake credentials
nmdefries Oct 22, 2025
79f7429
switch rvdss to filter on epiweek col instead of time_value
nmdefries Oct 22, 2025
9fc8343
change tests and percent positive to floats and reorder columns
cchuong Oct 24, 2025
61a199a
Change issue and time_value to integers instead of dates, and updated…
cchuong Oct 31, 2025
41b6fc7
change data to dict and update integration tests
cchuong Nov 5, 2025
6b272b6
fix integration test connection mock
nmdefries Nov 7, 2025
a12f696
replace nan with None to match expected output
cchuong Nov 26, 2025
9c07c8a
replace nans with Nones. reset_index to get index cols into SQL
nmdefries Dec 5, 2025
9425d52
Add tests requesting lists and single values of parameters
cchuong Dec 18, 2025
d531cec
remove unused code
cchuong Dec 19, 2025
a40b3ee
Merge pull request #1700 from cmu-delphi/rvdss-integration-tests
nmdefries Dec 19, 2025
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -65,7 +65,7 @@ jobs:
docker network create --driver bridge delphi-net
docker run --rm -d -p 13306:3306 --network delphi-net --name delphi_database_epidata --cap-add=sys_nice delphi_database_epidata
docker run --rm -d -p 6379:6379 --network delphi-net --env "REDIS_PASSWORD=1234" --name delphi_redis delphi_redis


- run: |
wget https://raw.githubusercontent.com/eficode/wait-for/master/wait-for
@@ -42,7 +42,7 @@ def setUp(self):
cur.execute('truncate table covid_hosp_facility_key')
cur.execute('truncate table covid_hosp_meta')
cur.execute('delete from api_user')
-cur.execute('insert into api_user(api_key, email) values ("key", "emai")')
+cur.execute('insert into api_user(api_key, email) values ("key", "email")')

@freeze_time("2021-03-16")
def test_acquire_dataset(self):
4 changes: 4 additions & 0 deletions integrations/acquisition/rvdss/__init__.py
@@ -0,0 +1,4 @@
import sys
import os

sys.path.append(os.getcwd())
221 changes: 221 additions & 0 deletions integrations/acquisition/rvdss/test_scenarios.py
@@ -0,0 +1,221 @@
"""Integration tests for acquisition of rvdss data."""
# standard library
import unittest
from unittest.mock import MagicMock, patch
from copy import copy

# first party
from delphi.epidata.client.delphi_epidata import Epidata
from delphi.epidata.acquisition.rvdss.database import update, rvdss_cols, get_num_rows
import delphi.operations.secrets as secrets
from delphi_utils import get_structured_logger

# third party
import mysql.connector
from mysql.connector.errors import IntegrityError
import pandas as pd
import numpy as np
from pathlib import Path
import pdb

# py3tester coverage target (equivalent to `import *`)
# __test_target__ = 'delphi.epidata.acquisition.covid_hosp.facility.update'

NEWLINE="\n"

class AcquisitionTests(unittest.TestCase):
    logger = get_structured_logger()

    def setUp(self):
        """Perform per-test setup."""

        # configure test data
        # self.test_utils = UnitTestUtils(__file__)

        # use the local instance of the Epidata API
        Epidata.BASE_URL = 'http://delphi_web_epidata/epidata'
        Epidata.auth = ('epidata', 'key')

        # use the local instance of the epidata database
        secrets.db.host = 'delphi_database_epidata'
        secrets.db.epi = ('user', 'pass')

        # clear relevant tables
        epidata_cnx = mysql.connector.connect(
            user='user',
            password='pass',
            host='delphi_database_epidata',
            database='epidata')
        epidata_cur = epidata_cnx.cursor()

        epidata_cur.execute('truncate table rvdss')
        epidata_cur.execute('DELETE from api_user')
        epidata_cur.execute('INSERT INTO api_user(api_key, email) VALUES ("key", "email")')
        epidata_cnx.commit()
        epidata_cur.close()
        #epidata_cnx.close()

        # make connection and cursor available to test cases
        self.cnx = epidata_cnx
        self.cur = epidata_cnx.cursor()

    def tearDown(self):
        """Perform per-test teardown."""
        self.cur.close()
        self.cnx.close()

@patch("mysql.connector.connect")
def test_rvdss_repiratory_detections(self, mock_sql):
connection_mock = MagicMock()

TEST_DIR = Path(__file__).parent.parent.parent.parent
detection_data = pd.read_csv(str(TEST_DIR) + "/testdata/acquisition/rvdss/RVD_CurrentWeekTable_Formatted.csv")
detection_data['time_type'] = "week"

# get the index of the subset of data we want to use
subset_index = detection_data[(detection_data['geo_value'].isin(['nl', 'nb'])) &
(detection_data['time_value'].isin([20240831, 20240907]))].index


# change issue so the data has more than one
detection_data.loc[subset_index,"issue"] = 20250227

# take a small subset just for testing insertion
detection_subset = detection_data.loc[subset_index]

# get the expected response when calling the API
# the dataframe needs to add the missing columns and replace nan with None
# since that is what is returned from the API
df = detection_subset.reindex(rvdss_cols,axis=1)
df = df.replace({np.nan: None}).sort_values(by=["epiweek","geo_value"])
df = df.to_dict(orient = "records")

expected_response = {"epidata": df,
"result": 1,
"message": "success",
}

# get another subset of the data not in the subset to test more calling options
detection_subset2 = detection_data[(detection_data['geo_value'].isin(['nu', 'nt'])) & (detection_data['time_value'].isin([20240831, 20240907])) ]

df2 = detection_subset2.reindex(rvdss_cols,axis=1)
df2 = df2.replace({np.nan: None}).sort_values(by=["epiweek","geo_value"])
df2 = df2.to_dict(orient = "records")

expected_response2 = {"epidata": df2,
"result": 1,
"message": "success",
}

# get another subset of the data for a single geo_value with multiple issues
subset_index2 = detection_data[(detection_data['geo_value'].isin(['ouest du québec'])) &
(detection_data['time_value'].isin([20240831, 20240907]))].index

detection_data.loc[subset_index2,"issue"] = [20250220,20250227]
detection_data.loc[subset_index2,"epiweek"] = [202435,202435]
detection_data.loc[subset_index2,"time_value"] = [20240831,20240831]

detection_subset3 = detection_data.loc[subset_index2]
df3 = detection_subset3.reindex(rvdss_cols,axis=1)
df3 = df3.replace({np.nan: None}).sort_values(by=["epiweek","geo_value"])
df3 = df3.to_dict(orient = "records")

expected_response3 = {"epidata": df3,
"result": 1,
"message": "success",
}

# make sure the data does not yet exist
with self.subTest(name='no data yet'):
response = Epidata.rvdss(geo_type='province',
time_values= [202435, 202436],
geo_value = ['nl','nb'])
self.assertEqual(response['result'], -2, response)

# acquire sample data into local database
with self.subTest(name='first acquisition'):
# When the MagicMock connection's `cursor()` method is called, return
# a real cursor made from the current open connection `cnx`.
connection_mock.cursor.return_value = self.cnx.cursor()
# Commit via the current open connection `cnx`, from which the cursor
# is derived
connection_mock.commit = self.cnx.commit
mock_sql.return_value = connection_mock

update(detection_subset, self.logger)

response = Epidata.rvdss(geo_type='province',
time_values= [202435, 202436],
geo_value = ['nl','nb'])

self.assertEqual(response,expected_response)

with self.subTest(name='duplicate aquisition'):
# The main run function checks if the update has already been fetched/updated
# so it should never run twice, and duplocate aquisitions should never
# occur. Running the update twice will result in an error

# When the MagicMock connection's `cursor()` method is called, return
# a real cursor made from the current open connection `cnx`.
connection_mock.cursor.return_value = self.cnx.cursor()
# Commit via the current open connection `cnx`, from which the cursor
# is derived
connection_mock.commit = self.cnx.commit
mock_sql.return_value = connection_mock

with self.assertRaises(mysql.connector.errors.IntegrityError):
update(detection_subset, self.logger)

# Request with exact column order
with self.subTest(name='exact column order'):
rvdss_cols_subset = [col for col in detection_subset2.columns if col in rvdss_cols]
ordered_cols = [col for col in rvdss_cols if col in rvdss_cols_subset]
ordered_df = detection_subset2[ordered_cols]

connection_mock.cursor.return_value = self.cnx.cursor()
connection_mock.commit = self.cnx.commit
mock_sql.return_value = connection_mock

update(ordered_df, self.logger)

response = Epidata.rvdss(geo_type='province',
time_values= [202435, 202436],
geo_value = ['nt','nu'])

self.assertEqual(response,expected_response2)


# request by issue
with self.subTest(name='issue request'):
response = Epidata.rvdss(geo_type='province',
time_values= [202435, 202436],
geo_value = ['nl','nb'],
issues = 20250227)

self.assertEqual(response,expected_response)


# check requesting lists vs single values
with self.subTest(name='duplicate aquisition'):
# * with geo_value, single geo_type, time_value, issue
connection_mock.cursor.return_value = self.cnx.cursor()
connection_mock.commit = self.cnx.commit
mock_sql.return_value = connection_mock

update(detection_subset3, self.logger)

response = Epidata.rvdss(geo_type='province',
time_values= [202435, 202436],
geo_value = "*",
issues = 20250227)

response2 = Epidata.rvdss(geo_type='lab',
time_values= 202435,
geo_value = 'ouest du québec',
issues = [20250220,20250227])

self.assertEqual(response,expected_response)
self.assertEqual(response2,expected_response3)
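
A note on the mocking pattern in test_scenarios.py above: database.update() opens its own MySQL connection, so the test patches mysql.connector.connect and hands back a MagicMock whose cursor() and commit delegate to the connection the test already holds, which is what lets the subsequent Epidata.rvdss() calls see the inserted rows. The sketch below is a minimal, self-contained illustration of that pattern; write_row and real_cnx are hypothetical names for illustration, not code from this PR.

# Minimal sketch (not from this PR) of the connection-mocking pattern used above.
from unittest.mock import MagicMock, patch

import mysql.connector


def write_row(value):
    # Stand-in for code under test: it opens its own connection, writes, and commits.
    cnx = mysql.connector.connect(user="user", password="pass", host="db", database="epidata")
    cur = cnx.cursor()
    cur.execute("INSERT INTO t (val) VALUES (%s)", (value,))
    cnx.commit()


def run_with_mocked_connection(real_cnx):
    # Reroute the connection the code under test creates so its cursor() and
    # commit() operate on the connection owned by the test.
    connection_mock = MagicMock()
    connection_mock.cursor.return_value = real_cnx.cursor()
    connection_mock.commit = real_cnx.commit
    with patch("mysql.connector.connect", return_value=connection_mock):
        write_row(42)  # the insert lands in the database behind real_cnx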



74 changes: 74 additions & 0 deletions integrations/server/test_rvdss.py
@@ -0,0 +1,74 @@
# first party
from delphi.epidata.common.integration_test_base_class import DelphiTestBase


class rvdssTest(DelphiTestBase):
"""Basic integration tests for rvdss endpoint."""

def localSetUp(self):
self.truncate_tables_list = ["rvdss"]

def test_rvdss_repiratory_detections(self):
"""Basic integration test for rvdss endpoint"""
self.cur.execute(
"INSERT INTO `rvdss` (`epiweek`, `time_value`,`time_type`, `issue`, `geo_type`, `geo_value`, `sarscov2_tests`, `sarscov2_positive_tests`, `sarscov2_pct_positive`, `flu_tests`, `flu_positive_tests`, `flu_pct_positive`, `fluah1n1pdm09_positive_tests`, `fluah3_positive_tests`, `fluauns_positive_tests`, `flua_positive_tests`, `flua_tests`, `flua_pct_positive`, `flub_positive_tests`, `flub_tests`, `flub_pct_positive`, `rsv_tests`, `rsv_positive_tests`, `rsv_pct_positive`, `hpiv_tests`, `hpiv1_positive_tests`, `hpiv2_positive_tests`, `hpiv3_positive_tests`, `hpiv4_positive_tests`, `hpivother_positive_tests`, `hpiv_positive_tests`, `hpiv_pct_positive`, `adv_tests`, `adv_positive_tests`, `adv_pct_positive`, `hmpv_tests`, `hmpv_positive_tests`, `hmpv_pct_positive`, `evrv_tests`, `evrv_positive_tests`, `evrv_pct_positive`, `hcov_tests`, `hcov_positive_tests`, `hcov_pct_positive`, `year`) VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",
(201212,20120412,"week",20120417,'province','on',10.0,1.0,10.0,20.0, 10.0, 50.0, 1.0, 1.0, 1.0, 5.0, 20.0, 25.0, 4.0, 20.0, 20.0, 30.0, 3.0, 10.0, 40.0, 2.0, 2.0, 2.0, 1.0, 1.0, 8.0, 20.0, 40.0, 16.0, 40.0, 10.0, 2.0, 20.0, 1.0, 0.0, 0.0, 24.0, 3.0, 12.5, 2012),
)
self.cnx.commit()

response = self.epidata_client.rvdss(geo_type="province", time_values = 201212,geo_value="on")
self.assertEqual(
response,
{
"epidata": [
{ "geo_type":"province",
"geo_value":"on",
"time_type":"week",
"epiweek":201212,
"time_value":20120412,
"issue":20120417,
"year":2012,
"adv_pct_positive":40.0,
"adv_positive_tests":16.0,
"adv_tests":40.0,
"evrv_pct_positive":0.0,
"evrv_positive_tests":0.0,
"evrv_tests":1.0,
"flu_pct_positive":50.0,
"flu_positive_tests":10.0,
"flu_tests":20.0,
"flua_pct_positive":25.0,
"flua_positive_tests":5.0,
"flua_tests":20.0,
"fluah1n1pdm09_positive_tests":1.0,
"fluah3_positive_tests":1.0,
"fluauns_positive_tests":1.0,
"flub_pct_positive":20.0,
"flub_positive_tests":4.0,
"flub_tests":20.0,
"hcov_pct_positive":12.5,
"hcov_positive_tests":3.0,
"hcov_tests":24.0,
"hmpv_pct_positive":20.0,
"hmpv_positive_tests":2.0,
"hmpv_tests":10.0,
"hpiv1_positive_tests":2.0,
"hpiv2_positive_tests":2.0,
"hpiv3_positive_tests":2.0,
"hpiv4_positive_tests":1.0,
"hpiv_pct_positive":20.0,
"hpiv_positive_tests":8.0,
"hpiv_tests":40.0,
"hpivother_positive_tests":1.0,
"rsv_pct_positive":10.0,
"rsv_positive_tests":3.0,
"rsv_tests":30.0,
"sarscov2_pct_positive":10.0,
"sarscov2_positive_tests":1.0,
"sarscov2_tests":10.0
}
],
"result": 1,
"message": "success",
},
)
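
For reference, both test files exercise the new endpoint through the same Python client call. A minimal usage sketch, assuming a reachable Epidata instance and an API key (the placeholder value below is not real), built from the parameters and response shape shown in the tests above:

from delphi.epidata.client.delphi_epidata import Epidata

# Assumed setup: authenticate the client (placeholder key).
Epidata.auth = ("epidata", "your-api-key")

# Parameters mirror the test calls above: geo_type, time_values (epiweeks),
# geo_value, and optionally issues.
response = Epidata.rvdss(geo_type="province", time_values=201212, geo_value="on")
if response["result"] == 1:
    for row in response["epidata"]:
        print(row["geo_value"], row["epiweek"], row["sarscov2_pct_positive"])
else:
    print("no data:", response["message"])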
2 changes: 2 additions & 0 deletions requirements.api.txt
@@ -9,11 +9,13 @@ pandas==1.2.3
python-dotenv==0.15.0
pyyaml
redis==3.5.3
regex
requests==2.32.4
scipy==1.10.0
sentry-sdk[flask]
SQLAlchemy==1.4.40
structlog==22.1.0
tenacity==7.0.0
typing-extensions
unidecode
werkzeug==3.0.6
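
regex and unidecode are the two dependencies added here; per the commit history they support normalizing scraped geography and lab names (removing accents, periods, and spelling variants). The snippet below is a hypothetical illustration of that kind of normalization, not the indicator's actual abbreviate_geo or lab-name handling:

# Hypothetical example only; the real normalization lives in the rvdss indicator code.
import regex
from unidecode import unidecode

def normalize_name(name: str) -> str:
    ascii_name = unidecode(name)                     # e.g. "Québec" -> "Quebec"
    ascii_name = ascii_name.lower().strip()
    ascii_name = regex.sub(r"\.", "", ascii_name)    # drop periods, e.g. "b.c." -> "bc"
    return regex.sub(r"\s+", " ", ascii_name)        # collapse internal whitespace

print(normalize_name("Ouest du Québec "))  # -> "ouest du quebec"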
3 changes: 3 additions & 0 deletions requirements.dev.txt
@@ -19,3 +19,6 @@ selenium==4.7.2
sqlalchemy-stubs>=0.3
tenacity==7.0.0
xlrd==2.0.1
bs4
mock
requests_file