This project processes clinical data from https://clinicaltrials.gov/, specifically focusing on returning estimated or actual enrollment for studies. It can also search for studies within a specific date range. This project can be used as a tool for analyzing trends, assessing study participation, and gaining insights into the scale of clinical trials over time. The project was developed using PyCharm.
- Import all necessary libraries (`BeautifulSoup`, `webdriver`, `pandas`).
- Create a list of study IDs containing the specific identifiers for each study (e.g., 'NCT01714739', 'NCT05844007').
- Set up Chrome WebDriver for headless browsing.
- Loop through each `nct_id` in the study IDs list.
- For each ID, generate the specific URL and use Selenium to load the page.
- Parse the HTML page with BeautifulSoup to find the section about enrollment information.
- Extract and return the estimated/actual enrollment.
- Save the data to an Excel file (`output.xlsx`).
- In the Excel sheet, add a column that calculates the difference between the actual and estimated enrollment.
- Export the final Excel file.
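
The core of the steps above (build a URL per study ID, pull the enrollment figure out of the page, compute the difference) can be sketched roughly as follows. This is a minimal illustration, not the code in `cts_scraping.py`: the URL pattern, the function names, and the sample page text are all assumptions, and a regular expression on plain text stands in for the Selenium and BeautifulSoup steps so the sketch runs without a browser.

```python
import re

# Assumption: study pages follow this URL pattern; the script may build them differently.
BASE_URL = "https://clinicaltrials.gov/study/"


def build_study_url(nct_id):
    """Return the page URL for one study ID, e.g. 'NCT01714739'."""
    return BASE_URL + nct_id.strip().upper()


def parse_enrollment(page_text):
    """Find text such as 'Enrollment (Actual) 1,296' and return (kind, count).

    Returns None when no enrollment figure is found.
    """
    match = re.search(
        r"Enrollment\s*\((Estimated|Actual)\)\s*:?\s*(\d[\d,]*)",
        page_text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    kind = match.group(1).lower()
    count = int(match.group(2).replace(",", ""))
    return kind, count


def enrollment_difference(actual, estimated):
    """Return actual minus estimated, or None if either value is missing."""
    if actual is None or estimated is None:
        return None
    return actual - estimated


if __name__ == "__main__":
    study_ids = ["NCT01714739", "NCT05844007"]
    for nct_id in study_ids:
        print(build_study_url(nct_id))

    # Hypothetical page text, standing in for what BeautifulSoup would extract.
    sample_text = "Enrollment (Actual) 1,296"
    print(parse_enrollment(sample_text))
    print(enrollment_difference(1296, 1200))
```

Returning `None` for a missing figure keeps a study with incomplete data from producing a misleading difference in the output sheet.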
- Download our code as a zip archive or clone our GitHub repository.
- Find the folder where the code is stored in File Explorer (look for a folder called clinical-trials that contains this file). Once found, click the address bar at the top and press Ctrl + C to copy the path to the repository (it may start with `C:`).
- Type these commands in PowerShell, including any spaces but not including brackets. Replace `[PathToRepository]` with the path you identified previously, and replace `"[NameOfAuthor]" [StartDate] [EndDate]` with the name of the author and a date range (e.g., `"Bristol-Myers" 01/01/2023 06/12/2024`).
cd [PathToRepository] # this takes PowerShell to the folder where the files you want to run are stored
python -m venv venv # this creates a virtual environment
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope Process # this allows you to run scripts; -Scope Process means the execution policy is unrestricted only for this instance of PowerShell, so it returns to normal when you close PowerShell or open a new tab
.\venv\Scripts\activate # this activates the virtual environment
py -m pip install --upgrade pip # if it says "Requirement already satisfied" after typing this, that is fine
py -m pip install -r requirements.txt # installs the requirements for the program
py -i cts_scraping.py "[NameOfAuthor]" [StartDate] [EndDate] # this actually runs the program
This last command may take some time to process, up to several minutes.
- Open the output file (`output.xlsx`) in the clinical-trials folder; it should have all the data filled in.
python3 -m venv venv
venv/bin/pip install -r requirements.txt
venv/bin/python -i cts_scraping.py "Bristol-Myers" 01/01/2024 06/12/2024
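
The three arguments passed to `cts_scraping.py` in the commands above (author name, start date, end date) could be validated along these lines. This is only a sketch under stated assumptions: the function name `parse_args` is hypothetical, and the MM/DD/YYYY date format is a guess, since this README does not say which format the script expects.

```python
import sys
from datetime import datetime

# Assumption: dates are written MM/DD/YYYY; the README does not state the format.
DATE_FORMAT = "%m/%d/%Y"


def parse_args(argv):
    """Turn [author, start, end] into (author, start_date, end_date)."""
    if len(argv) != 3:
        raise ValueError('expected: "[NameOfAuthor]" [StartDate] [EndDate]')
    author, start_text, end_text = argv
    # strptime raises ValueError if the text does not match DATE_FORMAT.
    start = datetime.strptime(start_text, DATE_FORMAT).date()
    end = datetime.strptime(end_text, DATE_FORMAT).date()
    if start > end:
        raise ValueError("StartDate must not be after EndDate")
    return author, start, end


if __name__ == "__main__":
    # Falls back to the README's example values when no arguments are given.
    if len(sys.argv) == 4:
        args = sys.argv[1:]
    else:
        args = ["Bristol-Myers", "01/01/2023", "06/12/2024"]
    print(parse_args(args))
```

Checking the arguments before the scraping starts means a mistyped date fails immediately rather than after several minutes of page loads.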