Status: 🚧 Work in Progress
This project aims to build a non-partisan, data-driven election prediction model using historical polling data. The goal is to use data from sources like FiveThirtyEight to train machine learning models that can predict election outcomes, starting with the 2024 U.S. presidential election.
- Experiment with more complex machine learning models
- Implement election_model.py
- Finish modeling.py
- Implement cross-validation and other evaluation techniques
- Create a user interface to visualize predictions
- Develop documentation and contribution guidelines
The project is broken down into several key steps:
Data Collection:
- Historical polling data for past elections comes from FiveThirtyEight, which publishes it publicly. (Note: I will add more sources over time; FiveThirtyEight is just a starting point.)
- The raw data is stored in a CSV file, which is then cleaned and processed.
Data Cleaning:
- Unnecessary columns are removed, missing values are handled, and data types are converted.
- The cleaned data is stored in a separate CSV file for use in model training.
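The cleaning step above can be sketched with pandas. This is a minimal illustration, not the contents of data_cleaning.py: the column names (`poll_id`, `end_date`, `pct`, `notes`) and the sample rows are assumptions made up for the example, and dropping incomplete rows is only one possible way to handle missing values.

```python
import io

import pandas as pd

# Hypothetical raw polling rows; the column names are illustrative assumptions,
# not the actual FiveThirtyEight schema.
RAW_CSV = """poll_id,pollster,end_date,candidate,pct,notes
1,Pollster A,2020-10-01,Candidate X,48.5,
2,Pollster B,2020-10-03,Candidate X,,late release
3,Pollster A,not a date,Candidate Y,45.0,
4,Pollster C,2020-10-05,Candidate Y,46.2,
"""


def clean_polls(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns, coerce types, and remove incomplete rows."""
    # Remove a column the model does not use.
    df = raw.drop(columns=["notes"])
    # Convert types; unparseable values become NaT/NaN instead of raising.
    df["end_date"] = pd.to_datetime(df["end_date"], errors="coerce")
    df["pct"] = pd.to_numeric(df["pct"], errors="coerce")
    # Handle missing values by dropping rows that lack a date or a percentage.
    df = df.dropna(subset=["end_date", "pct"])
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned = clean_polls(raw)
print(cleaned)
```

Writing the result out with `cleaned.to_csv(path, index=False)` would produce the separate cleaned CSV file; the exact path under data/cleaned/ is left open here.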
Database Integration:
- The cleaned data is loaded into a PostgreSQL database using SQLAlchemy for easy querying and management.
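As a rough sketch of the loading step, the example below inserts a few rows with SQLAlchemy Core and queries them back. It uses an in-memory SQLite database as a stand-in so it runs without a PostgreSQL server, and the `polls` table layout and sample rows are hypothetical; the real schema in database.py depends on the cleaned CSV.

```python
from sqlalchemy import (
    Column,
    Float,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    func,
    insert,
    select,
)

# In-memory SQLite keeps the sketch runnable without a server. For PostgreSQL,
# the URL would instead be a postgresql://... connection string.
engine = create_engine("sqlite:///:memory:")

metadata = MetaData()
# Hypothetical table layout; the real schema depends on the cleaned CSV.
polls = Table(
    "polls",
    metadata,
    Column("poll_id", Integer, primary_key=True),
    Column("candidate", String, nullable=False),
    Column("pct", Float, nullable=False),
)
metadata.create_all(engine)

rows = [
    {"poll_id": 1, "candidate": "Candidate X", "pct": 48.5},
    {"poll_id": 2, "candidate": "Candidate Y", "pct": 46.2},
    {"poll_id": 3, "candidate": "Candidate X", "pct": 49.1},
]

# engine.begin() opens a transaction and commits it on success.
with engine.begin() as conn:
    conn.execute(insert(polls), rows)

# Query the average polling percentage per candidate.
query = (
    select(polls.c.candidate, func.avg(polls.c.pct).label("avg_pct"))
    .group_by(polls.c.candidate)
    .order_by(polls.c.candidate)
)
with engine.connect() as conn:
    averages = {candidate: avg for candidate, avg in conn.execute(query)}

print(averages)
```

Keeping the connection URL as the only database-specific part means the same code path can be exercised locally against SQLite and pointed at PostgreSQL through configuration.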
Model Training:
- Data is extracted from the database and preprocessed for model training.
- The model is trained with a machine learning algorithm (currently Linear Regression) to make predictions.
- The model's performance is evaluated using metrics like Mean Squared Error (MSE) and R-squared.
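To make the two metrics concrete, here is a dependency-free sketch that fits a one-feature linear regression by ordinary least squares and computes MSE and R-squared by hand. The data points are made up, and the helper names (`fit_line`, `mse`, `r_squared`) are hypothetical; in the project itself, scikit-learn's `LinearRegression`, `mean_squared_error`, and `r2_score` would presumably do this work.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up data: final polling average (x) vs. actual vote share (y), in percent.
poll_avg = [44.0, 46.0, 48.0, 50.0, 52.0]
vote_share = [45.0, 46.5, 47.5, 50.5, 51.5]

slope, intercept = fit_line(poll_avg, vote_share)
predicted = [slope * x + intercept for x in poll_avg]

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"MSE={mse(vote_share, predicted):.3f} R^2={r_squared(vote_share, predicted):.3f}")
```

Note that these metrics are computed on the same points used for fitting, so they are optimistic; scoring on held-out data gives a fairer picture.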
Future Enhancements (To Do):
- Implement other models like Decision Trees, Random Forests, or Neural Networks.
- Expand the dataset to include more features such as demographics, economic indicators, and more.
- Create a visual interface for users to explore predictions.
- Implement more robust evaluation and cross-validation techniques.
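For the cross-validation item, the idea can be sketched in plain Python: split the sample indices into k folds and hold out each fold once for testing. The function name `k_fold_indices` is hypothetical, and nothing like it exists in the project yet. scikit-learn offers `KFold` and `TimeSeriesSplit` in `sklearn.model_selection` for real use; for election data, a time-ordered split may be the safer choice because it avoids training on polls that postdate the test set.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=7, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Averaging a metric such as MSE over the held-out folds gives a less optimistic estimate than scoring the model on its own training data.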
Before running the project, make sure you have the following installed:
- Python 3.8+
- PostgreSQL
- Required Python packages (install using pip install -r requirements.txt):
  - pandas
  - sqlalchemy
  - psycopg2
  - scikit-learn
  - joblib
  - matplotlib
  - numpy
  - fastapi
  - uvicorn
  - streamlit
project-directory/
│
├── data/
│ ├── raw/ # Directory for raw data files
│ ├── cleaned/ # Directory for cleaned data files
│ └── polling_data.csv # Example of raw polling data file
│
├── src/
│ ├── data_collection.py # Script to collect data from APIs and sources
│ ├── data_cleaning.py # Script to clean the data
│ ├── database.py # Script to load data into the database
│ ├── modeling.py # Script to train and evaluate the model
│ └── __init__.py # Initialize the src module
│
└── README.md # Project README file
git clone https://github.com/keithpotz/Election-Perdiction.git
pip install -r requirements.txt
Create the database, then set up the connection in config.py:
export DB_CONNECTION_STRING='postgresql://username:password@localhost:5432/your_database_name'
Replace username, password, and your_database_name with the values you have set up on your machine.
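One way the scripts could pick up this variable is sketched below, using only the standard library. This is an assumption about how config.py might work, not its actual contents; `load_db_settings` is a hypothetical helper, and the fallback value set near the bottom exists only so the example runs on its own.

```python
import os
from urllib.parse import urlsplit


def load_db_settings(env_var: str = "DB_CONNECTION_STRING") -> dict:
    """Read the connection string from the environment and split it into parts."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"{env_var} is not set; export it before running the scripts")
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    # The password is deliberately left out so the result is safe to log.
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
    }


# Example value only; in real use the variable is exported in the shell.
os.environ.setdefault(
    "DB_CONNECTION_STRING",
    "postgresql://username:password@localhost:5432/your_database_name",
)
settings = load_db_settings()
print(settings)
```

Failing early with a clear message when the variable is missing tends to be easier to debug than a connection error raised later inside the database script.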
python src/data_collection.py
python src/data_cleaning.py
python src/database.py
python src/modeling.py
The project currently uses historical polling data up to the 2020 election. Further data collection and preprocessing are required to improve prediction accuracy.
The model currently uses simple linear regression. Future versions will explore more complex models and feature engineering techniques.
This project is open source and contributions are welcome! Please feel free to fork the repository, make improvements, and submit a pull request.