This project, titled "Crowd Sourced Mapping," is a machine learning study that classifies land cover from geographical data. Our team built multivariate classification models on a dataset of 10,545 entries with 29 features, chiefly 'max_ndvi' and other temporal features.
The dataset was designed so that training data can be derived from crowd-sourced polygons, a key step in automatically classifying satellite images into land cover categories. The project demonstrates our team's use of statistical analysis and machine learning to extract insights from environmental and geographical data.
The data, sourced from the UCI Machine Learning Repository, combines crowd-sourced polygons with Landsat satellite imagery. Its land cover categories include impervious surfaces, farms, forests, grasslands, orchards, and water bodies, and its subject focus is climate and environment.
The project uses several machine learning models, including logistic regression and neural networks, to classify land cover. We handled class imbalance with SMOTE and RandomUnderSampler and applied PCA for dimensionality reduction.
We applied both logistic regression and neural networks to land cover classification, and we analyzed the bias-variance tradeoff in our models to optimize performance.
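One common way to examine a bias-variance tradeoff is to compare training and validation scores as model complexity changes. The sketch below is an assumption about how such an analysis might look, not the procedure the study used: it relies on scikit-learn's `validation_curve`, sweeps the inverse regularization strength `C` of a logistic regression, and runs on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=1500, n_features=28, n_informative=10, n_classes=3, random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
C_values = np.logspace(-4, 2, 7)  # small C = strong regularization

# Each result has shape (len(C_values), n_folds).
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="logisticregression__C",
    param_range=C_values,
    cv=5,
    scoring="accuracy",
)

# Low train and low validation scores suggest high bias (underfitting);
# a large train-minus-validation gap suggests high variance (overfitting).
for C, tr, va in zip(C_values, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"C={C:<8g} train={tr:.3f}  val={va:.3f}  gap={tr - va:+.3f}")
```

The same comparison works for a neural network by sweeping a capacity or regularization parameter such as layer width or weight decay.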
The study underscores the potential of machine learning in geospatial analysis as a tool for understanding environmental patterns and how they change.