Created by Nick Davila with help from the awesome people at GEVIP
Explore the secondary research for this project»
One of GEVIP’s (Galaxy Evolution Vertically Integrated Project) themes is working with HETDEX (Hobby-Eberly Telescope Dark Energy Experiment). HETDEX is an unbiased spectroscopic survey using the 10 m Hobby-Eberly Telescope (HET) and its VIRUS integral-field unit (IFU) spectrograph. HETDEX is in the process of discovering distant galaxies on the basis of their strong Lyman-α emission. In some GEVIP projects, we use these Lyman-α emitting galaxies with the goal of understanding how the Milky Way galaxy was formed. To obtain usable data, we must classify the Lyman-α emitting galaxies within large data sets that contain many different kinds of astronomical objects. Traditionally, we classify by dividing astronomical objects into groups based on their visual appearance. However, astronomical data are getting larger and more complex, so we are turning to machine learning algorithms that can adapt to increasingly large data sets. This project therefore aims to train a Random Forest Classifier to classify astronomical spectra and differentiate between noise spectra and high-redshift galaxy spectra.
To maximize our discovery space, we need to push our detections to low signal-to-noise (very noisy data), so we need a robust way to differentiate between true astrophysical objects and noise features in the data catalog. Historically, ML algorithms have struggled to differentiate real spectra from noise spectra, and the motivation for this project was to implement an algorithm that solves this problem specifically. This will allow more high-redshift sources to be studied, which will help us learn more about the epoch of reionization in the universe.
We used the HETDEX internal data release 3.0.1 (HDR3) and the HETDEX API: https://github.com/HETDEX/hetdex_api
Import the following libraries:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns  # statistical data visualization
from sklearn.model_selection import train_test_split  # split data into training and test sets
from sklearn.model_selection import cross_val_score   # cross-validation scoring
from sklearn.ensemble import RandomForestClassifier   # the classifier we train
from sklearn import metrics                           # evaluation metrics
# All of these packages can be installed with pip
The next prerequisite step is importing data. We use an internal detections catalog for our high-redshift galaxies (sources that were visually classified and vetted by many people). For the noise data, I took the HDR3 catalog (specifically, the photometry) and extracted sources from regions of the sky with no detections within 200 arcseconds.
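In practice the catalogs are accessed through the HETDEX API, but as a minimal sketch, assuming the two samples have already been exported to CSV files (the file names below are hypothetical), loading them with pandas could look like:

import pandas as pd

# Hypothetical file names: the real samples come from the internal
# detections catalog (vetted high-z galaxies) and from HDR3 photometry
# regions with no detections within 200 arcseconds (noise sources).
galaxies = pd.read_csv('hdr3_highz_galaxies.csv')
noise = pd.read_csv('hdr3_noise_sources.csv')

print(galaxies.shape, noise.shape)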
- Create a sample of high-redshift galaxies and noise sources. We create a data set of 20,000 sources: 10,000 high-redshift galaxies and 10,000 noise sources.
- For binary classification, the data must be labeled. We chose '1' to mean a high-redshift galaxy and '0' to mean a noise source (a sketch of these steps follows below).
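Putting these steps together, a minimal sketch of building the balanced, labeled data set and training the Random Forest could look like the following. The galaxies and noise DataFrames from the sketch above, the feature columns, and the hyperparameters are all assumptions; the real project extracts its features from the HETDEX spectra.

import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Draw a balanced sample: 10,000 high-redshift galaxies and 10,000 noise sources
# (assumes both DataFrames have at least 10,000 rows and share the same feature columns).
n_per_class = 10_000
gal_sample = galaxies.sample(n=n_per_class, random_state=42).assign(label=1)   # 1 = high-z galaxy
noise_sample = noise.sample(n=n_per_class, random_state=42).assign(label=0)    # 0 = noise source
data = pd.concat([gal_sample, noise_sample], ignore_index=True)

# 'label' is the target; every other column is treated as a feature here.
feature_columns = [c for c in data.columns if c != 'label']
X = data[feature_columns].values
y = data['label'].values

# Hold out 20% of the sources for testing, keeping the two classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Train the Random Forest Classifier and evaluate it.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print('Test accuracy:', metrics.accuracy_score(y_test, y_pred))
print('5-fold cross-validation scores:', cross_val_score(clf, X_train, y_train, cv=5))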
If you have a suggestion that would make this better, please fork the repo and create a pull request, or simply email me!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Nick Davila - ndavila@utexas.edu
- Very special thanks to Oscar A. Chavez Ortiz for guiding me throughout the entire project.
- Thank you to Gene Leung and Steven Finkelstein for their expert advice along the way.
- Thank you to all my peers in GEVIP for the tips and inspiration.
Distributed under the MIT License.
