This project was built for MPCS 53014: Big Data Application Architecture - Final Project
This project was built using data from the National UFO Reporting Center (NUFORC, https://nuforc.org/) and a lambda architecture (shown in the diagram below). The project aims to summarize total UFO sightings per state. The batch layer ingests the full dataset, which could be further summarized by city, UFO shape, or even text recognition on the sighting summaries.
Technologies used:
- Hadoop DFS, HiveQL (HQL), HBase, Apache Spark, Apache Kafka, Scala, Node.js, Mustache, Shell, AWS EMR & EC2
The deployed web application can be found at http://ec2-52-14-115-151.us-east-2.compute.amazonaws.com:3060/index.html, currently running from arohani/3-webapp/. Deployment details can be found below, and additional commands are in the shell scripts (.sh) within each folder.
A user submission form feeds the Kafka streaming layer and can be accessed at http://ec2-52-14-115-151.us-east-2.compute.amazonaws.com:3060/submit.html
To initiate the speed layer, you must:
- SSH into the Kafka EC2 cluster
- `cd` into the project folder
- Submit the Spark job:
```
spark-submit --master local[2] --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/ss.log4j.properties" --class KafkaUFO uber-UFO_speed_scala-1.0-SNAPSHOT.jar b-2.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092,b-3.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092
```
We scrape data from the NUFORC site using the clean_ufo.py script, which outputs and saves the data as a CSV in the /1-batchlayer/scrape_data folder. The scraper can be run at any time to create a new batch file.
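If you want to regenerate the batch file yourself, the run is roughly this (a minimal sketch; it assumes clean_ufo.py takes no arguments and names its output itself, so check the script before relying on it):

```
# Run the scraper from the batch-layer folder; it writes its CSV into scrape_data/.
cd 1-batchlayer
python3 clean_ufo.py

# Sanity-check the newest CSV produced by the scrape.
ls -lt scrape_data/*.csv | head -1
```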
We then ingest the data into HDFS using the ingest_ufo.sh script. The data CSV was saved locally and scp'd onto the Hadoop cluster (/home/hadoop/arohani/ufo).
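A hedged sketch of that hand-off, assuming a file name of ufo_data.csv and an HDFS target of /user/hadoop/arohani/ufo (ingest_ufo.sh holds the real paths):

```
# From the local machine: copy the CSV onto the Hadoop master node.
scp -i 'PEM_file' ufo_data.csv hadoop@<hadoop-cluster-host>:/home/hadoop/arohani/ufo/

# On the cluster: create the HDFS directory and load the file.
hdfs dfs -mkdir -p /user/hadoop/arohani/ufo
hdfs dfs -put -f /home/hadoop/arohani/ufo/ufo_data.csv /user/hadoop/arohani/ufo/
hdfs dfs -ls /user/hadoop/arohani/ufo
```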
The HQL tables are created using create_hql.sh. This shell script calls the HQL files found inside /1-batchlayer/hive to create the relevant Hive/ORC tables.
The CSV is first mapped to a Hive table called arohani_ufo_csv, and an ORC table called arohani_ufo is then created from it.
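The HQL in those files likely follows the standard CSV-to-ORC pattern sketched below; the column names and HDFS location are assumptions, so treat /1-batchlayer/hive as the source of truth:

```
hive -e "
  -- Map the raw CSV in HDFS to a plain-text Hive table.
  CREATE EXTERNAL TABLE IF NOT EXISTS arohani_ufo_csv(
    sighted_date STRING, state STRING, city STRING,
    shape STRING, summary STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/hadoop/arohani/ufo';

  -- Rewrite it as an ORC table for efficient batch queries.
  CREATE TABLE IF NOT EXISTS arohani_ufo STORED AS ORC
  AS SELECT * FROM arohani_ufo_csv;
"
```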
The serving layer we use computes the batch views and stores them in HBase for more scalable serving. We do the following:
Inside the /2-servinglayer/ folder, we find three things:
- /create_views_scala/
- /hbase/
- serving.sh
The serving.sh script is used to create the views, following a few steps. The tables were first created in HBase with the column family 'ufo'.
The files in create_views_scala create various views in Hive, all based on the batch table arohani_ufo. Following the table creation, the files in /2-servinglayer/hbase/ then transfer the data from the Hive views to HBase.
The specific methods used can be found in serving.sh (including all table creation)
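As a rough illustration of that pattern (table and column names here are assumptions; serving.sh has the real ones), the HBase table gets its 'ufo' column family first, and a Hive table mapped onto it via the HBase storage handler then receives the view data:

```
# Create the HBase table with its single 'ufo' column family.
echo "create 'arohani_ufo_by_state', 'ufo'" | hbase shell

# Map a Hive table onto the HBase table, then fill it from the batch table.
# Assumes sighted_date strings start with a four-digit year.
hive -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS arohani_ufo_by_state_hbase(
    state_year STRING, total BIGINT)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,ufo:total')
  TBLPROPERTIES ('hbase.table.name' = 'arohani_ufo_by_state');

  INSERT OVERWRITE TABLE arohani_ufo_by_state_hbase
  SELECT concat(state, substr(sighted_date, 1, 4)) AS state_year, count(*) AS total
  FROM arohani_ufo
  GROUP BY state, substr(sighted_date, 1, 4);
"
```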
The speed layer streams data in real time; in our case, however, the uber-jar and Spark query only stream and update data that comes from the user-submitted form on the webapp. This method could be extended to stream new data in real time from a website, should such a source exist.
The current code writes submitted data to a Kafka topic, arohani_ufo_submissions; each message carries three fields (state, city, summary) and records the current date/year upon submission. The uber-jar found in /4-speedlayer/UFO_speed_scala/ is responsible for linking the data in the Kafka topic back to our HBase table, incrementing the specified summarization column by 1.
For example, if a user submits a new sighting in Texas, this will produce a key such as TX2023 and map it to our HBase table, incrementing the total count by 1.
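In HBase terms the update is a counter increment; a hand-run equivalent (names assumed to match the serving-layer sketch above) looks like this:

```
# Emulate the speed layer's update for a 2023 Texas submission:
# bump the counter under row key TX2023 by one, then read it back.
echo "incr 'arohani_ufo_by_state', 'TX2023', 'ufo:total', 1" | hbase shell
echo "get 'arohani_ufo_by_state', 'TX2023'" | hbase shell
```

Note that incr only works on cells stored as 8-byte counters, so increments like this assume the serving layer wrote the totals in that format.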
The Spark streaming job is triggered by running the following in the project folder on our Kafka cluster:

```
spark-submit --master yarn --deploy-mode client --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/ss.log4j.properties" --class StreamReports uber-UFO_speed_scala-1.0-SNAPSHOT.jar b-2.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092,b-3.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092
```
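To confirm submissions are actually flowing while the job runs, a standard Kafka console consumer on the same topic works well (run from any machine that can reach the brokers):

```
# Tail the submissions topic from the beginning; new form submissions
# should appear here as the webapp produces them.
kafka-console-consumer.sh --from-beginning \
  --topic arohani_ufo_submissions \
  --bootstrap-server b-1.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092
```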
The web app is built with JavaScript and Mustache templates. It serves two pages, index.html and submit.html, which display our results and the user-submission form.
The app can be run with two different methods:

- Local Host on Cluster (accessible at localhost:3000)
  - `cd` into the /3-webapp/ folder and run `npm install`
  - SSH into the cluster with a tunnel to the specified port:

    ```
    ssh -i 'PEM_file' -L 8070:ec2-3-131-137-149.us-east-2.compute.amazonaws.com:8070 hadoop@ec2-3-131-137-149.us-east-2.compute.amazonaws.com
    ```

  - Run `node app.js` on the specified port
- Port Specified Kafka Cluster (accessible at ec2-3-143-113-170.us-east-2.compute.amazonaws.com:3060/submit.html)
  - `cd` into the /3-webapp/ folder and run `npm install`
  - SSH into the Kafka cluster
  - Run the app:

    ```
    node app.js 3060 ec2-3-131-137-149.us-east-2.compute.amazonaws.com 8070 b-2.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092,b-3.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014kafka.o5ok5i.c4.kafka.us-east-2.amazonaws.com:9092
    ```
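Either way, a quick check that the app is serving (adjust host and port to the method you used; localhost:3060 assumes the second method checked from the Kafka cluster itself):

```
# Expect the index page's HTML back if the app started cleanly.
curl -s http://localhost:3060/index.html | head -20
```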