Crystal-Base

Table of Contents

  1. Protein Crystallization Challenges
  2. Dataset
  3. Architecture
  4. Web App

Protein Crystallization Challenges

Crystal-Base is an image classification pipeline that reports whether an image contains a protein crystal. It is aimed at academic and industrial researchers running large-scale high-throughput screening (HTS) protein crystallization projects who do not want to spend time manually identifying possible protein crystals in their crystallization screens.

Image of Protein Crystal Screen

Dataset

All protein crystal image data was obtained from the Marco Database.

Architecture

Image of Pipeline

Setting up AWS

Crystal-Base uses Pegasus to set up AWS clusters from configurations stored in YAML files.

Run ./main.sh --setup-pegasus to install Pegasus.

Run ./main.sh --setup-config to set up the bash environment.

Run ./main.sh --setup-database to set up a Postgres database.

Run ./main.sh --setup-hadoop to set up a Hadoop cluster.

Run ./main.sh --setup-spark to set up a Spark cluster.

Run ./main.sh --setup-web-server to set up a web server.
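
The setup steps above could be chained from one small driver. The following is a minimal sketch, assuming the steps are meant to run in the order listed and that main.sh exits with a non-zero status on failure; setup_commands and run_setup are hypothetical helper names, not part of the repository.

```python
import subprocess

# Flags copied from the steps above, in the order they are listed.
SETUP_FLAGS = [
    "--setup-pegasus",
    "--setup-config",
    "--setup-database",
    "--setup-hadoop",
    "--setup-spark",
    "--setup-web-server",
]


def setup_commands(script="./main.sh"):
    """Return one argv list per setup step."""
    return [[script, flag] for flag in SETUP_FLAGS]


def run_setup(runner=subprocess.run):
    """Run each step in turn; check=True raises on the first failing step."""
    for argv in setup_commands():
        runner(argv, check=True)
```

Calling run_setup() from the repository root would execute all six steps and stop at the first one that fails, so a broken Pegasus install does not go on to provision clusters.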

Ingestion

Crystal-Base ingests files from the Marco Database into an S3 bucket using bash scripts running on an EC2 instance.

Run source src/bash/ingestMarcoFiles.sh && ingestMarcosFiles to ingest the files.
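
The project does this step in bash. For readers who prefer Python, the same idea (copy every downloaded file to S3 under a key that mirrors its local path) might look like the sketch below. It is not the repository's implementation: the marco key prefix, the bucket name, and the function names are assumptions for illustration, and the real upload relies on the third-party boto3 package.

```python
from pathlib import Path


def s3_key_for(path, root, prefix="marco"):
    """Map a local file to an S3 object key that mirrors its path under root."""
    relative = Path(path).relative_to(root)
    return f"{prefix}/{relative.as_posix()}"


def ingest_directory(root, bucket, upload=None, prefix="marco"):
    """Upload every file under root and return the keys that were written."""
    if upload is None:
        import boto3  # third-party; only needed for real uploads

        upload = boto3.client("s3").upload_file
    keys = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            key = s3_key_for(path, root, prefix)
            upload(str(path), bucket, key)  # upload_file(Filename, Bucket, Key)
            keys.append(key)
    return keys
```

Passing a custom upload callable makes it possible to dry-run the key layout before touching a real bucket.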

Training

Crystal-Base uses transfer learning on an Inception v3 model to identify protein crystals in drop images from the Marco Database.

Run python3 src/python/classifyImagesTrainer.py to train the image classifier and write to a Postgres database.
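
To illustrate what transfer learning on Inception v3 involves, here is a minimal sketch using the Keras API in TensorFlow 2. It is not the code in classifyImagesTrainer.py, which may use a different TensorFlow version or retraining approach. The two class names, the optimizer, and the function names are assumptions chosen for the example.

```python
def build_transfer_model(num_classes=2, weights="imagenet"):
    """Inception v3 with a frozen pretrained base and a new classification head."""
    import tensorflow as tf  # third-party; imported lazily so the helper below works without it

    base = tf.keras.applications.InceptionV3(
        weights=weights, include_top=False, input_shape=(299, 299, 3)
    )
    base.trainable = False  # keep the pretrained features fixed

    inputs = tf.keras.Input(shape=(299, 299, 3))
    # Inception v3 expects pixel values scaled from [0, 255] to [-1, 1].
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def predicted_label(probabilities, class_names=("crystal", "no_crystal")):
    """Return the (hypothetical) class name with the highest probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best]
```

Only the final dense layer is trained here, which is what makes the approach practical on a modest labeled set: the convolutional features learned on ImageNet are reused unchanged.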

Distributed Image Classification

Spark ingests data from S3 buckets, and the images are batch processed on a distributed TensorFlow cluster in which each executor runs its own TensorFlow instance.

Run ./main.sh --classify-images simple to use the simple test classifier. Results are expected to be written to a Postgres database.
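
The pattern of each executor running its own model instance can be sketched in plain Python as a partition-level function: load the model once per partition, then classify that partition's images in batches. This is an illustration under assumptions rather than the project's code; classify_partition, batches, the batch size, and the load_model callable are all hypothetical.

```python
from itertools import islice


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def classify_partition(image_paths, load_model, batch_size=32):
    """Classify one partition, loading the model once rather than per image."""
    model = load_model()  # one model instance for the whole partition
    for chunk in batches(image_paths, batch_size):
        # The model is assumed to take a list of paths and return one label each.
        for path, label in zip(chunk, model(chunk)):
            yield path, label
```

In PySpark, a function of this shape would typically be handed to rdd.mapPartitions, so the cost of loading the model is paid once per partition instead of once per image.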

Web App

Crystal-Base has a web interface that runs its own instance of the trained TensorFlow model.

Image of Web App

Run ./main.sh --run-webs-server to run this web-server instance.

Try it out!

Upload protein crystal JPEG images at Crystal-Base.