An enhanced web framework (based on Flask) for use in the capstone project. Adds robust user authentication (via Globus Auth), modular templates, and some simple styling based on Bootstrap.
Directory contents are as follows:
- `/web` - The GAS web app files
- `/ann` - Annotator files
- `/util` - Utility scripts for notifications, archival, and restoration
- `/aws` - AWS user data files
This project uses Flask, a Python web framework, together with HTML templates to build a dynamic website that lets users upload annotation files for processing. The process is as follows:
- When a user uploads a file:
  - The frontend uploads the file to an S3 input bucket.
  - It updates the DynamoDB record, marking the job status as "pending".
  - It sends a notification message via SQS.
- The backend:
  - Continuously monitors the SQS queue for messages.
  - Upon receiving a message, downloads the input file from S3 and stores it locally.
  - Runs a subprocess to process the file and updates the job status in DynamoDB to "running".
  - Deletes the message from the queue.
- After processing:
  - A notification triggers an AWS Lambda function, which sends an email notification to the user.
  - The backend updates the job status to "completed".
  - It uploads the processed file to an S3 results bucket.
  - It cleans up any local files.
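The frontend steps above can be sketched with boto3-style calls. This is a minimal illustration, not the project's actual code: the key layout, field names, and injected clients (which stand in for the real boto3 S3 client, DynamoDB Table resource, and SQS client) are assumptions.

```python
import json
import uuid

def make_input_key(user_id, job_id, filename):
    """Build the S3 key for an uploaded input file.
    The <user_id>/<job_id>~<filename> layout is illustrative only."""
    return f"{user_id}/{job_id}~{filename}"

def submit_job(s3, jobs_table, sqs, input_bucket, queue_url, user_id, filename, file_obj):
    """Upload the input file, record the job as "pending", and notify the annotator.
    Clients are injected so the flow can be exercised without AWS credentials."""
    job_id = str(uuid.uuid4())
    s3_key = make_input_key(user_id, job_id, filename)

    # 1. Upload the annotation input file to the S3 input bucket.
    s3.upload_fileobj(file_obj, input_bucket, s3_key)

    # 2. Record the job in DynamoDB with status "pending".
    jobs_table.put_item(Item={
        "job_id": job_id,
        "user_id": user_id,
        "input_file_name": filename,
        "s3_key_input_file": s3_key,
        "job_status": "pending",
    })

    # 3. Notify the backend annotator via SQS.
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"job_id": job_id, "s3_key_input_file": s3_key}),
    )
    return job_id
```

Injecting the clients keeps the submission logic separate from AWS configuration, which also makes it easy to unit-test with stubs.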
- **Free User Processing:** In `ann/run.py`, once the backend generates the result, it checks whether the user is a free user. If so, it sends a message via SQS with a default delay of five minutes, giving the free user that window to download the result file.
- **Continuous Monitoring:** The script `util/archive/archive.py` runs continuously, monitoring the `yanze41_glacier_archive` SQS queue. Upon receiving a message, it checks the user type:
  - For a free user, it downloads the result file from S3, archives it to Glacier, records the archive ID in DynamoDB, deletes the result file from S3, and finally deletes the message from the queue.
  - For a premium user, it simply deletes the message without further action. This extra check ensures that any change in the user's status between job completion and archival is accounted for.
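The five-minute grace period maps directly onto the `DelaySeconds` parameter of SQS `send_message`. A minimal sketch follows; the role string `"free_user"`, the message fields, and the injected client are assumptions, not the project's actual names:

```python
import json

FREE_USER_DELAY_SECONDS = 300  # five-minute download grace period

def queue_for_archive(sqs, queue_url, job_id, user_role):
    """After a job completes, schedule a free user's result for archival.
    Premium results stay in S3, so no message is sent for them."""
    if user_role != "free_user":
        return None
    return sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"job_id": job_id, "user_role": user_role}),
        DelaySeconds=FREE_USER_DELAY_SECONDS,  # SQS hides the message for 5 minutes
    )
```

SQS caps `DelaySeconds` at 900 (15 minutes), so a per-message delay comfortably covers the 5-minute window here.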
- **Initiation:**
  - The restoration process starts when the `/subscribe` endpoint in `web/views.py` receives a POST request upgrading the user's profile. This triggers an SQS message to the `yanze41_restore` queue.
- **Message Processing:**
  - `util/restore/restore.py` waits for messages from this queue. Upon receipt, it:
    - Skips processing for free users.
    - For premium users, queries DynamoDB for all of the user's annotation job records that have been archived to Glacier, specifically those without an `s3_key_result_file` but with a `results_file_archive_id`.
    - Initiates an archive retrieval job on Glacier for each `results_file_archive_id`, attaching the `job_thaw` SNS topic for notification upon completion. It defaults to expedited retrieval, falling back to standard retrieval if it encounters an `InsufficientCapacityException`.
    - Deletes the message from the `yanze41_restore` queue after processing.
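The expedited-to-standard fallback can be sketched as below, assuming a boto3 Glacier client (`initiate_job` and `InsufficientCapacityException` are real boto3 names; the vault and topic values are placeholders passed in by the caller):

```python
def initiate_retrieval(glacier, vault_name, archive_id, sns_topic_arn):
    """Start a Glacier archive-retrieval job, preferring the expedited tier
    and falling back to standard when expedited capacity is unavailable."""
    params = {
        "vaultName": vault_name,
        "jobParameters": {
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "SNSTopic": sns_topic_arn,  # Glacier publishes here when the thaw completes
            "Tier": "Expedited",
        },
    }
    try:
        return glacier.initiate_job(**params)
    except glacier.exceptions.InsufficientCapacityException:
        # Expedited capacity exhausted: retry on the standard tier (typically 3-5 hours).
        params["jobParameters"]["Tier"] = "Standard"
        return glacier.initiate_job(**params)
```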
- **Finalization:**
  - `thaw.py` listens on the `yanze41_thaw` queue, which is subscribed to the `yanze41_thaw` SNS topic. When a job completion message is received, it:
    - Uses the Glacier `JobId` from the message to retrieve the unarchived file.
    - Locates the corresponding DynamoDB record by `results_file_archive_id`, constructs the `s3_key_result_file` name, and uploads the file to S3.
    - Updates the DynamoDB record to delete the `results_file_archive_id` and add the `s3_key_result_file`.
    - Deletes the message from the `yanze41_thaw` queue.
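The finalization steps above can be sketched as follows. The clients are injected, and `result_key` stands in for the reconstructed `s3_key_result_file` value, whose exact layout this README does not specify; `get_job_output`, `put_object`, and `update_item` are real boto3 calls.

```python
def finalize_thaw(glacier, s3, table, vault_name, results_bucket,
                  glacier_job_id, record, result_key):
    """Copy a thawed Glacier archive back to the S3 results bucket and
    update the job's DynamoDB record to point at the restored file."""
    # Stream the restored archive out of Glacier using the JobId from the SNS message.
    output = glacier.get_job_output(vaultName=vault_name, jobId=glacier_job_id)
    s3.put_object(Bucket=results_bucket, Key=result_key, Body=output["body"].read())

    # Swap the archive pointer for the restored S3 key on the job record.
    table.update_item(
        Key={"job_id": record["job_id"]},
        UpdateExpression="SET s3_key_result_file = :k REMOVE results_file_archive_id",
        ExpressionAttributeValues={":k": result_key},
    )
```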