Being students ourselves, we've explored the depths of AI-powered education apps that claim to be the place to study. There are AI study programs galore, but they all fall short in one specific area: staying consistent while remaining true to the content they need to teach.
Key Features:
- Smart Organization: Users upload their full course materials, and Maistro intelligently segments and categorizes snippets into precise subjects (e.g., vectors in Linear Algebra, decision trees in Machine Learning). This cross-document system enables subject-specific study materials.
- Adaptive Quizzes & Mastery System: Maistro tracks user understanding, generating personalized quizzes and optimizing future study sessions based on performance. It applies strategic forgetting to reinforce long-term retention: over time, previously mastered topics fade, encouraging continued review and reinforcement.
- Precision Study Support: The mastery system identifies weak subjects and pinpoints exact locations in the user's study materials for targeted revision. This grounded learning approach ensures adaptability while staying true to course content.
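The "strategic forgetting" idea above can be sketched as simple exponential decay of a mastery score (an illustrative sketch only, not Maistro's actual algorithm; the function name and 14-day half-life are assumptions):

```python
def decayed_mastery(score: float, days_since_review: float, half_life: float = 14.0) -> float:
    """Fade a mastery score by exponential decay.

    A hypothetical sketch of "strategic forgetting": the half-life
    (in days) is an assumed parameter, not a value from Maistro.
    """
    return score * 0.5 ** (days_since_review / half_life)

# A topic last reviewed 28 days ago (two half-lives) keeps only a
# quarter of its mastery score, flagging it for renewed quizzing.
print(decayed_mastery(1.0, 28.0))  # 0.25
```

Topics whose decayed score falls below a threshold would then resurface in future quizzes.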
*Screenshot: all the relevant snippets from the uploaded data for one subject.*

*Screenshot: all the relevant snippets from the uploaded data for another subject.*
Prerequisites:
- Python 3.8+
- Google Cloud Platform account with Vertex AI and Storage enabled
- Required Python packages (see requirements.txt)
Installation:
- Clone this repository
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Set up GCP authentication:
  - Create a service account with appropriate permissions
  - Download the JSON key file
  - Set the environment variable:
    ```shell
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
    ```
- Create a `.env` file in the parent directory with the following variables:
  ```
  PROJECT_ID=your-gcp-project-id
  GOOGLE_CLOUD_REGION=us-central1
  ```
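If you want to read those `.env` values without an extra dependency, a minimal stdlib loader looks like this (a sketch; python-dotenv's `load_dotenv()` is the usual choice when it is available):

```python
import os
from pathlib import Path

def load_env(path: str = "../.env") -> None:
    """Minimal .env loader (stdlib-only sketch).

    Skips blank lines and comments; does not overwrite variables
    that are already set in the environment.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Guarded so the sketch also runs when no .env file is present
if Path("../.env").exists():
    load_env()
    print(os.environ.get("PROJECT_ID"))
```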
Usage:
You can use the PDFProcessor class in your code:

```python
from gcp.pdf_sectioner import PDFProcessor

# Initialize the processor with your credentials file
processor = PDFProcessor(debug=True, credentials_path="path/to/service-account-key.json")

# Process a PDF file from GCS
results = processor.process_pdf(
    bucket_name="your-gcs-bucket-name",
    pdf_blob_name="path/to/your/document.pdf"
)

# Results are saved to the same bucket with the same name but a .json
# extension, e.g., "path/to/your/document.json"

# View the results
print(results)
```

Or run the script directly:

```shell
python gcp/pdf_sectioner.py
```
How It Works:
- PDF Access: The system reads the PDF directly from Google Cloud Storage as a BytesIO object in memory.
- LLM-Enhanced Text Extraction:
  - Extracts title, authors, and abstract from the first page
  - Identifies section titles and their content
  - Creates a structured representation of the document
- Subject Identification:
  - Processes the structured content to identify key subjects
  - Assigns importance scores and categories to each subject
  - Provides contextual information for each subject
- Database Integration:
  - Checks if each subject already exists in the database
  - Adds new subjects that don't already exist
- Results Storage:
  - Saves the complete results as a JSON file to the same GCS bucket
  - Uses the same filename as the PDF but with a .json extension
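The structured representation and subject records the pipeline builds might be modeled like this (illustrative dataclasses; the actual field names and scoring scale in pdf_sectioner.py may differ):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Section:
    title: str
    content: str

@dataclass
class Subject:
    name: str
    importance: float  # assumed 0-1 score; the real scale may differ
    category: str
    context: str       # where the subject appears in the materials

@dataclass
class DocumentStructure:
    title: str
    authors: List[str]
    abstract: str
    sections: List[Section] = field(default_factory=list)
    subjects: List[Subject] = field(default_factory=list)

doc = DocumentStructure(
    title="Linear Algebra Notes",
    authors=["A. Student"],
    abstract="Course notes on vectors and matrices.",
)
doc.sections.append(Section("Vectors", "A vector is an element of..."))
doc.subjects.append(Subject("vectors", 0.9, "Linear Algebra", "Section 1"))
print(doc.subjects[0].name)  # vectors
```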
The current implementation includes placeholder functions for database integration. To integrate with your database:
- Modify `subject_exists_in_database()` to query your actual database
- Modify `add_subject_to_database()` to insert data into your actual database
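As one illustration, the two placeholders could be backed by SQLite (a sketch only; the real functions' signatures in pdf_sectioner.py may differ, and you would swap in your actual database client):

```python
import sqlite3

# In-memory database stands in for your real backend
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS subjects (name TEXT PRIMARY KEY, category TEXT)"
)

def subject_exists_in_database(name: str) -> bool:
    """Hypothetical implementation of the placeholder lookup."""
    row = conn.execute(
        "SELECT 1 FROM subjects WHERE name = ?", (name,)
    ).fetchone()
    return row is not None

def add_subject_to_database(name: str, category: str) -> None:
    """Hypothetical implementation of the placeholder insert."""
    conn.execute(
        "INSERT OR IGNORE INTO subjects (name, category) VALUES (?, ?)",
        (name, category),
    )
    conn.commit()

if not subject_exists_in_database("vectors"):
    add_subject_to_database("vectors", "Linear Algebra")
print(subject_exists_in_database("vectors"))  # True
```

The PRIMARY KEY plus `INSERT OR IGNORE` keeps the "adds new subjects that don't already exist" check race-free at the database level.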
The system generates a detailed JSON output with:
- Document metadata (title, authors, abstract)
- Structured sections of the document
- Extracted subjects with importance scores and categories
- Processing statistics
License: MIT
