Being students ourselves, we've explored the depths of AI-powered education apps that claim to be the place to study. There are AI study programs galore, but they all fall short in one specific area: staying consistent while remaining true to the content they need to teach.
Key Features:
- Smart Organization: Users upload their full course materials, and Maistro intelligently segments and categorizes snippets into precise subjects (e.g., vectors in Linear Algebra, decision trees in Machine Learning). This cross-document system enables subject-specific study materials.
- Adaptive Quizzes & Mastery System: Maistro tracks user understanding, generating personalized quizzes and optimizing future study sessions based on performance. It applies strategic forgetting to reinforce long-term retention: over time, previously mastered topics fade, encouraging continued review and reinforcement.
- Precision Study Support: The mastery system identifies weak subjects and pinpoints exact locations in the user's study materials for targeted revision. This grounded learning approach ensures adaptability while staying true to course content.
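The "strategic forgetting" idea above can be sketched as simple exponential decay of a mastery score (an illustrative sketch only, not Maistro's actual algorithm; the function name and 14-day half-life are assumptions):

```python
def decayed_mastery(score: float, days_since_review: float, half_life: float = 14.0) -> float:
    """Fade a mastery score by exponential decay.

    A hypothetical sketch of "strategic forgetting": the half-life
    (in days) is an assumed parameter, not a value from Maistro.
    """
    return score * 0.5 ** (days_since_review / half_life)

# A topic last reviewed 28 days ago (two half-lives) keeps only a
# quarter of its mastery score, flagging it for renewed quizzing.
print(decayed_mastery(1.0, 28.0))  # 0.25
```

Topics whose decayed score falls below a threshold would then resurface in future quizzes.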
*Screenshot: all the relevant snippets from the uploaded data for one subject.*

*Screenshot: all the relevant snippets from the uploaded data for another subject.*
Prerequisites:
- Python 3.8+
- Google Cloud Platform account with Vertex AI and Storage enabled
- Required Python packages (see requirements.txt)
Installation:
- Clone this repository
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Set up GCP authentication:
  - Create a service account with appropriate permissions
  - Download the JSON key file
  - Set the environment variable:
    ```shell
    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
    ```
- Create a `.env` file in the parent directory with the following variables:
  ```
  PROJECT_ID=your-gcp-project-id
  GOOGLE_CLOUD_REGION=us-central1
  ```
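If you want to read those `.env` values without an extra dependency, a minimal stdlib loader looks like this (a sketch; python-dotenv's `load_dotenv()` is the usual choice when it is available):

```python
import os
from pathlib import Path

def load_env(path: str = "../.env") -> None:
    """Minimal .env loader (stdlib-only sketch).

    Skips blank lines and comments; does not overwrite variables
    that are already set in the environment.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Guarded so the sketch also runs when no .env file is present
if Path("../.env").exists():
    load_env()
    print(os.environ.get("PROJECT_ID"))
```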
Usage:
You can use the PDFProcessor class in your code:

```python
from gcp.pdf_sectioner import PDFProcessor

# Initialize the processor with your credentials file
processor = PDFProcessor(debug=True, credentials_path="path/to/service-account-key.json")

# Process a PDF file from GCS
results = processor.process_pdf(
    bucket_name="your-gcs-bucket-name",
    pdf_blob_name="path/to/your/document.pdf"
)

# Results are saved to the same bucket with the same name but a .json
# extension, e.g., "path/to/your/document.json"

# View the results
print(results)
```

Or run the script directly:

```shell
python gcp/pdf_sectioner.py
```
How It Works:
- PDF Access: The system reads the PDF directly from Google Cloud Storage as a BytesIO object in memory.
- LLM-Enhanced Text Extraction:
  - Extracts title, authors, and abstract from the first page
  - Identifies section titles and their content
  - Creates a structured representation of the document
- Subject Identification:
  - Processes the structured content to identify key subjects
  - Assigns importance scores and categories to each subject
  - Provides contextual information for each subject
- Database Integration:
  - Checks if each subject already exists in the database
  - Adds new subjects that don't already exist
- Results Storage:
  - Saves the complete results as a JSON file to the same GCS bucket
  - Uses the same filename as the PDF but with a .json extension
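The structured representation and subject records the pipeline builds might be modeled like this (illustrative dataclasses; the actual field names and scoring scale in pdf_sectioner.py may differ):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Section:
    title: str
    content: str

@dataclass
class Subject:
    name: str
    importance: float  # assumed 0-1 score; the real scale may differ
    category: str
    context: str       # where the subject appears in the materials

@dataclass
class DocumentStructure:
    title: str
    authors: List[str]
    abstract: str
    sections: List[Section] = field(default_factory=list)
    subjects: List[Subject] = field(default_factory=list)

doc = DocumentStructure(
    title="Linear Algebra Notes",
    authors=["A. Student"],
    abstract="Course notes on vectors and matrices.",
)
doc.sections.append(Section("Vectors", "A vector is an element of..."))
doc.subjects.append(Subject("vectors", 0.9, "Linear Algebra", "Section 1"))
print(doc.subjects[0].name)  # vectors
```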
The current implementation includes placeholder functions for database integration. To integrate with your database:
- Modify `subject_exists_in_database()` to query your actual database
- Modify `add_subject_to_database()` to insert data into your actual database
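As one illustration, the two placeholders could be backed by SQLite (a sketch only; the real functions' signatures in pdf_sectioner.py may differ, and you would swap in your actual database client):

```python
import sqlite3

# In-memory database stands in for your real backend
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS subjects (name TEXT PRIMARY KEY, category TEXT)"
)

def subject_exists_in_database(name: str) -> bool:
    """Hypothetical implementation of the placeholder lookup."""
    row = conn.execute(
        "SELECT 1 FROM subjects WHERE name = ?", (name,)
    ).fetchone()
    return row is not None

def add_subject_to_database(name: str, category: str) -> None:
    """Hypothetical implementation of the placeholder insert."""
    conn.execute(
        "INSERT OR IGNORE INTO subjects (name, category) VALUES (?, ?)",
        (name, category),
    )
    conn.commit()

if not subject_exists_in_database("vectors"):
    add_subject_to_database("vectors", "Linear Algebra")
print(subject_exists_in_database("vectors"))  # True
```

The PRIMARY KEY plus `INSERT OR IGNORE` keeps the "adds new subjects that don't already exist" check race-free at the database level.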
The system generates a detailed JSON output with:
- Document metadata (title, authors, abstract)
- Structured sections of the document
- Extracted subjects with importance scores and categories
- Processing statistics
License: MIT
