FAU-StudOn-Bot is a project designed to simplify and enhance the course registration process for students at Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). The platform addresses common issues with StudOn, the university’s course management system, by providing a web scraper and a user-friendly interface that helps students:
- Quickly search for courses and find direct links to StudOn registration.
- Get notifications when new courses are available or when registration opens.
- Discover courses through recommendations based on their search queries.
This project originated during the "SS2023 Programming with LLMs Seminar" at the Pattern Recognition Lab.
## Table of Contents

- Features
- Tech Stack
- Demo
- Setup and Installation
- Example Commands
- How Scraping Works
- Project Structure
- Use Cases
- Future Plans
## Features

- Web Scraping:
  - Scrapes StudOn for course listings, folders, and documents.
  - Tracks whether courses are open for registration.
  - Predefined folder scraping for faster operation.
- Web Application:
  - Search and visualize courses and folders from StudOn.
  - Direct links to join courses when registration is open.
  - Expand/collapse all folders for easier navigation.
- Database Integration:
  - Stores scraped data in a MongoDB database for efficient querying.
- Notifications:
  - Email alerts for new courses, joinable courses, or updated documents.
  - Customizable notification preferences.
- Course Recommendations:
  - Suggests courses based on user search queries.
- Performance Improvements:
  - Optimized scraping speed and parallel processing.
- Login and Authentication:
  - Google OAuth for secure user access.
- Deployment:
  - Fully functional and accessible via a public server.
## Tech Stack

- Programming Language: Python
- Large Language Models: ChatGPT, GitHub Copilot
- Web Scraping: Selenium
- Web Framework: Flask
- Database: MongoDB
- Authentication: Google OAuth
- Libraries: `pandas`, `selenium`, `Werkzeug==2.2.2`, `Flask==2.3.1`, `pymongo`, `webdriver-manager`, `flask-oauthlib`
## Demo

The first GIF demonstrates the scraping bot live in action:
- Navigating through the StudOn platform.
- Identifying folders and courses.
- Extracting relevant data and saving it to MongoDB.
The second GIF showcases the functionality of the web application:
- Searching and navigating through folders and courses.
- Viewing course details and joinable statuses.
- Subscribing to a folder to receive notifications on changes.
## Setup and Installation

### Prerequisites

- MongoDB:
  - Download and install MongoDB from here.
  - All required databases and collections will be created automatically during setup, but ensure the correct URI and port are configured in the `bot/config.py` file. Example:

    ```python
    MONGODB_URI = "127.0.0.1"
    MONGODB_PORT = 27017
    ```
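Before starting the bot, it can help to verify that a MongoDB server is actually reachable on the configured host and port. The following is a minimal TCP-level check (a sketch; it only confirms that something is listening on that port, not that it is MongoDB):

```python
import socket

# Values as configured in bot/config.py
MONGODB_URI = "127.0.0.1"
MONGODB_PORT = 27017

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("MongoDB reachable:", is_reachable(MONGODB_URI, MONGODB_PORT))
```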
- Geckodriver and Chromedriver:
  - Ensure `geckodriver` and `chromedriver` are installed and compatible with your browser version.
- Python Packages:
  - Install dependencies:

    ```bash
    pip install pandas selenium Werkzeug==2.2.2 Flask==2.3.1 pymongo webdriver-manager flask-oauthlib
    ```
### Configuration

- Scraping Configuration:
  - `TO_BE_PROCESSED_FOLDERS`: Specifies which folders (including their subfolders) to scrape.
    - Example: `"root//5. Technische Fakultät//5.3 INF//"`
    - To scrape everything, set it to `"root//"`.
  - `START_PAGE_URL`: Defines the starting page for scraping.
    - Example for specific folder scraping: `"https://www.studon.fau.de/studon/goto.php?target=cat_1110"`
    - To scrape everything, provide the main page link containing all superfolders.
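The effect of `TO_BE_PROCESSED_FOLDERS` can be illustrated with a simple path-prefix check (an illustrative sketch, not the project's actual code; `is_in_scope` is a hypothetical helper):

```python
# Hypothetical scope filter for the crawler, mimicking the semantics of
# TO_BE_PROCESSED_FOLDERS in bot/config.py.
TO_BE_PROCESSED_FOLDERS = ["root//5. Technische Fakultät//5.3 INF//"]

def is_in_scope(path: str, allowed_prefixes=TO_BE_PROCESSED_FOLDERS) -> bool:
    """A folder is scraped if its path lies under a configured prefix,
    or if it is an ancestor of one (so the crawler can descend to it)."""
    return any(
        path.startswith(prefix) or prefix.startswith(path)
        for prefix in allowed_prefixes
    )
```

With the example prefix above, `root//5. Technische Fakultät//` is visited (it is on the way down), its `5.3 INF` subtree is scraped, and other faculties are skipped; setting the list to `["root//"]` scrapes everything.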
- MongoDB Configuration:
  - Ensure `MONGODB_URI`, `MONGODB_PORT`, and other database settings are properly configured.
- Google OAuth Configuration:
  - Set `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET` to enable Google login. Example:

    ```python
    GOOGLE_CLIENT_ID = "<your-client-id>"
    GOOGLE_CLIENT_SECRET = "<your-client-secret>"
    ```

  - To disable Google login, set `USE_GOOGLE_AUTH = False`.
### Running

1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/FAU-StudOn-Bot.git
   cd FAU-StudOn-Bot
   ```

2. Start the MongoDB server (e.g. via its UI).
3. Run the web scraper:
   - Initial run: `python main.py initial`
   - Incremental update: `python main.py incremental`
4. Run the Flask web server: `python web_app.py`
## Example Commands

The following commands can be run in the MongoDB shell:

- Switch database: `use university_courses`
- Find a specific folder: `db.folders.find({ "path": "root//5. Technische Fakultät//5.1 CBI//" })`
- Delete all data: `db.folders.deleteMany({})`
## How Scraping Works

Web scraping in FAU-StudOn-Bot extracts data from the FAU StudOn platform and stores it in a structured format for further processing. Below is an explanation of the workflow with visual aids.
- Left: Shows how courses and folders are displayed on the StudOn platform.
- Right: The corresponding HTML source code, where you can identify:
  - Whether an item is a course or a folder.
  - If it’s a course, whether it is joinable.
  - Links to the course and its name.
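Conceptually, each repository row is classified from its markup. The sketch below is purely illustrative: the inputs (`icon_alt`, `has_join_link`) are hypothetical stand-ins for whatever attributes the bot actually reads out of the StudOn HTML:

```python
def classify_item(href: str, icon_alt: str, has_join_link: bool) -> dict:
    """Classify one repository row.

    icon_alt mimics the alt text of the item's type icon (assumed here to
    contain "Kategorie" for folders and "Kurs" for courses); has_join_link
    mimics the presence of a join action on the row.
    """
    is_folder = "Kategorie" in icon_alt
    return {
        "item_link": href,
        "is_folder": is_folder,
        # only courses can be joinable
        "joinable": (not is_folder) and has_join_link,
    }
```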
- Data from each page/folder is saved in MongoDB as JSON.
- For each page, the bot saves:
  - A unique ID.
  - The path to the folder/page.
  - The URL of the folder/page.
  - A hash of the page content for tracking changes.
  - A list of items (courses or folders), each with:
    - Name
    - Link
    - Whether it is a folder or a course.
    - Whether it is a joinable course.
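Under these rules, assembling one stored page document might look as follows (a sketch of the fields described above; MD5 over the page HTML stands in for whatever hash the bot actually uses):

```python
import hashlib

def build_page_document(page_id: str, path: str, url: str,
                        page_html: str, items: list) -> dict:
    """Assemble one MongoDB document for a scraped page/folder.

    The content hash lets an incremental run skip pages whose HTML
    has not changed since the last crawl.
    """
    return {
        "_id": page_id,
        "path": path,
        "url": url,
        "hash": hashlib.md5(page_html.encode("utf-8")).hexdigest(),
        # each item: {"item_name": ..., "item_link": ..., "is_folder": ..., "joinable": ...}
        "items": items,
    }
```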
- The backend script assembles all JSON documents from the database into a nested dictionary structure for the web application.
- Example JSON structure:

  ```json
  {
    "5. Technische Fakultät": {
      "5.3 INF": {
        "id": "67716f605e3bfb66771e5011",
        "is_subscribed": false,
        "Courses": {
          "Digitale Geistes- und Sozialwissenschaften / Digital Humanities": {
            "item_name": "Digitale Geistes- und Sozialwissenschaften / Digital Humanities",
            "item_link": "https://www.studon.fau.de/studon/ilias.php?ref_id=1651226&cmd=render&cmdClass=ilrepositorygui&cmdNode=146&baseClass=ilRepositoryGUI",
            "joinable": false,
            "is_folder": false
          }
        }
      }
    }
  }
  ```
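The assembly step can be sketched as folding each document's `//`-separated path into a nested dictionary (a simplified version of the idea; the real backend also attaches the course details shown above):

```python
def nest_documents(docs: list) -> dict:
    """Fold flat page documents into a nested dict keyed by path segments."""
    tree = {}
    for doc in docs:
        # "root//5. Technische Fakultät//5.3 INF//" -> ["5. Technische Fakultät", "5.3 INF"]
        segments = [s for s in doc["path"].split("//") if s and s != "root"]
        node = tree
        for segment in segments:
            node = node.setdefault(segment, {})
        node["id"] = doc["id"]
    return tree
```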
## Project Structure

```
folders/
    app/
        templates/
            index.html            # Frontend template
        config.py                 # Configuration for web app
        folder_structure.json     # Generated file displaying scraped data structure
        web_app.py                # Backend logic for web app
    bot/
        Dokumente/                # Temporary storage for documents
        dump/                     # Temporary storage
        chromedriver/             # Required for scraping
        config.py                 # Configuration for bot
        crawler.py                # Core scraping logic
        geckodriver/              # Required for scraping
        login.py                  # Handles login interactions
        main.py                   # Entry point for initial and incremental updates
        scraper.py                # Implements scraping workflows
    readme.md                     # This file
    requirements.txt              # Dependencies
```

## Use Cases

- Easier Course Search: Quickly locate and join courses on StudOn.
- Notifications: Stay informed about new or joinable courses.
- Course Recommendations: Discover new courses based on search text.
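One simple way to realize search-based recommendations (purely illustrative; the project may use a different approach) is to rank course names by word overlap with recent search queries:

```python
def recommend(courses: list, queries: list, top_n: int = 3) -> list:
    """Rank course names by how many words they share with the queries."""
    query_words = {w.lower() for q in queries for w in q.split()}
    scored = [
        (sum(w in query_words for w in name.lower().split()), name)
        for name in courses
    ]
    # highest overlap first; drop courses with no overlap at all
    return [name for score, name in sorted(scored, reverse=True) if score > 0][:top_n]
```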
## Future Plans

- Integration with other FAU faculties beyond the Technical Faculty.
- Testing against edge cases.
- Notification service implementation.
- Code refactoring and testing.
- Faster execution through parallel processing.
- Full deployment for public use.



