
WIP: Web scraping & automated data update #417

Open
laurenp-2 wants to merge 34 commits into main from web_scraping


@laurenp-2 laurenp-2 commented Mar 23, 2026

Summary

This pull request is the first step toward automated data updates and web scraping. It introduces backend scripts that export the current property information from the database to a CSV file and then update the database from that CSV once it has been edited with new information. It also introduces a script that scrapes the PJApts website.
This PR builds on changes introduced in the Admin Data Editor PR.

  • implemented export_apartments and update_apartments_from_csv
  • implemented scrapePJApts and runScrapers
  • connect the export/update CSV pipeline, web scraper, and admin data editor
  • API endpoint to trigger the scraper
  • diff logic (compare scraped results against existing information pulled from the database)
  • scraper -> csv output
  • Admin UI: button to trigger scraping, csv upload to trigger database updates
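
The diff and "scraper -> csv output" items above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `ApartmentRecord` shape, the `diffApartments` and `changesToCsv` names, and matching apartments by trimmed, case-insensitive name are all hypothetical.

```typescript
// Hypothetical record shape; the PR's actual fields may differ.
interface ApartmentRecord {
  name: string;
  address: string;
  minPrice: number | null;
}

interface FieldChange {
  name: string;
  field: keyof ApartmentRecord;
  before: string | number | null;
  after: string | number | null;
}

interface DiffResult {
  added: ApartmentRecord[]; // scraped, but not found in the database
  changed: FieldChange[]; // found in both, with a differing field
}

// Assumption: apartments are matched by trimmed, case-insensitive name.
const keyOf = (a: ApartmentRecord): string => a.name.trim().toLowerCase();

function diffApartments(
  existing: ApartmentRecord[],
  scraped: ApartmentRecord[]
): DiffResult {
  const byKey = new Map<string, ApartmentRecord>();
  for (const apt of existing) byKey.set(keyOf(apt), apt);

  const result: DiffResult = { added: [], changed: [] };
  const fields: (keyof ApartmentRecord)[] = ["address", "minPrice"];

  for (const apt of scraped) {
    const current = byKey.get(keyOf(apt));
    if (current === undefined) {
      result.added.push(apt);
      continue;
    }
    for (const field of fields) {
      if (current[field] !== apt[field]) {
        result.changed.push({
          name: current.name,
          field,
          before: current[field],
          after: apt[field],
        });
      }
    }
  }
  return result;
}

// Quote a cell when it contains a comma, quote, or line break (RFC 4180 style).
function toCsvCell(value: string | number | null): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function changesToCsv(changes: FieldChange[]): string {
  const header = "name,field,before,after";
  const rows = changes.map((c) =>
    [c.name, c.field, c.before, c.after].map((v) => toCsvCell(v)).join(",")
  );
  return [header, ...rows].join("\n");
}
```

Keeping the diff as plain data (rather than writing to the database directly) would let an admin review the CSV before triggering an update from the Admin UI.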

Test Plan

Unit tests are in scripts.test.ts.

Notes

CasperL1218 and others added 30 commits September 21, 2025 23:03
- Implemented apartment information edit endpoint
- Implemented create new apartment endpoint
- Implemented admin page apartment pagination, sorting, and editing functionalities
- Implement folder page to display user folders
- Add folder card component for folder representation
- Added 'add to folder' button in apartment page
- Debugged authentication issues in folder operations
  Backend Changes:
  - Add RoomType interface with UUID-based IDs (beds, baths, price)
  - Replace single numBeds/numBaths/price fields with roomTypes array
  - Add migration endpoint POST /api/admin/migrate-all-apartments-schema
    - Supports dry run mode for preview
    - Batch processing (100 apartments per batch)
    - Initializes roomTypes as empty array and removes old fields
  - Update PUT /api/admin/update-apartment to handle roomTypes
    - Generate UUIDs for new room types
    - Validate beds/baths/price >= 1 (integers only)
    - Check for duplicate room type combinations
  - Update POST /api/admin/add-apartment to accept roomTypes
    - Same validation and UUID generation
    - Allow empty roomTypes array
  - Update add_buildings.ts script to use new schema
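
  The roomTypes validation described above (integers >= 1, no duplicate beds/baths/price combinations, IDs generated only for new entries) might look roughly like this. It is a sketch, not the endpoint's code: `normalizeRoomTypes`, `RoomTypeInput`, and the injected `makeId` generator are hypothetical names, and a server would presumably pass a UUID generator for `makeId`.

```typescript
// Shape described in the commit message (UUID-based id, beds, baths, price).
interface RoomType {
  id: string;
  beds: number;
  baths: number;
  price: number;
}

// Incoming room types may lack an id when they are new.
type RoomTypeInput = Omit<RoomType, "id"> & { id?: string };

// Assumption: the ID generator is injected so the sketch has no dependencies.
function normalizeRoomTypes(
  inputs: RoomTypeInput[],
  makeId: () => string
): RoomType[] {
  const seen = new Set<string>();
  return inputs.map((rt) => {
    for (const field of ["beds", "baths", "price"] as const) {
      const value = rt[field];
      if (!Number.isInteger(value) || value < 1) {
        throw new Error(`${field} must be an integer >= 1, got ${value}`);
      }
    }
    const combo = `${rt.beds}/${rt.baths}/${rt.price}`;
    if (seen.has(combo)) {
      throw new Error(`duplicate room type: ${combo}`);
    }
    seen.add(combo);
    // Keep an existing ID; generate one only for new room types.
    return { id: rt.id ?? makeId(), beds: rt.beds, baths: rt.baths, price: rt.price };
  });
}
```

  An empty input array passes through unchanged, which matches the "Allow empty roomTypes array" behavior noted for add-apartment.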

  Frontend Changes (Temporary):
  - Comment out old bed/bath/price displays in ApartmentCard
  - Comment out bed display in NewApartmentCard
  - Update price sorting to use avgPrice temporarily
  - Proper room type display will be implemented in future commits
  - Proper room type sorting will be implemented in future commits

  Note: All apartments will start with empty roomTypes after migration.
  Frontend integer validation for admin UI will be added in following commits.
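
  The migration behavior described in this commit (dry run, batches of 100, empty roomTypes, old fields removed) can be illustrated with an in-memory sketch. The names `migrateOne`, `migrateAll`, and the `writeBatch` callback are hypothetical stand-ins; the real endpoint presumably writes to the database asynchronously.

```typescript
interface LegacyApartment {
  id: string;
  numBeds?: number;
  numBaths?: number;
  price?: number;
  roomTypes?: unknown[];
}

interface MigrationReport {
  dryRun: boolean;
  batches: number;
  migrated: number; // in dry-run mode: how many documents would be migrated
}

// Convert one document: start with an empty roomTypes array, drop old fields.
function migrateOne(apt: LegacyApartment): LegacyApartment {
  const copy: LegacyApartment = { ...apt, roomTypes: [] };
  delete copy.numBeds;
  delete copy.numBaths;
  delete copy.price;
  return copy;
}

// Assumption: writeBatch is a synchronous stand-in for a database batch write.
function migrateAll(
  apartments: LegacyApartment[],
  writeBatch: (batch: LegacyApartment[]) => void,
  dryRun: boolean,
  batchSize = 100
): MigrationReport {
  const report: MigrationReport = { dryRun, batches: 0, migrated: 0 };
  for (let start = 0; start < apartments.length; start += batchSize) {
    const batch = apartments.slice(start, start + batchSize).map(migrateOne);
    if (!dryRun) writeBatch(batch); // a dry run only counts, it never writes
    report.batches += 1;
    report.migrated += batch.length;
  }
  return report;
}
```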
  Admin Page Room Type Management
  - Added room types editing modal with table UI for beds/baths/price
  - Displays existing room types with inline editing and delete functionality
  - Includes "Add Room Type" button with validation (>= 1 for all fields)
  - Checks for duplicate room types (same beds/baths/price combination)
  - Backend generates UUIDs for new room types automatically
  - Shows room type count in Data tab instead of old numBeds/numBaths fields
  - Displays room type IDs (truncated) for debugging

  Room Type Display Utilities and Card Updates
  - Created roomTypeUtils.ts with comprehensive display functions:
    - formatPrice: Formats prices with K suffix (e.g., $1.5K, $3K)
    - formatBeds/formatBaths: Formats bed/bath counts (e.g., "2 beds", "1 bath")
    - getRoomTypeRange: Gets min/max values for beds/baths/price
    - formatPriceRange: Formats price ranges (e.g., "$1.5K - $3K")
    - formatRoomTypesDisplay: Complete display string for cards
    - getMinPrice/getMaxPrice: Helper functions for sorting
  - Updated NewApartmentCard to display dynamic price ranges from room types
  - Shows "Coming soon" when no room types are available
  - Update ApartmentCard component to display room type information
    - Add imports for formatPriceRange, formatBedsRange, getRoomTypeRange utilities
    - Replace TODO section with dynamic room types display
    - Show "Coming soon" message for apartments with empty roomTypes
    - Display formatted price and bed ranges for apartments with room types data
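
  As a rough illustration of the display helpers listed above, the K-suffix and range formatting might be written like this. It is not the contents of roomTypeUtils.ts: the rounding to one decimal place, the handling of prices below $1,000, and the single-price case are assumptions.

```typescript
interface RoomType {
  id: string;
  beds: number;
  baths: number;
  price: number;
}

// Format a price with a K suffix, e.g. 1500 -> "$1.5K", 3000 -> "$3K".
// Assumption: prices under 1000 are shown as-is, without a suffix.
function formatPrice(price: number): string {
  if (price < 1000) return `$${price}`;
  const thousands = Math.round(price / 100) / 10; // one decimal place
  return `$${thousands}K`;
}

// Format the price range across room types, or "Coming soon" when empty.
function formatPriceRange(roomTypes: RoomType[]): string {
  if (roomTypes.length === 0) return "Coming soon";
  const prices = roomTypes.map((rt) => rt.price);
  const min = Math.min(...prices);
  const max = Math.max(...prices);
  // Assumption: a single distinct price is shown once rather than as a range.
  return min === max
    ? formatPrice(min)
    : `${formatPrice(min)} - ${formatPrice(max)}`;
}
```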

  - Implement comprehensive search filtering with room types support
    - Re-enable price, bedroom, and bathroom filter parameters
    - Add exact match filtering for beds/baths (not >=)
    - Implement additional search result sections (location, price, bed/bath)
    - Return structured response with main results + 3 additional sections
    - Filter apartments based on roomTypes array instead of old schema
    - Handle empty roomTypes arrays in filter logic
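
  The exact-match filtering over roomTypes could be sketched as below. `RoomFilter`, `roomTypeMatches`, and `filterByRoomTypes` are hypothetical names, and two behaviors are assumptions: a single room type must satisfy every active criterion, and apartments with empty roomTypes are excluded whenever any filter is active.

```typescript
interface RoomType {
  id: string;
  beds: number;
  baths: number;
  price: number;
}

interface Apartment {
  name: string;
  roomTypes: RoomType[];
}

interface RoomFilter {
  beds?: number; // exact match, not >=
  baths?: number; // exact match, not >=
  minPrice?: number;
  maxPrice?: number;
}

function roomTypeMatches(rt: RoomType, f: RoomFilter): boolean {
  return (
    (f.beds === undefined || rt.beds === f.beds) &&
    (f.baths === undefined || rt.baths === f.baths) &&
    (f.minPrice === undefined || rt.price >= f.minPrice) &&
    (f.maxPrice === undefined || rt.price <= f.maxPrice)
  );
}

function filterByRoomTypes(apts: Apartment[], f: RoomFilter): Apartment[] {
  const active =
    f.beds !== undefined ||
    f.baths !== undefined ||
    f.minPrice !== undefined ||
    f.maxPrice !== undefined;
  if (!active) return apts; // no filter: keep everything, including empty roomTypes
  // some() is false for an empty array, so apartments without room types drop out.
  return apts.filter((apt) => apt.roomTypes.some((rt) => roomTypeMatches(rt, f)));
}
```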

  - Update search results page to support additional sections
    - Add state variables for additionalLocation, additionalPrice, additionalBedBath
    - Prepare frontend for displaying multiple result sections

  - Enhance apartment sorting with room types
    - Use getMinPrice/getMaxPrice utilities for sorting by price
    - Sort by minimum price (cheapest) or maximum price (most expensive)
    - Handle apartments with empty roomTypes arrays
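
  A minimal sketch of price sorting with room types follows. The `sortByPrice` name and its `order` values are hypothetical, and placing apartments with empty roomTypes last is an assumption about how the empty case is handled.

```typescript
interface RoomType {
  id: string;
  beds: number;
  baths: number;
  price: number;
}

interface Apartment {
  name: string;
  roomTypes: RoomType[];
}

function getMinPrice(roomTypes: RoomType[]): number | null {
  if (roomTypes.length === 0) return null;
  return Math.min(...roomTypes.map((rt) => rt.price));
}

function getMaxPrice(roomTypes: RoomType[]): number | null {
  if (roomTypes.length === 0) return null;
  return Math.max(...roomTypes.map((rt) => rt.price));
}

// "cheapest" sorts ascending by minimum price;
// "mostExpensive" sorts descending by maximum price.
function sortByPrice(
  apts: Apartment[],
  order: "cheapest" | "mostExpensive"
): Apartment[] {
  const key = order === "cheapest" ? getMinPrice : getMaxPrice;
  // slice() copies the array so the caller's list is not reordered in place.
  return apts.slice().sort((a, b) => {
    const ka = key(a.roomTypes);
    const kb = key(b.roomTypes);
    if (ka === null && kb === null) return 0;
    if (ka === null) return 1; // assumption: no room types sorts last
    if (kb === null) return -1;
    return order === "cheapest" ? ka - kb : kb - ka;
  });
}
```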

  This completes the frontend display implementation for the room types refactor.
  Backend migration tools and admin CRUD are already in place from previous commits.
  Ready for data migration and testing.
…ayout

  - Rename "Data" tab to "Apartment Data" for clarity
  - Remove test apartments pagination and filtering logic
  - Add real-time search functionality:
    - Search by apartment name
    - Search by apartment address
    - Both searches reset pagination to first page
  - Add room types filter dropdown:
    - All Apartments (default)
    - With Room Types (show only apartments with room types data)
    - Without Room Types (show only apartments missing room types)
  - Reorganize apartment display with horizontal card layout:
    - Replace vertical list with compact card-based design
    - Organize information in 3 responsive columns (Location, Details, Stats)
    - Add visual distinction for apartments without room types (gray italic)
    - Position edit button in top-right corner of each card
    - Reduce vertical space usage by ~50%
  - Update display counter to show filtered vs total apartment counts
  - Simplify pagination logic by removing test apartment handling
  - Improve visual hierarchy with section labels and consistent spacing

  This makes it easier for admins to find and manage apartments,
  especially when populating room types data after migration.
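
  The admin-side search and room types filter described in this commit could be approximated as follows. This is a sketch with hypothetical names (`AdminQuery`, `filterApartments`, `paginate`); case-insensitive substring matching is an assumption.

```typescript
interface AdminApartment {
  name: string;
  address: string;
  roomTypes: unknown[];
}

// Mirrors the dropdown: All Apartments / With Room Types / Without Room Types.
type RoomTypeFilter = "all" | "with" | "without";

interface AdminQuery {
  nameQuery: string;
  addressQuery: string;
  roomTypeFilter: RoomTypeFilter;
}

function filterApartments(
  apts: AdminApartment[],
  query: AdminQuery
): AdminApartment[] {
  const name = query.nameQuery.trim().toLowerCase();
  const address = query.addressQuery.trim().toLowerCase();
  return apts.filter((apt) => {
    if (name && !apt.name.toLowerCase().includes(name)) return false;
    if (address && !apt.address.toLowerCase().includes(address)) return false;
    if (query.roomTypeFilter === "with") return apt.roomTypes.length > 0;
    if (query.roomTypeFilter === "without") return apt.roomTypes.length === 0;
    return true;
  });
}

// Slice one page out of the filtered list; pages are 1-indexed.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  return items.slice((page - 1) * pageSize, page * pageSize);
}
```

  Comparing `filterApartments(...).length` with the full list length gives the "filtered vs total" counter, and calling `paginate` with page 1 after any query change matches the reset-to-first-page behavior.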
  - Add "Create New Apartment" button in Apartment Data tab header
  - Implement two-step apartment creation workflow:
    1. Preview mode: Calculate location data from address
    2. Confirm mode: Create apartment in database
  - Create modal dialog with required fields:
    - Apartment Name (text input)
    - Full Address (text input with geocoding support)
    - Landlord ID (text input)
    - Area (dropdown: Collegetown, West, North, Downtown, Other)
  - Display preview of calculated location data before creation:
    - Show latitude and longitude from geocoded address
    - Show calculated distance to campus
    - Confirm before final creation
  - Add comprehensive error handling and validation:
    - Form validation for required fields
    - Backend error display in modal
    - Loading states for preview and create actions
  - Integrate with existing `/api/admin/add-apartment` endpoint
    - Endpoint automatically geocodes address to get coordinates
    - Calculates walking distance to Ho Plaza
    - Validates landlord exists
    - Checks for duplicate locations
  - Auto-reload apartment list after successful creation
  - Show success message with generated apartment ID

  This streamlines the apartment creation process for admins by
  automating location data calculation and providing validation
  before committing changes to the database.
  - Persist filter state and query text across search navigation
  - Auto-close autocomplete dropdown after search submission
  - Add red color indicator for active filter sections on search results page
  - Improve search results layout: scrollable apartment list (3/5 width) with sticky map (2/5 width, max 650px)
  - Reduce apartment card gaps (6px column, 12px row) and fix clickable regions
  - Show empty state message when no exact matches found
  - Only display "No search results" in autocomplete when user has typed text
- Added scripts to export apartment data to CSV and update apartment data from CSV
- Updated package.json with new script commands for data export and update
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ CasperL1218
✅ laurenp-2
❌ Lauren Pothuru


Lauren Pothuru seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@dti-github-bot
Member

[diff-counting] Significant lines: 5535. This diff might be too big! Developer leads are invited to review the code.

@laurenp-2 laurenp-2 changed the title WIP: Web scraping WIP: Web scraping & automated data update Mar 23, 2026

4 participants