Tomodatachi/Hotel-Analysis

🏨 Hotel Analysis

This project explores how hotel prices, ratings, and location data interact, and looks for patterns that influence cost and quality across different regions.


📌 Project Goals

  • Analyze hotel booking data from Booking.com
  • Clean and transform raw data using an Airflow ETL pipeline
  • Load structured data into SQL Server for analysis
  • Discover relationships between price, reviews, ratings, and location

🛠️ Tech Stack

Tool         Purpose
Python       Data processing and ETL logic
Pandas       Data cleaning and transformation
Airflow      Workflow orchestration
SQL Server   Data storage and querying
GitHub       Version control and collaboration

📂 Folder Structure

Hotel-Analysis/
├── Booking.com's data                              # Raw CSV files scraped from Booking.com
│   ├── README.md                                   # Description of this folder
│   ├── Requirements.txt                            # Version of Selenium used
│   ├── Scrapping_code.py                           # Python code for scraping the data
│   └── booking_(...).csv                           # Raw CSV files of scraped data for (...) cities on Booking.com
├── Cleaning data                                   # Data processing
│   ├── Missing values detection
│   │   ├── booking_(...)_missing_report.txt        # Details of missing values in the raw data
│   │   ├── missing_detail.py                       # Python program for detailed missing-value detection
│   │   ├── missing_log.py                          # Python program that counts missing values and exports a summary file
│   │   └── missing_values_log.txt                  # Log file exported by missing_log.py
│   ├── data
│   │   ├── booking_(...).csv                       # CSV files after re-editing, for easier later use
│   │   ├── great_data.py                           # Python code that combines all CSV files above into one file
│   │   ├── combined_bookings.csv                   # Combined data from the CSV files above
│   │   ├── booking_etl_dag.py                      # Python program defining the Apache Airflow DAG and its ETL steps
│   │   └── cleaned_bookings.csv                    # Cleaned combined data produced by the Apache Airflow pipeline
│   ├── metadata
│   │   ├── metadata_log.py                         # Python code that exports metadata of the CSV files
│   │   └── metadata_log.txt                        # Metadata report for the CSV files
│   └── Cleaning_code.py                            # Code that re-edits the raw CSV files into the CSV files in the "data" folder
└── README.md                                       # Project overview
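
The combining and missing-value steps listed in the tree can be sketched roughly as follows. This is only an illustration using the Python standard library, not the code in great_data.py or missing_log.py; the function name `combine_csv_files` is hypothetical, and it assumes every input file has a header row.

```python
import csv
import glob
import os
from collections import Counter


def combine_csv_files(pattern, output_path):
    """Concatenate all CSV files matching `pattern` into `output_path`.

    Returns (row_count, missing), where `missing` maps each column name
    to the number of rows in which that column is empty or absent.
    """
    # Skip the output file itself in case it matches the pattern.
    target = os.path.abspath(output_path)
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != target)

    rows = []
    fieldnames = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Keep the union of all column names, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    missing = Counter()
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            for name in fieldnames:
                if not (row.get(name) or "").strip():
                    missing[name] += 1
    return len(rows), dict(missing)
```

A call such as `combine_csv_files("booking_*.csv", "combined_bookings.csv")` would write the combined file and return the per-column missing counts, which could then be written to a log file.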

🚀 How to Run

  1. Clone the repo:
    git clone https://github.com/Tomodatachi/Hotel-Analysis.git
  2. Set up your Airflow DAG using booking_etl_dag.py (located in Cleaning data/data/)
  3. Ensure Microsoft SQL Server is running and accessible
  4. Trigger the DAG to extract, transform, and load the data
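
The extract, transform, and load stages that the DAG runs can be sketched as plain Python functions. This is a minimal illustration, not the contents of booking_etl_dag.py: the column names (`name`, `city`, `price`, `rating`) and the `hotels` table are assumptions, and the standard-library `sqlite3` module stands in for SQL Server so the sketch runs without a database server.

```python
import csv
import sqlite3


def extract(csv_path):
    """Read every row of the CSV file into a list of dicts."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def transform(rows):
    """Keep only rows whose price and rating parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            price = float(row["price"])
            rating = float(row["rating"])
        except (KeyError, TypeError, ValueError):
            continue  # drop rows with missing or malformed values
        name = (row.get("name") or "").strip()
        city = (row.get("city") or "").strip()
        cleaned.append((name, city, price, rating))
    return cleaned


def load(records, connection):
    """Insert the cleaned records and return how many were loaded."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS hotels "
        "(name TEXT, city TEXT, price REAL, rating REAL)"
    )
    connection.executemany("INSERT INTO hotels VALUES (?, ?, ?, ?)", records)
    connection.commit()
    return len(records)


def run_pipeline(csv_path, connection):
    """Chain the three stages in the order a DAG would schedule them."""
    return load(transform(extract(csv_path)), connection)
```

In an Airflow DAG, each of these functions would typically become its own task (for example through a `PythonOperator`), and `load` would use a SQL Server connection instead of `sqlite3`.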

📊 Insights You Can Explore

  • Which cities offer the best value for money?
  • Do higher review counts correlate with better ratings?
  • How do prices vary by province or season?
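
As a starting point for the first two questions, the sketch below computes a Pearson correlation and a simple per-city value score. Both function names are hypothetical, and "value for money" is defined here as the average rating per unit of price, which is only one possible assumption. Each hotel is assumed to be a dict with `city`, `price`, and `rating` keys.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def value_score_by_city(hotels):
    """Average rating-per-price for each city (assumed definition of value)."""
    totals = defaultdict(lambda: [0.0, 0])
    for hotel in hotels:
        if hotel["price"] <= 0:
            continue  # skip rows without a usable price
        entry = totals[hotel["city"]]
        entry[0] += hotel["rating"] / hotel["price"]
        entry[1] += 1
    return {city: total / count for city, (total, count) in totals.items()}
```

Passing the review counts and ratings of all hotels to `pearson` gives a value between -1 and 1; a value near 0 would suggest little linear relationship between the two.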

👤 Contributors

  • Lương Minh Ngọc # Project Leader | Data Scientist | Data Manager
  • Nguyễn Minh Hoàng # Project Member | Data Analyst
  • Hồ Nhật Tân # Project Member | Idea Maker | Data Analyst
  • Nguyễn Viết Hưng # Project Member | Data Analyst

About

This data project looks for relationships between a hotel's cost, ratings, and location.
