Download training images from https://github.com/GWUvision/Hotels-50K/ (execute download_train.py).
Problems: the dataset is huge (1,027,871 images from 50,000 hotels and 92 major hotel chains), so downloading everything takes a long time. For the prototyping phase I only downloaded part of it.
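A minimal sketch of how the download can be restricted to a subset, assuming the Hotels-50K training metadata is a CSV with an image URL column; the file name, column index, output folder, and sample size here are assumptions, not the repo's actual interface:

```python
import csv
import random
import urllib.request
from pathlib import Path

# Assumed metadata file and column layout; check the Hotels-50K repo for the real ones.
TRAIN_CSV = "input/dataset/train_set.csv"
OUT_DIR = Path("images/train_subset")
SAMPLE_SIZE = 5000  # only download a small sample for prototyping

OUT_DIR.mkdir(parents=True, exist_ok=True)

with open(TRAIN_CSV, newline="") as f:
    rows = list(csv.reader(f))

# Randomly sample a subset of rows instead of downloading the full ~1M images.
for i, row in enumerate(random.sample(rows, min(SAMPLE_SIZE, len(rows)))):
    url = row[2]  # assumed: third column holds the image URL
    try:
        urllib.request.urlretrieve(url, OUT_DIR / f"{i}.jpg")
    except Exception as exc:
        print(f"skipping {url}: {exc}")
```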
create_database.py: create the PostgreSQL database 'hotel_ibs' with the following tables (a schema sketch follows below):
- hotel
- chain
- images
Problems: again, the dataset is huge, so I only created part of the entries.
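A minimal schema sketch for the three tables, assuming psycopg2; the column names and connection parameters are my own placeholders, not necessarily what create_database.py uses:

```python
import psycopg2

# Connect to the 'hotel_ibs' database (connection parameters are placeholders).
conn = psycopg2.connect(dbname="hotel_ibs", user="postgres", password="postgres", host="localhost")
cur = conn.cursor()

# Illustrative columns loosely mirroring the Hotels-50K metadata.
cur.execute("""
    CREATE TABLE IF NOT EXISTS chain (
        chain_id  INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS hotel (
        hotel_id  INTEGER PRIMARY KEY,
        name      TEXT,
        chain_id  INTEGER REFERENCES chain(chain_id)
    );
    CREATE TABLE IF NOT EXISTS images (
        image_id  SERIAL PRIMARY KEY,
        hotel_id  INTEGER REFERENCES hotel(hotel_id),
        s3_key    TEXT NOT NULL,
        tags      TEXT[]
    );
""")
conn.commit()
cur.close()
conn.close()
```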
tag_images.py: get tags with OpenAI CLIP
- install OpenAI CLIP:
  - git clone https://github.com/openai/CLIP.git
  - cd CLIP
  - pip install -r requirements.txt
  - pip install .
- run images through OpenAI CLIP
- save the tags to the relevant image_id (see the sketch after this list)
Problems:
- with the implementation I found, I have to supply the tag dictionary myself; the tags are not auto-generated from an internal CLIP vocabulary as I had thought.
- the larger the tag dictionary, the more memory CLIP needs to tag an image. My computer quickly ran out of memory for dictionaries above 200 words, and at 200 words it took about 30 s per image, so I decided to move the job to a Google Colab session to use T4 GPUs.
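A minimal sketch of the tagging step using the standard CLIP API; the tag list here is a short example of the kind of dictionary I supply, and the save_tags helper mentioned at the end is hypothetical:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Example tag dictionary; the real one is larger (memory use grows with its size).
TAGS = ["bed", "bathroom", "pool", "restaurant", "lobby", "balcony", "gym"]
text_tokens = clip.tokenize(TAGS).to(device)

def tag_image(image_path, top_k=3):
    """Return the top_k tags whose CLIP text embeddings best match the image."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text_tokens)
        probs = logits_per_image.softmax(dim=-1).squeeze(0)
    best = probs.topk(min(top_k, len(TAGS))).indices.tolist()
    return [TAGS[i] for i in best]

# Hypothetical usage: save_tags(image_id, tag_image(path)) writes the tags to the images table.
```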
I already had a backend from another project and only had to make minor adjustments.
I already had a frontend from another project and only had to make minor adjustments. There is a text box and a "Search" button.
When the page loads, the information from the database is downloaded. When the user clicks the Search button, the tags are searched for the text in the text box. Images tagged with the search term are downloaded from the AWS S3 bucket and displayed. The images are cached to reduce download time on future searches.
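A minimal sketch of the backend side of that search flow, assuming a Flask route, the images table from the schema sketch above, and boto3 pre-signed URLs for the S3 downloads; the route name, bucket name, and column names are assumptions:

```python
import boto3
import psycopg2
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "hotel-ibs-images"  # placeholder bucket name

@app.route("/search")
def search():
    term = request.args.get("q", "").strip().lower()
    conn = psycopg2.connect(dbname="hotel_ibs", user="postgres", host="localhost")
    cur = conn.cursor()
    # Find images whose tag array contains the search term.
    cur.execute("SELECT image_id, s3_key FROM images WHERE %s = ANY(tags)", (term,))
    rows = cur.fetchall()
    cur.close()
    conn.close()
    # Return pre-signed S3 URLs; the frontend caches the downloaded images.
    results = [
        {
            "image_id": image_id,
            "url": s3.generate_presigned_url(
                "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=3600
            ),
        }
        for image_id, key in rows
    ]
    return jsonify(results)

if __name__ == "__main__":
    app.run(port=5000)
```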
This is the first working state of the application:

The images from the original dataset only showed the insides of the rooms, and I wanted more photos of the hotel itself and its facilities (pool, restaurant, ...), so I created a script to download 10 Google images per hotel from a pre-filtered hotel list. I also reduced the number of hotels from xx thousand to xx hundred. In the end I had ~4000 images from the original dataset and ~2000 from Google. I retagged them, re-uploaded the new complete set (~6000 images) to AWS, and updated the database accordingly.
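A minimal sketch of the per-hotel download, assuming the icrawler package and a pre-filtered CSV of hotel names; the file name, column layout, and search keyword are assumptions:

```python
import csv
from icrawler.builtin import GoogleImageCrawler

# Assumed pre-filtered hotel list: one hotel name per row, in the first column.
with open("hotels_filtered.csv", newline="") as f:
    hotel_names = [row[0] for row in csv.reader(f) if row]

for name in hotel_names:
    crawler = GoogleImageCrawler(storage={"root_dir": f"images/google/{name}"})
    # Fetch roughly 10 photos per hotel, including facility shots (pool, restaurant, ...).
    crawler.crawl(keyword=f"{name} hotel", max_num=10)
```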