This project is an implementation for the KICKS CREW Software Engineer interview. The requirements are as follows:
- Crawl data from this URL: https://racing.hkjc.com/racing/information/Chinese/racing/LocalResults.aspx (only 賽事資料(本地)結果, i.e. the local race results).
- Save the data as JSON files or to a database such as SQLite (bonus).
- Create a simple RESTful application with an endpoint to query the crawled records.
- The application can be written in Node.js (Nest.js), Python (FastAPI), or Golang (Gin).
- Use Swagger UI to document the API.
- [bonus] Create an endpoint to download the crawled data, including images, as a PDF.
- Use git to manage the code and commit it to GitHub.
- For reproducibility, write a README.md describing how to start the crawler and the application.
- [bonus] Use Docker to containerize the application.
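The PDF bonus above can be sketched as follows, assuming Pillow is available. The function name and paths are illustrative, not the project's actual implementation:

```python
# A minimal sketch of the image-to-PDF bonus, assuming Pillow is
# installed; images_to_pdf and all paths are illustrative names.
from PIL import Image


def images_to_pdf(image_paths, pdf_path):
    """Combine JPEG images into a single multi-page PDF."""
    pages = [Image.open(p).convert("RGB") for p in image_paths]
    if not pages:
        raise ValueError("no images to convert")
    first, rest = pages[0], pages[1:]
    # Pillow writes a multi-page PDF when save_all=True is given.
    first.save(pdf_path, format="PDF", save_all=True, append_images=rest)
```
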
project_folder
├── db_table_csv
│   └── player_info.csv
├── horse_pdf
│   └── G415.pdf
├── house_image
│   ├── A005.jpg
│   └── A050.jpg
├── crawler.py
├── crawler_function.py
├── etl_function.py
├── fast_api.py
├── data_base.json
├── date_check.json
├── data_base.db
├── Dockerfile
└── requirements.txt
This project has two operational entry points: crawler.py and fast_api.py. All other files are outputs generated by these two programs.

db_table_csv : If a client downloads a table from data_base.db via FastAPI, the result is saved in this folder.
horse_pdf : If a client downloads a horse PDF (built from the images in house_image) via FastAPI, the result is saved in this folder.
house_image : Stores the horse images crawled by crawler.py.
crawler.py : The main web-scraping program. It calls functions from crawler_function.py and etl_function.py to process the data, and produces data_base.json, date_check.json, and data_base.db.
date_check.json : A file used to validate the completeness of crawler.py's output; it does not affect the project's operation.
data_base.json : Stores the data crawled by crawler.py.
data_base.db : A SQLite database populated with the data from data_base.json.
fast_api.py : The entry point of the FastAPI application.
Dockerfile : Used to run this project in Docker. The image runs two services: the crawler and the FastAPI server.
requirements.txt : The Python packages required by this project.
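The JSON-to-SQLite step could look roughly like the sketch below. It assumes data_base.json holds a list of record dicts; the table name and column names are illustrative, since the actual schema lives in etl_function.py:

```python
# Sketch of loading data_base.json into a SQLite database; the
# race_results table and its columns are assumptions for illustration.
import json
import sqlite3


def load_json_to_sqlite(json_path, db_path):
    """Load a JSON list of record dicts into a SQLite table."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS race_results "
        "(race_date TEXT, horse TEXT, jockey TEXT, finish INTEGER)"
    )
    # Named placeholders let executemany consume the dicts directly.
    conn.executemany(
        "INSERT INTO race_results VALUES (:race_date, :horse, :jockey, :finish)",
        records,
    )
    conn.commit()
    conn.close()
```
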
Crawler: see crawler.py.
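The table-parsing step of the crawler can be sketched with the standard library alone. This is illustrative only: the real crawler_function.py may use a different parser, and the HKJC results page is rendered dynamically, so the fetching step is not shown here:

```python
# Illustrative sketch of parsing an HTML results table with the
# standard library; the real crawler may work differently.
from html.parser import HTMLParser


class ResultTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())


def parse_results(html):
    """Return the table body as a list of rows of cell strings."""
    parser = ResultTableParser()
    parser.feed(html)
    return parser.rows
```
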
FastAPI application: see fast_api.py. Swagger UI: http://34.171.58.0/docs
Docker: see the Dockerfile. To build and run the image:
- sudo docker build -t kick_interview .
- sudo docker run -d -p 80:80 --name kick_interview kick_interview
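The Dockerfile itself might look roughly like the fragment below. The base image, port, and start command are assumptions; chaining the crawler and the API server in one CMD is just one way to match the "two services" description above:

```dockerfile
# Illustrative Dockerfile, assuming a Python base image; the real file
# may differ. Runs the crawler once, then serves the API on port 80.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 80
CMD ["sh", "-c", "python crawler.py && uvicorn fast_api:app --host 0.0.0.0 --port 80"]
```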
1. git clone https://github.com/Sigolon/kick_interview.git
2. sudo docker build -t kick_interview .
3. sudo docker run -d -p 80:80 --name kick_interview kick_interview




