Sigolon/kick_interview
Abstract

This project is the implementation for the KICKS CREW Software Engineer interview; the requirements are as follows.

Crawl data from this URL: https://racing.hkjc.com/racing/information/Chinese/racing/LocalResults.aspx (only the local race results, 賽事資料(本地)結果).

  1. Can save data as JSON files or to a database such as SQLite (bonus).

  2. Create a simple RESTful application with an endpoint to query the crawled data.

  3. Can be written in Node.js (Nest.js), Python (FastAPI), or Golang (Gin).

    • Use Swagger UI to document the API.
    • [bonus] Create an endpoint to download the crawled data, including images, as a PDF.
  4. Please use Git to manage the code and commit it to GitHub.

    • For reproducibility, please write a README.md describing how to start the crawler and the application.
    • [bonus] Use Docker to containerize the application.

Architecture diagram

(Architecture diagram image)

Directory structure

project_folder
├── db_table_csv
│   └── player_info.csv
├── horse_pdf
│   └── G415.pdf
├── house_image
│   ├── A005.jpg
│   ├── A050.jpg
├── crawler.py
├── crawler_function.py
├── etl_function.py
├── fast_api.py
├── data_base.json
├── date_check.json
├── data_base.db
├── Dockerfile
└── requirements.txt

P.S.

This project has only two operational files: crawler.py and fast_api.py. All other files are outputs generated by these two.

Folder

db_table_csv : When a client downloads a table from data_base.db via FastAPI, the result is saved in this folder.
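A table download of this kind can be sketched with the standard library alone. The function name, table name (`player_info`), and column layout below are assumptions for illustration, not the repo's actual schema:

```python
# Hypothetical sketch: dump one SQLite table to db_table_csv/ as a CSV file.
# Function, table, and column names are assumptions; the real schema may differ.
import csv
import sqlite3
from pathlib import Path

def export_table_to_csv(db_path: str, table: str, out_dir: str = "db_table_csv") -> str:
    """Write every row of `table` to <out_dir>/<table>.csv and return the path."""
    Path(out_dir).mkdir(exist_ok=True)
    out_path = str(Path(out_dir) / f"{table}.csv")
    con = sqlite3.connect(db_path)
    # Only interpolate table names taken from a trusted, fixed list.
    cur = con.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cur.description]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)   # column names first
        writer.writerows(cur)     # then all data rows
    con.close()
    return out_path
```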

horse_pdf : When a client downloads a horse PDF (generated from house_image) via FastAPI, the result is saved in this folder.

house_image : Stores the horse images crawled by crawler.py.

File

crawler.py : The main web-scraping program. It calls functions from crawler_function.py and etl_function.py to process the data, and produces data_base.json, date_check.json, and data_base.db.
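The real parsing logic lives in crawler_function.py and etl_function.py and is not reproduced here. As a rough, stdlib-only illustration of the crawl → parse → JSON flow, a sketch that turns a simplified results table into data_base.json might look like this (the HKJC page's actual markup and field set differ):

```python
# Hypothetical stdlib-only sketch of the crawl -> parse -> JSON flow.
# The real HKJC page structure differs; this parses a simplified <table>.
import json
from html.parser import HTMLParser

class ResultTableParser(HTMLParser):
    """Collect <td> cell texts, grouping the cells of each <tr> into one row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

def parse_results(html: str, fields=("名次", "馬名", "騎師")) -> list:
    """Map each table row onto the given field names (assumed columns)."""
    p = ResultTableParser()
    p.feed(html)
    return [dict(zip(fields, row)) for row in p.rows]

if __name__ == "__main__":
    sample = "<table><tr><td>1</td><td>金鎗六十</td><td>何澤堯</td></tr></table>"
    with open("data_base.json", "w", encoding="utf-8") as f:
        json.dump(parse_results(sample), f, ensure_ascii=False, indent=2)
```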

date_check.json : Data used to check crawler.py's output for completeness; it does not affect the project's operation.

data_base.json : Stores the data crawled by crawler.py.

data_base.db : A SQLite database holding the data from data_base.json.
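How the JSON records land in SQLite can be illustrated with a minimal stdlib sketch. The table name `race_result` and its columns are assumptions, not the repo's actual schema:

```python
# Hypothetical sketch: load records from data_base.json into a SQLite table.
# Table and column names are assumptions about the repo's schema.
import json
import sqlite3

def load_json_into_sqlite(json_path: str, db_path: str) -> int:
    """Insert every JSON record into race_result; return the row count."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS race_result (place TEXT, horse TEXT, jockey TEXT)"
    )
    # Named placeholders let each dict record map onto its columns directly.
    con.executemany(
        "INSERT INTO race_result VALUES (:place, :horse, :jockey)", records
    )
    con.commit()
    n = con.execute("SELECT COUNT(*) FROM race_result").fetchone()[0]
    con.close()
    return n
```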

fast_api.py : The main FastAPI program.

Dockerfile : Used to run this project in Docker; it covers two services, the crawler and the FastAPI server.

requirements.txt : The Python packages required for this project.

Development details

Step1 : Google Cloud Platform Config

  1. GCP instance (VM) (screenshots)

  2. GCP web firewall (screenshot)

Step2 : Crawl data and load it into the SQLite DB

See crawler.py.

Step3 : Build FastAPI and connect SQLite DB

See fast_api.py. Swagger UI: http://34.171.58.0/docs

Step4 : Build Dockerfile and run the FastAPI server

See Dockerfile

  1. sudo docker build -t kick_interview .
  2. sudo docker run -d -p 80:80 --name kick_interview kick_interview
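The repository's actual Dockerfile is not shown here; a minimal sketch compatible with the build/run commands above (assuming fast_api.py exposes `app` and the server listens on port 80) could be:

```dockerfile
# Hypothetical minimal Dockerfile sketch; the repo's real file may differ.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve the FastAPI app on port 80 (matches `docker run -p 80:80`)
CMD ["uvicorn", "fast_api:app", "--host", "0.0.0.0", "--port", "80"]
```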

How to run

Using Docker:

1. git clone https://github.com/Sigolon/kick_interview.git
2. sudo docker build -t kick_interview .
3. sudo docker run -d -p 80:80 --name kick_interview kick_interview
