Sigolon/kick_interview
Abstract

This project is the implementation for the KICKS CREW Software Engineer interview; the requirements are as follows.

Crawl data from this URL: https://racing.hkjc.com/racing/information/Chinese/racing/LocalResults.aspx (only the local race results, 賽事資料(本地)結果).

  1. Can save data as JSON files or to a database such as SQLite (bonus).

  2. Create a simple RESTful application with an endpoint to query the crawled data.

  3. Can be written in Node.js (Nest.js), Python (FastAPI), or Golang (Gin).

    • Use Swagger UI to document the API.
    • [bonus] Create an endpoint to download the crawled data, including images, as a PDF.
  4. Please use Git to manage the code and commit it to GitHub.

    • For reproducibility, please write a README.md describing how to start the crawler and the application.
    • [bonus] Use Docker to containerize the application.

Architecture diagram

(Architecture diagram image)

Directory structure

project_folder
├── db_table_csv
│   └── player_info.csv
├── horse_pdf
│   └── G415.pdf
├── house_image
│   ├── A005.jpg
│   ├── A050.jpg
├── crawler.py
├── crawler_function.py
├── etl_function.py
├── fast_api.py
├── data_base.json
├── date_check.json
├── data_base.db
├── Dockerfile
└── requirements.txt

P.S.

This project has only two operational files: crawler.py and fast_api.py. All other files are outputs generated by these two.

Folder

db_table_csv : When a client downloads a table from data_base.db via FastAPI, the result is saved in this folder.
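A table download of this kind can be sketched with the standard library alone. The function name, table name (`player_info`), and column layout below are assumptions for illustration, not the repo's actual schema:

```python
# Hypothetical sketch: dump one SQLite table to db_table_csv/ as a CSV file.
# Function, table, and column names are assumptions; the real schema may differ.
import csv
import sqlite3
from pathlib import Path

def export_table_to_csv(db_path: str, table: str, out_dir: str = "db_table_csv") -> str:
    """Write every row of `table` to <out_dir>/<table>.csv and return the path."""
    Path(out_dir).mkdir(exist_ok=True)
    out_path = str(Path(out_dir) / f"{table}.csv")
    con = sqlite3.connect(db_path)
    # Only interpolate table names taken from a trusted, fixed list.
    cur = con.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cur.description]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)   # column names first
        writer.writerows(cur)     # then all data rows
    con.close()
    return out_path
```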

horse_pdf : When a client downloads a horse PDF (generated from house_image) via FastAPI, the result is saved in this folder.

house_image : Stores the horse images crawled by crawler.py.

File

crawler.py : The main web-scraping program. It calls functions from crawler_function.py and etl_function.py to process the data, and produces data_base.json, date_check.json, and data_base.db.
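The real parsing logic lives in crawler_function.py and etl_function.py and is not reproduced here. As a rough, stdlib-only illustration of the crawl → parse → JSON flow, a sketch that turns a simplified results table into data_base.json might look like this (the HKJC page's actual markup and field set differ):

```python
# Hypothetical stdlib-only sketch of the crawl -> parse -> JSON flow.
# The real HKJC page structure differs; this parses a simplified <table>.
import json
from html.parser import HTMLParser

class ResultTableParser(HTMLParser):
    """Collect <td> cell texts, grouping the cells of each <tr> into one row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

def parse_results(html: str, fields=("名次", "馬名", "騎師")) -> list:
    """Map each table row onto the given field names (assumed columns)."""
    p = ResultTableParser()
    p.feed(html)
    return [dict(zip(fields, row)) for row in p.rows]

if __name__ == "__main__":
    sample = "<table><tr><td>1</td><td>金鎗六十</td><td>何澤堯</td></tr></table>"
    with open("data_base.json", "w", encoding="utf-8") as f:
        json.dump(parse_results(sample), f, ensure_ascii=False, indent=2)
```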

date_check.json : Data used to check crawler.py's output for completeness; it does not affect the project's operation.

data_base.json : Stores the data crawled by crawler.py.

data_base.db : A SQLite database holding the data from data_base.json.
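How the JSON records land in SQLite can be illustrated with a minimal stdlib sketch. The table name `race_result` and its columns are assumptions, not the repo's actual schema:

```python
# Hypothetical sketch: load records from data_base.json into a SQLite table.
# Table and column names are assumptions about the repo's schema.
import json
import sqlite3

def load_json_into_sqlite(json_path: str, db_path: str) -> int:
    """Insert every JSON record into race_result; return the row count."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS race_result (place TEXT, horse TEXT, jockey TEXT)"
    )
    # Named placeholders let each dict record map onto its columns directly.
    con.executemany(
        "INSERT INTO race_result VALUES (:place, :horse, :jockey)", records
    )
    con.commit()
    n = con.execute("SELECT COUNT(*) FROM race_result").fetchone()[0]
    con.close()
    return n
```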

fast_api.py : The main FastAPI program.

Dockerfile : Used to run this project in Docker; it covers two services, the crawler and the FastAPI server.

requirements.txt : The Python packages required for this project.

Development details

Step1 : Google Cloud Platform Config

  1. GCP instance (VM) (screenshots)

  2. GCP web firewall (screenshot)

Step2 : Crawl data and load it into the SQLite DB

See crawler.py.

Step3 : Build FastAPI and connect SQLite DB

See fast_api.py. Swagger UI: http://34.171.58.0/docs

Step4 : Build Dockerfile and run the FastAPI server

See Dockerfile

  1. sudo docker build -t kick_interview .
  2. sudo docker run -d -p 80:80 --name kick_interview kick_interview
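The repository's actual Dockerfile is not shown here; a minimal sketch compatible with the build/run commands above (assuming fast_api.py exposes `app` and the server listens on port 80) could be:

```dockerfile
# Hypothetical minimal Dockerfile sketch; the repo's real file may differ.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve the FastAPI app on port 80 (matches `docker run -p 80:80`)
CMD ["uvicorn", "fast_api:app", "--host", "0.0.0.0", "--port", "80"]
```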

How to run

Using Docker:

1. git clone https://github.com/Sigolon/kick_interview.git
2. sudo docker build -t kick_interview .
3. sudo docker run -d -p 80:80 --name kick_interview kick_interview
