Taiwan Credit Card Information Crawler (TCCIC) API
- Add Ollama LLM and HuggingFace embedding model
- Fix yaml key error
- Add RAG embedding function
- Add RAG complete function
- Return source information when using RAG
- Gemini flash 2.0
- Improve RAG accuracy
- HuggingFace serverless API for embedding model
- Model yaml
- Smarter crawlers
- A simpler, more extensible, and more maintainable crawler template (card.yaml)
- Modify the JSON data saving method
- Modify the RAG encoding method
- Logger
- Contextual retrieval
- BM25
- Hybrid search
- Reranker
- Support image crawling
- Refactor project structure to make it more modular
- RESTful API style
- Make crawlers more automated
- Automatically search all cards
- Support dynamic web crawler
- Crawl the bank card lists
- Crawl the card features
- Support image retrieval
- Customize llm package
- Remove the llama-index framework
- Add batch crawler API
- Add more banks
- Improve System Stability & Security
OS: Ubuntu 22.04.3 LTS
Python Version: 3.11.13
Docker Version: 27.0.3
- Install the vector database (Milvus)
  Follow the installation guide on this page.
- Create a conda environment and install dependencies

      conda create -n tccic python=3.11
      conda activate tccic
      git clone https://github.com/qpal147147/TCCIC.git
      cd TCCIC
      pip install -r requirement.txt
      playwright install

- Modify project settings
- Choose your LLM provider and Embedding provider
If your provider is not listed, refer to this guide to customize your own option.
      ACTIVE_LLM_PROVIDER: Literal["openai", "gemini", "huggingface"] = "gemini"
      ACTIVE_EMBEDDING_PROVIDER: Literal["openai", "gemini", "huggingface"] = "gemini"
- Vector database URL
VECTOR_CLIENT_URL: str = "http://localhost:19530"
- API Key
- Run API Server

      python -m app.main
- Custom LLM and Embedding
  - Implement basic parameters in your class.

        from pydantic import BaseModel, SecretStr

        class CustomLLMConfig(BaseModel):
            """Custom LLM config"""

            api_key: SecretStr
            chat_model_name: str
            temperature: float
            max_tokens: int
            embedding_model_name: str
            embedding_dim: int
            gpu: bool
  - Add custom options to lists and functions

        # LLM settings
        ACTIVE_LLM_PROVIDER: Literal["openai", "gemini", "huggingface", "custom"] = "custom"
        ACTIVE_EMBEDDING_PROVIDER: Literal["openai", "gemini", "huggingface", "custom"] = "custom"

        def get_active_llm_config(self):
            # ...
            provider_map = {
                "openai": OpenAIConfig(...),
                "gemini": GeminiConfig(...),
                "huggingface": HuggingFaceConfig(...),
                "custom": CustomLLMConfig(...),
            }
            # ...

        def get_active_embedding_config(self):
            # ...
            provider_map = {
                "openai": OpenAIConfig(...),
                "gemini": GeminiConfig(...),
                "huggingface": HuggingFaceConfig(...),
                "custom": CustomLLMConfig(...),
            }
            # ...
  - Implement methods for invoking the LLM and embedding models. You need to create your own classes that implement the interfaces.

        # Example
        # app.services.chat
        # app.services.embedding
        class CustomChat(ChatInterface):
            def __init__(...)
            async def chat(...)
            async def summary_docs(...)

        class CustomEmbedding(EmbeddingInterface):
            def __init__(...)
            async def create_embeddings(...)
  - Register your class in the factory pattern.

        # app.services.llm_factory
        class LLMFactory:
            def get_llm():
                if llm_provider == "custom":
                    return CustomChat(...)

            def get_embedding():
                if llm_provider == "custom":
                    return CustomEmbedding(...)
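The steps above can be sketched end to end as a small, self-contained program. This is only an illustration of the interface-plus-factory idea: the interface definitions below are simplified stand-ins (the project's real `ChatInterface` also declares `summary_docs`, and its method signatures are not shown in this README), and the dictionary-based registry is an assumption rather than the project's actual factory code.

```python
import asyncio
from abc import ABC, abstractmethod


class ChatInterface(ABC):
    """Simplified stand-in for the project's chat interface (assumed shape)."""

    @abstractmethod
    async def chat(self, question: str) -> str: ...


class EmbeddingInterface(ABC):
    """Simplified stand-in for the project's embedding interface (assumed shape)."""

    @abstractmethod
    async def create_embeddings(self, texts: list[str]) -> list[list[float]]: ...


class CustomChat(ChatInterface):
    async def chat(self, question: str) -> str:
        # Placeholder: a real implementation would call your LLM provider here.
        return f"echo: {question}"


class CustomEmbedding(EmbeddingInterface):
    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    async def create_embeddings(self, texts: list[str]) -> list[list[float]]:
        # Toy embedding: the text length repeated `dim` times.
        return [[float(len(text))] * self.dim for text in texts]


class LLMFactory:
    """Hypothetical registry that maps provider names to implementations."""

    _chat_providers = {"custom": CustomChat}
    _embedding_providers = {"custom": CustomEmbedding}

    @classmethod
    def get_llm(cls, provider: str) -> ChatInterface:
        if provider not in cls._chat_providers:
            raise ValueError(f"Unknown LLM provider: {provider}")
        return cls._chat_providers[provider]()

    @classmethod
    def get_embedding(cls, provider: str) -> EmbeddingInterface:
        if provider not in cls._embedding_providers:
            raise ValueError(f"Unknown embedding provider: {provider}")
        return cls._embedding_providers[provider]()


if __name__ == "__main__":
    llm = LLMFactory.get_llm("custom")
    print(asyncio.run(llm.chat("hello")))
```

Keeping the provider lookup in one place means a new provider only needs a class and one registry entry; the calling code does not change.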
- Customize Crawling Scope
You can find all bank-related crawler configurations in this file. Each parameter defines the scope of data extraction on the webpage.
To modify the crawler behavior, adjust Card List Spider and Card Feature Spider to implement your custom crawling logic.
A RESTful API for web crawling, data retrieval, and conversation.
- POST http://localhost:1108/api/v1/crawler/card-list
  Content-Type: application/json

      {
        "bank_code": "taishin",
        "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/"
      }

  - bank_code string, Required
    The bank’s unique identifier.
  - url string, Required
    The URL of the bank’s card overview page.
      {
        "status": "success",
        "message": "The crawling job has been submitted successfully.",
        "data": {
          "job_id": "85cc5f76-8184-4d1d-8936-ea0ca1ca84f3",
          "list_id": "list-73e807e2472d415888e968ceeeb63e87",
          "bank_code": "taishin",
          "card_name": null,
          "card_id": null
        },
        "error": null
      }

  - status string
    The execution status of the API, either success or fail.
  - message string
    A brief status message of the API execution.
  - data object or null
    - job_id string
      The execution job ID, which is unique for each run.
    - list_id string or null
      The list ID that stores the crawling information, used for retrieval.
    - bank_code string or null
      The bank code crawled for this task.
    - card_name string or null
      The card name crawled for this task.
    - card_id string or null
      The unique card ID used for conversation and deletion; each card has a different ID.
  - error string or null
    If an error occurs during execution, this field records the error message.
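A minimal client sketch for submitting a card-list crawling job, using only the Python standard library. It assumes nothing beyond the request and response shapes documented above; `BASE_URL`, `build_card_list_request`, and `submit` are illustrative names, not part of the project.

```python
import json
import urllib.request

# Assumed base URL, taken from the example endpoints in this README.
BASE_URL = "http://localhost:1108/api/v1"


def build_card_list_request(bank_code: str, url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the card-list crawler."""
    body = json.dumps({"bank_code": bank_code, "url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/crawler/card-list",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and return the `data` object, raising on a failed status."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    if payload["status"] != "success":
        raise RuntimeError(payload["error"] or payload["message"])
    return payload["data"]


if __name__ == "__main__":
    req = build_card_list_request(
        "taishin",
        "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/",
    )
    data = submit(req)  # requires the API server to be running
    print(data["job_id"], data["list_id"])
```

The returned `list_id` is the value to pass to the card-list query endpoint once the job has finished.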
- POST http://localhost:1108/api/v1/crawler/batch/card-list
  Content-Type: application/json

      [
        {
          "bank_code": "taishin",
          "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/"
        },
        {
          "bank_code": "ctbcbank",
          "url": "https://www.ctbcbank.com/twrbo/zh_tw/cc_index/cc_product/cc_introduction_index.html"
        }
      ]

  The request parameters are the same as for the non-batch list crawler, except that they are wrapped in an array.

      {
        "status": "success",
        "message": "The crawling job has been submitted successfully.",
        "data": [
          {
            "job_id": "cbf76934-815a-4128-bb97-f7e86145a79b",
            "list_id": "list-928db6b5417e47edbe9985d14c2d1c93",
            "bank_code": "taishin",
            "card_name": null,
            "card_id": null
          },
          {
            "job_id": "cbf76934-815a-4128-bb97-f7e86145a79b",
            "list_id": "list-d6a617d06cc54867a59eb63611dd061f",
            "bank_code": "ctbcbank",
            "card_name": null,
            "card_id": null
          }
        ],
        "error": null
      }

  The response fields are the same as for the non-batch list crawler, except that the data field is an array.
- GET http://localhost:1108/api/v1/crawler/card-list/{list_id}
  Path parameters

      http://localhost:1108/api/v1/crawler/card-list/list-928db6b5417e47edbe9985d14c2d1c93

  - list_id string, Required
    The list ID used for querying information.

      {
        "status": "success",
        "message": "Successfully crawled all card information.",
        "data": {
          "bank_code": "taishin",
          "bank_name": "台新銀行",
          "pages": [
            {
              "page_url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/index.html?type=type1",
              "cards": [
                {
                  "title": "太陽卡/玫瑰卡(切換刷方案)",
                  "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg046/card001/"
                },
                {
                  "title": "@GoGo卡",
                  "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg021/card001/"
                }
              ]
            },
            {
              "page_url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/index.html?type=type3",
              "cards": [
                {
                  "title": "玫瑰卡",
                  "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg013/card0001/"
                }
              ]
            }
          ]
        },
        "error": null
      }

  - bank_code string
    The bank code crawled for this task.
  - bank_name string
    The bank name crawled for this task.
  - pages array
    - page_url string
      The URL of the card list.
    - cards array
      - title string
        Card name
      - url string
        Card URL
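The nested `pages` / `cards` structure above is usually easier to work with once flattened. The helper below is a sketch under that assumption: `iter_cards` is a hypothetical name, and skipping repeated URLs is a design choice of this example (the same card can plausibly appear on more than one list page), not documented API behavior.

```python
from typing import Iterator


def iter_cards(data: dict) -> Iterator[tuple[str, str]]:
    """Yield (title, url) for every card in a card-list `data` object.

    Cards whose URL was already seen on an earlier page are skipped.
    """
    seen_urls: set[str] = set()
    for page in data.get("pages", []):
        for card in page.get("cards", []):
            if card["url"] in seen_urls:
                continue
            seen_urls.add(card["url"])
            yield card["title"], card["url"]


if __name__ == "__main__":
    sample = {
        "bank_code": "taishin",
        "pages": [
            {"page_url": "page-1", "cards": [{"title": "@GoGo卡", "url": "url-1"}]},
            {"page_url": "page-2", "cards": [{"title": "玫瑰卡", "url": "url-2"}]},
        ],
    }
    for title, url in iter_cards(sample):
        print(title, url)
```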
- POST http://localhost:1108/api/v1/crawler/card-feature
  Content-Type: application/json

      {
        "bank_code": "taishin",
        "card_name": "FlyGo卡",
        "card_url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg018/flygo/"
      }

  - bank_code string, Required
    The bank’s unique identifier.
  - card_name string, Required
    Card name
  - card_url string, Required
    Card URL

      {
        "status": "success",
        "message": "The crawling job has been submitted successfully.",
        "data": {
          "job_id": "38be2bcf-67ca-4882-b241-18f8b8af32bf",
          "list_id": null,
          "bank_code": "taishin",
          "card_name": "FlyGo卡",
          "card_id": "card-870283de1f264befabbae33cdb1bf5c3"
        },
        "error": null
      }

  - job_id string
    The execution job ID, which is unique for each run.
  - list_id string or null
    The list ID that stores the crawling information, used for retrieval.
  - bank_code string or null
    The bank code crawled for this task.
  - card_name string or null
    The card name crawled for this task.
  - card_id string or null
    The unique card ID used for conversation and deletion; each card has a different ID.
- POST http://localhost:1108/api/v1/crawler/batch/card-feature
  Content-Type: application/json

      [
        {
          "bank_code": "taishin",
          "card_name": "FlyGo卡",
          "card_url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg018/flygo/"
        },
        {
          "bank_code": "taishin",
          "card_name": "太陽卡/玫瑰卡",
          "card_url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg046/card001/"
        }
      ]

  The request parameters are the same as for the non-batch card crawler, except that they are wrapped in an array.

      {
        "status": "success",
        "message": "The crawling job has been submitted successfully.",
        "data": [
          {
            "job_id": "38be2bcf-67ca-4882-b241-18f8b8af32bf",
            "list_id": null,
            "bank_code": "taishin",
            "card_name": "FlyGo卡",
            "card_id": "card-870283de1f264befabbae33cdb1bf5c3"
          },
          {
            "job_id": "4abbb502-2301-73f4-216e-ad72a034c35f",
            "list_id": null,
            "bank_code": "taishin",
            "card_name": "太陽卡/玫瑰卡",
            "card_id": "card-870283de1f264befabbae33cdb1bf5c3"
          }
        ],
        "error": null
      }

  The response fields are the same as for the non-batch card crawler, except that the data field is an array.
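A short sketch of how a batch card-feature request body could be assembled from (card name, card URL) pairs, and how the submitted jobs could be collected from the response. Both function names are hypothetical; only the field names come from the examples above.

```python
def build_batch_feature_payload(bank_code: str, cards: list[tuple[str, str]]) -> list[dict]:
    """Turn (card_name, card_url) pairs into the array body of the batch endpoint."""
    return [
        {"bank_code": bank_code, "card_name": name, "card_url": url}
        for name, url in cards
    ]


def jobs_by_card_name(response: dict) -> dict[str, dict]:
    """Map each card name in a batch response to its job_id and card_id.

    `data` may be null when the request fails, so it is treated as empty.
    """
    return {
        item["card_name"]: {"job_id": item["job_id"], "card_id": item["card_id"]}
        for item in (response.get("data") or [])
    }


if __name__ == "__main__":
    payload = build_batch_feature_payload(
        "taishin",
        [("FlyGo卡", "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/cg018/flygo/")],
    )
    print(payload)
```

Each `job_id` can then be checked with the job status endpoint, and each `card_id` is what the Q&A and delete endpoints expect.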
- GET http://localhost:1108/api/v1/crawler/card-feature/{job_id}/status
  Path parameters

      http://localhost:1108/api/v1/crawler/card-feature/38be2bcf-67ca-4882-b241-18f8b8af32bf/status

  - job_id string, Required
    The execution job ID.

      {
        "status": "success",
        "message": "Query job status successfully.",
        "data": {
          "job_status": true
        },
        "error": null
      }

  - job_status boolean
    The status of the execution job.
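Because crawling jobs are submitted asynchronously, a client will typically poll the status endpoint. The sketch below shows one way to do that, under a loudly stated assumption: it treats `job_status: true` as "the job has finished", which this README does not spell out. `wait_for_job` and its `fetch_status` callback are hypothetical names; the callback is expected to return the parsed JSON of the status endpoint for a given job ID.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the job reports job_status == true or the timeout elapses.

    Returns True if the job finished (ASSUMPTION: true means finished),
    False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(job_id)
        if payload["status"] != "success":
            raise RuntimeError(payload.get("error") or payload.get("message"))
        if payload["data"]["job_status"]:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Passing the HTTP call in as `fetch_status` keeps the polling logic independent of the HTTP library and easy to exercise without a running server.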
- POST http://localhost:1108/api/v1/card/qa
  Content-Type: application/json

      {
        "question": "信用卡的回饋額度",
        "bank_code": "taishin",
        "card_id": "card-870283de1f264befabbae33cdb1bf5c3"
      }

  - question string, Required
    User’s question
  - bank_code string or null
    Bank code used to restrict the query to one bank.
  - card_id string or null
    Card ID used to restrict the query to one card.

  You may use any combination of bank_code and card_id to restrict the search range.

      {
        "status": "success",
        "message": "Query successfully.",
        "data": {
          "response": "台新銀行信用卡的相關回饋額度如下...",
          "sources": [
            {
              "text": "這是一張台新銀行信用卡的資訊頁面,主要介紹了 FlyGo...",
              "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/future/bf0a55e5-1f47-11f0-b432-0050568c09e3",
              "card_id": "card-c1e620f1afb049dea8bcd238e29b80c1",
              "card_name": "FlyGo卡",
              "bank_code": "taishin"
            },
            {
              "text": "台新銀行FlyGo卡提供精選航旅最高5%,海外最高3%回饋...",
              "url": "https://www.taishinbank.com.tw/TSB/personal/credit/intro/overview/future/89cd913a-8172-11ef-b432-0050568c09e3",
              "card_id": "card-c1e620f1afb049dea8bcd238e29b80c1",
              "card_name": "FlyGo卡",
              "bank_code": "taishin"
            }
          ]
        },
        "error": null
      }

  - response string
    AI’s summary response
  - sources array
    - text string
      Source text of the data
    - url string
      Source URL of the data
    - card_id string
      Source card ID of the data
    - card_name string
      Source card name of the data
    - bank_code string
      Source bank code of the data
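The optional filters and the `sources` array can be handled with two small helpers. This is a sketch, not project code: `build_qa_payload` and `format_answer` are hypothetical names, unset filters are sent as explicit nulls to match the "string or null" types above, and listing each source URL only once is a choice made for this example.

```python
from typing import Optional


def build_qa_payload(
    question: str,
    bank_code: Optional[str] = None,
    card_id: Optional[str] = None,
) -> dict:
    """Build the Q&A request body; unset filters are serialized as null."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return {"question": question, "bank_code": bank_code, "card_id": card_id}


def format_answer(data: dict) -> str:
    """Render the `data` object of a Q&A response as plain text with sources."""
    lines = [data["response"], "", "Sources:"]
    seen_urls: set[str] = set()
    for source in data.get("sources", []):
        if source["url"] in seen_urls:
            continue
        seen_urls.add(source["url"])
        lines.append(f"- {source['card_name']} ({source['bank_code']}): {source['url']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_qa_payload("信用卡的回饋額度", bank_code="taishin"))
```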
- DELETE http://localhost:1108/api/v1/card/{card_id}
  Path parameters

      http://localhost:1108/api/v1/card/card-c1e620f1afb049dea8bcd238e29b80c1

  - card_id string, Required
    The card’s unique ID.
  - Status: 204 (Success)
