feat: Add TuShare data collector (incremental update & resume support) #2067
Description
This PR introduces a new data collector for TuShare (daily frequency) under qlib/scripts/data_collector/tushare/collector.py. It provides a robust ETL pipeline similar to the Yahoo collector but tailored for TuShare's API and A-share market features.
Key features:
- `update_data_to_bin` dumps only newly added dates (using a temporary directory) to improve performance, instead of a full redump.
- [...] `L`, `D`, `P`) to avoid survivorship bias.
- [...] `date, open, high, low, close, volume, [amount], factor, symbol`. `amount` is optional.
- [...] `trade_cal` for calendar acquisition, with a fallback to Qlib's default.
Motivation and Context
The existing collectors (Yahoo) are less stable for CN market data. Users often need a production-ready TuShare collector that supports large-scale historical fetches (subject to rate limits) and daily incremental updates without redownloading the entire history. This implementation fills that gap with a structure consistent with Qlib's existing collectors.
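To make the incremental-update idea concrete, here is a minimal sketch of how the window of new dates might be computed. It is not the code from this PR: the helper names `read_existing_calendar` and `incremental_window` are hypothetical, and it assumes the existing Qlib data directory keeps its trading calendar in `calendars/day.txt` with one ISO-formatted date per line.

```python
from pathlib import Path
from typing import List


def read_existing_calendar(qlib_data_1d_dir: str) -> List[str]:
    """Return the dates already dumped; raise if the directory is unusable.

    Assumption: the calendar lives at <dir>/calendars/day.txt.
    """
    cal_path = Path(qlib_data_1d_dir).expanduser() / "calendars" / "day.txt"
    if not cal_path.is_file():
        raise RuntimeError(f"invalid qlib_data_1d_dir: {cal_path} not found")
    lines = cal_path.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def incremental_window(existing: List[str], latest: List[str]) -> List[str]:
    """Return the dates in `latest` that fall strictly after the last dumped date.

    ISO dates (YYYY-MM-DD) sort correctly as plain strings, so no parsing is needed.
    """
    if not existing:
        # Nothing dumped yet: every date is new.
        return sorted(latest)
    last_dumped = max(existing)
    return sorted(d for d in latest if d > last_dumped)
```

With a window like this, only the rows for the returned dates would need to be written to the temporary directory and dumped, which is what avoids a full redump.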
How Has This Been Tested?
`pytest qlib/tests/test_all_pipeline.py` under the upper directory of `qlib`.
Test Details:
Verified with local unit/integration tests (`pytest tests/test_tushare_collector.py`; note: the test file is not included in this PR to keep it minimal, but the logic was verified):
- `update_data_to_bin` correctly identifies the incremental window and creates temp storage.
- [...] `RuntimeError` if `qlib_data_1d_dir` is missing/invalid during update.
Screenshots of Test Results (if appropriate):
(All passed locally)
Types of changes