@Tucsky
Owner

@Tucsky Tucsky commented Jan 16, 2026

The goal is to get rid of influx and improve performance, cost, versatility

- Add io.js for file path generation, record reading/writing, and metadata management.
- Introduce resample.js to handle resampling of market data to higher timeframes with upsert semantics.
- Create write.js for appending and upserting bars to binary files, including null bar handling for dense indexing.
- Implement functions for reading and writing binary records, ensuring efficient file operations and metadata updates.
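To make "null bar handling for dense indexing" concrete, here is a minimal in-memory sketch. The record layout (four 64-bit floats for open, high, low, close), the NaN encoding of a null bar, and the function names are assumptions for illustration only; they are not taken from io.js or write.js.

```javascript
// Hypothetical record layout: open, high, low, close as 64-bit floats (32 bytes).
const RECORD_SIZE = 32

// With dense indexing, a bar's slot is derived from its timestamp.
function slotIndex(startTs, timeframeMs, ts) {
  return Math.floor((ts - startTs) / timeframeMs)
}

// Encode a bar; a missing bar (null) is stored as NaN in every field.
function encodeBar(bar) {
  const buf = Buffer.alloc(RECORD_SIZE)
  const values = bar ? [bar.open, bar.high, bar.low, bar.close] : [NaN, NaN, NaN, NaN]
  values.forEach((value, i) => buf.writeDoubleLE(value, i * 8))
  return buf
}

// Upsert into an in-memory "file": pad any gap with null bars, then overwrite the slot.
function upsertBar(file, startTs, timeframeMs, ts, bar) {
  const index = slotIndex(startTs, timeframeMs, ts)
  if (index < 0) throw new RangeError('timestamp is before the start of the file')
  if (file.length < (index + 1) * RECORD_SIZE) {
    const padding = []
    for (let i = file.length / RECORD_SIZE; i <= index; i++) padding.push(encodeBar(null))
    file = Buffer.concat([file, ...padding])
  }
  encodeBar(bar).copy(file, index * RECORD_SIZE)
  return file
}

// Read one bar back; returns null for a null bar or an out-of-range timestamp.
function readBar(file, startTs, timeframeMs, ts) {
  const offset = slotIndex(startTs, timeframeMs, ts) * RECORD_SIZE
  if (offset < 0 || offset + RECORD_SIZE > file.length) return null
  const open = file.readDoubleLE(offset)
  if (Number.isNaN(open)) return null
  return {
    open,
    high: file.readDoubleLE(offset + 8),
    low: file.readDoubleLE(offset + 16),
    close: file.readDoubleLE(offset + 24)
  }
}
```

Because every slot has the same size, a read is a single offset computation, at the cost of storing placeholder records for gaps.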
@adeacetis
Collaborator

Looking at it.

@Tucsky
Owner Author

Tucsky commented Jan 16, 2026

> Looking at it.

Thanks 🙏
It's a WIP (I'm on GitHub mobile right now and don't see the draft option). I just plan to add time-based segmented binaries next, and then it should be good to go.

Collaborator

@adeacetis adeacetis left a comment

Hi @Tucsky,
Here's my 2 cents about the commit on error handling.
Hope that helps.

].includes(code)
}

async requestWithRetry(
Collaborator

@adeacetis adeacetis Jan 16, 2026

This is a good idea to implement, but HTTP requests, error handling, retries, and logging should be managed by an HttpClient base class, not by the Exchange class.
For instance, formatErrorForLog and getErrorStatus are generic helpers. They do not need to live here.
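A rough sketch of that separation, to illustrate the idea rather than prescribe it. Only the names formatErrorForLog, getErrorStatus, HttpClient, and Exchange come from this thread; the function bodies, the axios-like error shape, the injected `request` function, and the `fetchProducts` method are assumptions. Whether Exchange extends the client or receives an instance is left open; this sketch injects one.

```javascript
// Generic helpers: nothing here depends on Exchange state.
function getErrorStatus(error) {
  // Assumes an axios-like shape (error.response.status) with a plain fallback.
  return error?.response?.status ?? error?.status ?? null
}

function formatErrorForLog(error) {
  const status = getErrorStatus(error)
  const message = error?.message ?? String(error)
  return status === null ? message : `${status} ${message}`
}

// Hypothetical client owning the HTTP concerns (requests, logging).
class HttpClient {
  constructor(request) {
    this.request = request // injected function: (url, options) => Promise
  }

  async get(url, options = {}) {
    try {
      return await this.request(url, options)
    } catch (error) {
      console.warn(`request failed: ${formatErrorForLog(error)}`)
      throw error
    }
  }
}

// Exchange keeps only exchange-specific logic and delegates HTTP to the client.
class Exchange {
  constructor(id, httpClient) {
    this.id = id
    this.http = httpClient
  }

  fetchProducts(url) {
    return this.http.get(url)
  }
}
```

Injecting the request function also makes the client easy to exercise in tests without real network calls.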

Collaborator

That's been a long-standing issue in the code, but since you're in there, maybe we can start refactoring it. I can chime in once you're done with the binaries storage.

) {
let attempt = 1

while (true) {
Collaborator

`while (true)`? Is this AI slop?

Owner Author

It is...😅
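For reference, a bounded loop is one way to avoid `while (true)` here. This is a minimal sketch, not the code in the PR: the signature, the `shouldRetry` callback, and the fixed `delayMs` are assumptions.

```javascript
// Hypothetical signature: `request` is any function returning a promise,
// `shouldRetry` decides whether an error is worth another attempt.
async function requestWithRetry(
  request,
  { maxAttempts = 3, delayMs = 0, shouldRetry = () => true } = {}
) {
  let lastError
  // The loop bound makes the maximum number of attempts explicit.
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await request(attempt)
    } catch (error) {
      lastError = error
      // Stop on non-retryable errors or when attempts are exhausted.
      if (attempt === maxAttempts || !shouldRetry(error)) break
      await new Promise(resolve => setTimeout(resolve, delayMs))
    }
  }
  throw lastError
}
```

The exit conditions live in one place, so the function cannot spin forever if a `return` or `throw` inside the loop is removed later.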

}

console.warn(
`[${this.id}] retrying ${label}${pairLabel} (${attempt + 1}/${maxAttempts})`,
Collaborator

If you re-architect this, please pass the Exchange id as an argument to the function instead of referencing the private property.
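Something along these lines, as a sketch: the helper name `formatRetryMessage` and its parameter order are made up for illustration; the message format is the one from the diff above.

```javascript
// Hypothetical helper: the caller supplies the exchange id,
// so the function no longer needs access to `this.id`.
function formatRetryMessage(exchangeId, label, pairLabel, attempt, maxAttempts) {
  return `[${exchangeId}] retrying ${label}${pairLabel} (${attempt + 1}/${maxAttempts})`
}
```

An Exchange would then call it as `console.warn(formatRetryMessage(this.id, label, pairLabel, attempt, maxAttempts))`, and the helper can move to a shared module.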

@adeacetis
Collaborator

Also, the ESLint commit (thank you for that) could have been part of another PR.

On its own, the ESLint commit would bump the version to 1.2.1 (patch).
The two other commits should trigger a minor version bump to 1.3.0.

@adeacetis
Collaborator

adeacetis commented Jan 16, 2026

> The goal is to get rid of influx and improve performance, cost, versatility

I agree that we need to move on from Influx v1, but there are other possibilities in the market such as TimeScaleDB or QuestDB which are both using PostgreSQL's SQL Protocol.
Ideally, we want to keep the InfluxDB v1 implementation for legacy setups and add v3 alongside it.
Also, new storage should come with import services that source data from legacy sources, such as files and Influx, to populate the new target implementations (here, binaries).

@adeacetis adeacetis marked this pull request as draft January 16, 2026 13:20
@adeacetis
Collaborator

Converted the PR to draft.

Collaborator

Why do we remove this script right now?

Collaborator

Same question: why delete coinalize? It's the sole data provider we use to backfill historical data. Is there a replacement?

Owner Author

I will check, but I think it has not been working for 2-3 years because Cloudflare blocks the requests.

Collaborator

@sato-ke shared an implementation in the Discord server as recently as last year.
https://discord.com/channels/1110160121813803058/1110584647789838437/1380748613843423362

.prettierrc.js Outdated
Collaborator

Bye prettier.


export default [
{
ignores: ["node_modules/**", "dist/**", "coverage/**", "tmp/**", "eslint.config.mjs"],
Collaborator

Maybe we can add data/** here? I think this is the default path for collected data. Keeping data there may not be recommended, but if someone does, it could make ESLint very slow.
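Roughly like this, assuming the collected data sits under `data/` relative to the config file (I have not checked the default path). The constant name `ignoresEntry` is only there to keep the snippet standalone.

```javascript
// Sketch of the ignores entry from the diff, with data/** added.
const ignoresEntry = {
  ignores: [
    "node_modules/**",
    "dist/**",
    "coverage/**",
    "tmp/**",
    "data/**", // assumed location of collected market data
    "eslint.config.mjs"
  ]
}
// In eslint.config.mjs this object would remain the first element of the exported array.
```

In ESLint's flat config, an object that contains only `ignores` acts as a global ignore list, so the pattern would apply to every other entry in the array.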

- Updated BinariesStorage to read from segmented files, computing which segments intersect the requested time range.
- Introduced new functions for generating file paths for segments and reading/writing segment metadata.
- Modified resampling logic to handle segmented base and target timeframe files, ensuring proper aggregation and upsert semantics.
- Enhanced upsertBars function to manage bars by segment, allowing for efficient writing and handling of overwrites and appends.
- Updated documentation to reflect changes in file structure and functionality.
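As an illustration of "computing which segments intersect the requested time range" and of grouping bars by segment, here is a minimal sketch. It assumes fixed-span segments aligned to multiples of the span and bars carrying a `time` field in milliseconds; the function names and the half-open range convention are also assumptions, not the actual BinariesStorage code.

```javascript
// Hypothetical scheme: a segment is identified by its aligned start timestamp.
function segmentStart(ts, segmentSpanMs) {
  return Math.floor(ts / segmentSpanMs) * segmentSpanMs
}

// Start timestamps of all segments intersecting the half-open range [from, to).
function segmentsInRange(from, to, segmentSpanMs) {
  const starts = []
  if (to <= from) return starts
  for (let start = segmentStart(from, segmentSpanMs); start < to; start += segmentSpanMs) {
    starts.push(start)
  }
  return starts
}

// Group bars by segment so each segment file is touched once per upsert.
function groupBySegment(bars, segmentSpanMs) {
  const groups = new Map()
  for (const bar of bars) {
    const key = segmentStart(bar.time, segmentSpanMs)
    if (!groups.has(key)) groups.set(key, [])
    groups.get(key).push(bar)
  }
  return groups
}
```

With daily segments, a read from the middle of day 1 to a quarter into day 3 touches three segment files, and a range ending exactly on a segment boundary does not open the next one.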
@Tucsky
Owner Author

Tucsky commented Jan 18, 2026

@adeacetis

> but there are other possibilities in the market such as TimeScaleDB or QuestDB which are both using PostgreSQL's SQL Protocol.

The storage layer already behaves as a pluggable abstraction: legacy Influx stays for compatibility.
Binaries is an additional backend optimized for a very specific workload. One of the main motivations is to be able to run aggr-server efficiently on low-power hardware (like a Raspberry Pi 5), remove the dependency on a costly VPS, and streamline the entire production -> cold backup pipeline around a single, predictable storage format (and stop dealing with tick data altogether).
If we later want to add TimescaleDB or QuestDB, that can still happen, but binaries covers a very specific, high-value path today.
