[PD1-296] Initial version: SDK for spans and metrics #1

joshuanapoli · 2025-05-12T02:40:20Z

Summary

This repo is meant to be made public and published on PyPI as "cvec". A public package makes it easy for clients to install and import the SDK.

Add SDK functions for read-only access to spans and metrics.
In this version, the SDK directly integrates with the CVector Timescale database.
The client connects using a tenant-specific user, with strictly limited permissions.
Later, these implementations will move into our back-end. The back-end API will avoid breaking clients when we redesign our database.

Span

A span is a period of interest, such as an experiment, a baseline recording session, or an alarm. The initial state of a Span is implicitly defined by a period where a given metric has a constant value.

The newest span for a metric does not have an end time, since it has not ended yet (or has not ended by the finish of the queried period).
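The span definition above can be sketched in a few lines. This is an illustration only, not the SDK's code: `SpanSketch`, `spans_from_transitions`, the field names, and the oldest-first ordering are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple, Union

Value = Union[float, str]


@dataclass
class SpanSketch:
    """Hypothetical stand-in for the SDK's Span; field names are illustrative."""

    name: str
    value: Value
    raw_start_at: datetime
    raw_end_at: Optional[datetime]  # None: the span has not ended within the data


def spans_from_transitions(
    name: str, transitions: List[Tuple[datetime, Value]]
) -> List[SpanSketch]:
    """Turn time-ordered (time, value) transitions into constant-value spans."""
    spans: List[SpanSketch] = []
    for i, (changed_at, value) in enumerate(transitions):
        # A span ends where the next transition begins; the newest one stays open.
        next_at = transitions[i + 1][0] if i + 1 < len(transitions) else None
        spans.append(SpanSketch(name, value, changed_at, next_at))
    return spans


def at_hour(hour: int) -> datetime:
    """Helper for building example timestamps in UTC."""
    return datetime(2025, 5, 12, hour, tzinfo=timezone.utc)


example = spans_from_transitions(
    "valve_state", [(at_hour(1), 0.0), (at_hour(3), 1.0), (at_hour(4), 0.0)]
)
```

Here the first two spans end at the following transition, and the newest span keeps `raw_end_at = None`.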

In a future version, spans will be mutable. An API will allow the client to annotate metrics and edit the start/end times.

Metric

A metric is a named set of time-series data points pertaining to a particular resource (for example, the value reported by a sensor). A metric has a lifecycle of being activated or added to the system (birth_at) and later removed from the system (death_at). Metrics can have numeric or string values. Boolean values are mapped to 0 and 1.
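As a small illustration of the value mapping described above, the sketch below normalizes a raw value into either a number or a string. `normalize_value` is a hypothetical helper, and where this mapping actually happens (at ingestion or in the SDK) is not stated here.

```python
from typing import Union


def normalize_value(value: Union[bool, int, float, str]) -> Union[float, str]:
    """Map booleans to 0/1, keep numbers numeric, and pass strings through."""
    # bool must be checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    return value
```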

Testing

Consider whether anything here is confidential, and should not be published to PyPI.
What do you think of the names "span" and "metric"?
Does the definition of span and metric make sense?
Is the SDK "pythonic"?
Are there useless comments, which should be removed?
Construct test cases and check that the SDK outputs make sense.

amy-nihao · 2025-05-15T13:58:12Z

src/cvec/cvec.py

+
+                # None indicates that the end time is not known; the span extends beyond
+                # the query period.
+                raw_end_at = None


Doesn't this mean the first Span object's raw_end_at is always None? The function signature specifies that if end_at is given, it will be the raw_end_at of the newest Span.

I changed the documentation to match the implemented behavior in ecb633d

richardzyx

Really significant effort, appreciate the hard work on this! The concept makes sense to me, and the code interface seems fine.

One comment/question is on build & release pipeline: I assume we'd want to package this into a .whl wheel file with .pyc compiled py bytecode? Optionally we can also obfuscate the source code with pyminifier or something similar?

There's also the option of distributing to a private registry like AWS CodeArtifact, but controlling access with the right security measures seems painful.

linear · 2025-05-16T12:43:03Z

PD1-296 Publish CVector SDK for Ammobia

Initial Release

Data Model

This SDK integrates directly with CVector's database. Each tenant has a schema and a database user, both named for the tenant. The API Key is the password of the user. The database user is restricted to only have access to the tenant's schema. Here are the available database tables:

CREATE TABLE tag_data (
    tag_name_id INTEGER NOT NULL,
    tag_value_changed_at TIMESTAMP WITH TIME ZONE,
    tag_value DOUBLE PRECISION
);

CREATE TABLE tag_data_str (
    tag_name_id INTEGER NOT NULL,
    tag_value_changed_at timestamptz NOT NULL,
    tag_value text
);

CREATE TABLE tag_names (
    id SERIAL PRIMARY KEY,
    normalized_name VARCHAR NOT NULL,
    birth_at TIMESTAMPTZ NULL,
    death_at TIMESTAMPTZ NULL
);

CREATE VIEW metrics AS
 SELECT td.tag_value AS value,
    td.tag_value_changed_at AS "time",
    tn.normalized_name AS metric
   FROM tag_data td
     JOIN tag_names tn ON td.tag_name_id = tn.id;
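
To show how the metrics view resolves tag ids to names, here is a minimal sketch that recreates a simplified copy of the tables in an in-memory SQLite database. SQLite, the simplified column types, and the sample tag `pump_speed` are assumptions for the example; the real database is Timescale. Note that the view as defined joins only tag_data, so string values in tag_data_str do not appear in it.

```python
import sqlite3

# In-memory SQLite stands in for a tenant schema (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tag_names (
        id INTEGER PRIMARY KEY,
        normalized_name TEXT NOT NULL
    );
    CREATE TABLE tag_data (
        tag_name_id INTEGER NOT NULL,
        tag_value_changed_at TEXT,
        tag_value REAL
    );
    CREATE VIEW metrics AS
        SELECT td.tag_value AS value,
               td.tag_value_changed_at AS "time",
               tn.normalized_name AS metric
        FROM tag_data td
        JOIN tag_names tn ON td.tag_name_id = tn.id;
    """
)
conn.execute("INSERT INTO tag_names (id, normalized_name) VALUES (1, 'pump_speed')")
conn.executemany(
    "INSERT INTO tag_data VALUES (?, ?, ?)",
    [
        (1, "2025-05-12T00:00:00+00:00", 10.0),
        (1, "2025-05-12T00:05:00+00:00", 12.5),
    ],
)

# Each row of the view is one value transition, labeled with the tag's name.
rows = conn.execute(
    'SELECT metric, "time", value FROM metrics ORDER BY "time"'
).fetchall()
```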

CVec Class

The SDK provides an API client class named CVec with the following functions.

__init__(?host, ?tenant, ?api_key, ?default_time_range)
Set up the SDK with the given host, tenant, and API key. Any of these that are not passed to the constructor are loaded from the environment variables CVEC_HOST, CVEC_TENANT, and CVEC_API_KEY. The default_time_range constrains most API calls, and can be overridden by the time_range argument to each API function.
get_spans(tag_name, ?time_range, ?limit)
Return all of the time spans where a tag has a constant value, as a list of spans. Each span has the following fields: {id, tag_name, value, begin_at, end_at, raw_begin_at, raw_end_at, metadata}. In a future version of the SDK, it will be possible to annotate, edit, and delete spans.
get_metric_data(?tag_names, ?time_range)
Return all data-points within a given time-range, optionally selecting a given list of tags. The return value is a Pandas DataFrame with three columns: tag_name, time, value. One row is returned for each tag value transition.
get_tags(?time_range)
Return a list of tags that had at least one transition in the given time range. All tags are returned if no time_range is given. Each tag has {id, name, birth_at, death_at}.
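
The constructor's fallback to environment variables could look roughly like the sketch below. `resolve_setting` is a hypothetical helper, not part of the SDK, and raising `ValueError` when a setting is missing is an assumption about the desired behavior.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str], env_name: str, env: Mapping[str, str] = os.environ
) -> str:
    """Prefer the explicit argument, else fall back to the environment variable."""
    if explicit is not None:
        return explicit
    value = env.get(env_name)
    if value is None:
        # Assumed behavior: fail early rather than connect with missing settings.
        raise ValueError(f"{env_name} must be set or passed as an argument")
    return value


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {
    "CVEC_HOST": "db.example.invalid",
    "CVEC_TENANT": "acme",
    "CVEC_API_KEY": "secret",
}
host = resolve_setting(None, "CVEC_HOST", fake_env)
tenant = resolve_setting("other_tenant", "CVEC_TENANT", fake_env)
```

Passing the environment as a mapping keeps the fallback easy to unit test without patching `os.environ`.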

Future Features

Out of scope for this issue.

sample_metric_data(bucket_width, ?bucket_function, ?aggregate_function, ?bucket_offset, ?tag_names, ?time_range)
Get metric data, resampled on regular time buckets. The only supported bucket_function is LOCF, meaning Last Observation Carried Forward. The only supported aggregate_function is AVERAGE. The function returns a Pandas DataFrame with a column for each tag_name, plus a time column.
Island detection with user-defined criteria: instead of this, define a synthetic tag based on a function of other tag values. Then use get_spans based on the synthetic tag.
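
One possible reading of the LOCF and AVERAGE combination is sketched below: points are averaged within each bucket, and an empty bucket repeats the most recent bucket average. This is an interpretation for illustration, not the planned implementation; `sample_locf_average` and the use of plain float timestamps are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def sample_locf_average(
    points: List[Tuple[float, float]],
    start: float,
    end: float,
    bucket_width: float,
) -> List[Tuple[float, Optional[float]]]:
    """Average points per bucket; empty buckets repeat the last bucket value."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for time, value in points:
        if start <= time < end:
            index = int((time - start) // bucket_width)
            sums[index] = sums.get(index, 0.0) + value
            counts[index] = counts.get(index, 0) + 1

    result: List[Tuple[float, Optional[float]]] = []
    last: Optional[float] = None  # stays None until the first observation
    bucket_count = int(-(-(end - start) // bucket_width))  # ceiling division
    for index in range(bucket_count):
        if index in counts:
            last = sums[index] / counts[index]
        result.append((start + index * bucket_width, last))
    return result


sampled = sample_locf_average(
    [(1.0, 10.0), (3.0, 20.0), (32.0, 40.0)], start=0.0, end=40.0, bucket_width=10.0
)
```

With these inputs the first bucket averages to 15.0, the two empty buckets carry 15.0 forward, and the last bucket holds 40.0.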

joshuanapoli · 2025-05-16T17:47:17Z

One comment/question is on build & release pipeline: I assume we'd want to package this into a .whl wheel file with .pyc compiled py bytecode? Optionally we can also obfuscate the source code with pyminifier or something similar?

The pyc files are specific to particular versions of Python. PEP 3147 was implemented and adds the possibility of distributing a library using only pyc files (by compiling for every version of Python), but I can't find any distribution tool that creates this kind of archive. The wisdom of the internet: "If you don't want to distribute source, then you shouldn't use Python."

I'll add pyminifier with the obfuscate option, remove the schema documentation, and won't make this repo public.
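
The version-specific naming of pyc files mentioned above can be checked with the standard library. A minimal sketch, where the file name `cvec.py` is used only as an example input:

```python
import importlib.util
import sys

# PEP 3147 places bytecode under __pycache__ with an interpreter-specific tag,
# so each Python version reads and writes its own pyc file.
pyc_path = importlib.util.cache_from_source("cvec.py")
cache_tag = sys.implementation.cache_tag  # for example "cpython-312" on CPython 3.12

print(pyc_path)
```

A pyc-only distribution would therefore need one compiled file per supported interpreter tag.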

joshuanapoli · 2025-05-17T23:55:40Z

Optionally we can also obfuscate the source code with pyminifier or something similar?

I tried pyminifier, but it makes the package unusable. I'll see if I can move the implementation to the back-end, so that this library doesn't expose our database structure.

joshuanapoli added 15 commits May 11, 2025 21:53

chore: Add .aider* to .gitignore

d102363

docs: Add README with data model and API documentation

eb8aed2

chore: install poetry

0513e53

chore: add black

945eb9b

chore: add pytest

bdbe1b1

feat: Create CVec class scaffold with basic methods

22a210a

feat: Add pandas as a project dependency

f39ad3c

chore: setup aider

0ea449b

feat: Implement CVec class constructor with env var fallback.

5b8a5b3

style: Run linter on cvec.py

f296e53

test: Add unit tests for the CVec constructor

e1974da

feat: Expose CVec class in src/cvec/__init__.py

096d6a6

test: Import CVec directly from package

dd35c3d

ci: Add GitHub Action for black and pytest checks

3613f7d

build: Update CI workflow and pyproject.toml for Python 3.9-3.13 support

4d468dc

joshuanapoli force-pushed the jn/initial branch from c0f0338 to 4d468dc Compare May 12, 2025 02:41

joshuanapoli added 14 commits May 11, 2025 22:42

ci: Run CI workflow on multiple Python versions

d0cbc90

feat: Add psycopg2-binary dependency for PostgreSQL support

34f4af3

docs: Add database schema documentation to README

5efaecf

refactor: Replace time_range with start_at and end_at parameters

05344e7

style: Run linter on cvec.py

1c6aeb8

fix: Corrected begin_at to start_at in get_spans documentation

05f0ebd

feat: Implement get_spans from tag_data and tag_data_str.

d58638c

style: Run linter on cvec.py

e3e582f

docs: Clarify tag_data and tag_data_str table descriptions in README

a541baf

feat: Modify get_spans to only report spans within time period.

380d757

style: Apply linting to cvec.py

b426e94

docs: Improve docstring for get_time_spans_where_value_changed

90de5de

feat: Allow unbounded start_at/end_at in get_spans and spans' end_at

ce499f7

style: Apply linter formatting to cvec.py

2b48944

joshuanapoli marked this pull request as ready for review May 15, 2025 13:28

joshuanapoli changed the title ~~Initial version: direct database access~~ Initial version: SDK for spans and metrics May 15, 2025

amy-nihao reviewed May 15, 2025

docs: Clarify raw_end_at description in Span object documentation

ecb633d

richardzyx approved these changes May 15, 2025

joshuanapoli added 11 commits May 15, 2025 11:01

feat: Add lint script to run black and mypy

81d2891

refactor: Rename tag_name to name in Span and get_spans, update docs

570ad68

refactor: Rename tag_name to name for spans and get_spans function

d737b78

chore: Use lint.sh script for linting commands

e872c72

refactor: Rename tag_names to names in get_metric_data method

9285909

feat: Rename tag_name column to name in get_metric_data output

3f06955

feat: Return value_double and value_string columns in get_metric_data

c918602

style: Apply linter to fix code formatting issues

664bc12

fix: Correctly handle missing values in get_metric_data test

d8e883a

feat: Format Span timestamps in RFC 3339 format in repr

2e5ccd3

docs: Add example usage and class documentation to README

06009c4

joshuanapoli changed the title ~~Initial version: SDK for spans and metrics~~ [PD1-296] Initial version: SDK for spans and metrics May 16, 2025

chore: use ruff linter

a13d8eb

joshuanapoli force-pushed the jn/initial branch from e1d13e9 to a13d8eb Compare May 17, 2025 23:41

joshuanapoli added 4 commits May 19, 2025 13:54

docs: Update README and pyproject metadata

7ab7697

docs: Complete Metrics documentation in the cvec SDK.

8d3e14a

docs: Add documentation for get_metrics and get_metric_data functions

6a11898

feat: Update SDK documentation and examples, fix minor issues

364451b

joshuanapoli force-pushed the jn/initial branch from 23cb85c to 364451b Compare May 19, 2025 19:05

fix: psycopg requires list rather than tuple

56f10f6

joshuanapoli merged commit d99e101 into main May 19, 2025
4 checks passed

joshuanapoli deleted the jn/initial branch May 19, 2025 21:35
