
Conversation

@Ja-Gk-00
Contributor

Added benchmarks to the test suite for measuring the performance of flat serialization/deserialization.

Ja-Gk-00 marked this pull request as draft on November 19, 2025 08:50
@codecov

codecov bot commented Nov 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines                      | Coverage Δ
pyjelly/integrations/generic/generic_sink.py | 100.00% <100.00%> (ø)

Ja-Gk-00 marked this pull request as ready for review on November 20, 2025 14:06
@Ja-Gk-00
Contributor Author

Ok, here's a snippet of some provisional results (they were executed on a relatively small file, so they may not be the most accurate; this is just to show that running the benchmarks works).
[image: bench_results]


def parse(self, input_file: IO[bytes]) -> None:
    from pyjelly.integrations.generic.parse import parse_jelly_to_graph
    from pyjelly.integrations.generic.parse import (
Member

Please don't make irrelevant changes in PRs. Please revert this and all other changes like it.

"mypy>=1.8; platform_python_implementation == 'CPython'",
"hatchling>=1.24",
"hatch-mypyc; platform_python_implementation == 'CPython'",
"mypy>=1.8; platform_python_implementation == 'CPython'",
Member

same

# version 3.1 required for python 3.14 support
ci = ["cibuildwheel>=3.1.0,<4 ; python_version >= '3.11'"]

# version 3.11 required for python 3.14 support
Member

what?


# version 3.11 required for python 3.14 support
ci = ['cibuildwheel>=3.1.0,<4 ; python_version >= "3.11"']
bench = ["pytest-benchmark>=5.2.1", "rdflib>=7.1.4"]
Member

Please rename to "benchmark" to make it clearer what this is.

Member

maybe just "benchmarks" instead of "benchmark_tests"?

"--in-jelly-quads",
type=str,
default=None,
help="optional Jelly quads file; if none, generated in-memory from nq slice.",
Member

What is an "nq slice"?

g.addoption("--iterations", type=int, default=1, help="iterations per round.")


def _slice_lines_to_bytes(path: Path, limit: int) -> bytes:
Member

There are no comments here or anywhere else, again. This makes the code rather hard to review.

Please make this code readable, and then I will review it again.
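
For illustration only, a commented version of a line-slicing helper might look roughly like the sketch below. The actual behaviour of the PR's _slice_lines_to_bytes is an assumption here; only the function name and signature come from the diff above.

# Hypothetical sketch (not the PR's implementation): read at most `limit`
# lines from an N-Triples/N-Quads file and return them as raw bytes, so the
# benchmarks operate on a bounded slice of a potentially large input file.
from pathlib import Path


def _slice_lines_to_bytes(path: Path, limit: int) -> bytes:
    lines: list[bytes] = []
    with path.open("rb") as handle:
        for index, line in enumerate(handle):
            if index >= limit:
                break
            lines.append(line)
    return b"".join(lines)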


@pytest.fixture(scope="session")
def nt_graph(nt_bytes_sliced: bytes) -> Graph:
    g = Graph()
Member

Why do you even use Graph? For buffering in-memory you must use an array of statements, otherwise you will get nonsensical results. Same with Dataset, of course.
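
As an illustration of this suggestion, a fixture that buffers statements as a plain list (rather than returning the Graph itself) could look something like the sketch below; the fixture name and body are assumptions, and only nt_bytes_sliced comes from the PR.

# Hedged sketch, not the PR's code: parse once during fixture setup and hand
# the benchmark a flat list of statements, so rdflib Graph indexing overhead
# stays out of the measured section.
import pytest
from rdflib import Graph


@pytest.fixture(scope="session")
def nt_statements(nt_bytes_sliced: bytes) -> list:
    g = Graph()
    g.parse(data=nt_bytes_sliced, format="nt")
    return list(g)  # materialize as a plain array of (s, p, o) tuples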

    pedantic_cfg: dict[str, int],
    limit_statements: int,
) -> None:
    benchmark.pedantic(parse_nt_bytes, args=(nt_bytes_sliced,), **pedantic_cfg)
Member

Here you are measuring not the parsing speed but the speed at which rdflib can insert statements into the Graph. This is meaningless. You must only iterate over the resulting triples/quads, nothing else.
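
A minimal sketch of the shape being asked for, assuming the parser under test exposes (or can be wrapped as) an iterator of statements; iter_statements below is a toy stand-in, not a real pyjelly API, and the test name is hypothetical.

# Hedged sketch: the benchmark body drains an iterator of statements and does
# nothing else, so the timing reflects parsing alone rather than Graph
# insertion. iter_statements is a trivial stand-in for the parser under test.
import io
from collections import deque
from typing import IO, Iterator


def iter_statements(stream: IO[bytes]) -> Iterator[tuple]:
    for line in stream:
        yield tuple(line.split(b" ", 2))  # placeholder for real parsing


def consume(raw: bytes) -> None:
    deque(iter_statements(io.BytesIO(raw)), maxlen=0)  # exhaust, don't store


def test_parse_only(benchmark, nt_bytes_sliced: bytes, pedantic_cfg: dict) -> None:
    benchmark.pedantic(consume, args=(nt_bytes_sliced,), **pedantic_cfg)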

@Ostrzyciel
Member

> Ok, here's a snippet of some provisional results (they were executed on a relatively small file, so they may not be the most accurate; this is just to show that running the benchmarks works). [image: bench_results]

These results don't make any sense.

After you fix the issues I pointed out, please run the benchmarks again until you get results that actually make sense.

Please do NOT put serialization and deserialization results in the same table; it makes the table hard to understand. Just run the two benchmarks separately.
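
One way to keep the two result tables apart, assuming pytest-benchmark is driving the runs, is to put the serialization and parsing benchmarks in separate benchmark groups (or simply run them in separate pytest invocations). The work functions below are trivial placeholders, not the PR's benchmarks.

# Hedged sketch: each pytest-benchmark group gets its own table in the report,
# so serialization and parsing results never end up interleaved.
import pytest


def _noop_serialize(statements: list) -> int:
    return len(statements)  # placeholder for the real serialization call


def _noop_parse(raw: bytes) -> int:
    return len(raw)  # placeholder for the real parsing call


@pytest.mark.benchmark(group="serialize")
def test_serialize(benchmark) -> None:
    benchmark(_noop_serialize, [("s", "p", "o")])


@pytest.mark.benchmark(group="parse")
def test_parse(benchmark) -> None:
    benchmark(_noop_parse, b"<s> <p> <o> .\n")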
