Replies: 3 comments
-
Firstly, don't set the heap size that high. Most of the memory usage for TDB2 (assuming that is your database backend) is off-heap via memory-mapped files, so you want to leave plenty of memory for the OS to use for that. I wouldn't typically go higher than 8GB heap at your scale (and even that is likely overkill unless your queries are very memory intensive). See https://jena.apache.org/documentation/tdb/faqs.html#java-heap
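To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how much RAM is left for the OS page cache (which backs the memory-mapped files) at different heap sizes. The 128 GB and 90 GB figures come from the original post. The `other_gb` allowance for JVM non-heap overhead and other processes is purely an assumption for illustration, and the function name is made up.

```python
def off_heap_budget_gb(total_ram_gb, heap_gb, other_gb=4.0):
    """Estimate the RAM (in GB) left for the OS page cache.

    other_gb is an assumed allowance for JVM non-heap memory and other
    processes on the machine; it is not a measured value.
    """
    if heap_gb + other_gb > total_ram_gb:
        raise ValueError("heap plus overhead exceeds physical RAM")
    return total_ram_gb - heap_gb - other_gb


# Settings from the original post: 128 GB RAM with -Xms90g -Xmx90g
current = off_heap_budget_gb(128, 90)
# Same machine with the 8 GB heap suggested above
suggested = off_heap_budget_gb(128, 8)

print(f"page cache budget with 90 GB heap: {current:.0f} GB")
print(f"page cache budget with 8 GB heap: {suggested:.0f} GB")
```

Note that with -Xms equal to -Xmx and -XX:+AlwaysPreTouch, the JVM touches the whole heap at startup, so the full 90 GB is taken from the OS immediately rather than only as the heap grows.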
Performance is often data driven, so if you can share your dataset, that's great. If not (e.g. it's commercially sensitive), can you generate a sample dataset that shows the same performance problems with your queries and share that instead?
That's insufficient information for us to help you.
There's some documentation on TDB optimisation at https://jena.apache.org/documentation/tdb/optimizer.html The default query optimiser code is at https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/OptimizerStd.java and you can see the various ARQ configuration symbols that turn specific optimisations on/off in the code there. You may also find it helpful to use the
-
You can also view the various algebra forms of your queries via our online SPARQL Query Validator.
-
Hi! Thanks a lot for the detailed explanation. That's very helpful; we'll definitely lower the heap size accordingly to leave more room for off-heap memory.
As for the queries, they all follow the same general structure:
During testing, the query pattern stays the same; only the temporal range changes between runs, starting at 15 minutes and gradually expanding up to 10 years.
I've also made a small sample dataset publicly available so you can reproduce the issue: https://github.com/ddvlanck/traffic-measurements-synthetic-data
These are some of the ranges for which the query failed:
I’ll take a closer look at the TDB optimiser documentation and use qparse to inspect the optimised algebra, as you suggested.
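For anyone wanting to reproduce the test procedure described above (same query pattern, only the time range widening from 15 minutes to 10 years), a minimal sketch of generating the range filters could look like this. The variable name `?time`, the intermediate window sizes, and the helper names are all hypothetical; only the 15-minute and 10-year endpoints come from the description above. The resulting clause also assumes the surrounding query declares the `xsd:` prefix.

```python
from datetime import datetime, timedelta, timezone

# Window sizes to test. Only the first and last come from the thread;
# the steps in between are assumptions for illustration.
WINDOWS = [
    ("15 minutes", timedelta(minutes=15)),
    ("1 hour", timedelta(hours=1)),
    ("1 day", timedelta(days=1)),
    ("30 days", timedelta(days=30)),
    ("1 year", timedelta(days=365)),
    ("10 years", timedelta(days=3650)),
]


def time_filter(start, window, var="?time"):
    """Build a SPARQL FILTER clause for the half-open range [start, start + window).

    start must be a timezone-aware datetime; it is normalised to UTC.
    var is a hypothetical variable name for the timestamp in the query.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    begin = start.astimezone(timezone.utc)
    end = begin + window
    return (
        f'FILTER({var} >= "{begin.strftime(fmt)}"^^xsd:dateTime && '
        f'{var} < "{end.strftime(fmt)}"^^xsd:dateTime)'
    )


def filters_for(start):
    """Return one FILTER clause per window size, keyed by its label."""
    return {label: time_filter(start, window) for label, window in WINDOWS}


if __name__ == "__main__":
    for label, clause in filters_for(datetime(2024, 1, 1, tzinfo=timezone.utc)).items():
        print(f"{label}: {clause}")
```

Each generated clause would be substituted into the fixed query pattern, and each run timed against the 60-second limit.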
-
Hi! We are currently evaluating Apache Jena Fuseki by loading a dataset and running a set of SPARQL queries against it.
However, I’m running into performance issues that seem unexpected for the dataset size.
We are testing with a dataset containing ~16 million triples, and we’ve configured a 60-second timeout for query responses in our experiment setup.
All SELECT queries—ranging from those expected to return around 10 rows up to those expected to return ~1M rows—hit the timeout limit. Even the queries with very small expected result sets fail to return within 60 seconds.
The system running Apache Jena Fuseki has 128 GB RAM available, so hardware limitations don’t appear to be the cause.
🧪 Setup
5.6.0
We updated the JAVA_OPTIONS to -Xms90g -Xmx90g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch -XX:+UseStringDeduplication
💡 What we’re looking for
We’d like to understand whether there are configuration options, tuning parameters, or indexing strategies we should apply to improve query performance.
Any guidance on:
…would be greatly appreciated.
Thanks in advance!