Conversation

@simonmar (Collaborator) commented Jan 8, 2026

This commit adds an LMDB-based storage backend.

Why?

  • It's faster than RocksDB (see below).
  • It has no dependencies (all the code is bundled, just a couple of .c files).
  • There's no slow compaction stage during DB creation.
  • Backup is just copying the DB, so restore can be instantaneous once the DB is downloaded.
  • The on-disk DB may actually be smaller than with RocksDB, but this needs more experiments.

How do you use it?

Adding --lmdb to the glean or glean-server command line tells it to use LMDB for any DBs it creates. With this option, Glean can still work with existing RocksDB DBs, including restoring them from remote storage; conversely, if --lmdb is not given, restoring LMDB DBs from remote storage is still supported.

How does it perform?

Tested by indexing Stackage with the Haskell indexer.

Index:

  • lmdb: 189s, 2.6GB
  • rocksdb: 216s, 0.9GB

Backup:

  • lmdb: 5.7s, 0.8GB
  • rocksdb: 1.6s, 0.9GB

Restore:

  • lmdb: 2.6s, 2.1GB
  • rocksdb: 2.15s, 0.9GB

Note that backup with LMDB just copies the DB and compresses it; most of the time is spent in compression, which can be tuned to be either faster with less compression or slower with more compression. We're currently using zstd level 8, which seems a reasonable compromise.
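
For reference, the compression step boils down to something like the following (a minimal sketch assuming the one-shot zstd C API; the real backup path may stream the file in chunks rather than holding it in memory):

#include <zstd.h>

#include <stdexcept>
#include <vector>

// Compress one buffer at zstd level 8 (the compromise setting mentioned above).
std::vector<char> compressChunk(const char* src, size_t srcSize, int level = 8) {
  std::vector<char> dst(ZSTD_compressBound(srcSize));
  size_t n = ZSTD_compress(dst.data(), dst.size(), src, srcSize, level);
  if (ZSTD_isError(n)) {
    throw std::runtime_error(ZSTD_getErrorName(n));
  }
  dst.resize(n);
  return dst;
}

Raising the level argument trades backup speed for a smaller artifact; lowering it does the opposite.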

Also note that backup produces a squashfs that can be mounted directly (instantaneous restore, with decompression on the fly). See #669

C++ Indexing

I indexed all of LLVM + Clang. This is the time to write the data to the DB and run deriving (since otherwise the indexing time is dominated by the C++ indexer itself):

  • lmdb: 30.8s, 754MB
  • rocksdb: 52.4s, 256MB

Backup:

  • lmdb: 1.8s, 213MB
  • rocksdb: 0.8s, 252MB

Restore:

Benchmarking with query-bench

* RocksDB (9 Jan)

benchmarking complex-parallel
time                 4.722 s    (4.188 s .. 5.182 s)
                     0.998 R²   (0.998 R² .. 1.000 R²)
mean                 5.051 s    (4.859 s .. 5.403 s)
std dev              341.9 ms   (16.30 ms .. 415.4 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking simple
time                 35.87 μs   (35.32 μs .. 36.95 μs)
                     0.989 R²   (0.978 R² .. 0.995 R²)
mean                 41.35 μs   (39.91 μs .. 42.80 μs)
std dev              5.186 μs   (4.647 μs .. 6.200 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking nested
time                 55.91 μs   (55.55 μs .. 56.57 μs)
                     0.993 R²   (0.981 R² .. 1.000 R²)
mean                 57.59 μs   (56.00 μs .. 61.65 μs)
std dev              7.433 μs   (1.047 μs .. 13.81 μs)
variance introduced by outliers: 89% (severely inflated)

benchmarking page
time                 21.88 ms   (20.42 ms .. 23.41 ms)
                     0.952 R²   (0.876 R² .. 0.991 R²)
mean                 22.14 ms   (21.08 ms .. 24.16 ms)
std dev              3.260 ms   (1.508 ms .. 5.239 ms)
variance introduced by outliers: 63% (severely inflated)

benchmarking pagenested
time                 43.15 ms   (41.51 ms .. 45.74 ms)
                     0.991 R²   (0.975 R² .. 1.000 R²)
mean                 42.00 ms   (41.28 ms .. 43.37 ms)
std dev              1.910 ms   (749.0 μs .. 2.976 ms)
variance introduced by outliers: 13% (moderately inflated)

benchmarking compile
time                 328.2 μs   (326.4 μs .. 330.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 327.3 μs   (326.6 μs .. 328.4 μs)
std dev              2.961 μs   (2.361 μs .. 4.059 μs)

benchmarking compile2
time                 95.93 μs   (95.46 μs .. 96.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 96.06 μs   (95.72 μs .. 96.69 μs)
std dev              1.519 μs   (865.7 ns .. 2.440 μs)

benchmarking compile3
time                 48.67 μs   (48.44 μs .. 48.90 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 48.87 μs   (48.71 μs .. 49.12 μs)
std dev              670.7 ns   (545.1 ns .. 856.4 ns)

benchmarking array_prefix
time                 306.2 ms   (274.9 ms .. 337.6 ms)
                     0.997 R²   (0.987 R² .. 1.000 R²)
mean                 310.7 ms   (306.6 ms .. 321.7 ms)
std dev              8.283 ms   (233.8 μs .. 10.44 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking stacked/page
time                 23.02 ms   (21.20 ms .. 24.57 ms)
                     0.977 R²   (0.956 R² .. 0.992 R²)
mean                 21.80 ms   (20.96 ms .. 22.68 ms)
std dev              1.934 ms   (1.618 ms .. 2.536 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking stacked/pagenested
time                 41.75 ms   (40.69 ms .. 43.81 ms)
                     0.995 R²   (0.988 R² .. 0.999 R²)
mean                 41.74 ms   (41.10 ms .. 42.64 ms)
std dev              1.536 ms   (1.034 ms .. 1.913 ms)

* LMDB (9 Jan)

benchmarking complex-parallel
time                 3.060 s    (2.493 s .. 3.351 s)
                     0.996 R²   (0.991 R² .. 1.000 R²)
mean                 3.680 s    (3.401 s .. 4.208 s)
std dev              516.3 ms   (6.608 ms .. 613.0 ms)
variance introduced by outliers: 24% (moderately inflated)

benchmarking simple
time                 31.78 μs   (31.26 μs .. 32.59 μs)
                     0.994 R²   (0.992 R² .. 0.996 R²)
mean                 35.66 μs   (34.61 μs .. 36.79 μs)
std dev              3.901 μs   (3.329 μs .. 4.924 μs)
variance introduced by outliers: 86% (severely inflated)

benchmarking nested
time                 49.22 μs   (48.66 μs .. 50.16 μs)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 50.49 μs   (49.64 μs .. 51.86 μs)
std dev              3.597 μs   (2.639 μs .. 4.710 μs)
variance introduced by outliers: 71% (severely inflated)

benchmarking page
time                 15.30 ms   (14.10 ms .. 16.63 ms)
                     0.963 R²   (0.933 R² .. 0.983 R²)
mean                 15.70 ms   (15.11 ms .. 16.45 ms)
std dev              1.653 ms   (1.265 ms .. 2.442 ms)
variance introduced by outliers: 52% (severely inflated)

benchmarking pagenested
time                 24.94 ms   (22.29 ms .. 27.82 ms)
                     0.947 R²   (0.903 R² .. 0.981 R²)
mean                 22.95 ms   (21.91 ms .. 24.36 ms)
std dev              2.825 ms   (2.143 ms .. 3.701 ms)
variance introduced by outliers: 53% (severely inflated)

benchmarking compile
time                 349.6 μs   (339.6 μs .. 363.0 μs)
                     0.991 R²   (0.987 R² .. 0.995 R²)
mean                 350.7 μs   (342.1 μs .. 360.2 μs)
std dev              29.73 μs   (25.58 μs .. 36.64 μs)
variance introduced by outliers: 72% (severely inflated)

benchmarking compile2
time                 97.12 μs   (95.42 μs .. 100.3 μs)
                     0.989 R²   (0.979 R² .. 0.998 R²)
mean                 101.6 μs   (98.86 μs .. 106.6 μs)
std dev              12.01 μs   (7.345 μs .. 21.75 μs)
variance introduced by outliers: 86% (severely inflated)

benchmarking compile3
time                 48.31 μs   (48.11 μs .. 48.53 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 48.79 μs   (48.34 μs .. 50.48 μs)
std dev              2.691 μs   (517.0 ns .. 5.640 μs)
variance introduced by outliers: 60% (severely inflated)

benchmarking array_prefix
time                 311.7 ms   (296.2 ms .. 346.3 ms)
                     0.996 R²   (0.975 R² .. 1.000 R²)
mean                 308.2 ms   (303.7 ms .. 319.8 ms)
std dev              8.794 ms   (1.644 ms .. 11.58 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking stacked/page
time                 19.68 ms   (17.41 ms .. 22.57 ms)
                     0.929 R²   (0.884 R² .. 0.970 R²)
mean                 18.42 ms   (17.20 ms .. 19.85 ms)
std dev              3.087 ms   (2.367 ms .. 4.283 ms)
variance introduced by outliers: 71% (severely inflated)

benchmarking stacked/pagenested
time                 21.98 ms   (20.72 ms .. 23.49 ms)
                     0.982 R²   (0.965 R² .. 0.991 R²)
mean                 23.61 ms   (22.42 ms .. 25.52 ms)
std dev              3.453 ms   (2.212 ms .. 5.199 ms)
variance introduced by outliers: 63% (severely inflated)

Glass

glass-democlient list on 1000 files in the Stackage DB.

  • rocksdb: 39s
  • lmdb: 31s

Caveats

The on-disk size of the DBs is larger than RocksDB by about 2x, although the backed-up size is about the same. This is mainly because LMDB doesn't do compression itself. However, backup includes compression, and the backup can be mounted directly as a squashfs (mounting isn't implemented yet; we need to think about what happens if the server is restarted). I ran the Glass benchmark with the DB mounted as a squashfs and it was 32s vs. 31s, so hardly any difference.

There's no way to control the cache size: LMDB just maps pages into memory and relies on the OS to manage memory usage. It remains to be seen how well this works for a production server with many large DBs.
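
Concretely, the only knob is the map size, which bounds the DB rather than acting as a cache. A minimal sketch of opening an environment (illustrative, not Glean's actual code):

#include <lmdb.h>

#include <cstddef>
#include <stdexcept>

// Open an LMDB environment. Note there is no cache-size setting:
// mdb_env_set_mapsize only bounds the memory map (and hence the DB size);
// which pages stay resident is left entirely to the OS page cache.
MDB_env* openEnv(const char* path, size_t mapSize) {
  MDB_env* env = nullptr;
  if (mdb_env_create(&env) != MDB_SUCCESS) {
    throw std::runtime_error("mdb_env_create failed");
  }
  mdb_env_set_mapsize(env, mapSize);
  if (mdb_env_open(env, path, MDB_NORDAHEAD, 0644) != MDB_SUCCESS) {
    mdb_env_close(env);
    throw std::runtime_error("mdb_env_open failed");
  }
  return env;
}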

Key storage

LMDB doesn't support large keys, so I've used a strategy in which only the first ~2KB of the key is indexed. Searching for a key using a prefix larger than this may involve a linear search, but I think this is reasonable: we should never be looking up a fact using a large key. We can document this as a performance gotcha, just like sorting of fields. Indeed, we could use this same strategy with RocksDB to avoid duplicating large keys in the DB, which could potentially save a lot of space.
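
To make that concrete, here is an illustrative sketch of the lookup side. The names and value layout (full key length + full key + payload) are assumptions, not Glean's actual implementation, and it glosses over how two distinct keys sharing the same indexed prefix would be disambiguated:

#include <lmdb.h>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

constexpr size_t MAX_INDEXED = 2048;  // hypothetical indexed-prefix cutoff

// Find a fact by its full key; only the first MAX_INDEXED bytes are indexed,
// so oversized keys fall back to a linear scan over the prefix run.
bool lookupFact(MDB_txn* txn, MDB_dbi dbi, const std::string& fullKey,
                MDB_val* payload) {
  MDB_cursor* cur = nullptr;
  if (mdb_cursor_open(txn, dbi, &cur) != MDB_SUCCESS) return false;
  const size_t pfx = std::min(fullKey.size(), MAX_INDEXED);
  MDB_val k{pfx, const_cast<char*>(fullKey.data())};
  MDB_val v;
  // Seek to the first entry at or after the truncated key, then scan the
  // (normally tiny) run of entries that share the indexed prefix.
  for (int rc = mdb_cursor_get(cur, &k, &v, MDB_SET_RANGE);
       rc == MDB_SUCCESS; rc = mdb_cursor_get(cur, &k, &v, MDB_NEXT)) {
    if (k.mv_size < pfx || std::memcmp(k.mv_data, fullKey.data(), pfx) != 0)
      break;  // left the prefix run: no exact match exists
    uint64_t len;  // assumed layout: full-key length, full key, payload
    std::memcpy(&len, v.mv_data, sizeof(len));
    const char* stored = static_cast<const char*>(v.mv_data) + sizeof(len);
    if (len == fullKey.size() &&
        std::memcmp(stored, fullKey.data(), len) == 0) {
      payload->mv_size = v.mv_size - sizeof(len) - len;
      payload->mv_data = const_cast<char*>(stored + len);
      mdb_cursor_close(cur);
      return true;  // exact match on the untruncated key
    }
  }
  mdb_cursor_close(cur);
  return false;
}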

@simonmar force-pushed the lmdb-3 branch 4 times, most recently from 5ae6a5a to 2076683 on January 11, 2026.
@simonmar (Collaborator, Author)

Don't import this yet; I'm still fixing bugs.

@simonmar marked this pull request as ready for review on January 13, 2026.
@simonmar (Collaborator, Author)

Should be good to go now; there will be some follow-ups, though.

meta-codesync bot (Contributor) commented Jan 13, 2026

@simonhollis has imported this pull request. If you are a Meta employee, you can view this in D90599900.

@simonhollis (Contributor)

@simonmar Thanks for adding LMDB support. We're excited about the functionality. Is there a way to package the openldap code cleanly as a new third-party dependency? It's preferable not to include it directly in the glean file tree.

@simonmar (Collaborator, Author)

> @simonmar Thanks for adding LMDB support. We're excited about the functionality. Is there a way to package the openldap code cleanly as a new third-party dependency? It's preferable not to include it directly in the glean file tree.

Hi @simonhollis - sure, I can do something. But are you sure you want it to be a separate dependency? It's only 4 files, and bundling it directly is by far the simplest way. The license is in the source files too.

We can't use the Linux package directly (liblmdb-dev on Debian-based systems) because I needed to tweak one of the #defines and recompile to increase the max key size. But I could move the code into a separate directory, say glean/lmdb-clib, similar to what we do for folly, although it will be a bit simpler because I would still just check in a copy of the code. I don't think there's any need to track the upstream repo or anything like that.

@simonmar (Collaborator, Author)

@simonhollis how about this? f64c145

@iamirzhan (Contributor)

@simonhollis @simonmar

I agree it's OK not to track the upstream repo. But I think the main point is that we already have the lmdb sources in our internal structure as third-party/lmdb, and ideally we want to use them instead of copying the same files under the glean/ project. We can still compile it with the -DMDB_MAXKEYSIZE flag set.

@simonmar IIUC this is a very similar case to other "internally" available code? Can we do something similar to what we do for folly, pulling the sources at build time instead of putting them in the repo?

Or maybe we could just keep it as in f64c145 for the GitHub repo, and internally not import these files but link to our paths.

@simonmar what would you suggest?

@simonmar (Collaborator, Author) commented Jan 22, 2026

@iamirzhan

> We can still compile it with the -DMDB_MAXKEYSIZE flag set.

There's also a small diff relative to upstream:

***************
*** 1053,1059 ****
  #define METADATA(p)	 ((void *)((char *)(p) + PAGEHDRSZ))
  
  	/** ITS#7713, change PAGEBASE to handle 65536 byte pages */
! #define	PAGEBASE	((MDB_DEVEL) ? PAGEHDRSZ : 0)
  
  	/** Number of nodes on a page */
  #define NUMKEYS(p)	 ((MP_LOWER(p) - (PAGEHDRSZ-PAGEBASE)) >> 1)
--- 1054,1061 ----
  #define METADATA(p)	 ((void *)((char *)(p) + PAGEHDRSZ))
  
  	/** ITS#7713, change PAGEBASE to handle 65536 byte pages */
! #define	PAGEBASE	PAGEHDRSZ
! // #define	PAGEBASE	((MDB_DEVEL) ? PAGEHDRSZ : 0)
  
  	/** Number of nodes on a page */
  #define NUMKEYS(p)	 ((MP_LOWER(p) - (PAGEHDRSZ-PAGEBASE)) >> 1)

This is needed to get the increased key size. Perhaps we can get upstream to accept something that makes this accessible via a -D flag, but that might take a while, I don't know. I'll open an issue with the upstream repo. (EDIT: issue here: https://bugs.openldap.org/show_bug.cgi?id=10428)

I think it would be a shame to do something complex here, because that negates the benefit of having a lightweight dependency, so:

> Or maybe we could just keep it as in f64c145 for the GitHub repo, and internally not import these files but link to our paths.

That sounds good to me, if you can find some solution to the diff above.

Review comment on the code that writes facts to the entities family:

v.packed(fact.clause.key_size);                 // key size, packed
v.put({fact.clause.data, fact.clause.size()});  // clause bytes

put(Family::entities, k.bytes(), v.bytes());    // k: the fact id as an encoded nat
Contributor:

I wonder what happens if the key is bigger? Inconsistent behaviour? Should we assert in that case?

@simonmar (Collaborator, Author)

k is an encoded nat, which is never larger than 9 bytes (see rts/nat.h). There is a limit on the value size, but it's very large, and LMDB will throw an error if we exceed it.
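
For illustration, a variable-length nat encoding with a 9-byte ceiling can look like this (a sketch only; the actual scheme in rts/nat.h may differ, e.g. in byte order and tagging):

#include <cstddef>
#include <cstdint>

// Encode a 64-bit nat as one length byte plus up to eight payload bytes,
// so the encoded form is never larger than 9 bytes.
size_t encodeNat(uint64_t n, uint8_t out[9]) {
  size_t len = 0;
  do {
    out[1 + len++] = static_cast<uint8_t>(n & 0xff);
    n >>= 8;
  } while (n != 0);
  out[0] = static_cast<uint8_t>(len);  // 1..8 payload bytes follow
  return 1 + len;                      // total: at most 9 bytes
}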

@simonmar (Collaborator, Author)

@iamirzhan do you want me to combine this PR with #679?

Note that AFAIK there's no way to have some source files that live only in the GitHub repo, so you'll have to import all the files but just not use them in your internal build system.

@simonmar (Collaborator, Author)

Response to the upstream request:

Howard Chu <hyc@openldap.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #1 from Howard Chu <hyc@openldap.org> ---
The feature is fully usable but will not be enabled by default in v0.9 since
that would make DB files that are incompatible with earlier versions. It will
be enabled by default in v1.0. In the meantime, you're free to customize your
own app-specific builds in whatever way you wish.

That's totally reasonable. Until 1.0 is released, we'll need to use our own local mods, though.

@iamirzhan (Contributor) commented Jan 22, 2026

@simonhollis mentioned that we have a 2k-key-size version of lmdb internally. So I think the best way to go would be to put #679 here, but only for the GitHub repo? We will not include those files in the internal build, and we also won't have them in the internal repo.

EDIT: didn't notice your comment that it's not possible :) yeah, I suppose we will just not include them in the build

@simonhollis (Contributor)

> @simonhollis mentioned that we have a 2k-key-size version of lmdb internally. So I think the best way to go would be to put #679 here, but only for the GitHub repo? We will not include those files in the internal build, and we also won't have them in the internal repo.
>
> EDIT: didn't notice your comment that it's not possible :) yeah, I suppose we will just not include them in the build

Yes, we have an internal version of lmdb that enables -DMDB_MAXKEYSIZE=0, so we don't need the key-length patch for internal builds. @simonmar could you then update this PR to add the Cabal package for the OSS lmdb C libs, and we'll build against our internal version.

@simonmar force-pushed the lmdb-3 branch 2 times, most recently from 26505c2 to b354fbd on January 22, 2026.
@simonmar (Collaborator, Author)

@simonhollis just to double-check, does your internal version also include the patch I mentioned above? Just -DMDB_MAXKEYSIZE=0 is not sufficient by itself to increase the key size.

@simonmar (Collaborator, Author)

@iamirzhan @simonhollis ready for import, and I've rebased the rest of the stack.

@simonhollis (Contributor)

> @simonhollis just to double-check, does your internal version also include the patch I mentioned above? Just -DMDB_MAXKEYSIZE=0 is not sufficient by itself to increase the key size.

Just dug into this. I think it's worth re-checking some calculations. Our internal build doesn't have the patch, but with -DMDB_MAXKEYSIZE=0 alone the key length is already increased from 511 to 1,982 bytes. With your 64kB page patch, plus MDB_DEVEL=1 and -DMDB_MAXKEYSIZE=0, the key length would push to 32kB.
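
A quick way to confirm the effective limit for any particular build is to ask LMDB directly rather than re-deriving it from page-size arithmetic; a minimal sketch:

#include <lmdb.h>

#include <cstdio>

int main() {
  MDB_env* env = nullptr;
  mdb_env_create(&env);
  // The result depends on compile-time settings (MDB_MAXKEYSIZE, MDB_DEVEL)
  // and the page size: 511 by default, 1,982 with -DMDB_MAXKEYSIZE=0 per the
  // numbers above.
  std::printf("max key size: %d bytes\n", mdb_env_get_maxkeysize(env));
  mdb_env_close(env);
  return 0;
}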

@simonmar (Collaborator, Author)

@simonhollis you're right, sorry for the confusion. The patch doesn't seem to be necessary to get the 2kB key size; I'll revert to the pristine upstream code.
