Efficiently store a vector of enum variants
Let's say you have an enum Direction with 4 variants. You only need 2 bits to
store the discriminant, but Rust will use the minimum of 1 byte (8 bits).
Therefore, when using a Vec<Direction> with 16 elements it will use 16 bytes
of memory. However, this crate provides the EnumVec type, which only uses as
many bits as needed. So a EnumVec<Direction> with 16 elements will only use
4 bytes of memory.
Since Rust doesn't provide a way to count the variants of a type, the
enum_like crate defines a trait EnumLike with an associated constant
NUM_VARIANTS, and helper methods to convert from usize to T. This trait
is implemented for a few common types, like bool and Option<T>, and can be
implemented for any type. The implementation can be automated using the
enum_like_derive crate, which provides the #[derive(EnumLike)] proc macro.
Add this to your Cargo.toml:
[dependencies]
enum_vec = "0.3"
enum_like = "0.2"
enum_like_derive = "0.1"
And then in src/main.rs:
#[macro_use]
extern crate enum_like_derive;
extern crate enum_like;
extern crate enum_vec;
use enum_vec::EnumVec;
#[derive(Copy, Clone, Debug, EnumLike)]
enum Direction {
Left, Right, Up, Down,
}
fn main() {
let mut v = EnumVec::new();
v.push(Direction::Left);
v.push(Direction::Right);
v.push(Direction::Left);
v.push(Direction::Right);
for d in v {
println!("{:?}", d);
}
}See examples/src/main.rs for more usage examples.
Since an EnumVec is essentially a n-bitvec, you can use it as such.
type BitVec = EnumVec<bool>;
type TwoBitVec = EnumVec<[bool; 2]>;
type TwoBitVec = EnumVec<(bool, bool)>;
type FourBitVec = EnumVec<[bool; 4]>;You can automatically derive EnumLike for almost any type, as long as all of
its fields are EnumLike.
struct BitField {
opt_0: bool,
opt_1: bool,
opt_2: bool,
opt_3: bool,
}
enum BitsOrRaw {
Bits(BitField),
Raw { opt_01: (bool, bool), opt_23: (bool, bool), },
}You can write a custom EnumLike implementation: the following code allows
to create a EnumVec<Digit> where each element is 4 bits, instead of the 8
required by u8.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct Digit {
x: u8, // x >= 0 && x <= 9
}
unsafe impl EnumLike for Digit {
const NUM_VARIANTS: usize = 10;
fn to_discr(self) -> usize {
self.x as usize
}
fn from_discr(x: usize) -> Self {
let x = x as u8;
Self { x }
}
}This trait is unsafe because other code assumes that to_discr() never returns
something >= than NUM_VARIANTS.
Since by default each block is 32 bits, the EnumVec is only 100% memory
efficient when each element is 1, 2, 4, 8, 16 or 32 bits long.
That's because the elements are never split across two blocks: a 15-bit element
stored inside a 32-bit block will always use 30 bits and waste the remaining 2.
In general, the efficiency
can be calculated as 1 - (32 % n) / 32, but it's always
equal or better than a normal Vec. However it's equal when n >= 11, so if
you have a type with 2048 variants, you should consider using a Vec instead.
| n | Vec | EnumVec8 | EnumVec16 | EnumVec32 | EnumVec64 | EnumVec128 |
|---|---|---|---|---|---|---|
| 1 | 0.125 | 1 | 1 | 1 | 1 | 1 |
| 2 | 0.25 | 1 | 1 | 1 | 1 | 1 |
| 3 | 0.375 | 0.75 | 0.9375 | 0.9375 | 0.984375 | 0.984375 |
| 4 | 0.5 | 1 | 1 | 1 | 1 | 1 |
| 5 | 0.625 | 0.625 | 0.9375 | 0.9375 | 0.9375 | 0.9765625 |
| 6 | 0.75 | 0.75 | 0.75 | 0.9375 | 0.9375 | 0.984375 |
| 7 | 0.875 | 0.875 | 0.875 | 0.875 | 0.984375 | 0.984375 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 |
| 9 | 0.5625 | 0 | 0.5625 | 0.84375 | 0.984375 | 0.984375 |
| 10 | 0.625 | 0 | 0.625 | 0.9375 | 0.9375 | 0.9375 |
| 11 | 0.6875 | 0 | 0.6875 | 0.6875 | 0.859375 | 0.9453125 |
The complete table is available as a python one-liner:
x = [(n, n/8 if n <= 8 else n/16 if n <= 16 else n/32 if n <= 32 else n/64, 1-(8%n)/8, 1-(16%n)/16, 1-(32%n)/32, 1-(64%n)/64, 1-(128%n)/128) for n in range(1, 64+1)]An EnumVec8 with 8-bit storage blocks cannot be used to store items larger
than 8 bits. Similarly, for storing elements larger than 32 bits, the default
EnumVec32 is not enough. The maximum size of an item in bits is defined on
the EnumLike crate as the number of bits that can fit in one usize.
The EnumVec with 128-bit storage is the most memory-efficient option right
now, but most of the operations are 2x slower than the other implementations
on a tipical 64-bit machine. The 8, 16, 32 and 64-bit versions have similar
performance.
The "efficiency limits" of each EnumVecN, the largest item size in bits
where it is better than a Vec are the following:
| Storage size | Efficiency limit |
|---|---|
| EnumVec8 | 4 |
| EnumVec16 | 4 |
| EnumVec32 | 11 |
| EnumVec64 | 22 |
| EnumVec128 | 42 |
To change the default storage just import the EnumVec from an internal
module:
use enum_vec::vec_u64::EnumVec;
use enum_vec::vec_u8::EnumVec as EnumVec8;
This will make the EnumVec use 64-bit blocks, improving the memory
efficiency, and also add the option to use an EnumVec8 with 8-bit blocks.
Note that the enum_vec![] macro will always create an EnumVec, so code
like:
let a: EnumVec8 = enum_vec![];
will not compile.
Which storage size to choose?
- Use
EnumVec8to minimize the overhead of small vectors, well actually consider using aSmallEnumVecinstead. - Use
EnumVec64with very large vectors, especially when the element bit size is not a power of 2, as it is more memory efficient in some cases. - Use
EnumVec128only if memory efficiency is more important than performance. - Use
Vecif performance is more important than memory efficiency. - Use
SmallEnumVecif most of the time you need to store few elements (up to 128 bits).
When the item size is 8 or 16 bits, using a Vec is always a better option.
But that's not always easy, as a Vec<[bool; 8]> will use 8 bytes per element
instead of 8 bits. To force it to use 8 bits wrap it as Vec<PackedU8<[bool; 8]>>:
use enum_like::PackedU8;
let a = vec![PackedU8::new([true; 8]); 10];
for x in a {
let x = x.value();
}
There is an experimental SmallEnumVec available at:
use enum_vec::smallvec_u32::EnumVec as SmallEnumVec;
When compiled with the smallvec feature, enabled in Cargo.toml:
enum_vec = { version = "0.3", features = ["smallvec"] }
A SmallEnumVec will use the stack to store the items, and will only allocate
when it grows too large. The default right now is to use 4x32 bits of inline
storage. This will allow to store 128 1-bit items, 64 2-bit, 32 4-bit, etc.
See the smallvec crate for more information.
- There is no indexing syntax, since the
EnumVeccan't return a reference. Use get and set instead. - You can't use slice methods, like split(), get(range), reverse(), chunk and
window iterators, sort(), dedup(), etc. Because there is no deref impl
(unlike
&Vecwhich can be used as a&[T]). - Most operations (push, pop, insert, remove) are about 2 or 3 times slower
than the
Vecequivalent. Operations like extend, from_slice, orvec![None; 1000];are even worse.
Here is a comparison of Vec<T> vs EnumVec<T> when T requires 2 bits of storage.
(commit e8db9c883b82e472e9aefb6087be55dafd76b6a0)
name normal_vec2 ns/iter enum_vec32_2 ns/iter diff ns/iter diff % speedup
::bench_all 3 5 2 66.67% x 0.60
::bench_all_small 3 5 2 66.67% x 0.60
::bench_all_worst_case 1,308 41 -1,267 -96.87% x 31.90
::bench_all_worst_case_small 19 5 -14 -73.68% x 3.80
::bench_any 8 12 4 50.00% x 0.67
::bench_any_small 8 12 4 50.00% x 0.67
::bench_any_worst_case 447 59 -388 -86.80% x 7.58
::bench_any_worst_case_small 11 6 -5 -45.45% x 1.83
::bench_extend 419 3,793 3,374 805.25% x 0.11
::bench_extend_small 48 108 60 125.00% x 0.44
::bench_from_slice 180 3,237 3,057 1698.33% x 0.06
::bench_from_slice_small 27 79 52 192.59% x 0.34
::bench_insert 8,059 13,154 5,095 63.22% x 0.61
::bench_insert_at_zero 16,898 38,729 21,831 129.19% x 0.44
::bench_insert_at_zero_small 218 190 -28 -12.84% x 1.15
::bench_insert_small 275 258 -17 -6.18% x 1.07
::bench_iter_all 2,327 4,948 2,621 112.63% x 0.47
::bench_macro_from_elem 602 2,435 1,833 304.49% x 0.25
::bench_macro_from_elem_small 28 80 52 185.71% x 0.35
::bench_push 4,914 7,097 2,183 44.42% x 0.69
::bench_push_small 181 130 -51 -28.18% x 1.39
::bench_pushpop 4,390 12,107 7,717 175.79% x 0.36
::bench_remove 5,261 10,823 5,562 105.72% x 0.49
::bench_remove_at_zero 15,880 68,593 52,713 331.95% x 0.23
::bench_remove_at_zero_small 101 443 342 338.61% x 0.23
::bench_remove_small 103 207 104 100.97% x 0.50
The only methods which are definitely faster than the Vec equivalent are
all and any, which take advantage of the packing to process many
elements at once.
Some other benchmarks appear faster because of reallocation:
a Vec will reallocate when it reaches 1, 2, 4, 8, ... elements but an
EnumVec will reallocate every 32/n, 64/n, ... and since in the benchmark
n=2 and the number of insertions in "_small" benchmarks defaults to 16,
a Vec will reallocate 4 times while an EnumVec will reallocate 1 time.
To run the benchmarks yourself, download the source code and run:
cargo +nightly bench --features smallvec > bench_log
cargo benchcmp normal_vec2 enum_vec32_2 bench_log
You will need to install
cargo-benchcmp
to be able to easily compare benchmarks.
For example, to compare the default 32-bit EnumVec with a 8-bit EnumVec,
when dealing with 4-bit elements, run:
cargo benchcmp enum_vec32_4 enum_vec8_4 bench bench_log