Hi,
I'm trying to run some YCSB workloads on Viper and I'm hitting an error (copied below) when running Workload F (https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloadf) with multiple threads. Workload F is a 50% read / 50% read-modify-write mix, so it issues many concurrent gets alongside updates. I'm curious whether you encountered this issue when evaluating Viper, or whether you know what might be happening. It happens on almost every run with 16 threads and less often with fewer threads; I haven't seen it at all with 1 thread. I think I've also seen it occasionally on other workloads with multiple threads, but workload F consistently triggers the error.
I'm running workload F with 600K operations and records. I run Load E first, which is always successful. I'm using 32B keys and 1140B values, which roughly match the sizes used by YCSB; when keys are smaller than 32B, they are padded with spaces. I wrote a small wrapper around Viper that I compile to a shared library (libviper_wrapper.so) and link to a Java YCSB client for Viper; I don't think the error stems from my wrapper because most of the workloads run correctly, and workload F always runs for a while before the error occurs.
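For context, the wrapper is essentially a thin C shim around Viper's client calls, roughly along the lines of the sketch below. This is a simplified approximation, not the real code: the `viper_get`/`viper_put` names and the `std::array` key/value types are stand-ins (the real wrapper uses fixed-size record types like the `BMRecord` ones visible in the crash report), and the Viper calls mirror the usage shown in the Viper README.

```cpp
// Simplified sketch of the wrapper, not the exact code. std::array stands in
// for the fixed-size record types the real wrapper uses.
#include <algorithm>
#include <array>
#include <cstring>
#include <memory>
#include <string>

#include "viper/viper.hpp"

using Key = std::array<char, 32>;      // 32B keys
using Value = std::array<char, 1140>;  // 1140B values
using DB = viper::Viper<Key, Value>;

// Created once at startup via viper::Viper<...>::create(...) (omitted here).
static std::unique_ptr<DB> db;

// YCSB keys are often shorter than 32B, so they are padded with spaces.
static Key make_key(const std::string& ycsb_key) {
  Key k;
  k.fill(' ');
  std::memcpy(k.data(), ycsb_key.data(), std::min(ycsb_key.size(), k.size()));
  return k;
}

extern "C" bool viper_get(const char* key, char* out_value) {
  auto client = db->get_client();  // the real wrapper reuses one client per YCSB thread
  Value v;
  const bool found = client.get(make_key(key), &v);
  if (found) std::memcpy(out_value, v.data(), v.size());
  return found;
}

extern "C" void viper_put(const char* key, const char* value) {
  auto client = db->get_client();
  Value v;
  std::memcpy(v.data(), value, v.size());  // caller passes a 1140B buffer
  client.put(make_key(key), v);
}
```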
When I execute the run phase of workload F, it runs for a while before a segmentation fault eventually occurs in Viper, resulting in output like the following. The crash is reported through the JVM via YCSB, but the faulting frame is in Viper code.
```
... # truncated
Start resizing.
Added data file "/mnt/pmem/viper/data27"
Allocated 43690 blocks in 1 GiB.
End resizing.
Start resizing.
Added data file "/mnt/pmem/viper/data28"
Allocated 43690 blocks in 1 GiB.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ff85365f82d, pid=6129, tid=6162
#
# JRE version: OpenJDK Runtime Environment (21.0.6+7) (build 21.0.6+7-Debian-1)
# Java VM: OpenJDK 64-Bit Server VM (21.0.6+7-Debian-1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libviper_wrapper.so+0x5282d] viper::Viper<viper::kv_bm::BMRecord<unsigned char, 32ul>, viper::kv_bm::BMRecord<unsigned char, 1140ul> >::ReadOnlyClient::get_const_entry_from_offset(viper::KeyValueOffset) const+0x6d
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as: ... # truncated
End resizing.
[7.225s][warning][os] Loading hsdis library failed
# ... truncated
```
Running `addr2line -e libviper_wrapper.so 0x5282d` points to line 1523 in 79ebf6e:

```cpp
const auto& entry = this->viper_.v_blocks_[block]->v_pages[page].data[slot];
```
The error also occasionally happens in `get_value_from_offset`, on a similar line (line 1563 in 79ebf6e):

```cpp
const VPage& v_page = this->viper_.v_blocks_[block]->v_pages[page];
```
I tried to isolate the specific part of those lines where the segmentation fault occurs, and it appears to be `this->viper_.v_blocks_[block]`.
I have a theory about why this may be happening (although I'm not very familiar with the Viper source code, so this could be completely off base). The crash always seems to occur during resizing, between an `Allocated 43690 blocks in 1 GiB.` line and an `End resizing.` line. I also noticed that the resizing code updates `v_blocks_`, and while a compare-and-swap prevents multiple threads from attempting to resize at the same time, reads of `v_blocks_` don't appear to be protected by a lock. Is it possible there's a race condition here?
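To make the suspected pattern concrete, here is a generic, self-contained sketch (not Viper's actual code, and `v_blocks_` may well be implemented differently): if the block table is a `std::vector` that the resizing thread grows with `push_back` while reader threads index into it without synchronization, a reallocation can free the backing array out from under a concurrent `v_blocks_[block]` access.

```cpp
// Generic illustration of the suspected race; not Viper code.
// A writer grows a vector of block pointers while readers index into it
// without synchronization. When push_back reallocates, readers may
// dereference the old, freed backing array (undefined behavior / SIGSEGV).
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct FakeBlock { int data[1024] = {}; };

std::vector<FakeBlock*> blocks;   // stands in for v_blocks_
std::atomic<bool> done{false};

void reader() {
  while (!done.load(std::memory_order_relaxed)) {
    const std::size_t n = blocks.size();
    if (n == 0) continue;
    // Unsynchronized access: if the writer reallocates right now,
    // this indexes into freed memory.
    volatile int x = blocks[n - 1]->data[0];
    (void)x;
  }
}

void resizer() {
  for (int i = 0; i < 200000; ++i) {
    blocks.push_back(new FakeBlock{});  // may reallocate and free the old array
  }
  done.store(true);
}

int main() {
  std::thread r1(reader), r2(reader);
  std::thread w(resizer);
  w.join();
  r1.join();
  r2.join();
  return 0;
}
```

Running a sketch like this under ThreadSanitizer should flag the read-during-reallocation race, which is roughly the interaction I suspect between the reader threads and the resizing thread.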
I'm running experiments on Debian Trixie, using Optane PM (the bug manifests with both 128GiB non-interleaved and 512GiB interleaved PM) and ext4-DAX as the file system. I'm happy to provide additional details about my environment.