44 changes: 2 additions & 42 deletions .claude/skills/lading-optimize-hunt/README.md
@@ -17,7 +17,7 @@ This document explains the reasoning behind each optimization pattern in the lad

## Preallocation Patterns

### `Vec::with_capacity(n)` / `String::with_capacity(n)` / `HashMap::with_capacity(n)`
### `Vec::with_capacity(n)` / `String::with_capacity(n)` / `FxHashMap::with_capacity(n)`

**Pattern**: `Vec::new()` + repeated push operations

@@ -53,46 +53,6 @@ for i in 0..1000 {

---

## Hash Function Optimization

### `FxHashMap` from rustc-hash

**Pattern**: Using default `HashMap` with integer, pointer, or short string keys

**Why it matters**: The default HashMap uses SipHash, a cryptographically secure hash function designed to prevent DoS attacks. For internal data structures where DoS isn't a concern, faster non-cryptographic hashes significantly improve performance.

**How it works**:
- SipHash: Secure but slow (~13 CPU cycles per byte)
- FxHash: Fast but not DoS-resistant (~0.5 CPU cycles per byte)
- FxHash works on 8 bytes at a time, making it much faster for common key types

**Performance impact**:
- rustc benchmarks showed 4-84% slowdowns when switching FROM FxHash back to default HashMap
- Consistently outperforms FNV hash in rustc itself
- Best for: integer keys, pointer keys, small fixed-size keys

**Example**:
```rust
// Slow for internal data structures
use std::collections::HashMap;
let mut map: HashMap<u64, Value> = HashMap::new();

// Fast for internal data structures
use rustc_hash::FxHashMap;
let mut map: FxHashMap<u64, Value> = FxHashMap::default();
```

**Bug risk**: Hash collision DoS attacks (only relevant if keys come from untrusted sources)

**When NOT to use**: User-facing APIs, networked services where keys come from untrusted sources

**References**:
- [Hashing - The Rust Performance Book](https://nnethercote.github.io/perf-book/hashing.html)
- [rustc-hash GitHub](https://github.com/rust-lang/rustc-hash)
- [Using rustc-hash in Rust: A Guide](https://dlcoder.medium.com/using-rustc-hash-in-rust-a-guide-to-faster-and-safer-hashing-1f3dacfbf6de)

---

## String Building Optimization

### `write!()` to reused buffer instead of `format!()`
@@ -260,7 +220,7 @@

### Bounded buffers for unbounded growth

**Pattern**: Collections that grow indefinitely without limits (unbounded `Vec`, `String`, `HashMap`)
**Pattern**: Collections that grow indefinitely without limits (unbounded `Vec`, `String`, `FxHashMap`)

**Why it matters**: Unbounded growth can lead to memory exhaustion, unpredictable performance, and OOM crashes. In long-running services or load generators, this is a critical reliability concern.
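A capacity-capped collection can be sketched as follows; `BoundedVec` and its API are hypothetical illustrations, not types from lading:

```rust
/// A hypothetical bounded buffer: rejects inserts past a fixed capacity
/// instead of growing without limit.
struct BoundedVec<T> {
    items: Vec<T>,
    max: usize,
}

impl<T> BoundedVec<T> {
    fn new(max: usize) -> Self {
        Self {
            items: Vec::with_capacity(max),
            max,
        }
    }

    /// Returns `Err(value)` when the buffer is full, so the caller can
    /// drop or apply backpressure rather than risk an OOM.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.items.len() >= self.max {
            return Err(value);
        }
        self.items.push(value);
        Ok(())
    }
}

fn main() {
    let mut buf = BoundedVec::new(2);
    assert!(buf.push(1).is_ok());
    assert!(buf.push(2).is_ok());
    assert!(buf.push(3).is_err()); // at capacity: rejected, not grown
    assert_eq!(buf.items.len(), 2);
}
```

Returning the rejected element to the caller (rather than silently dropping it) makes the load-shedding decision explicit at the call site.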

3 changes: 1 addition & 2 deletions .claude/skills/lading-optimize-hunt/SKILL.md
@@ -71,8 +71,7 @@ Otherwise, check pending hunt issues or pick from hot subsystems:
|---------|-----------|----------|
| `Vec::new()` + repeated push | `Vec::with_capacity(n)` | None |
| `String::new()` + repeated push | `String::with_capacity(n)` | None |
| `HashMap::new()` hot insert | `HashMap::with_capacity(n)` | None |
| Default `HashMap` with int keys | `FxHashMap` (rustc-hash) | Hash collision DoS |
| `FxHashMap::default()` hot insert | `FxHashMap::with_capacity(n)` | None |
| `format!()` in hot loop | `write!()` to reused buffer | Format errors, buffer sizing |
| `&Vec<T>` or `&String` parameter | `&[T]` or `&str` slice | None |
| Allocation in hot loop | Move outside loop | Lifetime issues |
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -86,6 +86,7 @@ large_stack_arrays = "deny"
float_cmp = "deny"
manual_memcpy = "deny"
unnecessary_to_owned = "deny"
disallowed_types = "deny"

[workspace.lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = ['cfg(kani)', 'cfg(loom)'] }
5 changes: 5 additions & 0 deletions clippy.toml
@@ -1,2 +1,7 @@
# https://rust-lang.github.io/rust-clippy/master/index.html#doc_markdown
doc-valid-idents = ["OpenTelemetry", "gRPC", ".."]

# https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types
disallowed-types = [
{ path = "std::collections::HashMap", reason = "Use rustc_hash::FxHashMap for better performance." },
]
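With this entry in place, `cargo clippy` rejects any use of `std::collections::HashMap` in the workspace. Call sites that must keep the std type (as the container generator does for bollard's `ContainerCreateBody`) opt out locally; the function and types below are illustrative, not from lading:

```rust
// Once clippy.toml disallows std::collections::HashMap, uses of it trip
// the `disallowed_types` lint. API boundaries that genuinely require the
// std type carve out a local exception:
#[allow(clippy::disallowed_types)]
fn exposed_ports() -> std::collections::HashMap<String, ()> {
    let mut ports = std::collections::HashMap::new();
    ports.insert("8080/tcp".to_string(), ());
    ports
}

fn main() {
    assert_eq!(exposed_ports().len(), 1);
}
```

Scoping the `#[allow]` to the smallest item that needs it keeps the lint enforcing `FxHashMap` everywhere else.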
1 change: 1 addition & 0 deletions integration/ducks/Cargo.toml
@@ -15,6 +15,7 @@ http-body-util = { workspace = true }
hyper = { workspace = true, features = ["server"] }
hyper-util = { workspace = true }
once_cell = { workspace = true }
rustc-hash = { workspace = true }
shared = { path = "../shared" }
ddsketch-agent = { workspace = true }
tokio = { workspace = true, features = [
6 changes: 4 additions & 2 deletions integration/ducks/src/main.rs
@@ -31,7 +31,9 @@ use shared::{
integration_target_server::{IntegrationTarget, IntegrationTargetServer},
},
};
use std::{collections::HashMap, net::SocketAddr, pin::Pin, sync::Arc, time::Duration};
use std::{net::SocketAddr, pin::Pin, sync::Arc, time::Duration};

use rustc_hash::FxHashMap;
use tokio::task::JoinSet;
use tokio::{
io::AsyncReadExt,
@@ -54,7 +56,7 @@ struct HttpCounters {
entropy: DDSketch,
request_count: u64,
total_bytes: u64,
methods: HashMap<Method, u64>,
methods: FxHashMap<Method, u64>,
}

impl From<&HttpCounters> for HttpMetrics {
12 changes: 7 additions & 5 deletions lading/src/bin/captool/analyze/jsonl.rs
@@ -2,10 +2,12 @@
//!
//! This module provides analysis operations on JSONL (row-based) capture files.

use std::collections::{BTreeSet, HashMap, hash_map::RandomState};
use std::collections::BTreeSet;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

use lading_capture::line::{Line, MetricKind};
use rustc_hash::FxHashMap;

use super::{MetricInfo, SeriesStats};

@@ -39,9 +41,9 @@ pub(crate) fn list_metrics(lines: &[Line]) -> Vec<MetricInfo> {
pub(crate) fn analyze_metric(
lines: &[Line],
metric_name: &str,
) -> HashMap<BTreeSet<String>, SeriesStats> {
let mut context_map: HashMap<u64, (BTreeSet<String>, Vec<f64>)> = HashMap::new();
let mut fetch_indices: HashMap<u64, Vec<u64>> = HashMap::new();
) -> FxHashMap<BTreeSet<String>, SeriesStats> {
let mut context_map: FxHashMap<u64, (BTreeSet<String>, Vec<f64>)> = FxHashMap::default();
let mut fetch_indices: FxHashMap<u64, Vec<u64>> = FxHashMap::default();
let hash_builder = RandomState::new();

// Filter to matching metric and group by label set
@@ -72,7 +74,7 @@ }
}

// Compute statistics
let mut results = HashMap::new();
let mut results = FxHashMap::default();
for (key, (labels, values)) in context_map {
if values.is_empty() {
continue;
9 changes: 5 additions & 4 deletions lading/src/bin/captool/analyze/parquet.rs
@@ -3,13 +3,14 @@
//! This module provides analysis operations on parquet capture files using
//! Apache Arrow, staying in columnar format for efficiency.

use std::collections::{BTreeSet, HashMap};
use std::collections::BTreeSet;
use std::fs::File;
use std::path::Path;

use arrow_array::{Array, Float64Array, MapArray, StringArray, UInt64Array};
use lading_capture_schema::columns;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use rustc_hash::FxHashMap;

use super::{MetricInfo, SeriesStats};

@@ -93,12 +94,12 @@ pub(crate) fn list_metrics<P: AsRef<Path>>(path: P) -> Result<Vec<MetricInfo>, E
pub(crate) fn analyze_metric<P: AsRef<Path>>(
path: P,
metric_name: &str,
) -> Result<HashMap<BTreeSet<String>, SeriesStats>, Error> {
) -> Result<FxHashMap<BTreeSet<String>, SeriesStats>, Error> {
let file = File::open(path)?;
let reader_builder = ParquetRecordBatchReaderBuilder::try_new(file)?;
let reader = reader_builder.build()?;

let mut series_map: HashMap<BTreeSet<String>, Vec<(u64, f64)>> = HashMap::new();
let mut series_map: FxHashMap<BTreeSet<String>, Vec<(u64, f64)>> = FxHashMap::default();

for batch_result in reader {
let batch = batch_result?;
@@ -194,7 +195,7 @@ pub(crate) fn analyze_metric<P: AsRef<Path>>(
}

// Compute statistics for each series
let mut results = HashMap::new();
let mut results = FxHashMap::default();
for (labels, mut values) in series_map {
if values.is_empty() {
continue;
22 changes: 16 additions & 6 deletions lading/src/generator/container.rs
@@ -14,8 +14,10 @@ use lading_throttle::Throttle;
use metrics::counter;
use serde::{Deserialize, Serialize};
use std::num::NonZeroU32;
use std::{collections::HashMap, process};
use std::process;
use tokio::time::Instant;

use rustc_hash::FxHashMap;
use tokio_stream::StreamExt;
use tracing::{debug, error, info, warn};

@@ -53,7 +55,7 @@ pub struct Config {
/// Environment variables for the container, maps to docker run --env.
pub env: Option<Vec<String>>,
/// Labels to set on the container, maps to docker run --label.
pub labels: Option<HashMap<String, String>>,
pub labels: Option<FxHashMap<String, String>>,
/// Whether to disable the container network, maps to docker run --network
/// none.
#[serde(default = "default_network_disabled")]
@@ -162,7 +164,7 @@ pub struct Container {
throttle: Option<Throttle>,
shutdown: lading_signal::Watcher,
metric_labels: Vec<(String, String)>,
containers: HashMap<u32, ContainerInfo>,
containers: FxHashMap<u32, ContainerInfo>,
}

impl Container {
@@ -209,7 +211,7 @@ impl Container {
throttle,
shutdown,
metric_labels,
containers: HashMap::new(),
containers: FxHashMap::default(),
})
}

@@ -682,6 +684,8 @@ impl Config {
// Docker API requires exposed ports as {"<port>/<protocol>": {}}
// Bollard represents the empty object as HashMap<(), ()>
#[allow(clippy::zero_sized_map_values)]
// Several properties of `ContainerCreateBody` require `HashMap`, so the
// lint must be disabled here.
#[allow(clippy::disallowed_types)]
let exposed_ports = if self.exposed_ports.is_empty() {
None
} else {
@@ -694,7 +698,10 @@
} else {
format!("{port}/tcp")
};
(port_with_protocol, HashMap::<(), ()>::new())
(
port_with_protocol,
std::collections::HashMap::<(), ()>::new(),
Note: this is the only use of `HashMap` left; it's required for `ContainerCreateBody` properties.

)
})
.collect(),
)
@@ -705,7 +712,10 @@
tty: Some(true),
cmd: self.args.clone(),
env: self.env.clone(),
labels: self.labels.clone(),
labels: self
.labels
.as_ref()
.map(|labels| labels.iter().map(|(k, v)| (k.clone(), v.clone())).collect()),
network_disabled: Some(self.network_disabled),
exposed_ports,
..Default::default()
7 changes: 4 additions & 3 deletions lading/src/generator/file_gen/logrotate_fs.rs
@@ -16,7 +16,6 @@ use nix::libc::{self, ENOENT};
use rand::{SeedableRng, rngs::SmallRng};
use serde::{Deserialize, Serialize};
use std::{
collections::HashMap,
ffi::OsStr,
fs,
num::NonZeroU32,
@@ -27,6 +26,8 @@
use tokio::task::{self, JoinError};
use tracing::{debug, error, info, warn};

use rustc_hash::FxHashMap;

mod model;

const TTL: Duration = Duration::from_secs(1); // Attribute cache timeout
@@ -222,7 +223,7 @@ impl Server {
// Initialize the FUSE filesystem
let fs = LogrotateFS {
state: Arc::new(Mutex::new(state)),
open_files: Arc::new(Mutex::new(HashMap::new())),
open_files: Arc::new(Mutex::new(FxHashMap::default())),
start_time,
start_time_system,
};
@@ -263,7 +264,7 @@ impl Server {
#[derive(Debug)]
struct LogrotateFS {
state: Arc<Mutex<model::State>>,
open_files: Arc<Mutex<HashMap<u64, model::FileHandle>>>,
open_files: Arc<Mutex<FxHashMap<u64, model::FileHandle>>>,

start_time: Instant,
start_time_system: SystemTime,
8 changes: 5 additions & 3 deletions lading_capture/src/validate/jsonl.rs
@@ -3,10 +3,12 @@
//! This module provides validation for JSONL (JSON Lines) format capture files.
//! It operates on already-deserialized `Line` objects.

use std::collections::BTreeSet;
use std::collections::hash_map::RandomState;
use std::collections::{BTreeSet, HashMap};
use std::hash::{BuildHasher, Hasher};

use rustc_hash::FxHashMap;

use crate::line::{Line, MetricKind};
use crate::validate::ValidationResult;

@@ -32,9 +34,9 @@
#[allow(clippy::too_many_lines)]
pub fn validate_lines(lines: &[Line], min_seconds: Option<u64>) -> ValidationResult {
// Phase 1: Streaming assertions
let mut fetch_index_to_time: HashMap<u64, u128> = HashMap::new();
let mut fetch_index_to_time: FxHashMap<u64, u128> = FxHashMap::default();
// Collect (fetch_index, time) pairs for each series
let mut series_data: HashMap<u64, (Vec<(u64, u128)>, String)> = HashMap::new();
let mut series_data: FxHashMap<u64, (Vec<(u64, u128)>, String)> = FxHashMap::default();
let hash_builder = RandomState::new();
let mut fetch_index_errors = 0u128;
let mut first_error: Option<(u128, String, String)> = None;
8 changes: 5 additions & 3 deletions lading_capture/src/validate/parquet.rs
@@ -4,8 +4,10 @@
//! Apache Arrow compute kernels. Unlike the row-based validation in `jsonl`,
//! this leverages columnar operations for better performance and memory efficiency.

use std::collections::BTreeSet;
use std::collections::hash_map::RandomState;
use std::collections::{BTreeSet, HashMap};

use rustc_hash::FxHashMap;
use std::fs::File;
use std::hash::{BuildHasher, Hasher};
use std::path::Path;
@@ -78,8 +80,8 @@ pub fn validate_parquet<P: AsRef<Path>>(
//
// We also collect (fetch_index, time) pairs for each series to validate
// per-series invariants in Phase 2.
let mut fetch_index_to_time: HashMap<u64, u128> = HashMap::new();
let mut series_data: HashMap<u64, (Vec<(u64, u128)>, String)> = HashMap::new();
let mut fetch_index_to_time: FxHashMap<u64, u128> = FxHashMap::default();
let mut series_data: FxHashMap<u64, (Vec<(u64, u128)>, String)> = FxHashMap::default();
let hash_builder = RandomState::new();

let mut line_count = 0u128;