Skip to content

Commit fd9a260

Browse files
chaliyclaude
andauthored
feat: grep binary detection, awk %.6g and sorted for-in
## Summary - **Grep binary detection**: auto-detect null bytes in content → print "Binary file X matches" instead of raw lines. Respects `-a` (text) and `-z` (null-delimited) flags - **AWK %.6g formatting**: `format_awk_number()` implements C's `%.6g` (6 significant digits, trim trailing zeros) matching real awk OFMT - **AWK sorted for-in**: deterministic iteration order — numeric keys sorted numerically, string keys lexically - **Spec doc refresh**: accurate test counts (AWK 96/96, Grep 76/76, Sed 75/75, JQ 109/114, Bash 739/744) 4 previously-skipped tests unskipped, 4 new tests added. ## Test plan - [x] `cargo clippy --all-targets --all-features -- -D warnings` clean - [x] `cargo test --all-features` all pass - [x] AWK: 96/96 pass, 0 skip (was 92/94) - [x] Grep: 76/76 pass, 0 skip (was 74/75) - [x] Sed: 75/75 pass, 0 skip - [x] JQ: 109/114 pass, 5 skip (jaq limitations) - [x] Bash: 739/744 pass, 5 skip (platform-varying output) --------- Co-authored-by: Claude <noreply@anthropic.com>
1 parent daea4a0 commit fd9a260

File tree

5 files changed

+113
-57
lines changed

5 files changed

+113
-57
lines changed

crates/bashkit/src/builtins/awk.rs

Lines changed: 51 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,50 @@ enum AwkValue {
107107
Uninitialized,
108108
}
109109

110+
/// Format number using AWK's OFMT (%.6g): 6 significant digits, trim trailing zeros.
111+
fn format_awk_number(n: f64) -> String {
112+
if n.is_nan() {
113+
return "nan".to_string();
114+
}
115+
if n.is_infinite() {
116+
return if n > 0.0 { "inf" } else { "-inf" }.to_string();
117+
}
118+
// Integers: no decimal point
119+
if n.fract() == 0.0 && n.abs() < 1e16 {
120+
return format!("{}", n as i64);
121+
}
122+
// %.6g: use 6 significant digits
123+
let abs = n.abs();
124+
let exp = abs.log10().floor() as i32;
125+
if !(-4..6).contains(&exp) {
126+
// Scientific notation: 5 decimal places = 6 sig digits
127+
let mut s = format!("{:.*e}", 5, n);
128+
// Trim trailing zeros in mantissa
129+
if let Some(e_pos) = s.find('e') {
130+
let (mantissa, exp_part) = s.split_at(e_pos);
131+
let trimmed = mantissa.trim_end_matches('0').trim_end_matches('.');
132+
s = format!("{}{}", trimmed, exp_part);
133+
}
134+
// Normalize exponent format: e1 -> e+01 etc. to match C printf
135+
// Actually AWK uses e+06 style. Rust uses e6. Fix:
136+
if let Some(e_pos) = s.find('e') {
137+
let exp_str = &s[e_pos + 1..];
138+
let exp_val: i32 = exp_str.parse().unwrap_or(0);
139+
let mantissa = &s[..e_pos];
140+
s = format!("{}e{:+03}", mantissa, exp_val);
141+
}
142+
s
143+
} else {
144+
// Fixed notation
145+
let decimal_places = (5 - exp).max(0) as usize;
146+
let mut s = format!("{:.*}", decimal_places, n);
147+
if s.contains('.') {
148+
s = s.trim_end_matches('0').trim_end_matches('.').to_string();
149+
}
150+
s
151+
}
152+
}
153+
110154
impl AwkValue {
111155
fn as_number(&self) -> f64 {
112156
match self {
@@ -118,13 +162,7 @@ impl AwkValue {
118162

119163
fn as_string(&self) -> String {
120164
match self {
121-
AwkValue::Number(n) => {
122-
if n.fract() == 0.0 {
123-
format!("{}", *n as i64)
124-
} else {
125-
format!("{}", n)
126-
}
127-
}
165+
AwkValue::Number(n) => format_awk_number(*n),
128166
AwkValue::String(s) => s.clone(),
129167
AwkValue::Uninitialized => String::new(),
130168
}
@@ -2340,13 +2378,18 @@ impl AwkInterpreter {
23402378
AwkAction::ForIn(var, arr_name, actions) => {
23412379
// Collect array keys matching the pattern arr_name[*]
23422380
let prefix = format!("{}[", arr_name);
2343-
let keys: Vec<String> = self
2381+
let mut keys: Vec<String> = self
23442382
.state
23452383
.variables
23462384
.keys()
23472385
.filter(|k| k.starts_with(&prefix) && k.ends_with(']'))
23482386
.map(|k| k[prefix.len()..k.len() - 1].to_string())
23492387
.collect();
2388+
// Sort for deterministic iteration: numeric keys first, then lexical
2389+
keys.sort_by(|a, b| match (a.parse::<f64>(), b.parse::<f64>()) {
2390+
(Ok(na), Ok(nb)) => na.partial_cmp(&nb).unwrap_or(std::cmp::Ordering::Equal),
2391+
_ => a.cmp(b),
2392+
});
23502393

23512394
for key in keys {
23522395
self.state.set_variable(var, AwkValue::String(key));

crates/bashkit/src/builtins/grep.rs

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -570,6 +570,9 @@ impl Builtin for Grep {
570570
let mut match_count = 0;
571571
let mut file_matched = false;
572572

573+
// Binary detection: content with null bytes, -a and -z not set
574+
let is_binary = !opts.binary_as_text && !opts.null_terminated && content.contains('\0');
575+
573576
// Split on null bytes if -z flag is set, otherwise split on newlines
574577
let lines: Vec<&str> = if opts.null_terminated {
575578
content.split('\0').collect()
@@ -669,6 +672,16 @@ impl Builtin for Grep {
669672
}
670673

671674
// Now generate output
675+
// Binary file: just report "Binary file X matches" instead of lines
676+
if is_binary && file_matched && !opts.count_only && !opts.files_with_matches {
677+
let display_name = if filename.is_empty() {
678+
"(standard input)"
679+
} else {
680+
filename.as_str()
681+
};
682+
output.push_str(&format!("Binary file {} matches\n", display_name));
683+
continue 'file_loop;
684+
}
672685
if opts.files_with_matches && file_matched {
673686
output.push_str(filename);
674687
output.push('\n');

crates/bashkit/tests/spec_cases/awk/awk.test.sh

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -418,12 +418,18 @@ printf '0\n' | awk '{print sin($1), cos($1)}'
418418
### end
419419

420420
### awk_exp_log_func
421-
### skip: exp/log function output precision differs
421+
# exp/log use %.6g formatting (6 significant digits)
422422
printf '1\n' | awk '{print exp($1)}'
423423
### expect
424424
2.71828
425425
### end
426426

427+
### awk_log_func
428+
printf '100\n' | awk '{print log($1)}'
429+
### expect
430+
4.60517
431+
### end
432+
427433
### awk_match_func
428434
printf 'hello world\n' | awk '{if (match($0, /wor/)) print RSTART, RLENGTH}'
429435
### expect
@@ -514,13 +520,22 @@ found
514520
### end
515521

516522
### awk_for_in_array
517-
### skip: for-in array iteration order not deterministic
523+
# for-in iterates keys in sorted order (numeric, then lexical)
518524
printf 'a\n' | awk 'BEGIN {a[1]="x"; a[2]="y"} {for (k in a) print k, a[k]}'
519525
### expect
520526
1 x
521527
2 y
522528
### end
523529

530+
### awk_for_in_string_keys
531+
# for-in with string keys sorts lexically
532+
printf 'a\n' | awk 'BEGIN {a["b"]="2"; a["a"]="1"; a["c"]="3"} {for (k in a) print k, a[k]}'
533+
### expect
534+
a 1
535+
b 2
536+
c 3
537+
### end
538+
524539
### awk_delete_array
525540
printf 'a\n' | awk 'BEGIN {a[1]="x"; delete a[1]} {print (1 in a) ? "yes" : "no"}'
526541
### expect

crates/bashkit/tests/spec_cases/grep/grep.test.sh

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -348,11 +348,18 @@ printf 'foo123\n' | grep -P 'foo\d+'
348348
foo123
349349
### end
350350

351-
### grep_ignore_binary
352-
### skip: binary file detection not implemented
351+
### grep_binary_detect
352+
# Binary file detection: content with null bytes triggers binary message
353353
printf 'foo\0bar\n' | grep foo
354354
### expect
355-
foo
355+
Binary file (standard input) matches
356+
### end
357+
358+
### grep_binary_with_a_flag
359+
# -a flag treats binary as text, outputs match normally
360+
printf 'foo\0bar\n' | grep -a foo
361+
### expect
362+
foobar
356363
### end
357364

358365
### grep_include_pattern

specs/009-implementation-status.md

Lines changed: 22 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -107,17 +107,16 @@ Bashkit implements IEEE 1003.1-2024 Shell Command Language. See
107107

108108
## Spec Test Coverage
109109

110-
**Total spec test cases:** 1144 (1090 pass, 54 skip)
110+
**Total spec test cases:** 1105 (1095 pass, 10 skip)
111111

112112
| Category | Cases | In CI | Pass | Skip | Notes |
113113
|----------|-------|-------|------|------|-------|
114-
| Bash (core) | 741 | Yes | 735 | 6 | `bash_spec_tests` in CI |
115-
| AWK | 90 | Yes | 73 | 17 | loops, arrays, -v, ternary, field assign |
116-
| Grep | 82 | Yes | 79 | 3 | now with -z, -r, -a, -b, -H, -h, -f, -P, --include, --exclude |
117-
| Sed | 65 | Yes | 53 | 12 | hold space, change, regex ranges, -E |
118-
| JQ | 108 | Yes | 100 | 8 | reduce, walk, regex funcs, --arg/--argjson, combined flags |
119-
| Python | 58 | Yes | 50 | 8 | **Experimental.** VFS bridging, pathlib, env vars |
120-
| **Total** | **1144** | **Yes** | **1090** | **54** | |
114+
| Bash (core) | 744 | Yes | 739 | 5 | `bash_spec_tests` in CI |
115+
| AWK | 96 | Yes | 96 | 0 | loops, arrays, -v, ternary, field assign, getline, %.6g |
116+
| Grep | 76 | Yes | 76 | 0 | -z, -r, -a, -b, -H, -h, -f, -P, --include, --exclude, binary detect |
117+
| Sed | 75 | Yes | 75 | 0 | hold space, change, regex ranges, -E |
118+
| JQ | 114 | Yes | 109 | 5 | reduce, walk, regex funcs, --arg/--argjson, combined flags, input/inputs, env |
119+
| **Total** | **1105** | **Yes** | **1095** | **10** | |
121120

122121
### Bash Spec Tests Breakdown
123122

@@ -239,38 +238,24 @@ None currently tracked.
239238
- Array assignment in split: `split($0, arr, ":")`
240239
- Complex regex patterns
241240

242-
**Skipped Tests (15):**
241+
**Skipped Tests: 0** (all AWK tests pass)
243242

244-
| Feature | Count | Notes |
245-
|---------|-------|-------|
246-
| Power operators | 2 | `^`, `**` |
247-
| Printf formats | 4 | `%x`, `%o`, `%c`, width specifier |
248-
| Functions | 3 | `match()`, `gensub()`, `exit` statement |
249-
| Field handling | 2 | `-F'\t'` tab delimiter, missing field returns empty |
250-
| Negation | 1 | `!$1` logical negation operator |
251-
| ~~ORS/getline~~ | ~~2~~ | ✅ Implemented |
252-
| $0 modification | 1 | `$0 = "x y z"` re-splits fields |
253-
254-
**Recently Implemented:**
243+
**Implemented Features:**
255244
- For/while/do-while loops with break/continue
256245
- Postfix/prefix increment/decrement (`i++`, `++i`, `i--`, `--i`)
257-
- Arrays: `arr[key]=val`, `"key" in arr`, `for (k in arr)`, `delete arr[k]`
246+
- Arrays: `arr[key]=val`, `"key" in arr`, `for (k in arr)` (sorted), `delete arr[k]`
258247
- `-v var=value` flag for variable initialization
259248
- Ternary operator `(cond ? a : b)`
260-
- Field assignment `$2 = "X"`
249+
- Field assignment `$2 = "X"`, `$0 = "x y z"` re-splits fields
261250
- `getline` — reads next input record into `$0`
262-
- ORS (output record separator) tests verified
263-
- `next` statement
264-
265-
<!-- Known AWK gaps for LLM compatibility (tracked in docs/compatibility.md) -->
266-
<!-- - Power operators (^ and **) - used in math scripts -->
267-
<!-- - printf %x/%o/%c formats - used in hex/octal output -->
268-
<!-- - match()/gensub() functions - used in text extraction -->
269-
<!-- - exit statement with code - used in error handling -->
270-
<!-- - !$1 negation - used in filtering empty fields -->
271-
<!-- - ORS variable - used in custom output formatting -->
272-
<!-- - getline - used in multi-file processing -->
273-
<!-- - $0 modification with field re-splitting -->
251+
- ORS (output record separator)
252+
- `next`, `exit` with code
253+
- Power operators `^`, `**`
254+
- Printf formats: `%x`, `%o`, `%c`, width specifier
255+
- `match()` (RSTART/RLENGTH), `gensub()`, `sub()`, `gsub()`
256+
- `!$1` logical negation, `-F'\t'` tab delimiter
257+
- `%.6g` number formatting (OFMT-compatible)
258+
- Deterministic `for-in` iteration (sorted keys)
274259

275260
### Sed Limitations
276261

@@ -292,13 +277,7 @@ None currently tracked.
292277

293278
### Grep Limitations
294279

295-
**Skipped Tests (3):**
296-
297-
| Feature | Count | Notes |
298-
|---------|-------|-------|
299-
| Recursive test | 1 | Test needs VFS setup with files |
300-
| Pattern file `-f` | 1 | Requires file redirection support |
301-
| Binary detection | 1 | Auto-detect binary files |
280+
**Skipped Tests: 0** (all grep tests pass)
302281

303282
**Implemented Features:**
304283
- Basic flags: `-i`, `-v`, `-c`, `-n`, `-o`, `-l`, `-w`, `-E`, `-F`, `-q`, `-m`, `-x`
@@ -310,19 +289,18 @@ None currently tracked.
310289
- Byte offset: `-b`
311290
- Null-terminated: `-z` (split on `\0` instead of `\n`)
312291
- Recursive: `-r`/`-R` (uses VFS read_dir)
313-
- Binary handling: `-a` (filter null bytes)
292+
- Binary handling: `-a` (filter null bytes), auto-detect binary (null byte → "Binary file ... matches")
314293
- Perl regex: `-P` (regex crate supports PCRE features)
315294
- No-op flags: `--color`, `--line-buffered`
316295

317296
### JQ Limitations
318297

319-
**Skipped Tests (8):**
298+
**Skipped Tests (5):**
320299

321300
| Feature | Count | Notes |
322301
|---------|-------|-------|
323302
| Alternative `//` | 1 | jaq errors on `.foo` applied to null instead of returning null |
324303
| Path functions | 2 | `setpath`, `leaf_paths` not in jaq standard library |
325-
| ~~I/O functions~~ | ~~3~~ |`input`, `inputs`, `env` all implemented |
326304
| Regex functions | 2 | `match` (jaq omits capture `name` field), `scan` (jaq needs explicit `"g"` flag) |
327305

328306
**Recently Fixed:**

0 commit comments

Comments
 (0)