@ljzxc
Contributor

ljzxc commented Mar 13, 2025

Support combined character sets within a single token

Changes

  • Modified tokenize_format_string to handle combined character sets within a single pattern token.
  • A pattern like {lu} is now treated as a single position that can take any character from the combined set.
  • Rather than creating a separate token for each character class, the tokenizer concatenates all specified character sets into one larger set.
  • For example, {lu} now creates one token containing all lowercase and uppercase letters, so that position can hold any of the 52 letters.
  • Improved handling of empty strings and invalid charset specifiers.
  • Improved the check that distinguishes range patterns ({a-z}) from combined character sets.
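
The tokenizing behavior described in the list above can be sketched roughly as follows. This is an illustrative Python sketch, not the code changed in this PR: the `CHARSETS` table, the `tokenize` function, the rule used to recognize a range (`x-y` with exactly three characters), and the de-duplication of repeated class letters are all assumptions. The contents of the `s` class are inferred only from the count of 95 in the example output.

```python
import string

# Hypothetical mapping from class letters to character sets. The "s" set
# (space plus ASCII punctuation, 33 characters) is an assumption inferred
# from the word counts in the example, not taken from the project source.
CHARSETS = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": " " + string.punctuation,
}


def tokenize(pattern):
    """Return one list of allowed characters per output position."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        close = pattern.find("}", i) if ch == "{" else -1
        if close == -1:
            # Literal character (an unmatched "{" is also kept as a literal).
            tokens.append([ch])
            i += 1
            continue

        spec = pattern[i + 1:close]
        if len(spec) == 3 and spec[1] == "-":
            # Range pattern such as {a-z}: every character from start to end.
            start, end = ord(spec[0]), ord(spec[2])
            if start > end:
                raise ValueError(f"invalid range: {spec!r}")
            tokens.append([chr(c) for c in range(start, end + 1)])
        else:
            # Combined character set such as {lu}: one position, merged set.
            if not spec:
                raise ValueError("empty charset specifier")
            combined = []
            for code in spec:
                if code not in CHARSETS:
                    raise ValueError(f"invalid charset specifier: {code!r}")
                for c in CHARSETS[code]:
                    if c not in combined:  # assumed: repeated classes add nothing
                        combined.append(c)
            tokens.append(combined)
        i = close + 1
    return tokens


# Three literals followed by two 52-character positions.
print([len(t) for t in tokenize("lol{lu}{lu}")])
```

Under these assumptions, `{lu}` contributes a single position with 52 candidates, and `{luds}` a single position with 95.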

Impact

This lets users specify multiple character classes at a single position. For instance, {lud} now produces one position that ranges over lowercase letters, uppercase letters, and digits, rather than three separate positions in sequence.
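
To make the difference concrete, here is a small Python sketch comparing the two interpretations of {lud}. It is illustrative only: the variable names are hypothetical, and it assumes the three classes are the ASCII lowercase letters, uppercase letters, and digits.

```python
import string
from itertools import product

lower = string.ascii_lowercase
upper = string.ascii_uppercase
digits = string.digits

# Previous interpretation: three positions in sequence (one per class).
sequential = ["".join(p) for p in product(lower, upper, digits)]

# New interpretation: one position drawn from the combined set.
combined = list(lower + upper + digits)

print(len(sequential), len(sequential[0]))  # 6760 words, 3 characters each
print(len(combined), len(combined[0]))      # 62 words, 1 character each
```

The sequential reading multiplies the class sizes (26 × 26 × 10), while the combined reading adds them (26 + 26 + 10).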

Example

$ gorilla -p lol{lu}{lu} 
gorilla: (warning) missing mutation sets
gorilla: will generate 2704 words from a pattern lol{lu}{lu}
         sizes before mutations: 16224 bytes / 0 MB / 0 GB / 0 TB
lolaa
lolba
lolca
[...]

$ gorilla -p test{luds}
gorilla: (warning) missing mutation sets
gorilla: will generate 95 words from a pattern test{luds}
         sizes before mutations: 570 bytes / 0 MB / 0 GB / 0 TB
testa
testb
[...]
test}
test~
gorilla: finished in 322µs. 95 words -> 95 words
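
The word counts and sizes in the output above can be cross-checked with a short Python sketch. The `estimate` helper is hypothetical, and the size formula (one byte per character plus one newline byte per word) is an assumption that happens to match the reported numbers; it is not taken from the tool's source.

```python
from math import prod


def estimate(position_sizes, newline_bytes=1):
    """Return (word count, total bytes) for a pattern.

    position_sizes holds the number of candidate characters per position.
    Assumes single-byte characters and one trailing newline per word.
    """
    words = prod(position_sizes)
    total_bytes = words * (len(position_sizes) + newline_bytes)
    return words, total_bytes


# lol{lu}{lu}: three literals, then two positions with 52 candidates each.
print(estimate([1, 1, 1, 52, 52]))  # (2704, 16224)

# test{luds}: four literals, then one position with 95 candidates.
print(estimate([1, 1, 1, 1, 95]))   # (95, 570)
```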

Ref #42

@andreiverse
Owner

Looks good to me. Amazing work!

@ljzxc
Contributor Author

ljzxc commented Mar 30, 2025

Hey, let me know if you need anything else to merge this. Thanks!

@andreiverse andreiverse merged commit 897d75c into andreiverse:main Apr 2, 2025
1 of 2 checks passed