Merged
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,7 +1,7 @@
[package]
name = "cscsca"
authors = ["Charles Feyen"]
version = "0.27.1"
version = "0.28.0"
edition = "2024"
readme = "README.md"
keywords = ["linguistics", "conlang", "sound_change_applier"]
23 changes: 14 additions & 9 deletions README.md
@@ -8,7 +8,7 @@ A sound change applier based on linguistic sound change notation.
- Expansive conditions and anti-conditions
- Definitions that can be inserted anywhere in a rule
- Automatic and manual matching for lists of phones
- Gaps of arbitrary phones in conditions (useful for harmony)
- Arbitrary length sections of repeated phones
- Can get information to use in conditions at runtime (variables)
- Reasonably minimalist and simple, but also highly expressive and versatile
- Usable as a crate that can be adapted to fit many mediums beyond CLI
@@ -57,9 +57,13 @@ h >>

### Scopes
Scopes are a way to dynamically determine which phone, group of phones, or lack thereof exists in a rule.
There are two types of scopes
There are three types of scopes
- optional **`(`**...**`)`**: a phone or group of phones that is optional
- selection **`{`**...**`,`**...**`}`**: a list of comma-separated phones or a group of phones that selects one phone or group of phones in that list
- repetition **`[`**...**`]`**: a phone or group of phones repeated 0 or more times. If a **`!`** is added in the scope, the scope represents the phone or group of phones before the **`!`** repeated 0 or more times, if it does not contain the phone or group of phones after the **`!`**


**Note**: repetition scopes are only allowed in conditions/anti-conditions (see: Conditions and Anti-Conditions)

Examples:
```cscsca
@@ -71,13 +75,21 @@ l (j) >> j

## `p` and `b` become `f` and `v` respectively
{p, b} >> {f, v}

## `u` becomes `y` when after `i` in a word (see: Conditions and Anti-Conditions)
u >> y / i [*] _

## `u` becomes `y` when after `i` in a word, unless a `w` is between the two (see: Conditions and Anti-Conditions)
u >> y / i [* ! w] _
```

### Labels
As seen in the example above, corresponding scopes in the input and output try to agree on what they choose. However, there are times when we want this behavior to be different than the default or expanded to conditions

To force scopes to agree on what they choose, we can use labels. A label has a name that starts with **`$`** and precedes a scope

**Note**: repetition scopes agree not in phones, but in phone count, causing agreeing repetition scopes to be the same length or shorter than the one that sets the agreement

Examples:
```cscsca
## `i` and `u` merge with preceding `h` or `x` into `j` `i` and `w` `u`
@@ -169,16 +181,9 @@ DEFINE F {f, s, ç, x}

### Special Characters
- **`*`**: represents any non-boundary phone. **`*`** may be preceded by a label to agree on which phone is represented
- **`..`**: a gap of zero or more non-boundary phones. (**Notes**: **`..`** must have a space on both sides and is only allowed in conditions). A gap may be preceded by a label to limit gap length to less than or equal to the length of the first gap with the same label
- **`#`**: a word boundary
- **`\`**: escapes the effects of the following character, may be used at the end of a line to continue the rule on the next line

### Reserved Characters
Characters that do nothing, but need to be escaped
- **`.`**
- **`[`**
- **`]`**

### IO and Variables
To print the current phonetic form, type **`PRINT`** at the start of a line, followed by the message you would like to print with it

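The repetition scope added to the README can be hard to picture. As a rough sketch, assuming each phone is a single string slice and that the part after `!` is a single phone (the README also allows a group of phones), a `[* ! w]` scope starting at some position can stand for any run of phones that stops before the first excluded phone. The function name `repetition_lengths` is hypothetical and not part of cscsca.

```rust
/// Hypothetical helper (not part of cscsca): lists every length that a
/// repetition scope such as `[*]` or `[* ! w]` could match when it starts at
/// `start` in `phones`. `*` is modeled as "any phone in the word"; if
/// `excluded` is set, the run must stop before the first occurrence of it.
fn repetition_lengths(phones: &[&str], start: usize, excluded: Option<&str>) -> Vec<usize> {
    // Phones from `start` to the end of the word (empty if `start` is past the end)
    let rest = &phones[start.min(phones.len())..];
    // The longest allowed run ends just before the excluded phone, if present
    let max = match excluded {
        Some(x) => rest.iter().position(|p| *p == x).unwrap_or(rest.len()),
        None => rest.len(),
    };
    // Repetition means "0 or more times", so every length from 0 to `max` is a candidate
    (0..=max).collect()
}
```

For `["a", "b", "w", "c"]`, a `[* ! w]` scope starting at index 0 can cover 0, 1, or 2 phones, while a plain `[*]` can cover anywhere from 0 to 4.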
2 changes: 1 addition & 1 deletion docs/README_template.md
@@ -8,7 +8,7 @@ A sound change applier based on linguistic sound change notation.
- Expansive conditions and anti-conditions
- Definitions that can be inserted anywhere in a rule
- Automatic and manual matching for lists of phones
- Gaps of arbitrary phones in conditions (useful for harmony)
- Arbitrary length sections of repeated phones
- Can get information to use in conditions at runtime (variables)
- Reasonably minimalist and simple, but also highly expressive and versatile
- Usable as a crate that can be adapted to fit many mediums beyond CLI
21 changes: 13 additions & 8 deletions docs/writing_rules.md
@@ -38,9 +38,13 @@ h >>

### Scopes
Scopes are a way to dynamically determine which phone, group of phones, or lack thereof exists in a rule.
There are two types of scopes
There are three types of scopes
- optional **`(`**...**`)`**: a phone or group of phones that is optional
- selection **`{`**...**`,`**...**`}`**: a list of comma-separated phones or a group of phones that selects one phone or group of phones in that list
- repetition **`[`**...**`]`**: a phone or group of phones repeated 0 or more times. If a **`!`** is added in the scope, the scope represents the phone or group of phones before the **`!`** repeated 0 or more times, if it does not contain the phone or group of phones after the **`!`**


**Note**: repetition scopes are only allowed in conditions/anti-conditions (see: Conditions and Anti-Conditions)

Examples:
```cscsca
@@ -52,13 +56,21 @@ l (j) >> j

## `p` and `b` become `f` and `v` respectively
{p, b} >> {f, v}

## `u` becomes `y` when after `i` in a word (see: Conditions and Anti-Conditions)
u >> y / i [*] _

## `u` becomes `y` when after `i` in a word, unless a `w` is between the two (see: Conditions and Anti-Conditions)
u >> y / i [* ! w] _
```

### Labels
As seen in the example above, corresponding scopes in the input and output try to agree on what they choose. However, there are times when we want this behavior to be different than the default or expanded to conditions

To force scopes to agree on what they choose, we can use labels. A label has a name that starts with **`$`** and precedes a scope

**Note**: repetition scopes agree not in phones, but in phone count, causing agreeing repetition scopes to be the same length or shorter than the one that sets the agreement

Examples:
```cscsca
## `i` and `u` merge with preceding `h` or `x` into `j` `i` and `w` `u`
@@ -150,16 +162,9 @@ DEFINE F {f, s, ç, x}

### Special Characters
- **`*`**: represents any non-boundary phone. **`*`** may be preceded by a label to agree on which phone is represented
- **`..`**: a gap of zero or more non-boundary phones. (**Notes**: **`..`** must have a space on both sides and is only allowed in conditions). A gap may be preceded by a label to limit gap length to less than or equal to the length of the first gap with the same label
- **`#`**: a word boundary
- **`\`**: escapes the effects of the following character, may be used at the end of a line to continue the rule on the next line

### Reserved Characters
Characters that do nothing, but need to be escaped
- **`.`**
- **`[`**
- **`]`**

### IO and Variables
To print the current phonetic form, type **`PRINT`** at the start of a line, followed by the message you would like to print with it

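The note on labels says repetition scopes agree in phone count rather than in phones, so a later scope with the same label ends up the same length as, or shorter than, the one that set the agreement. One way to model that, assuming the first labeled scope to match sets the count and later scopes are capped by it, is a small table from label name to count. `RepetitionLabels` and its methods are hypothetical, not cscsca's internals.

```rust
use std::collections::HashMap;

/// Hypothetical bookkeeping for labeled repetition scopes (not cscsca's API).
#[derive(Default)]
struct RepetitionLabels {
    // Label name (without the `$` prefix) -> phone count set by the first scope
    counts: HashMap<String, usize>,
}

impl RepetitionLabels {
    /// Records the length matched by a labeled scope; only the first
    /// scope with a given label sets the agreement, later calls are ignored.
    fn record(&mut self, label: &str, len: usize) {
        self.counts.entry(label.to_string()).or_insert(len);
    }

    /// Candidate lengths for a scope with `label` whose own contents could
    /// match at most `max_len` phones, capped by the recorded count if any.
    fn allowed_lengths(&self, label: &str, max_len: usize) -> Vec<usize> {
        let cap = self.counts.get(label).map_or(max_len, |&n| n.min(max_len));
        (0..=cap).collect()
    }
}
```

The cap, rather than an exact match, reflects the "same length or shorter" wording; whether cscsca prefers the exact count when it is available is not shown in this diff.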
6 changes: 3 additions & 3 deletions src/applier/mod.rs
@@ -223,7 +223,7 @@ fn patterns_to_phones<'s: 'p, 'p>(patterns: &[Pattern<'s>], choices: &Choices<'_
return Err(ApplicationError::UnmatchedTokenInOutput(pattern.clone()));
}
},
Pattern::Gap { .. } => return Err(ApplicationError::GapOutOfCond),
Pattern::Repetition { .. } => return Err(ApplicationError::RepetitionOutOfCond),
_ => return Err(ApplicationError::UnmatchedTokenInOutput(pattern.clone()))
}
}
@@ -238,7 +238,7 @@ pub enum ApplicationError<'s> {
UnmatchedTokenInOutput(Pattern<'s>),
InvalidSelectionAccess(Pattern<'s>, usize),
ExceededLimit(LimitCondition),
GapOutOfCond,
RepetitionOutOfCond,
PatternCannotBeConvertedToPhones(Pattern<'s>),
}

@@ -257,7 +257,7 @@ impl std::fmt::Display for ApplicationError<'_> {
LimitCondition::Time(_) => "Could not apply changes in allotted time",
LimitCondition::Count { attempts: _, max: _ } => "Could not apply changes with the allotted application attempts",
}),
Self::GapOutOfCond => write!(f, "{}", RuleStructureError::GapOutOfCond),
Self::RepetitionOutOfCond => write!(f, "{}", RuleStructureError::RepetitionOutOfCond),
Self::PatternCannotBeConvertedToPhones(pattern) => write!(f, "'{pattern}' cannot be converted to a phone or list of phones"),
}
}
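In `patterns_to_phones`, a repetition pattern on the output side of a rule is now rejected with `RepetitionOutOfCond`, replacing the old gap check. A reduced sketch of that control flow follows; the `Pattern` and `ApplicationError` types here are simplified stand-ins, and the error wording is an assumption (the crate forwards to `RuleStructureError::RepetitionOutOfCond`, whose text is not part of this diff).

```rust
use std::fmt;

/// Simplified stand-in for cscsca's `Pattern` (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Phone(String),
    // Contents of a `[...]` scope; only meaningful in conditions
    Repetition(Vec<Pattern>),
}

/// Simplified stand-in for cscsca's `ApplicationError`.
#[derive(Debug, PartialEq)]
enum ApplicationError {
    RepetitionOutOfCond,
}

impl fmt::Display for ApplicationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Assumed wording; the crate delegates to `RuleStructureError`
            Self::RepetitionOutOfCond => write!(f, "repetition scopes are only allowed in conditions"),
        }
    }
}

/// Converts output patterns to phones, refusing repetition scopes.
fn patterns_to_phones(patterns: &[Pattern]) -> Result<Vec<String>, ApplicationError> {
    let mut phones = Vec::new();
    for pattern in patterns {
        match pattern {
            Pattern::Phone(p) => phones.push(p.clone()),
            Pattern::Repetition(_) => return Err(ApplicationError::RepetitionOutOfCond),
        }
    }
    Ok(phones)
}
```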
2 changes: 1 addition & 1 deletion src/assets/demo.sca
@@ -42,4 +42,4 @@ DEFINE Pv- { p, t, k }
DEFINE Pv+ { b, d, g }

## u is fronted if an i exists before it in the same word without a w between them
u >> y / i $gap .. _ // w $gap .. _
u >> y / i [* ! w] _
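The demo rule now expresses "an `i` earlier in the word with no `w` in between" with a negated repetition scope instead of a labeled gap plus an anti-condition. As a rough model of what `u >> y / i [* ! w] _` checks, assuming every `char` is one phone and the input is a single word: for each `u`, scan leftward and trigger only if an `i` is reached before any `w`. `front_u_after_i` is a hypothetical name, not a cscsca function.

```rust
/// Hypothetical sketch of `u >> y / i [* ! w] _` applied to one word,
/// treating each `char` as a single phone.
fn front_u_after_i(word: &str) -> String {
    let phones: Vec<char> = word.chars().collect();
    phones
        .iter()
        .enumerate()
        .map(|(j, &p)| {
            if p != 'u' {
                return p;
            }
            // Walk left from the `u`: the condition holds if an `i` is reached
            // before any `w`, since `[* ! w]` may not contain a `w`
            let triggered = phones[..j]
                .iter()
                .rev()
                .find(|&&q| q == 'i' || q == 'w')
                .map_or(false, |&q| q == 'i');
            if triggered { 'y' } else { 'u' }
        })
        .collect()
}
```

For example, `front_u_after_i("kiru")` returns `"kiry"`, while `"kiwu"` is left unchanged because the `w` blocks the condition.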
1 change: 0 additions & 1 deletion src/escaped_strings.rs
@@ -116,7 +116,6 @@ fn escape_input(input: &str) -> String {
fn niche_escapes() {
assert_eq!("\\_\\/".to_string(), escape_input("_/"));
assert_eq!("\\_a".to_string(), escape_input("_a"));
assert_eq!("\\. \\.\\. \\.\\.\\.".to_string(), escape_input(". .. ..."));

// isolated only escapes
assert!(check_escapes("\\_a").is_err());
4 changes: 1 addition & 3 deletions src/ir/mod.rs
@@ -2,7 +2,7 @@ use std::num::NonZero;

use crate::{
executor::io_events::{GetType, IoEvent},
keywords::{AND_CHAR, COND_CHAR, DEFINITION_LINE_START, DEFINITION_PREFIX, ESCAPE_CHAR, NOT_CHAR, VARIABLE_PREFIX},
keywords::{DEFINITION_LINE_START, DEFINITION_PREFIX, ESCAPE_CHAR, VARIABLE_PREFIX},
ONE,
};

@@ -49,7 +49,6 @@ pub enum IrError<'s> {
EmptyDefinition,
BadEscape(Option<char>),
ReservedCharacter(char),
UnexpectedNot,
InvalidGetFormat(GetType),
}

@@ -66,7 +65,6 @@ impl std::fmt::Display for IrError<'_> {
Self::BadEscape(None) => write!(f, "Found '{ESCAPE_CHAR}' with no following character"),
Self::BadEscape(Some(c)) => write!(f, "Escaped normal character '{c}' ({ESCAPE_CHAR}{c})"),
Self::ReservedCharacter(c) => write!(f, "Found reserved character '{c}' consider escaping it ('{ESCAPE_CHAR}{c}')"),
Self::UnexpectedNot => write!(f, "Found '{NOT_CHAR}' not after '{COND_CHAR}' or '{AND_CHAR}'"),
Self::InvalidGetFormat(get_type) => write!(f, "Invalid format after '{get_type}', expected variable name and message"),
}
}
46 changes: 23 additions & 23 deletions src/ir/tests.rs
@@ -151,8 +151,7 @@ fn get_lazy_def_name() {
assert_eq!(Some("ab"), get_first_phone("ab"));
assert_eq!(Some("a"), get_first_phone("a b"));
assert_eq!(Some("a"), get_first_phone("a/"));
assert_eq!(Some("a"), get_first_phone("a.."));
assert_eq!(None, get_first_phone(".. a"));
assert_eq!(Some("a.."), get_first_phone("a.."));
assert_eq!(None, get_first_phone("_"));
assert_eq!(None, get_first_phone("/"));
assert_eq!(Some("\\/"), get_first_phone("\\/"));
@@ -290,31 +289,32 @@ fn tokenize_scope_bounds_with_suroundings() {


#[test]
fn tokenize_gap() {
assert_eq!(Ok(vec![IrLine::Ir { tokens: vec![IrToken::Gap], lines: ONE}]), tokenize(".."));
}
fn tokenize_repetition() {
assert_eq!(Ok(vec![IrLine::Ir { tokens: vec![
IrToken::ScopeStart(ScopeType::Repetition),
IrToken::Any,
IrToken::ScopeEnd(ScopeType::Repetition),
], lines: ONE}]), tokenize("[*]"));

#[test]
fn tokenize_gap_with_suroundings() {
assert_eq!(Ok(vec![IrLine::Ir { tokens: vec![IrToken::Phone(Phone::Symbol("a")), IrToken::Gap, IrToken::Phone(Phone::Symbol("b")),], lines: ONE}]), tokenize("a .. b"));

assert_eq!(Ok(vec![IrLine::Ir { tokens: vec![
IrToken::ScopeStart(ScopeType::Repetition),
IrToken::Any,
IrToken::Negative,
IrToken::Phone(Phone::Symbol("w")),
IrToken::ScopeEnd(ScopeType::Repetition),
], lines: ONE}]), tokenize("[* ! w]"));
}

#[test]
fn tokenize_dot_with_suroundings() {
assert_eq!(
tokenize("a..b"),
Err((IrError::ReservedCharacter('.'), 1))
);

assert_eq!(
tokenize("a.b"),
Err((IrError::ReservedCharacter('.'), 1))
);

assert_eq!(
tokenize("a\\.b"),
Ok(vec![IrLine::Ir { tokens: vec![IrToken::Phone(Phone::Symbol("a\\.b"))], lines: ONE}])
);
fn tokenize_repetition_with_suroundings() {
assert_eq!(Ok(vec![IrLine::Ir { tokens: vec![
IrToken::Phone(Phone::Symbol("a")),
IrToken::ScopeStart(ScopeType::Repetition),
IrToken::Any,
IrToken::ScopeEnd(ScopeType::Repetition),
IrToken::Phone(Phone::Symbol("b")),
], lines: ONE}]), tokenize("a [*] b"));
}

#[test]
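The new `tokenize_repetition` tests expect `[*]` to become a scope start, `Any`, and a scope end, and `[* ! w]` to place `Negative` before the excluded phone. The sketch below reproduces those token sequences with a toy tokenizer; `Tok` and `tokenize` are simplified, hypothetical stand-ins for `IrToken` and cscsca's real tokenizer, which also handles definitions, labels, escapes, and conditions.

```rust
/// Simplified, hypothetical stand-ins for a few `IrToken` variants.
#[derive(Debug, PartialEq)]
enum Tok {
    Phone(String),
    Any,
    Negative,
    RepStart,
    RepEnd,
}

/// Toy tokenizer: whitespace separates phones, and `[`, `]`, `*`, `!`
/// are single-character tokens that also end any phone in progress.
fn tokenize(input: &str) -> Vec<Tok> {
    let mut tokens = Vec::new();
    let mut phone = String::new();
    for c in input.chars() {
        let special = match c {
            '[' => Some(Tok::RepStart),
            ']' => Some(Tok::RepEnd),
            '*' => Some(Tok::Any),
            '!' => Some(Tok::Negative),
            _ => None,
        };
        if special.is_some() || c.is_whitespace() {
            // Flush the pending phone before pushing the special token
            if !phone.is_empty() {
                tokens.push(Tok::Phone(std::mem::take(&mut phone)));
            }
            if let Some(tok) = special {
                tokens.push(tok);
            }
        } else {
            phone.push(c);
        }
    }
    if !phone.is_empty() {
        tokens.push(Tok::Phone(phone));
    }
    tokens
}
```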
11 changes: 6 additions & 5 deletions src/ir/tokenizer.rs
@@ -3,8 +3,8 @@ use std::num::NonZero;
use crate::{
escaped_strings::check_escapes,
executor::io_events::{GetType, IoEvent, RuntimeIoEvent, TokenizerIoEvent},
ir::{prefix::Prefix, tokenization_data::TokenizationData, tokens::{Break, IrToken}, IrError, IrLine},
keywords::{is_special_char, is_special_str, AND_CHAR, ANY_CHAR, ARG_SEP_CHAR, BOUND_CHAR, COMMENT_LINE_START, COND_CHAR, DEFINITION_LINE_START, DEFINITION_PREFIX, ESCAPE_CHAR, GAP_STR, GET_AS_CODE_LINE_START, GET_LINE_START, INPUT_PATTERN_STR, LABEL_PREFIX, LAZY_DEFINITION_LINE_START, LTR_CHAR, MATCH_CHAR, NOT_CHAR, OPTIONAL_END_CHAR, OPTIONAL_START_CHAR, PRINT_LINE_START, RTL_CHAR, SELECTION_END_CHAR, SELECTION_START_CHAR, SPECIAL_STRS, VARIABLE_PREFIX},
ir::{IrError, IrLine, prefix::Prefix, tokenization_data::TokenizationData, tokens::{Break, IrToken}},
keywords::{AND_CHAR, ANY_CHAR, ARG_SEP_CHAR, BOUND_CHAR, COMMENT_LINE_START, COND_CHAR, DEFINITION_LINE_START, DEFINITION_PREFIX, ESCAPE_CHAR, REPETITION_END_CHAR, REPETITION_START_CHAR, GET_AS_CODE_LINE_START, GET_LINE_START, INPUT_PATTERN_STR, LABEL_PREFIX, LAZY_DEFINITION_LINE_START, LTR_CHAR, MATCH_CHAR, NOT_CHAR, OPTIONAL_END_CHAR, OPTIONAL_START_CHAR, PRINT_LINE_START, RTL_CHAR, SELECTION_END_CHAR, SELECTION_START_CHAR, SPECIAL_STRS, VARIABLE_PREFIX, is_special_char, is_special_str},
phones::Phone,
sub_string::SubString,
tokens::{AndType, CondType, Direction, ScopeType, Shift, ShiftType},
@@ -137,6 +137,8 @@ fn parse_character<'s>(c: char, tokens: &mut Vec<IrToken<'s>>, prefix: &mut Opti
OPTIONAL_END_CHAR => push_phone_and(c, IrToken::ScopeEnd(ScopeType::Optional), tokens, slice, prefix, tokenization_data, lazy_expansions)?,
SELECTION_START_CHAR => push_phone_and(c, IrToken::ScopeStart(ScopeType::Selection), tokens, slice, prefix, tokenization_data, lazy_expansions)?,
SELECTION_END_CHAR => push_phone_and(c, IrToken::ScopeEnd(ScopeType::Selection), tokens, slice, prefix, tokenization_data, lazy_expansions)?,
REPETITION_START_CHAR => push_phone_and(c, IrToken::ScopeStart(ScopeType::Repetition), tokens, slice, prefix, tokenization_data, lazy_expansions)?,
REPETITION_END_CHAR => push_phone_and(c, IrToken::ScopeEnd(ScopeType::Repetition), tokens, slice, prefix, tokenization_data, lazy_expansions)?,
// handles simple one-to-one char to token pushes
AND_CHAR => push_phone_and(c, IrToken::Break(Break::And(AndType::And)), tokens, slice, prefix, tokenization_data, lazy_expansions)?,
NOT_CHAR => {
@@ -149,7 +151,7 @@ fn parse_character<'s>(c: char, tokens: &mut Vec<IrToken<'s>>, prefix: &mut Opti
tokens.pop();
IrToken::Break(Break::AntiCond)
},
_ => return Err(IrError::UnexpectedNot),
_ => IrToken::Negative,
};

push_phone_and(c, token, tokens, slice, prefix, tokenization_data, lazy_expansions)?;
@@ -230,7 +232,7 @@ fn push_phone_and<'s>(c: char, token: IrToken<'s>, tokens: &mut Vec<IrToken<'s>>

/// Pushes the slice as a phone and prepares it to start the next slice
///
/// Handles escape validity and input pattern and gap generation
/// Handles escape validity and input pattern generation
///
/// If there is a prefix, it either expands the phone as a definition or
/// inserts a selection token and resets the prefix to None
@@ -248,7 +250,6 @@ fn push_phone<'s>(tokens: &mut Vec<IrToken<'s>>, slice: &mut SubString<'s>, pref

match (&prefix, literal) {
(None, INPUT_PATTERN_STR) => tokens.push(IrToken::CondType(CondType::Pattern)),
(None, GAP_STR) => tokens.push(IrToken::Gap),
(None, "") => (),
(None, _) => tokens.push(IrToken::Phone(Phone::Symbol(literal))),
(Some(Prefix::Definition), _) => tokenization_data.get_definition(literal, tokens, lazy_expansions)?,
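In `parse_character`, a `!` that follows a condition break is still folded into an anti-condition, but any other `!` now becomes `IrToken::Negative` instead of the removed `IrError::UnexpectedNot`. A reduced model of just that arm, assuming `/` is the condition character and ignoring the elided cases (such as `&`) that the diff does not show; all names here are hypothetical.

```rust
/// Reduced, hypothetical token set for illustrating `!` handling.
#[derive(Debug, PartialEq)]
enum Tok {
    Cond,     // a condition break (assumed to be written `/`)
    AntiCond, // what a `!` directly after `Cond` is folded into
    Negative, // any other `!`, e.g. inside `[* ! w]`
    Any,      // `*`
}

/// Mirrors the shown match arm: replace a trailing `Cond` with `AntiCond`,
/// otherwise push `Negative` (this case used to be an error).
fn push_not(tokens: &mut Vec<Tok>) {
    if matches!(tokens.last(), Some(Tok::Cond)) {
        tokens.pop();
        tokens.push(Tok::AntiCond);
    } else {
        tokens.push(Tok::Negative);
    }
}

fn tokenize(input: &str) -> Vec<Tok> {
    let mut tokens = Vec::new();
    for c in input.chars() {
        match c {
            '/' => tokens.push(Tok::Cond),
            '*' => tokens.push(Tok::Any),
            '!' => push_not(&mut tokens),
            _ => {} // every other character is ignored in this sketch
        }
    }
    tokens
}
```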
10 changes: 5 additions & 5 deletions src/ir/tokens.rs
@@ -1,9 +1,9 @@
use std::fmt::Display;

use crate::{
keywords::{ANY_CHAR, ARG_SEP_CHAR, COND_CHAR, GAP_STR, LABEL_PREFIX},
keywords::{ANY_CHAR, ARG_SEP_CHAR, COND_CHAR, LABEL_PREFIX, NOT_CHAR},
phones::Phone,
tokens::{ScopeType, Shift, CondType, AndType}
tokens::{AndType, CondType, ScopeType, Shift}
};

/// Tokens that make up the intermediate representation of sound shifts
@@ -19,14 +19,14 @@ pub enum IrToken<'s> {
Any,
/// An item seperator for selection scopes
ArgSep,
/// A gap of size 0 or greater that does not contain a word boundery
Gap,
/// The main focus and type of a condition or anti-condition
CondType(CondType),
/// The start of a scope
ScopeStart(ScopeType),
/// The end of a scope
ScopeEnd(ScopeType),
/// Repetition negator
Negative,
}

impl Display for IrToken<'_> {
@@ -35,12 +35,12 @@ impl Display for IrToken<'_> {
Self::Any => write!(f, "{ANY_CHAR}"),
Self::ArgSep => write!(f, "{ARG_SEP_CHAR}"),
Self::Break(r#break) => write!(f, "{break}"),
Self::Gap => write!(f, "{GAP_STR}"),
Self::CondType(focus) => write!(f, "{focus}"),
Self::Phone(phone) => write!(f, "{phone}"),
Self::ScopeEnd(kind) => write!(f, "{}", kind.fmt_end()),
Self::ScopeStart(kind) => write!(f, "{}", kind.fmt_start()),
Self::Label(name) => write!(f, "{LABEL_PREFIX}{name}"),
Self::Negative => write!(f, "{NOT_CHAR}"),
}
}
}
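`IrToken` gains a `Negative` variant whose `Display` writes `NOT_CHAR`, so a repetition scope can be printed back in source form. A trimmed-down sketch of that pattern follows, assuming the characters used in this diff and the README (`*`, `!`, `[`, `]`); `Token` and `render` are hypothetical, and `render` joins tokens with spaces for readability, which the crate may not do.

```rust
use std::fmt::{self, Display};

const ANY_CHAR: char = '*';
const NOT_CHAR: char = '!';
const REPETITION_START_CHAR: char = '[';
const REPETITION_END_CHAR: char = ']';

/// Hypothetical subset of `IrToken` relevant to repetition scopes.
#[derive(Debug)]
enum Token<'s> {
    Phone(&'s str),
    Any,
    Negative,
    RepetitionStart,
    RepetitionEnd,
}

impl Display for Token<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Phone(p) => write!(f, "{p}"),
            Self::Any => write!(f, "{ANY_CHAR}"),
            Self::Negative => write!(f, "{NOT_CHAR}"),
            Self::RepetitionStart => write!(f, "{REPETITION_START_CHAR}"),
            Self::RepetitionEnd => write!(f, "{REPETITION_END_CHAR}"),
        }
    }
}

/// Prints a token sequence, separating tokens with single spaces.
fn render(tokens: &[Token<'_>]) -> String {
    tokens.iter().map(|t| t.to_string()).collect::<Vec<_>>().join(" ")
}
```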
11 changes: 4 additions & 7 deletions src/keywords.rs
@@ -72,6 +72,10 @@ const_list! {
SELECTION_START_CHAR = '{';
/// the end of an selection scope
SELECTION_END_CHAR = '}';
/// the start of a repetition scope
REPETITION_START_CHAR = '[';
/// the end of a repetition scope
REPETITION_END_CHAR = ']';

// Cond foci
/// The seperator in a match condition
@@ -91,11 +95,6 @@ const_list! {
const_list! {
/// Special characters that are not used by themselves
UNUSED_CHARS: [pub char];

/// Used when duplicated for a gap
DOT_CHAR = '.';
SQUARE_START_CHAR = '[';
SQUARE_END_CHAR = ']';
}

const_list! {
@@ -110,8 +109,6 @@ const_list! {
/// Strings that act like special characters when isolated
pub(crate) SPECIAL_STRS: [pub &str];

/// A gap
GAP_STR = "..";
/// The input in a pattern condition
INPUT_PATTERN_STR = "_";
}
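With `REPETITION_START_CHAR` and `REPETITION_END_CHAR` added, and `DOT_CHAR`, the square-bracket entries in `UNUSED_CHARS`, and `GAP_STR` removed, `[` and `]` now have a role while `.` no longer does. A minimal sketch of that classification, assuming `.` is now an ordinary phone character (as the updated `get_first_phone("a..")` test suggests); `CharRole` and `classify` are hypothetical.

```rust
const ANY_CHAR: char = '*';
const NOT_CHAR: char = '!';
const REPETITION_START_CHAR: char = '[';
const REPETITION_END_CHAR: char = ']';

/// Hypothetical roles for a handful of characters (not cscsca's API).
#[derive(Debug, PartialEq)]
enum CharRole {
    Any,
    Not,
    RepetitionStart,
    RepetitionEnd,
    // Anything else, including `.`, is treated as part of a phone here
    Ordinary,
}

fn classify(c: char) -> CharRole {
    match c {
        ANY_CHAR => CharRole::Any,
        NOT_CHAR => CharRole::Not,
        REPETITION_START_CHAR => CharRole::RepetitionStart,
        REPETITION_END_CHAR => CharRole::RepetitionEnd,
        _ => CharRole::Ordinary,
    }
}
```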