user-facing chunking of lemmata #103

I have a case like this:

![Image](https://github.com/user-attachments/assets/7a16b1c7-d82a-4550-b01e-55e63561958c)

This is a chapter colophon with three witnesses. Saktumiva has found strings to collate, but the resulting apparatus is messy and misleading. What the witnesses actually contain is three different colophon statements. What Saktumiva sees is a set of sub-strings that it tries to collate meaningfully. As a result, the witness texts are chopped up in ways that obscure the reality of the transmission, which is simpler than the apparatus suggests.

There are other places where I have wished that I had a way of telling Saktumiva's collation algorithm not to subdivide a string for the purposes of comparison.

Would it be possible to introduce a way of manually marking the preferred chunking of strings? The tag that comes to mind immediately is `<seg>` (maybe with an attribute).

Then, in the above example, my witnesses would read:

```
K: <seg>  kāya<abbr>ci</abbr><ex>kitsāyāṃ</ex> || <num style="letter-numeral">16</num>
            || 0 ||</seg>

H:  <seg>uttaratantre<lb n="4"/> ekapañcāśattamaḥ
                        kāyacikitsāyāṃ pratiśyāyacikitsā ṣo<cb/>ḍaśamo dhyāyaḥ ||</seg>

A: <seg> iti suśrutasaṃhitāyām uttaratantrāntargate
            śālākyatantre pratiśyāyapratiṣedho nāma caturviṃśatitamo 'dhyāyaḥ
            || 24 ||</seg>

```
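To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not Saktumiva's actual code: the `tokenize` function and `TOKEN_RE` pattern are hypothetical names of my own, and I am assuming a simple whitespace tokenizer as the starting point. The idea is only that everything inside a `<seg>` reaches the collation step as one indivisible unit.

```python
import re

# Hypothetical pattern, not Saktumiva's tokenizer:
#   1st alternative: a whole <seg ...>...</seg> span (body captured),
#   2nd alternative: a run of non-whitespace that stops before any "<seg".
TOKEN_RE = re.compile(
    r"<seg\b[^>]*>(.*?)</seg>|((?:(?!<seg\b)\S)+)",
    re.DOTALL,
)


def tokenize(text: str) -> list[str]:
    """Split text on whitespace, keeping each <seg> span as a single token."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        seg_body, word = match.groups()
        if seg_body is not None:
            # Normalise internal whitespace so XML line breaks do not matter.
            tokens.append(" ".join(seg_body.split()))
        else:
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    unmarked = "kāyacikitsāyāṃ pratiśyāyacikitsā ṣoḍaśamo dhyāyaḥ ||"
    marked = "<seg>" + unmarked + "</seg>"
    print(len(tokenize(unmarked)))  # five separate units to collate
    print(len(tokenize(marked)))    # one unit
```

With the markup in place, each colophon would be compared against the others as a whole, so the apparatus would show three competing readings rather than a patchwork of sub-strings. The sketch leaves other inline tags such as `<lb/>` or `<num>` untouched inside the segment and does not handle nested `<seg>` elements.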


