Releases: medelman17/eyecite-ts
v0.11.2
Patch Changes
- #199 6797408 Thanks @medelman17! - fix: suppress phantom citations emitted from numeric-prefixed party names (#196)

  Real-world NY caption like `Board of Mgrs. of the 15 Union Sq. W. Condominium v. BCRE 15 Union St., LLC, 2025 NY Slip Op 00784` emitted two `case` citations: the real slip op plus a phantom `15 Union Sq. W. Condominium v. BCRE 15` extracted from inside the plaintiff's name. The phantom read `15` as volume, `Union Sq. W. Condominium v. BCRE` as reporter, and `15` as page.

  Root cause. The `state-reporter` regex's non-greedy reporter capture `([A-Za-z.\d\s]+?)` happily spanned the `" v. "` case-name separator and backtracked until a second number appeared. The downstream false-positive filter caught this only when `reporters-db` was loaded, which is opt-in for bundle-size reasons, so most consumers saw the phantom pass through.

  Fix. Added negative lookahead `(?!\s+vs?\.\s)` to both `state-reporter` and `law-review` patterns so the reporter/journal capture cannot span a `" v. "` or `" vs. "` token. No real US reporter or journal name contains that sequence. Applied to both patterns because a first-pass guard on just `state-reporter` surfaced the same phantom under `law-review`.

  Five new regression tests: the exact #196 text, a `vs.` variant, a cross-type guard (no phantom `journal`), and two adversarial controls.
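The effect of the lookahead can be reproduced with a much smaller pattern. The sketch below is an illustration only: `unguarded` and `guarded` are simplified stand-ins invented here, not the library's actual `state-reporter` regex, and the per-character placement of the lookahead is an assumption about one way to apply it.

```typescript
// Simplified stand-ins for illustration; the real patterns are more involved.
// volume, lazy reporter capture, page
const unguarded = /\b(\d+)\s+([A-Za-z.\d\s]+?)\s+(\d+)\b/;

// Same shape, but the reporter capture refuses to consume any character
// at a position where a " v. " or " vs. " token begins.
const guarded = /\b(\d+)\s+((?:(?!\s+vs?\.\s)[A-Za-z.\d\s])+?)\s+(\d+)\b/;

const text =
  "Board of Mgrs. of the 15 Union Sq. W. Condominium v. BCRE 15 Union St., LLC, 2025 NY Slip Op 00784";

// Without the guard, the leftmost match is the phantom inside the plaintiff's name.
const phantom = unguarded.exec(text);
console.log(phantom?.[0]);

// With the guard, the first match is the real slip-opinion citation.
const real = guarded.exec(text);
console.log(real?.[0]);
```

Because the guard is evaluated before every character of the capture, the lazy quantifier can no longer walk across the case-name separator, so the engine abandons the first `15` and moves on.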
v0.11.1
Patch Changes
- #197 925a719 Thanks @medelman17! - fix: align `CASE_NAME_ABBREVS` with reporters-db Bluebook T6 list + ampersand support

  After three consecutive bug reports (#187, #188, #193) exposing missing abbreviations, this change aligns `CASE_NAME_ABBREVS` with the canonical Bluebook T6 case-name abbreviation list maintained by Free Law Project (`reporters-db/case_name_abbreviations.json`).

  Three improvements:

  - Strip internal apostrophes in stem lookup. `isLikelyAbbreviationPeriod` previously kept inner apostrophes, so `Nat'l.` computed stem `nat'l`, which no reasonable pure-letter set could match. Now normalized to `natl`, which matches the Bluebook's apostrophe-form abbreviations as pure-letter stems.
  - 41 new entries in `CASE_NAME_ABBREVS`. Period-forms (`co`, `cmty`, `envtl`, `gend`, `par`, `prot`, `ref`, `sol`, `cty`, `adver`) and apostrophe-forms (`assn`, `dept`, `natl`, `intl`, `govt`, `commn`, `commr`, `contl`, `fedn`, `meml`, `pship`, `profl`, `secy`, `sholder`, `socy`, `commcn`, `engg`, `engr`, `entmt`, `envt`, `examr`, `invr`, `admr`, `admx`, `empr`, `empt`, `exr`, `exx`, `publg`, `publn`, `regl`). `co` was the highest-impact gap: "Smith & Co. United States Corp." was silently truncated to "United States Corp." because "Co. U" fired the sentence-boundary scan.
  - `&` in `isLikelyPartyName`. Ampersand is ubiquitous in corporate captions ("Smith & Jones", "Goldman, Sachs & Co.") and previously caused the Priority-3 single-party fallback (#193) to reject such captions. Now treated as a valid standalone token.

  7 new regression tests covering period-forms, apostrophe-forms with trailing period, adversarial `Dep't …` caption, and ampersand patterns.
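The apostrophe-stripping step can be sketched as follows. This is a minimal illustration under stated assumptions: `ABBREV_STEMS`, `stemBeforePeriod`, and `looksLikeAbbreviationPeriod` are hypothetical names, and the four-entry set stands in for the much larger `CASE_NAME_ABBREVS`; the library's `isLikelyAbbreviationPeriod` may compute the stem differently.

```typescript
// Hypothetical subset; the real CASE_NAME_ABBREVS set is much larger.
const ABBREV_STEMS = new Set<string>(["co", "natl", "dept", "assn"]);

// Return the normalized word immediately before the period at `periodIndex`.
function stemBeforePeriod(text: string, periodIndex: number): string {
  let start = periodIndex;
  // Walk back to the previous whitespace (or the start of the text).
  while (start > 0 && !/\s/.test(text.charAt(start - 1))) start--;
  // Lowercase and drop straight or curly apostrophes so "Nat'l" becomes "natl".
  return text.slice(start, periodIndex).toLowerCase().replace(/['\u2019]/g, "");
}

// A period counts as an abbreviation period when its stem is in the set.
function looksLikeAbbreviationPeriod(text: string, periodIndex: number): boolean {
  return ABBREV_STEMS.has(stemBeforePeriod(text, periodIndex));
}

const caption = "Smith & Co. United States Corp.";
console.log(looksLikeAbbreviationPeriod(caption, caption.indexOf(".")));
```

Storing only pure-letter stems keeps the set simple: one entry covers both the period-form and the apostrophe-form of an abbreviation.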
v0.11.0
Minor Changes
- #192 7f84d0c Thanks @medelman17! - feat: star-pagination (`at *N`) support on all pincite-bearing citation types (#191)

  Star-pagination pincites (`at *1`, `at *2-4`) were silently dropped on `id`, `supra`, `shortFormCase`, full case cites with slip-opinion reporters (NY Slip Op), and neutrals (Westlaw, Lexis). In real-world NY state-court briefs this meant a significant fraction of pincites came back `undefined`. Plain-integer pincites (`at 465`) continued to work.

  Changes:

  - `parsePincite` / `PinciteInfo`: accept optional `*` prefix; new `starPage?: boolean` flag distinguishes slip-opinion pages from reporter pages. Existing `page: number` still carries the numeric portion, so backward compatibility for consumers reading `pincite` as a number is preserved.
  - Full case cites: `PINCITE_REGEX`, `LOOKAHEAD_PINCITE_REGEX`, and `PINCITE_SKIP_REGEX` now accept an optional `at` keyword and `*` prefix. Pincite extraction also runs when no trailing parenthetical is present, so forms like `2020 NY Slip Op 00001 at *2` capture the pin even though there is no `(Court YYYY)` block.
  - Short-form citations: `ID_PATTERN`, `IBID_PATTERN`, `SUPRA_PATTERN`, `STANDALONE_SUPRA_PATTERN`, and `SHORT_FORM_CASE_PATTERN` now accept `*?` before the pincite digits. The matching extractors populate `pinciteInfo.starPage` and now expose `pinciteInfo` on `IdCitation`, `SupraCitation`, and `ShortFormCaseCitation`.
  - Neutral citations: `NeutralCitation` gains `pincite?: number` and `pinciteInfo?: PinciteInfo` fields. `extractNeutral` now accepts the cleaned source text and extracts a trailing `, at *N` / `at *N` pincite. Previously, numeric pincites on neutrals were also silently dropped; this change fixes that as a side effect.

  Known limitation: the second occurrence of a NY Slip Op short-form (`2020 NY Slip Op 00001 at *2`) is still classified as `case` rather than `shortFormCase`, because `SHORT_FORM_CASE_PATTERN` forbids a page between the reporter and `at`. The pincite data itself is captured correctly. Short-form classification for NY Slip Op will be addressed in a follow-up.
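A rough picture of the star-page parsing described above, as a minimal sketch. `PinciteSketch` and `parsePinciteSketch` are hypothetical names written for this note, not the library's `PinciteInfo` or `parsePincite`; only `page` and `starPage` come from the release text, and the `endPage` field for ranges is an assumption.

```typescript
// Hypothetical shape: `page` and `starPage` mirror the release note,
// `endPage` is an assumption added here to represent ranges like "*2-4".
interface PinciteSketch {
  page: number;
  endPage?: number;
  starPage?: boolean;
}

function parsePinciteSketch(raw: string): PinciteSketch | undefined {
  // optional "at", optional "*", digits, optional "-digits" range
  const m = /^(?:at\s+)?(\*)?(\d+)(?:\s*-\s*\*?(\d+))?$/.exec(raw.trim());
  if (m === null) return undefined;

  const info: PinciteSketch = { page: Number(m[2]) };
  if (m[3] !== undefined) info.endPage = Number(m[3]);
  // Only set the flag for star pages, so plain pincites keep their old shape.
  if (m[1] !== undefined) info.starPage = true;
  return info;
}

console.log(parsePinciteSketch("at *2-4"));
console.log(parsePinciteSketch("at 465"));
```

Keeping `page` numeric and putting the star in a separate optional flag is what lets existing consumers keep reading the pincite as a number.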
Patch Changes
- #195 f10c234 Thanks @medelman17! - fix: recognize single-party corporate captions and `In the Matter of …` prefix (#193)

  `FullCaseCitation.caseName` came back `null` for any caption that didn't contain `v.` or match the short procedural-prefix list (`In re`, `Matter of`, `Estate of`, `Ex parte`, etc.). Common NY patterns like `Board of Mgrs. of the St. Tropez Condominium` and `Board of Directors of Hill Park` silently lost their case names, so downstream UI fell back to displaying the bare reporter triple.

  Two root causes:

  - Missing long-form procedural prefix. `In the Matter of X` was reduced to `Matter of X` because the short prefix matched mid-string before the long one could. Added `In the Matter of` to `PROCEDURAL_PREFIX_REGEX` and the `extractPartyNames` prefix list, both with priority over `Matter of`.
  - No generic fallback for single-party captions. When both `V.` and procedural-prefix scans fail, the backward scanner now uses the post-truncation `precedingText` itself as the caption, after stripping any leading signal word (`See`, `cf.`, etc.) and validating via `isLikelyPartyName` + `SENTENCE_INITIAL_WORDS`. Because the truncation step already bounds `precedingText` by sentence/citation/paren-signal boundaries, sentence prose like "The court held that..." is not mis-matched.

  11 new regression tests cover corporate captions (`Board of Mgrs. of`, `Board of Managers of`, `Board of Directors of`, bare `Corp.`), `In the Matter of` priority over `Matter of`, sentence-prose safety, and pre-existing adversarial / `Estate of` / `ex rel.` controls.
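Why prefix priority matters can be seen in a small sketch. Everything here is hypothetical: `captionFromPrefix` and the two prefix arrays are invented for illustration and do not reflect how `PROCEDURAL_PREFIX_REGEX` or `extractPartyNames` are actually written; the point is only that a first-hit-wins scan needs the longer prefix listed first.

```typescript
// Hypothetical prefix lists; the first one lacks the long form.
const BEFORE_FIX: readonly string[] = ["Matter of", "In re"];
const AFTER_FIX: readonly string[] = ["In the Matter of", "Matter of", "In re"];

// Return the caption starting at the first prefix (in list order) found in
// the text preceding a citation, or null when no prefix is present.
function captionFromPrefix(preceding: string, prefixes: readonly string[]): string | null {
  for (const prefix of prefixes) {
    // Require a trailing space so "In re" does not match inside "In response".
    const at = preceding.lastIndexOf(prefix + " ");
    if (at !== -1) return preceding.slice(at).trim();
  }
  return null;
}

const preceding = "See In the Matter of Smith";
console.log(captionFromPrefix(preceding, BEFORE_FIX)); // short prefix hits mid-string
console.log(captionFromPrefix(preceding, AFTER_FIX)); // long prefix wins
```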
v0.10.3
Patch Changes
- #189 7425448 Thanks @medelman17! - fix: close remaining case-name boundary gaps for NY-style citations (#187, #188)

  Two root causes behind remaining `caseName` failures on real-world NY briefs:

  - Missing geographic/street abbreviations. `Is.` (Island), `Mt.` (Mount), `Ft.` (Fort), `Pt.` (Point), `Rt.` (Route), `St.` (Saint/Street), `Blvd.`, `Sq.`, `Hwy.`, `Pkwy.`, and `Hts.` were not in `CASE_NAME_ABBREVS`, so the backward scanner treated their periods as sentence boundaries and truncated names like `Clark-Fitzpatrick, Inc. v. Long Is. R.R. Co.` and `Matter of Long Is. Power Auth. Hurricane Sandy Litig.`. Added to the Bluebook T6/T10 set.
  - Missing paren signal words. `quoted in`, `accord`, and the `citing, e.g.,` form were not recognized as hard boundaries, so backward scans of citations introduced by those signals overshot into the prior citation's trailing parenthetical. Extended `PAREN_SIGNAL_BOUNDARY_REGEX`.
v0.10.2
Patch Changes
- #185 e1a46d0 Thanks @medelman17! - fix: robust case-name boundary detection with Bluebook T6/T10 abbreviations (#182, #183, #184)

  Replace the narrow LEGAL_ABBREVS regex (~30 entries) with a comprehensive Bluebook-sourced abbreviation set (200+ entries from T6/T7/T10) backed by heuristics for single-letter initials and dotted initialisms. Add hard boundary detection for Id. markers and parenthetical signal words (quoting, citing, cited in). Fixes case names that were undefined, truncated, or overshot when party names contained abbreviation chains like "Cent. Sch. Dist.", "Mgt., Inc.", or "A.N.L.Y.H. Invs."
v0.10.1
Patch Changes
- #178 d72ba1e Thanks @medelman17! - Fix case name extraction still capturing sentence context in two scenarios: sentence-initial pronouns like "This" bypassing the trimming guard, and "In" prefix not being stripped from caseName after extractPartyNames removes it from plaintiff.
v0.10.0
Minor Changes
- #177 fc83dff Thanks @medelman17! - Add granular component spans for all citation types. Each citation now carries a `spans` record with per-component position data (volume, reporter, page, court, year, caseName, plaintiff, defendant, signal, etc.). Explanatory parentheticals gain a `span` field. New `spanFromGroupIndex` utility exported for power users. Closes #172, closes #171.
Patch Changes
- #175 76bd36d Thanks @medelman17! - Fix case name extraction capturing preceding sentence context as plaintiff. Add `isLikelyPartyName` validation with trimming when lowercase non-connector words are detected.
- #173 d96719a Thanks @medelman17! - Fix Id. resolution to correctly resolve through short-form, supra, and non-case citations. Remove dead `allowNestedResolution` option.
v0.9.0
Minor Changes
- #166 42fc27d Thanks @medelman17! - Add statute citation patterns for 31 additional US state jurisdictions

  Expands the `abbreviated-code` pattern family from 12 to 43 states using a data-driven regex generation approach. Each new state was verified against real citation formats from court opinions and legislative text.

  New jurisdictions: AK, AZ, AR, CT, DC, HI, IA, ID, KS, KY, LA, ME, MN, MO, MS, MT, ND, NE, NH, NM, NV, OK, OR, RI, SC, SD, TN, VT, WI, WV, WY.
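"Data-driven regex generation" generally means building one pattern from a table of per-state entries instead of hand-writing a regex per state. The sketch below shows that general idea only. `StateCodeSpec`, `SPECS`, `buildStatuteRegex`, and `findStatutes` are hypothetical, the three table rows are illustrative examples rather than the library's data, and the real `abbreviated-code` patterns may be structured quite differently.

```typescript
// Hypothetical table shape; rows are illustrative, not the library's data.
interface StateCodeSpec {
  state: string;
  abbreviations: string[];
}

const SPECS: StateCodeSpec[] = [
  { state: "MN", abbreviations: ["Minn. Stat."] },
  { state: "WI", abbreviations: ["Wis. Stat."] },
  { state: "KY", abbreviations: ["Ky. Rev. Stat.", "KRS"] },
];

// Escape regex metacharacters so "Minn. Stat." matches literally.
function escapeRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one alternation from every abbreviation in the table.
function buildStatuteRegex(specs: StateCodeSpec[]): RegExp {
  const alternatives: string[] = [];
  for (const spec of specs) {
    for (const abbr of spec.abbreviations) {
      // Tolerate any run of whitespace between the words of an abbreviation.
      alternatives.push(escapeRegex(abbr).replace(/ /g, "\\s+"));
    }
  }
  // Longest first, so a longer abbreviation is tried before a shorter one.
  alternatives.sort((a, b) => b.length - a.length);
  return new RegExp(`\\b(${alternatives.join("|")})\\s*§+\\s*(\\d[\\d.-]*)`, "g");
}

function findStatutes(text: string, specs: StateCodeSpec[]): Array<{ state: string; section: string }> {
  const stateByAbbr = new Map<string, string>();
  for (const spec of specs) {
    for (const abbr of spec.abbreviations) stateByAbbr.set(abbr, spec.state);
  }
  const re = buildStatuteRegex(specs);
  const found: Array<{ state: string; section: string }> = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const abbr = m[1].replace(/\s+/g, " ");
    found.push({ state: stateByAbbr.get(abbr) ?? "unknown", section: m[2] });
  }
  return found;
}

const statutes = findStatutes("See Minn. Stat. § 609.02 and Wis. Stat. § 893.80 (2023)", SPECS);
console.log(statutes);
```

With this shape, adding a jurisdiction is a one-row change to the table, which is the main appeal of the approach.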
v0.8.4
Patch Changes
- #164 449e1ea Thanks @medelman17! - Fix constitutional citation extraction gaps: support comma after "Const." separator, recognize "Amdt." abbreviation, and add low-confidence bare article pattern for standalone "Art. I, §8" references
v0.8.3
Patch Changes
- #162 919c8c9 Thanks @medelman17! - fix: prevent greedy false matches in position mapping lookahead (#161)

  The `rebuildPositionMaps` lookahead found the first matching character, not the correct one. When `normalizeDashes` expanded em-dashes (— → ---) near text containing hyphens (e.g., page ranges like "110-115"), the deletion lookahead grabbed the wrong "-", collapsing subsequent position mappings and producing zero-length original spans on extracted citations.

  The fix adds a confirmation check: a lookahead match is only accepted when at least 3 characters after the match point also align. Both deletion and insertion directions are searched simultaneously and the shorter confirmed match wins, preventing greedy false matches.
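The confirmation check can be sketched as below. This is an illustration of the idea as described in the note, not the code in `rebuildPositionMaps`: `alignsAt`, `findResync`, and the `Resync` type are hypothetical, and the sketch assumes a caller that has already stopped at a mismatch between position `i` in the original text and `j` in the cleaned text.

```typescript
type Resync = { kind: "deletion" | "insertion"; skip: number } | null;

// True when the character at the match point and the next `confirm`
// characters agree (or both strings end together first).
function alignsAt(a: string, i: number, b: string, j: number, confirm: number): boolean {
  for (let k = 0; k <= confirm; k++) {
    const ca = a.charAt(i + k); // "" when past the end
    const cb = b.charAt(j + k);
    if (ca === "" && cb === "") return true;
    if (ca !== cb) return false;
  }
  return true;
}

// Search both directions at once, shortest skip first.
function findResync(original: string, cleaned: string, i: number, j: number, maxSkip = 20, confirm = 3): Resync {
  for (let skip = 1; skip <= maxSkip; skip++) {
    // Deletion: `skip` original characters have no counterpart in cleaned.
    if (alignsAt(original, i + skip, cleaned, j, confirm)) return { kind: "deletion", skip };
    // Insertion: `skip` cleaned characters have no counterpart in original.
    if (alignsAt(original, i, cleaned, j + skip, confirm)) return { kind: "insertion", skip };
  }
  return null; // no confirmed match; the caller must handle the mismatch another way
}

// Em-dash expanded to "---" right before a hyphen: texts diverge at index 1.
const original = "A\u2014B-C";
const cleaned = "A---B-C";
console.log(findResync(original, cleaned, 1, 1, 20, 0)); // confirm = 0 mimics first-character matching
console.log(findResync(original, cleaned, 1, 1)); // confirmed search rejects the false match
```

With `confirm` set to 0 the search latches onto the later hyphen and reports a two-character deletion, which is the false match the note describes; with the default of 3 that candidate is rejected because the characters after it do not line up.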