Releases: medelman17/eyecite-ts

v0.11.2

19 Apr 20:17
e3db78c

Patch Changes

  • #199 6797408 Thanks @medelman17! - fix: suppress phantom citations emitted from numeric-prefixed party names (#196)

    A real-world NY caption such as Board of Mgrs. of the 15 Union Sq. W.
    Condominium v. BCRE 15 Union St., LLC, 2025 NY Slip Op 00784 emitted two
    case citations: the real slip op plus a phantom 15 Union Sq. W. Condominium
    v. BCRE 15 extracted from inside the plaintiff's name. The phantom read 15
    as volume, Union Sq. W. Condominium v. BCRE as reporter, and 15 as page.

    Root cause. The state-reporter regex's non-greedy reporter capture
    ([A-Za-z.\d\s]+?) happily spanned the " v. " case-name separator and
    backtracked until a second number appeared. The downstream false-positive
    filter caught this only when reporters-db was loaded — which is opt-in for
    bundle-size reasons, so most consumers saw the phantom pass through.

    Fix. Added negative lookahead (?!\s+vs?\.\s) to both state-reporter
    and law-review patterns so the reporter/journal capture cannot span a
    " v. " or " vs. " token. No real US reporter or journal name contains
    that sequence. Applied to both patterns because a first-pass guard on just
    state-reporter surfaced the same phantom under law-review.

    Five new regression tests: the exact #196 text, a vs. variant, a cross-type
    guard (no phantom journal), and two adversarial controls.
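
    The effect of the lookahead can be sketched with simplified patterns. Both
    regexes below are illustrative stand-ins, not the library's actual
    state-reporter pattern, and the per-character placement of the lookahead is
    an assumption about how such a guard might be wired in.

```typescript
// Simplified stand-ins for illustration; NOT the library's actual patterns.
// Shape: <volume> <reporter> <page>
const naive = /\b(\d+)\s+([A-Za-z.\d\s]+?)\s+(\d+)\b/;

// Same shape, but every reporter character is guarded so the capture can
// never step onto whitespace that begins a " v. " or " vs. " separator.
const guarded = /\b(\d+)\s+((?:(?!\s+vs?\.\s)[A-Za-z.\d\s])+?)\s+(\d+)\b/;

const caption =
  "Board of Mgrs. of the 15 Union Sq. W. Condominium v. BCRE 15 Union St., LLC, " +
  "2025 NY Slip Op 00784";

const phantom = naive.exec(caption);
const real = guarded.exec(caption);

console.log(phantom ? phantom[2] : null); // "Union Sq. W. Condominium v. BCRE"
console.log(real ? real[2] : null);       // "NY Slip Op"
```

    With the guard in place, the first match in the caption is the slip
    opinion itself rather than the span inside the plaintiff's name.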

v0.11.1

19 Apr 19:56
8ed22bd

Patch Changes

  • #197 925a719 Thanks @medelman17! - fix: align CASE_NAME_ABBREVS with reporters-db Bluebook T6 list + ampersand support

    After three consecutive bug reports (#187, #188, #193) exposing missing
    abbreviations, this change aligns CASE_NAME_ABBREVS with the canonical
    Bluebook T6 case-name abbreviation list maintained by Free Law Project
    (reporters-db/case_name_abbreviations.json).

    Three improvements:

    • Strip internal apostrophes in stem lookup. isLikelyAbbreviationPeriod
      previously kept inner apostrophes, so Nat'l. computed stem nat'l which
      no reasonable pure-letter set could match. Now normalized to natl, which
      matches the Bluebook's apostrophe-form abbreviations as pure-letter stems.
    • 41 new entries in CASE_NAME_ABBREVS. Period-forms (co, cmty,
      envtl, gend, par, prot, ref, sol, cty, adver) and
      apostrophe-forms (assn, dept, natl, intl, govt, commn, commr,
      contl, fedn, meml, pship, profl, secy, sholder, socy,
      commcn, engg, engr, entmt, envt, examr, invr, admr, admx,
      empr, empt, exr, exx, publg, publn, regl). co was the
      highest-impact gap — "Smith & Co. United States Corp." was silently
      truncated to "United States Corp." because "Co. U" fired the
      sentence-boundary scan.
    • & in isLikelyPartyName. Ampersand is ubiquitous in corporate
      captions ("Smith & Jones", "Goldman, Sachs & Co.") and previously caused
      the Priority-3 single-party fallback (#193) to reject such captions. Now
      treated as a valid standalone token.

    7 new regression tests covering period-forms, apostrophe-forms with trailing
    period, adversarial Dep't … caption, and ampersand patterns.
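
    A minimal sketch of the stem normalization described above. The set is a
    tiny hypothetical stand-in for CASE_NAME_ABBREVS, and abbrevStem /
    isKnownAbbreviation are illustrative names, not the library's functions.

```typescript
// Hypothetical, heavily trimmed stand-in for CASE_NAME_ABBREVS.
const ABBREV_STEMS = new Set(["co", "natl", "dept", "assn"]);

// Lowercase the token, drop one trailing period, and strip internal
// apostrophes (straight or curly) so "Nat'l." becomes the stem "natl".
function abbrevStem(token: string): string {
  return token.toLowerCase().replace(/\.$/, "").replace(/['\u2019]/g, "");
}

function isKnownAbbreviation(token: string): boolean {
  return ABBREV_STEMS.has(abbrevStem(token));
}

console.log(abbrevStem("Nat'l."));          // "natl"
console.log(isKnownAbbreviation("Dep't"));  // true
console.log(isKnownAbbreviation("Co."));    // true
console.log(isKnownAbbreviation("Smith.")); // false
```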

v0.11.0

19 Apr 16:29
768ffe5

Minor Changes

  • #192 7f84d0c Thanks @medelman17! - feat: star-pagination (at *N) support on all pincite-bearing citation types (#191)

    Star-pagination pincites (at *1, at *2-4) were silently dropped on id,
    supra, shortFormCase, full case cites with slip-opinion reporters (NY Slip
    Op), and neutrals (Westlaw, Lexis). In real-world NY state-court briefs this
    meant a significant fraction of pincites came back undefined. Plain-integer
    pincites (at 465) continued to work.

    Changes:

    • parsePincite / PinciteInfo — accept optional * prefix; new
      starPage?: boolean flag distinguishes slip-opinion pages from reporter
      pages. Existing page: number still carries the numeric portion, so
      backward compatibility for consumers reading pincite as a number is
      preserved.
    • Full case cites: PINCITE_REGEX, LOOKAHEAD_PINCITE_REGEX, and
      PINCITE_SKIP_REGEX now accept an optional at keyword and * prefix.
      Pincite extraction also runs when no trailing parenthetical is present,
      so forms like 2020 NY Slip Op 00001 at *2 capture the pin even though
      there is no (Court YYYY) block.
    • Short-form citations: ID_PATTERN, IBID_PATTERN, SUPRA_PATTERN,
      STANDALONE_SUPRA_PATTERN, and SHORT_FORM_CASE_PATTERN now accept *?
      before the pincite digits. The matching extractors populate
      pinciteInfo.starPage and now expose pinciteInfo on IdCitation,
      SupraCitation, and ShortFormCaseCitation.
    • Neutral citations: NeutralCitation gains pincite?: number and
      pinciteInfo?: PinciteInfo fields. extractNeutral now accepts the
      cleaned source text and extracts a trailing , at *N / at *N pincite.
      Previously, numeric pincites on neutrals were also silently dropped;
      this change fixes that as a side effect.
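
    The star-page handling can be illustrated with a small parser. This is a
    sketch, not the library's parsePincite: only page and starPage are named
    in these notes, while PinciteSketch, parsePinciteSketch, and the endPage
    field are hypothetical names introduced here for illustration.

```typescript
// Hypothetical shape: `page` and `starPage` mirror the fields named in the
// notes; `endPage` is an illustrative addition for ranges such as "*2-4".
interface PinciteSketch {
  page: number;
  endPage?: number;
  starPage?: boolean;
}

function parsePinciteSketch(raw: string): PinciteSketch | null {
  // Optional "at", optional "*", digits, optional "-digits" range end.
  const m = /^\s*(?:at\s+)?(\*)?(\d+)(?:-\*?(\d+))?\s*$/.exec(raw);
  if (!m) return null;
  const info: PinciteSketch = { page: Number(m[2]) };
  if (m[3] !== undefined) info.endPage = Number(m[3]);
  if (m[1] !== undefined) info.starPage = true;
  return info;
}

console.log(parsePinciteSketch("at *2"));   // { page: 2, starPage: true }
console.log(parsePinciteSketch("at *2-4")); // { page: 2, endPage: 4, starPage: true }
console.log(parsePinciteSketch("at 465"));  // { page: 465 }
```

    Keeping page numeric in both cases is what lets callers that only read
    the number keep working, with starPage available to those that care.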

    Known limitation: the second occurrence of a NY Slip Op short-form
    (2020 NY Slip Op 00001 at *2) is still classified as case rather than
    shortFormCase, because SHORT_FORM_CASE_PATTERN forbids a page between
    the reporter and at. The pincite data itself is captured correctly.
    Short-form classification for NY Slip Op will be addressed in a follow-up.

Patch Changes

  • #195 f10c234 Thanks @medelman17! - fix: recognize single-party corporate captions and In the Matter of … prefix (#193)

    FullCaseCitation.caseName came back null for any caption that didn't
    contain v. or match the short procedural-prefix list (In re,
    Matter of, Estate of, Ex parte, etc.). Common NY patterns like
    Board of Mgrs. of the St. Tropez Condominium and Board of Directors of
    Hill Park silently lost their case names, and downstream UI fell back
    to displaying the bare reporter triple.

    Two root causes:

    • Missing long-form procedural prefix. In the Matter of X was
      reduced to Matter of X because the short prefix matched mid-string
      before the long one could. Added In the Matter of to
      PROCEDURAL_PREFIX_REGEX and the extractPartyNames prefix list, both
      with priority over Matter of.

    • No generic fallback for single-party captions. When both v. and
      procedural-prefix scans fail, the backward scanner now uses the
      post-truncation precedingText itself as the caption, after stripping
      any leading signal word (See, cf., etc.) and validating via
      isLikelyPartyName + SENTENCE_INITIAL_WORDS. Because the truncation
      step already bounds precedingText by sentence/citation/paren-signal
      boundaries, sentence prose like "The court held that..." is not
      mis-matched.

      11 new regression tests cover corporate captions (Board of Mgrs. of,
      Board of Managers of, Board of Directors of, bare Corp.),
      In the Matter of priority over Matter of, sentence-prose safety, and
      pre-existing adversarial/Estate of/ex rel. controls.
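
    The prefix-priority problem can be reproduced with a toy prefix list. The
    lists and helper names below are hypothetical; the real
    PROCEDURAL_PREFIX_REGEX is not reproduced here.

```typescript
// Hypothetical prefix lists for illustration only.
const withoutLongForm = ["In re", "Matter of", "Estate of", "Ex parte"];
const withLongForm = ["In the Matter of"].concat(withoutLongForm);

function buildPrefixRegex(prefixes: string[]): RegExp {
  // The engine returns the leftmost match and, at a given position, tries
  // the alternatives left to right.
  return new RegExp("\\b(" + prefixes.join("|") + ")\\s+(.+)$");
}

function matchedPrefix(prefixes: string[], text: string): string | null {
  const m = buildPrefixRegex(prefixes).exec(text);
  return m ? m[1] : null;
}

const captionText = "In the Matter of Long Is. Power Auth.";
console.log(matchedPrefix(withoutLongForm, captionText)); // "Matter of"
console.log(matchedPrefix(withLongForm, captionText));    // "In the Matter of"
```

    Without the long form in the list, the only match available starts
    mid-string at "Matter of", which is the truncation described above.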

v0.10.3

19 Apr 02:59
7381360

Patch Changes

  • #189 7425448 Thanks @medelman17! - fix: close remaining case-name boundary gaps for NY-style citations (#187, #188)

    Two root causes behind remaining caseName failures on real-world NY briefs:

    • Missing geographic/street abbreviations. Is. (Island), Mt. (Mount),
      Ft. (Fort), Pt. (Point), Rt. (Route), St. (Saint/Street), Blvd.,
      Sq., Hwy., Pkwy., and Hts. were not in CASE_NAME_ABBREVS, so the
      backward scanner treated their periods as sentence boundaries and truncated
      names like Clark-Fitzpatrick, Inc. v. Long Is. R.R. Co. and
      Matter of Long Is. Power Auth. Hurricane Sandy Litig. All were added
      to the Bluebook T6/T10 abbreviation set.
    • Missing paren signal words. quoted in, accord, and the
      citing, e.g., form were not recognized as hard boundaries, so backward
      scans of citations introduced by those signals overshot into the prior
      citation's trailing parenthetical. Extended PAREN_SIGNAL_BOUNDARY_REGEX.
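
    The truncation caused by a missing abbreviation can be sketched as
    follows. This is an assumption-laden toy: it uses a forward pass that
    keeps the last sentence-like boundary (standing in for the backward
    scanner), two hypothetical trimmed abbreviation sets, and a simple
    single-letter rule for tokens such as "v.".

```typescript
// Hypothetical trimmed sets; the real CASE_NAME_ABBREVS is far larger.
const BEFORE_FIX = new Set(["inc", "co", "rr"]);
const AFTER_FIX = new Set(["inc", "co", "rr", "is", "mt", "ft", "st"]);

// Return the text after the last period that looks like a sentence boundary:
// a period followed by whitespace and a capital letter, where the token
// before the period is neither a single letter nor a known abbreviation.
function trimToCaption(preceding: string, abbrevs: Set<string>): string {
  const boundary = /(\S+)\.\s+(?=[A-Z])/g;
  let cut = 0;
  let m: RegExpExecArray | null;
  while ((m = boundary.exec(preceding)) !== null) {
    const stem = m[1].toLowerCase().replace(/[.']/g, "");
    const isInitial = stem.length === 1; // "v." or a single-letter initial
    if (!isInitial && !abbrevs.has(stem)) cut = m.index + m[0].length;
  }
  return preceding.slice(cut);
}

const preceding = "The claim fails. Clark-Fitzpatrick, Inc. v. Long Is. R.R. Co.";
console.log(trimToCaption(preceding, BEFORE_FIX)); // "R.R. Co."
console.log(trimToCaption(preceding, AFTER_FIX));  // "Clark-Fitzpatrick, Inc. v. Long Is. R.R. Co."
```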

v0.10.2

16 Apr 15:23
59de8e3

Patch Changes

  • #185 e1a46d0 Thanks @medelman17! - fix: robust case-name boundary detection with Bluebook T6/T10 abbreviations (#182, #183, #184)

    Replace the narrow LEGAL_ABBREVS regex (~30 entries) with a comprehensive Bluebook-sourced abbreviation set (200+ entries from T6/T7/T10) backed by heuristics for single-letter initials and dotted initialisms. Add hard boundary detection for Id. markers and parenthetical signal words (quoting, citing, cited in). Fixes case names that were undefined, truncated, or overshot when party names contained abbreviation chains like "Cent. Sch. Dist.", "Mgt., Inc.", or "A.N.L.Y.H. Invs."

v0.10.1

10 Apr 19:50
6a31695

Patch Changes

  • #178 d72ba1e Thanks @medelman17! - Fix case name extraction still capturing sentence context in two scenarios: sentence-initial pronouns like "This" bypassing the trimming guard, and "In" prefix not being stripped from caseName after extractPartyNames removes it from plaintiff.

v0.10.0

10 Apr 19:02
3e4f75a

Minor Changes

  • #177 fc83dff Thanks @medelman17! - Add granular component spans for all citation types. Each citation now carries a spans record with per-component position data (volume, reporter, page, court, year, caseName, plaintiff, defendant, signal, etc.). Explanatory parentheticals gain a span field. New spanFromGroupIndex utility exported for power users. Closes #172, closes #171.

Patch Changes

  • #175 76bd36d Thanks @medelman17! - Fix case name extraction capturing preceding sentence context as plaintiff. Add isLikelyPartyName validation with trimming when lowercase non-connector words are detected.

  • #173 d96719a Thanks @medelman17! - Fix Id. resolution to correctly resolve through short-form, supra, and non-case citations. Remove dead allowNestedResolution option.

v0.9.0

06 Apr 16:51
80717ff

Minor Changes

  • #166 42fc27d Thanks @medelman17! - Add statute citation patterns for 31 additional US state jurisdictions

    Expands the abbreviated-code pattern family from 12 to 43 states using a data-driven
    regex generation approach. Each new state was verified against real citation formats from
    court opinions and legislative text.

    New jurisdictions: AK, AZ, AR, CT, DC, HI, IA, ID, KS, KY, LA, ME, MN, MO, MS, MT,
    ND, NE, NH, NM, NV, OK, OR, RI, SC, SD, TN, VT, WI, WV, WY.
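
    A data-driven approach of this kind might look like the sketch below. The
    table rows, the StateCode type, and the helper names are hypothetical; the
    abbreviations follow common citation forms but are not copied from the
    library's data, and the section syntax is deliberately simplified.

```typescript
// Hypothetical table rows, not the library's data.
interface StateCode {
  state: string;
  abbrev: string;
}

const STATE_CODES: StateCode[] = [
  { state: "KY", abbrev: "Ky. Rev. Stat." },
  { state: "MN", abbrev: "Minn. Stat." },
  { state: "NV", abbrev: "Nev. Rev. Stat." },
];

function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one pattern per row: code abbreviation, optional "Ann.", section
// symbol, then a simplified section number such as 609.02 or 532.060.
function buildStatuteRegex(row: StateCode): RegExp {
  const code = escapeRegex(row.abbrev).replace(/ /g, "\\s+");
  return new RegExp(code + "(?:\\s+Ann\\.)?\\s*§+\\s*(\\d+[A-Za-z]?(?:[.\\-:]\\d+)*)");
}

function findStatute(text: string): { state: string; section: string } | null {
  for (const row of STATE_CODES) {
    const m = buildStatuteRegex(row).exec(text);
    if (m) return { state: row.state, section: m[1] };
  }
  return null;
}

console.log(findStatute("See Minn. Stat. § 609.02, subd. 1"));
// { state: "MN", section: "609.02" }
```

    Adding a jurisdiction then means adding a row rather than hand-writing
    another pattern.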

v0.8.4

06 Apr 15:05
c969193

Patch Changes

  • #164 449e1ea Thanks @medelman17! - Fix constitutional citation extraction gaps: support comma after "Const." separator, recognize "Amdt." abbreviation, and add low-confidence bare article pattern for standalone "Art. I, §8" references

v0.8.3

06 Apr 14:09

Patch Changes

  • #162 919c8c9 Thanks @medelman17! - fix: prevent greedy false matches in position mapping lookahead (#161)

    The rebuildPositionMaps lookahead found the first matching character,
    not the correct one. When normalizeDashes expanded em-dashes (— → ---)
    near text containing hyphens (e.g., page ranges like "110-115"), the
    deletion lookahead grabbed the wrong "-", collapsing subsequent position
    mappings and producing zero-length original spans on extracted citations.

    The fix adds a confirmation check: a lookahead match is only accepted when
    at least 3 characters after the match point also align. Both deletion and
    insertion directions are searched simultaneously and the shorter confirmed
    match wins, preventing greedy false matches.
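
    The confirmation check can be sketched in isolation. confirmedSkip is a
    hypothetical helper, not the code in rebuildPositionMaps, and the maxLook
    bound is an assumed parameter. Passing confirm = 0 reproduces the old
    first-match behaviour.

```typescript
// Find the smallest k >= 1 such that source[i + k] equals target[j] AND the
// next `confirm` characters after both points are identical.
// Returns -1 when no such k exists within the lookahead window.
function confirmedSkip(
  source: string,
  i: number,
  target: string,
  j: number,
  confirm: number,
  maxLook = 20,
): number {
  for (let k = 1; k <= maxLook && i + k < source.length; k++) {
    if (source[i + k] !== target[j]) continue;
    const a = source.slice(i + k + 1, i + k + 1 + confirm);
    const b = target.slice(j + 1, j + 1 + confirm);
    if (a === b) return k;
  }
  return -1;
}

const original = "x\u2014y 110-115"; // contains an em-dash (U+2014)
const cleaned = "x---y 110-115";     // after the dash expansion

// The strings diverge at index 1. Search the original for cleaned[1] ("-").
console.log(confirmedSkip(original, 1, cleaned, 1, 0)); // 6: grabs the "-" in "110-115"
console.log(confirmedSkip(original, 1, cleaned, 1, 3)); // -1: "115" vs "--y" do not align

// Searching the cleaned text for the character after the dash ("y") does
// find a confirmed point: " 11" follows "y" in both strings.
console.log(confirmedSkip(cleaned, 1, original, 2, 3)); // 3
```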