Partial mention matches get higher scores than full ones #123

@abhinavkulkarni

Hi,

One quirk I have observed is that partial mention matches seem to be scored higher by the entity disambiguation (ED) model than full ones. The following example illustrates the point:

import requests
import spacy


nlp = spacy.load('en_core_web_trf')


API_URL = "http://0.0.0.0:5555"
text_doc = """In early September, in just 48 hours the UK got a new prime minister (Liz Truss) and a new king (Charles III, following the death of Queen Elizabeth II).

Both take over at a turbulent time in British politics, with no shortage of current and future challenges. To name just a few: a stagnant economy, sky-high energy prices, more Brexit fallout with the EU, and Scots demanding a fresh independence vote.

On GZERO World, Ian Bremmer speaks to former British PM Tony Blair (1997-2007), who believes there will be a lot of uncertainty over the next year or two if Truss insists on big tax cuts and big borrowing.

Blair also looks back at the queen's legacy and the future of the monarchy, explains why Brexit will hurt but probably not fragment the UK, and defends why we need to return to his comfort zone of the political center to fix today's problems.
"""

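# Run spaCy NER over the document.
doc = nlp(text_doc)

# Collect a (start_char, length) span for every PERSON entity.
spans = []
for ent in doc.ents:
    if ent.label_ == 'PERSON':
        span = (ent.start_char, len(ent.text))
        spans.append(span)

# Send the text and the spans to the ED server.
ed_result = requests.post(API_URL, json={
    "text": text_doc,
    "spans": spans
}).json()

# Print one result row per span.
for result in ed_result:
    print(result)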

I get the following output:

[70, 9, 'Liz Truss', 'Liz_Truss', 0.3872783780234141, 0.0, 'NULL']
[97, 11, 'Charles III', 'Charles,_Prince_of_Wales', 0.3447332806264307, 0.0, 'NULL']
[139, 12, 'Elizabeth II', 'Elizabeth_II', 0.5253115976087314, 0.0, 'NULL']
[423, 11, 'Ian Bremmer', 'Ian_Bremmer', 0.3872783780234141, 0.0, 'NULL']
[463, 10, 'Tony Blair', 'Prime_Minister_of_the_United_Kingdom', 0.5042874506104144, 0.0, 'NULL']
[564, 5, 'Truss', 'Liz_Truss', 0.7929572470920269, 0.0, 'NULL']
[614, 5, 'Blair', 'Tony_Blair', 0.959426865481041, 0.0, 'NULL']

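As can be seen, the partial mention Truss gets a much higher score (0.79) than the full mention Liz Truss (0.39), even though both resolve to Liz_Truss. The same holds for Blair (0.96) versus Tony Blair (0.50), and the full mention Tony Blair is linked to Prime_Minister_of_the_United_Kingdom rather than Tony_Blair.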
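
To make the comparison easier to reproduce, here is a rough sketch that pairs each partial mention with the longer mention containing it and prints both scores. The rows are copied from the output above. The helper name `partial_vs_full` is made up for illustration, and I am assuming the third field is the mention text and the fifth field is the ED score.

```python
# Rows copied verbatim from the output above.
# Assumed layout: [start, length, mention, entity, score, ...]
ed_result = [
    [70, 9, 'Liz Truss', 'Liz_Truss', 0.3872783780234141, 0.0, 'NULL'],
    [97, 11, 'Charles III', 'Charles,_Prince_of_Wales', 0.3447332806264307, 0.0, 'NULL'],
    [139, 12, 'Elizabeth II', 'Elizabeth_II', 0.5253115976087314, 0.0, 'NULL'],
    [423, 11, 'Ian Bremmer', 'Ian_Bremmer', 0.3872783780234141, 0.0, 'NULL'],
    [463, 10, 'Tony Blair', 'Prime_Minister_of_the_United_Kingdom', 0.5042874506104144, 0.0, 'NULL'],
    [564, 5, 'Truss', 'Liz_Truss', 0.7929572470920269, 0.0, 'NULL'],
    [614, 5, 'Blair', 'Tony_Blair', 0.959426865481041, 0.0, 'NULL'],
]


def partial_vs_full(rows):
    """Pair each mention with any longer mention that contains all of its tokens.

    Hypothetical helper: returns (partial, partial_score, full, full_score) tuples.
    """
    pairs = []
    for short in rows:
        short_tokens = short[2].split()
        for full in rows:
            full_tokens = full[2].split()
            is_shorter = len(short_tokens) < len(full_tokens)
            if is_shorter and all(tok in full_tokens for tok in short_tokens):
                pairs.append((short[2], short[4], full[2], full[4]))
    return pairs


pairs = partial_vs_full(ed_result)
for partial, partial_score, full, full_score in pairs:
    print(f"{partial!r} ({partial_score:.2f}) vs {full!r} ({full_score:.2f})")
```

Token matching is deliberately naive (whitespace split, exact match), so it is only meant for eyeballing small examples like this one.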

Thanks!
