Consider n-grams #25

@cmcaine

At the risk of telling you things you're already well aware of, I'm going to forge ahead with this issue ;)

In the demo, if you search for e.g. "good work is no", the top results are the entries that contain the individual words most often, rather than the entry that contains the full phrase "good work is no".

This is expected from the algorithm you describe in your blog post (also, did you realise that you are describing a form of TF-IDF scoring for each term?).
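To make the TF-IDF comparison concrete, here is a minimal sketch of per-term TF-IDF scoring. It is in Python purely for brevity, it is a generic textbook variant and not the library's actual code or the exact formula from the blog post, and the names (`tokenize`, `tf_idf_scores`) and sample entries are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real tokenization will differ).
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each doc as the sum of tf * idf over the query terms."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if df[term]:
                # One common smoothed idf; other variants exist.
                idf = math.log(1 + n_docs / df[term])
                score += tf[term] * idf
        scores.append(score)
    return scores


# Hypothetical entries: the first contains the phrase once,
# the second just repeats the same words in a different order.
docs = [
    "good work is no accident",
    "no good no work is good work",
]
scores = tf_idf_scores("good work is no", docs)
```

With these two entries the second one scores higher, because only term counts matter and word order is ignored.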

You could improve relevance for these kinds of searches by searching and scoring bigrams or n-grams in addition to individual terms.
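A rough sketch of what I mean, again in Python for brevity. Everything here is hypothetical (the `ngrams` and `score` helpers, the `bigram_weight` parameter and its default of 2.0, the sample entries), and it uses raw counts rather than any particular weighting scheme, so treat it as an illustration of the idea rather than a proposed implementation.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real tokenization will differ).
    return text.lower().split()


def ngrams(tokens, n):
    # All contiguous runs of n tokens, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def score(query, doc, bigram_weight=2.0):
    """Term-count score plus a weighted bonus for matching bigrams."""
    q = tokenize(query)
    d = tokenize(doc)

    term_counts = Counter(d)
    bigram_counts = Counter(ngrams(d, 2))

    term_score = sum(term_counts[t] for t in q)
    bigram_score = sum(bigram_counts[b] for b in ngrams(q, 2))
    return term_score + bigram_weight * bigram_score


query = "good work is no"
phrase_doc = "good work is no accident"      # contains the full phrase
repeat_doc = "work no good is work no good"  # same words, repeated, no phrase

terms_only = (score(query, phrase_doc, bigram_weight=0),
              score(query, repeat_doc, bigram_weight=0))
with_bigrams = (score(query, phrase_doc), score(query, repeat_doc))
```

With `bigram_weight=0` the repeated-words entry wins; with the bigram bonus the entry containing the actual phrase comes out on top. The same approach extends to longer n-grams by summing over `ngrams(q, n)` for several values of `n`.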

I'm not particularly invested in this (i.e. I don't intend to use this library), but I was browsing your projects and took a look. Maybe these comments are interesting, maybe not!
