JSP3

Semester project 3 for Linguistics

Goals

analyze some text
learn techniques for analyzing text
learn some software engineering skills
learn git

2025 task: Word Sense Disambiguation

Take a text and sense-annotate its open-class words (nouns, verbs, adjectives, adverbs) with senses from a wordnet.
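As a rough illustration of the task, the sketch below picks the open-class words out of a POS-tagged sentence and lists candidate senses for each. The tag set (Universal POS tags), the `Token` class, the `TOY_INVENTORY` dictionary and its sense identifiers are all made up for this example; a real run would query an actual wordnet instead.

```python
from dataclasses import dataclass

# Universal POS tags treated as open-class here (an assumption of this sketch).
OPEN_CLASS = {"NOUN", "VERB", "ADJ", "ADV"}

# Hypothetical sense inventory: (lemma, pos) -> list of made-up sense ids.
TOY_INVENTORY = {
    ("mlok", "NOUN"): ["toy-mlok-n-1"],
    ("válka", "NOUN"): ["toy-valka-n-1", "toy-valka-n-2"],
}


@dataclass
class Token:
    form: str
    lemma: str
    pos: str


def candidate_senses(tokens):
    """Return (token, candidates) pairs for open-class tokens only."""
    result = []
    for tok in tokens:
        if tok.pos in OPEN_CLASS:
            # Unknown lemmas get an empty candidate list rather than an error.
            result.append((tok, TOY_INVENTORY.get((tok.lemma, tok.pos), [])))
    return result


sentence = [
    Token("Válka", "válka", "NOUN"),
    Token("s", "s", "ADP"),
    Token("mloky", "mlok", "NOUN"),
]
for tok, senses in candidate_senses(sentence):
    print(tok.form, senses)
```

The disambiguation step then has to choose one sense from each candidate list, which is what the tasks below vary and evaluate.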

In previous work, we have done this manually:

Teaching Through Tagging — Interactive Lexical Semantics (Bond et al., GWC 2021)

This year, do this automatically:
- evaluate how well this works EVAL
  - WHO:
  - all the WSD tasks need this
  - look at common errors
  - fix some (e.g. the Czech reflexives se/si)
- test different contexts WSD-C
  - read Effective context engineering for AI agents
- test different wordnet information WSD-W
  - watch Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
- test different LLM models WSD-M
  - look at settings to make it more efficient
  - e.g., prompt caching
- find translations and use as context ALIGN
  - find an EPUB, extract the text, split it into paragraphs, align them
    - use: https://github.com/averkij/lingtrain-aligner/releases/tag/0.1.0
  - then add to WSD-C
- textual criticism: compare versions TEXT
  - CNK version
  - https://cs.wikisource.org/wiki/V%C3%A1lka_s_mloky
- look at sentiment over the story SENTI
  - use a general-purpose tool
  - use the sense annotations
  - compare the two
  - visualize
- improve Czech wordnet EXPAND
  - add new senses to existing concepts using aligned data
  - verify with LLM?
  - create candidate definitions/examples?
- add new Czech concepts NEW
  - suggest new concepts and where to add them to the hierarchy (difficult)
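For the EVAL task, a minimal sketch of what "how well this works" and "common errors" could look like in code: accuracy against a gold annotation, plus a count of which gold senses get confused with which predicted ones. The data layout (a dictionary from token id to sense id) and all ids are assumptions made for this example, not the project's actual format.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Compare two {token_id: sense_id} mappings.

    Returns (accuracy, confusions), where confusions counts
    (gold_sense, predicted_sense) pairs for wrongly tagged tokens.
    Tokens missing from `predicted` count as errors.
    """
    if not gold:
        raise ValueError("gold annotation is empty")
    correct = 0
    confusions = Counter()
    for token_id, gold_sense in gold.items():
        pred_sense = predicted.get(token_id)
        if pred_sense == gold_sense:
            correct += 1
        else:
            confusions[(gold_sense, pred_sense)] += 1
    return correct / len(gold), confusions


# Made-up token and sense ids, for illustration only.
gold = {"t1": "s-a", "t2": "s-b", "t3": "s-c", "t4": "s-b"}
pred = {"t1": "s-a", "t2": "s-d", "t3": "s-c", "t4": "s-d"}
accuracy, confusions = evaluate(gold, pred)
print(f"accuracy = {accuracy:.2f}")
for (g, p), n in confusions.most_common(5):
    print(f"gold {g} tagged as {p}: {n}")
```

Since all the WSD tasks need the same evaluation, keeping it as one small shared function makes results from different contexts, wordnet information and models directly comparable.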

Goals for 2025

get something useful for each task
combine the best parts into a best-of-breed system
write, submit and publish a paper
release at least one automatically tagged, aligned corpus

Approach

work on tasks in pairs
use github to coordinate
- GitHub quickstart
- git an Introduction

Fortnightly meeting

5-10 minutes of progress reports
longer discussion of issues as necessary
small meetings (pair + me, WSD + me, ...) as necessary

Next tasks

ALL: make a GitHub account
- send me your account name
- I will add you to the GitHub repository
- then add your name to a task
WSD: try to run ollama on a prompt
ALIGN: try to run the aligner on chapter 1 of Válka s mloky (VsM)
TEXT: look for existing information on Karel Čapek and the versions of the text; maybe ask Bohemian studies
FCB:
- prepare databases and data
- meet to set up eval, ...
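For the first WSD step (running ollama on a prompt), the sketch below builds a simple disambiguation prompt and defines a helper that sends it to a locally running Ollama server over its REST API. The prompt wording, the function names, and the candidate sense ids and definitions are assumptions made for this example; only the prompt is built here, so nothing is sent until `ask_ollama` is called.

```python
import json
import urllib.request

# Default endpoint of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(sentence, target, candidates):
    """Build a WSD prompt listing numbered candidate senses for one word."""
    lines = [
        f"Sentence: {sentence}",
        f"Target word: {target}",
        "Candidate senses:",
    ]
    for i, (sense_id, definition) in enumerate(candidates, start=1):
        lines.append(f"{i}. {sense_id}: {definition}")
    lines.append("Answer with the number of the best sense only.")
    return "\n".join(lines)


def ask_ollama(prompt, model):
    """Send one non-streaming request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Made-up sense ids and definitions, for illustration only.
prompt = build_prompt(
    "Válka s mloky",
    "Válka",
    [("toy-1", "armed conflict"), ("toy-2", "a sustained campaign")],
)
print(prompt)
```

With a server running and a model pulled, calling `ask_ollama(prompt, model=...)` with the name of a locally installed model should return the model's text reply.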
