nicolauduran45/mutomo

mutomo 🥤🤖

Description

mutomo aims to build an open dataset for multidimensional key information extraction from scientific abstracts, and to fine-tune a small, local language model for structured, abstractive extraction tasks.

Objectives

  • Create a high-quality, annotated dataset covering research motivations, objectives, methods, impacts, and topics.
  • Design an annotation workflow and task definition using Argilla.
  • Fine-tune and evaluate a small, open-source LLM for efficient, on-premise extraction.
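As a rough illustration of what one annotated record covering the five dimensions above might look like, here is a minimal Python sketch. The class name `AbstractAnnotation`, its field names, and the `missing_dimensions` helper are hypothetical and are not taken from the project's actual schema; the example abstract is a made-up placeholder.

```python
import json
from dataclasses import asdict, dataclass, field

# The five dimensions named in the objectives; the identifiers are assumptions.
DIMENSIONS = ("motivation", "objective", "methods", "impact", "topics")


@dataclass
class AbstractAnnotation:
    """One annotated abstract (hypothetical schema, for illustration only)."""

    abstract: str
    motivation: str = ""
    objective: str = ""
    methods: str = ""
    impact: str = ""
    topics: list = field(default_factory=list)

    def missing_dimensions(self):
        """Return the dimensions that have not been filled in yet."""
        values = asdict(self)
        return [name for name in DIMENSIONS if not values[name]]


# Placeholder record: the text is invented purely to exercise the sketch.
record = AbstractAnnotation(
    abstract="Placeholder abstract text.",
    objective="Placeholder objective.",
    topics=["placeholder topic"],
)

print(record.missing_dimensions())
print(json.dumps(asdict(record), indent=2))
```

Keeping each dimension as a separate field, rather than a single free-text summary, is one way to make partially annotated records easy to spot and to serialize to JSON for later fine-tuning.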

🏁 Getting Started

Clone the Repository

git clone https://github.com/nicolauduran45/mutomo.git
cd mutomo

⚙️ Environment

To use conda for full reproducibility:

conda env create -f environment.yml --name mutomo
conda activate mutomo

To update/export your environment:

conda env export --name mutomo > environment.yml

📝 Project Tasks

  • Define task and annotation schema
  • Add example annotations in Argilla
  • Annotator onboarding & training
  • Build annotated dataset
  • Fine-tune and test a small LLM
  • Share dataset and baseline models

🛠 Contributing

Feel free to submit issues or pull requests!

⚖️ License

This project is licensed under the Apache License 2.0.

📚 References

If you find this work useful, please cite it. Citation details are coming soon!
