
👋 Hi, I'm Julien Knafou, Ph.D.

Data Scientist | NLP Researcher | Multilingual AI Enthusiast
Geneva, Switzerland


🚀 About Me

I'm a postdoctoral researcher specializing in large language models (LLMs), cross-lingual/domain transfer learning, and machine translation. My work focuses on making scientific and technical knowledge accessible across languages and domains, with a strong emphasis on open-source contributions and reproducible research.


🔬 Research Interests

  • Large Language Models (LLMs)
  • Cross-Lingual & Domain Transfer Learning
  • Machine Translation (MT)
  • Information Retrieval (IR) & Retrieval-Augmented Generation (RAG)
  • Data Augmentation for NLP

🛠️ Skills & Tools

  • Languages: Python, R, LaTeX
  • Frameworks/Libraries: Transformers, PyTorch, TensorFlow, HuggingFace, Ray Tune, FAIRSEQ, FastAPI
  • Infrastructure: Google Cloud, OpenStack, SLURM (HPC)
  • Deployment: Docker, FastAPI
  • Languages Spoken: English, French, Spanish, Portuguese, German (B2)

📚 Selected Projects

  • TransBERT: Leading the development of an LLM that leverages synthetically translated data. Achieves state-of-the-art performance, with open-source code and a forthcoming EMNLP 2025 paper.
  • Large Scale Corpus Translation: Designed a scalable package for translating massive corpora (e.g., 22M PubMed abstracts EN→FR), to be released with TransBERT.
  • RAG TREC: Developing a Retrieval-Augmented Generation pipeline for the upcoming TREC conference.

📝 Publications


📫 Let's Connect


"Bridging language barriers through AI and open science."

Pinned

  1. TransCorpus

    TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.

    Python

  2. bibtex-normalizer

    Normalize a .bib file with multiple citations by casing the first letter of each word of the title if a bibtex was not correctly generated. The current state makes it easy to modify other fields th…

    TeX