Data Scientist | NLP Researcher | Multilingual AI Enthusiast
Geneva, Switzerland
I'm a postdoctoral researcher specializing in large language models (LLMs), cross-lingual/domain transfer learning, and machine translation. My work focuses on making scientific and technical knowledge accessible across languages and domains, with a strong emphasis on open-source contributions and reproducible research.
- Large Language Models (LLMs)
- Cross-Lingual & Domain Transfer Learning
- Machine Translation (MT)
- Information Retrieval (IR) & Retrieval-Augmented Generation (RAG)
- Data Augmentation for NLP
- Languages: Python, R, LaTeX
- Frameworks/Libraries: Transformers, PyTorch, TensorFlow, HuggingFace, Ray Tune, FAIRSEQ, FastAPI
- Infrastructure: Google Cloud, OpenStack, SLURM (HPC)
- Deployment: Docker, FastAPI
- Languages Spoken: English, French, Spanish, Portuguese, German (B2)
- TransBERT: Leading the development of an LLM trained on synthetically translated data; achieves state-of-the-art performance, with open-source code and a forthcoming EMNLP 2025 paper.
- Large-Scale Corpus Translation: Designed a scalable package for translating massive corpora (e.g., 22M PubMed abstracts EN→FR), to be released alongside TransBERT; see the translation sketch after this list.
- RAG TREC: Developing a Retrieval-Augmented Generation pipeline for the upcoming TREC conference; a toy retrieve-then-read sketch also follows below.
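
To give a flavor of the corpus-translation work, here is a minimal sketch of batched EN→FR translation with Hugging Face Transformers. It is an illustration only, not the TransBERT package: the Helsinki-NLP/opus-mt-en-fr checkpoint and the toy abstracts are stand-ins, and at 22M-abstract scale the loop would be sharded across workers (e.g., SLURM array jobs) rather than run serially.

```python
# Minimal EN->FR translation sketch (illustrative; not the TransBERT release).
# The Helsinki-NLP/opus-mt-en-fr checkpoint is an assumed stand-in model.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# Toy inputs standing in for PubMed abstracts.
abstracts = [
    "Deep learning improves risk prediction in clinical trials.",
    "Synthetic translation expands training data for domain-specific models.",
]

# The pipeline accepts a batch and returns one dict per input.
for result in translator(abstracts, max_length=512):
    print(result["translation_text"])
```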
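
The retrieve-then-read idea behind the RAG project can be sketched with sentence-transformers. This is a demo under assumptions, not the TREC system: the all-MiniLM-L6-v2 retriever and the two-passage corpus are placeholders, and in a full pipeline the top-ranked passage would be prepended to the LLM prompt.

```python
# Toy retrieve-then-read sketch (illustrative; not the TREC pipeline).
from sentence_transformers import SentenceTransformer, util

# Assumed stand-in retriever checkpoint.
retriever = SentenceTransformer("all-MiniLM-L6-v2")

# Fabricated two-passage corpus for the demo.
corpus = [
    "RECIST criteria standardize tumor response assessment in radiology.",
    "PubMed abstracts cover a broad range of biomedical research.",
]
corpus_emb = retriever.encode(corpus, convert_to_tensor=True)

query = "How is tumor response measured?"
query_emb = retriever.encode(query, convert_to_tensor=True)

# Rank passages by cosine similarity and keep the best match.
hits = util.semantic_search(query_emb, corpus_emb, top_k=1)[0]
print(corpus[hits[0]["corpus_id"]])
```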
- TransBERT: A Framework for Synthetic Translation in Domain-Specific Language Modeling (EMNLP 2025, under submission)
- Deep learning-based risk prediction for clinical trials (Patterns, 2023)
- Ensemble of deep learning language models for COVID-19 literature (Systematic Reviews, 2023)
- Multilingual RECIST classification in radiology (Frontiers in Digital Health, 2023)
- More on Google Scholar | DBLP
"Bridging language barriers through AI and open science."
