- Scaling Laws: Empirical & theoretical analysis in pipeline and distributed settings; connections to loss landscape geometry (e.g., Hessian spectrum of Transformers)
- Pipeline Parallelism: Scheduling, memory optimization, and convergence guarantees under asynchrony
- Theoretical Deep Learning: Loss landscapes, Bayesian/probabilistic modeling, and applications to Reinforcement Learning
- Optimization Methods: Advancing zeroth-order (ZO) and first-order algorithms for LLMs, decentralized systems, and non-convex problems
🎓 B.S. in Applied Mathematics & Informatics @ MIPT (2022–2026)
🎓 Yandex School of Data Analysis (YSDA, 2025–2026)
🎯 Data Science Track (MIPT & Yandex SDA, 2024–2026)
🧑‍🏫 Teaching Assistant: Algorithms, Data Structures, Intro to AI