- Scaling Laws: Empirical & theoretical analysis in pipeline and distributed settings; connections to loss landscape geometry (e.g., Hessian spectrum of Transformers)
- Pipeline Parallelism: Scheduling, memory optimization, and convergence guarantees under asynchrony
- Theoretical Deep Learning: Loss landscapes, Bayesian/probabilistic modeling, and applications to Reinforcement Learning
- Optimization Methods: Advancing zeroth-order (ZO) and first-order algorithms for LLMs, decentralized systems, and non-convex problems
🎓 B.S. in Applied Mathematics & Informatics @ MIPT (2022–2026)
🎓 Yandex School of Data Analysis (YSDA, 2025–2026)
🎯 Data Science Track (MIPT & Yandex SDA, 2024–2026)
🧑‍🏫 Teaching Assistant: Algorithms, Data Structures, Intro to AI