ML/NLP/LLM Engineer with expertise in AI Systems Architecture, Machine Learning, and Deep Learning. Specialized in building scalable AI systems, developing classical ML/DL models, implementing traditional NLP solutions, integrating large language models into production environments, and managing the full development lifecycle, from architecture to deployment.
Core competency lies in combining modern approaches (LLMs, multi-agent systems, RAG) with proven classical ML and DL methodologies to ensure system stability, predictability, and high performance.
- Development of RAG and GraphRAG systems
- Model fine-tuning (LoRA, QLoRA, PEFT) for domain-specific applications
- Inference optimization (vLLM, TensorRT, llama.cpp, Ollama)
- Advanced prompt engineering (Zero-shot, Few-shot, CoT, ReAct, Planning)
- Multi-agent system architecture (LangGraph, AutoGen, LangChain, planning agents)
- Agent integration with APIs and external services
- Dynamic tool selection systems
- Regression models (Linear, Ridge, Lasso) and classification algorithms (Logistic Regression, SVM, Decision Trees, Random Forest)
- Ensemble methods (Gradient Boosting, XGBoost, LightGBM, CatBoost)
- Clustering techniques (K-Means, DBSCAN, Hierarchical Clustering)
- Feature engineering, hyperparameter tuning, model validation
- Neural network development and training with PyTorch (MLP, CNN, RNN, LSTM, GRU)
- Transfer learning and fine-tuning of pre-trained models (ResNet, EfficientNet, BERT)
- Architecture optimization, regularization, scheduler implementation
- Large-scale dataset handling and GPU-accelerated training
- Text preprocessing: tokenization, stemming, lemmatization, stop-word removal
- Text vectorization (Bag-of-Words, TF-IDF, Word2Vec, FastText, GloVe)
- Text classification, sentiment analysis, topic modeling (LDA)
- Chatbot and dialogue system development using traditional NLP methods
- Integration of NLTK, spaCy, gensim into ML projects
- REST API development with FastAPI
- Data storage and caching with PostgreSQL and Redis
- API optimization for high-load environments
- Containerization (Docker, Docker Compose)
- CI/CD pipelines (GitHub Actions, GitLab CI)
- Model monitoring, logging, and management (MLflow, LangSmith)
- Implementation and optimization of vector search (ChromaDB, Pinecone, Weaviate, FAISS)
- Hybrid search system development
- Implemented Enterprise RAG system with corporate process integration and hybrid search support
- Developed multi-agent platform using LangGraph for educational process automation
- Built GraphRAG Knowledge System utilizing Neo4j and LLM for semantic search
- Developed and deployed classical ML models for price prediction, data classification, and risk assessment
- Trained and optimized CNN and LSTM architectures for image analysis and sequence processing tasks
- Mentored junior engineers, established development standards, conducted code reviews
- Successfully transitioned multiple AI products from prototype to stable production deployment
Tanym (Astana) | NLP/LLM Engineer
December 2024 — Present
- Lead developer of NLP/LLM modules in AI assistant platform
- Multi-agent system development and LLM integration into educational workflows
- RAG pipeline implementation, API development, and service containerization
- Inference optimization and generation quality enhancement
Programming Languages: Python (async-first, typing, pydantic v2, dependency injection, clean architecture), SQL (query optimization, indexes, transactions), C++ (performance-critical inference, bindings)
ML / DL Frameworks: PyTorch (production training & fine-tuning), PyTorch Lightning, Hugging Face Transformers, Accelerate, PEFT (LoRA / QLoRA), scikit-learn (baselines & evaluation), XGBoost, LightGBM, CatBoost, NumPy, pandas / Polars
LLM Frameworks & Orchestration: LangChain (production pipelines, integrations), LangGraph (stateful agents), AutoGen (multi-agent research & prototyping), OpenAI API, Hugging Face Inference, vLLM (serving integration)
NLP / Text Processing: spaCy (production NLP), NLTK (legacy & preprocessing), Embeddings (dense & domain-specific), Tokenization & chunking strategies, TF-IDF (baselines), Word2Vec, FastText, Text normalization & deduplication
Vector Search & Retrieval: ChromaDB (local & prototyping), FAISS (low-level vector search), Pinecone (managed vector DB), Weaviate (schema-aware vector search), pgvector, Hybrid search (BM25 + dense), Cross-encoder reranking
Databases & Caching: PostgreSQL (primary OLTP store), Redis (cache, rate limits, session memory)
MLOps / AI Ops: Docker (multi-stage builds), Docker Compose, GitHub Actions (CI/CD), MLflow (experiments & model registry), ClearML (pipeline orchestration), LangSmith (LLM tracing & eval), Environment-based config, Rollback-ready deployments
Inference & Performance Optimization: vLLM (high-throughput LLM serving), TensorRT (GPU optimization), Quantization (AWQ / GPTQ), Dynamic batching, KV-cache reuse, Streaming inference, llama.cpp, Ollama (local & edge inference)