🔭 I’m currently working on
Low-latency LLM infrastructure at Amazon Web Services, building systems with SigV4 + LDAP authentication for production clients. I also optimize distributed observability pipelines across 150+ AWS services using OAuth-backed sync platforms, ECS Fargate, and Docker.
👯 I’m looking to collaborate on
Projects involving HPC, scalable machine learning systems, or efficient inference backends, especially ones that push token throughput, optimize distributed performance, or innovate in edge deployment.
🤝 I’m looking for help with
Advanced GPU kernel-level tuning and low-latency optimization techniques for multimodal LLMs. I'm also interested in learning more about quant trading infrastructure and novel compression algorithms for AI/ML inference.
🌱 I’m currently learning
Distributed training frameworks, especially gradient checkpointing and tensor parallelism for multimodal models. I'm also brushing up on real-time streaming data systems and reinforcement learning for ops optimization.
💬 Ask me about
- Achieving 120 tokens/sec inference on Llama-3 using vLLM
- Cutting runtime by 45 seconds at NIST with multiprocessing and Numba
- Engineering sub-5 ms auth for AWS clients with Coral + Guice
- Developing GraphQL backends and Kafka pipelines at Clinia
⚡ Fun fact
I once boosted image processing throughput by 720% on my own using a 72-core EC2 HPC setup, and I still have the benchmark logs to prove it.
- University of California, Berkeley
- Seattle, WA (UTC -07:00)
- in/rishabhsinha17
Pinned
- low-latency-llm-inference-server (Public)
  Production-grade stack delivering 120 tokens/s from Llama-3-8B with 40% lower p99 latency under 32-request concurrency.
  C#
- rl-hyperparam-tuner (Public)
  End-to-end prototype that trains a ResNet-18 on CIFAR-10 while a PPO agent dynamically adjusts learning rate. Metrics are logged to PostgreSQL via PySpark and visualized with a Grafana dashboard. A…
  Python 1
- slurm-vision-rag-platform (Public)
  End-to-end reference implementation for a vision RAG pipeline fine-tuned on LLaVA-1.5-7B and served via FastAPI.
  Python
- Real-Time-Twitter-Stock-Sentiment-Transformer-Model (Public)
  The Real-Time Twitter Stock Sentiment Analysis used Python, Transformers, and Twitter API to analyze stock sentiments from real-time tweets. It involved data acquisition, preprocessing, Transformer…
  Python 1
- Real-time-Sign-Language-Recognition-Using-OpenCV-and-Deep-Learning (Public)
  Employed OpenCV for video processing and hand-detection in real-time. Utilized Keras with TensorFlow backend to train a deep learning model for sign language classification on a dataset of 2900 300…
  Python 1
- Lorenz-System-Attractor-Singular-Value-Decomposition-Complex-Systems-Research-Using-Python (Public)
  This Python project computes the singular value decomposition of the trajectory matrix of a Lorenz system attractor. The Python program creates a three-dimensional plot of the trajectory matrix of …
  Python
