Saif
I work on ML systems, focusing mostly on training infrastructure, kernel optimization, and performance engineering.
My work centers on identifying where LLM pre-training throughput disappears: memory movement, graph breaks, kernel launch overhead, and communication bottlenecks. I believe that these infrastructural advances are what allow a niche research experiment to turn into something capable of transforming entire fields.
Beyond the engineering, I'm fascinated by the pre-training phase and the resulting base models. This interest leads me to explore a bit of mechanistic interpretability and reinforcement learning, looking at how model circuits change and evolve to generate highly structured, predictable outputs from the messy raw outputs of a base model.
Research Interests
Training Performance and Systems
Interested in the systems layer of modern machine learning: model optimization, kernel efficiency, distributed training behavior, instrumentation and scaling dynamics.
Small Models and Efficient Experimentation
Exploring how far carefully designed systems and training methods can push smaller models under constrained compute budgets.
Projects
Training Systems Research
2025 – Present
Small-scale training systems built for throughput and kernel-level understanding. Triton kernels, RL post-training, consumer hardware.
Performance Engineering Notes
Ongoing
Technical notes on GPU architecture, memory systems, profiling, numerical stability and optimization.
Writings
Excerpt on RoPE I wrote while teaching undergrads about LLMs.
A first post to test MDX rendering with LaTeX support.