M : τ

Saif

I work on ML systems, focusing mostly on training infrastructure, kernel optimization, and performance engineering.

My work centers on identifying where LLM pre-training throughput disappears: memory movement, graph breaks, kernel launch overhead, and communication bottlenecks. I believe that these infrastructural advances are what allow a niche research experiment to turn into something capable of transforming entire fields.

Beyond the engineering, I'm fascinated by the pre-training phase and the resulting base models. This interest leads me to explore a bit of mechanistic interpretability and reinforcement learning, looking at how model circuits change and evolve to generate highly structured, predictable outputs from the messy raw outputs of a base model.


Research Interests

Training Performance and Systems

Interested in the systems layer of modern machine learning: model optimization, kernel efficiency, distributed training behavior, instrumentation and scaling dynamics.

Small Models and Efficient Experimentation

Exploring how far carefully designed systems and training methods can push smaller models under constrained compute budgets.


Projects

Training Systems Research

2025 – Present

Small-scale training systems built for throughput and kernel-level understanding. Triton kernels, RL post-training, consumer hardware.

Performance Engineering Notes

Ongoing

Technical notes on GPU architecture, memory systems, profiling, numerical stability and optimization.


Writings

Excerpt on RoPE I wrote while teaching undergrads about LLMs.

Jan 1, 1999 Hello, World

A first post to test MDX rendering with LaTeX support.

view all writing →

contact