
Lead Principal Engineer
Steven Yoo
I build AI systems that make hard tradeoffs at scale — speed, cost, quality — and ship them in production. From ads ranking at Twitter and Meta to enterprise agents at Atlassian.
About
I build AI systems that make hard tradeoffs at scale: speed vs. quality, cost vs. accuracy, relevance vs. diversity. I've spent my career turning those tradeoffs into shipped products — first in ads ranking, now in enterprise AI agents.
Most ML engineers can train models. Fewer can ship production systems that integrate tools, survive messy enterprise data, hit SLAs, and justify enterprise pricing. That gap is where I operate.
I've spent 13+ years building systems that connect ML directly to revenue: at Microsoft Bing (knowledge graph, semantic search), Twitter (ads ranking, the highest revenue impact on the platform), Meta (ads personalization, vision + recommendation), and now Atlassian, where I lead Rovo Chat & Agents, one of the few enterprise AI agents that ship in production at scale.
I'm living my dream. I have wanted to be a software engineer since high school. The convergence of ML and software is the most exciting thing happening in our field, and I get to build at the frontier of it every day.
Education
Stanford MS, AI & Databases
Cambridge Cultural Scholar, 2010–2012
Location
Bellevue, WA
Experience
Atlassian
- Joined as Senior Principal Engineer; promoted to Lead Principal Engineer in 2026
- Shipped Rovo Chat from 0→1, scaling from 1K to 40K MAU; one of the few enterprise AI agents that ship in production
- Designed and shipped the multi-agent framework, MCP integration, and web search (Rovo Chat 2.0)
- Tech lead for 30+ engineers; partnered with directors and EMs on roadmap and careers
- Founded AI School, a multi-year org-wide program upskilling all engineers as Gen AI reshaped the landscape
- Defined the ML engineer role at Atlassian: leveling framework, calibration guides, hiring committee
Meta
- IC (Dec 2021 – Aug 2023): Applied Research team; built retrieval and ranking models for vision and recommendation (XRayVideo, ViSE, JSTM, RecoRay, DACU)
- EM (Jun 2020 – Nov 2021): Led the Ads Personalization ML ranking team; grew the team and drove revenue impact through ranking improvements
Twitter
- EM (2018–2020): Grew the ads ranking core modeling team from 6 to 12 people; 8 promotions, including 2 to staff; drove Twitter's highest revenue impact through ranking
- Shipped online-learned wide-and-deep network models in TensorFlow; co-authored RecSys 2019 paper on delayed feedback in online-trained NN models
- Introduced the SplitNet architecture, metric learning for early-stage rankers, and an auto-tune ML model that balances relevance and system resource trade-offs
- IC (2016–2018): Shipped Twitter's first deep learning models for ads ranking using Lua-Torch; rewrote the candidate generation stage into a unified ML problem
Microsoft Bing
- Tech lead for semantic query parser (2015–2016): platformized the answer triggering system for high-visibility answers; query understanding for structured and unstructured QnA
- Built semi-supervised fact extractors, entity type classifiers, and attribute rankers for Bing's knowledge graph
- Built a dominant-image classifier for generic web documents and entity summary labels for 2M+ top entities
Writing
Engineering articles and research papers.
Enhancing Rovo Chat with Hybrid LLM Approach
How Atlassian mixes smaller and larger models to optimize quality, latency, and cost in a production AI assistant.
How Rovo Chat Embraces Multi-Agent Orchestration
How Rovo evolved from a single-agent system to a hierarchical multi-agent framework that handles complex enterprise queries.
How Rovo Deep Research Works
Inside the retrieval-augmented pipeline that decomposes complex queries into multiple research paths and synthesizes cited reports.
Deep Research v2: Inside Atlassian's Next-Gen AI Research Engine
The upgraded Deep Research engine with adaptive orchestration and iterative workflows combining enterprise knowledge with web data.
SplitNet Architecture for Ad Candidate Ranking
A two-tower architecture that splits ad candidate ranking into lightweight early-stage and expressive late-stage models, reducing compute while improving relevance.
A New Approach: Metric Learning for SplitNet
Applying metric learning to improve early-stage ad rankers — learning a shared embedding space to better capture ad-user affinity at low latency.
Using Machine Learning to Predict the Value of Ad Requests
An auto-tune ML model that automatically balances relevance and system resource trade-offs in Twitter's ads server.
Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR Prediction
RecSys 2019 paper tackling the challenge of training CTR models continuously when positive labels arrive with unpredictable delays in advertising systems.