Denny Stohr
Senior Machine Learning Engineer at Netflix. PhD, TU Darmstadt.
I build production-grade LLM agents and ML systems for high-stakes decision-making.
Current work at Netflix
LLM Diagnostic Agent. An agentic system that turns natural-language questions into production-grade analytics. Intent understanding, schema discovery, validated SQL generation, narrative reports with evidence and caveats. Powers root-cause analysis for metric regressions and canary failures. Reduced investigation time from days to minutes.
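The validated-SQL-generation step can be sketched as a generate-validate-retry loop: compile each candidate query against the live schema and feed errors back to the model. This is a minimal illustration with a stubbed LLM and hypothetical table/column names, not the production system:

```python
import sqlite3

def validate_sql(conn, sql):
    """Compile a candidate query against the schema without executing it.
    SQLite's EXPLAIN surfaces unknown tables/columns and syntax errors."""
    try:
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as e:
        return str(e)

def generate_validated_sql(conn, question, llm, max_retries=3):
    """Ask the model for SQL, feeding validator errors back until the
    query compiles or retries are exhausted."""
    feedback = ""
    for _ in range(max_retries):
        sql = llm(question, feedback)  # hypothetical LLM call
        error = validate_sql(conn, sql)
        if error is None:
            return sql
        feedback = f"Previous attempt failed: {error}"
    raise RuntimeError("no valid SQL after retries")

# Stub LLM: the first answer misspells a column, the retry corrects it.
def fake_llm(question, feedback):
    if feedback:
        return "SELECT device, AVG(rebuffer_ratio) FROM qoe GROUP BY device"
    return "SELECT device, AVG(rebufer) FROM qoe GROUP BY device"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qoe (device TEXT, rebuffer_ratio REAL)")
sql = generate_validated_sql(conn, "average rebuffering by device", fake_llm)
```

The point of the loop is that validation failures become structured feedback rather than silent bad answers, which is what lets the system fail gracefully.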
Support Signal Extraction. Schema-guided LLM pipelines that pull structured insights from messy customer transcripts—linking unstructured feedback to device context and QoE metrics to surface recurring issues invisible in telemetry.
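Schema-guided extraction amounts to forcing the model's output through a typed contract before it touches downstream joins. A minimal sketch, assuming a hypothetical issue schema and a stubbed LLM returning JSON:

```python
import json

# Hypothetical target schema for one transcript: field name -> expected type.
ISSUE_SCHEMA = {"device_model": str, "symptom": str,
                "qoe_metric": str, "severity": int}

def extract_issue(transcript, llm):
    """Parse the model's JSON answer and enforce the schema, so joins
    against device context and QoE tables never see free-form text."""
    record = json.loads(llm(transcript))  # hypothetical LLM call
    for field, expected in ISSUE_SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"schema violation on {field!r}")
    return record

# Stub LLM standing in for the real extraction model.
def fake_llm(transcript):
    return json.dumps({"device_model": "SmartTV-X", "symptom": "audio drop",
                       "qoe_metric": "rebuffer_ratio", "severity": 3})

issue = extract_issue("customer says audio cuts out on the TV app", fake_llm)
```

Records that violate the schema are rejected rather than passed through, which keeps the linked telemetry queries trustworthy.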
Causal Regression Detection. Automated cohort-level regression detection and attribution using AIPW (augmented inverse probability weighting), identifying which device/network segments drive QoE changes.
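The AIPW estimator underlying this kind of attribution fits in a few lines: an outcome-model difference plus inverse-propensity-weighted correction terms, doubly robust to misspecifying either nuisance model. A minimal sketch on synthetic data (true nuisances plugged in for illustration; in practice they are fitted per cohort):

```python
import numpy as np

def aipw_ate(y, t, e_hat, m1_hat, m0_hat):
    """AIPW estimate of the average treatment effect.
    y: outcomes, t: binary treatment, e_hat: propensity scores,
    m1_hat/m0_hat: predicted outcomes under treatment/control."""
    psi = (m1_hat - m0_hat
           + t * (y - m1_hat) / e_hat
           - (1 - t) * (y - m0_hat) / (1 - e_hat))
    return psi.mean()

# Synthetic cohort where treatment shifts the metric by +2.0 on average.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))            # true propensity of treatment
t = rng.binomial(1, e)
y = 1.5 * x + 2.0 * t + rng.normal(size=n)

# With the true nuisance models, the estimator recovers the ~2.0 effect.
ate = aipw_ate(y, t, e_hat=e, m1_hat=1.5 * x + 2.0, m0_hat=1.5 * x)
```

Running the estimator per device/network segment and ranking segments by effect size is one natural way to surface which cohorts drive an aggregate QoE change.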
Previous work (2018–2024). Six years shipping streaming systems at scale. Adaptive bitrate algorithms, mobile throughput prediction, live low-latency resilience. Tech-led HDR playback improvements delivering +0.42% view-hours overall (+9.87% on specific devices). Designed 30+ large-scale A/B tests across tens of millions of users. Built simulation frameworks, mentored engineers, partnered across client/CDN/encoding.
Interested in. LLM reliability at scale, validation loops that fail gracefully, latency-sensitive ML, tools that ship and stick.
PhD, TU Darmstadt (2014–2018)
Research on large-scale network emulation and adaptive streaming optimization at the Multimedia Communications Lab. Built Python-based toolchains for reproducible analysis of DASH player behavior across network conditions. Part of CRC 1053 MAKI (Multi-Mechanisms Adaptation for the Future Internet). Research visit at Simon Fraser University.
Best Paper Award: NetSys 2015
Selected publications: ACM Multimedia 2017, ACM SIGMM Workshop 2016
Dissertation