Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Arpan Gujarati*, Reza Karimi*, Safya Alzayat, Wei Hao,
Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace

Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable; on the contrary, we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100 ms latency targets for 99.997% of requests. We further demonstrate that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
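The abstract's key observation is that DNN inference has deterministic performance: a fixed model on a fixed input shape does roughly the same work every time, so repeated runs cluster tightly around a mean. The sketch below illustrates how one might check this empirically; it is not Clockwork's profiler. It uses a dense matrix-vector product as a stand-in for a DNN layer (all names and sizes here are illustrative assumptions) and reports the coefficient of variation of the measured latencies.

```python
# Minimal sketch: measure the run-to-run variability of a fixed-size
# compute kernel standing in for a DNN inference. A low coefficient of
# variation supports the claim that execution time is predictable.
import time
import random
import statistics

def make_layer(n):
    """Random n-by-n weight matrix, fixed after creation (like trained weights)."""
    return [[random.random() for _ in range(n)] for _ in range(n)]

def infer(weights, x):
    """One 'inference': a dense matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def profile(n=64, trials=200):
    """Time repeated inferences and summarize latency predictability."""
    weights = make_layer(n)
    x = [random.random() for _ in range(n)]
    samples = []
    for _ in range(trials):
        t0 = time.perf_counter()
        infer(weights, x)
        samples.append(time.perf_counter() - t0)
    mean = statistics.mean(samples)
    cv = statistics.stdev(samples) / mean  # std deviation relative to mean
    return mean, cv

mean, cv = profile()
print(f"mean latency: {mean * 1e6:.1f} us, coefficient of variation: {cv:.3f}")
```

On a quiet machine the coefficient of variation is typically small, which is the property Clockwork builds on; interference from co-located work is exactly what inflates it, motivating the paper's "consolidate choice" design.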

* Clockwork is open-source and the source code is accessible at:


@inproceedings{gujarati2020serving,
  author    = {Gujarati, Arpan and Karimi, Reza and Alzayat, Safya and Hao, Wei and Kaufmann, Antoine and Vigfusson, Ymir and Mace, Jonathan},
  title     = {{Serving DNNs like Clockwork: Performance Predictability from the Bottom Up}},
  booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
  year      = {2020},
  isbn      = {978-1-939133-19-9},
  pages     = {443--462},
  url       = {},
  publisher = {{USENIX} Association},
  month     = nov,
}


  • R. Karimi, A. Simpson, A. Kaufmann, Y. Vigfusson, J. Mace, "Letting the Cloud Serve DNNs with Ruthless Efficiency," in ACM Symposium on Operating Systems Principles (SOSP), Huntsville, Ontario, Canada, 2019. [Poster Presentation at AI Systems Workshop (SOSP'19)]


Arpan Gujarati · Safya Alzayat · Wei Hao

Max Planck Institute for Software Systems · National Science Foundation · Emory University