Skeletor: workload characterization framework
Pranav Bhandari, Avani Wildani
Outside of specialized storage systems where the provider controls data composition and usage, the standard strategy for designing a storage system is “maximum provisioning,” that is, provisioning for the worst access pattern expected. This strategy is wasteful: datacenter power waste attributed to peak provisioning and poor workload prediction exceeded $3.8 billion in 2012 in the United States alone. Modern workloads are multi-dimensional and heavily customized; hence, existing classification labels like “archival”, “user” or “HPC” are insufficient and at times even misleading. Without a translatable form of classification, we can neither draw conclusions across studies of computer system design nor communicate workloads effectively.
The goal of this project is to replace today’s qualitative workload labels with parameterized model workloads that serve as exemplars for the space of workload characteristics. These models will be derived using machine learning techniques to ascertain which metrics are most relevant across and within different usage scenarios, such as customer deployments or high-performance scientific experiments, and will be validated by domain experts, with the aim of improving research sharing and reproducibility in systems.
Developing an online tool (https://www.metricext.com) to serve as a repository of traces and generate relevant metrics from them.
• System deployed on AWS EC2 using AWS Lambda, DynamoDB, and Node.js.
• Implemented support for compressed IBM GPFS traces.
Developing a CNN to characterize and classify block I/O workloads using access patterns.
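To make "generating metrics from a trace" concrete, the following is a minimal sketch in Python of how a few common workload metrics could be computed from a block I/O trace. It is not the tool's actual implementation: the `Request` record, its field names, and the `summarize` function are hypothetical, and the chosen metrics (read fraction, mean request size, sequentiality, request rate) are illustrative assumptions rather than the metrics the project selects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One block I/O request (hypothetical record layout, assumed for illustration)."""
    timestamp: float  # seconds since the start of the trace
    op: str           # "R" for read, "W" for write
    offset: int       # starting byte offset on the device
    size: int         # request length in bytes


def summarize(trace):
    """Compute a few simple metrics from a time-ordered list of Requests."""
    if not trace:
        raise ValueError("trace is empty")

    n = len(trace)
    reads = sum(1 for r in trace if r.op == "R")
    total_bytes = sum(r.size for r in trace)

    # A request counts as sequential if it starts exactly where the
    # previous request ended.
    sequential = sum(
        1
        for prev, cur in zip(trace, trace[1:])
        if cur.offset == prev.offset + prev.size
    )

    duration = trace[-1].timestamp - trace[0].timestamp

    return {
        "read_fraction": reads / n,
        "mean_request_bytes": total_bytes / n,
        "sequential_fraction": sequential / (n - 1) if n > 1 else 0.0,
        # Request rate is undefined for a zero-length trace window.
        "requests_per_second": n / duration if duration > 0 else None,
    }


# Tiny synthetic trace, invented purely to exercise the function.
example_trace = [
    Request(0.0, "R", 0, 4096),
    Request(0.5, "R", 4096, 4096),
    Request(1.0, "W", 65536, 8192),
    Request(2.0, "R", 8192, 4096),
]
metrics = summarize(example_trace)
```

A vector of such metrics is one possible parameterization of a workload; which metrics actually matter across usage scenarios is the question the project sets out to answer.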
- NRDC. America’s data centers are wasting huge amounts of energy. 2014. http://www.nrdc.org/energy/files/data-center-efficiency-assessment-IB.pdf.
- Eric R Masanet, Richard E Brown, Arman Shehabi, Jonathan G Koomey, and Bruce Nordman. Estimating the energy use and efficiency potential of US data centers. Proceedings of the IEEE, 99(8):1440–1453, 2011.