A summary of Hidden Technical Debt in Machine Learning Systems by D. Sculley et al.
Nicholas M. Synovic
D. Sculley et al., Proceedings of NeurIPS, 2015
For the summary of the paper, go to the Summary section of this article.
First Pass
Read the title, abstract, introduction, section and sub-section headings, and conclusion
Problem
What is the problem addressed in the paper?
Machine learning systems incur technical debt like traditional software systems. However, they also incur additional debt that traditional software systems do not.
Motivation
Why should we care about this paper?
This paper establishes definitions for the kinds of technical debt that can be incurred while developing machine learning systems.
Category
What type of paper is this work?
This is a language definition paper. In other words, it proposes language for describing the technical debt of machine learning systems.
Context
What other types of papers is the work related to?
This paper is related to those that discuss and define technical debt both broadly and in domain-specific applications.
Contributions
What are the author’s main contributions?
Their main contributions are definitions and language for describing the different kinds of technical debt that can be incurred while developing machine learning systems.
These debts include:
- Boundary Erosion
- Entanglement
- Hidden Feedback Loops
- Undeclared Consumers
- Data Dependencies
- Configuration Issues
- Changes in the External World
- System-Level Anti-Patterns
- Traditional Software System Technical Debt
Second Pass
A proper read-through of the paper is required to answer these questions
Background Work
What has been done prior to this paper?
Work has been done to identify traditional software technical debt and software system anti-patterns.
Figures, Diagrams, Illustrations, and Graphs
Are the axes properly labeled? Are results shown with error bars, so that conclusions are statistically significant?
There is one figure in this paper. While the visualization works, the use of a grayscale image makes the text difficult to read.
Clarity
Is the paper well written?
The paper is well written. Its structure is quite simple, which I appreciate, as it makes the paper easier to digest.
Relevant Work
Mark relevant work for review
The following relevant work can be found in the Citations section of this article.
- J. D. Morgenthaler, M. Gridnev, R. Sauciuc, and S. Bhansali. Searching for build debt: Experiences managing technical debt at google. In Proceedings of the Third International Workshop on Managing Technical Debt, 2012.
- H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, S. Chikkerur, D. Liu, M. Wattenberg, A. M. Hrafnkelsson, T. Boulos, and J. Kubica. Ad click prediction: a view from the trenches. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, 2013.
- J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems, pages 817–824, 2008.
- T. M. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI ’14, Broomfield, CO, USA, October 6-8, 2014., pages 571–582, 2014.
- B. Dalessandro, D. Chen, T. Raeder, C. Perlich, M. Han Williams, and F. Provost. Scalable hands free transfer learning for online advertising. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1573–1582. ACM, 2014.
- M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B. Su. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI ’14, Broomfield, CO, USA, October 6-8, 2014., pages 583–598, 2014.
- D. Sculley, M. E. Otey, M. Pohl, B. Spitznagel, J. Hainsworth, and Y. Zhou. Detecting adversarial advertisements in the wild. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, August 21-24, 2011, 2011.
Methodology
What methodology did the authors use to validate their contributions?
The authors look to traditional software and machine learning system examples for anti-patterns and sources of technical debt.
Author Assumptions
What assumptions does the author(s) make? Are they justified assumptions?
As this paper proposes potential areas of technical debt for machine learning systems, nearly all of it assumes that something will go wrong when working with a machine learning system.
Correctness
Do the assumptions seem valid?
Given the type of paper that this is, yes, the assumptions seem valid. Furthermore, the justifications for each type of technical debt are well explained and made clear in the literature.
Future Directions
My own proposed future directions for the work
As I’m now starting to study ML dependencies, this work is a great springboard for drilling deeper into what both ML and software engineers consider technical debt and dependencies.
Open Questions
What open questions do I have about the work?
Are there identifiable cases of ML models that have one or more of the technical debts described here?
Author Feedback
What feedback would I give to the authors?
Overall, this was a solid paper. I’d appreciate more real-world examples of models that have dealt with technical debt. Additionally, a survey of engineers about whether they think each type of technical debt is worthy of consideration would have been appreciated.
Summary
A summary of the paper
The paper Hidden Technical Debt in Machine Learning Systems by D. Sculley et al. [1] presents potential sources of technical debt that ML engineers need to consider when developing ML systems. These considerations stem both from the authors’ experience and from traditional software engineering technical debt. Furthermore, the authors pose several questions that engineers should ask when taking on technical debt.
These questions are:
- How easily can an entirely new algorithmic approach be tested at full scale?
- What is the transitive closure of all data dependencies?
- How precisely can the impact of a new change to the system be measured?
- Does improving one model or signal degrade others?
- How quickly can new members of the team be brought up to speed?
The authors defined several areas where technical debt can accrue:
- Model Boundary Erosion
  - Entanglement: The CACE principle (Changing Anything Changes Everything), which applies to both data and model training; a minimal sketch follows this group of bullets
  - Correction Cascades: Transfer learning / fine-tuning a pre-trained model (PTM) creates a new model (B) that is now dependent upon the original model’s (A) weights and architecture
  - Undeclared Consumers: Similar to visibility debt [2]; creates tight, hidden coupling to the model that, if the model is changed, could affect the wider system
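
To make the entanglement point concrete, here is a minimal sketch of CACE (my own, using scikit-learn; it is not from the paper): rescaling a single input feature and retraining shifts the learned weights of every feature, not just the one that changed.

```python
# Minimal CACE sketch (not from the paper): change one feature, retrain,
# and every learned weight moves, not just the weight of the changed feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                        # three input features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)  # synthetic labels

weights_before = LogisticRegression().fit(X, y).coef_[0]

X_changed = X.copy()
X_changed[:, 0] *= 10.0                               # change only feature 0's scale

weights_after = LogisticRegression().fit(X_changed, y).coef_[0]

print("before:", weights_before)
print("after :", weights_after)  # under the default L2 penalty, all three shift
```

No input is ever truly independent, which is exactly the paper’s point: there is no such thing as an isolated improvement.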
- Data Dependencies: Static analysis of data dependencies could resolve this [3]
  - Unstable Data Dependencies: Some data signals change qualitatively or quantitatively over time. Therefore, improving input signals to the system could harm the output signals of the model
  - Underutilized Data Dependencies: Similar to underutilized dependencies in traditional software engineering; data signals that the model is trained on but that have little to no effect on the output signal. Removing these signals post-training, however, could greatly affect the quality of the model. Examples include:
    - Legacy Features: Features included early in a model’s development that are later made redundant by newer features
    - Bundled Features: Features that are bundled together because they collectively improve performance
    - ε-Features: Features added to marginally improve accuracy
    - Correlated Features: Models struggle to distinguish between two correlated features, one of which is causal (important) and the other non-causal (not important); a sketch of this case follows this group of bullets
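
The correlated-features case is easy to demonstrate. In this sketch (again mine, not the paper’s), the model splits credit between a causal feature and its non-causal twin, so pruning the "unimportant" twin after training still shifts predictions.

```python
# Sketch of correlated features: credit is split across a causal feature and
# its non-causal twin, so pruning the twin post-training changes the output.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
causal = rng.normal(size=1000)
twin = causal + rng.normal(scale=0.1, size=1000)  # correlated, non-causal
X = np.column_stack([causal, twin])
y = 3.0 * causal + rng.normal(scale=0.1, size=1000)

model = Ridge(alpha=1.0).fit(X, y)
print("weights:", model.coef_)          # credit is roughly split between the two

X_pruned = X.copy()
X_pruned[:, 1] = 0.0                    # drop the "unimportant" twin post-training
drift = np.abs(model.predict(X_pruned) - model.predict(X)).mean()
print("mean prediction shift:", drift)  # far from zero
```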
- Feedback Loops
  - Direct Feedback Loops: A model may directly influence the selection of its own future training data based on the decisions it makes; a toy simulation follows this group of bullets
  - Hidden Feedback Loops: Two systems may influence each other indirectly by affecting the sources of their data
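
Here is a toy simulation of a direct feedback loop (my own construction, not the paper’s): a greedy policy only gathers feedback for the item it chooses to show, so an early unlucky estimate for the better item is never corrected.

```python
# Toy direct feedback loop: the model's own decisions select its training data.
import random

random.seed(0)
true_click_rate = {"a": 0.5, "b": 0.7}  # "b" is actually the better item
counts = {"a": [5, 10], "b": [0, 3]}    # [clicks, impressions]; "b" started unlucky

for _ in range(1000):
    # Greedy policy: show whichever item has the best *observed* rate...
    shown = max(counts, key=lambda k: counts[k][0] / counts[k][1])
    clicked = random.random() < true_click_rate[shown]
    counts[shown][0] += clicked         # ...so only the shown item gets feedback
    counts[shown][1] += 1

print({k: round(c[0] / c[1], 3) for k, c in counts.items()})
# "b" stays stuck at 0.0 forever: the loop never collects the data to fix it.
```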
- ML-System Anti-Patterns
  - Glue Code: Stitching together two incompatible components of a system results in significant code overhead; the paper suggests wrapping black-box packages behind common APIs, sketched after this group of bullets
  - Pipeline Jungles: Data pre-processing pipelines (scrapes, joins, and sampling steps) that evolve organically into tangles that are difficult to test and maintain
  - Dead Experimental Codepaths: Experimental code branches that accumulate in the mainline code and are never cleaned up
  - Abstraction Debt: There is a lack of standardized abstractions for ML system components
  - Common Smells:
    - Plain-Old-Data Type Smell
    - Multiple-(Programming) Language Smell
    - Prototype Smell
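
The mitigation the paper offers for glue code is to wrap black-box packages into common APIs. A minimal sketch of that idea, with interface and class names that are my own invention:

```python
# Hypothetical common API that hides a package-specific model behind one
# small interface, so only the adapter ever touches package details.
from typing import Protocol, Sequence

class Model(Protocol):
    def predict(self, features: Sequence[float]) -> float: ...

class SklearnAdapter:
    """Adapts a fitted scikit-learn regressor to the common Model interface."""

    def __init__(self, estimator) -> None:
        self._estimator = estimator

    def predict(self, features: Sequence[float]) -> float:
        return float(self._estimator.predict([list(features)])[0])

# Swapping packages now means writing one new adapter rather than rewriting
# glue code at every call site in the surrounding system.
```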
- Configuration Debt
  - Taken verbatim from the paper (a code sketch of these desiderata follows this group of bullets):
    - It should be easy to specify a configuration as a small change from a previous configuration.
    - It should be hard to make manual errors, omissions, or oversights.
    - It should be easy to see, visually, the difference in configuration between two models.
    - It should be easy to automatically assert and verify basic facts about the configuration: number of features used, transitive closure of data dependencies, etc.
    - It should be possible to detect unused or redundant settings.
    - Configurations should undergo a full code review and be checked into a repository.
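
Here is a small sketch of how a few of those desiderata could be met in practice. This is my own illustration, and the field names are invented: a new configuration is derived as a small, visible diff from a base configuration, typos fail loudly, and basic facts are asserted automatically.

```python
# Hypothetical config handling: a new config is a small diff from a base one,
# typos are rejected, and basic facts are asserted automatically.
BASE = {
    "features": ["age", "country", "last_click"],
    "learning_rate": 0.01,
    "l2": 1.0,
}

def derive(base: dict, **overrides) -> dict:
    unknown = set(overrides) - set(base)
    if unknown:                         # manual errors and omissions fail loudly
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**base, **overrides}

experiment = derive(BASE, learning_rate=0.005)  # the diff is visible at a glance
assert len(experiment["features"]) == 3         # verify a basic fact automatically
```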
- A Changing External World
  - Fixed Thresholds in Dynamic Systems: Manually selecting a decision threshold that a system has to abide by; as models and data change, the old threshold may become invalid
  - Monitoring and Testing: “Comprehensive live monitoring of system behavior in real time combined with automated response is critical for long-term system reliability”. Areas to monitor include:
    - Prediction Bias: In a working system, the distribution of predicted labels should equal the distribution of observed labels; a monitoring sketch follows this group of bullets
    - Action Limits: Sanity-check limits on actions the system takes automatically, triggering alerts when hit
    - Up-Stream Producers: Data from up-stream producers must also be monitored and tested against downstream needs
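
As a concrete illustration of the prediction-bias check (my own sketch, not the paper’s code): in a healthy system the distribution of predicted labels should match the distribution of observed labels, so a persistent gap should trip an automated alert.

```python
# Sketch of a prediction-bias monitor: alert when the average predicted rate
# drifts away from the average observed rate on recent traffic.
def biased(predicted: list, observed: list, tol: float = 0.02) -> bool:
    gap = sum(predicted) / len(predicted) - sum(observed) / len(observed)
    return abs(gap) > tol  # True -> fire an alert / trip an action limit

# Model predicts ~5% positives but the live system observes ~9%:
assert biased([0.05] * 1000, [1] * 90 + [0] * 910)
```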
- Data Testing Debt
- Reproducibility Debt: It is difficult to reproduce results from ML research due to randomized algorithms, non-determinism inherent in parallel learning, initial conditions, and interactions with the external world; a seed-pinning sketch follows this list
- Process Management Debt: How does one handle a system with many models?
- Cultural Debt: Difficulty re-using academic research in industry
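
For the reproducibility item, the one slice that is cheap to address is the seeding of randomized algorithms; a common sketch of that is below (mine, not the paper’s). The other sources the authors name, such as parallel non-determinism and interactions with the external world, cannot be pinned down this easily.

```python
# Pin the random seeds under the experimenter's control. This addresses only
# the "randomized algorithms" slice of reproducibility debt.
import os
import random

import numpy as np

def seed_everything(seed: int = 42) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses spawned later
    random.seed(seed)                         # standard-library RNG
    np.random.seed(seed)                      # NumPy's global RNG

seed_everything(42)
```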
Summarization Technique
This paper was summarized using a modified technique proposed by S. Keshav in his work How to Read a Paper [0].