An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain

Workshop Paper SCORED 2022 PTM Supply Chain

Authors

Wenxin Jiang
Nicholas M. Synovic
Rohan Sethi
Aryan Indarapu
Matt Hyatt
Taylor R. Schorlemmer
George K. Thiruvathukal
James C. Davis

Abstract

Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs, collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective. We present an empirical study of artifacts and security features in 8 model hubs. We indicate the potential threat models and show that the existing defenses are insufficient for ensuring the security of PTMs. We compare PTM and traditional supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.

Artifacts

Paper Preprint

Download

Published Paper

View

Poster

Download

Source Code

View

BibTeX
@inproceedings{jiang_empirical_2022,
   address = {New York, NY, USA},
   series = {{SCORED}'22},
   title = {An {Empirical} {Study} of {Artifacts} and {Security} {Risks} in the {Pre}-trained {Model} {Supply} {Chain}},
   copyright = {All rights reserved},
   isbn = {978-1-4503-9885-5},
   url = {https://dl.acm.org/doi/10.1145/3560835.3564547},
   doi = {10.1145/3560835.3564547},
   abstract = {Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs, collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective. We present an empirical study of artifacts and security features in 8 model hubs. We indicate the potential threat models and show that the existing defenses are insufficient for ensuring the security of PTMs. We compare PTM and traditional supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.},
   urldate = {2023-09-06},
   booktitle = {Proceedings of the 2022 {ACM} {Workshop} on {Software} {Supply} {Chain} {Offensive} {Research} and {Ecosystem} {Defenses}},
   publisher = {Association for Computing Machinery},
   author = {Jiang, Wenxin and Synovic, Nicholas and Sethi, Rohan and Indarapu, Aryan and Hyatt, Matt and Schorlemmer, Taylor R. and Thiruvathukal, George K. and Davis, James C.},
   month = nov,
   year = {2022},
   keywords = {empirical software engineering, machine learning, software reuse, software supply chain, deep neural networks, model hubs},
   pages = {105--114},
}