PeaTMOSS: Mining Pre-Trained Models in Open-Source Software
Preprint Manuscript arXiv 2023 Dataset
Abstract
Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the widespread use of PTMs, we know little about the corresponding software engineering behaviors and challenges.
To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: PurdueDualityLab/PeaTMOSS-Demos.
Artifacts
Todo
Add the paper preprint
Add the poster
Add link to the source code
BibTeX
@misc{jiang_peatmoss_2023,
title = {{PeaTMOSS}: {Mining} {Pre}-{Trained} {Models} in {Open}-{Source} {Software}},
copyright = {All rights reserved},
shorttitle = {{PeaTMOSS}},
url = {http://arxiv.org/abs/2310.03620},
doi = {10.48550/arXiv.2310.03620},
abstract = {Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the wide-spread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.},
urldate = {2024-01-29},
publisher = {arXiv},
author = {Jiang, Wenxin and Jones, Jason and Yasmin, Jerin and Synovic, Nicholas and Sashti, Rajeev and Chen, Sophie and Thiruvathukal, George K. and Tian, Yuan and Davis, James C.},
month = oct,
year = {2023},
note = {arXiv:2310.03620 [cs]},
keywords = {Computer Science - Software Engineering, Computer Science - Artificial Intelligence}
}