New SEI Tool Enhances Machine Learning Model Test and Evaluation

News provided by

Carnegie Mellon Software Engineering Institute

Oct 17, 2024, 10:05 ET

PITTSBURGH, Oct. 17, 2024 /PRNewswire/ -- Software systems with a machine learning (ML) component often fail in production. One reason is that ML models are frequently developed in isolation, making it impossible to test and evaluate them against system and operational requirements and constraints. The Software Engineering Institute (SEI) at Carnegie Mellon University (CMU) today announced its release of a new tool to help teams developing ML-enabled software systems mitigate this problem. Machine Learning Test and Evaluation (MLTE), available for download from GitHub, is a semi-automated process and infrastructure for testing ML models based on stakeholder-generated quality attribute requirements.

ML model developers often work in silos. They lack knowledge of the overarching system or its operational environment. Without this context, developers can only evaluate a model on its accuracy, or the predictability of its output. Once the model is delivered, software engineers and quality assurance teams often have no specifications or knowledge to guide its testing. None of the groups can evaluate how well the model will work in production.

"The bottom line is that many models fail in production because they are not tested properly," said Grace Lewis, a principal researcher at the SEI and lead of its Tactical and AI-Enabled Systems Initiative. "When ML-enabled systems fail operational tests because of problems with the model, it creates huge delays in system delivery, especially if new data needs to be collected to retrain the model."

To fill this gap in the development of ML-enabled software, Lewis and her team at the SEI collaborated with the U.S. Army Artificial Intelligence Integration Center (AI2C) and Christian Kästner, an associate professor in the CMU School of Computer Science.

They created MLTE, which applies best practices from traditional software development to ML model test and evaluation (T&E). The process brings together all the stakeholders of an ML-enabled software project, not just the ML developers, to negotiate the model's quality attribute requirements based on system needs. Those attributes become specifications for automated internal and system-dependent testing. Test results populate reports that developers and other stakeholders can use to decide if the model is ready for production. If it is not, the reports can inform further iteration and testing. Special libraries within the MLTE infrastructure automate parts of the process.
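
The flow described above (negotiated requirements become specifications, automated checks run against them, and results populate a report) can be pictured with a minimal Python sketch. This is not MLTE's API: every class name, function name, threshold, and measurement value here is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Requirement:
    """One negotiated quality attribute requirement (hypothetical structure)."""
    attribute: str                      # quality attribute, e.g. "accuracy"
    measure: str                        # key of the measurement to check
    condition: Callable[[float], bool]  # pass/fail rule agreed by stakeholders
    description: str                    # human-readable form of the rule


@dataclass(frozen=True)
class Result:
    """Outcome of checking one requirement against a measured value."""
    requirement: Requirement
    value: float
    passed: bool


def evaluate(requirements: List[Requirement],
             measurements: Dict[str, float]) -> List[Result]:
    """Check each requirement against the collected measurements."""
    results = []
    for req in requirements:
        value = measurements[req.measure]
        results.append(Result(req, value, req.condition(value)))
    return results


def render_report(results: List[Result]) -> str:
    """Summarize results so stakeholders can judge production readiness."""
    lines = []
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(
            f"{status}  {r.requirement.attribute}: "
            f"{r.requirement.description} (measured {r.value})"
        )
    ready = all(r.passed for r in results)
    lines.append("Ready for production: " + ("yes" if ready else "no"))
    return "\n".join(lines)


# Illustrative requirements and measurements only; not taken from MLTE.
requirements = [
    Requirement("accuracy", "test_accuracy", lambda v: v >= 0.90, "at least 0.90"),
    Requirement("latency", "p95_latency_ms", lambda v: v <= 200.0, "p95 at most 200 ms"),
    Requirement("model size", "model_size_mb", lambda v: v <= 50.0, "at most 50 MB"),
]
measurements = {"test_accuracy": 0.93, "p95_latency_ms": 240.0, "model_size_mb": 42.0}

results = evaluate(requirements, measurements)
report = render_report(results)
print(report)
```

In this sketch the latency requirement fails, so the report marks the model as not ready, which is the point at which the process described above would send the team back to further iteration and testing.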

"MLTE provides system and operational context for ML model developers to make informed decisions about design and development," said Lewis. "Other stakeholders can better understand whether the requirements for models are realistic so that problems can be detected and fixed early in the process, not discovered in operational tests or production."

MLTE is a system-centric, quality-attribute-driven, semi-automated process and infrastructure to enable negotiation, specification, and testing of ML model and system qualities. It incorporates TEC, an earlier SEI tool that detects mismatched expectations among the teams building an ML component. Both TEC and MLTE are part of an SEI effort to establish integrated T&E of ML capabilities throughout the Department of Defense.

To download MLTE, visit the project's GitHub site. Read more about the tool's background in the papers "Using Quality Attribute Scenarios for ML Model Test Case Generation" and "MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities."

About the Carnegie Mellon University Software Engineering Institute
Always focused on the future, the Software Engineering Institute (SEI) advances software as a strategic advantage for national security. We lead research and direct transition of software engineering, cybersecurity, and artificial intelligence technologies at the intersection of academia, industry, and government. We serve the nation as a federally funded research and development center (FFRDC) sponsored by the U.S. Department of Defense (DoD) and are based at Carnegie Mellon University, a global research university annually rated among the best for its programs in computer science and engineering. For more information, visit the SEI website at https://www.sei.cmu.edu.

SOURCE Carnegie Mellon Software Engineering Institute