Deepchecks Announces Groundbreaking LLM Evaluation Solution for Advanced AI System Validation
TEL AVIV, Israel, Nov. 28, 2023 /PRNewswire/ -- Deepchecks, a leading company in the MLOps space that has been focusing on testing AI systems, is thrilled to announce the launch of its innovative LLM Evaluation solution. This significant new solution is designed to address the unique challenges posed by Large Language Models (LLMs) and is set to revolutionize the way AI systems are validated.
Deepchecks has been at the forefront of AI system validation since the launch of its open-source package in January 2022 for testing ML models. The company has garnered widespread recognition, amassing over 3,000 GitHub stars and more than 900,000 downloads. The enthusiastic response from the AI and machine learning community motivated Deepchecks to expand its offerings beyond tabular data testing to meet the diverse needs of its growing user base.
The LLM Evaluation solution comes as a response to the increasing demand for effective evaluation tools for LLM-based applications. Deepchecks recognized the unique challenges that LLMs present, including assessing both accuracy and model safety (addressing bias, toxicity, PII leakage) and the need for flexible testing approaches due to the possibility of multiple valid responses for a single input.
Key features of Deepchecks' LLM Evaluation solution include:
- Dual Focus: Evaluating both the quality of LLM responses (accuracy, relevance, and usefulness) and model safety (bias, toxicity, and adherence to privacy policies).
- Flexible Testing: Supporting scenarios in which an LLM can produce multiple valid responses for a single input, which calls for flexible testing approaches such as curated "golden sets."
- Diverse User Base: Recognizing that LLM-based applications require input and control from a variety of stakeholders, including data curators, product managers, and business analysts, in addition to data scientists and machine learning engineers.
- Phased Approach: Acknowledging the distinct phases involved in LLM-based app development, including Experimentation/Development, Staging/Beta Testing, and Production, which require tailored evaluation strategies.
"From what we've been seeing in the market, companies are managing to build 'quick-and-dirty' POCs extremely quickly based on APIs such as OpenAI combined with prompt engineering," said Philip Tannor, CEO at Deepchecks. "However, the next steps leading to a production-ready application are taking a lot longer than initial expectations, largely due to difficulties with quality, consistency and adherence to policies. We believe that our LLM Evaluation solution can really move the needle in terms of delivering LLM-based applications quickly and safely."
Deepchecks recently announced $14M in seed funding. The round was led by Alpha Wave Ventures with participation from Hetz Ventures and Grove Ventures.
Deepchecks invites organizations and individuals to explore the new LLM Evaluation solution and elevate their AI validation processes. For more information or to request a demonstration, please visit https://www.deepchecks.com.
About Deepchecks:
Deepchecks is a leading company in the MLOps space and is best known for testing machine learning. Since its inception, the company has been dedicated to pushing the boundaries of AI system validation, catering to a diverse community of data scientists, machine learning engineers, and business professionals. Deepchecks offers a range of tools and solutions designed to ensure the quality, safety, and ethical use of AI systems.
Deepchecks is building a comprehensive solution for Continuous Validation of AI Systems that includes testing, monitoring, and auditing. The company has released an open-source package for testing ML models that became one of the fastest-growing MLOps packages to date (900K downloads and 3K GitHub stars) and has recently expanded its offering to evaluate generative applications that are based on LLMs.
SOURCE Deepchecks