ResNet-50 Score Confirms Leading Inference Performance of Groq Processor

Performance Validations Point to Both Technical and Business Benefits of Groq Infrastructure

Linley Group Microprocessor Report "Groq Rocks Neural Networks" confirms Groq "Rolls Over Competitors"

Jan 07, 2020, 14:21 ET

MOUNTAIN VIEW, California, Jan. 7, 2020 /PRNewswire/ -- Groq, the inventor of the Tensor Streaming Processor (TSP) architecture and a new class of compute, today announced that the Groq processor has achieved 21,700 inferences per second (IPS) for ResNet-50 v2 inference. Groq's level of inference performance exceeds that of other commercially available neural network architectures, with throughput that more than doubles the ResNet-50 score of the incumbent GPU-based architecture. ResNet-50 is an inference benchmark for image classification and is often used as a standard for measuring performance of machine learning accelerators.

In addition, earlier this week, the Linley Group released its latest Microprocessor Report titled "Groq Rocks Neural Networks," which concludes that Groq's "TSP stands out in both peak performance and ResNet-50 throughput," and that "Groq's [deep-learning] accelerator is the fastest available on the merchant market."

"These ResNet-50 results are a validation that Groq's unique architecture and approach to machine learning acceleration delivers substantially faster inference performance than our competitors," said Jonathan Ross, Groq's co-founder and CEO. "These real-world proof points, based on industry-standard benchmarks and not simulations or hardware emulation, confirm the measurable performance gains for machine learning and artificial intelligence applications made possible by Groq's technologies."

Significantly, the Groq platform doesn't require large batch sizes for optimal inference performance: the TSP architecture achieves peak throughput even at batch size 1, that is, when processing a single image at a time. The ability to process small batches quickly and efficiently is especially important for minimizing latency in real-time applications. While Groq's architecture is up to 2.5 times faster than GPU-based platforms at large batch sizes, its Tensor Streaming Processor is up to 17 times faster at batch size 1.
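The arithmetic behind these ratios can be sketched in a few lines of Python. This is an illustrative calculation only: it assumes the "up to" speedup figures apply to the 21,700 IPS result, which the release does not state explicitly, and the function names are hypothetical. The per-inference time it prints is an average derived from throughput, not a measured latency.

```python
# Figures quoted in this release.
GROQ_IPS = 21_700          # ResNet-50 v2 inferences per second
SPEEDUP_LARGE_BATCH = 2.5  # "up to" ratio vs. GPU-based platforms, large batches
SPEEDUP_BATCH_1 = 17.0     # "up to" ratio vs. GPU-based platforms, batch size 1


def implied_baseline_ips(groq_ips: float, speedup: float) -> float:
    """Throughput a competing platform would have if the quoted speedup held.

    Assumption: the speedup is measured against groq_ips.
    """
    return groq_ips / speedup


def seconds_per_inference(ips: float) -> float:
    """Average time per inference implied by a throughput figure."""
    return 1.0 / ips


large_batch_baseline = implied_baseline_ips(GROQ_IPS, SPEEDUP_LARGE_BATCH)
batch_1_baseline = implied_baseline_ips(GROQ_IPS, SPEEDUP_BATCH_1)
groq_microseconds = seconds_per_inference(GROQ_IPS) * 1e6

print(f"Implied GPU throughput, large batch: {large_batch_baseline:,.0f} IPS")
print(f"Implied GPU throughput, batch size 1: {batch_1_baseline:,.0f} IPS")
print(f"Groq average time per inference: {groq_microseconds:.1f} microseconds")
```

Because the quoted ratios are upper bounds, the implied GPU figures are lower bounds on the competing throughput rather than measured results.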

With the Groq architecture providing more than a 2x performance advantage over GPU-based solutions, engineering managers can deploy processing platforms that offer twice the inference performance without doubling infrastructure costs. Reducing the number of deployed systems can also save datacenter space, lower power usage, and decrease system complexity overall.
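The sizing argument can be made concrete with a rough sketch. The target workload below is hypothetical, and the calculation assumes one processor per system, throughput that scales linearly with system count, and a GPU-based baseline at exactly half of Groq's quoted throughput (the most favorable reading of "more than 2x" for the baseline). None of these assumptions comes from the release.

```python
import math

GROQ_IPS = 21_700            # quoted ResNet-50 v2 throughput per Groq processor
BASELINE_IPS = GROQ_IPS / 2  # assumption: baseline at exactly half that rate
TARGET_IPS = 100_000         # hypothetical aggregate workload


def systems_needed(target_ips: float, per_system_ips: float) -> int:
    """Smallest number of systems whose combined throughput meets the target.

    Assumes throughput adds up linearly across systems.
    """
    if target_ips <= 0 or per_system_ips <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_ips / per_system_ips)


groq_systems = systems_needed(TARGET_IPS, GROQ_IPS)
baseline_systems = systems_needed(TARGET_IPS, BASELINE_IPS)

print(f"Groq systems needed: {groq_systems}")
print(f"Baseline systems needed: {baseline_systems}")
```

Under these assumptions the system count roughly halves, which is where the space, power, and complexity savings would come from.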

About Groq:

Headquartered in Mountain View, CA, Groq delivers industry-leading performance, accuracy and sub-millisecond latency with efficient, software-driven solutions for compute-intensive applications. Groq redefines compute by focusing on key technology innovations: software-defined compute, silicon innovation and developer velocity. For more information, visit: https://groq.com.

SOURCE Groq