Inspur Releases TensorFlow-Supported FPGA Compute Acceleration Engine TF2

News provided by

Inspur Electronic Information Industry Co., Ltd

Aug 24, 2018, 19:20 ET

FREMONT, Calif., Aug. 24, 2018 /PRNewswire/ -- On August 23, at KDD 2018 in London, a premier global conference focused on artificial intelligence, Inspur released TF2, an FPGA compute acceleration engine that supports TensorFlow. TF2 helps AI customers quickly deploy deep neural network (DNN) models, trained with mainstream AI training software, for inference on FPGAs. It delivers high performance and low latency for AI applications through what Inspur describes as the world's first DNN shift computing technology on FPGAs.

At present, many AI companies have adopted FPGAs as their technical route for AI inference, because FPGAs offer customizable, low-latency, high-performance computing with a high performance-to-power ratio. However, before FPGAs can be deployed for AI workloads at large scale, several challenges remain, such as the high barrier to software development, limited performance optimization, and difficult power control. Inspur's TF2 compute acceleration engine is designed to solve these challenges for customers.

The TF2 compute acceleration engine consists of two parts. The first is the model optimization and conversion tool, TF2 Transform Kit, which optimizes and converts deep neural network model data trained with frameworks such as TensorFlow. It can compress 32-bit floating-point model data into 4-bit integer model data, which shrinks the model data file to less than 1/8 of its original size while largely preserving the regular storage layout of the original model data. The second is the FPGA runtime engine, TF2 Runtime Engine, which automatically converts the optimized model file into a target file that runs on the FPGA. To remove the dependence of deep neural networks such as CNNs on FPGA floating-point computing power, Inspur designed a shift computing technique that quantizes 32-bit floating-point data into 8-bit integer data. Combined with the 4-bit integer model data described above, the floating-point multiplications in convolution are replaced by 8-bit integer shift operations, which greatly improves FPGA inference performance and effectively reduces operating power consumption. According to Inspur, this is the world's first implementation of DNN shift operations on an FPGA that maintains the accuracy of the original model.
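The release does not describe how TF2 encodes its 4-bit model data, so the following Python sketch only illustrates the general idea of shift-based inference under stated assumptions. It assumes each weight is rounded to a signed power of two (1 sign bit plus 3 exponent bits, covering 2^-7 through 2^0) and that activations are the data quantized to 8-bit integers; all names, the exponent range, and the scaling scheme are hypothetical and are not taken from TF2.

```python
import math

MIN_EXP = -7    # assumed smallest exponent; 3 bits give 8 levels: 2**-7 .. 2**0
FRAC_BITS = 7   # accumulator is scaled by 2**7 so every shift is a left shift


def quantize_weight(w):
    """Round a float weight to sign * 2**exp (hypothetical 4-bit scheme)."""
    sign = -1 if w < 0 else 1
    if w == 0:
        return sign, MIN_EXP  # this sketch has no zero code; use smallest magnitude
    exp = round(math.log2(abs(w)))
    return sign, max(MIN_EXP, min(0, exp))


def encode_4bit(sign, exp):
    """Pack (sign, exp) into one 4-bit code: 1 sign bit + 3 exponent bits."""
    return ((1 if sign < 0 else 0) << 3) | (exp - MIN_EXP)


def quantize_activation(x, scale):
    """Quantize a float activation to a signed 8-bit integer."""
    return max(-128, min(127, round(x / scale)))


def shift_dot(q_acts, w_codes):
    """Dot product using only integer shifts and adds (no multiplications)."""
    acc = 0
    for q, (sign, exp) in zip(q_acts, w_codes):
        term = q << (exp + FRAC_BITS)  # q * 2**exp, in units of 2**-FRAC_BITS
        acc += term if sign > 0 else -term
    return acc


weights = [0.5, -0.25, 0.125]
activations = [1.0, 2.0, -4.0]
act_scale = 1.0 / 16  # assumed activation scale for this example

w_codes = [quantize_weight(w) for w in weights]
q_acts = [quantize_activation(x, act_scale) for x in activations]
acc = shift_dot(q_acts, w_codes)
result = acc * act_scale / (1 << FRAC_BITS)  # rescale back to floating point
reference = sum(w * x for w, x in zip(weights, activations))
print(result, reference)
```

In this example the weights are exact powers of two, so the shift-based result matches the floating-point reference; for arbitrary weights the rounding in `quantize_weight` introduces an approximation error. Storing one 4-bit code per weight in place of a 32-bit float is what yields the 8x reduction in model data size.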

The SqueezeNet model running on the Inspur F10A FPGA card demonstrates the computational performance of the TF2 compute acceleration engine. The F10A is the world's first half-height, half-length FPGA accelerator card to support the Arria 10 chip. SqueezeNet is a typical convolutional neural network architecture: a compact model whose accuracy is comparable to that of AlexNet, which makes it especially suitable for image-based AI applications with strict real-time requirements. Running the TF2-optimized SqueezeNet model on the F10A, the computation time for a single image is 0.674 ms while maintaining the original accuracy. Compared with the widely used P4 GPU accelerator card running at INT8, the F10A is slightly better in both accuracy and latency, as the table below shows.

Device	Peak Power	Data Type	Top-1	Top-5	FPS (images/s)
F10A	45W	INT8	57.62%	79.98%	1484
P4	75W	FP32	58.14%	80.79%	1323
P4	75W	INT8	56.79%	79.76%	1456

TF2 with F10A vs. GPU
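The figures in the table can be cross-checked with some simple arithmetic. The short Python sketch below converts throughput to per-image time and divides throughput by peak power. It assumes images are processed one at a time, so that latency is the reciprocal of FPS, and it uses the listed peak power as a stand-in for actual power draw, which the release does not report. The function names are illustrative only.

```python
# Rows copied from the table above: (device, peak power in W, data type, FPS).
rows = [
    ("F10A", 45, "INT8", 1484),
    ("P4", 75, "FP32", 1323),
    ("P4", 75, "INT8", 1456),
]


def latency_ms(fps):
    """Per-image time in milliseconds, assuming one image at a time."""
    return 1000.0 / fps


def fps_per_watt(fps, peak_watts):
    """Throughput per watt of peak power (not measured power draw)."""
    return fps / peak_watts


for device, watts, dtype, fps in rows:
    print(f"{device} {dtype}: {latency_ms(fps):.3f} ms/image, "
          f"{fps_per_watt(fps, watts):.2f} images/s per peak watt")
```

Under these assumptions, 1484 images/s corresponds to the 0.674 ms per image quoted above, and the F10A delivers roughly 33 images/s per peak watt against roughly 19 for the P4 at INT8.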

The Inspur TF2 compute acceleration engine improves AI computing performance on FPGAs through technical innovations such as shift computing and model optimization, and lowers the barrier to implementing AI software on FPGAs. This allows FPGAs to be used more widely in the AI ecosystem and to enable more AI applications. Inspur plans to open TF2 to its AI customers and will continue to upgrade it, developing optimization technologies that support multiple models, the latest deep neural network models, and FPGA accelerator cards based on the latest chips. The next-generation high-performance FPGA accelerator card is expected to deliver three times the performance of the F10A.

Inspur is the world's leading AI computing platform provider, offering a four-layer AI stack of computing hardware, management suite, framework optimization, and application acceleration to build an agile, efficient, and optimized AI infrastructure. Inspur has become the most important AI server supplier for Baidu, Alibaba and Tencent, and maintains close collaboration on systems and applications with leading AI companies such as iFLYTEK, SenseTime, Face++, Toutiao and Didi. Inspur strives to help AI customers achieve the greatest possible application performance improvements in voice, image, video, search, and networking. According to IDC's 2017 China AI Infrastructure Market Research Report, Inspur's AI server market share reached 57% last year.

SOURCE Inspur Electronic Information Industry Co., Ltd