This article was first published on the Wipro blog.
Today, billions of edge devices are being deployed in various applications such as consumer, industrial, IoT, automotive, medical, drones and surveillance, thanks to ever-growing speeds, shrinking geometries, and ultra-low-power semiconductor technologies and System-on-Chip (SoC) devices.
Various sensors, including imaging, motion and environmental sensors, are attached to these edge devices. These sensors generate enormous amounts of data, including images, video, speech and other non-imaging data, which needs to be transmitted back to the cloud. Even with abundant and reliable transmission bandwidth, the round-trip delay of sending data to the cloud and receiving commands to be executed at the edge device is prohibitive for most real-time, latency-sensitive applications. Further, security and privacy are the biggest concerns in transferring user data from edge devices to the cloud. Hence, there is a huge demand for enabling intelligent decisions in next-generation edge devices, in either a fully autonomous or a semi-autonomous way.
The level of autonomy of an edge device depends on the application's criticality, latency sensitivity, security, privacy and available transmission bandwidth. For some applications, such as industrial ones, semi-autonomous edge devices may serve several use cases by reducing the volume of data transferred to the cloud and by minimizing round-trip delays through distributed decision-making. For other, mission-critical applications such as autonomous cars, fully autonomous edge devices are necessary, with fusion of intelligence derived from various classes of sensors such as imaging, RADAR and LIDAR.
The entire spectrum of Machine Learning (ML) inference expected in edge devices can be categorized three ways: deriving intelligence from imaging data, from non-imaging data, and from their fusion.
Traditionally, ML, and Deep Learning (DL) in particular, has been associated with the need for huge numbers of powerful CPUs and GPUs, cloud infrastructure and massive software packages. These DL frameworks have heavy compute requirements and inference run times of up to several minutes, which makes DL inference infeasible for real-time applications in edge devices. Nevertheless, the need of the hour is to bring DL inference down from the cloud to next-generation edge devices. The offline machine learning (training) process can still use compute farms or the cloud, but the real-time inference must run on the edge devices themselves, which must be very small and extremely low power, with minimal compute and storage needs. Further, additional learning by the machine is also possible during the inference phase; any such additionally learned models can be periodically downloaded into the edge device's inference engine.
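As one illustration of this split between offline training and on-device inference, the sketch below assumes a model trained in the cloud with Keras and uses the TensorFlow Lite toolchain (one possible choice among several) to produce a small, quantized artifact that an edge device can load and run. The file names are hypothetical placeholders.

```python
import numpy as np
import tensorflow as tf

# Offline (cloud / compute farm): assume a classifier has already been trained
# and saved; "trained_classifier.h5" is a hypothetical file name.
model = tf.keras.models.load_model("trained_classifier.h5")

# Convert to a compact, quantized TensorFlow Lite model for the edge device.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

# This small artifact is what would be periodically downloaded to the edge device.
with open("edge_model.tflite", "wb") as f:
    f.write(tflite_model)

# On the edge device: load the artifact and run real-time inference.
interpreter = tf.lite.Interpreter(model_path="edge_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # placeholder for a sensor frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
prediction = interpreter.get_tensor(out["index"])
```

In practice, the interpreter on a real edge device would be a stripped-down runtime or a hardware accelerator driver rather than the full framework, but the division of labor (train and compress offline, download and execute a compact model on the device) is the same.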
In the recent past, Artificial Neural Network (ANN) based DL has started to gain momentum and has shown promising results. However, there is no single magical ANN-based DL solution for all applications. There are several types of neural networks for classification, prediction, clustering and association, of which convolutional neural networks and recurrent neural networks are the most widely deployed. The accuracy of ANN models, coupled with the authenticity and volume of the available training datasets, dictates the acceptance and rejection ratios, which are the key parameters for the success of DL in general and in edge devices in particular. Another challenge with DL today is the existence of several competing frameworks and models. Convergence of some of these is highly desirable to provide a near-standard way of deploying DL in edge devices, from the point of view of usability and interoperability.
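For concreteness, a minimal sketch of the kind of convolutional network typically used for image classification on constrained devices is shown below. The input resolution, layer widths and class count are illustrative assumptions, not recommendations; a production edge model would be sized to the target silicon.

```python
import tensorflow as tf

# A deliberately small CNN image classifier, sized for a constrained edge device.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 1)),        # small grayscale sensor frame
    tf.keras.layers.Conv2D(8, 3, activation="relu"),  # convolutional feature extraction
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 example output classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```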
The expectations from edge devices are ever growing, in terms of integrating more and more functionalities and sensors, but with little or no increase in silicon area, power consumption or cost. In this context, bringing DL inference to edge devices should also adhere to these expectations. Any DL solution in the edge device that needs power-hungry and expensive discrete components, including CPUs, GPUs and FPGAs (Field Programmable Gate Arrays), is detrimental to these expectations. Even though such solutions can serve as good proof-of-concept models, their power consumption and cost are prohibitive for many real-life applications.
Highly integrated and configurable SoC solutions with embedded DL hardware accelerators are necessary to meet the DL inference needs of edge devices. It is not practical to develop a single SoC edge-device solution that can cater to a broad range of applications, since different applications need different levels of DL inference complexity. At the same time, developing a separate DL-based SoC edge-device solution for each application is not practical either from an ROI point of view, unless the volumes are very high. The more practical way is to develop DL-based SoC edge-device solutions that each cater to a range of related applications, which is a reasonable trade-off between power, performance and cost.
Check back soon for Parts 2 and 3. Part 2 will detail applications of Deep Learning in edge devices, and Part 3 will elaborate on the architectural details of edge devices with Deep Learning.
Dr. Vijay Kumar K – Chief Architect and Distinguished Member of Technical Staff – VLSI Technology Practice Group of Product Engineering Services, Wipro
Vijay has been with Wipro for over 18 years and has been in the VLSI industry for more than 23 years. He has architected and designed several complex, cutting-edge SoCs, ASICs, FPGAs and systems in various application areas for top-notch semiconductor companies globally. He also specializes in the video domain and has created several solutions around video compression, post-processing and related areas.
He has been granted 13 US patents so far. He is currently working on next-generation architectures of semiconductor devices, including edge devices with machine learning.