Exploratory Devices and Circuits for Compute-in-Memory
Special Issue of IEEE Journal of Exploratory Solid-State Computational Devices and Circuits, June 2020
The IEEE Journal of Exploratory Solid-State Computational Devices and Circuits (JXCDC) is an open-access journal that publishes multi-disciplinary research in solid-state circuits using exploratory materials and devices for novel energy-efficient computation beyond standard CMOS.
The June special issue is on Compute-in-Memory (CIM), with 10 papers selected by special editor Prof. Shimeng Yu of Georgia Tech.
Deep learning in neural networks has become one of the major applications of recent computing, in both cloud and edge systems. These networks require large matrix multiplications, with arrays of multiply-and-accumulate (MAC) circuits. These can be carried out conventionally in the digital domain, using CPUs or, more efficiently, GPUs. A developing alternative carries out the matrix multiplication in the analog domain, within the memory array itself. Inputs are applied as voltages on the rows and outputs are read as currents on the columns; each memory cell converts its row voltage to a current in proportion to its stored conductance, and the currents sum along each column, so the multiply-and-accumulate follows naturally. This requires digital-to-analog converters (DACs) on the inputs and analog-to-digital converters (ADCs) on the outputs.
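The row-in, column-out scheme described above can be sketched as an idealized model. This is only an illustration, not a method taken from any of the papers in the special issue: the function names, bit widths, voltage range, and conductance values are all assumptions, and real arrays must also contend with wire resistance, device variation, and noise.

```python
# Idealized analog compute-in-memory crossbar (illustrative sketch only).
# All names and numeric values here are hypothetical.

def dac(codes, bits, v_max):
    """Convert unsigned integer input codes to row voltages (volts)."""
    full_scale = (1 << bits) - 1
    return [v_max * c / full_scale for c in codes]

def crossbar_currents(voltages, conductances):
    """Column currents I_j = sum_i V_i * G[i][j].

    Each cell contributes V_i * G[i][j] (Ohm's law); the contributions
    add along a column wire (Kirchhoff's current law).
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]

def adc(currents, bits, i_max):
    """Quantize column currents to unsigned codes, clipping to [0, i_max]."""
    full_scale = (1 << bits) - 1
    codes = []
    for current in currents:
        clipped = min(max(current, 0.0), i_max)
        codes.append(round(full_scale * clipped / i_max))
    return codes

# Hypothetical 3-row x 2-column array of cell conductances, in siemens.
G = [[1e-6, 2e-6],
     [3e-6, 0.0],
     [2e-6, 2e-6]]

V_MAX = 0.3                      # assumed full-scale row voltage
inputs = [15, 0, 5]              # 4-bit digital input codes, one per row
V = dac(inputs, bits=4, v_max=V_MAX)
I = crossbar_currents(V, G)

# Assumed ADC range: the largest current any column could carry.
i_max = V_MAX * max(sum(col) for col in zip(*G))
outputs = adc(I, bits=8, i_max=i_max)
print(outputs)
```

The whole matrix-vector product happens in the single `crossbar_currents` step; the DAC and ADC stages only translate between the digital and analog domains.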
A number of alternative device technologies are being evaluated for CIM, mostly non-volatile memory arrays. These include memristors and resistive RAM (RRAM), spintronic memories, and phase-change memories. The articles in the special issue evaluate a number of these alternatives, in terms of power, density, scalability, and integration with CMOS digital processing.
Nascent Exascale Computers Offer Promise, Present Challenges
Exascale computers with massive parallelism expected to come online in several countries in the next two years
The Proceedings of the National Academy of Sciences recently published an overview by Adam Mann of the progress and prospects in new Exascale Computers becoming available in 2021 and 2022. These machines are designed to perform in excess of 10^18 operations per second, and are being developed in major projects in the US, China, Japan, and the European Union.
Each of these systems is a highly parallel machine with about 135,000 GPUs and 50,000 CPUs, all working together. There is typically a tradeoff between programmability of CPUs and energy efficiency of GPUs. Special-purpose accelerator chips may also be used. But programming these machines and optimizing these tradeoffs will be very difficult.
These exascale machines are expected to offer improved solutions for such problems as climate simulation and long-term weather prediction, protein folding and drug development, atomic plasma dynamics, brain simulation, and a variety of problems in AI.
Further information on the US effort is available at the Exascale Computing Project (ECP) website.
The Edge-to-Cloud Continuum
Virtual Roundtable with Experts from Industry and Academia
In the November issue of IEEE Computer, Dejan Milojicic of Hewlett Packard Labs interviewed several experts on computer system architectures about the future of edge computing, cloud computing, and how the two will work together. The interview is available on IEEE Xplore.
The panelists included Tom Bradicich of HP, Adam Drobot of OpenTechWorks, and Ada Gavrilovska of Georgia Tech.
Cloud computing refers to computing in large-scale data centers, while edge computing takes place at least partly in cell phones, laptops, desktops, and the Internet-of-Things.
While cloud computing is often more computationally efficient, issues of latency and bandwidth generally require some data processing at the edge. Many edge devices are mobile, so wireless protocols of 5G and beyond are essential. There are also important issues of security, privacy, and reliability at both levels and in the communication between the two. In most cases, there will be a variety of tradeoffs, depending on the type of application and on business considerations.
These issues are likely to continue to generate a dynamic computing environment for the foreseeable future.
IBM’s Quantum Race to One Million Qubits
The critical test for any integrated circuit technology is its ability to scale to increased integration level. With this in mind, IBM has announced a Technology Roadmap for its superconducting quantum bits (qubits) for the next decade. For further information, read more at HPCwire.
IBM projects doubling the number of qubits per chip every year for the next decade. Its chips are named after birds, starting with Hummingbird (65 qubits) this month and expanding to Condor (>1000 qubits) in 2023. At this rate, IBM expects to approach one million qubits by 2030.
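The scaling arithmetic can be checked with a short sketch. This is a back-of-the-envelope illustration, not IBM's own projection: it assumes a 2020 start year for Hummingbird, treats Condor's ">1000" as exactly 1000 (a lower bound), and uses hypothetical function names.

```python
# Back-of-the-envelope qubit scaling (illustrative assumptions only).

def implied_annual_factor(q_start, year_start, q_end, year_end):
    """Constant yearly growth factor that links two (year, qubit count) points."""
    years = year_end - year_start
    return (q_end / q_start) ** (1.0 / years)

def project(q_start, year_start, factor, year_target):
    """Extrapolate a qubit count at a constant yearly growth factor."""
    return q_start * factor ** (year_target - year_start)

# Strict doubling from 65 qubits, assuming a 2020 starting point.
doubling_2030 = project(65, 2020, 2.0, 2030)

# Growth factor implied by the two named chips, using 1000 as a
# lower bound for Condor, then extended from 2023 to 2030.
factor = implied_annual_factor(65, 2020, 1000, 2023)
extrapolated_2030 = project(1000, 2023, factor, 2030)

print(f"strict doubling by 2030: {doubling_2030:,.0f} qubits")
print(f"implied yearly factor:   {factor:.2f}")
print(f"extrapolated by 2030:    {extrapolated_2030:,.0f} qubits")
```

Under these assumptions, strict doubling from 65 qubits gives 65 × 2^10 = 66,560 by 2030, while the pace implied by the two named chips is somewhat faster than doubling and extrapolates to several hundred thousand qubits, which is closer to the stated one-million target.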
These chips are designed to operate in a special refrigerator known as a helium dilution refrigerator at temperatures of less than 0.1 K. Commercially available refrigerators do not have the capacity to cool such large systems of qubits. IBM has been exploring the design of a larger dilution refrigerator that could cool one million qubits, codenamed “Goldeneye”, so that it will be available for testing and packaging the future systems, when needed. Furthermore, they are projecting massively parallel quantum computing systems comprising multiple systems of this scale, linked by true quantum interconnects, for the 2030s.
This assumes, of course, that the performance of these systems will continue to improve exponentially with the scale as expected. This requires increasing the Quantum Volume benchmark and incorporating quantum error correction technology. IBM is confident that they can achieve this.
IRDS™ Executive Summary: Industry Highlights
The International Roadmap for Devices and Systems™ recently released its 2020 edition. This includes 10 topical reports from the International Focus Teams, two White Papers (on “More Than Moore” and “Packaging Integration”), and an Executive Summary.
Any or all of these documents can be downloaded from the IRDS™ Portal. One must join the IEEE IRDS™ Technical Community, but there is no charge to do so.
The Executive Summary is an extended overview of the entire roadmap, totaling 60 pages. It opens with a shorter introduction, Section 1.1 on Industry Highlights and Key Messages. This new Roadmap section lists 23 Key Messages about the present and future of the worldwide semiconductor industry, illustrated with a similar number of graphs and figures. Among the key messages are the following:
1) Despite all the predictions that Moore’s Law is ending, the number of transistors per die continues to grow exponentially, due in part to expansion into the 3rd dimension. This trend is likely to continue for at least another 10 years.
2) Artificial intelligence and machine learning (AI/ML) applications have become the key drivers for chip and system advanced development, and these are moving from data centers out to edge devices.
3) The semiconductor industry continues to grow strongly, with increases in both edge devices (smart phones and sensors) and data centers.
4) New logic and memory devices with improved performance for future systems are being developed.
5) Quantum computing remains in the research stage, with product manufacturing at least 10 years away.
Non-Silicon, Non-von-Neumann Computing – Part II
Access the overview by Sankar Basu, Randal Bryant, Giovanni de Micheli, Thomas Theis, and Lloyd Whitman in Proceedings of the IEEE, August 2020
The editors of the special issue are from the US National Science Foundation, Carnegie Mellon University, Swiss Federal Institute of Technology Lausanne (EPFL), and IBM.
This special issue continues a series of new research articles on novel computer architectures and devices that began with a January 2019 special issue.
This is a very broad field, as reflected in the selection of articles included.
The issue includes articles on error correction in systems of unreliable devices, field programmable analog arrays, spintronic memories, spin-based stochastic logic, deep learning with photonic devices, and quantum computing in noisy systems.
While most of these systems are still in the research stage, and indeed, some may never prove to be practical, they illustrate some of the wide range of technologies that may be applied to non-silicon, non-von-Neumann processors in the next several decades.
IRDS™ Roadmap Chapter on Cryogenic Electronics and Quantum Information Processing (CEQIP)
The 2020 IRDS™ Roadmap includes a chapter on CEQIP chaired by Dr. Scott Holmes of Booz-Allen, IARPA, and the IEEE Council on Superconductivity.
This chapter describes several developing technologies, which do not yet have many mature products.
These include superconducting electronics, cryogenic semiconductor electronics, and quantum computing.
Superconducting electronic systems typically consist of medium-scale integrated circuits based on niobium Josephson junctions, which operate at cryogenic temperatures of around 4 K. Applications are developing in digital signal processing at radio frequencies, and ultra-low-power computers.
Cryogenic semiconductor electronics may be designed to operate below 100 K, or even less than 1 K. These are typically interface circuits for cryogenic sensor arrays and superconducting electronic systems.
Quantum computing systems are in the research stage, with many alternative technologies being explored for making arrays of quantum bits or “qubits”. The leading technologies at present are superconducting circuits and trapped ions, but others are surveyed as well.
Access the CEQIP chapter at the IRDS™ website.
This is available online without charge, but users must first subscribe to the IRDS™ Technical Community.
Other IRDS™ Chapters are available at the IRDS™ website.
A video overview by Dr. Holmes about the CEQIP chapter last year is also available at IEEE.tv.
Prof. Chenming Hu and the FinFET
How 2020 IEEE Medal of Honor Recipient Helped Save Moore’s Law
Read the article in IEEE Spectrum, May 2020.
The workhorse device of computer chips has long been the silicon field-effect transistor or FET. Prof. Chenming Hu of the University of California at Berkeley noticed in the 1990s that traditional planar FETs would fail to scale properly when dimensions went to 25 nm and below. With funding from DARPA, he proposed a 3D structure known as the FinFET. In the past decade, FinFETs have become standard for computer chips at scales down to a few nanometers.
Although Moore’s Law is again predicted to end soon, Prof. Hu argues that looking ahead, there are likely to be additional approaches to continue improvements in circuit density, power, and speed.
A video about Prof. Hu, FinFETs, and the IEEE Medal of Honor is also available at IEEE.tv.
An overview of all the IEEE Honorees in 2020 is available at the IEEE VIC Summit website.
A Density Metric for Semiconductor Technology
Access the article by H.S. Philip Wong, et al. in Proceedings of the IEEE, April 2020.
Researchers from Stanford, UC Berkeley, MIT, and Taiwan Semiconductor propose that a new metric is needed to track the scaling of transistors, beyond the traditional single metric of gate length. This should focus on functional parameters of circuit density, but which circuits? The authors propose a metric consisting of 3 parameters: logic density DL, memory density DM, and interconnect density DC. These densities can be measured in devices per square millimeter on a chip, so that they can properly characterize the newer 3D integrated circuits that can include multiple layers of logic and memory, sometimes on the same chip. The interconnects link the processor to the main memory, and represent a bottleneck for system performance, so that DC needs to increase as well. For example, one might have a system with [DL, DM, DC] = [40M, 400M, 10K].
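One way to picture the three-part metric is as a small record type. The sketch below is an illustration rather than anything defined in the article: the class name, field names, and formatting helper are hypothetical, and the values are the example figures quoted above, read as densities per square millimeter.

```python
# Illustrative container for the three-part density metric [DL, DM, DC].
# Class, field, and helper names are hypothetical.
from dataclasses import dataclass

def compact(value):
    """Format a density with a K/M/G suffix, e.g. 40_000_000 -> '40M'."""
    for threshold, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if value >= threshold:
            return f"{value / threshold:g}{suffix}"
    return f"{value:g}"

@dataclass(frozen=True)
class DensityMetric:
    logic: float         # DL, logic density per mm^2
    memory: float        # DM, memory density per mm^2
    interconnect: float  # DC, logic-to-memory interconnect density per mm^2

    def __str__(self):
        parts = (self.logic, self.memory, self.interconnect)
        return "[" + ", ".join(compact(p) for p in parts) + "]"

# The example system quoted in the text.
example = DensityMetric(logic=40e6, memory=400e6, interconnect=10e3)
label = str(example)
print(label)
```

Keeping the three densities as separate fields, rather than collapsing them into one number, reflects the point that logic, memory, and interconnect each need to scale.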
Expressed in this way, semiconductor roadmaps can continue to project the future development of high-performance circuits into at least the next decade.
A brief overview of this article is provided by IEEE Spectrum.
A Retrospective and Prospective View of Approximate Computing
Access the article by W. Liu, F. Lombardi, and M. Schulte in Proceedings of the IEEE, March 2020.
Historically, computing has been designed to be as accurate and precise as possible. However, many applications do not require high precision, and excess precision has a major cost in terms of power, speed, and area on chip. This has become particularly important in applications such as AI in edge systems, where minimizing power and excess hardware are critical.
The authors survey the field of approximate computing, broadly defined as the variety of techniques in both software and hardware that can reduce precision to an acceptable level, without significantly reducing performance. Looking to the future, they indicate that capabilities for approximate computing can be integrated with tools for circuit and system design, test and verification, reliability, and security.
A future special issue of Proceedings of the IEEE with contributions on Approximate Computing is in preparation for later in 2020.
Accelerators for AI and HPC
Dr. Dejan Milojicic of Hewlett Packard Labs recently led a Virtual Roundtable Discussion on the present and future of accelerator chips for artificial intelligence (AI) and high-performance computing (HPC), which appeared in the February 2020 issue of Computer. The other participants were Paolo Faraboschi, Satoshi Matsuoka, and Avi Mendelson.
The central problem is how to deal with increasing complexity of heterogeneous hardware (CPUs, GPUs, FPGAs, ASICs, and multiple levels of memory) together with software that can efficiently use all of these resources to solve difficult computational problems. This is in addition to possible integration with new types of processors such as neuromorphic and quantum, which may become available in the next decade. All the participants agreed that performance will keep improving for the foreseeable future, in both small-scale (mobile) and large-scale (data center) computing, with continuing challenges along the way.
Benchmarking Delay and Energy of Neural Inference Circuits
By Dmitri Nikonov and Ian Young, Intel
In recent years, a wide variety of device technologies have been developed to implement neural network algorithms for artificial intelligence and machine learning (AI/ML). These include both digital and analog CMOS circuits, as well as various beyond-CMOS devices, such as a range of non-volatile memory arrays. In determining which of these approaches may be preferred for low-power applications, it is important to develop benchmarks that permit quantitative comparison.
The authors first evaluate neural switching on the device level, and compute the switching energy and delay for each technology, on the same series of plots. The results differ by orders of magnitude between different technologies, and even for different devices in similar technologies. They then perform similar computations for total energy and time delay for various prototype neural network chips to perform the same inference algorithm. Again, the results vary by large factors. Analog neural networks are found to be somewhat faster and lower power than digital circuits, for the same degree of precision. While this technology is still developing, this sort of analysis may be useful in evaluating the most promising approaches.
Grand Challenge: Applying Artificial Intelligence and Machine Learning to Cybersecurity
Access the article by Kirk Bresniker, Ada Gavrilovska, James Holt, Dejan Milojicic, and Trung Tran in IEEE Xplore.
Providing future cybersecurity will require integration of AI/ML throughout the network worldwide. Initiating a series of Grand Challenges on this topic will help the community achieve this goal.
The December issue of Computer has a set of open-access feature articles on Technology Predictions. One of these is by Bresniker et al., on how AI/ML can help to address the pervasive and growing problem of cyberattacks. This follows a set of earlier workshops and a 2018 report (PDF, 1 MB) on a similar topic by some of the same authors.
The authors argue that the problem is massive in scale, changes continuously, and demands rapid responses, so it can only be handled by a system of ubiquitous AI agents capable of machine learning. However, these autonomous AI agents must quickly incorporate the insights of the best human cyber analysts, many of whom work privately on non-public data sets. The authors propose that an annual Grand Challenge, with prizes as motivation, can help to bring about the necessary collaborations and competition to achieve this goal. Given the critical nature of the problem to business and government, this should be initiated as soon as possible.