LPIRC 2019 Workshop Invited Speeches
9:40AM - 10:05AM
Award-winning Methods for Interactive Object Detection and Image Classification Challenges at LPIRC-II 2018
Tao Sheng (firstname.lastname@example.org)
The LPIRC is an annual competition for the best technologies in image classification and object detection, measured by both efficiency and accuracy. At LPIRC-II 2018, our Amazon EdgeFlow team won two awards: 1st prize in the interactive object detection challenge and 2nd prize in the interactive image classification challenge. We would like to share our award-winning methods in this paper, which can be summarized in four major steps. First, an 8-bit quantization-friendly model is key to achieving a short execution time while maintaining high accuracy on edge devices. Second, we optimized the network architecture to meet the 100 ms latency requirement on the Pixel 2 phone. Third, we filtered the dataset: after analyzing the training curves in depth, we removed images with small objects from the training set, which significantly improved overall accuracy. Fourth, we optimized non-maximum suppression. By combining these steps with other training techniques, such as a cosine learning-rate schedule and transfer learning, our final solutions won the two prizes out of 94 submitted solutions.
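The abstract does not say how non-maximum suppression was optimized. For readers unfamiliar with the step, here is a minimal sketch of standard greedy NMS in plain Python. The function names, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are illustrative assumptions, not the team's implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

The inner check compares each candidate against every kept box, so the cost grows with the number of detections; this is one reason NMS is a natural target for optimization under a tight latency budget.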
Dr. Tao Sheng is a Senior Deep Learning Engineer at Amazon. He has worked on multiple cutting-edge projects in computer vision and machine learning for more than 10 years. Prior to Amazon, he worked at Qualcomm and Intel. He has strong interests in deep learning for mobile vision, edge AI, and related areas. He has published ten US and international patents and eight papers. Most recently, he led the team that won the First Prize of the IEEE International Low-Power Image Recognition Challenge (LPIRC-I) at CVPR 2018 and two top prizes at LPIRC-II 2018 among a wide variety of global competitors.
The key to winning the LPIRC-II competition was the ability to quantize neural networks without loss of accuracy. The options available in the TF environment decrease accuracy for mobile network architectures and require a long time to fine-tune the neural network, which significantly reduces the number of attempts possible under the time constraints of the competition. To overcome this, we apply our own technique for uniform quantization of neural networks. The main idea of the method is to tune the quantization parameters using the Straight-Through Estimator for the gradients of discrete functions (round, clip) without updating the weights of the neural network, instead of jointly training the network and tuning the quantization parameters. With this approach we speed up our quantization pipeline by reducing the fine-tuning time and the amount of training data to approximately 10% of the ImageNet 2012 training data.
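The idea of tuning only the quantization parameters can be illustrated with a small sketch. It assumes a symmetric uniform quantizer with a single clipping threshold `t`, treats `round` as the identity in the backward pass (the straight-through estimator), and minimizes a simple reconstruction error on fixed weights. The loss, the function names, and the hyperparameters are assumptions made for illustration; the abstract does not specify them, and the authors' method tunes the parameters against the network's training objective.

```python
def fake_quant(x, t, bits=8):
    """Clip x to [-t, t], then snap it to a uniform grid of step t / qmax."""
    qmax = 2 ** (bits - 1) - 1
    step = t / qmax
    clipped = max(-t, min(t, x))
    return round(clipped / step) * step


def grad_wrt_threshold(x, t, bits=8):
    """d fake_quant / d t, with round() treated as identity in the backward pass."""
    qmax = 2 ** (bits - 1) - 1
    if abs(x) >= t:
        # Clipped region: the output is +t or -t.
        return 1.0 if x > 0 else -1.0
    u = x / (t / qmax)
    # Inside the range the derivative reduces to (round(u) - u) / qmax.
    return (round(u) - u) / qmax


def mse(weights, t, bits=8):
    """Mean squared error between the weights and their quantized values."""
    return sum((fake_quant(w, t, bits) - w) ** 2 for w in weights) / len(weights)


def tune_threshold(weights, t, bits=8, lr=0.05, steps=200):
    """Gradient descent on t only; the weights themselves are never updated."""
    for _ in range(steps):
        grad = sum(
            2.0 * (fake_quant(w, t, bits) - w) * grad_wrt_threshold(w, t, bits)
            for w in weights
        ) / len(weights)
        t = max(1e-6, t - lr * grad)
    return t


# Hypothetical frozen weights and a deliberately poor initial threshold.
weights = [0.1 * i for i in range(-10, 11)]
t0 = 0.2
t_tuned = tune_threshold(weights, t0)
```

Because only one scalar per quantizer is learned, far fewer gradient steps and samples are needed than for full fine-tuning, which is consistent with the reduction in data and time described in the abstract.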
Sergey Alyamkin is currently Chief Technology Officer at Expasoft LLC, a company that develops products and provides services using state-of-the-art deep learning and machine learning methods backed by deep domain expertise. Previously, Sergey was a Data Scientist at Baker Hughes for 2 years, where he developed machine learning algorithms for different branches of the business: chemistry, artificial lift, logistics, and production optimization. During this time he co-authored several scientific publications. From 2008 to 2014 he was head of the R&D branch of the Uniscan company, where he and his team developed a set of novel algorithms for seismic sensors that process multichannel seismic data in real time on a low-power DSP. Sergey is the author of 10 scientific publications related to data science and machine learning applications and of a patent for a new type of seismic sensor. He earned his B.A. from Novosibirsk State University in the field of semiconductor physics. His M.A. and Ph.D. relate to advanced machine learning methods for seismic signal processing.
Alexander Goncharenko is currently a Senior Deep Learning Researcher at Expasoft LLC, a company that develops products and provides services using state-of-the-art deep learning and machine learning methods backed by deep domain expertise. Previously, Alexander was a software developer in the computer vision field. His research field was steganography, i.e., the incorporation of hidden messages into images. Alexander is now a third-year Ph.D. student at Novosibirsk State University. His research interests lie in the optimization and speed-up of neural networks.
10:30AM - 11:10AM
Rethinking the Computations in Computer Vision (and the Hardware that Computes Them)
Kurt Keutzer, Berkeley
Computer vision has a long history, and for most of that history the focus has been on accuracy, not computational efficiency.
Following improvements in accuracy due to Deep Learning, latency and energy have finally become primary considerations, particularly at the edge.
In this talk we'll take a step back and reconsider what we need to compute to create accurate Deep Neural Nets that are fast and energy efficient.
In the process we will review the relative computational efficiency of both convolutional and non-convolutional layer structures. We will then turn to examine implications of these structures on the architectures of Neural Net accelerators that will ultimately compute these DNNs. Finally, we will look at the potential to automate the matchmaking between Deep Neural Nets and Neural Net accelerators.
Kurt Keutzer is Professor of the Graduate School at University of California, Berkeley. His research focuses on accelerating the training of Deep Neural Nets in the cloud and on all aspects of creating efficient Deep Learning systems at the edge.
11:10AM - 11:35AM
Turbocharge Deep Learning Inference using Reconfigurable Platforms
Ashish Sirasao (email@example.com), Xilinx
Deep learning algorithms and their combinations are now being used to solve a broad variety of problems. With the increasing scale of deployment, efficient computational hardware has come to the forefront of the large-scale implementation and deployment of deep learning algorithms. A practical implementation requires not only design space exploration of compute units and their connectivity but also the context of full system architectures, in terms of the relationship of deep learning with applications like video encoding/decoding or analytics, speech recognition, anomaly detection in a network switch, and a multitude of other potential applications. Xilinx reconfigurable platforms enable optimization of deep learning inference using variable-precision arithmetic, a customized memory hierarchy, and model compression to address broader figures of merit, namely throughput, latency, energy, and accuracy. In this presentation, we show how reconfigurable platforms can enable an optimized hardware and software co-design methodology for efficient implementation of a wide range of deep learning algorithms in real applications in the cloud and at the edge.
Ashish Sirasao (M. Tech, EE, IIT Mumbai, 1993) is a Fellow Engineer in the Xilinx Software and IP team. He is currently involved in defining and implementing hardware and software architectures for high-performance accelerators in the area of Deep Learning, Data Analytics, Computer Vision, and Video Codecs on Xilinx FPGAs.
11:35AM - 12:00PM
The Art of MobileNet Design
Andrew Howard (firstname.lastname@example.org), Google
We present the background and technical motivations for designing extremely fast neural networks for mobile and embedded vision applications. We cover the design choices for Google’s MobileNet models and what makes them extremely efficient and well matched to the mobile use case in practice. We make theoretical connections between MobileNet building blocks and tensor factorizations. We then describe how to effectively quantize mobile models to run on fixed point devices. We conclude with a look to the future and first steps towards automatically tailoring models to next generation neural network accelerators.
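The abstract does not spell out the factorization, but the usual way to see why MobileNet-style blocks are efficient is to compare the multiply-add count of a standard convolution with that of a depthwise separable one (a depthwise convolution followed by a 1x1 pointwise convolution). The sketch below does that arithmetic; the layer dimensions are illustrative assumptions rather than figures from the talk.

```python
def standard_conv_mult_adds(kernel, in_ch, out_ch, out_size):
    """Multiply-adds of a standard kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch * out_size * out_size


def depthwise_separable_mult_adds(kernel, in_ch, out_ch, out_size):
    """Multiply-adds of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * in_ch * out_size * out_size
    pointwise = in_ch * out_ch * out_size * out_size
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 output map.
std = standard_conv_mult_adds(3, 512, 512, 14)
sep = depthwise_separable_mult_adds(3, 512, 512, 14)
ratio = sep / std  # equals 1/out_ch + 1/kernel**2
```

For a 3x3 kernel the ratio is dominated by the 1/9 term, so the separable block needs roughly eight to nine times fewer multiply-adds than the standard convolution it replaces.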
Andrew Howard is a Staff Software Engineer at Google Research working on efficient computer vision models for mobile applications. He is the originator of Google’s popular MobileNet models. He received his PhD from Columbia University in computer science focusing on machine learning.
12:40PM - 1:10PM
Understanding the Challenges of Algorithm and Hardware Co-design for Deep Neural Networks
Vivienne Sze (email@example.com), MIT
The co-design of algorithm and hardware has become an increasingly important approach for addressing the computational complexity of Deep Neural Networks (DNNs). There are several open problems and challenges in the co-design process and application; for instance, what metrics should be used to drive the algorithm design, how to automate the process in a simple way, how to extend these approaches to tasks beyond image classification, and how to design flexible hardware to support these different approaches. In this talk, we highlight recent and ongoing work that aims to address these challenges, namely energy-aware pruning and NetAdapt, which automatically incorporate direct metrics such as latency and energy into the training and design of the DNN; FastDepth, which extends the co-design approaches to a depth estimation task; and a flexible hardware accelerator called Eyeriss v2 that is computationally efficient across a wide range of diverse DNNs.
Vivienne Sze is an Associate Professor at MIT in the Electrical Engineering and Computer Science Department. Her research interests include energy-aware signal processing algorithms, and low-power circuit and system design for portable multimedia applications, including computer vision, deep learning, autonomous navigation, and video processing/coding. Prior to joining MIT, she was a Member of Technical Staff in the R&D Center at TI, where she designed low-power algorithms and architectures for video coding. She also represented TI in the JCT-VC committee of ITU-T and ISO/IEC standards body during the development of High Efficiency Video Coding (HEVC), which received a Primetime Engineering Emmy Award. She is a co-editor of the book entitled “High Efficiency Video Coding (HEVC): Algorithms and Architectures” (Springer, 2014).
Prof. Sze received the B.A.Sc. degree from the University of Toronto in 2004, and the S.M. and Ph.D. degrees from MIT in 2006 and 2010, respectively. In 2011, she received the Jin-Au Kong Outstanding Doctoral Thesis Prize in Electrical Engineering at MIT. She is a recipient of the 2019 Edgerton Faculty Award, the 2018 Facebook Faculty Award, the 2018 & 2017 Qualcomm Faculty Award, the 2018 & 2016 Google Faculty Research Award, the 2016 AFOSR Young Investigator Research Program (YIP) Award, the 2016 3M Non-Tenured Faculty Award, the 2014 DARPA Young Faculty Award, the 2007 DAC/ISSCC Student Design Contest Award, and a co-recipient of the 2017 CICC Outstanding Invited Paper Award, the 2016 IEEE Micro Top Picks Award and the 2008 A-SSCC Outstanding Design Award.
For more information about research in the Energy-Efficient Multimedia Systems Group at MIT visit: http://www.rle.mit.edu/eems/
1:10PM - 1:40PM
Efficient Deep Learning: Quantizing models without re-training
Tijmen Blankevoort (firstname.lastname@example.org), Qualcomm
In this talk we’ll cover post-training quantization techniques that can significantly improve model accuracy under 8-bit quantization. These techniques are especially useful when training or fine-tuning is not possible, a case that arises very frequently in commercial applications. No training pipeline, optimized hyperparameters, or full training datasets are needed. We show the effectiveness of these techniques for popular models used for inference on resource-constrained devices.
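The abstract does not describe the specific techniques. As background, the sketch below shows the plain baseline they build on: min/max affine quantization of a tensor to 8 bits using no training data at all. The function names and sample weights are illustrative assumptions, and this is not Qualcomm's method.

```python
def choose_qparams(values, num_bits=8):
    """Pick a scale and zero point that map [min, max] onto [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo = min(0.0, min(values))  # widen the range so 0.0 is exactly representable
    hi = max(0.0, max(values))
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = max(0, min(qmax, int(round(-lo / scale))))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weight tensor, flattened to a list.
weights = [-1.0, -0.25, 0.0, 0.5, 1.5]
scale, zero_point = choose_qparams(weights)
q = quantize(weights, scale, zero_point)
recovered = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, recovered))
```

With this scheme the rounding error per value is bounded by half the scale, so a single outlier that stretches the min/max range degrades precision for every other value; that sensitivity is the kind of problem better post-training methods try to reduce.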
Tijmen Blankevoort is the team lead for compression and quantization research at Qualcomm. With a background in Mathematics and Artificial Intelligence, he started a deep learning start-up in 2013 together with Prof. Max Welling, which is now part of Qualcomm. The Amsterdam- and San Diego-based compression/quantization research team focuses on making models deployed to devices more efficient and on ensuring that low-bit quantization can be done without problems, with a special focus on making the process as automatic as possible. Tijmen and his team are conducting new research in this area while bridging the gap between research and practice. In his spare time, Tijmen loves to play Magic: The Gathering, and is a fervent molecular gastronomy cook.
Microcontrollers have extremely stringent memory and computational constraints. In the Visual Wake Words challenge, we solicited model submissions (in TF Lite format) that fit within these resource constraints, specifically peak memory usage, model size, and computational cost. Submissions were scored on accuracy in a two-class image classification task on the Visual Wake Words dataset released by Google, which has two labeled classes: person/not-person. The talk will present the design space, the top submissions in the challenge, and the tools available from TensorFlow Lite for deployment on microcontrollers.
Aakanksha Engineer is an ML Engineer on TensorFlow’s Mobile and Embedded team; she received her PhD from Stanford University in 2013.
Pete Warden is a technical lead of TensorFlow's Mobile and Embedded team, and was previously CTO of Jetpac, acquired by Google in 2014.
2:00PM - 2:20PM
Are You Paying Attention? Classifying Attention in Pivotal Response Treatment Videos
Corey D C Heath (email@example.com), Hemanth Venkateswara (firstname.lastname@example.org), Sethuraman Panchanathan (email@example.com)
Pivotal response treatment (PRT) has been empirically shown to help children with autism spectrum disorder (ASD) improve their communication skills. The child’s primary caregivers can effectively implement PRT when provided with training and support, leading to greater opportunities for the child to improve. Computer vision technology is a critical component of creating more opportunities to support individuals implementing PRT. Automatically extracting data from videos of caregivers’ interactions with their child during PRT sessions would reduce the human effort required to provide assessment and feedback, which would allow experts to provide greater support to more individuals. Additionally, this data could be used to provide immediate automated feedback. The process of extracting data from PRT videos is complicated and provides excellent context for a computer vision challenge. PRT videos show dyadic interactions recorded under “in-the-wild” conditions on ubiquitously available devices, and vary in filming quality. The challenge presented tasks researchers with inferring the child’s attention state in relation to the caregiver in the video based on body pose information and visual cues. Approaches will be evaluated based on accuracy metrics; however, the algorithm’s speed is also important. Fast algorithms will reduce the time between performance and assessment, allowing for greater opportunities to situate feedback in the context of the learning activity. Low-power solutions are also necessary to deliver results on mobile devices.
2:20PM - 2:40PM
Designing efficient on-device AI using TensorFlow Lite
Aakanksha Chowdhery and Bo Chen, Google
The LPIRC workshop encourages researchers to design computer vision models that use resources efficiently on mobile and embedded devices, in terms of latency, energy, and accuracy. We briefly discuss the ML tools available for designing such efficient models, which make it easy to iterate on a model design, and give specific examples from object detection and image classification tasks. We conclude with the TensorFlow Lite roadmap and engage with the community to learn about their requirements.
Bo Chen is a software engineer at Google Inc. He received a PhD in Computation and Neural Systems from the California Institute of Technology. His PhD research focused on speed, energy and accuracy trade-off of biological and artificial visual systems.