| Mario Günzel, Kuan-Hsun Chen, Niklas Ueter, Georg von der Brüggen, Marco Dürr and Jian-Jia Chen. Timing Analysis of Asynchronized Distributed Cause-Effect Chains. In Real-Time and Embedded Technology and Applications Symposium (RTAS) 2021 [BibTeX][Abstract]@inproceedings { guenzel2021e2e,
author = {G\"unzel, Mario and Chen, Kuan-Hsun and Ueter, Niklas and von der Br\"uggen, Georg and D\"urr, Marco and Chen, Jian-Jia},
title = {Timing Analysis of Asynchronized Distributed Cause-Effect Chains},
booktitle = {Real-Time and Embedded Technology and Applications Symposium (RTAS)},
year = {2021},
keywords = {kuan, georg},
confidential = {n},
abstract = {Real-time systems require formal guarantees of timing constraints, not only for the individual tasks but also for the data-propagation paths. A cause-effect chain describes the data flow among multiple tasks, e.g., from sensors to actuators, independent of the priority order of the tasks. In this paper, we provide an end-to-end timing analysis for cause-effect chains on asynchronized distributed systems with periodic task activations, considering the maximum reaction time (duration of data processing) and the maximum data age (worst-case data freshness). On one local electronic control unit (ECU), we show how to compute the exact local (worst-case) end-to-end latencies when the execution time of the periodic tasks is fixed. We further extend our analysis to globally asynchronized systems by combining the local results. Using synthesized data based on an automotive benchmark as well as on randomized parameters, we show that our analytical results improve on the state of the art for periodic task activations.},
}
|
| Hsiang-Yun Cheng, Chun-Feng Wu, Christian Hakert, Kuan-Hsun Chen, Yuan-Hao Chang, Jian-Jia Chen, Chia-Lin Yang and Tei-Wei Kuo. Future Computing Platform Design: A Cross-Layer Design Approach. In Design, Automation and Test in Europe Conference (DATE) 2021 [BibTeX][PDF][Abstract]@inproceedings { Cheng/etal/2021,
author = {Cheng, Hsiang-Yun and Wu, Chun-Feng and Hakert, Christian and Chen, Kuan-Hsun and Chang, Yuan-Hao and Chen, Jian-Jia and Yang, Chia-Lin and Kuo, Tei-Wei},
title = {Future Computing Platform Design: A Cross-Layer Design Approach},
booktitle = {Design, Automation and Test in Europe Conference (DATE)},
year = {2021},
keywords = {kuan, nvm-oma},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2021datecross.pdf},
confidential = {n},
abstract = {Future computing platforms are facing a paradigm shift with the emerging resistive memory technologies. First, they offer fast memory accesses and data persistence in a single large-capacity device deployed on the memory bus, blurring the boundary between memory and storage. Second, they enable computing-in-memory for neuromorphic computing to mitigate costly data movements. Due to the non-ideality of these resistive memory devices at the moment, we envision that cross-layer design is essential to bring such a system into practice. In this paper, we showcase a few examples to demonstrate how cross-layer design can be developed to fully exploit the potential of resistive memories and accelerate its adoption for future computing platforms.},
}
|
| Mikail Yayla, Kuan-Hsun Chen, Georgios Zervakis, Jörg Henkel, Jian-Jia Chen and Hussam Amrouch. FeFET and NCFET for Future Neural Networks: Visions and Opportunities. In Design, Automation and Test in Europe Conference (DATE) 2021 [BibTeX][PDF][Abstract]@inproceedings { yayla/etal/2021,
author = {Yayla, Mikail and Chen, Kuan-Hsun and Zervakis, Georgios and Henkel, J\"org and Chen, Jian-Jia and Amrouch, Hussam},
title = {FeFET and NCFET for Future Neural Networks: Visions and Opportunities},
booktitle = {Design, Automation and Test in Europe Conference (DATE)},
year = {2021},
keywords = {kuan, nvm-oma},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2021datefefet.pdf},
confidential = {n},
abstract = {The goal of this special session paper is to introduce and discuss different emerging technologies for logic circuitry and memory as well as new lightweight architectures for neural networks. We demonstrate how the ever-increasing complexity of Artificial Intelligence (AI) applications, which results in an immense increase in required computational power, inevitably necessitates innovations from the underlying devices all the way up to the architectures. Two promising emerging technologies are presented: (i) the Negative Capacitance Field-Effect Transistor (NCFET) as a new beyond-CMOS technology that offers lower power and/or higher accuracy for neural network inference, and (ii) the Ferroelectric FET (FeFET) as a novel non-volatile, area-efficient and ultra-low-power memory device. In addition, we demonstrate how Binarized Neural Networks (BNNs) offer a promising alternative to traditional Deep Neural Networks (DNNs) due to their lightweight hardware implementation. Finally, we present the challenges of combining FeFET-based NVM with NNs and summarize our perspectives on future NNs and the vital role that emerging technologies may play.},
}
|
| Jian-Jia Chen and Christian Hakert. Tutorial for Full System Simulations of Non-Volatile Main Memories. In Design, Automation and Test in Europe Conference 2021 [BibTeX][Link]@inproceedings { date2021tutorial,
author = {Chen, Jian-Jia and Hakert, Christian},
title = {Tutorial for Full System Simulations of Non-Volatile Main Memories},
booktitle = {Design, Automation and Test in Europe Conference},
year = {2021},
url = {https://video.tu-dortmund.de/m/c25c3c9373cc554bf8007973fe6e81fbb78c9b1f221b9b1b9cce780f9bbe4186676e7036389eb22bc89f1f605c618ab1cf5bf0d7ead76af129bd412b82bb7601},
keywords = {nvm-oma},
confidential = {n},
} |
| Sebastian Buschjäger, Jian-Jia Chen, Kuan-Hsun Chen, Mario Günzel, Katharina Morik, Rodion Novkin, Lukas Pfahler and Mikail Yayla. Bit Error Tolerance Metrics for Binarized Neural Networks. In Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA) 2021 [BibTeX][Link][Abstract]@inproceedings { buschjaegerSLOHA2021,
author = {Buschj\"ager, Sebastian and Chen, Jian-Jia and Chen, Kuan-Hsun and G\"unzel, Mario and Morik, Katharina and Novkin, Rodion and Pfahler, Lukas and Yayla, Mikail},
title = {Bit Error Tolerance Metrics for Binarized Neural Networks},
booktitle = {Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA)},
year = {2021},
url = {https://arxiv.org/abs/2102.00818},
keywords = {kuan},
confidential = {n},
abstract = {To reduce the resource demand of neural network (NN) inference systems, it has been proposed to use approximate memory, in which the supply voltage and the timing parameters are tuned, trading accuracy for energy consumption and performance. Tuning these parameters aggressively leads to bit errors, which can be tolerated by NNs when bit flips are injected during training. However, bit flip training, which is the state of the art for achieving bit error tolerance, does not scale well; it leads to massive overheads and cannot be applied for high bit error rates (BERs). Alternative methods to achieve bit error tolerance in NNs are needed, but the underlying principles behind the bit error tolerance of NNs have not been reported yet. With this lack of understanding, further progress in research on NN bit error tolerance will be restrained. In this study, our objective is to investigate the internal changes that bit flip training causes in NNs, with a focus on Binarized NNs (BNNs). To this end, we quantify the properties of bit error tolerant BNNs with two metrics. First, we propose a neuron-level bit error tolerance metric, which calculates the margin between the pre-activation values and the batch normalization thresholds. Second, to capture the effects of bit error tolerance on the interplay of neurons, we propose an inter-neuron bit error tolerance metric, which measures the importance of each neuron and computes the variance over all importance values. Our experimental results support that these two metrics are strongly related to bit error tolerance.},
}
|
| Christian Hakert and Jian-Jia Chen. [Demo] Tutorial for Full System Simulations of Non-Volatile Main Memories. In Design, Automation and Test in Europe Conference 2021 [BibTeX][Link]@inproceedings { date2021tutorialdemo,
author = {Hakert, Christian and Chen, Jian-Jia},
title = {[Demo] Tutorial for Full System Simulations of Non-Volatile Main Memories},
booktitle = {Design, Automation and Test in Europe Conference},
year = {2021},
url = {https://video.tu-dortmund.de/m/d62742ad8a171810b7af16e983f4e1349ba7ea83a0918ae3944d1659b9c5ae4d0730d7cac39cba40f5851f2001a6c17f5f33b2f67802cb722eb6295618357ddb},
keywords = {nvm-oma},
confidential = {n},
} |
| Sebastian Buschjäger, Jian-Jia Chen, Kuan-Hsun Chen, Mario Günzel, Christian Hakert, Katharina Morik, Rodion Novkin, Lukas Pfahler and Mikail Yayla. Margin-Maximization in Binarized Neural Networks for Optimizing Bit Error Tolerance. In Design, Automation and Test in Europe Conference (DATE), accepted 2021, Best Paper Award Candidate [BibTeX][PDF][Abstract]@inproceedings { buschjaeger/etal/2021,
author = {Buschj\"ager, Sebastian and Chen, Jian-Jia and Chen, Kuan-Hsun and G\"unzel, Mario and Hakert, Christian and Morik, Katharina and Novkin, Rodion and Pfahler, Lukas and Yayla, Mikail},
title = {Margin-Maximization in Binarized Neural Networks for Optimizing Bit Error Tolerance},
booktitle = {Design, Automation and Test in Europe Conference (DATE), accepted},
year = {2021},
note = {Best Paper Award Candidate},
keywords = {kuan, nvm-oma},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2021dateyayla.pdf},
confidential = {n},
abstract = {To overcome the memory wall in neural network (NN) inference systems, recent studies have proposed to use approximate memory, in which the supply voltage and access latency parameters are tuned, for lower energy consumption and faster access at the cost of reliability. To tolerate the occurring bit errors, state-of-the-art approaches apply bit flip injections to the NNs during training, which incur high overheads and do not scale well for large NNs and high bit error rates. In this work, we focus on binarized NNs (BNNs), whose simpler structure allows better exploration of bit error tolerance metrics based on margins. We provide formal proofs to quantify the maximum number of bit flips that can be tolerated. With the proposed margin-based metrics and the well-known hinge loss for maximum-margin classification in support vector machines (SVMs), we construct a modified hinge loss (MHL) to train BNNs for bit error tolerance without any bit flip injections. Our experimental results indicate that the MHL enables BNNs to tolerate higher bit error rates than bit flip training and therefore allows the requirements on approximate memories used for BNNs to be lowered further.},
}
|
| Christian Hakert, Asif-Ali Khan, Kuan-Hsun Chen, Fazal Hameed, Jeronimo Castrillon and Jian-Jia Chen. BLOwing Trees to the Ground: Layout Optimization of Decision Trees on Racetrack Memory. In 58th ACM/IEEE Design Automation Conference (DAC), accepted 2021 [BibTeX][Abstract]@inproceedings { HakertDAC21,
author = {Hakert, Christian and Khan, Asif-Ali and Chen, Kuan-Hsun and Hameed, Fazal and Castrillon, Jeronimo and Chen, Jian-Jia},
title = {BLOwing Trees to the Ground: Layout Optimization of Decision Trees on Racetrack Memory},
booktitle = {58th ACM/IEEE Design Automation Conference (DAC), accepted},
year = {2021},
keywords = {kuan, nvm-oma},
confidential = {n},
abstract = {Modern embedded systems integrate machine learning algorithms. In resource-constrained setups, their execution has to be optimized for execution time and energy. In racetrack memory (RTM), data needs to be shifted to an access port before it can be accessed. We propose a novel domain-specific approach for placing decision trees in RTMs. We reduce the total number of shifts by exploiting the tree structure. We prove that the theoretically optimal decision tree placement is at most 4x better in terms of shifts than our proposed approach. Through extensive experiments, we show that our method outperforms the state-of-the-art methods.},
}
|