Qiao Yu, Kuan-Hsun Chen and Jian-Jia Chen. Using a Set of Triangle Inequalities to Accelerate K-means Clustering. In Similarity Search and Applications - 13th International Conference (SISAP), Virtual Conference, Sep 30 - Oct 2, 2020.

@inproceedings { yu2020,
author = {Yu, Qiao and Chen, Kuan-Hsun and Chen, Jian-Jia},
title = {Using a Set of Triangle Inequalities to Accelerate K-means Clustering},
booktitle = {Similarity Search and Applications - 13th International Conference (SISAP)},
year = {2020},
editor = {Shin'ichi Satoh and Lucia Vadicamo and Arthur Zimek and Fabio Carrara and Ilaria Bartolini and Martin Aum\"uller and Bj\"orn Þór Jónsson and Rasmus Pagh},
address = {Virtual Conference},
month = {Sep 30 - Oct 2},
publisher = {Springer},
keywords = {kuan},
confidential = {n},
abstract = {The k-means clustering is a well-known problem in data mining and machine learning. However, the de facto standard, i.e., Lloyd’s k-means algorithm, spends a large amount of time on distance calculations. Elkan’s k-means algorithm, one prominent approach, exploits the triangle inequality to greatly reduce such distance calculations between points and centers, while achieving exactly the same clustering results with significant speed improvement, especially on high-dimensional datasets. In this paper, we propose a set of triangle inequalities to enhance the filtering step of Elkan’s k-means algorithm. With our new filtering bounds, a filtering-based Elkan (FB-Elkan) is proposed, which preserves the same results as Lloyd’s k-means algorithm and additionally prunes unnecessary distance calculations. In addition, a memory-optimized Elkan (MO-Elkan) is provided, in which the space complexity is greatly reduced by trading off the maintenance of lower bounds against run-time efficiency. Through evaluations with real-world datasets, FB-Elkan in general accelerates the original Elkan’s k-means algorithm for high-dimensional datasets (up to 1.69x), whereas MO-Elkan outperforms the others for low-dimensional datasets (up to 2.48x). Specifically, when the datasets have a large number of points, i.e., n ≥ 5M, MO-Elkan can still derive the exact clustering results, while the original Elkan’s k-means algorithm is not applicable due to memory limitations.},
}
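The filtering principle underlying Elkan-style algorithms can be illustrated with a minimal sketch. The Python fragment below (illustrative only, not the paper's FB-Elkan or MO-Elkan) skips a candidate center c_j whenever d(c_best, c_j) >= 2·d(x, c_best), since the triangle inequality then guarantees that c_j cannot be closer than the current best center.

```python
import numpy as np

def assign_with_filtering(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality proves unnecessary."""
    # Pairwise center-to-center distances, computed once per iteration.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(points), dtype=int)
    skipped = 0
    for i, x in enumerate(points):
        best = 0
        d_best = np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then
            # d(x, c_j) >= d(c_best, c_j) - d(x, c_best) >= d(x, c_best),
            # so center j can be filtered out without computing d(x, c_j).
            if cc[best, j] >= 2.0 * d_best:
                skipped += 1
                continue
            d_j = np.linalg.norm(x - centers[j])
            if d_j < d_best:
                best, d_best = j, d_j
        labels[i] = best
    return labels, skipped
```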
Yunfeng Huang, Fang-Jing Wu, Christian Hakert, Georg von der Brüggen, Kuan-Hsun Chen, Jian-Jia Chen, Patrick Böcker, Petr Chernikov, Luis Cruz, Zeyi Duan, Ahmed Gheith, Yantao Gong, Anand Gopalan, Karthik Prakash, Ammar Tauqir and Yue Wang. Demo Abstract: Perception vs. Reality - Never Believe in What You See. In 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), Virtual Conference, 2020.

@inproceedings { ipsndemo2020,
author = {Huang, Yunfeng and Wu, Fang-Jing and Hakert, Christian and Br\"uggen, Georg von der and Chen, Kuan-Hsun and Chen, Jian-Jia and B\"ocker, Patrick and Chernikov, Petr and Cruz, Luis and Duan, Zeyi and Gheith, Ahmed and Gong, Yantao and Gopalan, Anand and Prakash, Karthik and Tauqir, Ammar and Wang, Yue},
title = {Demo Abstract: Perception vs. Reality - Never Believe in What You See},
booktitle = {19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN)},
year = {2020},
address = {Virtual Conference},
keywords = {kuan, georg},
file = {https://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2020-ipsn.pdf},
confidential = {n},
abstract = {The increasing availability of heterogeneous ambient sensing systems challenges the corresponding information processing systems to analyse and compare a variety of different systems in a single scenario. For instance, localization of objects can be performed by image processing systems as well as by radio-based localization. If such systems are utilized to localize the same objects, synergy of the outputs is important to enable comparable and meaningful analysis. This demo showcases the practical deployment and challenges of such an example system.},
}
Christian Hakert, Kuan-Hsun Chen, Mikail Yayla, Georg von der Brüggen, Sebastian Bloemeke and Jian-Jia Chen. Software-Based Memory Analysis Environments for In-Memory Wear-Leveling. In 25th Asia and South Pacific Design Automation Conference ASP-DAC 2020, Invited Paper, Beijing, China, 2020.

@inproceedings { nvmsimulator,
author = {Hakert, Christian and Chen, Kuan-Hsun and Yayla, Mikail and Br\"uggen, Georg von der and Bloemeke, Sebastian and Chen, Jian-Jia},
title = {Software-Based Memory Analysis Environments for In-Memory Wear-Leveling},
booktitle = {25th Asia and South Pacific Design Automation Conference ASP-DAC 2020, Invited Paper},
year = {2020},
address = {Beijing, China},
keywords = {kuan, nvm-oma, georg},
file = {https://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2020-aspdac-nvm.pdf},
confidential = {n},
abstract = {Emerging non-volatile memory (NVM) architectures are considered a replacement for DRAM and storage in the near future, since NVMs provide low power consumption, fast access speed, and low unit cost. Due to the lower write endurance of NVMs, several in-memory wear-leveling techniques have been studied over the last years. Since most approaches propose or rely on specialized hardware, the techniques are often evaluated based on assumptions and in-house simulations rather than on real systems. To address this issue, we develop a setup consisting of a gem5 instance and an NVMain2.0 instance, which simulates an entire system (CPU, peripherals, etc.) together with an NVM plugged into the system. Taking recorded memory access patterns from the low-level simulation into consideration when designing and optimizing wear-leveling techniques as operating system services allows a cross-layer design of wear-leveling techniques. With the insights gathered by analyzing the recorded memory access patterns, we develop a software-only wear-leveling solution, which does not require special hardware at all. This algorithm is afterwards evaluated with the full-system simulation.},
}
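As a hint of the kind of offline analysis such a setup enables, the sketch below aggregates per-page write counts from a recorded access trace. The "op address" line format and the 4 KiB page size are assumptions chosen for illustration, not the actual gem5/NVMain trace format.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page granularity

def write_histogram(trace_lines):
    """Count writes per page from a recorded memory access trace,
    where each line is assumed to look like 'W 0x1a2b3c'."""
    writes = Counter()
    for line in trace_lines:
        op, addr = line.split()[:2]
        if op == "W":
            writes[int(addr, 16) // PAGE_SIZE] += 1
    return writes

# A wear-leveling service would aim to flatten this histogram, e.g. by
# periodically remapping the most-written pages onto the least-worn frames.
```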
Wei-Chun Cheng, Shuo-Han Chen, Yuan-Hao Chang, Kuan-Hsun Chen, Jian-Jia Chen, Tseng-Yi Chen, Ming-Chang Yang and Wei-Kuan Shih. NS-FTL: Alleviating the Uneven Bit-Level Wearing of NVRAM-based FTL via NAND-SPIN. In 9th Non-Volatile Memory Systems and Applications Symposium (NVMSA), Virtual Conference, 2020.

@inproceedings { most2020nvmsa,
author = {Cheng, Wei-Chun and Chen, Shuo-Han and Chang, Yuan-Hao and Chen, Kuan-Hsun and Chen, Jian-Jia and Chen, Tseng-Yi and Yang, Ming-Chang and Shih, Wei-Kuan},
title = {NS-FTL: Alleviating the Uneven Bit-Level Wearing of NVRAM-based FTL via NAND-SPIN},
booktitle = {9th Non-Volatile Memory Systems and Applications Symposium (NVMSA)},
year = {2020},
address = {Virtual Conference},
keywords = {kuan, nvm-oma},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2020nvmsa-ftl.pdf},
confidential = {n},
abstract = {Non-volatile random access memory (NVRAM) has been regarded as a promising DRAM alternative with its non-volatility, near-zero idle power consumption, and byte addressability. In particular, some NVRAM devices, such as Spin Torque Transfer (STT) RAM, can provide the same or better access performance and lower power consumption when compared with dynamic random access memory (DRAM). These features make NVRAM an attractive DRAM replacement on NAND flash storage for resolving the management overhead of the flash translation layer (FTL). For instance, when adopting NVRAM for storing the mapping entries of the FTL, the overheads of loading and storing the mapping entries between the non-volatile NAND flash and the volatile DRAM can be eliminated. Nevertheless, due to the limited lifetime constraint of NVRAM, the bit-level update behavior of the FTL may lead to uneven bit-level wearing, and the lifetime capacity of less-worn NVRAM cells could be underutilized. This observation motivates this study to utilize the emerging NAND-like Spin Torque Transfer memory (NAND-SPIN) to alleviate the uneven bit-level wearing of NVRAM-based FTL and make the best of the lifetime capacity of each NAND-SPIN cell. The experimental results show that the proposed design can effectively avoid uneven bit-level wearing, when compared with a page-based FTL on NAND-SPIN.},
}
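The uneven bit-level wearing that the paper targets can be made concrete with a small experiment: when a mapping entry is repeatedly overwritten with sequentially allocated page numbers, its low-order bits flip far more often than the high-order bits. The sketch below is illustrative only and does not reflect the proposed NS-FTL design.

```python
def bit_flip_counts(values, width=32):
    """Per-bit-position flip counts across successive values of one
    mapping entry; shows why wear concentrates in low-order bits."""
    flips = [0] * width
    for old, new in zip(values, values[1:]):
        diff = old ^ new
        for b in range(width):
            flips[b] += (diff >> b) & 1
    return flips

# Sequentially increasing physical page numbers, as a log-structured FTL
# allocating pages in order might produce (illustrative assumption):
print(bit_flip_counts(list(range(1024)))[:8])
# Bit 0 flips on every update, bit 1 on every second one, and so on.
```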
Jian-Jia Chen, Wen-Hung Huang, Georg von der Brüggen, Kuan-Hsun Chen and Niklas Ueter. On the Formalism and Properties of Timing Analyses in Real-Time Embedded Systems. In A Journey of Embedded and Cyber-Physical Systems, Springer, pages 37-55, 2020.

@inbook { Chen/etal/2020,
author = {Chen, Jian-Jia and Huang, Wen-Hung and Br\"uggen, Georg von der and Chen, Kuan-Hsun and Ueter, Niklas},
title = {A Journey of Embedded and Cyber-Physical Systems},
editor = {Chen, Jian-Jia},
chapter = {On the Formalism and Properties of Timing Analyses in Real-Time Embedded Systems},
pages = {37-55},
publisher = {Springer},
year = {2020},
url = {https://library.oapen.org/bitstream/handle/20.500.12657/41302/2021_Book_AJourneyOfEmbeddedAndCyber-Phy.pdf?sequence=1#page=49},
keywords = {kuan},
confidential = {n},
abstract = {The advanced development of embedded computing devices, accessible networks, and sensor devices has triggered the emergence of complex cyber-physical systems (CPS). In such systems, advanced embedded computing and information processing systems heavily interact with the physical world. Cyber-physical systems are integrations of computation, networking, and physical processes to achieve high stability, performance, reliability, robustness, and efficiency [26]. A cyber-physical system continuously monitors and affects the physical environment, which also interactively imposes feedback on the information processing system. The applications of CPS include healthcare, automotive systems, aerospace, power grids, water distribution, disaster recovery, etc.
Due to their intensive interaction with the physical world, in which time naturally progresses, timeliness is an essential requirement of correctness for CPS. Communication and computation of safety-critical tasks should be finished within a specified amount of time, called the deadline. Otherwise, even if the results are correctly delivered from the functional perspective, the reaction of the CPS may be too late and have catastrophic consequences. One example is the release of an airbag in a vehicle, which only functions properly if the bag is filled with the correct amount of air in the correct time interval after a collision, even in the worst-case timing scenario. While in an entertainment gadget a delayed computation result is inconvenient, in the control of a vehicle it can be fatal. Therefore, a modern society cannot adopt a technological advance when it is not safe.},
}
Christian Hakert, Kuan-Hsun Chen, Simon Kuenzer, Sharan Santhanam, Shuo-Han Chen, Yuan-Hao Chang, Felipe Huici and Jian-Jia Chen. Split’n Trace NVM: Leveraging Library OSes for Semantic Memory Tracing. In 9th Non-Volatile Memory Systems and Applications Symposium (NVMSA), Virtual Conference, 2020.

@inproceedings { hakert2020nvmsa,
author = {Hakert, Christian and Chen, Kuan-Hsun and Kuenzer, Simon and Santhanam, Sharan and Chen, Shuo-Han and Chang, Yuan-Hao and Huici, Felipe and Chen, Jian-Jia},
title = {Split’n Trace NVM: Leveraging Library OSes for Semantic Memory Tracing},
booktitle = {9th Non-Volatile Memory Systems and Applications Symposium (NVMSA)},
year = {2020},
address = {Virtual Conference},
keywords = {kuan, nvm-oma},
file = {https://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2020-nvmsa-hakert.pdf},
confidential = {n},
abstract = {With the rise of non-volatile memory (NVM) as a replacement for traditional main memories (e.g., DRAM), memory access analysis is becoming an increasingly important topic. NVMs suffer from technical shortcomings, such as reduced cell endurance, which call for precise memory access analysis in order to design maintenance strategies that can extend the memory’s lifetime. While existing memory access analyzers trace memory accesses at various levels, from the application level with code instrumentation down to the hardware level where software is executed on special analysis hardware, they usually interpret main memory as a consecutive area, without investigating the application semantics of different memory regions.
In contrast, this paper presents a memory access simulator, which splits the main memory into semantic regions and enriches the simulation result with semantics from the analyzed application. We leverage a library-based operating system called Unikraft by ascribing memory regions of the simulation to the relevant OS libraries. This novel approach allows us to derive a detailed analysis of which libraries (and thus functionalities) are responsible for which memory access patterns. Through offline profiling with our simulator, we provide a fine-granularity analysis of memory access patterns that provides insights for the design of efficient NVM maintenance strategies.},
}
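A minimal sketch of the region-attribution idea: given the address ranges of the libraries linked into the unikernel, each traced access can be ascribed to a library by interval lookup. The region layout and library names below are hypothetical placeholders, not values extracted from Unikraft.

```python
import bisect

def build_region_index(regions):
    """regions: list of (start, end, library_name), non-overlapping.
    Returns sorted region starts plus the sorted regions themselves,
    so lookups can use binary search."""
    regions = sorted(regions)
    return [r[0] for r in regions], regions

def attribute(address, starts, regions):
    """Map one accessed address to the owning library, or 'unknown'."""
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0 and address < regions[i][1]:
        return regions[i][2]
    return "unknown"

starts, regions = build_region_index([
    (0x1000, 0x8000, "libc"),        # hypothetical layout
    (0x8000, 0x9000, "libnetwork"),
])
print(attribute(0x8123, starts, regions))  # -> libnetwork
```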
Niklas Ueter, Kuan-Hsun Chen and Jian-Jia Chen. Project-Based CPS Education: A Case Study of an Autonomous Driving Student Project. IEEE Design & Test, 2020.

@article { CPSeducation2020,
author = {Ueter, Niklas and Chen, Kuan-Hsun and Chen, Jian-Jia},
title = {Project-Based CPS Education: A Case Study of an Autonomous Driving Student Project},
journal = {IEEE Design \& Test},
year = {2020},
url = {https://ieeexplore.ieee.org/abstract/document/9149697},
keywords = {kuan},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/dt-cps-education.pdf},
confidential = {n},
abstract = {Classic lecture- and exercise-based teaching is the predominant way to educate computer science and engineering students at university, partly due to the time constraints resulting from extensive curricula. We show that project-based cyber-physical systems (CPS) education, based on a relatively extensive and complex engineering problem, helps students learn to transfer their acquired fundamental knowledge to real-world applications and to handle non-idealized problems. In this paper, we explain the educational concepts and theoretic foundations of our autonomous driving project-based course and report the students’ results, e.g., a management framework, a simulation environment, and navigation algorithms. To evaluate the conjecture of our proposed concept, we review the anonymous ratings conducted by the university faculty and discuss the final results.},
}
Christian Hakert, Kuan-Hsun Chen, Paul R. Genssler, Georg von der Brüggen, Lars Bauer, Hussam Amrouch, Jian-Jia Chen and Jörg Henkel. SoftWear: Software-Only In-Memory Wear-Leveling for Non-Volatile Main Memory. CoRR abs/2004.03244, 2020.

@article { hakert2020softwear,
author = {Hakert, Christian and Chen, Kuan-Hsun and Genssler, Paul R. and Br\"uggen, Georg von der and Bauer, Lars and Amrouch, Hussam and Chen, Jian-Jia and Henkel, J\"org},
title = {SoftWear: Software-Only In-Memory Wear-Leveling for Non-Volatile Main Memory},
journal = {CoRR},
year = {2020},
volume = {abs/2004.03244},
url = {https://arxiv.org/pdf/2004.03244.pdf},
keywords = {kuan, nvm-oma, georg},
confidential = {n},
abstract = {Several emerging technologies for byte-addressable non-volatile memory (NVM) have been considered to replace DRAM as the main memory in computer systems in recent years. The disadvantage of NVM technologies like Phase-Change Memory (PCM) or Ferroelectric RAM (FeRAM), namely a lower write endurance compared to DRAM, has been addressed in the literature. As a solution, in-memory wear-leveling techniques have been proposed, which aim to balance the wear level over all memory cells to achieve an increased memory lifetime. Generally, to apply such advanced aging-aware wear-leveling techniques proposed in the literature, additional special hardware is introduced into the memory system to provide the necessary information about the cell age and thus enable aging-aware wear-leveling decisions.
This paper proposes software-only aging-aware wear-leveling based on common CPU features that does not rely on any additional hardware support from the memory subsystem. Specifically, we exploit the memory management unit (MMU), performance counters, and interrupts to approximate the memory write counts as an aging indicator. Although the software-only approach may lead to slightly worse wear-leveling, it is applicable on commonly available hardware. We achieve page-level coarse-grained wear-leveling by approximating the current cell age through statistical sampling and performing physical memory remapping through the MMU. This method results in non-uniform memory usage patterns within a memory page. Hence, we further propose a fine-grained wear-leveling in the stack region of C/C++ compiled software.
By applying both wear-leveling techniques, we achieve up to 78.43% of the ideal memory lifetime, which is a lifetime improvement of more than a factor of 900 compared to the lifetime without any wear-leveling.},
}
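A rough sketch of the two software-only building blocks, under strong simplifying assumptions: per-page write counts are approximated by statistical sampling (standing in for the paper's performance-counter mechanism), and the most-written page is paired with the least-worn frame for remapping. The `sampler` callable and the wear table are hypothetical placeholders; on real hardware the remap would be an MMU page-table update.

```python
import random

def sample_write_estimates(pages, sampler, num_samples):
    """Approximate per-page write counts from sampled write events.
    'sampler' is a hypothetical callable returning the page hit by
    one sampled write."""
    est = {p: 0 for p in pages}
    for _ in range(num_samples):
        est[sampler()] += 1
    return est

def remap_hottest(est, wear):
    """Pick the most-written page and the least-worn frame to swap."""
    hot = max(est, key=est.get)
    cold = min(wear, key=wear.get)
    return hot, cold  # pair to exchange in the page table

pages = list(range(8))
# Hypothetical workload: page 3 is write-hot.
demo = lambda: 3 if random.random() < 0.7 else random.choice(pages)
est = sample_write_estimates(pages, demo, 1000)
wear = {frame: 0 for frame in range(8)}  # assumed per-frame wear table
print(remap_hottest(est, wear))  # -> (3, 0)
```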
Sebastian Buschjäger, Jian-Jia Chen, Kuan-Hsun Chen, Mario Günzel, Christian Hakert, Katharina Morik, Rodion Novkin, Lukas Pfahler and Mikail Yayla. Towards Explainable Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks. CoRR abs/2002.00909, 2020.

@article { buschjger2020explainable,
author = {Buschj\"ager, Sebastian and Chen, Jian-Jia and Chen, Kuan-Hsun and G\"unzel, Mario and Hakert, Christian and Morik, Katharina and Novkin, Rodion and Pfahler, Lukas and Yayla, Mikail},
title = {Towards Explainable Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks},
journal = {CoRR},
year = {2020},
volume = {abs/2002.00909},
url = {https://arxiv.org/pdf/2002.00909.pdf},
keywords = {kuan, nvm-oma, mario},
confidential = {n},
abstract = {Non-volatile memory, such as resistive RAM (RRAM), is an emerging energy-efficient storage, especially for low-power machine learning models on the edge. It is reported, however, that the bit error rate of RRAMs can be up to 3.3% in the ultra-low-power setting, which might be crucial for many use cases. Binarized neural networks (BNNs), a resource-efficient variant of neural networks (NNs), can tolerate a certain percentage of errors without a loss in accuracy and demand lower resources in computation and storage. The bit error tolerance (BET) in BNNs can be achieved by flipping the weight signs during training, as proposed by Hirtzlin et al., but their method has a significant drawback, especially for fully connected neural networks (FCNNs): the FCNNs overfit to the error rate used in training, which leads to low accuracy under lower error rates. In addition, the underlying principles of BET are not investigated. In this work, we improve the training for BET of BNNs and aim to explain this property. We propose straight-through gradient approximation to improve the weight-sign-flip training, by which BNNs adapt less to the bit error rates. To explain the achieved robustness, we define a metric that aims to measure BET without fault injection. We evaluate the metric and find that it correlates with accuracy over error rate for all FCNNs tested. Finally, we explore the influence of a novel regularizer that optimizes with respect to this metric, with the aim of providing a configurable trade-off in accuracy and BET.},
}
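To make the fault model concrete: a common way to evaluate bit error tolerance is to flip binarized weights independently with a given probability and re-measure accuracy. The sketch below follows this standard fault-injection style; it is not the paper's proposed metric, which explicitly avoids fault injection.

```python
import numpy as np

def inject_bit_errors(signs, error_rate, rng=None):
    """Flip each binarized weight (+1/-1) independently with the given
    probability, mimicking RRAM bit errors."""
    rng = rng or np.random.default_rng()
    flips = rng.random(signs.shape) < error_rate
    return np.where(flips, -signs, signs)

# Hypothetical binarized weight matrix of one layer:
weights = np.sign(np.random.default_rng(1).standard_normal((4, 8)))
faulty = inject_bit_errors(weights, error_rate=0.033)  # ~3.3% errors
```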
Lea Schönberger, Georg von der Brüggen, Kuan-Hsun Chen, Benjamin Sliwa, Hazem Youssef, Aswin Ramachandran, Christian Wietfeld, Michael ten Hompel and Jian-Jia Chen. Offloading Safety- and Mission-Critical Tasks via Unreliable Connections. In 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020), Virtual Conference, June 2020.

@inproceedings { schoenberger2020ecrts,
author = {Sch\"onberger, Lea and Br\"uggen, Georg von der and Chen, Kuan-Hsun and Sliwa, Benjamin and Youssef, Hazem and Ramachandran, Aswin and Wietfeld, Christian and ten Hompel, Michael and Chen, Jian-Jia},
title = {Offloading Safety- and Mission-Critical Tasks via Unreliable Connections},
booktitle = {32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)},
year = {2020},
address = {Virtual Conference},
month = {June},
url = {https://drops.dagstuhl.de/opus/volltexte/2020/12381/},
keywords = {lea, georg, kuan},
confidential = {n},
abstract = {For many cyber-physical systems, e.g., IoT systems and autonomous vehicles, offloading workload to auxiliary processing units has become crucial. However, since this approach highly depends on network connectivity and responsiveness, typically only non-critical tasks are offloaded, which have less strict timing requirements than critical tasks. In this work, we provide two protocols that allow offloading critical and non-critical tasks alike, while providing different service levels for non-critical tasks in the event of an unsuccessful offloading operation, depending on the respective system requirements. We analyze the worst-case timing behavior of the local cyber-physical system and, based on these analyses, provide a sufficient schedulability test for each of the proposed protocols. In comprehensive experiments, we show that our protocols have reasonable acceptance ratios under the provided schedulability tests. Moreover, we demonstrate that the system behavior under our proposed protocols strongly depends on the probability of unsuccessful offloading operations, the percentage of critical tasks in the system, and the amount of offloaded workload.},
}
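The fallback idea behind such protocols can be sketched as follows: attempt the offload, and if no result arrives within the time budget that still permits meeting the deadline with the local version, execute locally. This is a schematic illustration only, not the paper's analyzed protocols, which are defined for fixed-priority real-time task systems.

```python
import concurrent.futures

def run_with_offload(task_local, task_remote, budget_s):
    """Offload task_remote; fall back to task_local if no result arrives
    within budget_s (the latest start time of the local fallback)."""
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = ex.submit(task_remote)
    try:
        return fut.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        # Offloading deemed unsuccessful: provide the (possibly degraded)
        # local service level so the deadline is still met.
        return task_local()
    finally:
        ex.shutdown(wait=False)  # don't block on the straggling offload
```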
Marcel Ebbrecht, Kuan-Hsun Chen and Jian-Jia Chen. Bucket of Ignorance: A Hybrid Data Structure for Timing Mechanism in Real-Time Operating Systems. In 26th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Brief Presentations Track (BP), Virtual Conference (accepted for presentation), April 2020.

@inproceedings { ebbrecht2020timer,
author = {Ebbrecht, Marcel and Chen, Kuan-Hsun and Chen, Jian-Jia},
title = {Bucket of Ignorance: A Hybrid Data Structure for Timing Mechanism in Real-Time Operating Systems},
booktitle = {26th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Brief Presentations Track (BP)},
year = {2020},
address = {Virtual Conference (accepted for presentation)},
month = {April},
url = {https://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/ebrrechttimer.pdf},
keywords = {kuan},
confidential = {n},
abstract = {To maintain deterministic timing behaviors, Real-Time Operating Systems (RTOSes) require not only a task scheduler but also a timing mechanism for the periodicity of recurrent tasks. Most existing open-source RTOSes implement either a tree-based or a list-based mechanism to track which task is ready to be released on-the-fly. Although tree-based mechanisms are known to be efficient in time complexity for search operations, the additional effort for processing removals and insertions is not negligible and may countervail that advantage, compared to list-based timer managers, even for small task sets. In this work, we provide a simulation framework, which is ready to be released, to investigate existing timing mechanisms and analyze how they perform under certain conditions. Through extensive simulations, we show that our proposed solution indeed requires less computation effort than conventional timing mechanisms when the task set size is in the range of 16 to 208.},
}
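One plausible reading of a hybrid bucket structure is a hashed timing wheel: timers are hashed into coarse buckets by expiry time, and each bucket keeps a small ordered heap, so insertions and removals touch only one bucket while the per-tick scan pops only timers due in the current slot. The sketch below is a generic textbook variant, not the paper's exact design, and omits overflow handling for timers more than one wheel revolution in the future.

```python
import heapq

class BucketTimer:
    """Hashed timing wheel: coarse buckets of min-heaps keyed by expiry."""

    def __init__(self, num_buckets=64, granularity=1):
        self.buckets = [[] for _ in range(num_buckets)]
        self.n, self.g = num_buckets, granularity
        self._seq = 0  # tie-breaker so the heap never compares tasks

    def insert(self, expiry, task):
        slot = (expiry // self.g) % self.n
        heapq.heappush(self.buckets[slot], (expiry, self._seq, task))
        self._seq += 1

    def expire(self, now):
        """Return tasks from the current slot whose expiry <= now."""
        slot = (now // self.g) % self.n
        due, bucket = [], self.buckets[slot]
        while bucket and bucket[0][0] <= now:
            due.append(heapq.heappop(bucket)[2])
        return due
```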