| Daniel Cordes and Peter Marwedel. Multi-Objective Aware Extraction of Task-Level Parallelism Using Genetic Algorithms. In Proceedings of Design, Automation and Test in Europe (DATE 2012) Dresden, Germany, March 2012 [BibTeX][PDF][Abstract]@inproceedings { cordes:12:date,
author = {Cordes, Daniel and Marwedel, Peter},
title = {Multi-Objective Aware Extraction of Task-Level Parallelism Using Genetic Algorithms},
booktitle = {Proceedings of Design, Automation and Test in Europe (DATE 2012)},
year = {2012},
address = {Dresden, Germany},
month = {mar},
keywords = {Automatic Parallelization, Embedded Software, Multi-Objective, Genetic Algorithms, Task-Level Parallelism, Energy awareness},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2012-date-cordes.pdf},
confidential = {n},
abstract = {A large amount of research work has been done in the area of automatic parallelization for decades, resulting in a large number of tools which should relieve the designer from the burden of manually parallelizing an application. Unfortunately, most of these tools only optimize the execution time by splitting up applications into concurrently executed tasks. In the domain of embedded devices, however, it is not sufficient to look only at this criterion. Since most of these devices are constraint-driven regarding execution time, energy consumption, heat dissipation and other objectives, a good trade-off has to be found to efficiently map applications to multiprocessor system-on-chip (MPSoC) devices. Therefore, we developed a fully automated multi-objective aware parallelization framework, which optimizes different objectives at the same time. The tool returns a Pareto-optimal front of solutions of the parallelized application to the designer, so that the solution with the best trade-off can be chosen.},
}
|
| Olivera Jovanovic, Nils Kneuper, Peter Marwedel and Michael Engel. ILP-based Memory-Aware Mapping Optimization for MPSoCs. In The 10th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing Paphos, Cyprus, December 2012 [BibTeX][PDF][Abstract]@inproceedings { jovanovic:2012b,
author = {Jovanovic, Olivera and Kneuper, Nils and Marwedel, Peter and Engel, Michael},
title = {ILP-based Memory-Aware Mapping Optimization for MPSoCs},
booktitle = {The 10th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing},
year = {2012},
address = {Paphos, Cyprus},
month = {dec},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2012-cse-jovanovic.pdf},
confidential = {n},
abstract = {The mapping of applications onto multiprocessor system-on-chip (MPSoC) devices is an important and complex optimization task. The goal is to efficiently distribute application tasks to available processors while optimizing for energy or runtime. Unfortunately, the influence of memories or memory hierarchies has not been considered in existing mapping optimizations so far, even though it is a well-known fact that memories have a drastic impact on the runtime and energy consumption of the system.
In this paper, we address the challenge of finding an efficient application-to-MPSoC mapping while explicitly considering the underlying memory subsystem and an efficient mapping of tasks' memory objects to memories. For this purpose, we developed a memory-aware mapping tool based on ILP optimization. Evaluations on various benchmarks show that our memory-aware mapping tool outperforms state-of-the-art mapping optimizations by reducing runtime by up to 18% and energy consumption by up to 21%.},
}
|
| Sascha Plazar, Jan Kleinsorge, Heiko Falk and Peter Marwedel. WCET-aware Static Locking of Instruction Caches. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 44-52 San Jose, CA, USA, April 2012 [BibTeX][Link][Abstract]@inproceedings { plazar:2012:cgo,
author = {Plazar, Sascha and Kleinsorge, Jan and Falk, Heiko and Marwedel, Peter},
title = {WCET-aware Static Locking of Instruction Caches},
booktitle = {Proceedings of the International Symposium on Code Generation and Optimization (CGO)},
year = {2012},
pages = {44-52},
address = {San Jose, CA, USA},
month = {apr},
url = {http://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.050/profile/profil_hfalk/publications/20120402-cgo-plazar.pdf},
keywords = {wcet},
confidential = {n},
abstract = {In the past decades, embedded system designers moved from simple, predictable system designs towards complex systems equipped with caches. This step was necessary in order to bridge the increasingly growing gap between processor and memory system performance. Static analysis techniques had to be developed to allow the estimation of the cache behavior and an upper bound of the execution time of a program. This bound is called worst-case execution time (WCET). Its knowledge is crucial to verify whether hard real-time systems satisfy their timing constraints, and the WCET is a key parameter for the design of embedded systems.
In this paper, we propose a WCET-aware optimization technique for static I-cache locking which improves a program’s performance and predictability. To select the memory blocks to lock into the cache and avoid time-consuming repetitive WCET analyses, we developed a new algorithm employing integer-linear programming (ILP). The ILP models the worst-case execution path (WCEP) of a program and takes the influence of locked cache contents into account. By modeling the effect of locked memory blocks on the runtime of basic blocks, the overall WCET of a program can be minimized. We show that our optimization is able to reduce the WCET of real-life benchmarks by up to 40.8%. At the same time, our proposed approach is able to outperform a regular cache by up to 23.8% in terms of WCET.},
}
|
| Andreas Heinig, Vincent J. Mooney, Florian Schmoll, Peter Marwedel, Krishna Palem and Michael Engel. Classification-based Improvement of Application Robustness and Quality of Service in Probabilistic Computer Systems. In Proceedings of ARCS 2012 - International Conference on Architecture of Computing Systems Munich, Germany, March 2012, -- ARCS 2012 Best Paper Award Winner -- [BibTeX][PDF][Abstract]@inproceedings { heinig:2012:arcs,
author = {Heinig, Andreas and Mooney, Vincent J. and Schmoll, Florian and Marwedel, Peter and Palem, Krishna and Engel, Michael},
title = {Classification-based Improvement of Application Robustness and Quality of Service in Probabilistic Computer Systems},
booktitle = {Proceedings of ARCS 2012 - International Conference on Architecture of Computing Systems},
year = {2012},
address = {Munich, Germany},
month = {mar},
note = {-- ARCS 2012 Best Paper Award Winner --},
keywords = {ders},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2012-arcs-heinig.pdf},
confidential = {n},
abstract = {Future semiconductors no longer guarantee permanent deterministic operation. They are expected to show probabilistic behavior due to lowered voltages and shrinking structures. Compared to radiation-induced errors, probabilistic systems face increased error frequencies leading to unexpected bit-flips. Approaches like probabilistic CMOS provide methods to control error distributions which reduce the error probability in more significant bits. However, instructions handling control flow or pointers still require determinism, requiring a classification to identify these instructions.
We apply our transient error classification to probabilistic circuits using differing voltage distributions. Static analysis ensures that probabilistic effects only affect unreliable operations which accept a certain level of impreciseness, and that errors in probabilistic components will never propagate to critical operations.
To evaluate, we analyze robustness and quality-of-service of an H.264 video decoder. Using classification results, we map unreliable arithmetic operations onto probabilistic components of an MPARM model, while remaining operations use deterministic components.},
}
|
| Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury, Timon Kelter, Heiko Falk and Peter Marwedel. A Unified WCET Analysis Framework for Multi-core Platforms. In IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 99-108 Beijing, China, April 2012 [BibTeX][PDF][Link][Abstract]@inproceedings { kelter:2012:rtas,
author = {Chattopadhyay, Sudipta and Kee, Chong Lee and Roychoudhury, Abhik and Kelter, Timon and Falk, Heiko and Marwedel, Peter},
title = {A Unified WCET Analysis Framework for Multi-core Platforms},
booktitle = {IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)},
year = {2012},
pages = {99-108},
address = {Beijing, China},
month = {apr},
url = {http://www.rtas.org/12-home.htm},
keywords = {wcet},
file = {http://www.comp.nus.edu.sg/~sudiptac/papers/mxtiming.pdf},
confidential = {n},
abstract = {With the advent of multi-core architectures, worst case execution time (WCET) analysis has become an increasingly difficult problem. In this paper, we propose a unified WCET analysis framework for multi-core processors featuring both shared cache and shared bus. Compared to previous works, our work differs by modeling the interaction of shared cache and shared bus with other basic micro-architectural components (e.g. pipeline and branch predictor). In addition, our framework does not assume a timing-anomaly-free multi-core architecture for computing the WCET. A detailed experiment methodology suggests that we can obtain reasonably tight WCET estimates in a wide range of benchmark programs.},
}
|
| Helena Kotthaus, Sascha Plazar and Peter Marwedel. A JVM-based Compiler Strategy for the R Language. In Abstract Booklet at The 8th International R User Conference (UseR!) WiP, pages 68 Nashville, Tennessee, USA, June 2012 [BibTeX][PDF]@inproceedings { kotthaus:12:user,
author = {Kotthaus, Helena and Plazar, Sascha and Marwedel, Peter},
title = {A JVM-based Compiler Strategy for the R Language},
booktitle = {Abstract Booklet at The 8th International R User Conference (UseR!) WiP},
year = {2012},
pages = {68},
address = {Nashville, Tennessee, USA},
month = {jun},
keywords = {R language, Java, dynamic compiler optimization},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2012-user-kotthaus.pdf},
confidential = {n},
} |
| Daniel Cordes, Michael Engel, Peter Marwedel and Olaf Neugebauer. Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms. In Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis Tampere, Finland, October 2012 [BibTeX][PDF][Abstract]@inproceedings { Cordes:2012:CODES,
author = {Cordes, Daniel and Engel, Michael and Marwedel, Peter and Neugebauer, Olaf},
title = {Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms},
booktitle = {Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis},
year = {2012},
series = {CODES+ISSS '12},
address = {Tampere, Finland},
month = {oct},
publisher = {ACM},
keywords = {automatic parallelization, embedded software, energy, genetic algorithms, multi-objective, pipeline parallelism},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2012-codes-cordes.pdf},
confidential = {n},
abstract = {The development of automatic parallelization techniques has been fascinating researchers for decades. This has resulted in a significant amount of tools, which should relieve the designer from the burden of manually parallelizing an application. However, most of these tools only focus on minimizing execution time which drastically reduces their applicability to embedded devices. It is essential to find good trade-offs between different objectives like, e.g., execution time, energy consumption, or communication overhead, if applications should be parallelized for embedded multiprocessor system-on-chip (MPSoC) devices. Another important aspect which has to be taken into account is the streaming-based structure found in many embedded applications such as multimedia and network services. The best way to parallelize these applications is to extract pipeline parallelism. Therefore, this paper presents the first multi-objective aware approach exploiting pipeline parallelism automatically to make it most suitable for resource-restricted embedded devices. We have compared the new pipeline parallelization approach to an existing task-level extraction technique. The evaluation has shown that the new approach extracts very efficient multi-objective aware parallelism. In addition, the two approaches have been combined and it could be shown that both approaches perfectly complement each other.},
}
|
| Peter Marwedel and Michael Engel. Efficient Computing in Cyber-Physical Systems. In Proceedings of SAMOS XII July 2012 [BibTeX][PDF][Abstract]@inproceedings { marwedel:2012:samos,
author = {Marwedel, Peter and Engel, Michael},
title = {Efficient Computing in Cyber-Physical Systems},
booktitle = {Proceedings of SAMOS XII},
year = {2012},
month = {jul},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2012-samos-marwedel.pdf},
confidential = {n},
abstract = {Computing in cyber-physical systems has to be efficient in terms of a number of objectives. In particular, computing has to be execution-time and energy efficient. In this paper, we will consider optimization techniques aiming at efficiency in terms of these two objectives. In the first part, we will consider techniques for the integration of compilers and worst-case execution time (WCET) estimation. We will demonstrate how such integration opens the door to WCET-reduction algorithms. For example, an algorithm for WCET-aware compilation reduces the WCET for an automotive application by more than 50\% by exploiting scratch pad memories (SPMs).
In the second part, we will demonstrate techniques for improving the energy efficiency of cyber-physical systems, in particular the use of SPMs. In the third part, we demonstrate how optimization for multiple objectives can be taken into account. This paper provides an overview of work performed at the Chair for Embedded Systems of TU Dortmund and the Informatik Centrum Dortmund, Germany.},
}
|
| Olivera Jovanovic, Peter Marwedel, Iuliana Bacivarov and Lothar Thiele. MAMOT: Memory-Aware Mapping Optimization Tool for MPSoC. In 15th Euromicro Conference on Digital System Design (DSD 2012) Izmir, Turkey, September 2012 [BibTeX]@inproceedings { Jovanovic/etal/2012a,
author = {Jovanovic, Olivera and Marwedel, Peter and Bacivarov, Iuliana and Thiele, Lothar},
title = {MAMOT: Memory-Aware Mapping Optimization Tool for MPSoC},
booktitle = {15th Euromicro Conference on Digital System Design (DSD 2012)},
year = {2012},
address = {Izmir, Turkey},
month = {sep},
confidential = {n},
} |
| Michael Engel and Peter Marwedel. Semantic Gaps in Software-Based Reliability. In Proceedings of the 4th Workshop on Design for Reliability (DFR'12) Paris, France, January 2012 [BibTeX][Abstract]@inproceedings { engel:dfr:2012,
author = {Engel, Michael and Marwedel, Peter},
title = {Semantic Gaps in Software-Based Reliability},
booktitle = {Proceedings of the 4th Workshop on Design for Reliability (DFR'12)},
year = {2012},
address = {Paris, France},
month = {jan},
organization = {HiPEAC},
keywords = {ders},
confidential = {n},
abstract = {Future semiconductors will show a heterogeneous distribution of permanent faults as a result of fabrication variations and aging. To increase yields and lifetimes of these chips, a fault tolerance approach is required that handles resources on a small-scale basis with low overhead. In embedded systems, this overhead can be reduced by classifying data and instructions to determine the varying impact of errors on different instructions and data. Using this classification, only errors with significant impact on system behavior have to be corrected.
In this position paper, we describe one problem with this analysis, the semantic gap between high-level language source code and the low-level data flow through architecture components. In addition, we discuss possible approaches to handle this gap. Of special interest are the implications on achieving reliable execution of dependability-critical code.
},
}
|