| Helena Kotthaus, Michel Lang, Jörg Rahnenführer and Peter Marwedel. Runtime and Memory Consumption Analyses for Machine Learning R Programs. In Abstracts 45. Arbeitstagung, Ulmer Informatik-Berichte, pages 3-4, June 2013
@inproceedings { kotthaus/2013a,
author = {Kotthaus, Helena and Lang, Michel and Rahnenf{\"u}hrer, J{\"o}rg and Marwedel, Peter},
title = {Runtime and Memory Consumption Analyses for Machine Learning R Programs},
booktitle = {Abstracts 45. Arbeitstagung, Ulmer Informatik-Berichte},
year = {2013},
pages = {3--4},
month = {jun},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/kotthaus_etal_2013a.pdf},
confidential = {n},
} |
| Daniel Cordes, Michael Engel, Olaf Neugebauer and Peter Marwedel. Automatic Extraction of Task-Level Parallelism for Heterogeneous MPSoCs. In Proceedings of the Fourth International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2013), Lyon, France, October 2013
@inproceedings { Cordes:2013:PSTI,
author = {Cordes, Daniel and Engel, Michael and Neugebauer, Olaf and Marwedel, Peter},
title = {Automatic Extraction of Task-Level Parallelism for Heterogeneous MPSoCs},
booktitle = {Proceedings of the Fourth International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2013)},
year = {2013},
series = {PSTI 2013},
address = {Lyon, France},
month = {oct},
keywords = {automatic parallelization; embedded software; heterogeneity; mpsoc; integer linear programming; task-level parallelism},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2013-psti-cordes.pdf},
confidential = {n},
abstract = {Heterogeneous multi-core platforms are increasingly attractive for embedded applications due to their adaptability and efficiency. This proliferation of heterogeneity demands new approaches for extracting thread level parallelism from sequential applications which have to be efficient at runtime. We present, to the best of our knowledge, the first Integer Linear Programming (ILP)-based parallelization approach for heterogeneous multi-core platforms. Using Hierarchical Task Graphs and high-level timing models, our approach manages to balance the extracted tasks while considering performance differences between cores. As a result, we obtain considerable speedups at runtime, significantly outperforming tools for homogeneous systems. We evaluate our approach by parallelizing standard benchmarks from various application domains.},
}
|
| Daniel Cordes, Michael Engel, Olaf Neugebauer and Peter Marwedel. Automatic Extraction of Pipeline Parallelism for Embedded Heterogeneous Multi-Core Platforms. In Proceedings of the Sixteenth International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES 2013), Montreal, Canada, October 2013
@inproceedings { Cordes:2013:CASES,
author = {Cordes, Daniel and Engel, Michael and Neugebauer, Olaf and Marwedel, Peter},
title = {Automatic Extraction of Pipeline Parallelism for Embedded Heterogeneous Multi-Core Platforms},
booktitle = {Proceedings of the Sixteenth International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES 2013)},
year = {2013},
series = {CASES 2013},
address = {Montreal, Canada},
month = {oct},
keywords = {Automatic Parallelization; Heterogeneity; MPSoC; Embedded Software; Integer Linear Programming; Pipeline},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2013-cases-cordes.pdf},
confidential = {n},
abstract = {Automatic parallelization of sequential applications is the key for efficient use and optimization of current and future embedded multi-core systems. However, existing approaches often fail to achieve efficient balancing of tasks running on heterogeneous cores of an MPSoC. A reason for this is often insufficient knowledge of the underlying architecture's performance.
In this paper, we present a novel parallelization approach for embedded MPSoCs that combines pipeline parallelization for loops with knowledge about different execution times for tasks on cores with different performance properties. Using Integer Linear Programming, an optimal solution with respect to the model used is derived implementing tasks with a well-balanced execution behavior. We evaluate our pipeline parallelization approach for heterogeneous MPSoCs using a set of standard embedded benchmarks and compare it with two existing state-of-the-art approaches. For all benchmarks, our parallelization approach obtains significantly higher speedups than either approach on heterogeneous MPSoCs.
},
}
|
| Andreas Heinig, Ingo Korb, Florian Schmoll, Peter Marwedel and Michael Engel. Fast and Low-Cost Instruction-Aware Fault Injection. In Proc. of SOBRES 2013, 2013
@inproceedings { heinig:2013:sobres,
author = {Heinig, Andreas and Korb, Ingo and Schmoll, Florian and Marwedel, Peter and Engel, Michael},
title = {Fast and Low-Cost Instruction-Aware Fault Injection},
booktitle = {Proc. of SOBRES 2013},
year = {2013},
url = {http://danceos.org/sobres/2013/papers/SOBRES-640-Heinig.pdf},
keywords = {ders},
confidential = {n},
abstract = {In order to assess the robustness of software-based fault-tolerance methods, extensive tests have to be performed that inject faults, such as bit flips, into hardware components of a running system. Fault injection commonly uses either system simulations, resulting in execution times orders of magnitude longer than on real systems, or exposes a real system to error sources like radiation. This can take place in real time, but it enables only a very coarse-grained control over the affected system component.
A solution combining the best characteristics from both approaches should achieve precise fault injection in real hardware systems. The approach presented in this paper uses the JTAG background debug facility of a CPU to inject faults into main memory and registers of a running system. Compared to similar earlier approaches, our solution is able to achieve rapid fault injection using a low-cost microcontroller instead of a complex FPGA. Consequently, our injection software is much more flexible. It allows error injection to be restricted to the execution of a set of predefined components, resulting in more precise control of the injection, and also emulates error reporting, which enables the evaluation of different error detection approaches in addition to robustness evaluation.},
}
|
| Daniel Cordes, Michael Engel, Olaf Neugebauer and Peter Marwedel. Automatic Extraction of Multi-Objective Aware Parallelism for Heterogeneous MPSoCs. In Proceedings of the Sixth International Workshop on Multi-/Many-core Computing Systems (MuCoCoS 2013), Edinburgh, Scotland, UK, September 2013
@inproceedings { Cordes:2013:MUCOCOS,
author = {Cordes, Daniel and Engel, Michael and Neugebauer, Olaf and Marwedel, Peter},
title = {Automatic Extraction of Multi-Objective Aware Parallelism for Heterogeneous MPSoCs},
booktitle = {Proceedings of the Sixth International Workshop on Multi-/Many-core Computing Systems (MuCoCoS 2013)},
year = {2013},
series = {MuCoCoS 2013},
address = {Edinburgh, Scotland, UK},
month = {sep},
keywords = {automatic parallelization; embedded software; heterogeneity; mpsoc; genetic algorithms; task-level parallelism; pipeline parallelism; multi-objective},
file = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2013-mucocos-cordes.pdf},
confidential = {n},
abstract = {Heterogeneous MPSoCs are used in a large fraction of current embedded systems. In order to efficiently exploit the available processing power, advanced parallelization techniques are required. In addition to considering performance variances between heterogeneous cores, these methods have to be multi-objective aware to be useful for resource-restricted embedded systems. This multi-objective optimization requirement results in an explosion of the design space size. As a consequence, efficient approaches are required to find promising solution candidates. In this paper, we present the first portable genetic algorithm-based approach to speed up ANSI-C applications by combining extraction techniques for task-level and pipeline parallelism for heterogeneous multicores while considering additional objectives.
Using our approach enables embedded system designers to select a parallelization of an application from a set of Pareto-optimal solutions according to the performance and energy consumption requirements of a given system. The evaluation of a large set of typical embedded benchmarks shows that our approach is able to generate solutions with low energy consumption, high speedup, low communication overhead or useful trade-offs between these three objectives.},
}
|
| Jan Kleinsorge, Heiko Falk and Peter Marwedel. Simple Analysis of Partial Worst-case Execution Paths on General Control Flow Graphs. In Proceedings of the International Conference on Embedded Software (EMSOFT 2013), Montreal, Canada, October 2013
@inproceedings { Kleinsorge:2013:EMSOFT,
author = {Kleinsorge, Jan and Falk, Heiko and Marwedel, Peter},
title = {Simple Analysis of Partial Worst-case Execution Paths on General Control Flow Graphs},
booktitle = {Proceedings of the International Conference on Embedded Software (EMSOFT 2013)},
year = {2013},
series = {EMSOFT 2013},
address = {Montreal, Canada},
month = {oct},
url = {http://ls12-www.cs.tu-dortmund.de/daes/media/documents/publications/downloads/2013_emsoft.pdf},
keywords = {wcet; Worst-case Execution Time; Path Analysis; Static Analysis},
confidential = {n},
} |
| Timon Kelter, Tim Harde, Peter Marwedel and Heiko Falk. Evaluation of resource arbitration methods for multi-core real-time systems. In Proceedings of the 13th International Workshop on Worst-Case Execution Time Analysis (WCET), Paris, France, July 2013
@inproceedings { kelter:2013:wcet,
author = {Kelter, Timon and Harde, Tim and Marwedel, Peter and Falk, Heiko},
title = {Evaluation of resource arbitration methods for multi-core real-time systems},
booktitle = {Proceedings of the 13th International Workshop on Worst-Case Execution Time Analysis (WCET)},
year = {2013},
editor = {Claire Maiza},
address = {Paris, France},
month = {jul},
url = {http://wcet2013.imag.fr/},
keywords = {wcet},
file = {http://drops.dagstuhl.de/opus/volltexte/2013/4117/pdf/2.pdf},
confidential = {n},
abstract = {Multi-core systems have become prevalent in recent years because of their favorable properties in terms of energy consumption, computing power and design complexity. First attempts have been made to devise WCET analyses for multi-core processors, which have to deal with the problem that the cores may experience interference during accesses to shared resources. To limit this interference, the vast majority of previous work proposes a strict TDMA (time division multiple access) schedule for arbitrating shared resources. Though this type of arbitration yields high predictability, this advantage is paid for with poor resource utilization. In this work, we compare different arbitration methods with respect to their predictability and average-case performance. We show how known WCET analysis techniques can be extended to work with the presented arbitration strategies and perform an evaluation of the resulting ACETs and WCETs on an extensive set of real-world benchmarks. Results show that there are cases when TDMA is not the best strategy, especially when predictability and performance are equally important.},
}
|