


Research Seminar

General Information

We regularly organize a seminar about the research topics that we are working on. Everyone is welcome to join.

The schedule for the upcoming meetings and summaries of past meetings are given below.

The seminar takes place bi-weekly on Mondays at 14:15 in OH16/E18 (starting from 02.09.19).


The default sequence is as follows:

Mojtaba Masoudinejad, Noura Sleibi, Ching-Chi Lin, Niklas Ueter, Junjie Shi, Kuan-Hsun Chen, Jian-Jia Chen, Mario Günzel, Christian Hakert, Yunfeng Huang, Mikail Yayla


When there is an exceptional event, e.g., a rehearsal talk, the presenter is automatically appended to the end of the default sequence (FIFO policy).

*If there is more than one exceptional event for the same person in the same round, the above rule is triggered only once.

NEWS: Updated research seminar information can be found at: https://daes.cs.tu-dortmund.de/research-1/research-seminar/


Date Presenter Topic Reference Abstract
29.03.21 Mikail Yayla
15.03.21 Yunfeng Huang
01.03.21 Christian Hakert
15.02.21 Mario Günzel
08.02.21 Jian-Jia Chen
18.01.21 Kuan-Hsun Chen
04.01.21 Junjie Shi The connection between classical job-shop scheduling and real-time resource synchronization
14.12.20 Niklas Ueter An interesting problem with respect to real-time scheduling under uncertainties
30.11.20 Ching-Chi (Simon) Lin Energy-Efficient Parallel Real-Time Tasks Scheduling on Asymmetric Multi-cores
16.11.20 Noura Sleibi Introduction of the research plan
06.11.20 Yaswanth (UT Dallas) LINTS^RT: A Learning-driven Testbed for Intelligent Scheduling in Embedded Systems Paper
02.11.20 Mojtaba Masoudinejad Open-Loop Dynamic Modeling of Low-Budget Batteries with Low-Power Loads
19.10.20 Mikail Yayla  Bit error tolerance of BNNs   
12.10.20 Yunfeng Huang PassengerFlows: A Correlation-based Passenger Estimator in Automated Public Transport Human mobility information is widely needed by many sectors in smart cities, especially for public transport. This work designs lightweight algorithms to perform stop detection, passenger flow tracking, and passenger estimation in an automated train. The on-board learning and detection algorithms run on an edge node that is integrated with Wi-Fi sniffing technology, a GPS sensor, and an inertial measurement unit (IMU) sensor. The stop detection algorithm determines, based on GPS and IMU data, at which train station the automated train is stopped. When the train is moving, the correlations between statistical properties extracted from Wi-Fi probes and the actual number of passengers change. Therefore, two algorithms, passenger flow tracking and passenger estimation, are designed to analyze passenger mobility. The passenger flow tracking algorithm analyzes the number of incoming and outgoing passengers in the train. The passenger estimation algorithm approximates the number of passengers inside the train based on a multidimensional regression model created from statistical properties extracted from different device brands. The designed prototype is deployed in an automated hanging train to conduct real-world experiments. The experimental results indicate that the proposed passenger flow tracking algorithm reduces the average errors by 62.5% and 70% compared to two existing clustering algorithms, respectively. When the device brands are split for creating a regression model, the proposed passenger estimation algorithm results in lower errors than two counting-based approaches, with a mean of 3.15 and a variance of 9.29.
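The calibration step behind such an estimator can be sketched with a simple least-squares fit. This toy example is not the paper's multidimensional, per-brand model; the numbers are made up and the 1-D fit only illustrates how a Wi-Fi statistic (here: sniffed-device count) is mapped to a passenger count:

```python
def fit_linear(xs, ys):
    """Closed-form least squares for y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Toy calibration data: (sniffed-device count, true passenger count).
devices = [4, 8, 12, 20]
passengers = [5, 10, 15, 25]

a, b = fit_linear(devices, passengers)
estimate = a * 16 + b  # estimated passengers for 16 sniffed devices
```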
07.09.20 Qiao Yu Using a Set of Triangle Inequalities to Accelerate K-means Clustering Rehearsal of SISAP
24.08.20 Niklas Ueter Simultaneous Progressing Switching Protocols for Timing Predictable Real-Time Network-on-Chips Video Presentation of RTCSA'20
17.08.20 Christian Hakert Split’n Trace NVM: Leveraging Library OSes for Semantic Memory Tracing Rehearsal of NVMSA 
17.08.20 Mario Günzel Suspension-Aware Earliest-Deadline-First Scheduling Analysis Rehearsal of EMSOFT
10.08.20 Jian-Jia Chen Timing analysis of self-suspending tasks for fixed-priority scheduling 
13.07.20 Kuan-Hsun Chen VTune Profiling  https://software.intel.com/content/www/us/en/develop/articles/vtune-tutorials.html
18.05.20 Junjie Shi MODES: Model-based Optimization on Distributed Embedded Systems 

The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to obtain complete data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, the available time for tuning is reduced. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little attention. This work proposes MODES, a framework that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B treats the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, (2) MODES-I treats all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a Random Forest and a Multi-Layer Perceptron on 5 open datasets using both presented optimization modes. The experimental results demonstrate that, with an improvement in terms of accuracy and resource consumption, MODES outperforms the two baselines: tuning with MBO on each device independently, and tuning with MBO after collecting complete data through expensive communication.
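To give a feel for how MBO works, here is a minimal, hypothetical sketch of the loop: evaluate a few points, fit a cheap surrogate on the results, and pick the next evaluation by trading off the surrogate's prediction against exploration. The inverse-distance surrogate and the 0.1 exploration weight are illustrative choices, not MODES internals:

```python
import random

def surrogate(x, history):
    """Inverse-distance-weighted prediction from evaluated points."""
    num = den = 0.0
    for xi, yi in history:
        d = abs(x - xi)
        if d < 1e-12:
            return yi
        w = 1.0 / d
        num += w * yi
        den += w
    return num / den

def mbo_minimize(f, lo, hi, n_init=4, n_iter=12, seed=0):
    """Minimal sequential model-based optimization sketch: evaluate an
    initial design, then repeatedly pick the candidate that balances the
    surrogate's predicted value against distance to known points."""
    rng = random.Random(seed)
    history = [(x, f(x)) for x in
               (lo + (hi - lo) * i / (n_init - 1) for i in range(n_init))]
    for _ in range(n_iter):
        candidates = [lo + (hi - lo) * rng.random() for _ in range(50)]
        def acq(x):
            explore = min(abs(x - xi) for xi, _ in history)
            return surrogate(x, history) - 0.1 * explore
        x_next = min(candidates, key=acq)
        history.append((x_next, f(x_next)))
    return min(history, key=lambda p: p[1])

# Tune a single "hyper-parameter" of a toy objective with optimum at 0.3.
best_x, best_y = mbo_minimize(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```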

04.05.20 Niklas Ueter


20.04.20 Lea Schönberger


30.03.20 Mikail Yayla *not ready to be shown


16.03.20 Qiao Yu

Introduction to XGBoost and current progress of efficient

The talk introduces some features of XGBoost, including the regularized objective, Taylor expansion, sparsity-aware split finding, column block for parallel learning, and shrinkage with column subsampling. Other features, like the weighted quantile sketch for approximate tree learning and out-of-core computation, are not covered in this talk.
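The regularized objective mentioned above has a convenient closed form once the loss is Taylor-expanded to second order. The sketch below implements the leaf-weight and split-gain formulas from the XGBoost paper, where g and h are per-example gradients and hessians and lambda and gamma are the regularization parameters (the example numbers are made up):

```python
def leaf_weight(gs, hs, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(gs) / (sum(hs) + lam)

def leaf_score(gs, hs, lam):
    """Contribution -G^2 / (2*(H + lambda)) of a leaf to the objective."""
    return -sum(gs) ** 2 / (2 * (sum(hs) + lam))

def split_gain(gs_l, hs_l, gs_r, hs_r, lam, gamma):
    """Gain of splitting a node into left/right, minus complexity gamma."""
    parent = leaf_score(gs_l + gs_r, hs_l + hs_r, lam)
    return (parent - leaf_score(gs_l, hs_l, lam)
                   - leaf_score(gs_r, hs_r, lam) - gamma)

# Squared-error loss, predictions 0, targets [1, 1, -1, -1]:
# g = 2*(p - y) = [-2, -2, 2, 2], h = 2 for every example.
w_left = leaf_weight([-2, -2], [2, 2], 1.0)                     # 0.8
gain = split_gain([-2, -2], [2, 2], [2, 2], [2, 2], 1.0, 0.5)   # 2.7
```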

24.02.20 Christian Hakert Why you should not move compiled code during execution

In short, the talk is about an idea to move a compiled binary to new memory locations during execution and the problems that occur during this procedure. It will cover PIC (position-independent code), PIE (position-independent executables), PC-relative addressing, the ARMv8 ISA (instruction set architecture), and dynamic ELF loading.

20.02.20 Lea Schönberger

Priority-Preserving Optimization of Status Quo ID-Assignments in Controller Area Network

(DATE 2020 Rehearsal)

Controller Area Network (CAN) is the prevailing solution for connecting multiple electronic control units (ECUs) in automotive systems. Every broadcast message on the bus is received by each bus participant and introduces computational overhead to the typically resource-constrained ECUs due to interrupt handling. To reduce this overhead, hardware message filters can be applied. However, since such filters are configured according to the message identifiers (IDs) specified in the system, the filter quality is limited by the nature of the ID-assignment. Although hardware message filters are highly relevant for industrial applications, so far, only the optimization of the filter design, but not the related optimization of ID-assignments, has been addressed in the literature.

In this work, we explicitly focus on the optimization of message ID-assignments against the background of hardware message filtering. More precisely, we propose an optimization algorithm transforming a given ID-assignment in such a way that, based on the resulting IDs, the quality of hardware message filters is improved significantly, i.e., the computational overhead introduced to each ECU is minimized and, moreover, the priority order of the system remains unchanged. Conducting comprehensive experiments on automotive benchmarks, we show that our proposed algorithm clearly outperforms optimizations based on the conventional simulated annealing method with respect to the achieved filter quality as well as the runtime.

10.02.20 Kuan-Hsun Chen Open Access and Templates


27.01.20 Mario Günzel


16.12.19 Junjie Shi Brainstorming
How to construct 'good' dependency graphs when considering periodic real-time task system where each task has multiple critical sections?
How to handle the concept drift (dynamic optimization) problem when tuning the (optimal) hyper-parameter for machine learning algorithms during the run time?
02.12.19 Jian-Jia Chen Writing Workshop


18.11.19 Niklas Ueter


11.11.19 Yunfeng Huang

Quantifying Mobility Similarity based on Multi-modal Sensor Data from Mobile Devices

Master Thesis

Nowadays, due to the ubiquity of mobile devices, more and more applications developed for mobile devices put higher demands on tracking mobility in indoor environments. However, most research on tracking the mobility of mobile devices is based on indoor localization, which requires either the installation of extra beaconing infrastructure or a surveyor for fingerprint creation. In this work, we design a new system to track the mobility of multiple devices without localization. By quantifying the similarity between the fusion of Wi-Fi data and sensor data collected by different devices, our system is able to determine whether these devices are carried by the same entity. The performance of our system is evaluated through experiments in different scenarios. The overall accuracy reaches 89.71% in the worst condition and 97.07% in the best condition. We anticipate that our system can be used as a complement to image technology for tracking the mobility of mobile devices in the future.

04.11.19 Lea Schönberger


21.10.19 Qiao Yu Accelerating k-means algorithm based on efficient filtering methods Master Thesis

K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applicable in various domains such as computer vision, market segmentation, social network analysis, etc. However, k-means wastes a large amount of time on unnecessary distance calculations, so accelerating k-means has become a worthy and important topic. Accelerated k-means algorithms achieve the same result as k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than the state-of-the-art accelerated k-means algorithms. The additional memory consumption of our algorithm is also much lower than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means by grouping points automatically during the iterations, balancing the expense of distance calculations against the filtering time cost. We conduct extensive experiments on real-world datasets, which verify that Fission-Fusion k-means can considerably outperform the state-of-the-art accelerated k-means algorithms, especially when the datasets are low-dimensional and the number of clusters is quite large. In addition, for more separated and naturally-clustered datasets, our algorithm is relatively faster than other accelerated k-means algorithms.
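Filtering in accelerated k-means typically rests on the triangle inequality: if a center is far enough from the currently best center, it cannot be closer to the point, so its distance is never computed. The sketch below shows the classic Elkan-style filter in one assignment step; it is a generic illustration, not the Fission-Fusion algorithm itself:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def assign_with_filter(points, centers):
    """One k-means assignment step with the triangle-inequality filter:
    if d(c_best, c_j) >= 2 * d(x, c_best), then by the triangle
    inequality d(x, c_j) >= d(x, c_best), so d(x, c_j) is skipped."""
    cc = [[dist(ci, cj) for cj in centers] for ci in centers]
    assignments, skipped = [], 0
    for x in points:
        best = 0
        d_best = dist(x, centers[0])
        for j in range(1, len(centers)):
            if cc[best][j] >= 2 * d_best:
                skipped += 1          # filtered: distance never computed
                continue
            d = dist(x, centers[j])
            if d < d_best:
                best, d_best = j, d
        assignments.append(best)
    return assignments, skipped

points = [(0.1, 0.0), (0.2, 0.1), (9.9, 10.0), (10.2, 9.8)]
centers = [(0.0, 0.0), (10.0, 10.0)]
labels, n_skipped = assign_with_filter(points, centers)
```

For the two points near the first center, the far center is filtered out without any distance computation, which is exactly where the savings come from on well-separated data.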

07.10.19 Marco Dürr End-to-End Timing Analysis of Sporadic Cause-Effect Chains in Distributed Systems Rehearsal Talk

A cause-effect chain is used to define the logical order of data dependent tasks, which is independent from the execution order of the jobs of the (periodic/sporadic) tasks. Analyzing the worst-case End-to-End timing behavior, associated to a cause-effect chain, is an important problem in embedded control systems. For example, the detailed timing properties of modern automotive systems are specified in the AUTOSAR Timing Extensions. In this paper, we present a formal End-to-End timing analysis for distributed systems. We consider the two most important End-to-End timing semantics, i.e., the button-to-action delay (termed as the maximum reaction time) and the worst-case data freshness (termed as the maximum data age). Our contribution is significant due to the consideration of the sporadic behavior of job activations, whilst the results in the literature have been mostly limited to periodic activations. The proof strategy shows the (previously unexplored) connection between the reaction time (data age, respectively) and immediate forward (backward, respectively) job chains. Our analytical results dominate the state of the art for sporadic task activations in distributed systems and the evaluations show a clear improvement for synthesized task systems as well as for a real world automotive benchmark setting.
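For intuition, the classic (and pessimistic) end-to-end latency bound for register-based cause-effect chains simply sums period plus worst-case response time over the chain; the talk's job-chain analysis derives tighter maximum reaction time and data age bounds. A sketch of that well-known baseline bound, with made-up task parameters:

```python
# Classic end-to-end latency bound (Davare et al.) for a cause-effect
# chain with register communication: E2E <= sum_i (T_i + R_i), where T_i
# is the period and R_i the worst-case response time of task i. This is
# only the baseline the paper improves upon, not the paper's analysis.

def davare_bound(chain):
    """chain: list of (period, wcrt) pairs along the cause-effect chain."""
    return sum(t + r for t, r in chain)

# Sensor -> filter -> actuator chain, periods and WCRTs in milliseconds.
chain = [(10, 2), (20, 5), (10, 3)]
bound = davare_bound(chain)  # (10+2) + (20+5) + (10+3) = 50 ms
```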

23.09.19 Projektgruppe
PG 620: F1/10 - Autonomous Racing

Live-demo in CILAB

09.09.19 Jiang Bian/Mikail Yayla Parameter dropping during inference for efficient and scalable neural networks



Marco Dürr

End-to-End Timing Behavior Analysis of Immediate Forward and Backward Job-Chains based on AUTOSAR Timing Extensions



Junjie Shi

How to Apply Dependency Graph Approach in Synchronizing Periodic Real-Time Tasks in Multiprocessor



Mostafa Jafari Nodoushan



Niklas Ueter

RTSS rehearsal talk



Lea Schönberger

Do Nothing, but Carefully: Fault Tolerance with Timing Guarantees for Multiprocessor Systems devoid of Online Adaption PRDC rehearsal talk

Many practical real-time systems must be able to sustain several reliability threats induced by their physical environments that cause short-term abnormal system behavior, such as transient faults. To cope with this change of system behavior, online adaptions, which may introduce a high computation overhead, are performed in many cases to ensure the timeliness of the more important tasks while no guarantees are provided for the less important tasks. In this work, we propose a system model which does not require any online adaption, but, according to the concept of dynamic real-time guarantees, provides full timing guarantees as well as limited timing guarantees, depending on the system behavior. For the normal system behavior, timeliness is guaranteed for all tasks; otherwise, timeliness is guaranteed only for the more important tasks while bounded tardiness is ensured for the less important tasks. Aiming to provide such dynamic timing guarantees, we propose a suitable system model and discuss how this can be established by means of partitioned as well as semi-partitioned strategies. Moreover, we propose an approach for handling abnormal behavior with a longer duration, such as intermittent faults or overheating of processors, by performing task migration in order to compensate for the affected system component and to increase the system's reliability. We show by comprehensive experiments that good acceptance ratios can be achieved under partitioned scheduling, which can be further improved under semi-partitioned strategies. In addition, we demonstrate that the proposed migration techniques lead to a reasonable trade-off between the decrease in schedulability and the gain in robustness of the system. The presented approaches can also be applied to mixed-criticality systems with two criticality levels.


Jian-Jia Chen



Marco Dürr

Anas Toma

Keynotes in retreat


03.09.18 Niklas Ueter



Lea Schönberger

Kuan-Hsun Chen

Analysis of Deadline Miss Rates for Uniprocessor Fixed-Priority Scheduling (Kuan) RTCSA rehearsals

Timeliness is an important feature for many embedded systems. Although soft real-time embedded systems can tolerate and allow certain deadline misses, it is still important to quantify them to justify whether the considered systems are acceptable. In this paper, we provide a way to safely over-approximate the expected deadline miss rate for a specific sporadic real-time task under fixed-priority preemptive scheduling in uniprocessor systems. Our approach is compatible with the existing results in the literature that calculate the probability of deadline misses either based on convolution-based approaches or analytically. We demonstrate our approach by considering randomly generated task sets with an execution behavior that simulates jobs subjected to soft errors incurred by hardware transient faults under a given fault rate. To empirically gather the deadline miss rates, we implemented an event-based simulator with a fault-injection module and released the scripts. With extensive simulations under different fault rates, we evaluate the efficiency and the pessimism of our approach. The evaluation results show that our approach is effective in deriving an upper bound on the expected deadline miss rate and efficient with respect to the required computation time.
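The empirical side of such an evaluation can be sketched as a tiny Monte Carlo fault-injection loop. This is a toy stand-in for the event-based simulator, with hypothetical parameters: each job's execution time is inflated by recovery with some fault probability, and a busy window counts as a miss when the accumulated demand exceeds the deadline:

```python
import random

def simulate_miss_rate(jobs, c_normal, c_fault, fault_rate, deadline,
                       n_busy_windows=10000, seed=42):
    """Monte Carlo sketch: each of `jobs` jobs in a busy window takes
    c_fault (with recovery) at probability fault_rate, else c_normal; a
    window misses when total demand exceeds the deadline. Returns the
    empirical deadline miss rate over n_busy_windows windows."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_busy_windows):
        demand = sum(c_fault if rng.random() < fault_rate else c_normal
                     for _ in range(jobs))
        if demand > deadline:
            misses += 1
    return misses / n_busy_windows

# A miss needs at least 2 faulty jobs here (5*2 + 2*k > 13 iff k >= 2),
# so the exact rate is P(Bin(5, 0.1) >= 2) ~ 0.081.
rate = simulate_miss_rate(jobs=5, c_normal=2, c_fault=4,
                          fault_rate=0.1, deadline=13)
```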

30.07.18 Junjie Shi Finding a better mapping for TensorFlow

Due to the high demand for running TensorFlow on heterogeneous platforms in the machine learning field, the mapping problem becomes more and more important. We are trying to find an efficient way to tune a better mapping for TensorFlow on different heterogeneous platforms.

The state of the art is based on Genetic Algorithms (GA) and Gradient Boosting Regressors. However, the results depend on the initial settings of the GA (e.g., starting points, probability of performing crossover, probability of mutation, and so on). We believe that our MBO method can outperform the existing methods.

28.06.18 Georg von der Brüggen ECRTS rehearsal


28.06.18 Anas Toma OSPERT rehearsal


04.06.18 Prof. Fang-Jing Wu Crowd Estimation: Approximating Crowd Sizes with Multi-modal Data Invited Talk

Crowd mobility has attracted attention in Internet-of-Things (IoT) applications. We address the crowd estimation problem and build an IoT service to share the crowd estimation results across different systems. The crowd estimation problem is to approximate the crowd size in a targeted area using the observed information (e.g., Wi-Fi data). We exploit Wi-Fi probe request packets ("Wi-Fi probes" for short) broadcasted by mobile devices to solve this problem. However, using only Wi-Fi probes to estimate the crowd size may result in inaccurate results due to various environmental uncertainties, which may lead to crowd overestimation or underestimation. Moreover, the ground truth is unavailable because the coverage of Wi-Fi signals is time-varying and invisible. Our system introduces auxiliary sensors, stereoscopic cameras, to collect the near ground truth at a specified calibration choke point. The key idea of the proposed crowd estimation algorithm is to calibrate the Wi-Fi-only crowd estimation based on the correlations between the two types of data modalities. To verify the proposed system, we have launched an indoor pilot study in the Wellington Railway Station and an outdoor pilot study in the Christchurch Re:START Mall in New Zealand. The large-scale pilot studies show that stereoscopic cameras can reach a minimum accuracy of 85% and high-precision detection for providing the near ground truth. The proposed calibration algorithms reduce estimation errors by 43.68% on average compared to the Wi-Fi-only approach.

14.05.18 Kuan and JJ Infrastructure Meeting


23.04.18 Prof. Jian-Jia Chen Internal Workshop


05.03.18 Kuan-Hsun Chen OPT of Decision Trees


05.02.18 Prof. Jian-Jia Chen


27.11.17 Ching-Chi Lin


20.11.17 Lea Schönberger Improving Hardware-Based Message Acceptance Filtering for Controller Area Network (CAN)

In the field of automotive engineering, the Controller Area Network (CAN) is frequently used for the purpose of connecting multiple Electronic Control Units (ECUs). Unfortunately, even though this technique is prevalent in the respective domain, it entails additional computation overhead, since CAN is a broadcast bus. In fact, any message transmitted is possibly received by each ECU and thus must be evaluated in terms of its relevance for the respective receiving node. Such filtering for desired messages can be performed either by means of hardware or software mechanisms, where the latter is preferably avoided due to the ECUs' resource limitations. Hardware-based approaches, in contrast, are much more cost-efficient and can drastically or even completely reduce the number of irrelevant messages arriving at a receiving node, provided that the configuration has been set properly.

We provide novel methods for finding a minimal (feasible) filter configuration in the first instance as well as for determining a minimal (feasible) configuration under limited hardware resources, i.e., an insufficient amount of filters, such that the number of unintentionally accepted messages is minimized.
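A hedged sketch of the underlying mechanism: a CAN acceptance filter is a (code, mask) pair, and a message ID passes iff it matches the code on all masked bit positions. The tightest single filter for a set of wanted IDs masks exactly the bits on which all wanted IDs agree; disagreeing bits become don't-cares, which is why unwanted IDs can slip through and why the ID-assignment itself determines filter quality:

```python
def single_filter(wanted_ids, id_bits=11):
    """Compute (code, mask) of the tightest single filter covering all
    wanted IDs: mask keeps only bits on which every wanted ID agrees."""
    all_ones = (1 << id_bits) - 1
    agree = all_ones
    for mid in wanted_ids:
        agree &= ~(mid ^ wanted_ids[0]) & all_ones
    return wanted_ids[0] & agree, agree

def accepts(code, mask, mid):
    """Hardware acceptance check: (mid & mask) == (code & mask)."""
    return (mid & mask) == (code & mask)

# Two wanted 11-bit IDs differing in bits 0..2.
wanted = [0x11, 0x16]
code, mask = single_filter(wanted)
# Bits 0..2 become don't-cares, so 8 IDs pass, including unwanted ones
# such as 0x12 -- the "falsely accepted messages" the paper minimizes.
```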

06.11.17 Junjie Shi The next phase of sub-project A3: Methods for Efficient Resource Utilization in Machine Learning Algorithm

During the optimization process in RAMBO, there are many points that need to be evaluated; different points have different profits and execution times. Moreover, if two points are close to each other, the total profit is not the sum of their individual profits (intuitively, 1 + 1 < 2). My topic is how to evaluate these candidates to maximize the total profit under the constraints of a limited time budget and number of CPUs (a kind of Quadratic Knapsack Problem).
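The selection problem can be sketched as a tiny brute-force quadratic knapsack. The overlap-penalty model below is hypothetical, only meant to capture the "1 + 1 < 2" effect for nearby points; a real instance would need a proper solver rather than subset enumeration:

```python
from itertools import combinations

def best_subset(points, budget):
    """Brute-force quadratic-knapsack-style selection: each candidate is
    (profit, cost, position); profits of close pairs overlap, so a
    pairwise penalty is subtracted from the sum of profits."""
    def pair_penalty(a, b):
        # Hypothetical overlap model: penalty decays with 1-D distance.
        d = abs(a[2] - b[2])
        return max(0.0, 1.0 - d) * min(a[0], b[0])
    best, best_profit = (), 0.0
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            if sum(p[1] for p in subset) > budget:
                continue  # exceeds the evaluation-time budget
            profit = sum(p[0] for p in subset)
            profit -= sum(pair_penalty(a, b)
                          for a, b in combinations(subset, 2))
            if profit > best_profit:
                best, best_profit = subset, profit
    return best, best_profit

# (profit, evaluation cost, position): two nearby points overlap heavily,
# so pairing one of them with the distant point wins.
points = [(5.0, 2, 0.0), (5.0, 2, 0.2), (4.0, 2, 3.0)]
chosen, profit = best_subset(points, budget=4)
```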

23.10.17 Wei Liu

An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers to tune their applications by exploiting the design space exhaustively without fully understanding the performance characteristics of their applications. To provide insights into the performance bottlenecks of parallel applications on GPU architectures, a simple analytical model is proposed to estimate the execution time of massively parallel programs. The key component of this model is estimating the number of parallel memory requests (we call this the memory warp parallelism) by considering the number of running threads and memory bandwidth. Based on the degree of memory warp parallelism, the model estimates the cost of memory requests, thereby estimating the overall execution time of a program.

09.10.17 Anas Toma Auxiliary Resources in Mobile Cloud Computing

A middleware was already presented in a former seminar to save energy in embedded systems by using nearby resources. In this talk, the results of experimental evaluations of the proposed middleware will be presented and discussed. They also include power consumption evaluation of the ODROID-XU4 for different operation modes.



Georg von der Brüggen

1) Parametric Utilization Bounds for Implicit-Deadline Periodic Tasks in Automotive Systems


2) Release Enforcement in Resource-Oriented
Partitioned Scheduling for Multiprocessor Systems

Rehearsal: RTNS Conference Presentation

1) Fixed-priority scheduling has been widely used in safety-critical applications. This paper explores the parametric utilization bounds for implicit-deadline periodic tasks in automotive uniprocessor systems, where the period of a task is either 1, 2, 5, 10, 20, 50, 100, 200, or 1000 milliseconds. We prove a parametric utilization bound of 90%+z for such automotive task systems under rate-monotonic preemptive scheduling (RM-P), where z is a parameter defined by the input task set with 0 ≤ z ≤ 10%. Moreover, we explain how to perform an exact schedulability test for an automotive task set under RM-P by validating only three conditions. Furthermore, we extend our analyses to rate-monotonic non-preemptive scheduling (RM-NP). We show that very reasonable utilization values can still be achieved under RM-NP if the execution time of all tasks is below 1 millisecond. The analyses presented here are compatible with angle-synchronous tasks by applying the related arrival curves. It is shown in the evaluations that scheduling those angle-synchronous tasks according to their minimum inter-arrival time instead of assigning them to the highest priority can drastically increase the acceptance ratio in some settings.

2) When partitioned scheduling is used in real-time multiprocessor systems, access to shared resources can jeopardize the schedulability if the task partition is not done carefully. To tackle this problem we change our view angle from focusing on the computing tasks to focusing on the shared resources by applying resource-oriented partitioned scheduling. We use a release enforcement technique to shape the interference from the higher-priority jobs to be sporadic, analyze the schedulability, and provide strategies for partitioning both the critical and the non-critical sections of tasks onto processors individually. Our approaches are shown to be effective, both in the evaluations and from a theoretical point of view, by providing a speedup factor of 6, improving previously known results.
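The paper's exact test is specific to the automotive period set; as a generic baseline, standard time-demand analysis already illustrates how harmonic automotive-style periods push rate-monotonic scheduling far beyond the classic 69% utilization bound. A sketch with made-up task parameters (this is the textbook test, not the paper's parametric bound):

```python
import math

def rm_schedulable(tasks):
    """Standard time-demand analysis for rate-monotonic scheduling:
    tasks given as (wcet, period) with implicit deadlines; shorter
    period means higher priority."""
    tasks = sorted(tasks, key=lambda t: t[1])
    for i, (ci, ti) in enumerate(tasks):
        r = ci
        while True:
            # Demand at time r: own WCET plus higher-priority interference.
            demand = ci + sum(math.ceil(r / tj) * cj
                              for cj, tj in tasks[:i])
            if demand == r:
                break          # fixed point: worst-case response time
            if demand > ti:
                return False   # deadline miss
            r = demand
    return True

# Harmonic automotive-style periods (ms): schedulable at 98% utilization.
ok = rm_schedulable([(4, 10), (8, 20), (18, 100)])
```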

To tackle the unavoidable self-suspension behavior due to I/O-intensive interactions, multi-core processors, computation offloading systems with coprocessors, etc., the dynamic and the segmented self-suspension sporadic task models have been widely used in the literature. We propose new self-suspension models that are hybrids of the dynamic and the segmented models. Those hybrid models are capable of exploiting knowledge about execution paths, potentially reducing modelling pessimism. In addition, we provide the corresponding schedulability analysis under fixed-relative-deadline (FRD) scheduling and explain how the state-of-the-art FRD scheduling strategy can be applied. Empirically, these hybrid approaches are shown to be effective with regards to the number of schedulable task sets.

19.07.17 Wei Liu OpenCL Offloading framework on Virus Detection


21.06.17 Prof. Jian-Jia Chen and Junjie Shi

Implementation and Evaluation of Multiprocessor Resource Synchronization Protocol (MrsP) on LITMUSRT

Rehearsal for ECRTS and OSPERT


07.06.17 Kuan-Hsun Chen Probabilistic Schedulability Tests for Uniprocessor Fixed-Priority Scheduling under Soft Errors Rehearsal Talk for SIES'17

Due to rising integration, low-voltage operation, and environmental influences such as electromagnetic interference and radiation, transient faults may cause soft errors and corrupt the execution state. Such soft errors can be recovered by applying fault-tolerant techniques. Therefore, the execution time of a job of a sporadic/periodic task may differ, depending upon the occurrence of soft errors and the applied error detection and recovery mechanisms. We model a periodic/sporadic real-time task under such a scenario by using two different worst-case execution times (WCETs), one with the occurrence of soft errors and one without. Based on a probabilistic soft-error model, the WCETs are hence associated with different probabilities. In this paper, we present efficient probabilistic schedulability tests that can be applied to verify the schedulability based on probabilistic arguments under fixed-priority scheduling on a uniprocessor system. We demonstrate how the Chernoff bounds can be used to calculate the task workloads based on their probabilistic WCETs. In addition, we further consider how to calculate the probability of consecutive deadline misses of a task. The pessimism and the efficiency of our approaches are evaluated against the tighter and approximated convolution-based approaches, by running extensive evaluations under different soft-error rates. The evaluation results show that our approaches are effective in deriving the probability of deadline misses and efficient with respect to the needed calculation time.
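The Chernoff-bound idea can be sketched in a few lines: with a two-valued execution time per job, the bound P(S >= d) <= e^{-td} (E[e^{tC}])^n holds for every t > 0, is minimized over t, and can be checked against the exact miss probability obtained by enumerating all fault patterns. All parameters below are made up, and the single-task workload model is a simplification of the paper's analysis:

```python
import math
from itertools import product

# Two-valued execution times: each of n jobs takes c_err (soft error plus
# recovery) with probability p_err, else c_ok. Miss if the sum exceeds d.
n, c_ok, c_err, p_err, d = 6, 1.0, 3.0, 0.05, 10.0

def chernoff_bound(t):
    """P(S >= d) <= e^{-t d} * (E[e^{t C}])^n for any t > 0."""
    mgf = (1 - p_err) * math.exp(t * c_ok) + p_err * math.exp(t * c_err)
    return math.exp(-t * d) * mgf ** n

# Minimize the bound over a coarse grid of t values.
bound = min(chernoff_bound(t / 10) for t in range(1, 60))

# Exact miss probability by enumerating all 2^n fault patterns.
exact = sum(
    math.prod(p_err if e else 1 - p_err for e in errs)
    for errs in product([0, 1], repeat=n)
    if sum(c_err if e else c_ok for e in errs) > d
)
```

The bound is necessarily pessimistic (here roughly an order of magnitude above the exact value), but it avoids the exponential enumeration, which is the trade-off the abstract's evaluation quantifies.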

31.05.17 Santiago Pagani Ultra-low power and dependability for IoT devices


16.05.17 Kuan-Hsun Chen How to deploy experiments on our servers?


09.05.17 Ching-Chi Lin
Research Proposal : Energy-efficient Containers-to-Server and Tasks-to-Core mapping in Cloud Computing System


16.02.17 Kevin/Georg Framework for Empirical Evaluation on Schedulability Tests for Real-Time Scheduling Algorithms


02.02.17 Anas Toma Power-Aware Performance Adaptation of Concurrent Applications in Heterogeneous Many-Core Systems http://dl.acm.org/citation.cfm?id=2934612


19.01.17 Ingo Korb The Cake Cutting Problem

Fair distribution of a finite set of resources among multiple agents can be a complex, but important problem -- consider for example the problem of splitting a disputed territory among multiple neighbouring countries. Mathematicians tend to formulate it in a more pleasant way by thinking of a cake as the shared resource to be distributed. This talk will present a few interesting algorithms and results from this problem space.
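The simplest protocol from this space, divide-and-choose for two agents, fits in a few lines. A toy sketch with a piecewise-constant valuation for the chooser (the protocol guarantees each agent at least half of the cake by their own measure, i.e., proportionality for n = 2):

```python
def value(density, a, b, steps=10000):
    """Riemann-sum value of interval [a, b] under a valuation density."""
    total = 0.0
    for i in range(steps):
        x = a + (b - a) * (i + 0.5) / steps
        total += density(x)
    return total * (b - a) / steps

def halving_cut(density):
    """Binary search for the cut c with value(density, 0, c) = 1/2."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if value(density, 0.0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alice = lambda x: 1.0                      # uniform valuation of [0, 1]
bob = lambda x: 2.0 if x > 0.5 else 0.0    # only values the right half

cut = halving_cut(alice)                   # Alice divides at ~0.5
bob_left = value(bob, 0.0, cut)
bob_piece = max(bob_left, 1.0 - bob_left)  # Bob takes his better piece
alice_piece = 0.5                          # either piece is worth 1/2 to her
```

Because Bob only values the right half, he walks away with (by his measure) the entire cake, while Alice still gets exactly half by hers: fair by each agent's own valuation, though not envy-optimal in general.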

08.12.16 Wei Liu A Simplified Acceleration Framework for Data Offloading and Workload Scheduling

Today’s trend to use accelerators like GPGPUs in heterogeneous computer systems has entailed several low-level APIs for accelerator programming. However, programming with these APIs is often tedious and therefore unproductive. Seeking faster application performance without significant programming effort is necessary for scientific programmers. In this work, we present a parallel acceleration framework with a set of simplified API functions. Based on these API functions, our framework effectively explores a data offloading algorithm and a heterogeneous scheduling algorithm. We compare our framework with CUDA on several real-world applications and evaluate the performance. The experimental results show that our framework is more efficient than low-level APIs while offering high programming efficiency.

17.11.16 (10:00) Georg von der Brüggen and Sheng-Wei Cheng Georg: Systems with Dynamic Real-Time Guarantees in Uncertain and Faulty Execution Environments Rehearsal: RTSS Conference Presentation

In many practical real-time systems, the physical environment and the system platform can impose uncertain execution behaviour on the system. For example, if transient faults are detected, the execution time of a task instance can be increased due to recovery operations. Such fault recovery routines make the system very vulnerable with respect to meeting hard real-time deadlines. In theory and in practical systems, this problem is often handled by aborting not so important tasks to guarantee the response time of the more important tasks. However, for most systems such faults occur rarely and the results of the not so important tasks might still be useful, even if they are a bit late. This suggests not aborting these not so important tasks but keeping them running even if faults occur, provided that the more important tasks still meet their hard real-time properties. In this paper, we present Systems with Dynamic Real-Time Guarantees to model this behaviour and determine if the system can provide full timing guarantees or limited timing guarantees without any online adaptation after a fault occurred. We present a schedulability test, provide an algorithm for optimal priority assignment, determine the maximum interval length until the system will again provide full timing guarantees, and explain how we can monitor the system state online. The approaches presented in this paper can also be applied to mixed-criticality systems with dual criticality levels.

10.11.16 (10:00) Kevin Wen-Hung Huang


Resource-Oriented Partitioned Scheduling in Multiprocessor Systems: How to Partition and How to Share?


Rehearsal: RTSS Conference Presentation

When concurrent real-time tasks have to access shared resources, to prevent race conditions, synchronization and resource access must ensure mutual exclusion, e.g., by using semaphores: no two concurrent accesses to one shared resource may be in their critical sections at the same time. For uniprocessor systems, the priority ceiling protocol (PCP) has been widely accepted and supported in real-time operating systems. However, it is still arguable whether there exists a preferable approach for resource sharing in multiprocessor systems. In this paper, we show that the proposed resource-oriented partitioned scheduling using PCP, combined with a reasonable allocation algorithm, can achieve a non-trivial speedup factor guarantee. Specifically, we prove that our task mapping and resource allocation algorithm has a speedup factor of 11.

27.10.16 Jian-Jia Chen


13.10.16 Georg von der Brüggen Uniprocessor Scheduling Strategies for Self-Suspending Task Systems
Rehearsal: RTNS Conference Presentation

We study uniprocessor scheduling for hard real-time self-suspending task systems where each task may contain a single self-suspension interval. We focus on improving state-of-the-art fixed-relative-deadline (FRD) scheduling approaches, where an FRD scheduler assigns a separate relative deadline to each computation segment of a task. FRD then schedules the computation segments using the earliest-deadline-first (EDF) policy, based on the deadlines assigned to them. Our proposed algorithm, Shortest Execution Interval First Deadline Assignment (SEIFDA), greedily assigns the relative deadlines of the computation segments, starting with the task with the smallest execution interval length, i.e., the period minus the self-suspension time. We show that any reasonable deadline assignment under this strategy has a speedup factor of 3. Moreover, we present an approximated schedulability test and a generalized mixed-integer linear programming (MILP) formulation based on the tolerable loss in the schedulability test defined by the user. We show by both analysis and experiments that, by designing smarter relative deadline assignment policies, the resulting FRD scheduling algorithms yield significantly better performance than existing schedulers for such task systems.
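As a toy illustration of the deadline-assignment idea described above (a sketch under my own simplifying assumptions, not the authors' exact SEIFDA algorithm), the following snippet processes tasks in order of increasing execution interval length and splits each task's available interval between its two computation segments proportionally to their execution times:

```python
# Illustrative sketch of FRD deadline assignment in the spirit of SEIFDA.
# Each task is a dict with keys C1, C2 (segment WCETs), S (suspension
# length), and T (period). The "execution interval length" is T - S.

def seifda_sketch(tasks):
    """Return a list of (task, D1, D2): a relative deadline per segment,
    assigned in order of increasing execution interval length."""
    # Process the task with the shortest execution interval first.
    ordered = sorted(tasks, key=lambda t: t["T"] - t["S"])
    assignments = []
    for t in ordered:
        interval = t["T"] - t["S"]          # time budget for both segments
        total_c = t["C1"] + t["C2"]
        # Proportional split: each segment's deadline scales with its WCET.
        d1 = interval * t["C1"] / total_c
        d2 = interval - d1
        assignments.append((t, d1, d2))
    return assignments
```

The proportional split is just one "reasonable" assignment; the paper's point is that the greedy ordering admits a whole family of such policies.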

22.09.16 Kuan-Hsun Chen

Overrun Handling for Mixed-Criticality Support in RTEMS

Rehearsal: WMC Workshop

Real-time operating systems are not only used in embedded real-time systems but are also useful for the simulation and validation of those systems. During the evaluation for our paper on Systems with Dynamic Real-Time Guarantees, which appears at RTSS 2016, we discovered unexpected system behavior in the open-source real-time operating system RTEMS. In the current implementation of RTEMS (version 4.11), overruns of an implicit-deadline task, i.e., deadline misses, result in unexpected system behavior, as they may lead to a shift of the task's release pattern. As a consequence, some task instances are not released when they should be. In this paper, we explain why these problems occur in RTEMS and present our solutions.

08.09.16 (12:30) Anas Toma

Auxiliary Middleware Resources for Embedded Systems

In this talk, a middleware for client-server applications will be presented. The middleware can be used to save energy on the client device and also to reduce the workload of the server. The main idea is to provide auxiliary resources through the middleware by exploiting the nearby devices.

28.07.16 Ingo Korb


Something about trees, probably of the decision kind

14.07.16 Wei Liu

Deep learning for Vision: The Caffe Framework

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and Matlab bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs with CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU. Caffe allows experimentation and seamless switching among platforms for ease of development, from prototyping machines to cloud environments.

30.06.16 Jian-Jia Chen

Open research data


16.06.16 Georg von der Brüggen

Presentation course


02.06.16 Kuan-Hsun Chen

Compensate or Ignore? Meeting Control Robustness Requirements through Adaptive Soft-Error Handling

Rehearsal Talk for LCTES'16

To avoid catastrophic events such as unrecoverable system failures on mobile and embedded systems caused by soft errors, software-based error detection and compensation techniques have been proposed. Methods like error-correcting codes or redundant execution can offer high flexibility and allow for application-specific fault-tolerance selection without the need for special hardware support. However, such software-based approaches may lead to system overload due to their execution time overhead. An adaptive deployment of such techniques that meets both application requirements and system constraints is therefore desired. From our case study, we observe that a control task can tolerate a limited number of errors with acceptable performance loss. Such tolerance can be modeled as an (m, k) constraint, which requires at least m out of any k consecutive runs to be correct. In this paper, we discuss how a given (m, k) constraint can be satisfied by adopting patterns of task instances with individual error detection and compensation capabilities. We introduce static strategies and provide a formal feasibility analysis for validation. Furthermore, we develop an adaptive scheme that extends our initial approach with online awareness, increasing efficiency while preserving the analysis results. The effectiveness of our method is shown in a real-world case study as well as for synthesized task sets.
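The (m, k) constraint mentioned above can be checked in a few lines of Python; this illustrates only the constraint itself, not the paper's pattern-based compensation strategies:

```python
# Check whether a sequence of run outcomes satisfies an (m, k) constraint:
# every window of k consecutive runs must contain at least m correct ones.

def satisfies_mk(outcomes, m, k):
    """outcomes: sequence of booleans/0-1 values, truthy = correct run."""
    outcomes = list(outcomes)
    return all(sum(outcomes[i:i + k]) >= m
               for i in range(len(outcomes) - k + 1))
```

For example, with (m, k) = (2, 3) a run history may contain isolated errors, but two errors inside any three consecutive runs violate the constraint.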

19.05.16 Sheng-Wei Cheng

Many-Core Real-Time Task Scheduling with Scratchpad Memory


This work is motivated by the demand for scheduling tasks upon the increasingly popular island-based many-core architectures. On such an architecture, homogeneous cores are grouped into islands, each of which is equipped with a scratchpad memory module (referred to as local memory). We first show the NP-hardness and the inapproximability of the scheduling problem. Despite the inapproximability, positive results can still be found when different cases of the problem are investigated. A (3 − 1/F)-approximation algorithm is proposed for the minimization of the maximum system utilization, where F is the number of cores in the platform. When the technique of resource augmentation is considered, this paper further develops a (ρ + 1)-memory (2ρ − 1)/(ρ − 1)-approximation algorithm, where ρ represents the trade-off between CPU utilization and local memory space. On the other hand, a special case is also considered in which the ratio of the worst-case execution time of a task without and with using the local memory is bounded by a constant. The capabilities of the proposed algorithms are then evaluated with benchmarks from MRTC, UTDSP, NetBench and DSPstone, where the maximum system utilization can be significantly reduced even when the local memory size is only 5% of the total footprint of all of the tasks.

19.05.16 Kevin Huang

Utilization Bounds on Allocating Rate-Monotonic Scheduled Multi-Mode Tasks on Multiprocessor Systems

Rehearsal Talk for DAC16

Formal models used for representing recurrent real-time processes have traditionally been characterized by a collection of jobs that are released periodically. However, such modeling may result in resource under-utilization in systems whose behaviors are not entirely periodic. For instance, tasks in cyber-physical systems (CPS) may change their service levels, e.g., periods and/or execution times, to adapt to changes in the environment. In this work, we study a generalization of the periodic task model, called the multi-mode task model: a task has several modes, specified with different execution times and periods, between which it can switch at runtime, independently of other tasks. Moreover, we study the problem of allocating a set of multi-mode tasks on a homogeneous multiprocessor system. We present a scheduling algorithm that uses any reasonable allocation decreasing (RAD) algorithm for task allocation when scheduling multi-mode tasks on multiprocessor systems. We prove that this algorithm achieves a utilization bound of 38% for implicit-deadline rate-monotonic (RM) scheduled multi-mode tasks on multiprocessor systems.
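For context, the classic uniprocessor analogue of such a utilization bound is the Liu and Layland bound n(2^(1/n) − 1) for rate-monotonic scheduling of implicit-deadline periodic tasks. The sketch below (an illustration of that textbook bound, not of the talk's multi-mode analysis) uses it as a sufficient schedulability test:

```python
import math

def ll_bound(n):
    """Liu & Layland utilization bound for n implicit-deadline RM tasks."""
    return n * (2 ** (1.0 / n) - 1)

def rm_schedulable_by_bound(tasks):
    """tasks: list of (C, T) pairs. Sufficient (not necessary) RM test:
    the task set is schedulable if total utilization is within the bound."""
    u = sum(c / t for c, t in tasks)
    return u <= ll_bound(len(tasks))
```

As n grows, ll_bound(n) approaches ln 2 ≈ 0.693; the 38% figure in the abstract plays the same role for multi-mode tasks on multiprocessors.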

10.03.16 Kuan-Hsun Chen

GetSURE-II Progress Report

Rehearsal Talk for SPP1500

1. Task Mapping for Redundant Multithreading in Multi-Cores with Reliability and Performance Heterogeneity

2. Systems with Dynamic Real-Time Guarantees in Uncertain and Faulty Execution Environments

10.03.16 Kevin Huang Self-Suspension Real-Time Tasks under Fixed-Relative-Deadline Fixed-Priority Scheduling Rehearsal Talk for DATE'16

Self-suspension is becoming a prominent characteristic in real-time systems such as: (i) I/O-intensive systems, (ii) multi-core processors, and (iii) computation offloading systems with coprocessors, like Graphics Processing Units (GPUs). In this work, we study self-suspension systems under a fixed-priority (FP) fixed-relative-deadline (FRD) algorithm by using release enforcement to control self-suspending tasks' behavior. Specifically, we use equal-deadline assignment (EDA) to assign the release phases of computations and suspensions. We provide an analysis for deriving the speedup factor of the FP FRD scheduler using suspension-laxity-monotonic (SLM) priority assignment. This is the first positive result providing bounded speedup factor guarantees for general multi-segment self-suspending task systems.

18.02.16 (14:00) Anas Toma Brain-Computer Interface - Potential Research Areas

Brain-Computer Interface (BCI) is a communication or control technique based on reading the neural electrical activity of the human brain. The commands are detected in spatiotemporal electroencephalograms (EEG) recorded by electrodes distributed over the scalp. This technology is mainly used to help people with severe motor disabilities. In this talk, I will present a portable neuroheadset, a resource-constrained device, that reads raw EEG data from the brain and offloads it to a remote processing unit. Furthermore, I will introduce different computation offloading techniques used especially in wearable devices. Finally, potential research areas and cooperation opportunities related to the presented techniques will be discussed (e.g. performance and energy optimization, real-time scheduling, reliability, parallel processing, pattern recognition, etc.).

21.01.16 Ingo Korb Care and Feeding of Benchmarks

Runtime benchmarking appears to be a simple topic - just run the program and measure how long it took. There are however some pitfalls that can influence the run time of your program and thus increase the variance of your results. This talk will demonstrate a few of them and give hints for avoiding them.
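One such pitfall is trusting a single measurement. Here is a minimal Python sketch (my example, not from the talk) that repeats a measurement and reports the median per-call time, which is far less sensitive to scheduler noise, cache warm-up, and other outliers than a single run or the mean:

```python
import statistics
import timeit

def benchmark(fn, repeats=7, number=100):
    """Return the median per-call time of fn.

    Each of the `repeats` trials times `number` calls back-to-back;
    taking the median of the trials discards outlier runs."""
    times = timeit.repeat(fn, repeat=repeats, number=number)
    return statistics.median(times) / number

# Example: timing a small list comprehension.
t = benchmark(lambda: [i * i for i in range(1000)])
```

Reporting the variance across trials alongside the median is also a cheap way to notice when the measurement itself is unstable.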

07.01.16 Wei Liu Data Offloading for Remote GPU Acceleration in Distributed Systems

As computation-intensive applications proliferate on mobile embedded systems, GPUs (Graphics Processing Units) can be used to accelerate these applications even in distributed systems. Traditionally, applications use Remote Procedure Calls (RPC) to access GPUs in the network. However, its simplicity has also limited its efficiency in existing implementations. Specifically, the API requires the application to execute many system calls such as select, accept, read, and write. Each of these functions crosses the protection boundary between user space and the operating system, which is expensive. When several applications access GPUs on a remote server, concurrency on the server becomes a bottleneck and the response time for applications on embedded systems increases considerably. To solve this problem, we propose a computation offloading framework for remote GPU acceleration in distributed systems. In our framework, a set of APIs is provided for remote GPU acceleration. An offloading decision algorithm can efficiently utilize GPU resources in the network. Moreover, data communication in our framework is based on user-space network protocols. Compared with traditional Linux-based network protocols, our implementation can largely increase the concurrency of GPU utilization when a large number of applications offload data to the GPU server. The average response time on GPU servers can also be largely reduced.

10.12.15 Maolin Yang

The partitioned fixed-priority scheduling of multiprocessor real-time systems with shared resources.

Multiprocessor scheduling has been studied for decades, and one well-known and well-understood scheduling policy is partitioned fixed-priority scheduling. However, when shared resources protected by suspension-based locks are modeled explicitly in the system, the situation is much less understood: which synchronization strategy should be used, and how should real-time tasks be partitioned among multiple cores? In this work, we present a dedicated-core synchronization framework, in which dedicated cores are reserved for shared resources such that all requests to the shared resources are carried out on those cores. This synchronization strategy theoretically outperforms the two other well-known synchronization strategies in terms of speedup factor. Meanwhile, the interplay between task assignment and response-time analysis is avoided, which enables an efficient task assignment and simplifies the corresponding analysis.

26.11.15 Prof. Jian-Jia Chen / Kuan-Hsun Chen


Experiments and Benchmarks! How do they matter?
(Embedding 3-d figures in Portable Document Format (PDF) files)



12.11.15 Georg von der Brüggen


Dynamic Hard and Soft Real-Time Service Level Guarantees in Uncertain and Faulty Execution Environments

In many practical real-time systems, the physical environment and the system platform impose some kind of uncertain behaviour on the system. If faults are detected, the execution time of a task instance can be enlarged due to recovery operations. These fault recovery routines make the system very vulnerable with respect to meeting hard real-time deadlines. In theory and in practical systems, this problem is often handled by aborting ''not so important'' tasks to guarantee the response time of the more important tasks. However, for most systems those faults occur rarely, and the results of ''not so important'' tasks might still be useful, even if they are a bit late. This leads to the idea of not aborting these ''not so important'' tasks but keeping them running when faults occur, as long as the hard real-time properties of the important tasks are still guaranteed. We present a new task model, related schedulability tests, and an optimal priority assignment to handle this case.

29.10.15 Kevin Huang

Response Time Bounds for Sporadic Arbitrary-Deadline Tasks under Global Fixed-Priority Scheduling on Multiprocessors

In this paper, we study the problem of scheduling arbitrary-deadline real-time sporadic task sets on a multiprocessor system under global fixed-priority scheduling. Two contributions are made in this paper. First, we show that the existing response time analysis for arbitrary-deadline systems is flawed: the response time may be larger than the derived bound. This paper provides a revised analysis that resolves the problems with the original approach and then proposes a corresponding schedulability test. Second, we derive a linear-time upper bound on the response time of arbitrary-deadline tasks in multiprocessor systems. To the best of our knowledge, this is the first work presenting a linear-time response time upper bound for arbitrary-deadline sporadic tasks in multiprocessor systems. Empirically, this linear-time response time bound is shown to be highly effective in terms of the number of task sets that are deemed schedulable.

06.10.15 Wei Liu

Object Detection with Discriminatively Trained Part Based Models

We describe an object detection system based on mixtures of multiscale deformable part models. Mixtures of deformable part models are trained using a discriminative method that only requires bounding boxes for the objects in an image. The approach leads to efficient object detectors that achieve state-of-the-art results on the PASCAL and INRIA person datasets.

22.09.15 Kuan-Hsun Chen

Reliability-Aware Task Mapping on Many-Cores with Performance Heterogeneity

Due to architectural design, process variations, and aging, cores in many-core systems may exhibit heterogeneous performance. A commonly adopted soft-error mitigation technique is Redundant Multithreading (RMT), which achieves error detection and recovery through redundant thread execution of an application on different cores. However, task mapping and the determination of the task execution mode (i.e., whether a task executes in a reliable mode with RMT or an unreliable mode without RMT) need to be considered to achieve resource-efficient reliability. This paper explores how to efficiently assign tasks onto cores with heterogeneous performance properties and determine the execution modes of tasks in order to achieve high reliability while satisfying the timeliness tolerance. Our results illustrate that, compared to the state of the art, the proposed approaches achieve up to 80% reliability improvement (on average 20%) under different scenarios of chip frequency variation maps.

15.09.15 Emily Wu

Online Energy Scheduling

Online energy scheduling is an online job scheduling problem. Each job has an arrival time, deadline, and execution time. A feasible solution is to finish every job before its deadline. A processor has three different states: busy, idle and sleep. The objective is to minimize the total energy consumption. In this talk I will introduce some previous work and share my preliminary idea for improvement.
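To make the three-state model concrete, here is a small illustrative energy model (the power values and wake-up cost are hypothetical, not from the talk): during a gap between jobs, the processor either stays idle or sleeps and pays a fixed wake-up cost, so sleeping only pays off for sufficiently long gaps.

```python
# Hypothetical power levels for the three processor states and a fixed
# energy cost for waking up from sleep (all values are made up).
P_BUSY, P_IDLE, P_SLEEP = 1.0, 0.4, 0.05
E_WAKE = 2.0

def gap_energy(gap):
    """Best energy for an idle gap: stay idle, or sleep + pay wake-up."""
    return min(P_IDLE * gap, P_SLEEP * gap + E_WAKE)

def schedule_energy(busy_intervals):
    """Total energy of a schedule given sorted, disjoint (start, end)
    execution intervals: busy energy plus the cheapest choice per gap."""
    energy = sum(P_BUSY * (e - s) for s, e in busy_intervals)
    for (_, e0), (s1, _) in zip(busy_intervals, busy_intervals[1:]):
        energy += gap_energy(s1 - e0)
    return energy
```

With these numbers the break-even gap length is E_WAKE / (P_IDLE − P_SLEEP) ≈ 5.7 time units: shorter gaps are cheaper idled through, longer gaps are cheaper slept through. The online difficulty is that gap lengths are not known in advance.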

25.08.15 Maolin Yang

Response-time analysis for multiprocessor real-time systems with shared resources

When real-time applications synchronize access to shared resources with binary semaphores (i.e., "mutexes" or suspension-based locks), a real-time locking protocol is required to bound the undesired priority inversions that increase the response times of tasks. This talk will discuss the challenges in semaphore protocol analysis, the most recent analytical methods for semaphore protocols under fixed-priority scheduling (both G-FP and P-FP), and potential extensions to further improve schedulability.
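The effect of such blocking on response times can be illustrated with the standard textbook response-time recurrence for fixed-priority scheduling with a blocking term, R_i = C_i + B_i + Σ_{j ∈ hp(i)} ⌈R_i/T_j⌉·C_j (a generic uniprocessor sketch, not the specific multiprocessor analyses discussed in the talk):

```python
import math

def response_time(C, B, hp, deadline=float("inf")):
    """Fixed-point iteration of the response-time recurrence.

    C: WCET of the task under analysis, B: its blocking bound,
    hp: list of (C_j, T_j) for higher-priority tasks.
    Returns the converged response time, or a value exceeding `deadline`
    if the iteration diverges past it (unschedulable)."""
    r = C + B
    while True:
        nxt = C + B + sum(math.ceil(r / T) * Cj for Cj, T in hp)
        if nxt == r or nxt > deadline:
            return nxt
        r = nxt
```

The blocking term B is exactly what a locking protocol such as PCP bounds; without such a bound, priority inversion can make B, and hence the response time, arbitrarily large.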

11.08.15 Kevin Huang

Techniques for Schedulability Analysis in Mode
Change Systems under Fixed-Priority Scheduling

Accepted at RTCSA15

Rehearsal Talk for RTCSA

28.07.15 Prof. Jian-Jia Chen

What should be done if the paper is WRONG.



Wei Liu

Parameter selection for Real-time tasks in Camera-based Object Detection System



Georg von der Brueggen

Schedulability and Optimization Analysis for
Non-Preemptive Static Priority Scheduling Based on Task Utilization and
Blocking Factors

Accepted ECRTS'15 paper

Rehearsal Talk


Kuan-Hsun Chen

Semi-automatic R2Pi Navigation

Fachprojekt discussion


Kevin Huang

Timing Analysis of Real-Time Self-Suspending Tasks under Fixed-Priority Scheduling

Accepted DAC'15 paper

Rehearsal Talk


Georg von der Brueggen

Monte-Carlo Method

A short introduction to the Monte-Carlo method. The general idea of the Monte-Carlo method is to use random sampling to obtain (hopefully good) solutions to mathematical, physical, or computer science problems that are too difficult to solve analytically.
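The classic textbook example of this idea is estimating π by sampling random points in the unit square and counting how many fall inside the quarter circle:

```python
import random

def estimate_pi(samples=100_000, seed=42):
    """Monte-Carlo estimate of pi: the fraction of uniform points in the
    unit square that land inside the quarter circle approaches pi/4."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples
```

The error shrinks only as 1/√n, which is exactly the "hopefully good" caveat: cheap to get a rough answer, expensive to get a precise one.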


Kevin Huang

Timing Analysis of Real-Time Self-Suspending Tasks under Fixed-Priority Scheduling

Self-suspension is becoming an increasingly prominent characteristic in real-time systems such as: (i) I/O-intensive systems, where applications interact intensively with I/O devices, (ii) multi-core processors, where tasks running on different cores have to synchronize and communicate with each other, and (iii) computation offloading systems with coprocessors, like Graphics Processing Units (GPUs).


Wei Liu

System-Level Performance Optimization through Data Offloading and Parallel GPU Executions

GPUs are becoming extremely important for improving system performance in many embedded systems. Running massively parallel workloads on GPUs is challenging for overall system performance, especially when many workloads are executed concurrently. In this paper, we develop a mechanism to optimize system-level performance. Two scheduling algorithms can be used to schedule parallel workloads with data offloading and parallel executions. Experiments show the performance of our algorithms and the feasibility of our mechanism across different platforms.


Kuan-Hsun Chen

Dependable Task Mapping on Many Cores

To mitigate soft errors, redundant copies of an application can execute on different cores using redundant multithreading (RMT) to achieve much higher reliability. On the other hand, due to architectural design, process variations, and aging, individual cores in many-core systems may have heterogeneous performance. Therefore, we discuss how to allocate tasks to cores with heterogeneous performance for dependable execution.


Prof. Jian-Jia Chen

Prof. Peter Marwedel

How to write a paper

Jian-Jia: (staff only)

Peter: #1(staff only), #2(staff only)



Georg von der Brüggen

dSpace Tutorial: Control Desk - Next Generation (Basic)

dSpace GmbH

Introduction to the basic concepts of the dSpace Control Desk. It can be used for hardware-in-the-loop simulation, using either hardware or software simulators. We covered the properties of Control Desk, the creation and configuration of use cases, the definition of variables, input and output devices, and data recording.