Benjamin Rouxel

Journals

Time-sensitive autonomous architectures

Real-Time Systems'2023 pdf 

Abstract BibTeX

Autonomous and software-defined vehicles (ASDVs) feature highly complex systems, coupling safety-critical and non-critical components such as infotainment. These systems require the highest connectivity, both inside the vehicle and with the outside world. An effective solution for network communication lies in Time-Sensitive Networking (TSN), which enables high-bandwidth and low-latency communications in a mixed-criticality environment. In this work, we present Time-Sensitive Autonomous Architectures (TSAA) to enable TSN in ASDVs. The software architecture is based on a hypervisor providing strong isolation and virtual access to TSN for virtual machines (VMs). The latest iteration of TSAA includes an autonomous car controlled by two Xilinx accelerators and a multiport TSN switch. We discuss the engineering challenges and the performance evaluation of the project demonstrator. In addition, we propose a Proof-of-Concept design of virtualized TSN to enable multiple VMs executing on a single board to take advantage of the inherent guarantees offered by TSN.

@article{ferraro2023time,
title={Time-sensitive autonomous architectures},
author={Ferraro, Donato and Palazzi, Luca and Gavioli, Federico and Guzzinati, Michele and Bernardi, Andrea and Rouxel, Benjamin and Burgio, Paolo and Solieri, Marco},
journal={Real-Time Systems},
pages={1--41},
year={2023},
publisher={Springer}
}

Tightening contention delays while scheduling parallel applications on multi-core architectures

TECS'2017 pdf pdf pdf 

Abstract BibTeX

Multi-core systems are increasingly interesting candidates for executing parallel real-time applications, in avionic, space or automotive industries, as they provide both computing capabilities and power efficiency. However, ensuring that timing constraints are met on such platforms is challenging, because some hardware resources are shared between cores.
Assuming worst-case contentions when analyzing the schedulability of applications may result in systems mistakenly declared unschedulable, although the worst-case level of contentions can never occur in practice. In this paper, we present two contention-aware scheduling strategies that produce a time-triggered schedule of the application’s tasks. Based on knowledge of the application’s structure, our scheduling strategies precisely estimate the effective contentions, in order to minimize the overall makespan of the schedule. An Integer Linear Programming (ILP) solution of the scheduling problem is presented, as well as a heuristic solution that generates schedules very close to those of the ILP (5% longer on average), with a much lower time complexity. Our heuristic improves by 19% the overall makespan of the resulting schedules compared to a worst-case contention baseline.

@article{rouxel2017tightening,
author = {Rouxel, Benjamin and Derrien, Steven and Puaut, Isabelle},
title = {Tightening Contention Delays While Scheduling Parallel Applications on Multi-core Architectures},
journal = {ACM Transactions on Embedded Computing Systems (TECS)},
issue_date = {October 2017},
volume = {16},
number = {5s},
month = oct,
year = {2017},
issn = {1539-9087},
pages = {164:1--164:20},
articleno = {164},
numpages = {20},
url = {http://doi.acm.org/10.1145/3126496},
doi = {10.1145/3126496},
acmid = {3126496},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Real-time system, contention-aware scheduling}}

Conferences/Workshops

The IMOCO4.E reference framework for intelligent motion control systems

ETFA'2023 pdf 

Abstract BibTeX

Intelligent motion control is integral to modern cyber-physical systems. However, smart integration of intelligent motion control with commercial and industrial systems requires domain expertise, industrial ‘know-how’ of the production processes, and resilient adaptation for the various engineering phases. The challenge is amplified with the adoption of advanced digital twin approaches, big data and artificial intelligence in the various industrial domains. This paper proposes the IMOCO4.E reference framework for the smart integration of intelligent motion control with commercial platforms (e.g., from SMEs) and industrial systems. The IMOCO4.E reference framework brings together the architecture, data management, artificial intelligence and digital twin viewpoints from the industrial users of the large-scale ‘Intelligent Motion Control under Industry4.E’ (IMOCO4.E) consortium. The framework envisions a generic platform for designing, developing, and implementing novice and complex motion-controlled industrial systems. Refinements and instantiations of the framework for the IMOCO4.E industrial cases validate the framework’s applicability for various industrial domains throughout the engineering phases and under different constraints imposed on the industrial cases.

@inproceedings{imocoframework,
title={The {IMOCO4.E} reference framework for intelligent motion control systems},
author={Mohamed, Sajid and others},
booktitle={ETFA},
year={2023}
}

Machine Learning Techniques for Understanding and Predicting Memory Interference in CPU-GPU Embedded Systems

RTCSA'2023

Abstract

Nowadays, heterogeneous embedded platforms are extensively used in various low-latency applications, including the automotive industry, real-time IoT systems, and automated factories. These platforms utilize specific components, such as CPUs, GPUs, and neural network accelerators, for efficient task processing and to solve specific problems with lower power consumption than more traditional systems. However, since these accelerators share resources such as the global memory, it is crucial to understand how workloads behave under high computational loads to determine how parallel computational engines on modern platforms can interfere and adversely affect the system’s predictability and performance. One area that remains unclear is the interference effect on shared memory resources between the CPU and GPU: more specifically, the latency degradation experienced by GPU kernels when memory-intensive CPU applications run concurrently. In this work, we first analyze the metrics that characterize the behavior of different kernels under various board conditions caused by CPU memory-intensive workloads on an NVIDIA Jetson Xavier. Then, we exploit various machine learning methodologies aiming to estimate the latency degradation of kernels based on their metrics. As a result, we are able to identify the metrics that could potentially have the most significant impact when predicting the degradation of kernel completion latency.

Memory-Aware Latency Prediction Model for Concurrent Kernels in Partitionable GPUs: Simulations and Experiments

JSSPP'2023 pdf 

Abstract

The current trend in recently released Graphics Processing Units (GPUs) is to exploit transistor scaling at the architectural level; hence, larger and larger GPUs are released in every new chip generation. Architecturally, this implies that the cluster count of parallel processing elements embedded within a single GPU die is constantly increasing, posing novel and interesting research challenges for performance engineering in latency-sensitive scenarios. A single GPU kernel is now likely not to scale linearly when dispatched in a GPU that features a larger cluster count. This is either due to VRAM bandwidth acting as a bottleneck or due to the inability of the kernel to saturate the massively parallel compute power available in these novel architectures. In this context, novel scheduling approaches might be derived if we consider the GPU as a partitionable compute engine in which multiple concurrent kernels can be scheduled in non-overlapping sets of clusters. While such an approach is very effective in improving the GPU overall utilization, it poses significant challenges in estimating kernel execution time latencies when kernels are dispatched to variable-sized GPU partitions. Moreover, memory interference within co-running kernels is a mandatory aspect to consider. In this work, we derive a practical yet fairly accurate memory-aware latency estimation model for co-running GPU kernels.

The TeamPlay Project: Analysing and Optimising Time, Energy, and Security for Cyber-Physical Systems.

DATE'2023 pdf pdf pdf video

Abstract BibTeX

Non-functional properties, such as energy, time, and security (ETS) are becoming increasingly important in Cyber-Physical Systems (CPS) programming. This article describes TeamPlay, a research project funded under the EU Horizon 2020 programme between January 2018 and June 2021. TeamPlay aimed to provide the system designer with a toolchain for developing embedded applications where ETS properties are first-class citizens, allowing the developer to reflect directly on energy, time and security properties at the source code level. In this paper we give an overview of the TeamPlay methodology, introduce the challenges and solutions of our approach and summarise the results achieved. Overall, applying our TeamPlay methodology led to an improvement of up to 18% performance and 52% energy usage over traditional approaches.

@inproceedings{teamplay-date-23,
title = {The {TeamPlay} Project: Analysing and Optimising Time, Energy, and Security for Cyber-Physical Systems},
author = {Benjamin Rouxel and Christopher Brown and Emad Ebeid and Kerstin Eder and Heiko Falk and Clemens Grelck and Jesper Holst and Shashank Jadhav and Yoann Marquer and Marcos Martinez De Alejandro and Kris Nikov and Ali Sahafi and Ulrik Pagh Schultz Lundquist and Adam Seewald and Vangelis Vassalos and Simon Wegener and Olivier Zendra},
booktitle = {Proceedings of DATE '23: Design, Automation and Test in Europe},
month = {April},
year = 2023,
address = {Antwerp, Belgium}}

YASMIN: a Real-time Middleware for COTS Heterogeneous Platforms

Middleware'2021 pdf pdf video

Abstract BibTeX

Commercial-off-the-shelf (COTS) heterogeneous platforms provide immense computational power, but are difficult to program and to correctly use when real-time requirements come into play: A sound configuration of the operating system scheduler is needed, and a suitable mapping of tasks to computing units must be determined. Flawed designs lead to sub-optimal system configurations and, thus, to wasted resources or even to deadline misses and system failures.
We propose YASMIN, a middleware to schedule end-user applications with real-time requirements in user space and on behalf of the operating system. YASMIN combines an easy-to-use programming interface with portability across a wide range of architectures. It treats heterogeneity on COTS embedded platforms as a first-class citizen: YASMIN supports multiple functionally equivalent task implementations with distinct extra-functional behaviour. This enables the system designer to quickly explore different scheduling policies and task-to-core mappings, and thus, to improve overall system performance.
In this paper, we present the design and implementation of YASMIN and provide an analysis of the scheduling overhead on an Odroid-XU4 platform. We demonstrate the merits of YASMIN on an industrial use-case involving a search-and-rescue drone.

@inproceedings{rouxel2021yasmin,
title={YASMIN: a Real-time Middleware for COTS Heterogeneous Platforms},
author={Rouxel, Benjamin and Altmeyer, Sebastian and Grelck, Clemens},
booktitle={Proceedings of the 22nd International Middleware Conference (Middleware '21)},
year={2021},
organization={ACM}}

Scheduling DAGs of Multi-version Multi-phase Tasks on Heterogeneous Real-time Systems

MCSoC'2021 pdf pdf 

Abstract BibTeX

Heterogeneous high-performance embedded systems are increasingly used in industry. Nowadays, these platforms embed accelerator-style components, such as GPUs, alongside different CPU cores. We use multiple alternatives/versions/implementations of tasks to fully benefit from the heterogeneous capacities of such platforms and due to binary incompatibility. Implementations targeting accelerators not only require access to the accelerator but also to a CPU core for, e.g., pre-processing and branching the control flow. Hence, accelerator workloads can naturally be divided into multiple phases (e.g. CPU, GPU, CPU). We propose an asynchronous scheduling approach that utilises multiple phases and thereby enables a fine-grained scheduling of tasks that require two types of hardware. We show that our approach can increase the schedulability rate by up to 24% over two multi-version phase-unaware schedulers. Additionally, we demonstrate that the schedulability rate of our heuristic is close to the optimal schedulability rate.

@inproceedings{roeder2021scheduling,
title={Scheduling DAGs of Multi-version Multi-phase Tasks on Heterogeneous Real-time Systems},
author={Roeder, Julius and Rouxel, Benjamin and Grelck, Clemens},
booktitle={Proceedings of the 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2021)},
year={2021},
organization={IEEE}}

Task-level Redundancy vs Instruction-level Redundancy against Single Event Upsets in Real-time DAG scheduling

MCSoC'2021 pdf pdf 

Abstract BibTeX

Real-time cyber-physical systems have become ubiquitous. As such systems are often mission-critical, designers must include mitigations against various types of hardware faults, including Single Event Upsets (SEU). SEUs can be mitigated using both software and hardware approaches. When using software approaches, the application designer needs to select the appropriate redundancy level for the application. We propose the use of task-level redundancy for SEU detection, aiming at applications structured as a Directed Acyclic Graph (DAG) of tasks. This work compares existing instruction-level redundancy against task-level redundancy using the UPPAAL model checking tool in SMC mode. Our comparison shows that task-level redundancy implemented using Dual Modular Spatial Redundancy and Checkpoint-Restart offers significantly lower deadline miss ratios when slack is limited. While task-level redundancy usually performs better or equal, we also show that rare cases exist where long-running DAG applications benefit more from instruction-level redundancy.

@inproceedings{miedema2021task,
title={Task-level Redundancy vs Instruction-level Redundancy against Single Event Upsets in Real-time DAG scheduling},
author={Miedema, Lukas and Rouxel, Benjamin and Grelck, Clemens},
booktitle={Proceedings of the 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2021)},
year={2021},
organization={IEEE}}

Energy-aware Scheduling of Multi-version Tasks on Heterogeneous Real-time Systems

SAC'2021 pdf 

Abstract BibTeX

The emergence of battery-powered devices has led to an increase of interest in the energy consumption of computing devices. For embedded systems, dispatching the workload on different computing units enables the optimisation of the overall energy consumption on high-performance heterogeneous platforms. However, to use the full power of heterogeneity, architecture-specific binary blocks are required, each with different energy/time trade-offs. Finding a scheduling strategy that minimises the energy consumption while guaranteeing timing constraints creates new challenges. These challenges can only be met by using the full heterogeneous capacity of the platform (e.g. heterogeneous CPU, GPU, DVFS, dynamic frequency changes from within an application). We propose an off-line scheduling algorithm for dependent multi-version tasks based on Forward List Scheduling to minimise the overall energy consumption. Our heuristic accounts for Dynamic Voltage and Frequency Scaling (DVFS) and enables applications to dynamically adapt voltage and frequency during run time. We demonstrate the benefits of multi-version task models coupled with an energy-aware scheduler. We observe that selecting the most energy-efficient version for each task does not lead to the lowest energy consumption for the whole application. Then we show that our approach produces schedules that are on average 45.6% more energy efficient than schedules produced by a state-of-the-art scheduling algorithm. Next we compare our heuristic against an optimal solution derived by an Integer Linear Programming (ILP) formulation (deviation of 1.6% on average). Lastly, we empirically show that the energy consumption predicted by our scheduler is close to the actual measured energy consumption on an Odroid-XU4 board (at most -15.8%).

@inproceedings{roeder2021energy,
title={Energy-aware Scheduling of Multi-version Tasks on Heterogeneous Real-time Systems},
author={Roeder, Julius and Rouxel, Benjamin and Altmeyer, Sebastian and Grelck, Clemens},
booktitle={Proceedings of the 36th Annual ACM Symposium on Applied Computing},
pages={},
year={2021},
organization={ACM}
}

PReGO: A Generative Methodology for Satisfying Real-Time Requirements on COTS-Based Systems: Definition and Experience Report

GPCE'2020 pdf pdf video

Abstract BibTeX

Satisfying real-time requirements in cyber-physical systems is challenging as timing behaviour depends on the application software, the embedded hardware, as well as the execution environment. This challenge is exacerbated as real-world, industrial systems often use unpredictable hardware and software libraries or operating systems with timing hazards and proprietary device drivers. All these issues limit or entirely prevent the application of established real-time analysis techniques.
In this paper we propose PReGO, a generative methodology for satisfying real-time requirements in industrial commercial-off-the-shelf (COTS) systems. We report on our experience in applying PReGO to a use-case: a Search & Rescue application running on a fixed-wing drone with COTS components, including an NVIDIA Jetson board and a stock Ubuntu/Linux. We empirically evaluate the impact of each integration step and demonstrate the effectiveness of our methodology in meeting real-time application requirements in terms of deadline misses and energy consumption.

@inproceedings{rouxel2020prego,
title={PReGO: a generative methodology for satisfying real-time requirements on COTS-based systems: definition and experience report},
author={Rouxel, Benjamin and Schultz, Ulrik Pagh and Akesson, Benny and Holst, Jesper and J{\o}rgensen, Ole and Grelck, Clemens},
booktitle={Proceedings of the 19th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences},
pages={70--83},
year={2020},
organization={ACM}
}

Towards Energy-, Time- and Security-aware Multi-core Coordination

Coordination'2020 pdf 

Abstract BibTeX

Coordination is a well established computing paradigm with a plethora of languages, abstractions and approaches. Yet, we are not aware of any adoption of the principle of coordination in the broad domain of cyber-physical systems, where non-functional properties, such as execution/response time, energy consumption and security are as crucial as functional correctness.
We propose a coordination approach, including a functional coordination language and its associated tool flow, that considers time, energy and security as first-class citizens in application design and development. We primarily target cyber-physical systems running on off-the-shelf heterogeneous multi-core platforms. We illustrate our approach by means of a real-world use case, an unmanned aerial vehicle for autonomous reconnaissance missions, which we develop in close collaboration with industry.

@inproceedings{roeder2020towards,
title={Towards Energy-, Time- and Security-Aware Multi-core Coordination},
author={Roeder, Julius and Rouxel, Benjamin and Altmeyer, Sebastian and Grelck, Clemens},
booktitle={International Conference on Coordination Languages and Models},
pages={57--74},
year={2020},
organization={Springer}
}

Hiding communication delays in contention-free execution for SPM-based multi-core architectures. (outstanding)

ECRTS'2019 pdf pdf 

Abstract BibTeX

Multi-core systems using ScratchPad Memories (SPMs) are attractive architectures for executing time-critical embedded applications, because they provide both predictability and performance. In this paper, we propose a scheduling technique that jointly selects SPM contents off-line, in such a way that the cost of SPM loading/unloading is hidden. Communications are fragmented to augment hiding possibilities. Experimental results show the effectiveness of the proposed technique on streaming applications and synthetic task-graphs. The overlapping of communications with computations allows the length of generated schedules to be reduced by 4% on average on streaming applications, with a maximum of 16%, and by 8% on average for synthetic task graphs. We further show on a case study that generated schedules can be implemented with low overhead on a predictable multi-core architecture (Kalray MPPA).

@inproceedings{rouxel2019hiding,
title={Hiding communication delays in contention-free execution for SPM-based multi-core architectures},
author={Rouxel, Benjamin and Skalistis, Stefanos and Derrien, Steven and Puaut, Isabelle},
booktitle={31st Euromicro Conference on Real-Time Systems (ECRTS 2019)},
year={2019},
organization={Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik}
}

A Time-predictable Branch Predictor

SAC'2019 pdf 

Abstract BibTeX

Long pipelines need good branch predictors to keep the pipeline running. Current branch predictors are optimized for the average case, which might not be a good fit for real-time systems and worst-case execution time analysis.
This paper presents a time-predictable branch predictor co-designed with the associated worst-case execution time analysis. The branch predictor uses a fully-associative cache to track branch outcomes and destination addresses. The fully-associative cache avoids any false sharing of entries between branches. Therefore, we can analyze program scopes that contain a number of branches lower than or equal to the number of branches in the prediction table. Experimental results show that the worst-case execution time bounds of programs using the proposed predictor are lower than using static branch predictors at a moderate hardware cost.

@inproceedings{schoeberl2018atime,
title={A Time-predictable Branch Predictor},
author={Schoeberl, Martin and Rouxel, Benjamin and Puaut, Isabelle},
booktitle={Proceedings of the 34th Annual ACM Symposium on Applied Computing},
pages={},
year={2019},
organization={ACM}
}

Tightening contention delays while scheduling parallel applications on multi-core architectures

EMSOFT'2017 pdf pdf pdf 

Abstract BibTeX

Multi-core systems are increasingly interesting candidates for executing parallel real-time applications, in avionic, space or automotive industries, as they provide both computing capabilities and power efficiency. However, ensuring that timing constraints are met on such platforms is challenging, because some hardware resources are shared between cores.
Assuming worst-case contentions when analyzing the schedulability of applications may result in systems mistakenly declared unschedulable, although the worst-case level of contentions can never occur in practice. In this paper, we present two contention-aware scheduling strategies that produce a time-triggered schedule of the application’s tasks. Based on knowledge of the application’s structure, our scheduling strategies precisely estimate the effective contentions, in order to minimize the overall makespan of the schedule. An Integer Linear Programming (ILP) solution of the scheduling problem is presented, as well as a heuristic solution that generates schedules very close to those of the ILP (5% longer on average), with a much lower time complexity. Our heuristic improves by 19% the overall makespan of the resulting schedules compared to a worst-case contention baseline.

@article{Rouxel:2017:TCD:3145508.3126496,
author = {Rouxel, Benjamin and Derrien, Steven and Puaut, Isabelle},
title = {Tightening Contention Delays While Scheduling Parallel Applications on Multi-core Architectures},
journal = {International Conference on Embedded Software (EMSOFT)},
year = {2017},
pages = {20},
url = {http://doi.acm.org/10.1145/3126496},
doi = {10.1145/3126496},
publisher = {ACM}}

STR2RTS: Refactored StreamIT benchmarks into statically analyzable parallel benchmarks for WCET estimation & real-time scheduling

WCET Workshop in ECRTS'2017 pdf pdf 

Abstract BibTeX

We all had quite a time to find non-proprietary architecture-independent exploitable parallel benchmarks for Worst-Case Execution Time (WCET) estimation and real-time scheduling. However, there is no consensus on a parallel benchmark suite, when compared to the single-core era and the Mälardalen benchmark suite [12]. This document bridges part of this gap, by presenting a collection of benchmarks with the following good properties: (i) easily analyzable by static WCET estimation tools (written in structured C language, in particular neither goto nor dynamic memory allocation, containing flow information such as loop bounds); (ii) independent from any particular run-time system (MPI, OpenMP) or real-time operating system. Each benchmark is composed of the C source code of its tasks, and an XML description describing the structure of the application (tasks and amount of data exchanged between them when applicable). Each benchmark can be integrated in a full end-to-end empirical method validation protocol on multi-core architecture. This proposed collection of benchmarks is derived from the well-known StreamIT [21] benchmark suite and will be integrated in the TACleBench suite [11] in the near future. All these benchmarks are available at https://gitlab.inria.fr/brouxel/STR2RTS.

@InProceedings{rouxel_et_al:OASIcs:2017:7304,
author ={Benjamin Rouxel and Isabelle Puaut},
title ={{STR2RTS: Refactored StreamIT Benchmarks into Statically Analyzable Parallel Benchmarks for WCET Estimation \& Real-Time Scheduling}},
booktitle ={17th International Workshop on Worst-Case Execution Time Analysis (WCET 2017)},
pages ={1:1--1:12},
series ={OpenAccess Series in Informatics (OASIcs)},
ISBN ={978-3-95977-057-6},
ISSN ={2190-6807},
year ={2017},
volume ={57},
editor ={Jan Reineke},
publisher ={Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address ={Dagstuhl, Germany},
URL ={http://drops.dagstuhl.de/opus/volltexte/2017/7304},
URN ={urn:nbn:de:0030-drops-73047},
doi ={10.4230/OASIcs.WCET.2017.1},
annote ={Keywords: Parallel benchmarks, Tasks scheduling, Worst-Case Execution Time estimation}}

The Heptane Static Worst-Case Execution Time Estimation Tool

WCET Workshop in ECRTS'2017 pdf pdf 

Abstract BibTeX

Estimation of worst-case execution times (WCETs) is required to validate the temporal behavior of hard real-time systems. Heptane is an open-source software program that estimates upper bounds of execution times on MIPS and ARM v7 architectures, offered to the WCET estimation community to experiment with new WCET estimation techniques. The software architecture of Heptane was designed to be as modular and extensible as possible to facilitate the integration of new approaches. This paper is devoted to a description of Heptane, and includes information on the analyses it implements, how to use it and extend it.

@InProceedings{hardy_et_al:OASIcs:2017:7303,
author ={Damien Hardy and Benjamin Rouxel and Isabelle Puaut},
title ={{The Heptane Static Worst-Case Execution Time Estimation Tool}},
booktitle ={17th International Workshop on Worst-Case Execution Time Analysis (WCET 2017)},
pages ={8:1--8:12},
series ={OpenAccess Series in Informatics (OASIcs)},
ISBN ={978-3-95977-057-6},
ISSN ={2190-6807},
year ={2017},
volume ={57},
editor ={Jan Reineke},
publisher ={Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address ={Dagstuhl, Germany},
URL ={http://drops.dagstuhl.de/opus/volltexte/2017/7303},
URN ={urn:nbn:de:0030-drops-73033},
doi ={10.4230/OASIcs.WCET.2017.8},
annote ={Keywords: Worst-Case Execution Time Estimation, Static Analysis, WCET Estimation Tool, Implicit Path Enumeration Technique}}

Work In Progress

Brief Announcement: Optimized GPU-accelerated Feature Extraction for ORB-SLAM Systems

SPAA'2023 pdf 

Abstract BibTeX

Reducing the execution time of the ORB-SLAM algorithm is a crucial aspect of autonomous vehicles since it is computationally intensive for embedded boards. We propose a parallel GPU-based implementation, able to run on embedded boards, of the Tracking part of the ORB-SLAM2/3 algorithm. Our implementation is not simply a GPU port of the tracking phase. Instead, we propose a novel method to accelerate image Pyramid construction on GPUs. Comparison against state-of-the-art CPU and GPU implementations, considering both computational time and trajectory errors, shows improvement in execution time on well-known datasets, such as KITTI and EuRoC.

@inproceedings{muzzini2023brief,
title={Brief Announcement: Optimized GPU-accelerated Feature Extraction for ORB-SLAM Systems},
author={Muzzini, Filippo and Capodieci, Nicola and Cavicchioli, Roberto and Rouxel, Benjamin},
booktitle={Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures},
pages={299--302},
year={2023}}

Strategy Switching: Smart Fault-tolerance for Resource-constrained Real-time Applications

CERCIRAS'2021 pdf 

Abstract BibTeX

Software-based fault-tolerance is an attractive alternative to hardware-based fault-tolerance, as it allows for the use of cheap Commercial Off The Shelf hardware. However, software-based fault-tolerance comes at a cost, requiring computing the same results multiple times to allow for the detection and mitigation of faults. Resource-constrained real-time applications may not be able to afford this cost. At the same time, the domain of a real-time task may allow it to tolerate a fault, provided it does not occur in consecutive iterations of the task. In this paper, we introduce a new way to deploy fault-tolerance called strategy switching. Our method targets Single Event Upsets by running different subsets of tasks under fault-tolerance at different points in time. We do not bound the number of faults in a window, nor does our method assume that tasks under fault-tolerance cannot still fail. Our technique does not require a minimal amount of additional compute resources for fault-tolerance. Instead, our method optimally utilizes any available compute resources for fault-tolerance for resource-constrained real-time applications.

@article{miedema2021strategy,
title={Strategy Switching: Smart Fault-tolerance for Resource-constrained Real-time Applications},
author={Miedema, Lukas and Rouxel, Benjamin and Grelck, Clemens},
year={2021}
}

Q-learning for Statically Scheduling DAGs

BigData'2020

 BibTeX

@inproceedings{roeder2020qlearning,
title={Q-learning for Statically Scheduling DAGs},
author={Roeder, Julius and Rouxel, Benjamin and Grelck, Clemens},
booktitle={Proceedings of the 2020 IEEE International Conference on Big Data},
year={2020},
organization={IEEE}
}

Interdependent Multi-version Scheduling in Heterogeneous Energy-aware Embedded Systems

JRWRTC'2019 pdf 

Abstract BibTeX

High-performance heterogeneous multi-core embedded systems are increasingly popular in various fields. Embedded systems engineers need to reason about more than just functional correctness of applications; they also need to reason about energy, time and security (ETS). In this paper, we sketch our coordination language and scheduling approach to enable ETS-aware applications. We present an Integer Linear Programming (ILP) based scheduler on a real life drone application, that minimizes energy consumption, guarantees timing and offers security.

@inproceedings{roeder2019time,
title={Interdependent Multi-version Scheduling in Heterogeneous Energy-aware Embedded Systems},
author={Roeder, Julius and Rouxel, Benjamin and Altmeyer, Sebastian and Grelck, Clemens},
booktitle={The 31st Junior Researcher Workshop on Real-Time Computing (JRWRTC 2019)},
year={2019}
}

A time, energy and security coordination approach

WATERS'2019 pdf pdf 

 BibTeX

@inproceedings{rouxel2019time,
title={A time, energy and security coordination approach},
author={Rouxel, Benjamin and Roeder, Julius and Altmeyer, Sebastian and Grelck, Clemens},
booktitle={10th International Workshop on Analysis Tools and Methodologies for Embedded and Real-Time Systems (WATERS 2019)},
year={2019}
}

Resource-aware task graph scheduling using ILP on multi-core

ACACES'2016 pdf pdf 

Abstract

Multi-core usage has increased in real-time embedded systems. Despite the variety of existing multi-core architectures, there is a real need for mapping and scheduling applications on those architectures. Many techniques already exist to achieve this, but all of them assume the worst-case latency when dealing with shared resources. We propose here to study the impact of synchronisation on the global WCET. By adding synchronisation to tasks that use a shared resource, we are able to decrease the effect of contention on shared resources.

Thesis

Minimising shared resource contention when scheduling real-time applications on multi-core architectures

PhD Thesis'2018 pdf pdf 

Abstract BibTeX

Multi-core architectures using scratch pad memories are very attractive to execute embedded time-critical applications, because they offer a large computational power. However, ensuring that timing constraints are met on such platforms is challenging, because some hardware resources are shared between cores. When targeting the bus connecting cores and external memory, worst-case sharing scenarios are too pessimistic. This thesis proposes strategies to reduce this pessimism. These strategies both improve the accuracy of worst-case communication costs and exploit hardware parallel capacities by overlapping computations and communications. Moreover, fragmenting the latter increases overlapping possibilities.

@phdthesis{rouxel2018minimising,
title={Minimising shared resource contention when scheduling real-time applications on multi-core architectures},
author={Rouxel, Benjamin},
year={2018},
school={Rennes 1}
}

Symbolic Evaluation and Disassembling x86 Low-Level Code

Master Thesis'2015 pdf 

Abstract

Malware has existed since the early days of computer science. Nowadays, malware authors use increasingly sophisticated obfuscations to hide their code. Analyzing malware is not an easy task, as its authors use their imagination to find obfuscations that defeat standard disassemblers.
The major tool they use is a packer, which wraps the malicious code in a series of self-modifying steps. The key tools to study or detect malware are unpackers and disassemblers. Most of the time, those tools are studied independently and are thus not compatible with each other. In addition, a huge amount of suspicious software now has to be processed, and in this context existing solutions are not efficient.
This document focuses on x86 malware and on how to build a disassembler in conjunction with an unpacker that can handle self-modifying code and other obfuscation techniques. It uses dynamic introspection and static analysis to reconstruct the program at a high level of abstraction. The proposed solution implements the work of [1, 2, 3]. The method is evaluated as sound and efficient. Results are encouraging, but further work will show whether the accuracy of the disassembler can be improved.

Symbolic Evaluation and x86 Disassembling

Master thesis, bibliography study'2015 pdf 

Abstract

Cyberterrorists do not only use Denial of Service attacks. Recent years have seen the emergence of many highly evolved viruses, such as Duqu’s driver or Stuxnet. In order to facilitate the reverse-engineering of such programs, researchers need powerful binary analysis platforms, as the source code is not available. The first step of such a platform is disassembling the binary file. This consists in translating low-level instruction code to a higher level of abstraction. However, viruses are not easy to disassemble. Indeed, developers use techniques to obfuscate their code in order to make the reverse-engineering process much harder. They also use self-modifying code, which makes most binary analysis tools inefficient. This document presents existing approaches and tools to reconstruct the program structure, called the Control Flow Graph, from binary code, and gives a hint on their usefulness when dealing with code obfuscation and self-modifying code. This work is part of the ANR BINSEC project, which aims to build an efficient binary analysis platform that handles such programming techniques. At first, this platform will support x86 assembly code. However, the final goal is to be architecture independent, in order to support all kinds of binary code with the help of an Intermediate Language.

Informations Traceability in Control Flow to Compute WCET with Compiler Optimizations

First year of Master'2014 pdf 

Abstract

Real-time systems are omnipresent in our lives. Such systems require particular attention, as human lives depend on them. Designers need to compute the Worst-Case Execution Time (WCET) in order to guarantee that these systems respect their timing constraints. Many WCET techniques exist; the safest one is based on static analysis. This method analyzes the source code structure and models a target architecture to compute the WCET. To compute a WCET close to reality, the analysis must be done at the binary level (target platform modeling is easier at this level). Also, flow information is required to improve the precision of the WCET. Infeasible path properties are one kind of such information. They may come from mutual exclusion between conditional branches in the execution flow of a program. Those properties are extracted at the level of the high-level design language. As most real-time applications are compiled to C code, flow information must be propagated through the compiler steps to WCET estimation tools. But compilers optimize the code’s structure, and most flow information depends on that structure. So infeasible path properties become invalid when the compiler optimizes the code. In order to reconcile real-time system developers with compiler optimizations, we created a framework to trace infeasible path properties through compilers. It is designed to be fully independent of optimizations (if an optimization is added, removed or modified, our framework will still work properly). The resulting WCET estimations are very encouraging: our traceability improved the WCET by around 30% in our test case.

Embedded-Scilab : A Scilab Compiler Designed To Be Used With Embedded System

First year of Master'2014 pdf 

Abstract

Scilab is a very popular language for application prototyping. However, these prototypes cannot be used on embedded architectures. Indeed, such architectures cannot embed a Scilab interpreter, among other reasons because interpretation is generally less efficient than executing code compiled to machine language. One solution is therefore to compile Scilab to C beforehand. Since Scilab is a dynamically typed language, one of the crucial steps of compilation is inferring the types of variables. This document introduces the necessary compilation concepts, then presents solutions for compiling Scilab to C with the aim of obtaining code that is efficient in both execution time and memory footprint.