Adaptation of Distributed Safety-Critical Time-Triggered Systems using Machine Learning

Adaptation can be achieved by adjusting schedules to redistribute tasks among the available resources, avoid failed resources, or account for slack within schedules. Metascheduling can be used in Time-Triggered Systems (TTS) to realise adaptation. However, metaschedulers suffer from state explosion, storage limitations, and runtime issues when dealing with a large number of tasks. The objective of this thesis is to introduce a metascheduling model that uses Machine Learning (ML) to generate modified schedules in real time. Consequently, the extensive collections of schedules produced by a metascheduler no longer need to be stored. Furthermore, real-time adaptation is enhanced by producing compact models compatible with run-time execution. This thesis studies, analyses, models, compares, and implements ML-based metascheduling algorithms for multi-core safety-critical TTS. The motivation for adaptation in TTS is to improve energy efficiency, perform fault recovery, and adjust to varying environmental conditions.
This study examined several types of ML models: Graph Neural Networks (GNN), Encoder/Decoder Neural Networks (E/D NN), Artificial Neural Networks (ANN), Random Forest Classifiers (RFC), and Reinforcement Learning (RL) algorithms. The performance and complexity of these algorithms were evaluated on the temporal and spatial allocation aspects of the metascheduling problem and compared with heuristic algorithms commonly used in the literature. The significance of this study is that the proposed method offers a way to balance the storage capacity of the multi-core safety-critical TTS against the number of schedules for each adaptation scenario. This is achieved by using a conventional metascheduler with a Genetic Algorithm (GA) to generate appropriate training datasets. Three main datasets were produced, with an emphasis on workload (operation load across processing units), makespan, and energy usage.
The proposed ML architecture can handle a wide range of scenarios without requiring schedules to be stored. It also increases adaptation capacity, as ML models can adjust to new situations that were not part of the training dataset.
Moreover, running the RL algorithm online (real-time execution) provides an extra layer of adaptation, as it continues to improve its resource-allocation decisions over time.
The thesis is part of a research project that targets the problems of conventional metaschedulers, namely state-space explosion and runtime inefficiency, by implementing the aforementioned ML models in a simulated hardware environment designed to mimic the hardware difficulties of a multi-core safety-critical TTS. It also compares the performance parameters of several algorithms, enabling the selection of the most suitable model for each case depending on the required performance and hardware resource consumption. The results indicated that the GNN-based model exhibited superior accuracy and performance in predicting temporal allocations. The RL-based approach demonstrated remarkable adaptability and continuous improvement in real-time scheduling scenarios. The ANN and RFC models also performed robustly, offering substantial computational efficiency and reduced energy consumption compared to traditional heuristic methods. The integration of the GA for dataset generation enhanced the training process, resulting in highly optimised and reliable models.
Additionally, the experimental results highlighted that the ML models effectively balance the trade-offs between workload, makespan, and energy consumption, providing a versatile solution adaptable to various operational demands. Overall, the proposed framework not only addresses the limitations of traditional metaschedulers but also sets a new standard for adaptive, efficient, and scalable scheduling in multi-core safety-critical TTS.

DOI

10.25819/ubsi/10767

URN

nbn:de:hbz:467-71001

URI

https://dspace.ub.uni-siegen.de/handle/ubsi/7100

License

http://creativecommons.org/licenses/by-nc-nd/4.0/
