Summary: | Abstract: Distributed computing continuum systems (DCCS) use a vast number of computing devices to process data generated by edge devices such as Internet of Things (IoT) devices and sensor nodes. Besides performing computations, these devices also produce data of their own, such as event logs, configuration files, and network management information. Analyzing these data reveals more about the devices, including their capabilities, processing efficiency, resource usage, and likelihood of failure. However, because DCCS are highly heterogeneous, these data come in different forms and with different attributes. This diversity poses several challenges, which we discuss by relating them to big data so that we can exploit the advantages of big data analytics tools. We enumerate several existing tools that can perform the monitoring task and summarize their characteristics. Further, we provide a general governance and sustainability architecture for DCCS that mirrors the human body’s self-healing model. The proposed model has three stages: first, it analyzes system data to acquire knowledge; second, it leverages that knowledge to monitor the system and predict future conditions; and third, it acts autonomously to resolve any issue or alerts administrators. The model is thus designed to minimize system downtime while optimizing resource usage. We use a small data set to illustrate how a system’s performance can be monitored and predicted through Bayesian network structure learning. Finally, we discuss the limitations of the governance and sustainability model and propose possible solutions to overcome them and make the system more efficient.
|