Assessment of the readiness of a computer system for timely servicing of requests when combined with information recovery of memory after failures

This paper investigates how to increase the readiness of a redundant computer system to execute delay-critical requests on time. It considers a fault-tolerant computer cluster whose nodes are duplicated computing systems that combine computer nodes and memor...

Bibliographic Details
Main Authors: Vladimir A. Bogatyrev, Stanislav V. Bogatyrev, Anatoly V. Bogatyrev
Format: Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2023-06-01
Series: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Online Access: https://ntv.ifmo.ru/file/article/22070.pdf
author Vladimir A. Bogatyrev
Stanislav V. Bogatyrev
Anatoly V. Bogatyrev
collection DOAJ
description This paper investigates how to increase the readiness of a redundant computer system to execute, on time, requests that are critical to service delays. It considers a fault-tolerant computer cluster whose nodes are duplicated computing systems that combine computer nodes and memory nodes. Memory nodes are assumed to recover in two stages: first physically, then informationally, with the information recovery carried out using the resources of the computing nodes. The novelty of the approach is that, for systems with a limit on the allowable service time of functional requests, it evaluates how recovery disciplines affect system readiness under various options for dividing computing resources between restoring information after memory failures and performing the required functions. The reliability of the systems under study is therefore assessed not only by the probability that they are ready to perform functional tasks (the readiness coefficient), but also by the probability that they are ready to perform those tasks in a timely manner. The choice of disciplines for recovery and for servicing the flow of functional requests is justified on the basis of Markov models. The proposed models take into account how the division of computing resources affects both the performance of the required functions and the information recovery of memory that follows its physical recovery. Choosing maintenance disciplines with the proposed Markov model aims at a compromise between increasing the availability factor and increasing the probability of timely execution of the incoming flow of functional requests.
The paper justifies the choice of options for distributing (separating) the computing resources that remain after failures between servicing functional requests (the required functions) and the information recovery of memory performed after its physical recovery. Based on the proposed Markov models, it investigates how the system's readiness for timely execution of requests depends on the way the computing resources remaining in the system are distributed between restoring information in memory and performing functional tasks. The dependence is studied for different allowable waiting times of functional requests and different traffic intensities. The paper also analyzes how balancing the traffic of functional tasks between functional computing nodes influences the system's readiness for timely execution of requests, taking into account the options for jointly using those nodes for the information recovery of memory nodes after their physical recovery. An optimal share of traffic distribution between computing nodes is shown to exist, taking into account the options for dividing their resources between servicing functional requests and restoring information in memory nodes after their physical recovery. The results can be used to justify the choice of disciplines for servicing functional requests and for recovery after failures in fault-tolerant cluster systems that are critical to delays in the execution of functional requests.
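The trade-off described in the abstract can be illustrated with a deliberately small sketch. It is not the authors' model: the four-state chain, the M/M/1 waiting-time formula used in each state, the quasi-static averaging over states, and every name and parameter value below (state_probabilities, p_timely, timely_readiness, best_share, EXAMPLE) are assumptions introduced only for illustration. A share of the computing resources goes to information recovery, which shortens the recovery stage but slows request servicing while it lasts.

```python
import math

# Hypothetical parameter values, chosen only to make the sketch runnable.
EXAMPLE = dict(
    fail_rate=0.01,     # failure rate of one memory copy
    phys_rate=1.0,      # physical recovery rate
    info_rate=2.0,      # information recovery rate when share = 1
    rebuild_rate=0.1,   # recovery rate after both copies are lost
    arrival_rate=5.0,   # intensity of functional requests
    service_rate=10.0,  # service rate with all computing resources
    max_wait=0.5,       # allowable waiting time
)


def state_probabilities(fail_rate, phys_rate, info_rate, rebuild_rate, share):
    """Stationary probabilities of an assumed four-state Markov chain.

    States: 0 both memory copies ready, 1 one copy in physical recovery,
    2 one copy in information recovery, 3 both copies lost (system down).
    """
    if not 0.0 < share <= 1.0:
        raise ValueError("share must lie in (0, 1]")
    # Unnormalized weights obtained from the balance equations with w0 = 1.
    w0 = 1.0
    w1 = 2.0 * fail_rate / (phys_rate + fail_rate)
    w2 = w1 * phys_rate / (share * info_rate + fail_rate)
    w3 = fail_rate * (w1 + w2) / rebuild_rate
    total = w0 + w1 + w2 + w3
    return [w / total for w in (w0, w1, w2, w3)]


def p_timely(arrival_rate, service_rate, max_wait):
    """M/M/1 probability that the queueing delay does not exceed max_wait."""
    if service_rate <= arrival_rate:
        return 0.0  # overloaded node: treated here as never timely
    rho = arrival_rate / service_rate
    return 1.0 - rho * math.exp(-(service_rate - arrival_rate) * max_wait)


def timely_readiness(share, fail_rate, phys_rate, info_rate, rebuild_rate,
                     arrival_rate, service_rate, max_wait):
    """Probability that the system is up and serves a request in time."""
    p0, p1, p2, _ = state_probabilities(
        fail_rate, phys_rate, info_rate, rebuild_rate, share)
    full = p_timely(arrival_rate, service_rate, max_wait)
    # During information recovery only (1 - share) of the resources serve requests.
    reduced = p_timely(arrival_rate, (1.0 - share) * service_rate, max_wait)
    return (p0 + p1) * full + p2 * reduced


def best_share(steps=100, **params):
    """Grid search for the recovery share that maximizes timely readiness."""
    shares = [k / steps for k in range(1, steps + 1)]
    return max(shares, key=lambda a: timely_readiness(a, **params))


if __name__ == "__main__":
    share = best_share(**EXAMPLE)
    print(share, timely_readiness(share, **EXAMPLE))
```

Under these assumed rates, a larger share returns the system to the fully ready state sooner, while a smaller share keeps more capacity for requests during recovery. The grid search only shows how the two effects can be weighed against each other in a toy setting.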
first_indexed 2024-03-13T04:03:49Z
format Article
id doaj.art-0a6eca79dec846a5aba13f5cef9e678f
institution Directory Open Access Journal
issn 2226-1494
2500-0373
language English
last_indexed 2024-03-13T04:03:49Z
publishDate 2023-06-01
publisher Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
record_format Article
series Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
spelling doaj.art-0a6eca79dec846a5aba13f5cef9e678f
2023-06-21T09:42:25Z
Volume 23, Issue 3, Pages 608-617
DOI 10.17586/2226-1494-2023-23-3-608-617
Vladimir A. Bogatyrev, https://orcid.org/0000-0003-0213-0223, D.Sc., Professor, ITMO University, Saint Petersburg, 197101, Russian Federation; Professor, Saint Petersburg State University of Aerospace Instrumentation, 190000, Russian Federation, sc 7006571069
Stanislav V. Bogatyrev, https://orcid.org/0000-0003-0836-8515, PhD Student, ITMO University, Saint Petersburg, 197101, Russian Federation; Consulting Engineer, Yadro Cloud Storage Development Center, Saint Petersburg, 195027, Russian Federation, sc 57183002200
Anatoly V. Bogatyrev, https://orcid.org/0000-0001-5447-7275, PhD, Consulting Engineer, Yadro Cloud Storage Development Center, Saint Petersburg, 195027, Russian Federation, sc 56549712700
title Assessment of the readiness of a computer system for timely servicing of requests when combined with information recovery of memory after failures
topic cluster
availability factor
recovery
information recovery of memory
markov model
recovery discipline
criticality to service delays
probability of timely execution of requests
duplicated system
fault tolerance
url https://ntv.ifmo.ru/file/article/22070.pdf