Checkpointing for the RESTART Problem in Markov Networks
We apply the known formulae of the RESTART problem to Markov models of software (and many other) systems and derive new equations. We show how checkpoints might be included and what performance results under RESTART. The result is a complete procedure for finding the mean, variance, and tail behavior of the job completion time as a function of the failure rate. We also provide a detailed example.
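To give a feel for why checkpoints matter under RESTART, here is a minimal sketch of the simplest textbook case, not the Markov-network procedure of the paper. It assumes a job of fixed length, failures arriving as a Poisson process with a constant rate, zero repair time, and zero checkpoint cost; the function name and its parameters are hypothetical. Under those assumptions a piece of length t that must restart from its beginning after every failure has mean completion time (e^(rate*t) - 1) / rate, and equally spaced checkpoints split the job into independent pieces.

```python
import math


def mean_restart_time(job_length, failure_rate, segments=1):
    """Mean completion time under RESTART for a fixed-length job.

    Assumptions (illustrative only): Poisson failures with rate
    `failure_rate`, no repair delay, no checkpoint overhead, and
    `segments` equal pieces separated by checkpoints. After a failure,
    only the current piece is restarted from its beginning.
    """
    if failure_rate == 0:
        # No failures: the job simply takes its own length.
        return float(job_length)
    piece = job_length / segments
    # Each piece has mean (exp(rate * piece) - 1) / rate; expm1 keeps
    # precision when rate * piece is small.
    return segments * math.expm1(failure_rate * piece) / failure_rate


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, mean_restart_time(10.0, 0.5, segments=k))
```

Without checkpoints the mean grows exponentially in the job length, while adding checkpoints pulls it back toward the failure-free length. A realistic model would also charge a cost per checkpoint, which this sketch deliberately omits.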
& Gokhale, S. S. (2011). Checkpointing for the RESTART Problem in Markov Networks. Journal of Applied Probability, 48A, 195-207.