Date of Award
1993
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical and Computer Engineering
First Advisor
Suresh Rai
Abstract
This dissertation presents improved techniques for analyzing network-connected (NCF), 2-connected (2CF), task-based (TBF), and subcube (SF) functionality measures in a hypercube multiprocessor with faulty processing elements (PE) and/or communication elements (CE). These measures help study system-level fault tolerance issues and relate to various application modes in the hypercube. Solutions discussed in the text fall into probabilistic and deterministic models. The probabilistic measure assumes a stochastic graph of the hypercube where PE's and/or CE's may fail with certain probabilities, while the deterministic model considers that some system components are already failed and aims to determine the system functionality. For probabilistic model, MIL-HDBK-217F is used to predict PE and CE failure rates for an Intel iPSC system. First, a technique called CAREL is presented. A proof of its correctness is included in an appendix. Using the shelling ordering concept, CAREL is shown to solve the exact probabilistic NCF measure for a hypercube in time polynomial in the number of spanning trees. However, this number increases exponentially in the hypercube dimension. This dissertation, then, aims to more efficiently obtain lower and upper bounds on the measures. Algorithms, presented in the text, generate tighter bounds than had been obtained previously and run in time polynomial in the cube dimension. The proposed algorithms for probabilistic 2CF measure consider PE and/or CE failures. In attempting to evaluate deterministic measures, a hybrid method for fault tolerant broadcasting in the hypercube is proposed. This method combines the favorable features of redundant and non-redundant techniques. A generalized result on the deterministic TBF measure for the hypercube is then described. Two distributed algorithms are proposed to identify the largest operational subcubes in a hypercube $C\sb{n}$ with faulty PE's. Method 1, called LOS1, requires a list of faulty components and utilizes the CMB operator of CAREL to solve the problem. In case the number of unavailable nodes (faulty or busy) increases, an alternative distributed approach, called LOS2, processes m available nodes in O(mn) time. The proposed techniques are simple and efficient.
Recommended Citation
Soh, Sieteng, "Reliability Analysis of the Hypercube Architecture." (1993). LSU Historical Dissertations and Theses. 5548.
https://repository.lsu.edu/gradschool_disstheses/5548
Pages
201
DOI
10.31390/gradschool_disstheses.5548