Intelligent Distributed Fault and Performance Management for Communication Networks

Li, Hongjun



Date

2002


Abstract

This dissertation is devoted to the design of an intelligent, distributed fault and performance management system for communication networks. The architecture is based on a distributed agent paradigm, with belief networks as the framework for knowledge representation and evidence propagation.

The dissertation consists of four major parts. First, we use mobile code technology to help implement a distributed, extensible framework that supports adaptive, dynamic network monitoring and control. Our work on this framework focuses on three aspects: (1) the design of the standard infrastructure, or Virtual Machine, on which agents can be created, deployed, managed and launched; (2) the collection API through which our delegated agents collect data from network elements; and (3) the callback mechanism through which the functionality of the delegated agents, or even of the native software, can be extended. We propose three system designs based on these ideas.
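The three aspects above (a hosting infrastructure, a collection API, and a callback mechanism) can be illustrated with a small in-process sketch. This is not the dissertation's Virtual Machine: `AgentVM`, `collect`, `deploy`, `register_callback` and `run` are hypothetical names, "mobile code" is approximated by shipping a Python callable, and the network element is a plain dictionary of illustrative counter values.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a managed network element: counter name -> value.
# The counter names and values used below are illustrative only.
NetworkElement = Dict[str, int]


class AgentVM:
    """Minimal, hypothetical host that deploys, runs and extends delegated agents."""

    def __init__(self, element: NetworkElement) -> None:
        self.element = element
        self.agents: Dict[str, Callable[["AgentVM"], float]] = {}
        self.callbacks: Dict[str, List[Callable[[float], None]]] = {}

    def collect(self, key: str) -> int:
        """Collection API: agents read element data only through this call (0 if absent)."""
        return self.element.get(key, 0)

    def deploy(self, name: str, agent: Callable[["AgentVM"], float]) -> None:
        """Deploy 'mobile code': here an agent is simply a callable registered under a name."""
        self.agents[name] = agent

    def register_callback(self, name: str, callback: Callable[[float], None]) -> None:
        """Callback mechanism: attach extra behavior to an agent without modifying it."""
        self.callbacks.setdefault(name, []).append(callback)

    def run(self, name: str) -> float:
        """Run a deployed agent, then pass its result to every registered callback."""
        result = self.agents[name](self)
        for callback in self.callbacks.get(name, []):
            callback(result)
        return result


# Usage: an error-rate agent plus a callback that records results above a threshold.
vm = AgentVM({"ifInErrors": 12, "ifInUcastPkts": 4000})
vm.deploy("error_rate",
          lambda v: v.collect("ifInErrors") / max(1, v.collect("ifInUcastPkts")))
alerts: List[float] = []
vm.register_callback("error_rate",
                     lambda r: alerts.append(r) if r > 0.001 else None)
rate = vm.run("error_rate")
```

Keeping all element access behind `collect` is what lets the host meter or restrict what delegated code can read, while callbacks extend behavior without redeploying the agent.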

Second, we propose a distributed framework for intelligent fault management. The managed network is divided into several domains, and each domain has an intelligent agent attached to it that is responsible for that domain's fault management tasks. Belief networks are embedded in each such agent as probabilistic fault models, on which evidence propagation and decision-making processes are carried out.
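As a rough illustration of evidence propagation inside one domain agent, the sketch below uses a tiny two-fault, two-symptom belief network and computes fault posteriors by exact enumeration. The fault and symptom names, the priors, and the noisy-OR link model are all hypothetical assumptions chosen for the example; the dissertation's actual fault models and inference procedure may differ.

```python
from itertools import product
from typing import Dict

# Hypothetical priors for two root-cause faults in one domain.
PRIOR = {"link_down": 0.02, "router_overload": 0.05}

# Hypothetical noisy-OR link strengths: P(fault alone causes symptom).
LINK = {
    "packet_loss": {"link_down": 0.9, "router_overload": 0.6},
    "high_delay": {"link_down": 0.1, "router_overload": 0.8},
}
LEAK = 0.01  # P(symptom appears with no modeled fault present)


def p_symptom(symptom: str, faults: Dict[str, bool]) -> float:
    """Noisy-OR: the symptom is absent only if the leak and every active fault fail to cause it."""
    p_absent = 1.0 - LEAK
    for fault, present in faults.items():
        if present:
            p_absent *= 1.0 - LINK[symptom][fault]
    return 1.0 - p_absent


def posterior(evidence: Dict[str, bool]) -> Dict[str, float]:
    """P(fault present | evidence) by enumerating every joint fault configuration."""
    names = list(PRIOR)
    total = 0.0
    marginal = {f: 0.0 for f in names}
    for values in product([False, True], repeat=len(names)):
        faults = dict(zip(names, values))
        weight = 1.0
        for f, present in faults.items():
            weight *= PRIOR[f] if present else 1.0 - PRIOR[f]
        for symptom, observed in evidence.items():
            p = p_symptom(symptom, faults)
            weight *= p if observed else 1.0 - p
        total += weight
        for f, present in faults.items():
            if present:
                marginal[f] += weight
    return {f: marginal[f] / total for f in names}


# Usage: packet loss observed, high delay not observed.
post = posterior({"packet_loss": True, "high_delay": False})
```

Enumeration is exponential in the number of faults, so it only suits very small per-domain models; larger models would need a proper belief-network inference algorithm.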

Third, we address the problem of parameter learning for belief networks with fixed structure. Based on the idea of Expectation-Maximization (EM), we derive a uniform learning algorithm under incomplete observations. Further, we study the rate of convergence by deriving the Jacobian matrices of our algorithm, and we provide a guideline for choosing the step size. Our simulation results show that the learned values are relatively close to the true values. The algorithm is suitable for both batch and on-line mode.
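To make the idea of step-size-controlled, on-line EM concrete, here is a minimal sketch on a hypothetical two-node network Fault → Alarm, where the fault state is missing in roughly half of the samples. It is a generic stochastic-approximation (on-line) EM, not the dissertation's derived algorithm; the network, the true parameters, the step size and the simulated data are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# One sample: (fault, alarm); fault is None when it was not observed.
Sample = Tuple[Optional[int], int]


def e_step(sample: Sample, pf: float, pa: List[float]) -> List[float]:
    """E-step: P(F = f | observed data) for f in {0, 1} under the current parameters."""
    f_obs, a = sample
    if f_obs is not None:
        return [1.0 - f_obs, float(f_obs)]
    like = [pa[f] if a else 1.0 - pa[f] for f in (0, 1)]
    joint = [(1.0 - pf) * like[0], pf * like[1]]
    z = joint[0] + joint[1]
    return [joint[0] / z, joint[1] / z]


def online_em(data: List[Sample], step: float = 0.001, pf: float = 0.5,
              pa0: Tuple[float, float] = (0.3, 0.7)) -> Tuple[float, List[float]]:
    """Learn P(F=1) and P(A=1 | F=f) one sample at a time with a constant step size."""
    pa = list(pa0)
    # Running expected sufficient statistics, initialized from the starting parameters.
    n_f = [1.0 - pf, pf]                        # expected frequency of F = f
    n_fa = [n_f[0] * pa[0], n_f[1] * pa[1]]     # expected frequency of F = f and A = 1
    for f_obs, a in data:
        q = e_step((f_obs, a), pf, pa)
        for f in (0, 1):
            # Stochastic-approximation update: blend old statistics with the new sample's.
            n_f[f] = (1.0 - step) * n_f[f] + step * q[f]
            n_fa[f] = (1.0 - step) * n_fa[f] + step * q[f] * a
        # M-step: closed-form maximum-likelihood estimates from the statistics.
        pf = n_f[1] / (n_f[0] + n_f[1])
        pa = [n_fa[0] / n_f[0], n_fa[1] / n_f[1]]
    return pf, pa


def simulate(n: int, pf: float, pa: Tuple[float, float], p_obs: float,
             seed: int = 0) -> List[Sample]:
    """Draw samples from the hypothetical network, hiding F with probability 1 - p_obs."""
    rng = random.Random(seed)
    out: List[Sample] = []
    for _ in range(n):
        f = int(rng.random() < pf)
        a = int(rng.random() < pa[f])
        out.append((f if rng.random() < p_obs else None, a))
    return out


data = simulate(50_000, pf=0.2, pa=(0.1, 0.9), p_obs=0.5)
pf_hat, pa_hat = online_em(data, step=0.001)
```

The step size trades tracking speed against noise: a larger `step` forgets old samples faster (useful on-line, when parameters drift), while a smaller one averages over more samples and gives steadier estimates.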

Finally, when using belief networks as fault models, we identify two fundamental questions: (1) When can I say that I have the right diagnosis and stop? (2) If the right diagnosis has not yet been obtained, which test should I choose next?

The first question is tackled through the notion of right diagnosis via intervention, and we solve the second problem with a dynamic decision-theoretic strategy. Simulation shows that our strategy works well for diagnosis. This framework is general, scalable, flexible and robust.
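The two questions (when to stop, and which test to run next) can be illustrated with a deliberately simple stand-in: a posterior-threshold stopping rule and a myopic choice of the test with the largest expected entropy reduction per unit cost. This is not the dissertation's intervention-based stopping criterion or its dynamic decision-theoretic strategy; it assumes a single-fault hypothesis space, and the fault names, test names, probabilities, costs and threshold are all hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Tuple

Dist = Dict[str, float]   # single-fault posterior: fault -> probability
Test = Dict[str, float]   # test model: fault -> P(test fails | fault), assumed in (0, 1)


def entropy(p: Dist) -> float:
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def update(p: Dist, test: Test, failed: bool) -> Dist:
    """Bayes update of the fault posterior after observing one test outcome."""
    w = {h: p[h] * (test[h] if failed else 1.0 - test[h]) for h in p}
    z = sum(w.values())
    return {h: v / z for h, v in w.items()}


def expected_entropy(p: Dist, test: Test) -> float:
    """Posterior entropy averaged over the two possible test outcomes."""
    p_fail = sum(p[h] * test[h] for h in p)
    total = 0.0
    for failed, prob in ((True, p_fail), (False, 1.0 - p_fail)):
        if prob > 0:
            total += prob * entropy(update(p, test, failed))
    return total


def choose_test(p: Dist, tests: Dict[str, Test], cost: Dict[str, float]) -> Optional[str]:
    """Myopic choice: largest expected information gain per unit cost (None if no gain)."""
    h0 = entropy(p)
    best, best_score = None, 0.0
    for name, test in tests.items():
        score = (h0 - expected_entropy(p, test)) / cost[name]
        if score > best_score:
            best, best_score = name, score
    return best


def diagnose(prior: Dist, tests: Dict[str, Test], cost: Dict[str, float],
             run_test: Callable[[str], bool], threshold: float = 0.85,
             max_steps: int = 10) -> Tuple[str, Dist]:
    p, remaining = dict(prior), dict(tests)
    for _ in range(max_steps):
        top = max(p, key=p.get)
        if p[top] >= threshold:          # stopping rule: posterior is confident enough
            return top, p
        name = choose_test(p, remaining, cost)
        if name is None:                 # no remaining test is informative
            break
        p = update(p, remaining.pop(name), run_test(name))
    return max(p, key=p.get), p


PRIOR = {"link": 0.5, "router": 0.3, "server": 0.2}
TESTS = {
    "ping_gateway": {"link": 0.95, "router": 0.90, "server": 0.05},
    "traceroute": {"link": 0.95, "router": 0.10, "server": 0.05},
    "http_probe": {"link": 0.95, "router": 0.90, "server": 0.95},
}
COST = {"ping_gateway": 1.0, "traceroute": 1.0, "http_probe": 1.0}
log = []


def simulated_router_fault(name: str) -> bool:
    """Hypothetical test runner: the true fault is 'router'; report its most likely outcome."""
    log.append(name)
    return TESTS[name]["router"] > 0.5


fault, post = diagnose(PRIOR, TESTS, COST, simulated_router_fault)
```

A greedy rule like this looks only one test ahead; a fully dynamic strategy would also weigh how each outcome changes the value of later tests.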

URI (handle)

http://hdl.handle.net/1903/6329

Collections

Institute for Systems Research Technical Reports
