August 1, 1997 - August 3, 1997

Uncertainty in Artificial Intelligence (UAI) ’97

Location: Providence, Rhode Island, USA

  • Speaker: Steffen L. Lauritzen

    Inference in probabilistic expert systems has been made possible through the development of efficient algorithms that in one way or another involve message passing between local entities arranged to form a junction tree. Many of these algorithms share a common structure which can be partly formalized in abstract axioms with an algebraic flavor. However, the existing abstract frameworks do not fully capture all interesting cases of such local computation algorithms. The lecture described the basic elements of the algorithms, giving examples of interesting local computations that are covered by current abstract frameworks, and also examples of interesting computations that are not, with a view toward exploiting the full potential of these ideas.
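
    To make the message-passing step concrete, the following is a minimal sketch, in Python with numpy, of a single sum-product message between two neighboring cliques of a junction tree; the toy network, its numbers, and the variable names are illustrative assumptions, not material from the lecture.

    ```python
    import numpy as np

    # Toy chain A - B - C with binary variables, cliques {A,B} and {B,C},
    # and separator {B}.  All numbers here are made up for illustration.
    p_A = np.array([0.6, 0.4])              # P(A)
    p_B_given_A = np.array([[0.9, 0.1],     # P(B | A)
                            [0.2, 0.8]])
    p_C_given_B = np.array([[0.7, 0.3],     # P(C | B)
                            [0.5, 0.5]])

    phi_AB = p_A[:, None] * p_B_given_A     # clique potential on {A,B} = P(A,B)
    phi_BC = p_C_given_B.copy()             # clique potential on {B,C}

    # Message from clique {A,B} to clique {B,C}:
    # marginalize the sending clique down to the separator {B}.
    msg_B = phi_AB.sum(axis=0)

    # Absorption: the receiving clique multiplies in the message.
    phi_BC *= msg_B[:, None]

    # Marginals local to the receiving clique can now be read off, e.g. P(C).
    p_C = phi_BC.sum(axis=0)
    print(p_C)                              # -> [0.624 0.376]
    ```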

  • Speaker: Robert J. McEliece

    In 1993 a group of coding researchers in France devised, as part of their astonishing “turbo code” breakthrough, a remarkable iterative decoding algorithm. This algorithm can be viewed as an inference algorithm on a Bayesian network, but (a) it is approximate, not exact, and (b) it violates a sacred assumption in Bayesian analysis, viz., that the network should have no loops. Indeed, it is accurate to say that the turbo decoding algorithm is functionally equivalent to Pearl’s algorithm applied to a certain directed bipartite graph in which the messages circulate indefinitely, until either convergence is reached or (more realistically) a fixed number of cycles has been run. With hindsight, it is possible to trace a continuous chain of “loopy” belief propagation algorithms within the coding community beginning in 1962 (with Gallager’s iterative decoding algorithm for low-density parity-check codes), continued in 1981 by Tanner and much more recently (1995-1996) by Wiberg and MacKay-Neal. This talk challenged the UAI community to reassess the conventional wisdom that probability propagation only works in trees, since the coding community has now accumulated considerable experimental evidence that, in some cases at least, “loopy” belief propagation works, at least approximately.

    McEliece brought the AI audience up to speed on the latest developments in coding, emphasizing convolutional codes, since they are the building blocks for turbo codes. Two of the most important (pre-turbo) decoding algorithms, viz. Viterbi (1967) and BCJR (1974), can be stated in orthodox Bayesian network terms. BCJR, for example, is an anticipation of Pearl’s algorithm on a special kind of tree, and Viterbi’s algorithm gives a solution to the “most probable explanation” problem on the same structure. Thus coding theorists and AI people have been working on, and solving, similar problems for a long time. It would be nice if they became more aware of each other’s work.
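
    As an illustration of the “loopy” idea (a generic sketch, not the turbo decoder itself), the following Python fragment runs sum-product message passing for a fixed number of sweeps on a small pairwise model whose graph contains a cycle, then compares the resulting approximate marginal with the exact one; the three-node model and its potentials are assumptions made purely for the example.

    ```python
    import itertools
    import numpy as np

    n = 3                                    # three binary variables in a triangle (a loop)
    edges = [(0, 1), (1, 2), (2, 0)]
    psi = {e: np.array([[1.2, 0.6],          # pairwise potential psi(x_i, x_j), attractive
                        [0.6, 1.2]]) for e in edges}
    phi = [np.array([2.0, 1.0]),             # unary potential biasing node 0 toward state 0
           np.array([1.0, 1.0]),
           np.array([1.0, 1.0])]

    def pot(i, j, xi, xj):
        """Pairwise potential on edge {i, j}, whichever way it is stored."""
        return psi[(i, j)][xi, xj] if (i, j) in psi else psi[(j, i)][xj, xi]

    # messages m[(i, j)](x_j): from node i to node j, initialized uniformly
    m = {(i, j): np.ones(2) for i, j in edges}
    m.update({(j, i): np.ones(2) for i, j in edges})

    for sweep in range(20):                  # a fixed number of cycles, as in turbo decoding
        new = {}
        for (i, j) in m:
            incoming = phi[i].copy()         # local evidence at i ...
            for k in range(n):               # ... times messages into i from everyone but j
                if k not in (i, j):
                    incoming = incoming * m[(k, i)]
            out = np.array([sum(pot(i, j, xi, xj) * incoming[xi] for xi in (0, 1))
                            for xj in (0, 1)])
            new[(i, j)] = out / out.sum()    # normalize for numerical stability
        m = new

    # approximate marginal at node 0: local potential times incoming messages
    b0 = phi[0] * m[(1, 0)] * m[(2, 0)]
    b0 /= b0.sum()

    # exact marginal by brute-force enumeration, for comparison
    p = np.zeros(2)
    for x in itertools.product((0, 1), repeat=n):
        w = np.prod([pot(i, j, x[i], x[j]) for i, j in edges])
        w *= np.prod([phi[i][x[i]] for i in range(n)])
        p[x[0]] += w
    p /= p.sum()

    print("loopy BP:", b0, "exact:", p)
    ```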

  • Speaker: Alejandro A. Schaffer

    Genetic linkage analysis is a collection of statistical techniques used to infer the approximate chromosomal location of disease susceptibility genes from family tree data. Among the widely publicized linkage discoveries in 1996 were the approximate locations of genes conferring susceptibility to Parkinson’s disease, prostate cancer, Crohn’s disease, and adult-onset diabetes. Most linkage analysis methods are based on maximum likelihood estimation. Parametric linkage analysis methods rely on probabilistic inference in Bayesian networks, the same kind of inference studied in the UAI community. Schaffer gave a self-contained overview of the genetics, statistics, algorithms, and software used in real linkage analysis studies.
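
    As a brief reminder of what the maximum-likelihood formulation looks like (a standard textbook summary, not material from the talk), parametric linkage analysis typically reports the LOD score for the recombination fraction $\theta$ between a marker and the putative disease locus:

    $$\mathrm{LOD}(\theta) \;=\; \log_{10}\frac{L(\theta)}{L(\theta = 1/2)},
      \qquad
      \hat{\theta} \;=\; \arg\max_{\theta} L(\theta),$$

    where evaluating the pedigree likelihood $L(\theta)$ means summing over the unobserved genotypes of the family members, exactly the kind of marginalization that Bayesian-network inference performs.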

  • Speaker: David J.C. MacKay

    Feedforward neural networks such as multilayer perceptrons are popular tools for nonlinear regression and classification problems. From a Bayesian perspective, a choice of a neural network model can be viewed as defining a prior probability distribution over non-linear functions, and the neural network’s learning process can be interpreted in terms of the posterior probability distribution over the unknown function. (Some learning algorithms search for the function with maximum posterior probability, while others, based on Monte Carlo methods, draw samples from this posterior distribution.) Neal (1996) has shown that, in the limit of large but otherwise standard networks, the prior distribution over non-linear functions implied by the Bayesian neural network falls in a class of probability distributions known as Gaussian processes. The hyperparameters of the neural network model determine the characteristic lengthscales of the Gaussian process. Neal’s observation motivates the idea of discarding parameterized networks and working directly with Gaussian processes. Computations in which the parameters of the network are optimized are then replaced by simple matrix operations using the covariance matrix of the Gaussian process. In this talk MacKay reviewed work on this idea by Neal, Williams, Rasmussen, Barber, Gibbs and MacKay, and assessed whether, for supervised regression and classification tasks, the feedforward network has been superseded.
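
    To illustrate what “simple matrix operations” means in practice, here is a minimal Gaussian-process regression sketch in Python with numpy; the squared-exponential covariance, its lengthscale, the noise level, and the toy data are all assumptions chosen for the example, not code from the talk.

    ```python
    import numpy as np

    def sqexp(x1, x2, lengthscale=1.0, signal=1.0):
        """Squared-exponential covariance; the lengthscale plays the role of
        the characteristic lengthscale / hyperparameters mentioned above."""
        d = x1[:, None] - x2[None, :]
        return signal**2 * np.exp(-0.5 * (d / lengthscale) ** 2)

    # toy one-dimensional regression data
    x_train = np.array([-2.0, -1.0, 0.0, 1.5])
    y_train = np.sin(x_train)
    x_test = np.linspace(-3.0, 3.0, 7)
    noise = 0.1                                    # assumed observation-noise std

    # the matrix operations that replace parameter optimization
    K = sqexp(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_star = sqexp(x_test, x_train)

    # predictive mean and covariance of the unknown function at the test inputs
    alpha = np.linalg.solve(K, y_train)
    mean = K_star @ alpha
    cov = sqexp(x_test, x_test) - K_star @ np.linalg.solve(K, K_star.T)

    print(mean)
    print(np.sqrt(np.diag(cov)))                   # predictive standard deviations
    ```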