Mathematical Approach to Uncertainty
In his classic 1976 book, Shafer described the paradigm shift that led him to formulate an alternative to the existing Bayesian formalism for automated reasoning, which became what is commonly known as Dempster-Shafer evidential reasoning. The basic concept was that an expert’s complete ignorance about a statement need not translate into assigning a probability of 1/2 to the statement and the other 1/2 to its complement, as was assumed in Bayesian reasoning (Shafer).
Recently, engineers and scientists have begun to recognize the necessity of defining and addressing uncertainty. In the new era of high-speed computers, technology is better equipped to handle complex analyses, yet a single mathematical framework is still relied upon to represent uncertainty: probability theory. In this paper, probability theory and evidence theory are introduced as possible mathematical structures for the representation of the epistemic uncertainty associated with the performance of safety systems.
Probabilistic networks are graphical models that support the modeling of uncertainty in large, complex domains. The framework of probabilistic networks was designed for reasoning under uncertainty (Renooij). Uncertainties exist in every aspect of the decision-making process in expert systems. This paper aims to use evidence theory to assist in the task of quantifying uncertainty for expert environments.

Overview

There are three types of uncertainty: aleatory uncertainty, epistemic uncertainty, and error, as shown in Figure 1 (Agarwal).

Figure 1. Classification of Uncertainty (adapted from Agarwal)

Probability theory provides the mathematical structure traditionally used in the representation of uncertainty. The three types are:

1. Aleatory or random uncertainty is an inherent uncertainty associated with the environment or some kind of physical system. Variability, random uncertainty, irreducible uncertainty, and stochastic uncertainty are other terms used to describe aleatory uncertainty (Bae and Grandhi). An example is the atmospheric reaction of two different metals due to changes in temperature.

2. Epistemic uncertainty is due to lack of knowledge of quantities or processes of the system or the environment and appears to be subjective. Subjective uncertainty, incertitude, and reducible uncertainty are other terms used to describe epistemic uncertainty (Bae and Grandhi). An example is the minimal amount of data available to characterize new processes and materials.

3. Error. Estimation error is due to incompleteness of sampling information and our inability to estimate accurately the model parameters that describe inherent variability. Model imperfection is due to lack of knowledge or understanding of physical phenomena, or ignorance, and the use of simplified structural models, or errors of simplification (Der Kiureghian as cited by Nikolaidis).

Upper and lower probabilities are the basis that led to combination theory. Dempster’s rule of combination can be directly extended to the combination of N independent and equally reliable sources of evidence, and its major interest comes essentially from its commutativity and associativity properties.
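The commutativity and associativity mentioned above can be checked numerically. The text does not spell out the combination rule, so the sketch below uses its standard form: products of masses are pooled on the intersections of focal elements and renormalized by the non-conflicting mass. Each source is represented by a basic probability assignment m (described under Evidence Theory), stored here as a Python dictionary. The frame {a, b, c}, the three expert opinions, and the function name `combine` are hypothetical choices made only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule (orthogonal sum).

    Each mass function is a dict mapping frozenset (focal element) to mass.
    """
    joint = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            joint[common] = joint.get(common, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c  # product mass landing on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the mass that did not fall on the empty set.
    return {focal: mass / (1.0 - conflict) for focal, mass in joint.items()}


# Hypothetical frame and expert opinions (illustrative values only).
FRAME = frozenset({"a", "b", "c"})
expert_1 = {frozenset({"a"}): 0.6, FRAME: 0.4}
expert_2 = {frozenset({"a", "b"}): 0.7, FRAME: 0.3}
expert_3 = {frozenset({"b"}): 0.5, FRAME: 0.5}

fused = combine(expert_1, expert_2)
for focal, mass in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 4))
```

Because the rule is commutative and associative, N sources can be fused pairwise in any order, for example with `functools.reduce(combine, sources)`.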
When Dempster’s orthogonal sum rule is used for combining (fusing) information from experts who might disagree with each other, one obtains the usual Dempster-Shafer (DS) theory (Dempster). Dubois stated that absolute reliability implies that the analyst is qualified to make distinctions between the reliability of experts, sensors, and other sources of information and can express this distinction between sources mathematically (Dubois and Prade). According to Klir, in describing Generalized Information Theory (GIT), the following axiomatic requirements, each expressed in a generic form, must be satisfied whenever applicable:
1. Subadditivity: the amount of uncertainty in a joint representation of evidence (defined on a Cartesian product) cannot be greater than the sum of the amounts of uncertainty in the associated marginal representations of evidence.

2. Additivity: the two amounts of uncertainty considered under subadditivity become equal if and only if the marginal representations of evidence are non-interactive according to the rules of the uncertainty calculus involved.

3. Range: the range of uncertainty is [0, M], where 0 must be assigned to the unique uncertainty function that describes full certainty and M depends on the size of the universal set involved and on the chosen unit of measurement.

4. Continuity: any measure of uncertainty must be a continuous functional.

5. Expansibility: expanding the universal set by alternatives that are not supported by evidence must not affect the amount of uncertainty.

6. Branching/Consistency: when uncertainty can be computed in more than one way, all acceptable within the calculus of the uncertainty theory involved, the results must be the same (consistent).

7. Monotonicity: when evidence can be ordered in the uncertainty theory employed (as in possibility theory), the relevant uncertainty measure must preserve this ordering.

8. Coordinate invariance: when evidence is described within the n-dimensional Euclidean space ($n \ge 1$), the relevant uncertainty measure must not change under isometric transformations of coordinates.

When distinct types of uncertainty coexist in a given uncertainty theory, it is not necessary that these requirements be satisfied by each uncertainty type. However, they must be satisfied by an overall uncertainty measure, which appropriately aggregates measures of the individual uncertainty types (Klir 25).

Analyses based on a Probabilistic Approach

Probability theory is a popular approach to uncertainty quantification in engineering problems. Renooij defined the central term as follows: “With the term probability elicitation method, we denote any aid that is used to acquire a probability from an expert” (Renooij 257). Generally, a distinction is made between direct and indirect methods.
With direct methods, experts are asked to directly express their degree of belief as a number, be it a probability, a frequency or an odds ratio. For expressing probabilities, however, people find words more appealing than numbers. This is probably because the vagueness of words captures the uncertainty they feel about their probability assessment; the use of numerical probabilities can produce considerable discomfort and resistance among those not used to it (Renooij). In addition, since directly assessed numbers tend to be biased, various indirect elicitation methods have also been developed.
With these methods, an expert is asked not for a direct assessment, but for a decision from which his degree of belief can be inferred (Renooij). A complicating factor, as noted by Clemen, is that everything is conditional on the decision maker. Moreover, the issue involves not only the decision maker’s information about the events or variables of interest, but also the possibility of dependence between this information and the experts’ information. Even without these complications, the decision maker’s perception of the experts (e.g., whether they are calibrated, whether there is dependence among the experts, whether cognitive biases are influencing the probabilities) plays an important role in the modeling process (Clemen).

Bayesian belief networks are rooted in traditional subjective probability theory, which builds on the foundation of Pascalian calculus (Kramosil). In subjective probability theory, the probability of a proposition represents the degree of confidence an individual has about that proposition’s truth.
This matches quite well our knowledge base of information from a human expert, together with his or her subjective beliefs about the accuracy of that information.

Before Bayesian belief networks are described, we must begin with the fundamentals of probability theory. Let $A$ be some event within the context of all possible events $E$ within some domain, such that $A \in E$ and $E$ is the event space. The probability of $A$ occurring is denoted by $P(A)$. $P(A)$ is the probability assigned to $A$ prior to the observation of any evidence and is also called the a priori probability. This probability must conform to certain laws. First, the probability must be non-negative and must not exceed one; therefore,

$$\forall A \in E, \quad 0 \le P(A) \le 1 \tag{1}$$

A probability of 0 means the event will not occur, while a probability of 1 means the event will always occur. Second, the total probability of the event space is 1; in other words, the sum of the probabilities of all of the events $A_i$ in $E$ must equal 1:

$$\sum_{A_i \in E} P(A_i) = 1 \tag{2}$$

Finally, we consider the complement of $A$, written $\neg A$, which is all events in $E$ except for $A$. From equation (2) we then get:

$$P(A) + P(\neg A) = 1 \tag{3}$$

Now consider another event $B \in E$. The probability that event $A$ will occur given that event $B$ has occurred is called the conditional probability of $A$ given $B$ and is represented by $P(A \mid B)$. The probability that both $A$ and $B$ will occur is called the joint probability and is written $P(A \cap B)$. $P(A \mid B)$ is defined in terms of the joint probability of $A$ and $B$, provided $P(B) > 0$, by:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{4}$$

Equation (4) can be further manipulated to yield Bayes’ Rule:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{5}$$

If these two events are independent, in that the occurrence of one event has no effect on the occurrence of the other, then $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$. If we manipulate equation (5) still further, expanding $P(B)$ over $A$ and $\neg A$, we get:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)} \tag{6}$$

This lays the foundation for managing and manipulating uncertainty using probability theory in expert systems. It allows us to turn a rule around and calculate the conditional probability of A given B from the conditional probability of B given A.
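As a worked illustration of turning a rule around, the sketch below evaluates Bayes’ Rule with the denominator P(B) expanded by total probability as P(B | A) P(A) + P(B | ¬A) P(¬A). The numbers (a fault A with prior 0.01, a symptom B seen in 90% of faulty and 5% of healthy systems) and the function name `posterior` are assumptions made up for the example, not values from the cited sources.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using Bayes' Rule with P(B) expanded over A and not-A."""
    # P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if evidence == 0.0:
        raise ValueError("P(B) is zero; the conditional probability is undefined")
    return p_b_given_a * prior_a / evidence


# Hypothetical expert estimates (illustrative only).
p_fault_given_symptom = posterior(prior_a=0.01,
                                  p_b_given_a=0.90,
                                  p_b_given_not_a=0.05)
print(round(p_fault_given_symptom, 4))
```

Even with a reliable symptom, the posterior stays well below one half here because the prior is small, which is the kind of result an expert system obtains by inverting the rule "fault implies symptom".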
Some of the advantages of Bayesian belief networks are that the representation is visual and easy to understand. It is also relatively straightforward to implement as the methodology for combining uncertainty follows set rules and procedures. Probability theory is a well-refined method for dealing with knowledge of unknown certainty. Bayesian belief networks still have some problems. They require large numbers of probabilities that must be obtained from the human expert. The number of probabilities is dependent on the complexity of the conditional dependencies in the domain.
They also cannot represent cycles (e.g., A implies B and B implies A), because infinite loops would occur during inference. Additionally, because the sum of all possible states must equal 1, when evidence reinforces the belief in some possible world, it correspondingly decreases our belief in all other worlds. This is not necessarily the case in real life. Bayesian networks require us to make certain artificial assumptions about the independence of information and events, leading to counterintuitive, possibly incorrect results.

The cumulative distribution function (CDF) describes the probability distribution of a random variable X.
For every real number x, the distribution function of X is defined by:

$$F(x) = P(X \le x) \tag{7}$$

where $F(x)$ is the probability that X takes on a value less than or equal to x, and $1 - F(x)$ is the probability that X takes on a value greater than x. The probability that X lies in the interval $(a, b]$ is, therefore, $F(b) - F(a)$ if $a < b$ (Ayyub). In expert environments, the question is often how frequently the random variable exceeds a particular level. This is referred to as “the exceedance question” and is needed to relate the results to evidence theory.
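A sampling-based estimate of the distribution function and of the exceedance probability can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the standard normal distribution, the sample size, the level 1.0, and the function names `empirical_cdf` and `exceedance` are all assumptions chosen for the example.

```python
import random


def empirical_cdf(samples, x):
    """Estimate F(x) = P(X <= x) as the fraction of samples at or below x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def exceedance(samples, x):
    """Estimate P(X > x), the answer to the exceedance question at level x."""
    return sum(1 for s in samples if s > x) / len(samples)


random.seed(0)  # fixed seed so the sketch is repeatable
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # assumed distribution

level = 1.0
f_at_level = empirical_cdf(samples, level)
exceed_at_level = exceedance(samples, level)
print(f"F({level}) ~ {f_at_level:.3f}, P(X > {level}) ~ {exceed_at_level:.3f}")
```

Both estimates are approximate because they come from a finite sample; by construction they add up to one at every level.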
The exceedance question is answered graphically by the complementary cumulative distribution function (CCDF), which can be defined by:

$$F_c(x) = P(X > x) = 1 - F(x) \tag{8}$$

CCDF curves are typically obtained by sampling-based techniques and are, therefore, approximate. The complementary nature of the CCDF means that $F_c(x)$ is the probability that X takes on a value greater than x, and $1 - F_c(x)$ is the probability that X takes on a value less than or equal to x.

Analyses based on a Non-Probabilistic Approach

Dempster-Shafer Theory
The advantages of Dempster-Shafer theory lie in its ability to better represent ignorance and in a structure that allows evidence supporting one possible world to not necessarily detract from belief in all other worlds. The disadvantages arise from its implementational complexity and the requirement for exhaustive enumeration of all possible combinations of hypotheses. Dempster-Shafer theory also lacks an effective methodology for extracting inferences. Before an analysis is performed, the relationship among the fuzzy measures must be explained.
According to Klir and Yuan, it is obvious from their mathematical properties that possibility, necessity, and probability measures do not overlap with one another except for one very special measure, characterized by only one focal element, which is called a singleton. Probability theory coincides with the sub-area of evidence theory in which Belief measures and Plausibility measures are equal. The differences in the mathematical properties of these theories make each theory suitable for modeling certain types of uncertainty and less suitable for modeling others, as shown in Figure 2 (Klir and Yuan).

Figure 2. Relationship between plausibility, probability and belief (adapted from Klir and Yuan)

Is fuzzy logic better science than probability? No, it is a different science. Fuzzy logic and probability offer solutions to slightly different classes of problems. Fuzzy logic allows engineers to make explicit precision-versus-cost trade-offs. A fuzzy logician would embrace the vagueness and make a model; if the model did not work, he would learn from the failure and build a better model (Almond).
Dubois used decision-maker uncertainty, which requires only bounded, linearly ordered valuation sets for expressing uncertainty and preferences, and which offers a testable descriptive approach to possibility theory. In this framework, pessimistic (uncertainty-averse) and optimistic attitudes can be captured (Dubois).

Evidence Theory

Dempster-Shafer Theory (DST) was started by Arthur Dempster in the 1960s and expanded by Glenn Shafer in the 1970s (Dempster; Shafer). Dempster felt there was a need for a new system of dealing with uncertainty because of two shortcomings he saw in probability theory.
The Evidence theory can be defined as a mathematical model that establishes upper and lower limits of likelihood – plausibility and belief respectively (Kramosil). There are three important functions in Dempster-Shafer theory: the basic probability assignment function (BPA or m), the Belief function (Bel), and the Plausibility function (Pl). The basic probability assignment (BPA) is a primitive of evidence theory. Generally speaking, the term “basic probability assignment” does not refer to probability in the classical sense.
The BPA, represented by m, defines a mapping of the power set to the interval between 0 and 1, where the BPA of the null set is 0 and the summation of the BPAs of all the subsets in the power set is 1. The value of the BPA for a given set A (represented as m(A)) expresses the proportion of all relevant and available evidence that supports the claim that a particular element of X (the universal set) belongs to the set A but to no particular subset of A (Klir and Wierman). The value of m(A) pertains only to the set A and makes no additional claims about any subsets of A. Any further evidence on the subsets of A would be represented by another BPA; i.e., for $B \subset A$, $m(B)$ would be the BPA for the subset B. Formally, this description of m can be represented with the following three equations:

$$m: \mathcal{P}(X) \to [0, 1] \tag{9}$$

$$m(\emptyset) = 0 \tag{10}$$

$$\sum_{A \in \mathcal{P}(X)} m(A) = 1 \tag{11}$$

where $\mathcal{P}(X)$ represents the power set of X, $\emptyset$ is the null set, and A is a set in the power set ($A \in \mathcal{P}(X)$) (Klir and Wierman).

Some researchers, such as Chokr and Kreinovich, have found it useful to interpret the basic probability assignment as a classical probability, and the framework of Dempster-Shafer theory can support this interpretation.
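The three defining conditions of a basic probability assignment in equations (9) to (11) can be checked mechanically. The sketch below assumes a BPA stored as a dictionary from `frozenset` to mass, with subsets that are not listed carrying zero mass; the universal set {low, medium, high}, the example masses, and the name `is_valid_bpa` are hypothetical.

```python
def is_valid_bpa(m, universe, tol=1e-9):
    """Check equations (9)-(11) for a mass function given as {frozenset: mass}."""
    universe = frozenset(universe)
    if any(not focal <= universe for focal in m):
        return False  # every focal element must belong to the power set of X
    if any(not 0.0 <= mass <= 1.0 for mass in m.values()):
        return False  # (9): masses lie in [0, 1]
    if m.get(frozenset(), 0.0) != 0.0:
        return False  # (10): the null set carries no mass
    # (11): masses over the power set sum to 1 (unlisted subsets count as 0)
    return abs(sum(m.values()) - 1.0) <= tol


X = frozenset({"low", "medium", "high"})  # hypothetical universal set

# Evidence for "low", for "low or medium", and uncommitted mass on X itself.
m = {frozenset({"low"}): 0.5, frozenset({"low", "medium"}): 0.3, X: 0.2}

# Complete ignorance: all mass on X, none on any proper subset.
vacuous = {X: 1.0}

print(is_valid_bpa(m, X), is_valid_bpa(vacuous, X))
```

The vacuous assignment shows how ignorance is expressed without spreading probability over the individual alternatives: the mass on X commits to nothing more specific than X.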
The theoretical implications of this interpretation are well developed by Kramosil. This is a very important and useful interpretation of Dempster-Shafer theory, but it does not demonstrate the full scope of the representational power of the basic probability assignment. As such, the BPA cannot be equated with a classical probability in general. From the basic probability assignment, the upper and lower bounds of an interval can be defined. This interval contains the precise probability of a set of interest (in the classical sense) and is bounded by two non-additive continuous measures called Belief and Plausibility.
The lower bound, Belief, for a set A is defined as the sum of all the basic probability assignments of the subsets (B) of the set of interest (A), including A itself ($B \subseteq A$). The upper bound, Plausibility, is the sum of all the basic probability assignments of the sets (B) that intersect the set of interest (A) ($B \cap A \neq \emptyset$). Formally, for all sets A that are elements of the power set ($A \in \mathcal{P}(X)$), the following equations apply (Klir and Wierman):

$$\mathrm{Bel}(A) = \sum_{B \mid B \subseteq A} m(B) \tag{12}$$

$$\mathrm{Pl}(A) = \sum_{B \mid B \cap A \neq \emptyset} m(B) \tag{13}$$

The two measures, Belief and Plausibility, are non-additive. It is possible to obtain the basic probability assignment from the Belief measure with the following inverse function:

$$m(A) = \sum_{B \mid B \subseteq A} (-1)^{|A - B|}\, \mathrm{Bel}(B) \tag{14}$$

where $|A - B|$ is the cardinality of the set difference of A and B, which equals the difference of the two cardinalities because $B \subseteq A$. In addition to deriving these measures from the basic probability assignment (m), these two measures can be derived from each other. For example, Plausibility can be derived from Belief in the following way:

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) \tag{15}$$

where $\bar{A}$ is the classical complement of A. This definition of Plausibility in terms of Belief comes from the fact that all basic assignments must sum to 1:

$$\mathrm{Bel}(\bar{A}) = \sum_{B \mid B \subseteq \bar{A}} m(B) = \sum_{B \mid B \cap A = \emptyset} m(B) \tag{16}$$

$$\sum_{B \mid B \subseteq \bar{A}} m(B) = 1 - \sum_{B \mid B \cap A \neq \emptyset} m(B) \tag{17}$$

From the definitions of Belief and Plausibility, it follows that $\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A})$. As a consequence of Equations 14 and 15, given any one of these measures (m(A), Bel(A), Pl(A)), it is possible to derive the values of the other two measures. The precise probability of an event (in the classical sense) lies within the lower and upper bounds of Belief and Plausibility, respectively:

$$\mathrm{Bel}(A) \le P(A) \le \mathrm{Pl}(A) \tag{18}$$
The probability is uniquely determined if Bel(A) = Pl(A). In this case, which corresponds to classical probability, all the probabilities, P(A) are uniquely determined for all subsets A of the universal set X (Yager). Otherwise, Bel(A) and Pl(A) may be viewed as lower and upper bounds on probabilities respectively, where the actual probability is contained in the interval described by the bounds. Upper and lower probabilities derived by the other frameworks in generalized information theory cannot be directly interpreted as Belief and Plausibility functions (Dubois and Prade).
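The interval bounded by Belief and Plausibility can be computed directly from a basic probability assignment. The sketch below takes Bel(A) as the total mass of focal elements contained in A (A itself included) and Pl(A) as the total mass of focal elements that intersect A. The universal set, the masses, and the function names are hypothetical and chosen only to make the three cases visible: a general body of evidence, the classical (Bayesian) special case, and complete ignorance.

```python
def belief(m, a):
    """Bel(A): total mass of focal elements that are subsets of A."""
    return sum(mass for focal, mass in m.items() if focal <= a)


def plausibility(m, a):
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(mass for focal, mass in m.items() if focal & a)


X = frozenset({"low", "medium", "high"})  # hypothetical universal set
A = frozenset({"low", "medium"})          # set of interest

# General case: some mass is committed only to non-singleton sets.
m = {frozenset({"low"}): 0.5, frozenset({"low", "medium"}): 0.3, X: 0.2}
bel_a = belief(m, A)                      # 0.5 + 0.3
pl_a = plausibility(m, A)                 # 0.5 + 0.3 + 0.2
pl_from_bel = 1.0 - belief(m, X - A)      # Pl(A) = 1 - Bel(complement of A)

# Classical special case: all mass on singletons, so Bel(A) = Pl(A) = P(A).
bayesian = {frozenset({"low"}): 0.5,
            frozenset({"medium"}): 0.3,
            frozenset({"high"}): 0.2}

# Complete ignorance: the interval for any proper subset is [0, 1].
vacuous = {X: 1.0}

for name, bpa in [("general", m), ("bayesian", bayesian), ("vacuous", vacuous)]:
    print(name, round(belief(bpa, A), 3), round(plausibility(bpa, A), 3))
```

The width Pl(A) - Bel(A) is one way to read how much of the evidence remains uncommitted with respect to A; it shrinks to zero in the classical case.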
In summary, the Basic Belief Assignment (BBA) is not a probability, but just a belief in a particular proposition irrespective of other propositions. The BBA structure gives the flexibility to express belief for possible propositions with partial and insufficient evidence and also avoids our making excessive or baseless assumptions in assigning our belief to propositions (Bae and Grandhi).

Conclusion

Previous work by others includes probability theory, a well-researched and widely practiced methodology that provides the mathematical structure traditionally used in the representation of aleatory and epistemic uncertainty.
The probabilistic uncertainties in analysis outcomes are represented with probability distributions and are typically summarized as cumulative distribution functions (CDF) and complementary cumulative distribution functions (CCDF). The most familiar sampling technique is Monte Carlo simulation. Evidence theory, on the other hand, extends these efforts toward a more robust system. Evidence theory provides a promising alternative to probability theory.
It allows for a fuller representation of the implications of uncertainty as compared to a probabilistic representation of uncertainty. Evidence theory can handle not only aleatory uncertainty but epistemic uncertainty as well. As the probability of a given occurrence increases, the uncertainty logically will decrease. Probability theory and Evidence theory are comparable methodologies; however, they are conceptually inverse functions. This paper suggests that the assessment of uncertainty in expert environments may be better conveyed by using both probabilistic and non-probabilistic theories.
Works Cited

Agarwal, H., et al. “Uncertainty quantification using evidence theory in multidisciplinary design optimization.” Reliability Engineering and System Safety 85 (2004): 281-294.

Almond, R. G. “Discussion: Fuzzy Logic: Better Science? Or Better Engineering?” Technometrics 37.3 (1995): 267-270.

Ayyub, B. M. Elicitation of Expert Opinions for Uncertainty and Risks. Boca Raton, FL: CRC Press, 2001.

Bae, H. & Grandhi, R. V. “Uncertainty Quantification of Structural Response Using Evidence Theory.” American Institute of Aeronautics and Astronautics Journal 41.10 (October 2003): 2062-2068.

Chokr, B. & V. Kreinovich. “How far are we from complete knowledge? Complexity of knowledge acquisition in the Dempster-Shafer approach.” Advances in the Dempster-Shafer Theory of Evidence. R. R. Yager, J. Kacprzyk and M. Fedrizzi. New York: John Wiley & Sons, Inc., 1994. 555-576.

Clemen, R. T. “Calibration and the Aggregation of Probabilities.” Management Science 32 (1986): 312-314.

Dempster, A. P. “Upper and lower probabilities induced by a multivalued mapping.” Annals of Mathematical Statistics 38.2 (1967): 325-339.