Bayesianism: Compound Plausibilities

[This post is part of a series on Bayesian epistemology; see index here]
The last assumption of the core of Bayesianism is that the plausibility of (logically) compound propositions depends, in a particular way, on the plausibilities of the propositions that constitute them. I will write it in the following form* (using Boolean Algebra):
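
Assumption 3: Compound Plausibilities: The plausibilities of the logical conjunction (“A and B” or “AB”) and the logical disjunction (“A or B” or “A+B”) are universal functions of the plausibilities of the propositions that constitute them, and of their complements, under all relevant conditions of knowledge.
$(A+B|X)=F[(A|X),(B|X),(\underline{A}|X),(\underline{B}|X),(A|B,X),(B|A,X),(A|\underline{B},X),(B|\underline{A},X),(\underline{A}|B,X),(\underline{B}|A,X),(\underline{A}|\underline{B},X),(\underline{B}|\underline{A},X)]$
$(AB|X)=G[(A|X),(B|X),(\underline{A}|X),(\underline{B}|X),(A|B,X),(B|A,X),(A|\underline{B},X),(B|\underline{A},X),(\underline{A}|B,X),(\underline{B}|A,X),(\underline{A}|\underline{B},X),(\underline{B}|\underline{A},X)]$
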
The functions are “universal” in the sense that they do not depend on the content of the propositions or the domain of discourse. The claim is that the plausibility of a logical conjunction or disjunction – and therefore of every complicated combination of the basic propositions – depends on the plausibilities of the basic propositions, not on what they’re talking about.

The assumption of universality is clearly correct in the cases of total certainty or denial. If A is true and B false, for example, we know that A+B is true – regardless of the content of the claims A and B, the topic they discuss, and so on. It is less clear why universality should be maintained for intermediate degrees of certainty. Some** suggest treating it as a hypothesis – let’s assume that there are general laws of thought, and let’s see what these are.

Another aspect of assumption 3 is that the universal functions depend only on their components. But assuming the functions are universal – what else can they depend on? They can only depend on some plausibilities. They cannot depend on the plausibility of an unrelated claim, for then it would not be possible to identify that claim in different domains of discourse. They must depend at least on the plausibilities of their components, as they do in the extreme cases of utter certainty or rejection. It is perhaps possible to conjecture that in addition they may depend on some other compound proposition that is composed out of the basic propositions of the conjunction/disjunction, but this would surely be very strange. The decomposition into constituents therefore appears very simple and “logical” – I don’t know of anyone who objects to it.

Let us proceed, then, under assumption 3.

It is a cumbersome assumption, as each function depends on lots of variables. Fortunately, we can reduce their number. Consider the case where B is the negation of A, that is $B=\underline{A}$. In this case
$(A+B|X)=F[(A|X),(\underline{A}|X),(\underline{A}|X),(A|X),F,F,T,T,T,T,F,F]$
so that F depends on only two variables, (A|X) and $(\underline{A}|X)$. On the other hand, logic dictates that this plausibility must have a constant value,
$(A+B|X)=(A+\underline{A}|X)=(T|X)=v_T$
Assuming that the universal function F is not constant, the only way we can maintain a constant value when we change (A|X) is to change $(\underline{A}|X)$ simultaneously. We are forced to conclude that the plausibility of a proposition is tied to that of the proposition’s negation by a universal function.

Theorem 3.1: The plausibility of A is tied to the plausibility of its negation by a universal function, $(\underline{A}|X)=S[(A|X)]$.

We will determine S explicitly later. For now, it is enough that it exists.
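To get a feel for what such a function S could look like, here is a minimal sketch. It assumes, purely for illustration, that plausibility is ordinary probability over a small set of equally weighted "worlds", which is an assumption the series has not justified at this point. Under that assumption the plausibility of A-or-not-A stays at the top value 1 whatever A is, and S comes out as x ↦ 1 − x. The names `WORLDS`, `plaus` and `S_candidate` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Toy model (an assumption for illustration): three basic propositions,
# so 8 truth assignments ("worlds"), all weighted equally.
WORLDS = list(product([False, True], repeat=3))

def plaus(prop):
    """Plausibility of `prop` as the fraction of worlds in which it holds."""
    return Fraction(sum(1 for w in WORLDS if prop(w)), len(WORLDS))

def S_candidate(x):
    """Candidate negation function, valid in this toy model only."""
    return 1 - x

# A few propositions with different plausibilities.
props = {
    "A": lambda w: w[0],                 # plausibility 1/2
    "A and B": lambda w: w[0] and w[1],  # plausibility 1/4
    "A or B or C": lambda w: any(w),     # plausibility 7/8
}

for name, p in props.items():
    not_p = lambda w, p=p: not p(w)
    excluded_middle = lambda w, p=p: p(w) or not p(w)
    # Columns: name, (A|X), (not-A|X), (A or not-A|X)
    print(name, plaus(p), plaus(not_p), plaus(excluded_middle))
```

In each row the second and third numbers are related by `S_candidate`, and the last is always 1, which is the constant value that the argument above requires.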

Something very important just happened – from the assumption that there are general rules for thought, we concluded that the plausibility of the negation of a proposition, $(\underline{A}|X)$, is measured by the plausibility of the claim itself, (A|X). It is therefore enough to keep track of just one plausibility, (A|X), to assess both. As we have said previously, this is an inherent part of the Bayesian analysis, and we see here that it is derived directly from the assumption of universality. The main alternative theory, the Dempster-Shafer theory, considers the measure of support that propositions have and requires a separate measure for the support of the proposition’s negation. The existence of S implies that Dempster-Shafer theory must reject its own universality! There cannot be a universal way to determine the support for compound propositions from the support we have for the basic propositions, and even if this can be done within a particular domain, the theory then only reverts back to the Bayesian one. Unsurprisingly, Shafer indeed doubts the existence of general rules of induction.

Let’s move on. The existence of S allows us to throw out the complements $\underline{A}$ and $\underline{B}$ from the parameters of the universal functions, as their plausibilities are themselves functions of the plausibilities of the propositions that they complement (A and B).

Theorem 3.2: Simple Shapes: The universal functions F and G can be written without an explicit dependence on the plausibilities of the complements.
$(A+B|X)=F[(A|X),(B|X),(A|B,X),(B|A,X),(A|\underline{B},X),(B|\underline{A},X)]$
$(AB|X)=G[(A|X),(B|X),(A|B,X),(B|A,X),(A|\underline{B},X),(B|\underline{A},X)]$

The functions are still rather complicated, but they can be made even simpler. Consider the case where A is a tautology. In this case the expression $(B|\underline{A},X)$ has no meaning – no information in the world can determine that a tautology is wrong. This expression is simply not defined. But the plausibility (AB|X)=(B|X) must still be well defined! There must therefore be a way to write G that does not depend on the undefined variable $(B|\underline{A},X)$. Notice that the parallel variable (B|A,X) is well defined in this case, so G might still depend on it. A similar situation occurs for the pair of variables (A|B,X) and $(A|\underline{B},X)$ when B is a tautology. We can therefore conclude that we can write the universal functions in a manner that does not depend on half of each of these pairs.
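The tautology case can be made concrete with a small sketch. It again reads plausibility as a fraction of equally weighted worlds, which is an assumption made only for illustration. Conditioning on the negation of a tautology leaves no worlds to count, so the hypothetical helper `plaus` returns `None` for that variable, while (AB|X) and (B|A,X) remain well defined.

```python
from fractions import Fraction
from itertools import product

# Toy model (assumed for illustration): two basic propositions,
# four equally weighted worlds.
WORLDS = list(product([False, True], repeat=2))

def plaus(prop, given=lambda w: True):
    """Fraction of the worlds compatible with `given` in which `prop` holds.

    Returns None when no world is compatible with `given`, i.e. when the
    conditional plausibility is undefined.
    """
    pool = [w for w in WORLDS if given(w)]
    if not pool:
        return None
    return Fraction(sum(1 for w in pool if prop(w)), len(pool))

A = lambda w: w[0] or not w[0]   # a tautology
not_A = lambda w: not A(w)       # its negation: true in no world
B = lambda w: w[1]
AB = lambda w: A(w) and B(w)

print(plaus(AB), plaus(B))       # equal, since AB is equivalent to B here
print(plaus(B, given=A))         # well defined
print(plaus(B, given=not_A))     # None: undefined
```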

Theorem 3.3: Simpler Forms: The universal functions F and G can be written without explicit dependence on the information that A or B are wrong.
(A+B|X)=F[(A|X),(B|X),(A|B,X),(B|A,X)]
(AB|X)=G[(A|X),(B|X),(A|B,X),(B|A,X)]

These forms cannot be used when A or B are contradictions, but otherwise they should be applicable. With just four variables, they are simple enough to serve as a basis from which we can prove Cox’s Theorem – the foundation of Bayesianism.
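As a sanity check that four variables can be enough, the sketch below uses ordinary probability on a toy set of weighted worlds as one candidate plausibility assignment. This is an assumption for illustration; it is not the derivation of the actual functions, which is the subject of Cox’s Theorem. In this reading the familiar product and sum rules play the roles of G and F. The weights and the names `G_candidate` and `F_candidate` are hypothetical.

```python
from fractions import Fraction

# Toy model: worlds are truth assignments to (A, B), with made-up unequal weights.
WEIGHTS = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(3, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

def plaus(prop, given=lambda w: True):
    """Weighted fraction of the worlds satisfying `given` in which `prop` holds."""
    total = sum(wt for w, wt in WEIGHTS.items() if given(w))
    if total == 0:
        return None  # conditioning on something impossible: undefined
    return sum(wt for w, wt in WEIGHTS.items() if given(w) and prop(w)) / total

def G_candidate(a, b, a_given_b, b_given_a):
    """Product rule as a candidate G; it uses only two of the four variables."""
    return a * b_given_a

def F_candidate(a, b, a_given_b, b_given_a):
    """Sum rule as a candidate F, written with the same four variables."""
    return a + b - a * b_given_a

A = lambda w: w[0]
B = lambda w: w[1]
four = (plaus(A), plaus(B), plaus(A, given=B), plaus(B, given=A))

# Each line prints the candidate function's value next to the directly computed one.
print(G_candidate(*four), plaus(lambda w: A(w) and B(w)))
print(F_candidate(*four), plaus(lambda w: A(w) or B(w)))
```

Both lines print matching pairs, so in this toy model the four variables of Theorem 3.3 do determine the compound plausibilities.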
* My variant on this assumption is somewhat more general than that usually given.


** For example, van Horn.


Bayesianism: Consistent Plausibilities


In the previous post we set the foundational assumption of Bayesianism, which is that plausibility can be measured by a Real number. We now put forward the second assumption: consistency. We will assume that the rational plausibility of logically-equivalent propositions must be equal.

Assumption 2: Consistency: If A and B are (logically) identical propositions, then their plausibilities must be equal, (A|X)=(B|X). They must also have the same effect as information, (C|A,X)=(C|B,X).
This is a trivial assumption – treating identical things differently is the height of irrationality. I don’t think anyone can object to it.

All tautologies are always true and therefore logically identical to each other and to the truth (T). Since nothing can be more certain than the truth, they must receive the maximal plausibility value (regardless of any information). The situation is similar with regard to contradictions, which are always false (F).

Theorem 2.1: If A is a tautology, then (A|X)=(T|X)=vT, where vT is the highest possible plausibility value, representing absolute certainty. If A is a contradiction, then (A|X)=(F|X)=vF, where vF is the lowest possible plausibility value, representing total rejection of the proposition.

In light of this, we can rewrite the definition we have provided for “further information” in a simpler manner:
Theorem 2.2: The plausibility (A|A,X) of proposition A under information X and the further information that A is correct is that of the truth, (A|A,X)=T. The plausibility $(A|\underline{A},X)$ of proposition A under information X and the further information that A is false is that of falsehood, $(A|\underline{A},X)=F$.

These are pretty obvious results. The importance of demanding consistency will be made more obvious soon, as we discuss complex propositions.
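Consistency can be illustrated with a small truth-table check. The sketch below assumes a toy model in which plausibility is the total weight of the possible worlds where a proposition holds. The weights are made up, and this probabilistic reading is only an illustration, with 1 and 0 standing in for vT and vF. Logically equivalent propositions are true in exactly the same worlds, so any assignment of this kind gives them equal plausibility.

```python
from fractions import Fraction
from itertools import product

# All truth assignments to two basic propositions (A, B), with made-up weights.
WORLDS = list(product([False, True], repeat=2))
WEIGHTS = dict(zip(WORLDS, [Fraction(1, 8), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]))

def equivalent(p, q):
    """Two propositions are logically equivalent if they agree in every world."""
    return all(p(w) == q(w) for w in WORLDS)

def plaus(prop):
    """Total weight of the worlds in which `prop` holds."""
    return sum((WEIGHTS[w] for w in WORLDS if prop(w)), Fraction(0))

not_a_and_b = lambda w: not (w[0] and w[1])          # not (A and B)
not_a_or_not_b = lambda w: (not w[0]) or (not w[1])  # (not A) or (not B)
tautology = lambda w: w[0] or not w[0]
contradiction = lambda w: w[0] and not w[0]

print(equivalent(not_a_and_b, not_a_or_not_b))    # De Morgan: equivalent
print(plaus(not_a_and_b), plaus(not_a_or_not_b))  # hence equal plausibilities
print(plaus(tautology), plaus(contradiction))     # top and bottom values
```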

Bayesianism: Real Plausibility

In a previous post we established that under Bayesianism we are examining the truth of a set of propositions. Perhaps the fundamental assumption of Bayesianism is that the degree of belief in the truth of each proposition can be represented by a Real number.

Assumption 1: Real Plausibility: The plausibility of a proposition A under information X can be represented by a Real number (A|X) so that larger plausibility corresponds to a larger number.

Why should we accept such an assumption? Well, because of what we want to use this kind of reasoning for.

We want to be able to contemplate the truth values of propositions. This means that we want to allow being uncertain of their values, but also to allow certainty. The “plausibility” mentioned in assumption 1 is a measure of how certain we are that the proposition is true. Clearly, we want to allow absolute certainty as the highest value, but also allow lower levels of certainty*.

We also want to be able to compare the plausibilities of different propositions. We want to be able to say things like “Proposition A is very likely, and proposition B is plausible, but proposition C isn’t serious”. The demand to be able to compare the plausibility of all propositions is the demand of strong ordering – that we could arrange any two propositions in an order, from less to more plausible.

A Real number allows strong ordering. In principle, it is possible to achieve strong ordering with weaker assumptions. Using just fractions instead of real numbers would suffice, for example. But it is unclear why we would generally only want to consider fractions. The same is true for only a few, discrete, levels of certainty. For a general theory of how to think about plausibility, it appears much more sensible to allow more leeway in choosing the values, and allow any real number.

Another critique of using real numbers is that it is too “exact”. Degrees of belief shouldn’t be considered so precisely; they should be fuzzy, vague things. I find this objection unconvincing. It is true that “fuzzy” ordering makes sense for human thinking, in that we might say that e.g. A is more likely than B but we’re not certain if it’s superior to C. But I’m not sure if that is a result of the way we think or of our ignorance of how our own brain computes, and in any case I see no reason for such fuzziness in a perfectly rational agent. Exactness is a virtue, not a flaw.

I was careful to define plausibility as the degree of belief that the proposition is true. But what about our uncertainty over whether the proposition is false? Bayesians maintain that a separate measure is not needed for that – it is already inherent in ‘plausibility’. Maximum plausibility implies absolute certainty that the proposition is true, and therefore minimum certainty, in any meaningful sense, that the proposition is false: we will not be willing to bet on it being false, or in any way act on the assumption that there is even the slightest chance that it is. Maximum plausibility of A therefore implies minimum certainty that A is false. Putting this argument in reverse, minimum plausibility of A must therefore imply absolute certainty that “not A” is true, i.e. that A is false.

Those that do not accept such lines of thought may want to keep track of our degree of certainty of “not A” separately from our degree of certainty in “A”. They can perhaps argue that minimal certainty is a state of ignorance, not of knowledge, and that one should not confuse the data held about a proposition with its evaluation for the purpose of taking a distinct action. This kind of approach is taken by the Dempster-Shafer theory, which is perhaps the major contender to Bayesianism. It can be understood as keeping track not of our degree of certainty in, but rather of our support for the truth of propositions. If we denote “no support for proposition A” as s(A)=0 and “demonstrative proof that proposition A is true” as s(A)=1, we have a Real measure that characterises the proposition much like the Bayesian “plausibility” does. But we have a separate such measure for “not A”. The support for A is not simply a mirror image of the support for not-A; it may be that both lack support (s(A)=s(not A)=0), or that we have a little evidence in favor of both (e.g. s(A)=0.2, s(not-A)=0.1), or so on. We can think of the difference from “1” (1-s(A)-s(not A)) as a measure of our “ignorance” on whether A is true or not.

Note that such a two-valued approach implies that there is no strong order, as having several “plausibility” measures (s(A) and s(not-A)) prohibits simply comparing our state of knowledge about proposition A using a “greater than” relation. One may attempt to consider theories with even more parameters (e.g. reliability of A) attached to every belief, and these too will violate strong order.
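A small sketch makes the loss of strong ordering concrete. It represents each state of knowledge as a hypothetical pair (s(A), s(not A)) and compares pairs by dominance: one state counts as at least as favourable to A as another only if it has at least as much support for A and no more support for not-A. That comparison rule is an assumption chosen for illustration, not a part of Dempster-Shafer theory as such. Some pairs then simply cannot be ranked.

```python
from fractions import Fraction
from typing import NamedTuple, Optional

class Support(NamedTuple):
    """Hypothetical two-valued state of knowledge about a proposition A."""
    s_for: Fraction      # support for A
    s_against: Fraction  # support for not-A

    @property
    def ignorance(self) -> Fraction:
        # The remainder 1 - s(A) - s(not A) described in the text.
        return 1 - self.s_for - self.s_against

def compare(x: Support, y: Support) -> Optional[int]:
    """Dominance comparison (an illustrative assumption).

    Returns 1 if x dominates y, -1 if y dominates x, 0 if they are equal,
    and None if neither dominates, i.e. the two states are incomparable.
    """
    x_ge = x.s_for >= y.s_for and x.s_against <= y.s_against
    y_ge = y.s_for >= x.s_for and y.s_against <= x.s_against
    if x_ge and y_ge:
        return 0
    if x_ge:
        return 1
    if y_ge:
        return -1
    return None

little_evidence = Support(Fraction(2, 10), Fraction(1, 10))  # the example from the text
no_evidence = Support(Fraction(0), Fraction(0))
strong_case = Support(Fraction(6, 10), Fraction(1, 10))

print(little_evidence.ignorance)
print(compare(strong_case, little_evidence))  # comparable
print(compare(little_evidence, no_evidence))  # incomparable: more support both for and against
```

A single Real plausibility never produces the `None` case, which is the sense in which it guarantees strong ordering.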

Who is right, Shafer or Bayes? I am not certain (huh!), but I am inclined to support the Bayesians to the extent that for choosing between options the single measure of Bayesianism seems needed, so I think ultimately Dempster-Shafer should reduce to Bayesianism once one generates measures of “certainty” from it. At any rate, Bayesianism discusses only a single measure of belief, “certainty” or “plausibility”, and we will proceed from now on under this assumption.

Another thing we want to be able to do with plausibilities is to change our beliefs given new information. This is why “under information X” should be part of the definition of plausibility. We have not clarified just what this “information” means, and we won’t yet – that will wait for another post. But I do want to define at this stage what “further information” means.

I will mark the plausibility of A given information X and the further information that B is true as (A|B, X). And I will demand that information that a proposition is true (or false) will change the plausibility of that proposition accordingly.

Definition 1: Further Information: The plausibility (A|A,X) of A given information X and the further information that A is true is that of truth, (A|A,X)=(T|A,X). The plausibility $(A|\underline{A},X)$ of A given information X and the further information that A is false is that of falsehood, $(A|\underline{A},X)=(F|\underline{A},X)$.

I have here written the logical value of “True” as T, and the logical value of “False” as F. I have marked “not A” by an underline, $\underline{A}$. I am implicitly assuming that A can in principle be either true or false – no information in the world can make a tautology false, or a contradiction true.

I have written that the “demand” is a definition, because it really only elaborates on what is meant by “information”. We will return to information later, and provide a set of assumptions that underlies its use. The reader may, however, choose to view this “definition” as another assumption, if he wishes. I find it difficult to raise any arguments against it, once Assumption 1 is accepted – clearly, given this information these are the degrees of certainty that are implied.

* The choice of a larger Real value for larger plausibility is merely a convention. We could have chosen the opposite – lower values for greater plausibility – and it would work just as well. The important part of the definition is that plausibilities could be arranged with a “greater than” relation; the direction is just a convention.

Bayesianism: Logical Propositions


Is it true that the Earth is round? Does Newtonian mechanics accurately describe celestial mechanics? Before we answer such questions it seems that we first need to assume that they are meaningful. We need to assume that such propositions are true or false, in the sense that they accurately (or not) reflect and describe reality*. This is the first assumption of Bayesianism.

Assumption 0: Domain of Discourse: We are concerned with the truth value of a set of basic propositions (A,B,C….) about the world, each of which can be either True or False, and with the truth of all the complex propositions that can be constructed from them.

Most explanations of Bayesianism don’t even bother listing this as an assumption, but it actually isn’t quite as trivial as it appears to be**. The most important objection to it is that thinking of descriptions of the world – propositions about it – as either “true” or “false” is generally simplistic and naive. The Earth isn’t really round – its shape is somewhat oblong, it vibrates, and if you insist on perfect accuracy you will find no physical object has a well-defined shape due to quantum effects. But clearly, saying that the Earth is round is more correct than saying that it is flat. The “truth” of a statement is therefore not a simple matter of “true” or “false”. There are grades of truth, grades of accuracy. The same description can even be less or more true in different ways (e.g. more accurate, but less complete). And there is good cause to think that we would never have a complete and fully accurate description of reality, Truth with a capital “T”, as that would require an infinitely detailed description and our minds (and records) are finite.

While all of this is true, it is nevertheless clearly useful to think in such dichotomies. Indeed, even when we aren’t, we’re really only thinking about the truth of more elaborate propositions, like “It is more accurate to say that the Earth is round than that it is flat”. Even in computer applications and mathematical derivations attempting to make use of more “fuzzy” thinking*** the programs and theorems are written down as they “truly” are, not just fuzzily.

I therefore believe that this objection ultimately fails in practice. Bayesian thinking indeed refers to rough dichotomies, but in many cases and in many ways such thinking is perfectly valid and acceptable. The limited nature of this discourse, however, should be borne in mind, and ideally we should find ways to adapt and change our discourse as greater accuracy and precision are needed.

Another objection along these lines is that meaning is holistic. This line of thought objects to the entire attempt to determine the truth of some set of individual propositions. The proposition “the Earth is round”, for example, is meaningless on its own, without an understanding of what “Earth”, “round”, and so on mean. Meaning is only conferred by a network of intertwined concepts, and judging the truth of individual propositions in isolation is therefore impossible. Instead of trying to ascertain the truth of propositions, we should be considering the mental structures and representations underlying them.

I find this objection to be even weaker than the first. While meaning is indeed holistic, there is nothing wrong with approaching the problem of knowledge by considering specific propositions. They still represent truth claims about the world, and as such can be judged to be true or false. Understanding how to consider logically- and conceptually-related propositions should certainly be a concern, but this does not undermine the Bayesian approach.

The division into basic and complex propositions is not problematic, as far as I can see. The moment you admit to a domain of discourse containing true and false claims, you automatically can generate complex claims from it and it is only reasonable to want to consider their truth as well. As we shall see, it is precisely the construction of this complex structure that allows Bayesianism to proceed.
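To see how quickly this complex structure grows, the following sketch enumerates the truth tables available over a set of basic propositions. With n basic propositions there are 2^n truth assignments, and therefore 2^(2^n) complex propositions that differ in their truth conditions, counting logically equivalent formulas once. The function names are hypothetical.

```python
from itertools import product

def truth_assignments(n):
    """All 2**n ways of assigning True/False to n basic propositions."""
    return list(product([False, True], repeat=n))

def distinct_propositions(n):
    """Every truth table over n basic propositions.

    Each table stands for one class of logically equivalent formulas.
    """
    worlds = truth_assignments(n)
    return [dict(zip(worlds, values))
            for values in product([False, True], repeat=len(worlds))]

for n in (1, 2, 3):
    # Columns: n, number of truth assignments, number of distinct propositions
    print(n, len(truth_assignments(n)), len(distinct_propositions(n)))

# "A or B" is one of the truth tables over two basic propositions.
a_or_b = {w: (w[0] or w[1]) for w in truth_assignments(2)}
print(a_or_b in distinct_propositions(2))
```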

* This is the “Correspondence Theory of Truth”, and implicitly presumes “Realism”. Formally at least, Bayesianism can be advanced without this metaphysical burden – merely having two truth values is enough, regardless of your theory of truth. In practice, however, I (and every Bayesian I’ve read about) use Truth in the correspondence sense and believe in Realism, so I’ll leave it at that.

** The only one I know that discusses it, however, is Mark Colyvan, in “The philosophical significance of Cox’s theorem”.

*** Allowing truth values to be continuous between 0 and 1, rather than 0 or 1, leads to “fuzzy logic”.

Bayesianism: Introduction

How to tell what’s true? What should we believe in? One increasingly influential approach can be called Bayesian epistemology. It claims, in broad strokes, that rational beliefs, and changes in belief, should conform to a mathematically precise probabilistic framework.

I admit I was rather skeptical of it, but after reading the first chapters of E.T. Jaynes’ book “Probability Theory: The Logic of Science” I’ve given it more respect and thought and now think that at least on some levels these Bayesians may be right. So I’ve decided to write this series of posts to examine it more closely. I will be focusing on examining its underlying assumptions and, therefore, its validity – but also on deriving its conclusions at a level of rigor and in a manner that appeals to me.

This post serves as an index to the series. I will update it as more posts are added. The intended series can be divided into several parts:

Cox’s Theorem: Proving that ‘rational’ beliefs must conform to probability theory.

  1. Logical Propositions
  2. Real Plausibility
  3. Consistent Plausibilities
  4. Compound Plausibilities

[and more…]

The Presuppositions of Science

What are the presuppositions of science? What does it assume from the get-go?

Of course, in practice science is a human endeavour that is pursued by a host of scientists, each bringing their own presuppositions and assumptions (often unknowingly) to the table. At the descriptive level, therefore, science won’t have any clear list of presuppositions as such.

But at the philosophical level, we can wonder what science needs to get done, what needs to be assumed in order to even attempt to do science.

Gauch has some answers. He maintains that all that is needed is the acceptance of “common sense” thinking, like “pedestrians get hit by cars”, and the implicit assumptions behind it. And that what this boils down to is the belief that reality is Ordered and Comprehensible – that there is an Order that underlies the variety in reality, and that we can Comprehend this order and therefore reality by the use of reason. This fundamental assumption can be broken down into more detailed ones in various ways, but it is better to just keep the two-concept bottom line: Science presupposes a Comprehensible Order.

I think Gauch is mistaken on both counts. Science does not presuppose that the world is orderly nor that it is comprehensible. Instead, science requires a specific list of “transcendental” assumptions, and science in practice also uses a supplementary list of vague “theories”, broad hypotheses that seem (scientifically) to work. Two of these are that the world is ordered and comprehensible.

To ascertain the presuppositions of science, we first need to establish what it is. I shall define it as “The attempt, by limited semi-rational agents, to infer the content of reality”.

Given this definition, there are a number of things that must be true in the world for it to be possible to attempt to conduct science, and to achieve scientific progress. These are:

1. Realism: There must be a reality containing all those agents that are attempting to do science. Science further assumes that this reality is independent of the beliefs of these agents about it (with the trivial exception that it contains those agents with those beliefs). This is an expediency, but is not strictly needed.

2. Rationality: Science assumes that the rational examination of data can infer the truth, or at least get close to it. For the purposes of this post, I’ll be thinking of rationality in Bayesian terms – ‘rationality’ is taken to be the application of Bayesian inference.

3. Time Flow: Rationality (at least as understood above) operates in time. It therefore requires that time flows for the agents, that they be embedded in a stream of time.

I emphasize that this is not the same as assuming that all of reality is within time, or that time “flows” in some metaphysical sense, or that it changes in the same way at every place, or so on. All that is being assumed is how time relates to each agent – each agent must function within (its own?) time.

4. Input Channel: The application of rationality by each agent (‘scientist’) requires that it have an input channel, receiving more and more inputs as time progresses.

5. Reliable Memory: As processing (for finite agents) takes time, one automatically needs a memory to process even a single experience. One also needs a memory, however, to analyze patterns in time. And to receive information with content exceeding the input channel’s bandwidth. A perfect memory isn’t required – just a sufficiently good one.

6. Reliable Inputs: The input one gets should reflect some part of reality, or else it cannot be used to analyze it. Similar types of events should lead to similar experiences.

This does not presume that the sense-impressions we have are accurate reflections of reality. It doesn’t even presume spatiality. The things one’s input corresponds to at different times define what the agent’s “environment” is – the environment is those things that the input “represents”.

7. Reliable Rationality: The agent’s reasoning should approach that of an ideal rational agent to a degree that will allow him to pursue the rational analysis of his inputs.

These are, as far as I can see, all the things that need to be in-place for agents to start doing science. There are two more things, however, that are required for science to work well.

I. Order. Science works to the extent that patterns exist. It works best in a reality that has uniform simple patterns, so that inferences from part of it and in the past would hold true in different parts and in the future, and that these patterns would be easy to discover. In other words, science works best if there are simple universal uniform laws of nature.

But science does not presuppose such uniformity. Rationality requires belief in the simplest, most uniform construction that fits the data so far – but this is not the same as presupposing that such a structure exists. Contra Gauch, instead of presupposing Order science hypothesizes it.
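The difference between presupposing order and hypothesizing it can be sketched with a toy Bayesian update. The setup is entirely hypothetical: an agent observes a binary signal and weighs an “ordered” hypothesis (the signal is always 1) against a “patternless” one (each value is equally likely), starting from even prior odds. Belief in order grows with confirming data but is never assumed outright, and a single counterexample removes it.

```python
from fractions import Fraction

def posterior_ordered(observations, prior=Fraction(1, 2)):
    """Posterior plausibility of the 'ordered' hypothesis given the observations.

    Ordered hypothesis: every observation is 1.
    Patternless hypothesis: each observation is 0 or 1 with probability 1/2.
    Both hypotheses and the prior are illustrative assumptions.
    """
    like_ordered = Fraction(1)
    like_patternless = Fraction(1)
    for obs in observations:
        like_ordered *= 1 if obs == 1 else 0
        like_patternless *= Fraction(1, 2)
    # Bayes' rule with two exhaustive hypotheses.
    evidence = prior * like_ordered + (1 - prior) * like_patternless
    return prior * like_ordered / evidence

for n in (0, 1, 5, 10):
    print(n, posterior_ordered([1] * n))

# One counterexample rules the ordered hypothesis out.
print(posterior_ordered([1, 1, 1, 0]))
```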

II. Causality. Experiments definitely do require causality, as do lots of scientific models and explanations. But causality is merely one type of pattern within reality, one kind of structure to be inferred. One could in principle infer that there are no causal structures except those presupposed above. That would be extremely odd, surely, but that’s a different matter.

If it wasn’t for this leeway, science would never have discovered quantum mechanics. In QM science inferred non-causal patterns in existence.

To this list one can add numerous secondary assumptions and heuristics, like the Copernican principle, General Covariance, or so on. But these are all secondary “theories” and ideas, not the fundamental core of science. In my opinion.

So, by my reckoning science does not presume that the world is orderly. It hopes that it is, and will succeed to the extent that it is, but that is not the same thing. Likewise, science does not presume the world is comprehensible. It hopes that it is, and will succeed to the extent that it is easy to comprehend the world, but that is not the same thing.

Science similarly doesn’t assume that causality holds, or that time flows, or any such metaphysical principles – except to the extent that its presuppositions assume the limited applicability of such concepts to the agent’s own mental processing and interactions with the environment. The belief that such concepts extend to beyond these presumptions is (at most) a rational belief inferred by science, not presupposed by it.