Metamath Proof Explorer The Weak Deduction Theorem |
||
Mirrors > Home > MPE Home > The Weak Deduction Theorem |
Contents of this page |
A Theorem fine is Deduction, For it allows work-reduction: To show "A implies B", Assume A and prove B; Quite often a simpler production. --Stefan Bilaniuk A Problem Course in Mathematical Logic, 2003 [external] |
On this page we attempt to show you how this technical stuff works if you are interested. We also discuss the conditional operator "if(𝜑, 𝐴, 𝐵)" that you will find in these proofs.
In many situations today MPE no longer uses the approach described here. We instead tend to use deduction form and natural deduction. However, the approach described here is still valid, and it may be interesting to you.
con3i | ⊢ (𝜑 → 𝜓) ⇒ ⊢ (¬ 𝜓 → ¬ 𝜑) |
This means that if we have proved a theorem of the form ⊢ (𝜑 → 𝜓) (where 𝜑 and 𝜓 are any wffs), then we can conclude the theorem ⊢ (¬ 𝜓 → ¬ 𝜑). Here, the symbol "⊢ " is just a place-holder that can be read as "a proof exists for". The big arrow connecting the hypothesis and conclusion isn't a formal mathematical symbol; I put it there to suggest informally the relationship between hypothesis and conclusion. To gain an informal feel for what the contraposition inference says, suppose we agree with the English statement, "If it is raining, there are clouds in the sky." From contraposition we can conclude, "If there are no clouds in the sky, it is not raining."
An example of a theorem is the law of contraposition:
con3ALT | ⊢ ((𝜑 → 𝜓) → (¬ 𝜓 → ¬ 𝜑)) |
This looks almost the same as the deduction, except the hypothesis and conclusion are connected together with the formal mathematical symbol for "implies". We don't have to prove a hypothesis; instead it is a standalone statement that is unconditionally true. Even though it looks similar, it is "stronger" than the deduction in a fundamental way.
⊢ 𝜎 ⇒ ⊢ 𝜏 |
and we want to then prove a theorem of the form
⊢ (𝜎 → 𝜏) |
Doing this is not as simple as you might think. The deduction says, "if we can prove 𝜎 then we can prove 𝜏," which is in some sense weaker* than saying "𝜎 implies 𝜏." Now, there is no axiom of logic that permits us to directly obtain the theorem given the deduction. However, it is possible to make use of information contained in the deduction or its proof to assist us with the proof of the theorem, and this is what the standard deduction (meta)theorem and a related version that we use are all about.
On the other hand, if we have the theorem, it is easy to recover the deduction using modus ponens ax-mp. An example of this is the derivation of the inference simpli from the theorem simpl.
*For example, the conversion of a deduction to its theorem form does not hold in general for quantum propositional calculus, which is a weak subset of classical propositional calculus. In fact, it has been shown that adding the Standard Deduction Theorem (below) to quantum propositional calculus turns it into classical propositional calculus!
In ordinary mathematics, no one actually carries out the algorithm, because (in its most basic form) it involves an exponential explosion of the number of proof steps as more hypotheses are eliminated. Instead, the Standard Deduction Theorem is invoked simply to claim that it can be done in principle, without actually doing it.
Actually, the Standard Deduction Theorem is not quite as simple as it might first appear. There is a subtle restriction that must be taken into account when working with predicate calculus. If the Axiom of Generalization ax-gen was used in the proof of the original deduction (or anywhere in the potentially large tree of lemmas building up to it) to quantify over a free variable contained in one of its hypotheses, then converting that hypothesis to an antecedent may fail. The following example shows why. When 𝑥 and 𝑦 are distinct variables, the following (silly) deduction can be proved in set theory:
⊢ 𝑥 = 𝑦 ⇒ ⊢ 𝑥 ≠ 𝑦 |
The reason this deduction holds is that the hypothesis, which effectively states (using ax-gen) "for all 𝑥 and 𝑦, 𝑥 = 𝑦 is true", can never be proved as a standalone theorem. In fact it is provably false (by dtru), so any conclusion whatsoever follows from it. However, if we naively apply the Standard Deduction Theorem and blindly convert the hypothesis to an antecedent, the theorem form of this deduction:
Wrong! ⊢ (𝑥 = 𝑦 → 𝑥 ≠ 𝑦) |
("if two things are equal then they are not equal") is obviously absurd!
Note. The way we have presented the Standard Deduction Theorem above is slightly different from that in most textbooks, which use a list of hypotheses to the left of the the turnstile symbol ⊢ . This is discussed further in the Logic, Metalogic,... section below.
There is a much more efficient method for proving a theorem from a deduction that can be used in many (but not all) cases. We call it the Weak Deduction Theorem. (There is also an unrelated "Weak Deduction Theorem" in the field of relevance logic, so to avoid confusion we could call ours the "Weak Deduction Theorem for Classical Logic".) Unlike the Standard Deduction Theorem, the Weak Deduction Theorem produces the theorem directly from a special substitution instance of the deduction, using a small, fixed number of steps roughly proportional to the length of the final theorem. The number of additional steps is completely independent of the size of the original proof, so in practice it can be applied to very advanced theorems without a corresponding increase in proof size. There are also no restrictions on the use of the Axiom of Generalization that may cause us to stumble: how the original deduction was proved is irrelevant.
The Weak Deduction Theorem is used frequently in the Metamath Proof Explorer, especially in set theory, in order to make formal proofs more compact. Here we describe it in some detail since it is not necessarily intuitive, nor has it ever been published.
What are its limitations? Specifically, we must be able to prove a special case of the deduction hypothesis as a standalone theorem. This means it cannot be used in all cases—for example, it cannot be used when the hypothesis is a contradiction such as (𝜑 ∧ ¬ 𝜑). However, in the case of the contraposition deduction above, by substituting 𝜑 for 𝜓 in the hypothesis ⊢ (𝜑 → 𝜓), we obtain the trivially proved identity theorem id, ⊢ (𝜑 → 𝜑). This special case can be used (together with some auxiliary lemmas) to eliminate the hypothesis and introduce it as an antecedent.
Does this limitation cause us inconvenience? In Metamath's set theory development, a modified version for class variables (see the set theory version below) is used extensively. For most practical deductions we can almost always display an easy-to-prove special case of a hypothesis, since otherwise the deduction tends not to be of much use. For example, if the hypothesis requires a class variable to be a nonzero real number, the special case of the number 1 will satisfy the hypothesis.
An exception is a deduction for which we can never display an actual example satisfying its hypothesis but can show only the existence of a set satisfying it. Theorems involving the Axiom of Choice are sometimes like this. In such a case, the Weak Deduction Theorem cannot be used. Therefore we must settle for a longer direct proof of the theorem form, but so far I haven't encountered a case where this has been impractical.
Is this to say that the Standard Deduction Theorem is of little use? Definitely not! When outlining proofs prior to entering and verifying them with the Metamath program, I informally use the Standard Deduction Theorem implicitly and extensively just like any mathematician would. When satisfied with the informal outline, I then work out the details necessary to create a (hopefully short) formal proof either directly or with the help of the Weak Deduction Theorem.
By the way, the Standard and Weak Deduction Theorems are not "theorems" in the sense described in the "Deductions vs. Theorems" section above but are "metatheorems" (or more precisely "metametatheorems," since all Metamath theorems are actually theorem schemes) whose proofs require metamathematics outside of the axiom system. Specifically, they are both algorithms for generating actual proofs. However, once the Weak Deduction Theorem generates a proof, questions about the justification and rigor of its metamathematics become irrelevant, because the proof it produces is independently verified by the Metamath proof verifier. Where a proof came from, whether written by a human or generated by some algorithm, is of no importance to the verifier. The same cannot be said about the Standard Deduction Theorem in the way it is normally used, since it does not produce a practical proof that can be verified directly but only shows that a proof is possible in principle.
con3i | {(𝜑 → 𝜓)} ⊢ (¬ 𝜓 → ¬ 𝜑) |
In order to talk conveniently about deductions in general, the set of hypotheses may represented by a metavariable such as Γ that ranges over sets of wffs, with a conclusion represented by a metavariable such as 𝜏 ranging over wffs:
Γ ⊢ 𝜏 |
For example, Theorem 7 in [Margaris] p. 76 states (using our notation),
If Γ⊢ 𝜏, then Γ ⊢ ∀𝑥𝜏, provided that 𝑥 is not free in Γ. |
There are actually three kinds of implications in this statement. First we have any implication arrow → (not shown) that might be present in the wff represented by 𝜏. At the next higher level, we have the connection between Γ and 𝜏 that is represented by the turnstile ⊢ . Finally at the top-most level, we have the connection between the two deductions of the theorem, represented by the English phrase "If...then". We might think of these three as belonging to the logic, the metalogic, and the metametalogic respectively.
The expression between the "if" and "then" above is usually called a "premise" rather then a "hypothesis". The Standard Deduction Theorem does not provide for the elimination of premises, only of hypotheses. Viewed in this way, there is actually a fundamental difference between the Standard Deduction Theorem and the Weak Deduction Theorem: the former applies to hypotheses, and the latter applies to premises. In other words, they apply to different "meta" levels.
The set.mm database doesn't have a way of representing deductions in the way we have just described. However, the "natural deduction" database nat.mm [external], developed by Frédéric Liné, shows how this can be done in the Metamath language. Under such a system, Margaris's theorem would be represented in the database with a premise ($e statement) for the expression "Γ⊢ 𝜏" and with a provable conclusion ($p statement) for the expression "Γ⊢ ∀𝑥𝜏".
Another suggestion: If you are familiar with a little set theory, you may want to skip ahead directly to the Weak Deduction Theorem in set theory, which may be easier to understand. We never use the Weak Deduction Theorem in propositional calculus except in the contrived example con3ALT created for the next section.
The auxiliary lemmas we need for the Weak Deduction Theorem in propositional calculus are elimh and dedt. The first takes the special case (such as id above) as its third hypothesis and builds a strange-looking theorem from it. This strange-looking theorem is a substitution instance of the hypothesis we want to convert to an antecedent. We then apply the strange-looking theorem to the hypothesis of the deduction, resulting in a (still strange-looking) substitution instance of its conclusion. Finally, we take this result and apply it to the second hypothesis of dedt, and magically the desired theorem pops out as the conclusion of dedt, with the original hypothesis transformed into an antecedent. Below we show these auxiliary lemmas.
df-ifp | ⊢ (if-(𝜑, 𝜓, 𝜒) ↔ ((𝜑 ∧ 𝜓) ∨ (¬ 𝜑 ∧ 𝜒))) |
elimh | ⊢ ((if-(𝜒, 𝜑, 𝜓) ↔ 𝜑) → (𝜏 ↔ 𝜒)) & ⊢ ((if-(𝜒, 𝜑, 𝜓) ↔ 𝜓) → (𝜏 ↔ 𝜃)) & ⊢ 𝜃 ⇒ ⊢ 𝜏 |
dedt | ⊢ ((if-(𝜒, 𝜑, 𝜓) ↔ 𝜑) → (𝜏 ↔ 𝜃)) & ⊢ 𝜏 ⇒ ⊢ (𝜒 → 𝜃) |
In lemma elimh, the wff 𝜒 is the hypothesis of the original deduction. The wff 𝜃 is the special case (such as id in our example above) of that hypothesis. The wff variable 𝜑 is the deduction's hypothesis that is replaced with wff variable 𝜓 to result in the special case. The first two hypotheses of elimh are trivially-proved technical hypotheses that specify the substitution instances we need. The first one tells elimh that when the wff if-(𝜒, 𝜑, 𝜓) is substituted for the wff variable 𝜑 in the wff 𝜒 (the hypothesis of the original deduction), the result is the wff 𝜏. The "substitution" of if-(𝜒, 𝜑, 𝜓) for 𝜓 (in the wff 𝜃 to result in the wff 𝜏) specified by the second hypothesis of elimh is special — not all occurrences of the wff variable 𝜓 are replaced, but only the ones at the variable positions replaced in the first elimh hypothesis (Step 6 of the example in the next section will help clarify this).
The conclusion of elimh is applied to the hypothesis of a matching substitution instance of the original deduction, which in turn produces as its conclusion a wff that is applied to the second hypothesis of lemma dedt.
In lemma dedt, the wff 𝜒 is the hypothesis of the original deduction, and the wff 𝜃 is the conclusion of the original deduction. The first hypothesis of dedt tells it that when the wff if-(𝜒, 𝜑, 𝜓) is substituted for the wff variable 𝜑 in the wff 𝜃, the result is the wff 𝜏. The conclusion of dedt, the wff (𝜒 → 𝜃), is the desired theorem.
It may be hard to understand intuitively how the Weak Deduction Theorem works just by looking at elimh and dedt. In a later section, we give a proof that presents the algorithm in detail and should be easier to understand. Our two lemmas just encapsulate the algorithm into a form convenient to use.
Steps 1 through 6 of con3ALT build the technical substitution instances that we need. In general these substitution instances are quite easy to prove by repeatedly applying "formula-builder" lemmas such as notbid and imbi1d. Note that we use the term "substitution" in a nonstandard way in the description of Step 6 below, since not all occurrences of the thing being substituted for are replaced.
Step 3 proves that the substitution of if-((𝜑 → 𝜓), 𝜓, 𝜑) for 𝜓 in the contraposition conclusion wff (¬ 𝜓 → ¬ 𝜑) results in (¬ if-((𝜑 → 𝜓), 𝜓, 𝜑) → ¬ 𝜑). This is assigned to the first hypothesis of dedt.
Step 4 proves that the substitution of if-((𝜑 → 𝜓), 𝜓, 𝜑) for 𝜓 in the contraposition hypothesis wff (𝜑 → 𝜓) results in (𝜑 → if-((𝜑 → 𝜓), 𝜓, 𝜑)). This is assigned to the first hypothesis of elimh.
Step 6 proves that the substitution of if-((𝜑 → 𝜓), 𝜓, 𝜑) for only the second 𝜑 in the special case wff (𝜑 → 𝜑) results in (𝜑 → if-((𝜑 → 𝜓), 𝜓, 𝜑)). Unlike a "normal" substitution, we replaced only the second one because that was the position where the substitution took place in Step 4. This is assigned to the second hypothesis of elimh.
In practice, the Weak Deduction Theorem is not used very often in our propositional calculus development because direct proofs that build on a previously proved hierarchy of theorems are usually more efficient. In fact, our proof of the contraposition theorem con3ALT was contrived just for the purpose of this example. Compare the more efficient direct proof of the same theorem in con3.
To make things more efficient, we introduce a special notation called the "conditional operator" and denoted by "if(𝜑, 𝐴, 𝐵) " which combines a wff and two classes to produce another class. See definition df-if. Read "if(𝜑, 𝐴, 𝐵)" as "if 𝜑 is true then the value is 𝐴 else the value is 𝐵." It is just an abbreviation for a more complicated class expression like other such abbreviations (class union, etc.). This abbreviation makes working with the Weak Deduction Theorem a little easier, and like all definitions, in principle it can be eliminated by expanding out the abbreviation at each occurrence in a proof. (It also has many other applications in addition to its use with the Weak Deduction Theorem.)
Next, there are two basic lemmas involving the conditional operator that we use for the Weak Deduction Theorem. The lemma elimhyp is intended to take a provable special case of the hypothesis to be eliminated and transform it into a special substitution instance of that hypothesis. This in turn is fed into the deduction to be eliminated, and the deduction's output is then fed into the lemma dedth to produce the final theorem.
The last hypothesis of each of the following lemmas is the important one. The other hypotheses just specify the substitution instances we need and are are always easily constructed using simple equality laws. Using each last hypothesis as the "input" to each lemma, the proof of a theorem from a deduction follows this outline: [provable special case] -> elimhyp -> [known deduction] -> dedth -> [deduction converted to theorem].
An auxiliary lemma, keephyp, can be used if we wish to eliminate some hypotheses but keep others.
df-if | ⊢ if(𝜑, 𝐴, 𝐵) = {𝑥 ∣ ((𝑥 ∈ 𝐴 ∧ 𝜑) ∨ (𝑥 ∈ 𝐵 ∧ ¬ 𝜑))} |
elimhyp | ⊢ (𝐴 = if(𝜑, 𝐴, 𝐵) → (𝜑 ↔ 𝜓)) & ⊢ (𝐵 = if(𝜑, 𝐴, 𝐵) → (𝜒 ↔ 𝜓)) & ⊢ 𝜒 ⇒ ⊢ 𝜓 |
dedth | ⊢ (𝐴 = if(𝜑, 𝐴, 𝐵) → (𝜓 ↔ 𝜒)) & ⊢ 𝜒 ⇒ ⊢ (𝜑 → 𝜓) |
keephyp | ⊢ (𝐴 = if(𝜑, 𝐴, 𝐵) → (𝜓 ↔ 𝜃)) & ⊢ (𝐵 = if(𝜑, 𝐴, 𝐵) → (𝜒 ↔ 𝜃)) & ⊢ 𝜓 & ⊢ 𝜒 ⇒ ⊢ 𝜃 |
In addition to the above lemmas, a number of variations are also proved for convenient use. These allow us to eliminate several hypotheses simultaneously, to eliminate hypotheses where multiple class variables must be substituted simultaneously with specific classes in order to obtain the provable special case, and to eliminate hypotheses that occur in frequently-used specific forms. They can be browsed by clicking on dedth then clicking on the "Related theorems" link. It is easiest to learn how they work by looking at examples of their use, which can be found in the "This theorem is referenced by" links on their web pages.
Here is an example of the class version of the Weak Deduction Theorem in action. In the set.mm database you will find that we have proved the result
renegcli | ⊢ 𝐴 ∈ ℝ ⇒ ⊢ -𝐴 ∈ ℝ |
which means, if you have a proof that A is a real number (belongs to the set of reals), then you will also have a proof that its negative is a real number. (The proof of renegcli is not relevant to us right now, and you can ignore it.) The application of the Weak Deduction Theorem yields (and this is the proof you'll want to look at):
renegcl | ⊢ (𝐴 ∈ ℝ → -𝐴 ∈ ℝ) |
which means that "𝐴 is a real number" implies "the negative of 𝐴 is a real number". In the proof, we made use of elimel, which is a frequently used special case of elimhyp. Also, we made use of the special case that one is a real number 1re to eliminate the hypothesis of elimel (and thus elimhyp from which it is derived).
In this example, we did not make use of keephyp. For an example that uses it (via keepel), see the proof of divcan2zi.
In propositional calculus, the deduction theorem for one hypothesis states (where A and B are wffs): If A |- B then |- A -> B (We will ignore the converse, being a trivial application of modus ponens. In the description that follows, all letters refer to fixed wffs or to variables ranging over wffs—"wff variables"—as indicated.) Here we show a different kind of construction for the deduction theorem that, unlike the standard Tarski/Herbrand deduction theorem, does not require rewriting the proof of the deduction but simply adds a fixed number of steps (roughly proportional to formula length) to a proof for A |- B to result in a proof for |- A -> B. It is an unusual application of classical logic in that it involves a kind of self-reference wherein a formula is substituted into itself. It is important to note that this version does not replace the standard one because it is weaker. In particular, we can't eliminate a hypothesis that is a contradiction: for example we can't use the theorem to derive |- (¬A ^ A) -> B from ¬A ^ A |- B. We must be able to prove separately a special case of the hypothesis. For example, if we are given (A -> B) |- (¬B -> ¬A) and want to prove |- (A -> B) -> (¬B -> ¬A), the theorem will give us the proof provided we also have a proof for the special case |- (B -> B), where wff "(B -> B)" is the hypothesis "(A -> B)" with B substituted for A. Now for the theorem. For our notation, we state axioms, theorems, and proof steps as _schemes_ where the variables range over wffs. If F is a wff, and A is a wff variable contained in F, let us denote F by F(A). If G is another wff, then F(G) is the wff obtained by substituting all occurrences of wff variable A with wff G. (More precisely, we first replace variable A in F with a new wff variable not occurring elsewhere, then we substitute G for that new wff variable. This eliminates any ambiguity when G itself contains A.) For example, if F == F(A) = A v B and G = A ^ C, we have F(G) = (A ^ C) v B. Also, F(F(A)) = F(F) = (A v B) v B, F(F(F(A))) = ((A v B) v B) v B, etc. I hope this notation is clear. Theorem ------- Assume F |- G. Denote F by F(A) where A is some wff variable in F, and assume there is a wff B such that |- F(B). Then, in classical logic, |- F -> G. Proof ----- We will use F and F(A) interchangeably, and we also will denote G by G(A). In classical logic, we have |- F -> (A <-> ((A ^ F) v (B ^ ¬F))) (1) |- ¬F -> (B <-> ((A ^ F) v (B ^ ¬F))) (2) Also in classical logic, for any wffs P, Q, R, S, T, we have If |- P -> (Q <-> R) then |- P -> (¬Q <-> ¬R) If |- P -> (Q <-> R) and |- P -> (S <-> T) then |- P -> ((Q -> S) <-> (R -> T)) corresponding to the primitive connectives ¬ and ->. Using these, and (1) and (2) as the basis, we derive by induction on the formula length of F (with ^, v, etc. implicitly expanded into ¬ and -> primitives) |- F -> (F(A) <-> F((A ^ F) v (B ^ ¬F))) (3) |- ¬F -> (F(B) <-> F((A ^ F) v (B ^ ¬F))) (4) From (3) we have (since F = F(A) by definition) |- F -> F((A ^ F) v (B ^ ¬F)) (5) Since |- F(B) by hypothesis, from (4) we have |- ¬F -> F((A ^ F) v (B ^ ¬F)) (6) Combining (5) and (6), we obtain |- F((A ^ F) v (B ^ ¬F)) (7) By hypothesis F(A) |- G(A). Substituting ((A ^ F) v (B ^ ¬F)) for A gives F((A ^ F) v (B ^ ¬F)) |- G((A ^ F) v (B ^ ¬F)) (8) Since (7) satisfies the hypothesis of (8) we have |- G((A ^ F) v (B ^ ¬F)) (9) By induction on the formula length of G, from (1) we derive |- F -> (G(A) <-> G((A ^ F) v (B ^ ¬F))) (10) From (9) and (10) we finally obtain |- F -> G Q.E.D. Example: Suppose we have a proof for (A -> B) |- (¬B -> ¬A) and also a proof of |- (B -> B), which is a special case of the hypothesis. Let F(A) = (A -> B) and G(A) = (¬B -> ¬A). Then F(B) = (B -> B), and the theorem gives us |- (A -> B) -> (¬B -> ¬A). The theorem provides us with the algorithm for constructing the proof in detail if we wish. For example the proof step corresponding to (7) becomes |- (A ^ (A -> B)) v (B ^ ¬(A -> B))) -> B. It is easy to see that the number of steps in the final proof is equal to the number of steps in the proof of the original deduction and the special case, plus (primarily) a number of steps proportional to formula length used to derive (3), (4), and (10). The theorem can be extended to eliminate multiple hypotheses, as well as variants that keep some hypotheses while eliminating others. |
There are very few alternatives to the Standard Deduction Theorem in the literature. Historically, it is possible that once the Standard Deduction Theorem was discovered, mathematicians lost interest in alternate techniques, since they were no longer considered important. A mathematician can show, in one high-level application of the Standard Deduction Theorem, that a proof exists without displaying the actual proof. The Metamath proof verifier does not have that luxury (with the Hilbert-style axiom system of the set.mm database) unless we were to greatly complicate its language and proof verification engine. That would be contrary to the philosophical aim of Metamath, so efficient actual proofs are important to us. But techniques such as those in Principia Mathematica (published before the discovery of the Standard Deduction Theorem) have become a kind of lost art.
This page was last updated on 21-Sep-2022.
Copyright terms: Public domain |
W3C HTML validation [external] |