Description:
Here are some of the conventions we use in the Metamath Proof
Explorer (MPE, set.mm), and how they correspond to typical textbook
language (skipping the many cases where they are identical).
For more specific conventions, see:
Notation.
Where possible, the notation attempts to conform to modern
conventions, with variations due to our choice of the axiom system
or to make proofs shorter. However, our notation is strictly
sequential (left-to-right). For example, summation is written in the
form Σ𝑘 ∈ 𝐴𝐵 (df-sum 15326) which denotes that index
variable 𝑘 ranges over 𝐴 when evaluating 𝐵. Thus,
Σ𝑘 ∈ ℕ(1 / (2↑𝑘)) = 1 means 1/2 + 1/4 + 1/8 + ...
= 1 (geoihalfsum 15522).
The notation is usually explained in more detail when first introduced.
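As an informal numerical check of the summation example above (not a Metamath proof), the following Python sketch computes exact partial sums of 1 / (2↑𝑘) for 𝑘 ranging over the first few positive integers. The function name partial_sum is an assumption made for illustration only.

```python
from fractions import Fraction


def partial_sum(n: int) -> Fraction:
    """Exact value of 1/2 + 1/4 + ... + 1/2**n (k ranges over 1..n)."""
    return sum((Fraction(1, 2**k) for k in range(1, n + 1)), Fraction(0))


# The gap to 1 halves with every extra term, which is why the
# infinite sum over all positive integers k equals 1.
for n in (1, 2, 3, 10):
    print(n, partial_sum(n), 1 - partial_sum(n))
```

The remaining gap after n terms is 1 / 2↑n, so the partial sums approach 1 from below.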
Axiomatic assertions ($a).
All axiomatic assertions ($a statements)
starting with " ⊢ " have labels starting
with "ax-" (axioms) or "df-" (definitions). A statement with a
label starting with "ax-" corresponds to what is traditionally
called an axiom. A statement with a label starting with "df-"
introduces new symbols or a new relationship among symbols
that can be eliminated; they always extend the definition of
a wff or class. Metamath blindly treats $a statements as new
given facts but does not try to justify them. The mmj2 program
will justify the definitions as sound as discussed below,
except for four of them (df-bi 206, df-clab 2716, df-cleq 2730, df-clel 2817)
that require a more complex metalogical justification by hand.
Proven axioms.
In some cases we wish to treat an expression as an axiom in
later theorems, even though it can be proved. For example,
we derive the postulates or axioms of complex arithmetic as
theorems of ZFC set theory. For convenience, after deriving
the postulates, we reintroduce them as new axioms on
top of set theory. This lets us easily identify which axioms
are needed for a particular complex number proof, without the
obfuscation of the set theory used to derive them. For more, see
mmcomplex.html. When we wish
to use a previously-proven assertion as an axiom, our convention
is that we use the
regular "ax-NAME" label naming convention to define the axiom,
but we precede it with a proof of the same statement with the label
"axNAME" . An example is the complex arithmetic axiom ax-1cn 10860,
proven by the preceding theorem ax1cn 10836.
When the names match in this way, the Metamath program will warn if the
axiom does not match the preceding theorem that justifies it.
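The label prefixes described above can be sketched in Python as follows. This is only an illustration of the naming convention, not part of any Metamath tool; the function names classify_label and justifying_theorem are hypothetical.

```python
def classify_label(label: str) -> str:
    """Classify a label by the prefix convention for "⊢" $a statements."""
    if label.startswith("ax-"):
        return "axiom"
    if label.startswith("df-"):
        return "definition"
    # Theorems and syntax constructions (e.g., wi, cneg) land here.
    return "other"


def justifying_theorem(axiom_label: str) -> str:
    """Map a proven axiom "ax-NAME" to the label "axNAME" of its proof."""
    prefix = "ax-"
    if not axiom_label.startswith(prefix):
        raise ValueError(f"not an axiom label: {axiom_label!r}")
    return "ax" + axiom_label[len(prefix):]


print(classify_label("df-bi"), justifying_theorem("ax-1cn"))
```

Not every "ax-" statement has a justifying theorem; the mapping only applies to proven axioms such as those of complex arithmetic.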
Definitions (df-...).
We encourage definitions to include hypertext links to proven examples.
Statements with hypotheses.
Many theorems and some axioms, such as ax-mp 5, have hypotheses that
must be satisfied in order for the conclusion to hold, in this case min
and maj. When displayed in summarized form such as in the "Theorem
List" page (to get to it, click on "Nearby theorems" on the ax-mp 5
page), the hypotheses are connected with an ampersand and separated from
the conclusion with a double right arrow, such as in
" ⊢ 𝜑 & ⊢ (𝜑 → 𝜓) ⇒ ⊢ 𝜓". These symbols are not part of
the Metamath language but are just informal notation meaning "and" and
"implies".
Discouraged use and modification.
If something should only be used in limited ways, it is marked with
"(New usage is discouraged.)". This is used, for example, when something
can be constructed in more than one way, and we do not want later
theorems to depend on that specific construction.
This marking is also used if we want later proofs to use proven axioms.
For example, we want later proofs to
use ax-1cn 10860 (not ax1cn 10836) and ax-1ne0 10871 (not ax1ne0 10847), as these
are proven axioms for complex arithmetic. Thus, both
ax1cn 10836 and ax1ne0 10847 are marked as "(New usage is discouraged.)".
In some cases a proof should not normally be changed, e.g., when it
demonstrates some specific technique.
These are marked with "(Proof modification is discouraged.)".
New definitions infrequent.
Typically, we are minimalist when introducing new definitions; they are
introduced only when a clear advantage becomes apparent for reducing
the number of symbols, shortening proofs, etc. We generally avoid
the introduction of gratuitous definitions because each one requires
associated theorems and additional elimination steps in proofs.
For example, we use < and ≤ for inequality expressions, and
use ((sin‘(i · 𝐴)) / i) instead of (sinh‘𝐴)
for the hyperbolic sine.
Minimizing axiom dependencies.
We prefer proofs that depend on fewer and/or weaker axioms, even if
the proofs are longer. In particular, because of the non-constructive
nature of the axiom of choice df-ac 9803, we prefer proofs that do not use
it, or use weaker versions like countable choice ax-cc 10122 or dependent
choice ax-dc 10133. An example is our proof of the Schroeder-Bernstein
Theorem sbth 8833, which does not use the axiom of choice. Similarly,
any theorem in first-order logic (FOL) that contains only setvar
variables that are all mutually distinct, and has no wff variables, can
be proved without using ax-10 2139 through ax-13 2372, by using ax10w 2127
through ax13w 2134 instead.
We do not try to similarly reduce dependencies on definitions, since
definitions are conservative (they do not increase the proving power of
a deductive system), and are introduced in order to be used to increase
readability. An exception is made for Definitions df-clab 2716,
df-cleq 2730, and df-clel 2817, since they can be considered as axioms under
some definitions of what a definition is exactly (see their comments).
Alternate proofs (ALT).
If a different proof is shorter or clearer but uses more or stronger
axioms, we make that proof an "alternate" proof (marked with an ALT
label suffix), even if this alternate proof was formalized first.
We then make the proof that requires fewer axioms the main proof.
Alternate proofs can also occur in other cases when an alternate proof
gives some particular insight. Their comment should begin with
"Alternate proof of ~ xxx " followed by a description of the
specificity of that alternate proof. There can be multiple alternates.
Alternate (*ALT) theorems should have "(Proof modification is
discouraged.) (New usage is discouraged.)" in their comment and should
follow the main statement, so that people reading the text in order will
see the main statement first. The alternate and main statement comments
should use hyperlinks to refer to each other.
Alternate versions (ALTV).
The suffix ALTV is reserved for theorems (or definitions) which are
alternate versions, or variants, of an existing theorem. This is
reserved to statements in mathboxes and is typically used temporarily,
when it is not clear yet which variant to use. If it is decided that
both variants should be kept and moved to the main part of set.mm, then
a label for the variant should be found with a more explicit suffix
indicating how it is a variant (e.g., commutation of some subformula,
antecedent replaced with hypothesis, (un)curried variant, biconditional
instead of implication, etc.). There is no requirement to add
discouragement tags, but their comment should have a link to the main
version of the statement and describe how it is a variant of it.
Old (OLD) versions or proofs.
If a proof, definition, axiom, or theorem is going to be removed, we
often stage that change by first renaming its label with an OLD suffix
(to make it clear that it is going to be removed). Old (*OLD)
statements should have
"(Proof modification is discouraged.) (New usage is discouraged.)" and
"Obsolete version of ~ xxx as of dd-Mmm-yyyy." (not enclosed in
parentheses) in the comment. An old statement should follow the main
statement, so that people reading the text in order will see the main
statement first. This typically happens when a shorter proof of an
existing theorem is found: the existing theorem is kept as an *OLD
statement for one year. When a proof is shortened automatically (using
the Metamath program "MM-PA> MINIMIZE_WITH *" command), then it is not
necessary to keep the old proof, nor to add credit for the shortening.
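The comment requirements for *ALT and *OLD statements described above can be sketched as a small checker. This is a hypothetical helper written for illustration, not an existing Metamath command; the function name missing_markers and the sample labels are assumptions.

```python
PROOF_TAG = "(Proof modification is discouraged.)"
USAGE_TAG = "(New usage is discouraged.)"


def missing_markers(label: str, comment: str) -> list[str]:
    """List the conventional comment markers that a statement lacks."""
    # Comments are line-wrapped, so collapse whitespace before searching.
    text = " ".join(comment.split())
    is_alt = label.endswith("ALT")  # "ALTV" labels end in "V", so they are skipped
    is_old = label.endswith("OLD")
    missing = []
    if is_alt or is_old:
        for tag in (PROOF_TAG, USAGE_TAG):
            if tag not in text:
                missing.append(tag)
    if is_alt and not text.startswith("Alternate proof of ~ "):
        missing.append("Alternate proof of ~ xxx")
    if is_old and "Obsolete version of ~ " not in text:
        missing.append("Obsolete version of ~ xxx as of dd-Mmm-yyyy.")
    return missing


# "fooOLD" is a made-up label used only to exercise the checker.
print(missing_markers("fooOLD", "Obsolete version of ~ foo as of 1-Jan-2024."))
```

ALTV statements are deliberately left unchecked here, since the text above says discouragement tags are not required for them.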
Variables.
Propositional variables (variables for well-formed formulas or wffs) are
represented with lowercase Greek letters and are generally used
in this order:
𝜑 = phi, 𝜓 = psi, 𝜒 = chi, 𝜃 = theta,
𝜏 = tau, 𝜂 = eta, 𝜁 = zeta, and 𝜎 = sigma.
Individual setvar variables are represented with lowercase Latin letters
and are generally used in this order:
𝑥, 𝑦, 𝑧, 𝑤, 𝑣, 𝑢, and 𝑡.
Variables that represent classes are often represented by
uppercase Latin letters:
𝐴, 𝐵, 𝐶, 𝐷, 𝐸, and so on.
There are other symbols that also represent class variables and suggest
specific purposes, e.g., 0 for a zero element (e.g., fsuppcor 9093)
and connective symbols such as + for some group addition operation
(e.g., grprinvd ).
Class variables are selected in alphabetical order starting
from 𝐴 if there is no reason to do otherwise, but many
assertions select different class variables or a different order
to make their intended meaning clearer.
Turnstile.
"⊢ ", meaning "It is provable that", is the first token
of all assertions
and hypotheses that aren't syntax constructions. This is a standard
convention in logic. For us, it also prevents any ambiguity with
statements that are syntax constructions, such as "wff ¬ 𝜑".
Biconditional (↔).
There are basically two ways to maximize the effectiveness of
biconditionals (↔):
you can either have one-directional simplifications of all theorems
that produce biconditionals, or you can have one-directional
simplifications of theorems that consume biconditionals.
Some tools (like Lean) follow the first approach, but set.mm follows
the second approach. Practically, this means that in set.mm, for
every theorem that uses an implication in the hypothesis, like
ax-mp 5, there is a corresponding version with a biconditional or a
reversed biconditional, like mpbi 229 or mpbir 230. We prefer this
second approach because the number of duplications in the second
approach is bounded by the size of the propositional calculus section,
which is much smaller than the number of possible theorems in all later
sections that produce biconditionals. So although theorems like
biimpi 215 are available, in most cases there is already a theorem that
combines it with your theorem of choice, like mpbir2an 707, sylbir 234,
or 3imtr4i 291.
Quantifiers.
The quantifiers are named as follows:
- ∀: universal quantifier (wal 1537);
- ∃: existential quantifier (df-ex 1784);
- ∃*: at-most-one quantifier (df-mo 2540);
- ∃!: unique existential quantifier (df-eu 2569).
The phrase "uniqueness quantifier" is avoided since it is ambiguous:
it can be understood as claiming either uniqueness (∃*) or unique
existence (∃!).
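To make the distinction between the quantifiers concrete, here is a Python sketch over a finite domain. It is only an analogy (the quantifiers of set.mm range over all sets, not over a finite list), and all function names are assumptions chosen for illustration.

```python
from typing import Callable, Iterable


def count_witnesses(domain: Iterable[int], phi: Callable[[int], bool]) -> int:
    """Number of elements of the domain that satisfy phi."""
    return sum(1 for x in domain if phi(x))


def exists(domain, phi) -> bool:          # analogue of ∃
    return count_witnesses(domain, phi) >= 1


def at_most_one(domain, phi) -> bool:     # analogue of ∃*
    return count_witnesses(domain, phi) <= 1


def exists_unique(domain, phi) -> bool:   # analogue of ∃!
    return count_witnesses(domain, phi) == 1


domain = range(10)
for phi in (lambda x: x * x == 9, lambda x: x > 20, lambda x: x % 2 == 0):
    # Unique existence is existence together with at-most-one.
    assert exists_unique(domain, phi) == (exists(domain, phi) and at_most_one(domain, phi))
```

The predicate x > 20 has no witness in the domain, so it satisfies "at most one" but not "exists"; this is the case that makes the phrase "uniqueness quantifier" ambiguous.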
Substitution.
The expression "[𝑦 / 𝑥]𝜑" should be read "the formula that
results from the proper substitution of 𝑦 for 𝑥 in the formula
𝜑". See df-sb 2069 and the related df-sbc 3712 and df-csb 3829.
Is-a-set.
" 𝐴 ∈ V" should be read "Class 𝐴 is a set (i.e., exists)."
This is a convention based on Definition 2.9 of [Quine] p. 19.
See df-v 3424 and isset 3435.
However, instead of using 𝐼 ∈ V in the antecedent of a theorem for
some variable 𝐼, we now prefer to use 𝐼 ∈ 𝑉 (or another
variable if 𝑉 is not available) to make it more general. That way
we can often avoid extra uses of elex 3440 and syl 17 in the common case
where 𝐼 is already a member of something. For hypotheses
($e statement) of theorems (mostly in inference form), however,
⊢ 𝐴 ∈ V is used rather than ⊢ 𝐴 ∈ 𝑉 (e.g., difexi 5247).
This is because 𝐴 ∈ V is almost always satisfied using an
existence theorem stating " ... ∈ V", and a hard-coded V in
the $e statement saves a couple of syntax building steps that substitute
V into 𝑉. Notice that this does not hold for hypotheses of
theorems in deduction form: Here still ⊢ (𝜑 → 𝐴 ∈ 𝑉) should be
used rather than ⊢ (𝜑 → 𝐴 ∈ V).
Converse.
The symbol " ◡" denotes the converse of a relation, so
" ◡𝑅" denotes the converse of the class 𝑅, which is
typically a relation in that context (see df-cnv 5588). The converse of a
relation 𝑅 is sometimes denoted by 𝑅⁻¹ in textbooks,
especially when 𝑅 is a function, but we avoid this notation since it
is generally not a genuine inverse (see f1cocnv1 6729 and funcocnv2 6724 for
cases where it is a left or right-inverse). This can be used to define
a subset, e.g., df-tan 15709 notates "the set of values whose cosine is a
nonzero complex number" as (◡cos “ (ℂ ∖ {0})).
Function application.
The expression "(𝐹‘𝑥)" should be read "the value
of (function) 𝐹 at 𝑥" and has the same meaning as the more
familiar but ambiguous notation F(x). For example,
(cos‘0) = 1 (see cos0 15787). The left apostrophe notation
originated with Peano and was adopted in Definition *30.01 of
[WhiteheadRussell] p. 235, Definition 10.11 of [Quine] p. 68, and
Definition 6.11 of [TakeutiZaring] p. 26. See df-fv 6426.
In the ASCII (input) representation there are spaces around the grave
accent; there is a single accent when it is used directly,
and it is doubled within comments.
Infix and parentheses.
When a function that takes two classes and produces a class
is applied as part of an infix expression, the expression is always
surrounded by parentheses (see df-ov 7258).
For example, the + in (2 + 2); see 2p2e4 12038.
Function application is itself an example of this.
Similarly, predicate expressions
in infix form that take two or three wffs and produce a wff
are also always surrounded by parentheses, such as
(𝜑 → 𝜓), (𝜑 ∨ 𝜓), (𝜑 ∧ 𝜓), and
(𝜑 ↔ 𝜓)
(see wi 4, df-or 844, df-an 396, and df-bi 206 respectively).
In contrast, a binary relation (which compares two _classes_ and
produces a _wff_) applied in an infix expression is _not_
surrounded by parentheses.
This includes set membership 𝐴 ∈ 𝐵 (see wel 2109),
equality 𝐴 = 𝐵 (see df-cleq 2730),
subset 𝐴 ⊆ 𝐵 (see df-ss 3900), and
less-than 𝐴 < 𝐵 (see df-lt 10815). For the general definition
of a binary relation in the form 𝐴𝑅𝐵, see df-br 5071.
For example, 0 < 1 (see 0lt1 11427) does not use parentheses.
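The parenthesization rule above can be sketched as a tiny formatter. The symbol sets below contain only examples taken from the text and are not an exhaustive list; the function name infix is hypothetical.

```python
OPERATIONS = {"+", "·", "/"}          # two classes -> a class
CONNECTIVES = {"→", "∨", "∧", "↔"}    # wffs -> a wff
RELATIONS = {"∈", "=", "⊆", "<"}      # two classes -> a wff


def infix(left: str, symbol: str, right: str) -> str:
    """Render an infix expression with or without surrounding parentheses."""
    body = f"{left} {symbol} {right}"
    if symbol in OPERATIONS or symbol in CONNECTIVES:
        return f"({body})"   # operations and connectives are parenthesized
    if symbol in RELATIONS:
        return body          # binary relations are not parenthesized
    raise ValueError(f"unknown symbol: {symbol!r}")


print(infix(infix("2", "+", "2"), "=", "4"))
```

The printed string has the same shape as the statement of 2p2e4: the sum is parenthesized, the equality is not.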
Unary minus.
The symbol - is used to indicate a unary minus, e.g., -1.
It is specially defined because it is so commonly used.
See cneg 11136.
Function definition.
Functions are typically defined by first defining the constant symbol
(using $c) and declaring that its symbol is a class with the
label cNAME (e.g., ccos 15702).
The function is then defined in a statement labeled df-NAME; definitions
are typically given using the maps-to notation (e.g., df-cos 15708).
Typically, there are other proofs such as its
closure labeled NAMEcl (e.g., coscl 15764), its
function application form labeled NAMEval (e.g., cosval 15760),
and at least one simple value (e.g., cos0 15787).
Another way to define functions is to use recursion (for more details
about recursion see below). For an example of how to define functions
that aren't primitive recursive using recursion, see the Ackermann
function definition df-ack 45894 (which is based on the sequence builder
seq, see df-seq 13650).
Factorial.
The factorial function is traditionally a postfix operation,
but we treat it as a normal function applied in prefix form, e.g.,
(!‘4) = ;24 (df-fac 13916 and fac4 13923).
Unambiguous symbols.
A given symbol has a single unambiguous meaning in general.
Thus, where the literature might use the same symbol with different
meanings, here we use different (variant) symbols for different
meanings. These variant symbols often have suffixes, subscripts,
or underlines to distinguish them. For example, here
"0" always means the value zero (df-0 10809), while
"0g" is the group identity element (df-0g 17069),
"0." is the poset zero (df-p0 18058),
"0𝑝" is the zero polynomial (df-0p 24739),
"0vec" is the zero vector in a normed subcomplex vector space
(df-0v 28861), and
"0" (a separate variant symbol, typeset differently from the constant zero) is a class variable for use as a connective symbol
(this is used, for example, in p0val 18060).
There are other class variables used as connective symbols
where traditional notation would use ambiguous symbols, including
"1", "+", "∗", and "∥".
These symbols are very similar to traditional notation, but because
they are different symbols they eliminate ambiguity.
ASCII representation of symbols.
We must have an ASCII representation for each symbol.
We generally choose short sequences, ideally digraphs, and generally
choose sequences that vaguely resemble the mathematical symbol.
Here are some of the conventions we use when selecting an
ASCII representation.
We generally do not include parentheses inside a symbol because
that confuses text editors (such as emacs).
Greek letters for wff variables always use the first two letters
of their English names, making them easy to type and easy to remember.
Symbols that almost look like letters, such as ∀,
are often represented by that letter followed by a period.
For example, "A." is used to represent ∀,
"e." is used to represent ∈, and
"E." is used to represent ∃.
Single letters are now always variable names, so constants that are
often shown as single letters are now typically preceded with "_"
in their ASCII representation, for example,
"_i" is the ASCII representation for the imaginary unit i.
A script font constant is often the letter
preceded by "~" meaning "curly", such as "~P" to represent
the power class 𝒫.
Originally, all setvar and class variables used only single letters
a-z and A-Z, respectively. A big change in recent years was to
allow the use of certain symbols as variable names to make formulas
more readable, such as a variable representing an additive group
operation. The convention is to take the original constant token
(in this case "+" which means complex number addition) and put
a period in front of it to result in the ASCII representation of the
variable ".+", shown as +, that can
be used instead of say the letter "P" that had to be used before.
Choosing tokens for more advanced concepts that have no standard
symbols but are represented by words in books, is hard. A few are
reasonably obvious, like "Grp" for group and "Top" for topology,
but often they seem to end up being either too long or too
cryptic. It would be nice if the math community came up with
standardized short abbreviations for English math terminology,
like they have more or less done with symbols, but that probably
won't happen any time soon.
Another informal convention that we have somewhat followed, that is
also not uncommon in the literature, is to start tokens with a
capital letter for collection-like objects and lower case for
function-like objects. For example, we have the collections On
(ordinal numbers), Fin, Prime, Grp, and we have the functions sin,
tan, log, sup. Predicates like Ord and Lim also tend to start
with upper case, but in a sense they are really collection-like,
e.g., Lim indirectly represents the collection of limit ordinals,
but it cannot be an actual class since not all limit ordinals
are sets.
This initial upper versus lower case letter convention is sometimes
ambiguous. In the past there's been a debate about whether
domain and range are collection-like or function-like, thus whether
we should use Dom, Ran or dom, ran. Both are used in the literature.
In the end dom, ran won out for aesthetic reasons
(Norm Megill simply felt they looked nicer).
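Some of the ASCII token conventions above are mechanical enough to sketch in Python. The table below contains only tokens mentioned in the text; the function names greek_token and variable_token are assumptions for illustration.

```python
# English names of the Greek wff variables, in their usual order of use.
GREEK_WFF_NAMES = ["phi", "psi", "chi", "theta", "tau", "eta", "zeta", "sigma"]

# Symbols that almost look like letters: the letter followed by a period.
LETTER_LIKE = {"∀": "A.", "∈": "e.", "∃": "E."}


def greek_token(english_name: str) -> str:
    """ASCII token for a Greek wff variable: first two letters of its name."""
    return english_name[:2]


def variable_token(constant_token: str) -> str:
    """ASCII token for a variable derived from a constant: prefix a period."""
    return "." + constant_token


tokens = [greek_token(name) for name in GREEK_WFF_NAMES]
print(tokens, variable_token("+"), LETTER_LIKE["∀"])
```

The two-letter rule works because the eight names happen to have distinct two-letter prefixes.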
Typography conventions.
Class symbols for functions (e.g., abs, sin)
should usually not have leading or trailing blanks in their
HTML representation.
This is in contrast to class symbols for operations
(e.g., gcd, sadd, eval), which usually do
include leading and trailing blanks in their representation.
If a class symbol is used for a function as well as an operation
(according to Definition df-ov 7258, each operation value can be
written as function value of an ordered pair), the convention for its
primary usage should be used, e.g., (iEdg‘𝐺) versus
(𝑉iEdg𝐸) for the edges of a graph 𝐺 = 〈𝑉, 𝐸〉.
LaTeX definitions.
Each token has a "LaTeX definition" which is used by the Metamath
program to output tex files. When writing LaTeX definitions,
contributors should favor simplicity over perfection of the display, and
should only use core LaTeX symbols or symbols from standard packages; if
packages other than amssymb, amsmath, mathtools, mathrsfs, phonetic,
graphicx are needed, this should be discussed. A useful resource is
The Comprehensive LaTeX
Symbol List.
Number construction independence.
There are many ways to model complex numbers.
After deriving the complex number postulates we
reintroduce them as new axioms on top of set theory.
This lets us easily identify which axioms are needed
for a particular complex number proof, without the obfuscation
of the set theory used to derive them.
This also lets us be independent of the specific construction,
which we believe is valuable.
See mmcomplex.html for details.
Thus, for example, we don't allow the use of ∅ ∉ ℂ,
as handy as that would be, because that would be
construction-specific. We want proofs about ℂ to be independent
of whether or not ∅ ∈ ℂ.
Minimize hypotheses.
In most cases we try to minimize hypotheses, so that the statement is
more general and easier to use. There are exceptions. For example, we
intentionally add hypotheses if they help make proofs independent of a
particular construction (e.g., the construction of the complex numbers
ℂ). We also intentionally add hypotheses for many real and
complex number theorems to expressly state their domains even when they
are not needed. For example, we could show that
⊢ (𝐴 < 𝐵 → 𝐵 ≠ 𝐴) without any hypotheses, but we require that
theorems using this result prove that 𝐴 and 𝐵 are real numbers,
so that the statement we use is ltnei 11029. Here are the reasons as
discussed in https://groups.google.com/g/metamath/c/2AW7T3d2YiQ:
- Having the hypotheses immediately shows the intended domain of
applicability (is it ℝ, ℝ*, ω, or something else?),
without having to trace back to definitions.
- Having the hypotheses forces the intended use of the statement,
which generally is desirable.
- Many out-of-domain values are dependent on contingent details of
definitions, so hypothesis-free theorems would be non-portable and
"brittle".
- Only a few theorems can have their hypotheses removed in this
fashion, due to coincidences for our particular set-theoretical
definitions. The poor user (especially a novice learning, e.g., real
number arithmetic) is going to be confused not knowing when hypotheses
are needed and when they are not. For someone who has not traced back
the set-theoretical foundations of the definitions, it is seemingly
random and is not intuitive at all.
- Ultimately, this is a matter of consensus, and the consensus in
the group was in favor of keeping sometimes redundant hypotheses.
Natural numbers.
There are different definitions of "natural" numbers in the literature.
We use ℕ (df-nn 11904) for the set of positive integers starting
from 1, and ℕ₀ (df-n0 12164) for the set of nonnegative integers
starting at zero.
Decimal numbers.
Numbers larger than nine are often expressed in base 10 using the
decimal constructor df-dec 12367, e.g., ;;;4001 (see 4001prm 16774
for a proof that 4001 is prime).
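The prefix shape of the decimal notation can be sketched in Python. The sketch assumes that each ";" combines the two terms that follow it as ten times the first plus the second, which is our reading of the decimal constructor; see df-dec for the authoritative definition. The function names are hypothetical, and each character is treated as one symbol.

```python
def to_decimal_constructor(n: int) -> str:
    """Write a nonnegative integer with one leading ';' per extra digit."""
    if n < 0:
        raise ValueError("only nonnegative integers are handled")
    digits = str(n)
    return ";" * (len(digits) - 1) + digits


def evaluate(expr: str) -> int:
    """Evaluate the prefix notation: ';' A B stands for 10 * A + B."""
    symbols = iter(expr)

    def term() -> int:
        symbol = next(symbols)
        if symbol == ";":
            high = term()   # everything but the last digit
            low = term()    # the last digit
            return 10 * high + low
        return int(symbol)

    value = term()
    if next(symbols, None) is not None:
        raise ValueError("trailing symbols after a complete term")
    return value


print(to_decimal_constructor(4001), evaluate(";;;4001"))
```

Under this reading, ;;;4001 nests as ;(;(;40)0)1, so the leftmost digits are combined first.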
Theorem forms.
We will use the following descriptive terms to categorize theorems:
- A theorem is in "closed form" if it has no $e hypotheses
(e.g., unss 4114). The term "tautology" is also used, especially in
propositional calculus. This form was formerly called "theorem form"
or "closed theorem form".
- A theorem is in "deduction form" (or is a "deduction") if it
has zero or more $e hypotheses, and the hypotheses and the conclusion
are implications that share the same antecedent. More precisely, the
conclusion is an implication with a wff variable as the antecedent
(usually 𝜑), and every hypothesis ($e statement) is either:
- an implication with the same antecedent as the conclusion, or
- a definition. A definition can be for a class variable (this is a
class variable followed by =, e.g., the definition of 𝐷 in
lhop 25085) or a wff variable (this is a wff variable followed by
↔); class variable definitions are more common.
In practice, a proof of a theorem in deduction form will also contain
many steps that are implications where the antecedent is either that
wff variable (usually 𝜑) or is a conjunction (𝜑 ∧ ...)
including that wff variable (𝜑). E.g., a1d 25, unssd 4116.
Although they are not real deductions, theorems without $e hypotheses,
but in the form (𝜑 → ...), are also said to be in "deduction
form". Such theorems usually have a two step proof, applying a1i 11 to a
given theorem, and are used as convenience theorems to shorten many
proofs. E.g., eqidd 2739, which is used more than 1500 times.
- A theorem is in "inference form" (or is an "inference") if
it has one or more $e hypotheses, but is not in deduction form,
i.e., there is no common antecedent (e.g., unssi 4115).
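The three forms above can be sketched as a classifier over a deliberately simplified model of a theorem. The Statement class and the theorem_form function are assumptions made for illustration; a real statement is a formula, not a pair of flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    # Name of the wff variable if the statement is an implication whose
    # antecedent is a bare wff variable, e.g. "𝜑" for ⊢ (𝜑 → ...); else None.
    antecedent: Optional[str] = None
    # True for a hypothesis that merely defines a class or wff variable.
    is_definition: bool = False


def theorem_form(hypotheses: list[Statement], conclusion: Statement) -> str:
    """Classify a theorem as "closed", "deduction", or "inference"."""
    if not hypotheses:
        # Theorems of the shape (𝜑 → ...) without hypotheses are also
        # said to be in deduction form; this sketch reports them as closed.
        return "closed"
    shared = conclusion.antecedent
    if shared is not None and all(
        h.is_definition or h.antecedent == shared for h in hypotheses
    ):
        return "deduction"
    return "inference"


# Shapes modeled on unss, unssd, and unssi respectively.
print(theorem_form([], Statement()))
print(theorem_form([Statement("𝜑"), Statement("𝜑")], Statement("𝜑")))
print(theorem_form([Statement(), Statement()], Statement()))
```

A hypothesis that is a definition does not need the common antecedent, which is why it is tested separately.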
Any theorem whose conclusion is an implication has an associated
inference, whose hypotheses are the hypotheses of that theorem
together with the antecedent of its conclusion, and whose conclusion is
the consequent of that conclusion. When both theorems are in set.mm,
then the associated inference is often labeled by adding the suffix "i"
to the label of the original theorem (for instance, con3i 154 is the
inference associated with con3 153). The inference associated with a
theorem is easily derivable from that theorem by a simple use of
ax-mp 5. The other direction is the subject of the Deduction Theorem
discussed below. We may also use the term "associated inference" when
the above process is iterated. For instance, syl 17 is an
inference associated with imim1 83 because it is the inference
associated with imim1i 63 which is itself the inference
associated with imim1 83.
"Deduction form" is the preferred form for theorems because this form
allows us to easily use the theorem in places where (in traditional
textbook formalizations) the standard Deduction Theorem (see below)
would be used. We call this approach "deduction style".
In contrast, we usually avoid theorems in "inference form" when that
would end up requiring us to use the deduction theorem.
Deductions have a label suffix of "d", especially if there are other
forms of the same theorem (e.g., pm2.43d 53). The labels for inferences
usually have the suffix "i" (e.g., pm2.43i 52). The labels of theorems
in "closed form" would have no special suffix (e.g., pm2.43 56) or, if
the non-suffixed label is already used, then we add the suffix "t" (for
"theorem" or "tautology", e.g., ancomst 464 or nfimt 1899). When an
inference with an "is a set" hypothesis (e.g., 𝐴 ∈ V) is converted
to a theorem (in closed form) by replacing the hypothesis with an
antecedent of the form (𝐴 ∈ 𝑉 →, we sometimes suffix the closed
form with "g" (for "more general") as in uniex 7572 versus uniexg 7571. In
this case, the inference often has no suffix "i".
When submitting a new theorem, a revision of a theorem, or an upgrade
of a theorem from a Mathbox to the Main database, please use the
general form to be the default form of the theorem, without the suffix
"g" . For example, "brresg" lost its suffix "g" when it was revised for
some other reason, and now it is brres 5887. Its inference form which was
the original "brres", now is brresi 5889. The same holds for the suffix
"t".
Deduction theorem.
The Deduction Theorem is a metalogical theorem that provides an
algorithm for constructing a proof of a theorem from the proof of its
corresponding deduction (its associated inference). See for instance
Theorem 3 in [Margaris] p. 56. In ordinary mathematics, no one actually
carries out the algorithm, because (in its most basic form) it involves
an exponential explosion of the number of proof steps as more hypotheses
are eliminated. Instead, in ordinary mathematics the Deduction Theorem
is invoked simply to claim that something can be done in principle,
without actually doing it. For more details, see mmdeduction.html.
The Deduction Theorem is a metalogical theorem that cannot be applied
directly in Metamath, and the explosion of steps would be a problem
anyway, so alternatives are used. One alternative we use sometimes is
the "weak deduction theorem" dedth 4514, which works in certain cases in
set theory. We also sometimes use dedhb 3634. However, the primary
mechanism we use today for emulating the deduction theorem is to write
proofs in deduction form (aka "deduction style") as described earlier;
the prefixed 𝜑 → mimics the context in a deduction proof system.
In practice this mechanism works very well. This approach is described
in the deduction form and natural deduction page mmnatded.html; a
list of translations for common natural deduction rules is given in
natded 28668.
Recursion.
We define recursive functions using various "recursion constructors".
These allow us to define, with compact direct definitions, functions
that are usually defined in textbooks with indirect self-referencing
recursive definitions. This produces compact definitions and much
simpler proofs, and greatly reduces the risk of creating unsound
definitions. Examples of recursion constructors include
recs(𝐹) in df-recs 8173, rec(𝐹, 𝐼) in df-rdg 8212,
seqω(𝐹, 𝐼) in df-seqom 8249, and seq𝑀( + , 𝐹) in
df-seq 13650. These have characteristic function 𝐹 and initial value
𝐼. (Σg in df-gsum 17070 isn't really designed for arbitrary
recursion, but you could do it with the right magma.) The logically
primary one is df-recs 8173, but for the "average user" the most useful
one is probably df-seq 13650, provided that a countable sequence is
sufficient for the recursion.
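The idea behind a sequence builder such as seq𝑀( + , 𝐹) can be sketched in Python as a fold: the value at the starting index 𝑀 is 𝐹 at 𝑀, and each later value combines the previous value with the next value of 𝐹. This is an informal analogy, not a transcription of df-seq; the function name seq and the choice to raise an error below the starting index are assumptions of the sketch.

```python
import operator
from typing import Callable


def seq(start: int, op: Callable, f: Callable[[int], object]) -> Callable[[int], object]:
    """Return n -> f(start) op f(start + 1) op ... op f(n), for n >= start."""

    def value(n: int):
        if n < start:
            raise ValueError("index below the starting point")
        acc = f(start)
        for k in range(start + 1, n + 1):
            acc = op(acc, f(k))   # one recursion step
        return acc

    return value


def identity(k: int) -> int:
    return k


triangular = seq(1, operator.add, identity)   # 1 + 2 + ... + n
product = seq(1, operator.mul, identity)      # 1 · 2 · ... · n
print(triangular(4), product(4))
```

The caller never writes a self-referencing definition: the recursion lives entirely inside the constructor, which is the point made above about compact direct definitions.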
Extensible structures.
Mathematics includes many structures such as ring, group, poset, etc.
We define an "extensible structure" which is then used to define group,
ring, poset, etc. This allows theorems from more general structures
(groups) to be reused for more specialized structures (rings) without
having to reprove them. See df-struct 16776.
Undefined results and "junk theorems".
Some expressions are only expected to be meaningful in certain contexts.
For example, consider Russell's definite description binder iota,
where (℩𝑥𝜑) is meant to be "the 𝑥 such that 𝜑"
(where 𝜑 typically depends on 𝑥).
What should that expression produce when there is no such 𝑥?
In set.mm we primarily use one of two approaches.
One approach is to make the expression evaluate to the empty set
whenever the expression is being used outside of its expected context.
While not perfect, it makes it a bit more clear when something
is undefined, and it has the advantage that it makes more
things equal outside their domain which can remove hypotheses when
you feel like exploiting these so-called junk theorems.
Note that Quine does this with iota (his definition of iota
evaluates to the empty set when there is no unique value of 𝑥).
Quine has no problem with that and we don't see why we should,
so we define iota exactly the same way that Quine does.
The main place where you see this being systematically exploited is in
"reverse closure" theorems like 𝐴 ∈ (𝐹‘𝐵) → 𝐵 ∈ dom 𝐹,
which is useful when 𝐹 is a family of sets. (By this we
mean it is a set of sets even in a type-theoretic interpretation.)
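The first approach above, where an expression used outside its expected context evaluates to the empty set, can be sketched in Python over a finite domain. This is only an analogy for the behavior of iota; the function name and the use of frozenset() to stand for the empty set are assumptions of the sketch.

```python
from typing import Callable, Iterable

EMPTY_SET = frozenset()  # stands in for ∅


def iota(domain: Iterable[int], phi: Callable[[int], bool]):
    """Return the unique x in the domain satisfying phi, else the empty set.

    The domain is assumed to list each value once.
    """
    witnesses = [x for x in domain if phi(x)]
    if len(witnesses) == 1:
        return witnesses[0]
    return EMPTY_SET   # no witness, or more than one


print(iota(range(10), lambda x: x * x == 9))
print(iota(range(10), lambda x: x > 20))
```

Because the function is total, an expression such as iota(d, p) == iota(d, p) is always true, even when no unique witness exists.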
The second approach uses "(New usage is discouraged.)" to prevent
unintentional uses of certain properties.
For example, you could define some construct df-NAME whose
usage is discouraged, and prove only the specific properties
you wish to use (and add those proofs to the list of permitted uses
of "discouraged" information). From then on, you can only use
those specific properties without a warning.
Other approaches often have hidden problems.
For example, you could try to "not define undefined terms"
by creating definitions like ${ $d 𝑦𝑥 $. $d 𝑦𝜑 $.
df-iota $a ⊢ (∃!𝑥𝜑 → (℩𝑥𝜑) = ∪ {𝑥 ∣ 𝜑}) $. $}.
This will be rejected by the definition checker, but the bigger
theoretical reason to reject this axiom is that it breaks equality -
the metatheorem (𝑥 = 𝑦 → 𝑃(𝑥) = 𝑃(𝑦)) fails
to hold if definitions don't unfold without some assumptions.
(That is, iotabidv 6402 is no longer provable and must be added
as an axiom.) It is important for every syntax constructor to
satisfy equality theorems *unconditionally*, e.g., expressions
like (1 / 0) = (1 / 0) should not be rejected.
This is forced on us by the context free term
language, and anything else requires a lot more infrastructure
(e.g., a type checker) to support without making everything else
more painful to use.
Another approach would be to try to make nonsensical
statements syntactically invalid, but that can create its own
complexities; in some cases that would make parsing itself undecidable.
In practice this does not seem to be a serious issue.
No one does these things deliberately in "real" situations,
and some knowledgeable people (such as Mario Carneiro)
have never seen this happen accidentally.
Norman Megill doesn't agree that these "junk" consequences are
necessarily bad anyway, and they can significantly shorten proofs
in some cases. This database would be much larger if, for example,
we had to condition fvex 6769 on the argument being in the domain
of the function. It is impossible to derive a contradiction
from sound definitions (i.e. that pass the definition check),
assuming ZFC is consistent, and he doesn't see the point of all the
extra busy work and huge increase in set.mm size that would result
from restricting *all* definitions.
So instead of implementing a complex system to counter a
problem that does not appear to occur in practice, we use
a significantly simpler set of approaches.
Organizing proofs.
Humans have trouble understanding long proofs. It is often preferable
to break longer proofs into smaller parts (just as with traditional
proofs). In Metamath this is done by creating separate proofs of the
separate parts.
A proof with the sole purpose of supporting a final proof is a lemma;
the naming convention for a lemma is the final proof label followed by
"lem", and a number if there is more than one. E.g., sbthlem1 8823 is the
first lemma for sbth 8833. The comment should begin with "Lemma for",
followed by the final proof label, so that it can be suppressed in
theorem lists (see the Metamath program "MM> WRITE THEOREM_LIST"
command).
Also, consider proving reusable results separately, so that others will
be able to easily reuse that part of your work.
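The lemma naming and comment conventions above can be sketched as two small helpers. This is only an illustration: the function names are hypothetical, and the comment check assumes the label may be preceded by a space-separated tilde and followed by punctuation.

```python
from typing import Optional

def lemma_label(final_label: str, index: Optional[int] = None) -> str:
    """Build a lemma label: final proof label, then "lem", then an optional number."""
    suffix = "" if index is None else str(index)
    return f"{final_label}lem{suffix}"

def lemma_comment_ok(comment: str, final_label: str) -> bool:
    """Check that a comment begins with "Lemma for" followed by the final proof label.

    Assumption: a space-separated tilde before the label is skipped, and
    trailing punctuation on the label token is ignored.
    """
    tokens = [t for t in comment.split() if t != "~"]
    return (
        len(tokens) >= 3
        and tokens[:2] == ["Lemma", "for"]
        and tokens[2].rstrip(".,;") == final_label
    )
```

For example, `lemma_label("sbth", 1)` gives the label `sbthlem1` mentioned above.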
Limit proof size.
It is often preferable to break longer proofs into
smaller parts, just as you would do with traditional proofs.
One reason is that humans have trouble understanding long proofs.
Another reason is that it's generally best to prove
reusable results separately,
so that others will be able to easily reuse them.
Finally, the Metamath program "MM-PA> MINIMIZE_WITH *" command can take
much longer with very long proofs.
We encourage proofs to be no more than 200 essential steps, and
generally no more than 500 essential steps,
though these are simply guidelines and not hard-and-fast rules.
Much smaller proofs are fine!
We also acknowledge that some proofs, especially autogenerated ones,
should sometimes not be broken up (e.g., because
breaking them up might be useless and inefficient due to many
interconnections and reused terms within the proof).
In Metamath, breaking up longer proofs is done by creating multiple
separate proofs of separate parts.
A proof with the sole purpose of supporting a final proof is a
lemma; the naming convention for a lemma is the final proof's name
followed by "lem", and a number if there is more than one. E.g.,
sbthlem1 8823 is the first lemma for sbth 8833.
Proof stubs.
It's sometimes useful to record partial proof results, e.g.,
incomplete proofs or proofs that depend on something else not fully
proven.
Some systems (like Lean) support a "sorry" axiom, which lets you assert
that anything is true. In Metamath such an axiom can quickly run into
trouble, because the Metamath tooling is smart and may end up using it
to prove everything.
If you want to create a proof based on some other claim, without
proving that claim, you can choose to define the claim as an axiom.
If you temporarily define a claim as an axiom, we encourage you to
include "Temporarily provided as axiom" in its comment.
Such incomplete work will generally only be accepted in a mathbox
until the rest of the work is complete.
When you're working on your personal copy of the database
you can use "?" in proofs to indicate an unknown step.
However, since proofs with "?" will (obviously) fail
verification, we don't accept proofs with unknown steps in
the public database.
Hypertext links.
We strongly encourage comments to have many links to related material,
with accompanying text that explains the relationship. These can help
readers understand the context. Links to other statements, or to
HTTP/HTTPS URLs, can be inserted in ASCII source text by prepending a
space-separated tilde (e.g., " ~ df-prm " results in " df-prm 16305").
When the Metamath program is used to generate HTML, it automatically
inserts hypertext links for syntax used (e.g., every symbol used), every
axiom and definition depended on, the justification for each step in a
proof, and to both the next and previous assertions.
Hypertext links to section headers.
Some section headers have text under them that describes or explains the
section. However, they are not part of the description of axioms or
theorems, and there is no way to link to them directly. To provide for
this, section headers with accompanying text (indicated with "*"
prefixed to mmtheorems.html#mmdtoc entries) have an anchor in
mmtheorems.html whose name is the first $a or $p statement that
follows the header. For example there is a glossary under the section
heading called GRAPH THEORY. The first $a or $p statement that follows
is cedgf 27259. To reference it we link to the anchor using a
space-separated tilde followed by the space-separated link
mmtheorems.html#cedgf, which will become the hyperlink
mmtheorems.html#cedgf. Note that no theorem in set.mm is allowed to
begin with "mm" (this is enforced by the Metamath program "MM> VERIFY
MARKUP" command). Whenever the program sees a tilde reference beginning
with "http:", "https:", or "mm", the reference is assumed to be a link
to something other than a statement label, and the tilde reference is
used as is. This can also be useful for relative links to other pages
such as mmcomplex.html.
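The rule for telling statement labels apart from other links can be sketched as follows. This is a simplified illustration, not the Metamath program's actual implementation; the function names are hypothetical, and the tokenizer assumes every reference is introduced by a space-separated tilde.

```python
def tilde_references(comment: str) -> list:
    """Return the tokens that directly follow a space-separated tilde."""
    tokens = comment.split()
    return [tokens[i + 1] for i, t in enumerate(tokens[:-1]) if t == "~"]

def classify_tilde_reference(ref: str) -> str:
    """Classify a reference as an external "link" or a statement "label".

    References beginning with "http:", "https:", or "mm" are treated as
    links and used as is; anything else is taken to be a statement label.
    """
    if ref.startswith(("http:", "https:", "mm")):
        return "link"
    return "label"
```

This also shows why no statement label may begin with "mm": such a label would be classified as a link.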
Bibliography references.
Please include a bibliographic reference to any external material used.
A name in square brackets in a comment indicates a
bibliographic reference. The full reference must be of the form
KEYWORD IDENTIFIER? NOISEWORD(S)* [AUTHOR(S)] p. NUMBER -
note that this is a very specific form that requires a page number.
There should be no comma between the author reference and the
"p." (a constant indicator).
Whitespace, comma, period, or semicolon should follow NUMBER.
An example is Theorem 3.1 of [Monk1] p. 22.
The KEYWORD, which is not case-sensitive,
must be one of the following: Axiom, Chapter, Compare, Condition,
Corollary, Definition, Equation, Example, Exercise, Figure, Item,
Lemma, Lemmas, Line, Lines, Notation, Part, Postulate, Problem,
Property, Proposition, Remark, Rule, Scheme, Section, or Theorem.
The IDENTIFIER is optional, as in for example
"Remark in [Monk1] p. 22".
The NOISEWORD(S) are zero or more from the list: from, in, of, on.
The AUTHOR(S) must be present in the file identified with the
htmlbibliography assignment (e.g., mmset.html) as a named anchor
(NAME=). If there is more than one document by the same author(s),
add a numeric suffix (as shown here).
The NUMBER is a page number, and may be any alphanumeric string such as
an integer or Roman numeral.
Note that we _require_ page numbers in comments for individual
$a or $p statements. We allow names in square brackets without
page numbers (a reference to an entire document) in
heading comments.
If this is a new reference, please also add it to the
"Bibliography" section of mmset.html.
(The file mmbiblio.html is automatically rebuilt, e.g.,
using the Metamath program "MM> WRITE BIBLIOGRAPHY" command.)
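The required reference form can be approximated with a regular expression. The sketch below is an assumption-laden illustration and not the parser used by the Metamath program: the names `BIB_REF` and `parse_bib_ref` are hypothetical, and it does not check that the author key exists in the bibliography file.

```python
import re

KEYWORDS = [
    "Axiom", "Chapter", "Compare", "Condition", "Corollary", "Definition",
    "Equation", "Example", "Exercise", "Figure", "Item", "Lemma", "Lemmas",
    "Line", "Lines", "Notation", "Part", "Postulate", "Problem", "Property",
    "Proposition", "Remark", "Rule", "Scheme", "Section", "Theorem",
]
NOISEWORDS = ["from", "in", "of", "on"]
_NOISE = "|".join(NOISEWORDS)

# KEYWORD IDENTIFIER? NOISEWORD(S)* [AUTHOR(S)] p. NUMBER
BIB_REF = re.compile(
    r"\b(?P<keyword>" + "|".join(KEYWORDS) + r")"
    # Optional identifier: one token that is not itself a noiseword.
    r"(?:\s+(?P<identifier>(?!(?:" + _NOISE + r")\b)[^\s\[\]]+))?"
    # Zero or more noisewords.
    r"(?P<noise>(?:\s+(?:" + _NOISE + r")\b)*)"
    # Author key in square brackets, with no comma before "p.".
    r"\s+\[(?P<author>[^\]\s]+)\]"
    r"\s+p\.\s+(?P<page>[A-Za-z0-9]+)"
    # NUMBER must be followed by whitespace, comma, period, semicolon, or end.
    r"(?=[\s,.;]|$)",
    re.IGNORECASE,  # the KEYWORD is not case-sensitive
)

def parse_bib_ref(text: str):
    """Return the parts of the first well-formed reference in text, or None."""
    m = BIB_REF.search(text)
    if m is None:
        return None
    return {
        "keyword": m.group("keyword"),
        "identifier": m.group("identifier"),
        "noisewords": m.group("noise").split(),
        "author": m.group("author"),
        "page": m.group("page"),
    }
```

A reference without a page number, or with a comma between the bracketed author and "p.", is not matched by this pattern.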
Acceptable shorter proofs.
Shorter proofs are welcome, and any shorter proof we accept
will be acknowledged in the theorem description. However,
in some cases a proof may be "shorter" or not depending on
how it is formatted. This section provides general guidelines.
Usually we automatically accept shorter proofs that (1)
shorten the set.mm file (with compressed proofs), (2) reduce
the size of the HTML file generated with SHOW STATEMENT xx
/ HTML, (3) use only existing, unmodified theorems in the
database (the order of theorems may be changed, though), and
(4) use no additional axioms.
Usually we will also automatically accept a _new_ theorem
that is used to shorten multiple proofs, if the total size
of set.mm (including the comment of the new theorem, not
including the acknowledgment) decreases as a result.
In borderline cases, we typically place more importance on
the number of compressed proof steps and less on the length
of the label section (since the names are in principle
arbitrary). If two proofs have the same number of compressed
proof steps, we will typically give preference to the one
with the smaller number of different labels, or if these
numbers are the same, to the proof with the fewest
characters once label lengths are included (a difference that
arises only by chance).
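The borderline-case preference just described amounts to a lexicographic comparison. The sketch below is only a model of that typical ordering, not a policy tool; `ProofStats` and `preferred` are hypothetical names, and the three counts are assumed to have been measured elsewhere.

```python
from typing import NamedTuple

class ProofStats(NamedTuple):
    """Measurements of one candidate proof, in order of decreasing importance."""
    compressed_steps: int   # number of compressed proof steps
    distinct_labels: int    # number of different labels used
    characters: int         # total characters, label lengths included

def preferred(a: ProofStats, b: ProofStats) -> ProofStats:
    """Return the candidate that would typically be preferred.

    Named tuples compare field by field, so fewer steps wins first, then
    fewer distinct labels, then fewer characters. On a full tie, a is kept.
    """
    return min(a, b)
```

Since the text says "typically", a real decision may still weigh other factors that this ordering ignores.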
A few theorems have a longer proof than necessary in order
to avoid the use of certain axioms, for pedagogical purposes,
and for other reasons. These theorems will (or should) have
a "(Proof modification is discouraged.)" tag in their
description. For example, idALT 23 shows a proof directly from
axioms. Shorter proofs for such cases won't be accepted,
of course, unless the criteria described continue to be
satisfied.
Information on syntax, axioms, and definitions.
For a hyperlinked list of syntax, axioms, and definitions, see
mmdefinitions.html.
If you have questions about a specific symbol or axiom, it is best
to go directly to its definition to learn more about it.
The generated HTML for each theorem and axiom includes hypertext
links to each symbol's definition.
Reserved symbols: 'LETTER.
Some symbols are reserved for potential future use.
Symbols with the pattern 'LETTER are reserved for possibly
representing characters (this is somewhat similar to Lisp).
We would expect '\n to represent newline, 'sp for space, and perhaps
'\x24 for the dollar character.
The challenge of varying mathematical conventions
We try to follow mathematical conventions, but in many cases
different texts use different conventions.
In those cases we pick some reasonably common convention and stick to
it.
We have already mentioned that the term "natural number" has
varying definitions (some start from 0, others start from 1), but
that is not the only such case.
A useful example is the set of metavariables used to represent
arbitrary well-formed formulas (wffs).
We use an open phi, φ, to represent the first arbitrary wff in an
assertion with one or more wffs; this is a common convention and
this symbol is easily distinguished from the empty set symbol.
That said, it is impossible to please everyone or simply "follow
the literature" because there are many different conventions for
a variable that represents any arbitrary wff.
To demonstrate the point,
here are some conventions for variables that represent an arbitrary
wff and some texts that use each convention:
- open phi φ (and so on): Tarski's papers,
Rasiowa & Sikorski's
The Mathematics of Metamathematics (1963),
Monk's Introduction to Set Theory (1969),
Enderton's Elements of Set Theory (1977),
Bell & Machover's A Course in Mathematical Logic (1977),
Jech's Set Theory (1978),
Takeuti & Zaring's
Introduction to Axiomatic Set Theory (1982).
- closed phi ϕ (and so on):
Levy's Basic Set Theory (1979),
Kunen's Set Theory (1980),
Paulson's Isabelle: A Generic Theorem Prover (1994),
Huth and Ryan's Logic in Computer Science (2004/2006).
- Greek α, β, γ:
Duffy's Principles of Automated Theorem Proving (1991).
- Roman A, B, C:
Kleene's Introduction to Metamathematics (1974),
Smullyan's First-Order Logic (1968/1995).
- script A, B, C:
Hamilton's Logic for Mathematicians (1988).
- italic A, B, C:
Mendelson's Introduction to Mathematical Logic (1997).
- italic P, Q, R:
Suppes's Axiomatic Set Theory (1972),
Gries and Schneider's A Logical Approach to Discrete Math
(1993/1994),
Rosser's Logic for Mathematicians (2008).
- italic p, q, r:
Quine's Set Theory and Its Logic (1969),
Kuratowski & Mostowski's Set Theory (1976).
- italic X, Y, Z:
Dijkstra and Scholten's
Predicate Calculus and Program Semantics (1990).
- Fraktur letters:
Fraenkel et al.'s Foundations of Set Theory (1973).
Distinctness or freeness
Here are some conventions that address distinctness or freeness of a
variable:
- Ⅎ𝑥𝜑 is read " 𝑥 is not free in (wff) 𝜑";
see df-nf 1788 (whose description has some important technical
details). Similarly, Ⅎ𝑥𝐴 is read "𝑥 is not free in (class)
𝐴"; see df-nfc 2888.
- "$d 𝑥𝑦 $." should be read "Assume 𝑥 and 𝑦 are distinct
variables."
- "$d 𝜑𝑥 $." should be read "Assume 𝑥 does not occur in
𝜑." Sometimes a theorem is proved using Ⅎ𝑥𝜑 (df-nf 1788)
in place of "$d 𝜑𝑥 $." when a more general result is desired;
ax-5 1914 can be used to derive the $d version. For an example of
how to get from the $d version back to the $e version, see the
proof of euf 2576 from eu6 2574.
- "$d 𝐴𝑥 $." should be read "Assume 𝑥 is not a variable
occurring in class 𝐴."
- "$d 𝐴𝑥 $. $d 𝜓𝑥 $.
$e |- (𝑥 = 𝐴 → (𝜑 ↔ 𝜓)) $." is an idiom often used instead
of explicit substitution, meaning "Assume 𝜓 results from the
proper substitution of 𝐴 for 𝑥 in 𝜑." Therefore, we often
use the term "implicit substitution" for such a hypothesis.
- Class and wff variables should appear at the beginning of distinct
variable conditions, and setvars should be in alphabetical order.
E.g., "$d 𝑍𝑥𝑦 $.", "$d 𝜓𝑎𝑥 $.". This convention should
be applied for new theorems (formerly, the class and wff variables
mostly appear at the end) and will be assured by a formatter in the
future.
- " ⊢ (¬ ∀𝑥𝑥 = 𝑦 → ...)" occurs early in some cases, and
should be read "If 𝑥 and 𝑦 are distinct
variables, then..." This antecedent provides us with a technical
device (called a "distinctor" in Section 7 of [Megill] p. 444)
to avoid the need for the
$d statement early in our development of predicate calculus, permitting
unrestricted substitutions as conceptually simple as those in
propositional calculus. However, the $d eventually becomes a
requirement, and after that this device is rarely used.
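The ordering convention for distinct variable conditions listed above (class and wff variables first, then setvars in alphabetical order) can be sketched as a small formatter. This is not the formatter the text anticipates; `order_dv` and `format_dv` are hypothetical names, the caller is assumed to know which variables are setvars, and keeping class and wff variables in their given order is a choice made here because the convention does not say how to order them.

```python
def order_dv(variables, setvars):
    """Order the variables of one $d statement.

    variables: variable names as they appear in the statement.
    setvars:   the names known to be setvar variables (supplied by the caller).
    Class and wff variables come first, in their given order; setvars follow
    in alphabetical order.
    """
    others = [v for v in variables if v not in setvars]
    ordered_setvars = sorted(v for v in variables if v in setvars)
    return others + ordered_setvars

def format_dv(variables, setvars):
    """Render the reordered variables as a $d statement in ASCII source form."""
    return "$d " + " ".join(order_dv(variables, setvars)) + " $."
```

With setvars `{"a", "x", "y"}`, the inputs `["y", "Z", "x"]` and `["x", "ps", "a"]` are rendered in the same order as the two examples given in the convention.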
There is a general technique to replace a $d x A or
$d x ph condition in a theorem with the corresponding
Ⅎ𝑥𝐴 or Ⅎ𝑥𝜑. Suppose you have a theorem
⊢ T[x, A] where $d 𝑥𝐴,
and you wish to prove ⊢ Ⅎ𝑥𝐴 ⇒ ⊢ T[x, A].
You apply the theorem substituting 𝑦 for 𝑥 and 𝐴 for 𝐴,
where 𝑦 is a new dummy variable, so that
$d y A is satisfied.
You obtain ⊢ T[y, A], and apply chvar to obtain ⊢
T[x, A] (or just use mpbir 230 if T[x, A] binds 𝑥).
The side goal is ⊢ (𝑥 = 𝑦 → ( T[y, A] ↔ T[x, A] )),
where you can use equality theorems, except
that when you get to a bound variable you use a non-dv bound variable
renamer theorem like cbval 2398. The section
mmtheorems32.html#mm3146s also describes the
metatheorem that underlies this.
Additional rules for definitions
Standard Metamath verifiers do not distinguish between axioms and
definitions (both are $a statements).
In practice, we require that definitions (1) be conservative
(a definition should not allow an expression
that previously qualified as a wff but was not provable
to become provable) and (2) be eliminable
(there should exist an algorithmic method for converting any
expression using the definition into
a logically equivalent expression that previously qualified as a wff).
To ensure this, we have additional rules on almost all definitions
($a statements with a label that does not begin with ax-).
These additional rules are not applied in a few cases where they
are too strict (df-bi 206, df-clab 2716, df-cleq 2730, and df-clel 2817);
see those definitions for more information.
These additional rules for definitions are checked by at least
mmj2's definition check (see
mmj2 master file mmj2jar/macros/definitionCheck.js).
This definition check relies on the database being very much like
set.mm, down to the names of certain constants and types, so it
cannot apply to all Metamath databases... but it is useful in set.mm.
In this definition check, a $a-statement with a given label and
typecode ⊢ passes the test if and only if it
respects the following rules (these rules require that we have
an unambiguous tree parse, which is checked separately):
1. The expression must be a biconditional or an equality (i.e. its
root-symbol must be ↔ or =).
If the proposed definition passes this first rule, we then
define its definiendum as its left hand side (LHS) and
its definiens as its right hand side (RHS).
We define the *defined symbol* as the root-symbol of the LHS.
We define a *dummy variable* as a variable occurring
in the RHS but not in the LHS.
Note that the "root-symbol" is the root of the considered tree;
it need not correspond to a single token in the database
(e.g., see w3o 1084 or wsb 2068).
2. The defined expression must not appear in any statement
between its syntax axiom ($a wff ) and its definition,
and the defined expression must not be used in its definiens.
See df-3an 1087 for an example where the same symbol is used in
different ways (this is allowed).
3. No two variables occurring in the LHS may share a
disjoint variable (DV) condition.
4. All dummy variables are required to be disjoint from any
other (dummy or not) variable occurring in this labeled expression.
5. Either
(a) there must be no non-setvar dummy variables, or
(b) there must be a justification theorem.
The justification theorem must be of form
⊢ ( definiens root-symbol definiens' )
where definiens' is definiens but the dummy variables are all
replaced with other unused dummy variables of the same type.
Note that root-symbol is ↔ or =, and that setvar
variables are simply variables with the setvar typecode.
6. One of the following must be true:
(a) there must be no setvar dummy variables,
(b) there must be a justification theorem as described in rule 5, or
(c) if there are setvar dummy variables, every one must not be free.
That is, it must be true that
(𝜑 → ∀𝑥𝜑) for each setvar dummy variable 𝑥
where 𝜑 is the definiens.
We use two different tests for nonfreeness; one must succeed
for each setvar dummy variable 𝑥.
The first test requires that the setvar dummy variable 𝑥
be syntactically bound
(this is sometimes called the "fast" test, and this implies
that we must track binding operators).
The second test requires a successful
search for the directly-stated proof of (𝜑 → ∀𝑥𝜑).
Part c of this rule is how most setvar dummy variables
are handled.
Rule 3 may seem unnecessary, but it is needed.
Without this rule, you can define something like
cbar $a wff Foo x y $.
${ $d x y $. df-foo $a |- ( Foo x y <-> x = y ) $. $}
and now "Foo x x" is not eliminable;
there is no way to prove that it means anything in particular,
because the definitional theorem that is supposed to be
responsible for connecting it to the original language wants
nothing to do with this expression, even though it is well formed.
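The two disjoint-variable rules (Rule 3, and the rule that every dummy variable must be disjoint from all other variables) can be sketched over plain sets of variable names. This is a simplified model and not mmj2's definitionCheck.js: the function names are hypothetical, and the variables of each side and the DV pairs are assumed to have been extracted already.

```python
from itertools import combinations

def dummy_variables(lhs_vars, rhs_vars):
    """Dummy variables are those occurring in the RHS but not in the LHS."""
    return set(rhs_vars) - set(lhs_vars)

def check_dv_rules(lhs_vars, rhs_vars, dv_pairs):
    """Return (rule3_ok, rule4_ok) for one proposed definition.

    rule3_ok: no two LHS variables share a DV condition.
    rule4_ok: every dummy variable is disjoint from every other variable.
    dv_pairs: iterable of 2-tuples of variable names under a $d condition.
    """
    lhs, rhs = set(lhs_vars), set(rhs_vars)
    dv = {frozenset(pair) for pair in dv_pairs}
    dummies = dummy_variables(lhs, rhs)
    rule3_ok = all(
        frozenset((a, b)) not in dv for a, b in combinations(sorted(lhs), 2)
    )
    rule4_ok = all(
        frozenset((d, v)) in dv for d in dummies for v in (lhs | rhs) - {d}
    )
    return rule3_ok, rule4_ok
```

For the Foo example above, `check_dv_rules({"x", "y"}, {"x", "y"}, [("x", "y")])` reports that Rule 3 fails, which is exactly why "Foo x x" would not be eliminable.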
A justification theorem for a definition (if used this way)
must be proven before the definition that depends on it.
One example of a justification theorem is vjust 3423.
Definition df-v 3424 ⊢ V = {𝑥 ∣ 𝑥 = 𝑥} is justified
by the justification theorem vjust 3423
⊢ {𝑥 ∣ 𝑥 = 𝑥} = {𝑦 ∣ 𝑦 = 𝑦}.
Another example of a justification theorem is trujust 1541;
Definition df-tru 1542 ⊢ (⊤ ↔ (∀𝑥𝑥 = 𝑥 → ∀𝑥𝑥 = 𝑥))
is justified by trujust 1541 ⊢ ((∀𝑥𝑥 = 𝑥 → ∀𝑥𝑥 = 𝑥) ↔ (∀𝑦𝑦 = 𝑦 → ∀𝑦𝑦 = 𝑦)).
Here is more information about our processes for checking and
contributing to this work:
Multiple verifiers.
This entire file is verified by multiple independently-implemented
verifiers when it is checked in, giving us extremely high
confidence that all proofs follow from the assumptions.
The checkers also check for various other problems such as
overly long lines.
Discouraged information.
A separate file named "discouraged" lists all
discouraged statements and uses of them, and this file is checked.
If you change the use of discouraged things, you will need to change
this file.
This makes it obvious when there is a change to anything discouraged
(triggering further review).
LRParser check.
Metamath verifiers ensure that $p statements follow from previous
$a and $p statements.
However, by itself the Metamath language permits certain kinds of
syntactic ambiguity that we choose to avoid in this database.
Thus, we require that this database unambiguously parse
using the "LRParser" check (implemented by at least mmj2).
(For details, see mmj2 master file src/mmj/verify/LRParser.java).
This check
counters, for example, a devious ambiguous construct
developed by saueran at oregonstate dot edu
and posted on Mon, 11 Feb 2019 17:32:32 -0800 (PST),
based on creating definitions with mismatched parentheses.
Proposing specific changes.
Please propose specific changes as pull requests (PRs) against the
"develop" branch of set.mm, at:
https://github.com/metamath/set.mm/tree/develop.
Community.
We encourage anyone interested in Metamath to join our mailing list:
https://groups.google.com/g/metamath.
(Contributed by the Metamath team, 27-Dec-2016.) Date of last revision.
(Revised by the Metamath team, 22-Sep-2022.)
(Proof modification is discouraged.) (New usage is
discouraged.)