Proof Assistant – Unification and Metamath Database Lookup Strategy – An Outline.

A Few Preliminaries

My adventures with parsing showed me that the "naive" approach to a subject that is complicated and already well researched is unlikely to yield satisfactory results. However, Metamath is logically "agnostic", and conventional approaches such as rewriting expressions into "normal" forms may be unsuitable. The objective here is to add to mmj2, creating facilities that can be accessed in server-mode, and which are not dependent upon the logic or grammar of the input Metamath database – except to say that the grammar must be verified by mmj2 and that every statement is parseable. Then…after I fail two or three times I will know enough to access the literature intelligently :)

Numero Dos: By "proof assistant" I am referring to a utility that fills in missing statement label "Ref" values, as on this proof page:

http://us.metamath.org/mpegif/isset.html

Or, in the format I use for in-stream .mm proofs written as comments:

  $( ======================================================== $)
  $( *** Deduction combining antecedents and consequents. *** $)
  ${
    syl34d.1 $e |- ( ph -> ( ps -> ch ) ) $.
    syl34d.2 $e |- ( ph -> ( th -> ta ) ) $.
    syl34d   $p |- ( ph -> ( ( ch -> th ) -> ( ps -> ta ) ) )
          $=
      wph wch wth wi wch wta wi wps wta wi wph wth wta wch 
      syl34d.2 syl3d wph wps wch wta syl34d.1 syl4d syld
          $.
    $( <MM> <PROOF>
       #/HYP REF      EXPRESSION
       ===== ======== ==========================================
       1:    syl34d.1 |- ( ph -> ( ps -> ch ) )               $.
       2:    syl34d.2 |- ( ph -> ( th -> ta ) )               $.
       3:2   syl3d    |- ( ph -> 
                           ( ( ch -> th ) -> ( ch -> ta ) )   $.
       4:1   syl4d    |- ( ph -> 
                           ( ( ch -> ta ) -> ( ps -> ta ) )   $.
       5:3,4 syld     |- ( ph -> 
                           ( ( ch -> th ) -> ( ps -> ta ) ) ) $.
       QED
       </PROOF> </MM>
    $)
  $}

The user will be required to complete a table or rows containing the Step, Hyp and Expression columns, then the Proof Assistant will complete the Ref column. To speed things up, we might ask the user to put an "H" in the Hyp column if the Expression is one of the theorem's hypotheses -- or incorporate the theorem's hypotheses automatically as steps 1, 2, etc.

While this may not seem ambitious, the difficulty level of an industrial strength solution is higher than might be imagined. Right now set.mm has more than 6000 theorems and axioms. A key challenge lies in efficient search because the search must be repeated for each step of every proof! And, a realistic solution must be capable of handling an even larger database of theorems, perhaps as many as 100,000 theorems, or more. (Yes, $/cpu cycle is decreasing faster than the size of set.mm is increasing – right now – but the objective is a system that works well interactively, and if the search time grows, even linearly with the number of theorems in the database, there might not be enough CPU power on the planet to keep the customers happy.)

( OFF TOPIC NOTE concerning above Metamath link: (fyi, Ponder theorem "isset" above, and its friend "visset". visset is saying: "Every set variable is a set" and isset is a way of asserting that a given class, A is a set. The syntax axiom (!?) "cv" converts a set variable to a class grammatical type (every set is a class), so it is interesting to discover visset. mmj2 treats cv purely as a "Type Conversion Syntax Axiom", but I am annoyed with this perspective as it feels as if cv is more than a grammatical device. In due time enlightenment will come :)

norm answers:

For most authors, classes have a two-part definition:

: (1) All set variables are classes.
: (2) If ph is a wff, then { x | ph } is a class.

In set.mm, these are of course the syntax constructors "cv" and "cab" respectively.

Jech's definition consists of (2) only. Then he says (Set Theory p. 4, with variable names changed to our convention):

: "Every set [variable] can be considered a class. If x is a set [variable], consider the formula y e. x and the class { y | y e. x }. That the set x is uniquely determined by its elements follows from the axiom of extensionality."

I interpret this as meaning that the use of a set variable x where a class is expected is really an abbreviation for the class { y | y e. x }. Thus we could, in principle, eliminate "cv" by explicitly using { y | y e. x } instead.

"visset" is a theorem. It states: x e. V. If we expanded it to eliminate "cv", it would read: { y | y e. x } e. V. (And V, of course, abbreviates { x | x = x }.)

Does this clarify it? It is also helpful to read Quine Set Theory and Its Logic pp. 15-21, to understand how formulas with classes translate back into the primitive language of set theory. --norm 11-Nov-2005

ocat thanks:

Thanks! I did not intend to imply that set.mm is wrong or dubious in some way :) My quandry has been about the nature of type conversions, not just cv: syntactic or axiomatic? mmj2 treats them as purely syntactic, but the English translation has a wff-ish flavor: every set is a class. I suspect that the reason I too easily swallowed the idea of cv and friends being purely syntactic is the use of sub-typing in object oriented programming languages. But those are different because the properties are simply manufactured in code instead of, as is the case in symbolic logic, being perceived to emerge from logically true statements about the results of operations on the objects. I am now leaning towards the view that type conversions are definitional, and require justification.

norm again:

I don't think of cv as a type conversion, in spite of its apparent resemblance to one; I think it is different. For example, once it is "converted" to a class variable - if you view it that way - you still can't substitute a compound class expression for it (even if that class expression qualifies as a set i.e. is a member of V). Wouldn't a true type conversion allow this? Basically, the x in "class x" is still a set variable in the same way that the ph in "class { y | ph }" is still a wff variable. It is an accident of the syntax that "class x" has only one symbol after "class", constructing the one-symbol class expression "x". But I don't know what the official (computer science?) definition of a type conversion is. Anyway, I changed the description of cv from "a set is a class" to "a set variable is a class expression" and added a reference to Jech. What do you think? --norm 13-Nov-2005

ocat ventures to the narrowest portion of the limb:

cv defines a "substitution rule", perhaps, instead of a "type conversion", per se. The content does not change, merely the "type" constant which is used by the Metamath Proof Verification Engine to determine whether a certain expression can be substituted for a variable. The Proof Engine is the Final Authority… though, of course, being able to translate a formula into English is highly desirable (I eagerly anticipate the day when HDM can translate axsup's formula back into "A non-empty, bounded-above set of reals has a supremum.")

: Or, just let Metamath do the dirty work, to keep HDM's job simpler: suprcl :) --norm

"set" is an interesting "grammatical type", aka "non-terminal" because it has no production rules of its own. However valid and convenient it may be as a construct, the Language Police appear to consider it suspect as an "Undefined non-terminal":

Dick Grune and Ceriel J.H. Jacobs, "Parsing Techniques - A Practical Guide" [ http://www.cs.vu.nl/~dick/PTAPG.html ]

Excerpt:

    "2.8.1 Undefined non-terminals
    The right-hand sides of some rules may contain non-
    terminals for which no production rule is given.
    Remarkably, this does not seriously affect the sentence
    generation process described in 2.5.2: if a sentential
    form containing an undefined non-terminal turns up for
    processing in a left-most production process, there will
    be no match, and the sentential form is a blind alley
    and will be discarded. The rule with the right-hand side
    containing the undefined non-terminal will never have
    issue and can indeed be removed from the grammar. (If we
    do this, we may of course remove the last definition of
    another non-terminal, which will then in turn become
    undefined, etc.) From a theoretical point of view there
    is nothing wrong with an undefined nonterminal, but if a
    user-specified grammar contains one, there is almost
    certainly an error, and any grammar-processing program
    should mark such an occurrence as an error."

Based on the amount of energy devoted to Type Conversion Syntax Axioms, not to mention the nuances of Nulls Permitted Axioms, I think I would advise novice language inventors to steer clear of both. Then, there will be a one-to-one correspondence between the .mm Syntax Axioms and the grammar production rules. Conversions would be treated as theorems, and Nulls would be handled by explicitly describing each variant in a syntactic domain using a separate Syntax Axiom. (Off-topic, for future reference to .mm language inventors, I think that making the $f label match the variable name is excellent – and make all $f's global except perhaps for dummy variables. And only use scopes (${ / $}) to assign $e's to a single $a or $p. These conventions will facilitate conversion to/from .mm format.)

--ocat 13-Nov-2005

raph weighs in:

It occurs to me that the closest analogy to the cv "syntax axiom" may be this production in a typical C-like grammar:

  value ::= lvalue

Writing "2 + 2 = 4;" in C is just as invalid as writing "E. 3 3 = pi" in Metamath, and for very similar reasons. Conversely, you could imagine an awkward C variant in which this production has explicit syntax, so that to increment i you'd write something like "i = cv(i) + 1;". It would taste terrible, nobody would buy it, but it is plausible.

C, of course, does have a richer grammar for lvalues than just variables, but it's possible to imagine a set.mm variant with similar properties. For example, some formal descriptions allow variables (often drawn from a finite alphabet) to be subscripted with an integer.

norm rambles on:

In set.mm, the notation is the same for the engine and the user. I chose what I felt to be as close to literature notation as possible (mod ambiguity, so the engine would work), to provide me with a familiar language that I can work with comfortably in a text editor without constantly mentally translating to and from say Polish. A drawback is that it can be quite complex not only to prove its soundness, but even to analyze with standard grammatical techniques. This was an arbitrary personal choice that is permitted, for better or worse, by the flexibility inherent in the Metamath language. In peano.mm, on the other hand, Bob Solovay chose Polish prefix notation throughout which makes a theoretical analysis much simpler. I think it is important to avoid reading something deeper into my personal notational choices than is really there.

It might instructive to look at the Metamath Solitaire Java applet. (Or at Ghilbert, where I think similar remarks will apply.) In the applet, the user display and the internal notation used by the engine are separated. Polish prefix notation is required by its unification algorithm, which allows the syntax construction steps to be omitted in proofs, unlike in standard Metamath proofs. Thus Polish prefix notation is used internally, and all syntax constructions are built with n-ary prefix connectives. "cv" is just a unary connective that is structually no different from the unary connective "wn" for negation. From mm.java:

    // Negation
    tmpConnective = new Connective("wn", "wff", 1, "-. $1");
    tmpConnective.setArgtype(0, "wff");
    connectiveVec.addElement(tmpConnective);

    ....

    // Convert variable to class (invisible notation)
    tmpConnective = new Connective("cv", "class", 1, "$1");
    tmpConnective.setArgtype(0, "var");
    connectiveVec.addElement(tmpConnective);

The unification algorithm treats it exactly like it does the unary negation connective. The only difference is the way it is displayed to the user. The 4th argument of Connective is the screen display notation. Unlike the other syntax constructions, no screen connective is displayed for "cv"; the reason is not mathematical but is only to keep with standard literature notation.

In the axioms, weq is really wceq, and there is no weq like in set.mm.

    // ax-10 $a |- ( A. x x = y -> ( A. x P -> A. y P ) ) $.
    tmpAxiom = new Axiom("ax-10",
        "wi wal $1 weq cv $1 cv $2 wi wal $1 $3 wal $2 $3",
        "Axiom of quantifier substitution (predicate calculus)");
    axiomVec.addElement(tmpAxiom);

Regarding type conversion, it is true that "cv" has "var" (aka "set") as its input type and "class" as its output type, unlike "wn" whose input and output types are both "wff". And indeed "cv" is the only unary connective with this property, as it turns out. But there are also binary connectives with different input and output types, such as "cab" (inputs are "var" and "wff", output is "class"). Do you think these somehow have a fundamentally different status from "cv" because they are binary instead of unary?

Also, do you think that the "cv" vs. "visset" discussion would have arisen if set.mm's syntax was chosen to match the applet's internal notation? It could be easily translated to it, of course. (Conveniently, all proofs would remain the same - the separation of the proofs from the notation is a nice feature of metamath, I think.) --norm 14-Nov-2005

Good points. If we use functional notation

    A = cv(x)

then "cv" is seen as just a mapping from set to class. Right?

P.S. I notice that "visset" is a very popular theorem! Wow. The more I look at Metamath and set.mm the more awestruck I am by the magnitude and grandeur of the achievement. There is no need for SETI -- we have enough here already :) --ocat

Revision based on above arguments from Norm:

Suppose T1 is a (logical) theorem whose formula parses as follows:

    parse(T1(x)) = f(cv(x))

where:

    x is of type X, and
    cv is a mapping from X to Y, and
    f is a syntax axiom taking type Y and
        yielding type wff.

Now, the question is, can theorem T2, (which we are proving using T1) be unified with T1 assuming that T2 parses as follows:

    parse(T2(y)) = f(y)

where:

    y is of type Y

In other words, can the Unification algorithm righteously substitute y for cv(x) without knowing anything else about the "meaning" of cv?

Previously, I would have said, "Yes, because cv is a 'type conversion' syntax axiom". But Norm has argued that the syntax of set.mm should not be interpreted in this way. It may be true that for set.mm the answer is "Yes", but in the general case (of some arbitrary .mm database), substituting 'y' for 'cv(x)' is not justified unless additional information is available to the Unification algorithm – i.e. it "knows" what 'cv(x)' means.

So…my new position regarding Unification is that the parse tree of the Assertion being unified with an Axiom/Theorem must completely overlay (match) all parse tree nodes of the Axiom/Theorem except for variable hypothesis nodes. And, these variable hypothesis nodes are the "graft" points where we substitute expressions for variables in Unification (we never substitute an expression for an expression!)

--ocat 15-Nov-2005

Sketch of Unification Process

A) Each justified step in a proof table is a separate proof. Justification for a step is either that the step reiterates an hypothesis or assertion, or that the step Expression is derivable from previously given assertions (axioms or theorems) and/or hypotheses, using the rules of the Metamath Proof Verification Engine regarding substitutions and distinct variables.

B) What is elsewhere referred to as "unification" is used here to mean a process for determining valid substitutions from a proof step Expression into a given assertion's variables – and if these substitutions exist.

[ Ref: "Unification is a procedure for searching for a consistent set of substitutions of elements." http://www.rci.rutgers.edu/~cfs/472_html/Logic_KR/resolution.html ]

In the case of an unambiguous grammar, assuming successful grammatical parsing of every statement in the .mm database, unification can be regarded as, first, overlaying an assertion's parse tree with a proof step Expression's parse tree and making sure that none of the assertion's syntax axiom nodes are visible. Then, the nodes where the proof step's parse tree graft onto the assertion's parse tree provide the substitutions, which must be unique (only one substitution per variable). The substitutions are then applied to the Candidate assertion, which is then compared to the original Expression – they must now be equal.

For example, unify:

    ax-1 = ( ph -> ( ps -> ph ) )

    with a proof step Expression "E":

    E = ( ( ph -> ph ) -> ( ps -> ( ph -> ph ) ) )
    
         "ax-1"                 "E"   
         ****                   **** 
         *wi*                   *wi*
         ****                   ****
        .    .                  .   .  
        .    .                 .     .
     ----    ****           ****     ****
     -ph-    *wi*           *wi*     *wi*
     ----    ****           ****     ****
             .  .           .  .     .   .
             .  .           .  .     .    .
          ----  ----     ----  ----  ----  ****
          -ps-  -ph-     -ph-  -ph-  -ps-  *wi*        
          ----  ----     ----  ----  ----  ****
                                           .  . 
                                           .  .  
                                         ---- ---- 
                                         -ph- -ph-
                                         ---- ----

From the diagram it is obvious that E can be unified with ax-1 because each "****" box in ax-1 is overlayed by an identical box from E. This is a necessary condition for unification because the grammar was stipulated to be unambiguous, and so, each expression has only one parse tree.

And, from the diagram, we see the correct, unique variable substitutions to make. The "—-" boxes in ax-1 are the graft points on the ax-1 parse tree: graft the corresponding subtree from E over each "—-" box in the ax-1 parse tree – and if the variable in the "—-" box occurs more than once in ax-1, then the same substitution takes place for each occurrence (i.e. if E were defined as follows then unification would be impossible:

  E' = ( ( ph -> ph ) -> ( ps -> ( ch -> th ) ) ).

Note: the example above does not portray the full complexity of unifying expressions containing Type Conversion(1) and Nulls Permitted(2) Syntax Axioms (as opposed to just Notation Syntax Axioms, as we call them in mmj2). Nor does it cover the scenario of Logical Hypotheses(3).

1. If a Type Conversion occurs above a "****" box in the assertion's parse tree, then there must be an identical overlaying node in the Expression's parse tree. This is, again, because the grammar is unambiguous and unique parse trees have been stipulated to have been obtained prior to unification.

2. A "—-" box containing a Nulls Permitted Syntax Axiom must also be overlayed by an identical node in the Expression's parse tree. This is because "null" is a constant, and cannot there can be no substitution for a constant according to the rules of the Metamath Proof Verification engine. (So…this consideration also applies to Constant Syntax Axioms such as "c0"!)

3. Each Logical Hypothesis of the assertion must also be Unified (!) with an hypothesis from the proof step – and vice-versa. If the proof step Hyp field specifies "1,2,4" then that means that the proof step can only be matched to an assertion with exactly 3 Logical Hypotheses -- each of which must undergo unification with one of the Hyp Expressions specified by the user (we will allow for the case where the user does not provide the correct sequence of hypotheses, compared to the assertion's .mm file ordering in spite of the fact that extra work will be required to rearrange the proof step's Hyp's.) Note: if a variable occurs on more than one formula (assertion and/or Logical Hypotheses), then the same substitution must be made everywhere in a valid unification.

C) The problem of finding a Proof Step Ref using the step's Expression and Hyp values requires a search to identify "Candidates" for unification. There may be many assertions satisfying the parse tree overlay process described above! And, the "degenerate" case of an assertion expression such as "|- ph" must be considered (this obtains from modus ponens as well as the trivial proof that a hypothesis yields itself.)

Once a set of Candidate assertions is available, secondary tests must be applied to determine which, if any of the Candidates satisfy the requirements of Unification (such as unique substitution for "ph" in the example above) and any disjoint variable restrictions used in the Metamath Proof Verification Engine.

And, there may be more than one Candidate assertion that passes every test, including any distinct variable restrictions. In this situation, assuming that the Candidate is not the same Theorem we are trying to prove (!), perhaps the user should be happy that we found even one and will be grateful if we return the first Candidate we find? (Additional restrictions on the Candidates will be needed, for example specification of the position of the theorem to be proved in the .mm database – meaning that no subsequent assertions can be used in its proof -- i.e. no recursion!)

Searching a Metamath Database: Context

We wish to avoid sequential search of a .mm database for each step of a proof. Ergo, a direct lookup method is desired, but without having to rewrite each proof step's Expression into a "normal form"; the algorithm should be as general as possible, knowing nothing about propositional or any other form of logic or notation.

The problem with direct access is that, essentially, building the set of "Candidate" assertions for Unification involves a sort of geometric matching – the selected Candiates' parse trees must be congruent to the proof step's parse tree. And, unfortunately, we don't know in advance how much of each Candidate's tree to overlay. Just the root, or the root plus a subtree? Remember that ax-mp may be the chosen Candidate even for a very complex and lengthy Expression (the Candidate's root node may be a "Degenerate" Expression of the form "ph" – a generic wff.)

Here are some items that can be matched or used to winnow a .mm database into a set of Candidates for Unification:

1. Number of Logical Hypotheses: must match proof step's number of Hyp's exactly. If the proof step's Hyp column specifies 4 hypotheses, then only axioms and theorems with 4 Logical Hypotheses need apply.

2. It is stipulated that Proof Assistant is not used for "syntax proofs". That means that each proof step's Expression's Type Code is "|-" or the user-designated Provable Logic Statement Type Code (in mmj2 terminology). Thus, only axioms and theorems with Type Code "|-" are to be considered – syntax axioms are excluded, and that means that our Candidates will be of type "wff", or the user-designated Logical Statement Type Code equivalent that corresponds to "|-".

3. The Candidate's assertion's parse tree root node always exists (duh), and is either Degenerate (Variable Hypothesis or Nulls Permitted Syntax Axiom – the latter being included as a technicality, for completeness), or Not Degenerate (Notation or Type Conversion Syntax Axiom):

    - If Degenerate, the Type Code of the
      Var Hyp or Nulls Axiom must match the
      Type Code of the proof step's Expression;

    - If Not Degenerate, the root node's (Syntax
      Axiom) Label must equal the Label of the
      root node of the proof step's Expression's
      parse tree.

In other words, if the parse tree root node for the proof step's Expression = "wi", then a suitable Candidate for Unification must also have a "wi" root node, or be Degenerate and have a "wff" type Variable Hypothesis in the root. (A proof step Null root could unify with a Null or a "ph" root node, but not in set.mm of course.)

4. Similar considerations to #3 apply for each Logical Hypothesis of a Candidate assertion for Unification with a proof step's Logical Hypothesis. Each Logical Hypothesis also has a parse tree, and the root nodes on corresponding Candidate assertion and proof step Logical Hypotheses must match.

For example, consider ax-mp

    ${
      maj $e |- ( ph -> ps ) $.
      min $e |- ph           $.
      ax-mp $p |- ps         $.
    $}

ax-mp would be a satisfactory Candidate for Unification with a proof step having one hypothesis root node = "wi", a Degenerate second hypothesis of type "wff", and a Degenerate assertion of type "wff". (Subsequent checks of unique variable substitutions would then be needed to confirm the match.)

5. Sequence Number of a Candidate for Unification: must be less than the Sequence number of the Theorem being proved. (The Proof Assistant may provide an entry field allowing input of either a maximum Sequence Number, or a Label which would specify the last Axiom or Theorem that could possibly be used.)

6. Other metrics might be considered, such as length or height of the portion of the Candidate's parse tree to be overlayed. These seem to be of dubious utility. If the portion of the Candidate's parse tree to be overlayed is longer – written in RPN format – than the proof step's Expression, excluding Variable Hypotheses, then the Candidate can be rejected out of hand. Likewise with height.

7. Many assertions have parse trees that are completely filled out with non-Degenerate nodes below the root down to level 2 or 3.

For example, ax-2 has a "syntax depth" of 3, with levels 1 and 2 "syntactically completed":

    ax-2 $a |- ( ( ph -> ( ps -> ch ) ) ->
                 ( ( ph -> ps ) -> ( ph -> ch ) ) ) $.
                 
                     ax-2
                     ****
                     *wi*
                    .****.
                  .        .
                .            .
           ****               ****
           *wi*               *wi*
           ****               ****
          .   .              .    .
         .     .            .      . 
      ----   ****         ****      ****
      -ph-   *wi*         *wi*      *wi* 
      ----   ****         ****      ****
             .  .         .  .      .   . 
           .     .      .     .    .     .
        ----   ----   ----  ----   ----  ----
        -ps-   -ch-   -ph-  -ps-   -ph-  -ch-
        ----   ----   ----  ----   ----  ----

Searching a Metamath Database: Alternatives

Variation 1:

In theory, a lookup key or a hash-code could be constructed using the "syntactically completed" levels, along with – perhaps – the number of Logical Hypotheses. For ax-2 such a key string might look like this:

    "*wi*wi wi*0"

Then, to locate suitable Candidate assertions for an Expression, E which is syntactically complete to level 2, the lookup could proceed as follows:

1) Search Level 2 hash table 2) Search Level 1 hash table 3) Search Level 0 (degenerate) hash table

This approach still does not solve the problem of correctly ordering the Logical Hypotheses. Since we will not (absolutely) require (but request) that the correct sequence of Logical Hypotheses be provided to the Proof Assistant, building a key using the syntactic structure of the Logical Hypotheses is not straight-forward – especially since one of more of the hypotheses may be of the degenerate form "ph".

Fortunately, "degenerate" form Logical Hypotheses are in the minority, and a multi-step lookup could be performed a key built with sorted top level nodes from the hypotheses. For example, if the Logical Hypotheses of an assertion have top level parse nodes = "wi", "wn", then a key = "wi*wn" could be constructed – and in the event of an unsuccessful search, a secondary lookup for degenerate top level nodes could be performed.

Variation 2:

A preferred approach for the mmj2 system will likely take into account the fact that every symbol, assertion, theorem, hypothesis, etc. is accessed by (object) reference instead of by name. This means that "wi" is only referred to during input and output. Thus, the top level "wi" node of an Expression's parse tree contains a reference to the "wi" object. There is no need to perform a hash or tree lookup, the address is already at hand. To facilitate this alternative, a reference to a Proof Assistant lookup object would be added to mmj.lang.Assrt, and another refernence would be added to mmj.lang.Cnst to handle the degenerate form of assertion (i.e. for set.mm Cnst "wff" would contain a reference to a separate lookup table holding ax-mp, etc.)

Considering set.mm as a typical database, though still on the small side ( :), there are approximately 41 wff type syntax axioms (class syntax axioms will not appear in the root node of a theorem or logical axiom parse tree in set.mm). Given equal distribution of the 6,000 theorems and axioms among the wff syntax constructors yields an average of 150 Candidates for a proof step expression. These are reduced somewhat by the degenerate assertions such as ax-mp, which would be searched separately.

To further reduce the number of Candidates that would need to be painstakingly examined with in- depth Unification, a secondary set of tables can be added based on number of Logical Hypotheses. A guesstimate is that 75% of Theorems and Logical axioms have 0, 1, or 2 or 3 Logical Hypotheses. These secondary tables could be constructed if and when the primary table for a syntax constructor's size exceeds a certain number, such as 10 or 20.

A tertiary level of lookup tables might be deployed for exceedingly common constructors such as "wi" that may well be in the root nodes of thousands of theorems – and after splitting these up according to the number of Logical Hypotheses, several hundred might be present in a secondary table. Techniques mentioned in Variation 1 might be adapted for use here in the tertiary lookups. For example, a hash key could be built for syntactially complete level 2 parse trees. Or, hash keys based on the root nodes of the Logical Hypotheses might be created. And, a combination might be tried for special cases such as assertions with exactly 1 Logical Hypothesis – in this case the sequence of hypotheses is moot, and a compound key could be constructed.

Note: it is further envisioned that within any lookup table containing a list or array of assertions to be examined that the assertions will be sequenced by their order within the Metamath database – i.e. their Sequence Number. This will enable a lookup to be terminated once the input Max Sequence Number for the theorem in the Proof Assistant is exceeded (i.e. if we are proving theorem 100 we do not need to examine 101, 102 – they are disallowed by definition in the Metamath Proof Verification Engine!)

Examination and review of set.mm and perhaps ql.mm will be needed before making any final decisions about how to proceed with these lookups. At the very least, set.mm and ql.mm will provide instruction as to the wrong lookup strategies. Ideally, the lookup process would yield no more than 10 or 20 assertions for which Unification would be attempted.

The costliness of Unification is not entirely clear either, but it is envisioned that rejection of Candidates will be performed as expediently as possible. Resequencing proof step hypotheses is likely to add expense to the Unification process, as well as complexity. Detective work will be needed to figure out which variables need to be Unified – and this will require Unification of Logical Hypotheses! A process of elimination will be needed to work through the impossible sequences of hypotheses.

Other real-life proof examples to ponder

http://us2.metamath.org:8888/mpegif/sylan9.html

http://us2.metamath.org:8888/mpegif/rcla42v.html

Or, take for example, Theorem #5187 of set.mm, "climadd", which has 9 logical hypotheses and a deceptively short Expression but requires 193 proof steps!

http://us.metamath.org/mpegif/climadd.html

set.mm wff-type Syntax Constructors