AsteroidMeta: MaximallyCompressedMetamathDatabases

Here is a first draft of "compressed1.mm" which documents how a Metamath database can be "maximally" (I think) compressed. The .mm lines are indented to the right by 4 columns to prevent the Asteroid from manipulating the text.

    $( compressed1.mm - Version of 9-May-2008

    #*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
        Metamath source file for experimentation with Maximally Compressed
        Metamath databases. 
    #*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

                               ~~ PUBLIC DOMAIN ~~

    This file is placed in the public domain per the Creative Commons Public
    Domain Dedication:  http://creativecommons.org/licenses/publicdomain/
    The public domain release applies worldwide.  In case this is not
    legally possible, the right is granted to use the work for any purpose,
    without any conditions, unless such conditions are required by law.

    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Maximally Compressed Metamath Databases.
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

        The idea behind "maximally" compressed Metamath databases is
        that a theorem's formula and the formulas of its Logical
        Hypotheses can be generated in Most General Form using just
        its proof; the syntax labels in the proof are not needed for
        this purpose. Also, any $d restrictions which are required
        for a theorem can be generated from the $d restrictions of
        the axioms and theorems it refers to in its proof (all $d
        restrictions are, ultimately, obtained from the axioms.)

        A "Maximally Compressed Metamath Database" consists of
        .mm source (exemplified by the sample below) -- containing
        all of the $c, $v, $f, $a, $d and $e's for the $a's -- 
        followed by and a binary bitstream as defined in this file.

        The bitstream consists of a Header section, a Body, and
        a Trailer.

        The bit stream is encoded using Variable Length
        Words which increase in length by one bit each time
        a Word is used consisting of all B'1' bits (binary one bits).

        Header: Describes the parameters used to encode the .mm
                theorem bitstream.

            -   Initial Word Size: 
                    * n B'1' (Binary one) bits where 'n' is the Initial 
                      Word Size.

                    * NOTE: Initial Word Size is considered the first
                            word. The next word will be n + 1 bits
                            long.

                    * Default = 9 (i.e. B'111111111').

            -   Maximum Logical Hypotheses:

                    * maximum number of Logical (Essential) Hypotheses 
                      which can be used in an Axiom or Theorem. References
                      inside the bitstream to Logical Hypotheses are 
                      always local to a theorem's proof and are 
                      referred to as 1, 2, 3, ... within the proof
                      (i.e. B'1', B'10', B'11', etc.)

                    * Default = 32 (i.e. B'0000010000')

                    * From Maximum Logical Hypotheses and Initial Word Size
                      the Maximum Logical Axioms is computed (inferred).
                      This is the number of axioms where are reserved
                      for use in the numbering scheme. It is computed
                      as follows:

                        Maximum Axioms = 2**Initial Word Size
                                         - Maximum Logical Hypotheses
                                         - 1.

                        Default = 991 (i.e. B'1111011111').

        Body: Consists of 0 or more encoded theorems, each consisting
              of Words encoding the Theorem Number (label) followed
              by Proof Steps.

            - Theorem Number:

                  * Encoded as one Word.

                  * Theorem Numbers are assigned sequentially by
                    adding 1 to the previous Theorem Number, except
                    for the first Theorem Number which equals Maximum
                    Logical Axioms plus 1 (by definition).

            - Proof Steps:

                  * Each Proof Step is encoded as one Word.

                  * If the Theorem Number consists of all binary ones
                    (B'1') when converted into binary, then the Word 
                    size in the bitstream is increased by one bit,
                    and therefore, the theorem's Proof Step Word
                    size is one greater than the Theorem Number's
                    word size.

                  * Proof Steps are in Reverse Polish Notation 
                    order (as per the Metamath step). Thus, the 'qed'
                    Proof Step is the final Proof Step for the 
                    Theorem.

                  * Each Proof Step is encoded as a single Word which
                    contains either:

                        a) a previous Theorem Number (e.g. 992, 993, ...)

or

                        b) a Logical Axiom number (e.g. 33, 34, ..., 991)

or

                        c) a Logical Hypothesis Number (e.g. 1, 2, ..., 32).
                           NOTE: a theorem's Logical Hypotheses are not
                                 declared, they are simply used at the
                                 appropriate locations within the proof.
                                 Thus, if a proof contains references to
                                 Logical Hypotheses 1, 2, and 3, then
                                 *by definition* the Theorem uses three
                                 Logical Hypotheses.

                  * A Proof Step number need not match the Metamath
                    label of the assertion or logical hypothesis. It
                    is imagined that a lookup table could be used to
                    translate between Proof Step number and Metamath
                    label.

                  * The end of the Proof Steps for a Theorem is
                    indicated by the following Theorem Number -- which
                    is distinguished by its value, which is one greater
                    than the current Theorem Number -- or by the bitstream
                    Footer, which consists of one Word of binary zeroes
                    (B'0').

        Trailer: One Word containing all binary zeroes (B'0000....000').

$)

      $c ( $.  
      $c ) $.  
      $c -> $. 
      $c wff $.
      $c |- $.

      $v ph $.
      $v ps $.
      $v ch $.
      $v th $.
      $v ta $.

      wph $f wff ph $.
      wps $f wff ps $.
      wch $f wff ch $.
      wth $f wff th $.
      wta $f wff ta $.

      wi $a wff ( ph -> ps ) $.

      ${ $( referred to in the binary bitstream as #33 $)
        min   $e |- ph $.
        maj   $e |- ( ph -> ps ) $.
        ax-mp $a |- ps $.
      $}

      $( #34 $)
      ax-1 $a |- ( ph -> ( ps -> ph ) ) $.

      $( #35 $)
      ax-2 $a |- ( ( ph -> ( ps -> ch ) ) -> 
                   ( ( ph -> ps ) -> ( ph -> ch ) ) ) $.

    ${ $( dummy provided for mmj2 use only $)
      dummy.1 $e |- ph $.
      dummy   $p |- ph $= ? $.
    $}

    $( === end of native .mm source === $)