AsteroidMeta: metamathModuleMetadata

Here is the proposal I sent to Norm Megill about adding "metadata" statements to .mm files such as set.mm so that programs can more easily convert from .mm format to other formats such as Ghilbert or HCodebert. I recommend downloading the latest metamath.zip to see set.mm's newest statements. (Uppercase is used to make the metadata statements stand out, but an extract program should look for lower or upper.) Comments/feedback/etc.? Now is the time to provide input!

http://us2.metamath.org:8888/index.html#downloads

Also check out the "#*#*#*#" and "=-=-=" that existed previously. You will find that the "modules" (books) already have chapter/section breakdowns (I can see (someone) creating a new webpage showing a Table of Contents based on the module/section breakdown of set.mm which would not only pull out the theorem labels but the first sentence of the comment preceding each theorem. More than 10 years of work are embedded in set.mm and it is impressive, to say the least.)

[Note: conversion of set.mm to another system may be more challenging than first imagined. The issue of overloaded operators and conversions of kinds/grammatical type codes comes into play (see weq vs. wceq, for example.) Overloaded operators are dealt with in mmj2 which constructs a set of Grammar Rules from Metamath syntax axioms and deals with combinations of Nulls and Type Conversions at the same time.]

    After due consideration of multiple factors,
    especially Occam's Rule, I conclude that there
    is no need to add anything to break-out set.mm
    according to Book/Chapter/Section, or to 
    distinguish between the MODULE break-out and
    Book/Chapter/Section. (I think MODULE (see below)
    metadata comment lines will do just fine as
    Book markers/delimiters.)
 
    I see that the comments beginning with 
 
        $(
        #*#*#*#*#*#*#*
 
    are the intended "modules" from the author's 
    viewpoint, and that comments beginning with
 
        $(
        =-=-=-=-=-=-=-
 
    designate "sections" within those modules. 
 
    Assuming that you continue to maintain 
    consistency, an extract program should have 
    no problem automatically breaking out a 
    file into modules and sections, which can then
    be automatically numbered -- and captions can
    be pulled from the module/section header 
    comments. 
 
    So what I did was create the minimal set of
    metadata to add to set.mm, in the form of
    Metamath comments that begin with "$( <MM>".
    In the future, when we see "$( <MM>" we can
    *know* that the comment is metadata for 
    post-processors, such as the Ghilbert extract,
    or perhaps an html page builder program. Here
    is how the MODULE metadata looks:
 
    $( <MM> <MODULE> <ID>LOGICB</ID> <PREREQ> </PREREQ> </MODULE> </MM> $)
 
    <MODULE> -- Identifies as "MODULE" Metadata, which
                contains attributes "ID" and "PREREQ"
                (multiple prerequisites are possible).
 
                Ideally, Module ID should be 7 characters
                in length, or less, and should be composed
                of letters; digits are permitted but not
                as the first character (the goal is to
                support 8 character file names on all
                platforms and to reserve the 8th character
                position to distringuish between axiom
                and theorem files -- the Ghilbert extract
                will append an "A" or a "T" to the module
                name...)
 
                As you can see from the automated file
                comparison below there are these modules:
 
        __ID__   _PREREQ_  __comments________________________________________
        PREFACE            (contains only comments, so null Ghilbert extract)
        LOGICB   
        LOGICP   LOGICB
        ZFC      LOGICP
        NUMBERS  ZFC
        TYPESET  HILBERT   (contains only comments, so null Ghilbert extract)
        HILBERT  ZFC
 
    The convention should be that the "$( <MM>" 
    metadata header comment identifies the 
    start of a module, and that the module continues
    until the start of the next module or the
    end of the file. AND... if the "$( <MM>" 
    comment is immediately followed by a
        $(
        #*#*#*#*#*#*#*
    comment, then the module's Caption will be
    pulled from the "#*#*#*" comment; otherwise
    the module caption will be blank -- note that
    I changed the Typesetting module's Caption
    comment from 
        $(
        =-=-=-=-=-=-=-
    to 
        $(
        #*#*#*#*#*#*#*
 
    because TYPESET is logically a separate module,
    not a section within NUMBERS. 
 
    The way the PREREQ data will work is that when
    building a Ghilbert extract from set.mm (or
    whichever .mm file), the PREREQ will be used
    to import the prerequisite interface file(s). In
    Ghilbert, prerequisites are cumulative, so
    if ZFC's prequisites are LOGICB and LOGICP, and 
    LOGICP's prerequisite is LOGICB, then only
    LOGICP needs to be specified as a prerequisite
    for ZFC. 
 
    Also, just as a preliminary idea unsupported by
    experimentation, each MODULE's Ghilbert extract
    will result in two ".gh" files, one for axioms
    and one for theorems/proofs. 
 
    NOTE: the file comparison below is hard to read,
    so in English, what I did was precede every
    "#*#*#*" header comment in set.mm with a "$( <MM>"
    module metadata comment (even at the start of
    set.mm!!!) And, I changed the "=-=-=-" characters
    in the Typesetting section of set.mm to 
    "#*#*#*", thus giving Typesetting module-level
    status. (And I changed the file date to 9/14.)
 
    NOTE2: I wasn't sure whether HILBERT's prerequisite
    was ZFC or NUMBERS, but I picked ZFC (guessing.)
 
    NOTE3: I have no investment in the MODULE ID values,
    so adjust however seems correct to you. It will
    be (probably) a few months before I have coded
    the Ghilbert extract enhancement to mmj2 anyway...
 
    Thanks!
 
 
    ======================================================
    P.S. "1a1" below means line 1 of new file
    inserted before line 1 of old file. "6c7" means 
    line 7 of new file replaces line 6 of old file. Etc.
    =======================================================
 
    Compare: (<)\temp\metamath\set.mm (3819672 bytes)
       with: (>)\temp\metamath\setMODULES.mm (3820189 bytes)
 
    1a1
    > $( <MM> <MODULE> <ID>PREFACE</ID> <PREREQ> </PREREQ> </MODULE> </MM> $)
    6c7
    < set.mm - Version of 13-Sep-2005
    ---
    > set.mm - Version of 14-Sep-2005
    865a866
    > $( <MM> <MODULE> <ID>LOGICB</ID> <PREREQ> </PREREQ> </MODULE> </MM> $)
    7357a7359
    > $( <MM> <MODULE> <ID>LOGICP</ID> <PREREQ>LOGICB</PREREQ> </MODULE> </MM> $)
    11252c11255
    < 
    ---
    > $( <MM> <MODULE> <ID>ZFC</ID> <PREREQ>LOGICP</PREREQ> </MODULE> </MM> $)
    48260a48263
    > $( <MM> <MODULE> <ID>NUMBERS</ID> <PREREQ>ZFC</PREREQ> </MODULE> </MM> $)
    67163,67166c67167,67171
    < $(
    < =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    <        Appendix:  Typesetting definitions for the tokens in this file
    < =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    ---
    > $( <MM> <MODULE> <ID>TYPESET</ID> <PREREQ>HILBERT</PREREQ> </MODULE> </MM> $)
    > $(
    > #*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
    >        Appendix:  Typesetting definitions for the tokens in this file
    > #*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
    68105a68110
    > $( <MM> <MODULE> <ID>HILBERT</ID> <PREREQ>ZFC</PREREQ> </MODULE> </MM> $)