back to mmj2
The 1-Mar-2008 release of mmj2 is working beautifully and appears to be quite adequate. One or two minor tweaks may be made, but major changes seem unwise given that what we have now actually does the job. In particular, I loathe the idea of re-engineering and replacing the existing mmj2 code: people seem to be using what is there now, and who knows what horrors drastic changes might introduce. Better to do a major cloning job, salvaging some code and algorithms without feeling constrained by the existing users and their antiquated hardware.
There are numerous shortcomings in mmj2. It was developed one chunk at a time, without knowing what would be required at the end. Now we know more about what is needed (though graphic rendering of formulas has not been implemented, in either 2D or 3D), and we know more about what can be computed quickly – that is, about what is actually achievable.
Also, computer hardware has advanced significantly since 2004, and the Next Big Thing appears to be multi-core processors containing dozens, or even a thousand, separate computing units. We have also learned of Google's elegant MapReduce programming framework (also available as the open-source package "Hadoop"), which distributes gigantic computations across thousands of separate computers. The future is beckoning, and the mmj2 code we have now is unable to take advantage of the new hardware and software paradigms (sorry, my bad…)
One of the many architectural blunders in mmj2 was failing to honor the logical entities that make up a Metamath database. In mmj2 each MObj is reused, so that there is one instance of "ph" in the system, one instance of "ax-mp", and so on. In reality, the Metamath statements that define these mathematical objects define models, not single objects. So a formula containing the variable "ph" ought to contain a reference to a separate instance of "ph", not a reference to the one shared "ph" – it is nonsensical to say that there is only one "ph" variable in a Metamath database, because the occurrences in different theorems are not the same. Likewise with references to theorems like "ax-mp" in proof steps: a proof step which is justified by ax-mp creates an instance of ax-mp; it doesn't just refer to the one shared ax-mp. What exists in mmj2 now works fine, but it limits the modifiability of mmj2, especially for multi-processing environments wherein each separate process would like to have its own instances of these MObj's.
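To make the distinction concrete, here is a minimal Java sketch – the class and field names are my own invention, not mmj2's – separating a symbol's single shared definition from the per-use instances that formulas and proof steps would hold:

```java
// Hypothetical sketch (names invented, not mmj2 classes): separate the
// shared *definition* of a symbol from each *occurrence* of it in use.

final class VarDef {                 // one per database: the "model"
    final String id;                 // e.g. "ph"
    VarDef(String id) { this.id = id; }
}

final class VarOccurrence {          // one per use site: the "instance"
    final VarDef def;                // link back to the shared definition
    final int formulaPos;            // position within its own formula
    VarOccurrence(VarDef def, int formulaPos) {
        this.def = def;
        this.formulaPos = formulaPos;
    }
}

public class InstanceDemo {
    public static void main(String[] args) {
        VarDef ph = new VarDef("ph");                // single definition
        VarOccurrence a = new VarOccurrence(ph, 0);  // use in theorem 1
        VarOccurrence b = new VarOccurrence(ph, 0);  // use in theorem 2
        System.out.println(a == b);                  // distinct instances
        System.out.println(a.def == b.def);          // shared definition
    }
}
```

Each process (or thread) can then mint its own occurrence objects freely while the definitions stay read-only and shared.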
Anyway, this re-engineering project captured my imagination and I think it might be worth investing in. I have done much harder work on systems containing more than 1 million lines of code – ugly, nasty code written by other people over many years – so re-engineering the mmj2 code ought to be very easy. I've always believed that Metamath would be just as interesting to Computer Scientists as it would be to mathematicians and logicians, and I feel even more strongly about that now. Metamath is like catnip to programmers because of its elegance and "engineered simplicity". Now that mega-multi-core processors are nearing widespread availability there are many more opportunities in these areas of inquiry.
(more later…feel free to chime in.)
Well, I have done some re-architecting of the mmj2 MObj hierarchy of classes, and made a few other critiques of mmj2 in note form on my hard drive.
Very enticing stuff. But a real back-breaker in terms of rewriting everything!
So I have another idea, which is to create an mmj2 utility which can be used to read, validate and parse a .mm file and then either a) invoke another program/class, passing each MObj in turn, or b) write out file(s). The idea would be to "ghilbertize" the converted data, using s-exps for both the parse trees and the proof trees, though a Java "interface" could be provided allowing a user to substitute a different output formatter.
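A rough sketch of what such a pluggable formatter might look like – the interface and method names here are hypothetical, not proposed mmj2 API – with an s-exp formatter as the default implementation:

```java
// Hypothetical sketch (interface and names invented for illustration):
// the utility walks each MObj and hands it to a pluggable formatter.

interface MObjFormatter {
    String format(String label, String sexp);   // render one output record
}

class SExpFormatter implements MObjFormatter {
    public String format(String label, String sexp) {
        return "(" + label + " " + sexp + ")";
    }
}

public class FormatterDemo {
    public static void main(String[] args) {
        MObjFormatter f = new SExpFormatter();
        // e.g. a conclusion formula rendered as a parse-tree s-exp
        System.out.println(f.format("ax-mp", "(wi ph ps)"));
    }
}
```

A user who wants XML, or the Ghilbert format proper, would just supply a different implementation of the interface.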
In addition, the utility could provide a "callback" for access to mmj2 facilities, such as the Proof Assistant and unification logic – which might be useful in some circumstances, such as experimental code.
One other thing this would provide is the converter which was promised to jcorneli, though he was not yet ready to use it.
For completeness, mmj2 would probably need to store Begin Scope and End Scope .mm statements and output them with the converted data, though if a user of the data desired to use only the Metamath assertion "frames" then the scope data would be redundant and could be ignored.
One difference between this output format and the Ghilbert-ish format would be that Metamath labels would be output as operators in the parse/proof trees, and formulas would only be output for syntax axioms (which I believe Ghilbert calls "terms".)
If there is any interest or concern about the spec details, post a note and tell me what you would want/need. Or write your request on the back of a $100 bill and mail it to me :-)
One reason why this approach might actually be pretty "ok" is that computers are coming with much more RAM these days, so having to start up mmj2 and maybe keep it in memory isn't as huge an issue as back in Ye Olden Days – and plain output files could be used if loading mmj2 were an issue. My mmj2 PC only has 256MB of RAM, which isn't even enough to run Ubuntu… and 1GB or 2GB PCs are pretty much the norm. RAM is cheap, programmers are expensive…
--ocat 3-May-2008
I have been looking into distributed processing, such as Java's RMI (Remote Method Invocation). These techniques are very relevant in the "modern" era of multi-core machines, especially in clusters of machines. In a situation where thousands of cheap, small machines are connected, the algorithms need to be adapted, especially if memory is not shared.
The problem faced by Metamath/mmj2/ghilbert style systems is that, essentially, all of the data objects have inter-relationships, and lots of contextual knowledge is needed. So what we'd like to do is factor the system in such a way that smallish, simpler objects/things can carry their own data with them, without them having to know everything contained in a .mm database.
This means that the Ghilbert style of encoding Metamath as s-exps and separating proofs from assertions – leaving behind what are basically just the "call signatures" of the theorems – is very interesting! An assertion can be transported as a String, which is very convenient (as opposed to having to serialize/deserialize Java objects containing objects which may refer to other objects, etc.)
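For illustration, here is a hypothetical "call signature" for ax-mp rendered as a single s-exp String – the exact rendering is invented, but it shows why a String moves trivially over a wire, with no object-graph serialization at all:

```java
import java.nio.charset.StandardCharsets;

public class SignatureDemo {
    public static void main(String[] args) {
        // Invented rendering of ax-mp's "call signature" as one s-exp String:
        // two hypotheses and a conclusion, no proof attached.
        String sig = "(ax-mp (hyp ph) (hyp (wi ph ps)) (concl ps))";

        // "Transport": encode to bytes and decode on the far side.
        byte[] wire = sig.getBytes(StandardCharsets.UTF_8);
        String received = new String(wire, StandardCharsets.UTF_8);

        System.out.println(received.equals(sig));  // lossless round trip
    }
}
```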
My latest "invention" (idea) is that once the .mm file has been read in and translated, the original variables used by the .mm file are irrelevant, except at output/display time. What can be done instead is to encode the variables so that their Type (or "kind") Code is part of the variable name, and an integer suffix can distinguish it from every other variable of that Type within the (local) context of the assertion.
Also, Syntax Axiom labels can be reduced in the same way, so that the Type Code returned by the syntax operator is part of the "name" of the operation.
And, the Type Codes can be converted into integers…
Thus, an entire parse tree can be linearized into an array of integers, and easily distributed to an available processor for unification and/or searching, proving, etc. Upon completion and transmission back to the Mother Ship, the variables and syntax operators can be re-converted to Metamath Native for display.
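A toy Java sketch of the linearization idea – the integer codes are invented for illustration: negative codes for syntax operators, positive codes combining Type Code and suffix for renamed variables, and a pre-order walk that flattens the tree into an int array:

```java
// Hypothetical sketch: syntax-axiom labels mapped to negative integers,
// variables renamed to typeCode+suffix and mapped to positive integers,
// then a pre-order walk flattens the parse tree into a flat array.

import java.util.ArrayList;
import java.util.List;

class Node {
    final int code;          // negative = operator, positive = variable
    final Node[] kids;
    Node(int code, Node... kids) { this.code = code; this.kids = kids; }
}

public class LinearizeDemo {
    // pre-order flatten: emit [code, childCount] then each child in turn
    static void flatten(Node n, List<Integer> out) {
        out.add(n.code);
        out.add(n.kids.length);
        for (Node k : n.kids) flatten(k, out);
    }

    public static void main(String[] args) {
        // "ph -> ps" with wi encoded as -1, and the renamed variables
        // wff#1 and wff#2 encoded as 101 and 102
        Node tree = new Node(-1, new Node(101), new Node(102));
        List<Integer> out = new ArrayList<>();
        flatten(tree, out);
        System.out.println(out);
    }
}
```

The resulting array (here [-1, 2, 101, 0, 102, 0]) carries no object references at all, so it can be shipped to any available processor and decoded back to Metamath Native only when a human needs to see it.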
There are extensions to this, very similar to Ghilbert, by the way, involving the "call signature" declarations.
So what we end up doing is – potentially – sacrificing some small amount of efficiency (perhaps; it depends), for the sake of a structure that is most conveniently processed in a highly parallel environment. Additionally, the use of integers instead of labels and variable names may facilitate table lookups in some cases; primarily, it provides uniformity and maximum use of the namespace – for example, 64K Type Codes can be encoded in a 2-byte unsigned integer.
--ocat 12-May-2008