public class Tokenizer
extends java.lang.Object
implements java.io.Closeable
This class (presently) provides input services for a MetaMath.mm file. Its primary purpose is to return MetaMath tokens, but if desired, the whitespace in between tokens can be obtained, as well as other information.
Note: this file validates the input file to ensure that only valid characters are present. These are the various 7-bit ASCII printable characters and white space characters documented in metamath.pdf at www.metamath.org. ANSI or DOS (cr or cr/lf) line terminators are accepted and are considered part of the white space. Invalid characters will generate a MMIOError exception (a FAR more serious problem than a mere MetaMathParseException exception ;-)
Note: Tokenizer should work with UTF-8 and other character sets that code the 7-bit ASCII characters the same way(?), but because it uses hard coded values it would need to be cloned if someone wants to be able to input full Unicode (an interface could be developed.)
Note: This class does not expand MetaMath include statements (i.e. "$[ infiniteset.mm $]"). That feat can be handled in the parsing logic -- this is just a simple, uneducated tokenizer class. Or....conveniently use the technique described in mmverify.py at www.metamath.org.
(from mmverify.py): "Before using this program, any compressed proofs must be expanded with the Metamath program, e.g.:
$ ./metamath 'r set.mm' 'sa p *' 'w s set.mm' q > /dev/null
That somewhat cryptic (unix/linux) command says 1) Read file set.mm; 2) Save all proofs in default format (uncompressed); 3) Save to disk; and 4) (?) send Metamath output commentary about what it is being asked to do to to the bit bucket...
Here's a version from a Windows ".bat" file that saves the expanded file to a different file name:
metamath.exe "r myset.mm" "v proof *" "sa p *" "w s expmyset.mm" "exit" >> expmyset.txt
Constructor and Description |
---|
Tokenizer(java.io.Reader r,
java.lang.String s)
Constructs Tokenizer from a Reader.
|
Tokenizer(java.io.Reader r,
java.lang.String s,
long nbrCharsToBypass)
Constructs Tokenizer from a Reader, with "skipahead n", where n =
nbrCharsToBypass.
|
Modifier and Type | Method and Description |
---|---|
<T extends MMJException> |
addContext(int errorFldChars,
T e) |
<T extends MMJException> |
addContext(T e) |
void |
close() |
long |
countNbrLines()
Count number of lines in input file.
|
long |
getCurrentCharNbr()
Return current character number in the file.
|
long |
getCurrentColumnNbr()
Return current column number in the current line in the file.
|
long |
getCurrentLineNbr()
Return current line number in the file.
|
java.lang.String |
getSourceId()
Get "Source Id" for the file (for use in error/diagnostic/testing
messages.)
|
int |
getToken(java.lang.StringBuilder strBuf,
int offset)
Gets next MetaMath token from the input file and stores it in
strBuf . |
int |
getWhiteSpace(java.lang.StringBuilder strBuf,
int offset)
Gets next chunk of whitespace from MetaMath file and stores it in
strBuf . |
public Tokenizer(java.io.Reader r, java.lang.String s) throws java.io.IOException
r
- Reader (may be StringReader or BufferedReader but PushbackReader
and LineNumberReader are not helpful choices.)s
- Source Id Text, such as filename or test ID. May be empty string
if N/A. Used solely for diagnostic messages.java.io.IOException
- if I/O errorpublic Tokenizer(java.io.Reader r, java.lang.String s, long nbrCharsToBypass) throws java.io.IOException, MMIOException
r
- Reader (may be StringReader or BufferedReader but PushbackReader
and LineNumberReader are not helpful choices.)s
- Source Id Text, such as filename or test ID. May be empty string
if N/A. Used solely for diagnostic messages.nbrCharsToBypass
- Used to reposition reader with previously
returned charNbr from getCurrentCharNbr
method.java.io.IOException
- if I/O errorMMIOException
- if I/O errorjava.lang.IllegalArgumentException
- if nbrCharsToBypass
is less than
zero.public int getToken(java.lang.StringBuilder strBuf, int offset) throws java.io.IOException
Gets next MetaMath token from the input file and stores it in
strBuf
.
strBuf
- the StringBuilder
where the next available MetaMath
token will be insertedoffset
- insertion point offset in strBuf for token. (If strBuf
contains 3 characters then set offset to 3 because it is
relative to zero.)java.io.IOException
- if I/O errorpublic int getWhiteSpace(java.lang.StringBuilder strBuf, int offset) throws java.io.IOException
strBuf
.strBuf
- the StringBuilder
where the next whitespace chunk
will be insertedoffset
- insertion point offset in strBuf for whitespacejava.io.IOException
- if I/O errorpublic long getCurrentLineNbr()
Current line number corresponds to the line number of the last token or whitespace chunk returned.
Line number is incremented when a character is processed AFTER end of line.
Note: "cr/lf" and "cr" are valid line terminators. Using them in combination is inadvisable. The algorithm used here differs from the Java language algorithm in that cr/cr/lf is counted as TWO line terminators instead of just one (and cr/cr/cr/lf is 3, etc.)
public long getCurrentColumnNbr()
Note: tab characters are ignored for purposes of computing column number.
public long getCurrentCharNbr()
Following getToken or getWhiteSpace this will be the character number of the last character returned in the token or whitespace.
Intended to be used with startPosition method, say, with MetaMath file include processing which closes a reader and then re-starts where it left off.
public java.lang.String getSourceId()
public void close() throws java.io.IOException
close
in interface java.io.Closeable
close
in interface java.lang.AutoCloseable
java.io.IOException
public long countNbrLines() throws java.io.IOException
This is provided for testing purposes. Delete if desired. Since rewind() is not supported, use of countNbrLines cannot be mixed with other functions -- it is a one-shot use of a newly constructed Tokenizer, and a new Tokenizer would need to be constructed for other uses.
java.io.IOException
- if I/O errorpublic <T extends MMJException> T addContext(T e)
public <T extends MMJException> T addContext(int errorFldChars, T e)