Module optlex
This module does lexer-based optimizations.
Notes:
- TODO: General string delimiter conversion optimizer.
- TODO: (numbers) warn if a number has too many significant digits.
Functions
optimize (option, toklist, semlist, toklnlist) | The main entry point. |
Local Functions
atlinestart (i) | Returns true if current token is at the start of a line. |
atlineend (i) | Returns true if current token is at the end of a line. |
commenteols (lcomment) | Counts comment EOLs inside a long comment. |
checkpair (i, j) | Compares two tokens (i, j) and returns the whitespace required. |
repack_tokens () | Repack tokens, removing deletions caused by optimization process. |
do_number (i) | Does number optimization. |
do_string (I) | Does string optimization. |
do_lstring (I) | Does long string optimization. |
do_lcomment (I) | Does long comment optimization. |
do_comment (i) | Does short comment optimization. |
keep_lcomment (opt_keep, info) | Returns true if string found in long comment. |
Functions
- optimize (option, toklist, semlist, toklnlist)
-
The main entry point.
- currently, lexer processing has 2 passes
- processing is done on a line-oriented basis, which is easier to grok due to the next point...
- since there are various options that can be enabled or disabled, processing is a little messy or convoluted
Parameters:
- option {[string]=bool,...}
- toklist {string,...}
- semlist {string,...}
- toklnlist {int,...}
Returns:
- {string,...} toklist
- {string,...} semlist
- {int,...} toklnlist
Local Functions
- atlinestart (i)
-
Returns true if current token is at the start of a line.
It skips over deleted tokens via recursion.
Parameters:
- i int
Returns:
-
bool
- atlineend (i)
-
Returns true if current token is at the end of a line.
It skips over deleted tokens via recursion.
Parameters:
- i int
Returns:
-
bool
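The recursion used by atlinestart and atlineend can be sketched as follows. This is a minimal illustration, not the module's code: it assumes the token types sit in a plain array passed in as `toks`, that a deleted token is marked by an empty string, and that `TK_EOL` and `TK_EOS` are the end-of-line and end-of-stream token names.

```lua
-- Sketch only. Assumptions: `toks` is an array of token type names,
-- "" marks a deleted token, "TK_EOL"/"TK_EOS" mark end of line/stream.
local function atlinestart(toks, i)
  local prev = toks[i - 1]
  if i <= 1 or prev == "TK_EOL" then
    return true
  elseif prev == "" then
    return atlinestart(toks, i - 1)  -- skip over the deleted token
  end
  return false
end

local function atlineend(toks, i)
  local nxt = toks[i + 1]
  if i >= #toks or nxt == "TK_EOL" or nxt == "TK_EOS" then
    return true
  elseif nxt == "" then
    return atlineend(toks, i + 1)    -- skip over the deleted token
  end
  return false
end

local toks = { "TK_NAME", "TK_EOL", "", "TK_NAME", "", "TK_EOL", "TK_EOS" }
```

Here token 4 counts as being at both the start and the end of its line, because the deleted tokens 3 and 5 are skipped.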
- commenteols (lcomment)
-
Counts comment EOLs inside a long comment.
In order to keep line numbering, EOLs need to be reinserted.
Parameters:
- lcomment string
Returns:
-
int
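A minimal sketch of such a count, assuming that CRLF and LFCR pairs each count as a single EOL (the page does not spell this out). The delimiters cannot contain an EOL, so the sketch scans the whole comment text.

```lua
-- Sketch only: counts EOLs in a long comment, treating CRLF and LFCR
-- as one EOL each (an assumption about the intended convention).
local function commenteols(lcomment)
  local count, pos = 0, 1
  while true do
    local p, _, first, second = lcomment:find("([\r\n])([\r\n]?)", pos)
    if not p then break end
    count = count + 1
    pos = p + 1
    if second ~= "" and second ~= first then
      pos = pos + 1  -- second half of a CRLF/LFCR pair, already counted
    end
  end
  return count
end
```

The caller can then reinsert that many EOL tokens when the comment is deleted, so later tokens keep their line numbers.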
- checkpair (i, j)
-
Compares two tokens (i, j) and returns the whitespace required.
See documentation for a reference table of interactions.
Only two grammar/real tokens are being considered:
- if "", no separation is needed,
- if " ", then at least one whitespace (or EOL) is required.
Note: This doesn't work at the start or the end or for EOS!
Parameters:
- i int
- j int
Returns:
- string "" or " "
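The reference table is not reproduced on this page, so the following is only a deliberately simplified stand-in that shows the shape of the return value. The name `checkpair_sketch` is hypothetical, it takes token text rather than token indices, and it covers just two cases: two word-like tokens that would fuse, and two minus signs that would start a comment. The real table handles more interactions.

```lua
-- Sketch only, NOT the module's reference table (assumption): decide from
-- the two characters that would become adjacent after joining the tokens.
local function checkpair_sketch(a, b)
  local last, first = a:sub(-1), b:sub(1, 1)
  if last:find("[%w_]") and first:find("[%w_]") then
    return " "   -- e.g. "local" and "x" would fuse into one name
  elseif last == "-" and first == "-" then
    return " "   -- "-" followed by "-" would start a comment
  end
  return ""      -- no separation needed
end
```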
- repack_tokens ()
- Repack tokens, removing deletions caused by optimization process.
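Repacking can be sketched as a single compaction pass over the three parallel lists. The sketch below is hypothetical in its details: it takes the lists as arguments and returns new ones, and it assumes a deleted token is marked by an empty string in the token list.

```lua
-- Sketch only. Assumption: "" in `toks` marks a token deleted by an
-- earlier optimization; the three lists are parallel arrays.
local function repack(toks, infos, lines)
  local dtoks, dinfos, dlines = {}, {}, {}
  for i = 1, #toks do
    if toks[i] ~= "" then
      local j = #dtoks + 1
      dtoks[j], dinfos[j], dlines[j] = toks[i], infos[i], lines[i]
    end
  end
  return dtoks, dinfos, dlines
end
```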
- do_number (i)
-
Does number optimization.
Optimization using string formatting functions is one way of doing this, but here, we consider all cases and handle them separately (possibly an idiotic approach...).
The scientific notation that is generated is not in canonical form; this may or may not be a bad thing.
Note: Intermediate portions need to fit into a normal number range.
Optimizations can be divided based on number patterns:
- hexadecimal:
  (1) no need to remove leading zeros, just skip to (2)
  (2) convert to integer if size equal or smaller
      - change if equal size -> lose the 'x' to reduce entropy
  (3) number is then processed as an integer
  (4) note: does not make 0[xX] consistent
- integer:
  (1) reduce useless fractional part, if present, e.g. 123.000 -> 123.
  (2) remove leading zeros, e.g. 000123
- float:
  (1) split into digits dot digits
  (2) if no integer portion, take as zero (can omit later)
  (3) handle degenerate .000 case, after which the fractional part must be non-zero (if zero, it's matched as float .0)
  (4) remove trailing zeros for fractional portion
  (5) p.q where p > 0 and q > 0 cannot be shortened any more
  (6) otherwise p == 0 and the form is .q, e.g. .000123
  (7) if scientific shorter, convert, e.g. .000123 -> 123e-6
- scientific:
  (1) split into (digits dot digits) [eE] ([+-] digits)
  (2) if significand is zero, just use .0
  (3) remove leading zeros for significand
  (4) shift out trailing zeros for significand
  (5) examine exponent and determine which format is best: number with fraction, or scientific
Note: Numbers with a fraction and scientific numbers are never converted to integers, because Lua 5.3 distinguishes between integers and floats.
Parameters:
- i int
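As an illustration of the float case where the integer portion is zero, here is a minimal sketch. The helper name `shorten_fraction` is hypothetical, it works on the number's text only, and it assumes "shorter" means strictly fewer characters.

```lua
-- Sketch only: handles numbers of the form .q or 0.q, leaves others alone.
local function shorten_fraction(num)
  local q = num:match("^0*%.(%d+)$")
  if not q then return num end         -- integer portion is not zero
  q = q:gsub("0+$", "")                -- remove trailing zeros
  if q == "" then return ".0" end      -- degenerate .000 case
  local sig = q:gsub("^0+", "")        -- significant digits of the fraction
  local plain = "." .. q
  local sci = sig .. "e-" .. #q        -- every digit of q shifts the exponent
  if #sci < #plain then return sci end -- convert only if strictly shorter
  return plain
end
```

With this sketch, `.000123` becomes `123e-6`, while `0.5000` becomes `.5` because the scientific form `5e-1` would be longer.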
- do_string (I)
-
Does string optimization.
Note: It works on well-formed strings only!
Optimizations on characters can be summarized as follows:
  \a\b\f\n\r\t\v  -- no change
  \\              -- no change
  \"\'            -- depends on delim, other can remove \
  \[\]            -- remove \
  \<char>         -- general escape, remove \ (Lua 5.1 only)
  \<eol>          -- normalize the EOL only
  \ddd            -- if \a\b\f\n\r\t\v, change to latter
                     if other < ascii 32, keep ddd but zap leading zeros
                       but cannot have following digits
                     if >= ascii 32, translate it into the literal, then
                       also do escapes for \\,\",\' cases
  <other>         -- no change
Switch delimiters if string becomes shorter.
Parameters:
- I int
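The \ddd rules can be sketched as a small scanner over the string body (the text between the delimiters). This is an illustration under assumptions, not the module's code: the helper name `shorten_decimal_escapes` is hypothetical, the string is assumed well-formed, and only the backslash and the active delimiter are re-escaped after translation.

```lua
-- Sketch only. `body` is the string without its quotes, `delim` is the
-- quote character in use. Other escapes are copied through untouched.
local named = { [7] = "\\a", [8] = "\\b", [9] = "\\t", [10] = "\\n",
                [11] = "\\v", [12] = "\\f", [13] = "\\r" }

local function shorten_decimal_escapes(body, delim)
  local out, i = {}, 1
  while i <= #body do
    local ddd = body:match("^\\(%d%d?%d?)", i)
    if ddd then
      local code = tonumber(ddd)
      i = i + 1 + #ddd
      local piece
      if named[code] then
        piece = named[code]               -- e.g. \010 -> \n
      elseif code < 32 then
        -- a following digit would be absorbed into the escape, so the
        -- original digits are kept in that case
        piece = body:find("^%d", i) and ("\\" .. ddd) or ("\\" .. code)
      else
        piece = string.char(code)         -- translate into the literal
        if piece == "\\" or piece == delim then
          piece = "\\" .. piece
        end
      end
      out[#out + 1] = piece
    elseif body:sub(i, i) == "\\" then
      out[#out + 1] = body:sub(i, i + 1)  -- some other escape, keep the pair
      i = i + 2
    else
      out[#out + 1] = body:sub(i, i)
      i = i + 1
    end
  end
  return table.concat(out)
end
```

Copying other escapes as a pair matters: it stops the digits after an escaped backslash from being misread as a \ddd escape.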
- do_lstring (I)
-
Does long string optimization.
- remove first optional newline
- normalize embedded newlines
- reduce '=' separators in delimiters if possible
Note: A warning is flagged if trailing whitespace is found; the whitespace is not trimmed.
Parameters:
- I int
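The three steps can be sketched on the text of a long string as follows. The helper name `shrink_lstring` is hypothetical, and two details are assumptions of this sketch rather than documented behaviour: the first newline is dropped only when the next character is not itself a newline (otherwise the string's value would change), and a lower '=' level is accepted only if the closing delimiter cannot match early.

```lua
-- Sketch only: operates on the full literal, e.g. "[==[ ... ]==]".
local function shrink_lstring(lstring)
  local eq, body = lstring:match("^%[(=*)%[(.*)%]%1%]$")
  if not eq then return lstring end
  -- (1) remove the first optional newline, if that is safe
  local first = body:match("^\r\n") or body:match("^\n\r")
                or body:match("^[\r\n]")
  if first and not body:find("^[\r\n]", #first + 1) then
    body = body:sub(#first + 1)
  end
  -- (2) normalize embedded newlines: CRLF, LFCR, CR and LF all become LF
  body = body:gsub("([\r\n])([\r\n]?)", function(a, b)
    if b ~= "" and a == b then return "\n\n" end  -- two separate EOLs
    return "\n"
  end)
  -- (3) use the lowest '=' level whose closer first matches at the very end
  for n = 0, #eq - 1 do
    local close = "]" .. ("="):rep(n) .. "]"
    if (body .. close):find(close, 1, true) == #body + 1 then
      return "[" .. ("="):rep(n) .. "[" .. body .. close
    end
  end
  return "[" .. eq .. "[" .. body .. "]" .. eq .. "]"
end
```

Testing the closer against `body .. close` rather than `body` alone also catches a body that ends in `]`, which would otherwise close the string one character early.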
- do_lcomment (I)
-
Does long comment optimization.
- trim trailing whitespace
- normalize embedded newlines
- reduce '=' separators in delimiters if possible
Note: It does not remove first optional newline.
Parameters:
- I int
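Trimming trailing whitespace inside a long comment has to leave the EOLs in place. A minimal sketch, assuming only spaces and tabs count as trailing whitespace and that the helper (hypothetically named `trim_lcomment_lines`) receives the comment text without its delimiters:

```lua
-- Sketch only: strips spaces and tabs before each EOL and at the end of
-- the text, keeping the EOL characters themselves.
local function trim_lcomment_lines(body)
  body = body:gsub("[ \t]+([\r\n])", "%1")
  body = body:gsub("[ \t]+$", "")
  return body
end
```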
- do_comment (i)
-
Does short comment optimization.
- trim trailing whitespace
Parameters:
- i int
- keep_lcomment (opt_keep, info)
-
Returns true if string found in long comment.
This is a feature to keep copyright or license texts.
Parameters:
- opt_keep bool
- info string
Returns:
-
bool
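A minimal sketch of such a check. It assumes that `opt_keep` carries the text to look for, or is false/nil when the feature is off, which the parameter list above does not spell out, and that the match is a plain substring search rather than a pattern match.

```lua
-- Sketch only. Assumptions: `opt_keep` is the text to search for (or a
-- false value), `info` is the full long comment including delimiters.
local function keep_lcomment(opt_keep, info)
  if not opt_keep then return false end
  local open = info:match("^%-%-%[=*%[")
  if not open then return false end
  -- the closing delimiter is two characters shorter than the opening one
  local body = info:sub(#open + 1, -(#open - 1))
  return body:find(opt_keep, 1, true) ~= nil
end
```

A plain search keeps characters such as `(` or `-` in a copyright line from being interpreted as pattern syntax.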