Here's a stable version of the x86 backend. Highlights of the changes
include:
src/mlton/main/main.sml
src/mlton/control/control.sig
src/mlton/control/control.sml
-> added native-optimize control
-> commented out "cannot use -p and -native"
src/mlton/backend/x86-codegen
-> Overflow checking
-> more verbose with -v
-> changed a number of "val _ = if ... then () else Error.bug" checks to
Assert.assert; most of these guarded things that hadn't been triggered in
months:
some sanity checking of the arguments to prims
the entire x86-validate phase
-> native-optimize n
a setting of 0 skips all but the essential parts of the x86-simplify
phase (i.e., it just adds block headers and footers)
a setting of 1 is the default "one-pass" simplification
a setting > 1 basically loops on the simplification until no
individual optimization causes a change
Most of the time, a setting > 1 doesn't make any difference at all.
However, if the elimDeadDsts peephole succeeds, we can sometimes eliminate
all of the code in a block. When that happens, the optimization that
replaces a jump to an empty block with a jump to the final destination
can then succeed, and this can cascade into some additional optimizations.
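To make the three native-optimize settings concrete, here is a minimal Python sketch of the driver loop. It is not the SML code in x86-simplify: the block representation, the "# dead" marker, and the two toy passes (elim_dead_dsts, thread_jumps) are stand-ins invented for illustration, and the essential header/footer work that still runs at setting 0 is not modelled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a block is (instructions, jump_target),
# and a program maps labels to blocks.
Block = Tuple[List[str], Optional[str]]
Program = Dict[str, Block]
Pass = Callable[[Program], bool]  # returns True if it changed anything


def elim_dead_dsts(prog: Program) -> bool:
    """Toy stand-in: drop instructions tagged as writing a dead destination."""
    changed = False
    for label, (insns, target) in list(prog.items()):
        live = [i for i in insns if not i.endswith("# dead")]
        if len(live) != len(insns):
            prog[label] = (live, target)
            changed = True
    return changed


def thread_jumps(prog: Program) -> bool:
    """Toy stand-in: retarget a jump to an empty block at its final destination."""
    changed = False
    for label, (insns, target) in list(prog.items()):
        if target in prog:
            t_insns, t_target = prog[target]
            if not t_insns and t_target is not None and t_target != target:
                prog[label] = (insns, t_target)
                changed = True
    return changed


def simplify(prog: Program, passes: List[Pass], native_optimize: int) -> int:
    """Run the optional passes according to the setting; return rounds run.

    0   -> skip the optional passes entirely
    1   -> exactly one round over all passes
    > 1 -> repeat rounds until no individual pass reports a change
    """
    if native_optimize <= 0:
        return 0
    rounds = 0
    while True:
        rounds += 1
        changed = False
        for run_pass in passes:
            # Call every pass each round; do not short-circuit.
            if run_pass(prog):
                changed = True
        if native_optimize == 1 or not changed:
            return rounds
```

With the passes ordered [thread_jumps, elim_dead_dsts] and a block B whose only instruction is dead, a setting of 1 empties B but leaves jumps into B alone; a setting of 2 runs another round in which thread_jumps retargets those jumps past B, which is the cascade described above.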
-> bug fixes for some register allocation problems
-> a new peephole functor
This is just a little bit slower than the old one, but it's more flexible
in matching patterns and also has the advantage of being able to
process multiple optimizations in a single pass. A little more
tweaking of which optimizations are combined with which others should gain
a little more speed.
-> profiling support
Here's a random example of the profiling labels:
.local MLtonProfile44$$0.concat_0$$1.L_254$$Begin
MLtonProfile44$$0.concat_0$$1.L_254$$Begin:
.p2align 2
L_254:
movl (16*1)(%edi),%eax
movl %eax,(4*1)(%edi)
jmp *(%edi)
.local MLtonProfile44$$0.concat_0$$1.L_254$$End
MLtonProfile44$$0.concat_0$$1.L_254$$End:
The breakdown is as follows:
MLtonProfileNNN <-- all profiling labels start like this;
                    the NNN is a unique number added to ensure no
                    label name duplication
0.concat_0      <-- a "level 0" profile label;
                    corresponds to the string in the profileName field
                    of the MachineOutput.Block.t datatype;
                    should roughly correspond to a function in the .cps
1.L_254         <-- a "level 1" profile label;
                    corresponds to a block that got created at translation
                    time. Most of the time, as in the example above,
                    this will correspond to an actual label in the
                    assembly. Sometimes, however, the label won't appear,
                    because the code falls through to that block.
Begin/End       <-- enclosing pairs.
It should be the case that every line of assembly is contained within a
unique MLtonProfile...Begin/End pair.
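For anyone writing a tool that consumes these labels, the breakdown above can be read as a small grammar: fields separated by "$$", a numbered prefix, one "level.name" field per level, and a trailing Begin or End. A minimal Python sketch of a parser follows. The names ProfileLabel and parse_profile_label are made up here, and the sketch assumes that "$$" never occurs inside a level name, which the example suggests but nothing above guarantees.

```python
import re
from typing import List, NamedTuple, Tuple


class ProfileLabel(NamedTuple):
    unique_id: int                 # the NNN after "MLtonProfile"
    levels: List[Tuple[int, str]]  # (level number, name) pairs, in order
    kind: str                      # "Begin" or "End"


_PREFIX = re.compile(r"MLtonProfile(\d+)\Z")


def parse_profile_label(label: str) -> ProfileLabel:
    """Split a profiling label into its prefix number, levels, and kind."""
    parts = label.split("$$")
    if len(parts) < 3:
        raise ValueError("not a profiling label: %r" % label)

    prefix = _PREFIX.match(parts[0])
    if prefix is None:
        raise ValueError("bad prefix in %r" % label)

    kind = parts[-1]
    if kind not in ("Begin", "End"):
        raise ValueError("bad Begin/End marker in %r" % label)

    levels = []
    for field in parts[1:-1]:
        # Each field looks like "0.concat_0": level number, dot, name.
        level, dot, name = field.partition(".")
        if not dot or not level.isdigit():
            raise ValueError("bad level field %r in %r" % (field, label))
        levels.append((int(level), name))

    return ProfileLabel(int(prefix.group(1)), levels, kind)
```

Applied to the example label MLtonProfile44$$0.concat_0$$1.L_254$$Begin, this yields unique_id 44, levels [(0, "concat_0"), (1, "L_254")], and kind "Begin". Pairing each Begin with the End that carries the same number and levels then gives the address range that a profiler would attribute samples to.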
I think that the level 0 profiling will be the most useful. Level 1 might
be useful, but I'm not sure. Sometimes a block of code ends up with two
Level 1 labels, because two blocks got combined. I don't think this
should ever happen with level 0 (it would mean that there was a place in
one cps function that automatically fell through to a place in another cps
function, which doesn't sound right to me).
The support for profiling is flexible enough to add additional levels with
more information. For example, it might be useful to add a level 2 with
the string "runtime" to surround all the blocks corresponding to limit
checks and array initialization (maybe?). That would allow one to easily
combine all the runtime overhead. Likewise, I could add in stuff to give
all the blocks in one chunk a level/name pair that would allow aggregation
at the chunk level, although, again, I don't know if that would be very
useful.
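As a rough illustration of why extra levels make aggregation easy, here is a Python sketch that sums per-block costs by the name attached at a chosen level. Everything in it is hypothetical: the tick counts, the labels other than concat_0 and L_254, and the level 2 "runtime" tag, which is only a proposal above and does not exist in the backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One sample: ({level number: name}, ticks attributed to that block).
Sample = Tuple[Dict[int, str], int]


def aggregate(samples: Iterable[Sample], level: int) -> Dict[str, int]:
    """Sum ticks over all blocks, grouped by their name at `level`.

    Blocks that carry no label at that level are left out of the totals.
    """
    totals: Dict[str, int] = defaultdict(int)
    for levels, ticks in samples:
        name = levels.get(level)
        if name is not None:
            totals[name] += ticks
    return dict(totals)


# Made-up data: three blocks, two of them tagged with a level 2 "runtime".
samples = [
    ({0: "concat_0", 1: "L_254"}, 3),
    ({0: "concat_0", 1: "L_255", 2: "runtime"}, 5),
    ({0: "main_0", 1: "L_12", 2: "runtime"}, 2),
]

per_function = aggregate(samples, 0)  # totals keyed by level 0 name
overhead = aggregate(samples, 2)      # totals keyed by level 2 name
```

The same function would serve for a chunk-level name: adding a level only adds one more key to group by, with no change to the aggregation itself.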
I think that's about it, besides various other bug fixes. Now that I
have Steve's recent changes, I want to go look over some output and see
about turning off some of the optimizations and seeing what really helps.
For example, all of those algebraic identities in the cps-shrinker have
eliminated almost all of the occurrences of the corresponding peephole
optimizations. I've just been looking at the counters at the end of the
compilation (now appearing in the -v compiles), so I need to check out
what's really being triggered and determine if they are things that should
be propagated back to the cps-shrinker or if they really have a place in
the backend. There are some other interactions that I want to check out,
but nothing that should correspond to any major code changes.