new liveness pass

Top Pagina
Bijlagen:
Bericht als e-mail
+ (text/plain)
Delete this message
Reply to this message
Auteur: Stephen Weeks
Datum:  
Aan: MLton
Onderwerp: new liveness pass

Most excellent. The new liveness pass that I wrote sped up allocate registers
by quite a bit. For a self compile, it cut down the time from 84 seconds to 11
seconds. The pass is based on the liveness algorithm described in section 4.13,
page 132, of Morgan's "Building and Optimizing Compiler". BTW, the Dragon book
and Muchnick's book provided no help at all on speeding up liveness. They
suggest using bit-vectors, which is infeasible for MLton due to the large size
of CPS functions.

Anyways, here is a description of the new algorithm.

Walk over the whole program and
 1. Build the predecessor graph of basic blocks.  Each basic block records the
    set of its predecessors and the set of variables live at the beginning of
    the block.
 2. For each variable record the block in which is defined and the list of
    blocks where it is used.


Now, for each variable, propagate the liveness information backwards from uses
along basic blocks until the definition block is reached.

That's it. The reason why it's so fast is that it processes one variable at a
time, and hence the operation to determine if that variable is in the live list
for a particular block is constant time -- the variable is either at the head of
the list or it's not there.

Matthew, maybe you can use some variant of this algorithm, if we don't figure
out how to avoid doing liveness twice altogether.

Also, here is the log for a self compile with the new pass. The self compile
time is down to 765 seconds. This is the first time we're under 800! The funny
thing is, the last self compile was at 948 seconds, so the difference can not be
explained by this pass alone. The (unchanged) x86 codegen sped up by 59 seconds!
My only conjecture is that the backend is leaving around some property lists on
labels, which is slowing down the codegen. The new liveness pass doesn't
leave quite as many around, so maybe that was helping. I'll look into making
sure the backend cleans up after itself.

% cat /proc/cpuinfo
processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 8
model name    : Pentium III (Coppermine)
stepping    : 3
cpu MHz        : 731.480093
cache size    : 256 KB
fdiv_bug    : no
hlt_bug        : no
sep_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 2
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips    : 729.09


Compiling mlton (takes a while)
time mlton -v -no-polyvariance mlton.cm
MLton 20001130 (built Fri Dec  1 15:47:44 2000 on starlinux.epr.com)
  created this file on Fri Dec  1 15:49:53 2000.
Do not edit this file.
Flag settings: 
   aux: false
   chunk: coalesce 2000
   contify strategy: Both
   defines: [NODEBUG,MLton_safe=TRUE,MLton_detectOverflow=TRUE]
   fixed heap: None
   indentation: 3
   includes: [mlton.h]
   inline: NonRecursive {product = 320,small = 60}
   input file: mlton.cm
   instrument: false
   instrument Sxml: false
   keep Cps: false
   match: left to right
   messages: true
   native: true
   native-commented: 0
   native-copy-prop: true
   native-ieee-fp: false
   native-move-hoist: true
   native-optimize: 1
   native-split: Some(100000)
   polyvariance: None
   print at fun entry: false
   profile: false
   show types: false
compile starting
   parse and elaborate starting
   parse and elaborate finished in 54.500
   core-ml size is 89,907,652 bytes
   numPeeks = 14
   average position in property list = 0.0
   numPeeks = 2450808
   average position in bucket = 0.221
   dead starting
   dead finished in 0.090
   basis size is 867,352 bytes
   numPeeks = 72743
   average position in property list = 0.0
   numPeeks = 2450808
   average position in bucket = 0.221
   size = 187411
   gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileuOZ0Ia /tmp/fileuiNTnF.c -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
   /tmp/fileuOZ0Ia /tmp/fileeaXAYV
   infer starting
      unification starting
      unification finished in 1.570
      finish infer starting
      finish infer finished in 20.950
   infer finished in 23.290
   xml.unsimplified size is 37,952,500 bytes
   numPeeks = 1077855
   average position in property list = 0.000
   numPeeks = 2589647
   average position in bucket = 0.271
   infer simplify starting
   infer simplify finished in 3.510
   xml size is 22,331,948 bytes
   numPeeks = 4570806
   average position in property list = 0.546
   numPeeks = 2589647
   average position in bucket = 0.271
   size = 122292
   num types in program = 20501
   num types in table = 35249
   hash table size is 0 bytes
   mono starting
   mono finished in 5.540
   mono.unsimplified size is 42,350,988 bytes
   numPeeks = 9074433
   average position in property list = 0.275
   numPeeks = 3306823
   average position in bucket = 0.601
   mono simplify starting
   mono simplify finished in 5.740
   mono size is 34,543,836 bytes
   numPeeks = 13789800
   average position in property list = 0.428
   numPeeks = 3306823
   average position in bucket = 0.601
   size = 191119
   num types in program = 12946
   num types in table = 64373
   hash table size is 0 bytes
   implement exceptions starting
   implement exceptions finished in 0.320
   sxml.unsimplified size is 35,101,760 bytes
   numPeeks = 14086561
   average position in property list = 0.419
   numPeeks = 3309224
   average position in bucket = 0.602
   implement exceptions simplify starting
   implement exceptions simplify finished in 2.970
   sxml size is 33,016,448 bytes
   numPeeks = 17578537
   average position in property list = 0.470
   numPeeks = 3309224
   average position in bucket = 0.602
   polyvariance starting
   polyvariance finished in 0.0
   sxml.poly size is 33,016,448 bytes
   numPeeks = 17578537
   average position in property list = 0.470
   numPeeks = 3309224
   average position in bucket = 0.602
   size = 178729
   num types in program = 12513
   num types in table = 64653
   hash table size is 0 bytes
   closure convert starting
      flow analysis starting
      flow analysis finished in 0.760
      flow size is 4,252 bytes
      numPeeks = 18621855
      average position in property list = 0.443
      numPeeks = 3325768
      average position in bucket = 0.605
      free variables starting
      free variables finished in 0.380
      globalize starting
      globalize finished in 0.290
      convert starting
      convert finished in 32.310
   closure convert finished in 34.180
   cps.unsimplified size is 69,048,712 bytes
   numPeeks = 25297174
   average position in property list = 1.679
   numPeeks = 3685461
   average position in bucket = 0.672
   closure convert simplify starting
      simplify starting
     num functions 12069
     num local functions 138606
     num primExps 156489
     numPeeks = 25548537
     average position in property list = 1.662
     numPeeks = 3685461
     average position in bucket = 0.672
     remove-unused starting
     remove-unused finished in 4.440
     num functions 10422
     num local functions 80482
     num primExps 139488
     numPeeks = 28693549
     average position in property list = 1.483
     numPeeks = 3685786
     average position in bucket = 0.672
     leaf-inline starting
        inline starting
        inline finished in 4.020
     leaf-inline finished in 4.020
     num functions 7864
     num local functions 57906
     num primExps 138322
     numPeeks = 30304546
     average position in property list = 1.411
     numPeeks = 3685786
     average position in bucket = 0.672
     raise-to-jump starting
        inferHandlers starting
        inferHandlers finished in 0.160
     raise-to-jump finished in 4.650
     num functions 7864
     num local functions 57524
     num primExps 138295
     numPeeks = 31724934
     average position in property list = 1.357
     numPeeks = 3685786
     average position in bucket = 0.672
     contify starting
     contify finished in 3.110
     num functions 3668
     num local functions 54020
     num primExps 130444
     numPeeks = 32978353
     average position in property list = 1.312
     numPeeks = 3689431
     average position in bucket = 0.674
     constant propagation starting
        inferHandlers starting
        inferHandlers finished in 0.140
        fixed point starting
        fixed point finished in 3.620
     constant propagation finished in 7.940
     num functions 3668
     num local functions 53397
     num primExps 95704
     numPeeks = 35346094
     average position in property list = 1.244
     numPeeks = 3823941
     average position in bucket = 0.823
     useless starting
        analyze starting
        analyze finished in 5.060
     useless finished in 9.970
     num functions 3668
     num local functions 50915
     num primExps 87502
     numPeeks = 37873441
     average position in property list = 1.186
     numPeeks = 3961408
     average position in bucket = 0.838
     remove-unused starting
     remove-unused finished in 2.550
     num functions 3606
     num local functions 49804
     num primExps 85347
     numPeeks = 39588129
     average position in property list = 1.136
     numPeeks = 3961500
     average position in bucket = 0.838
     simplify-types starting
        fixed point starting
        fixed point finished in 0.060
     simplify-types finished in 2.950
     num functions 3606
     num local functions 41655
     num primExps 82150
     numPeeks = 41543485
     average position in property list = 1.095
     numPeeks = 3966951
     average position in bucket = 0.838
     poly-equal starting
     poly-equal finished in 0.170
     num functions 3618
     num local functions 42299
     num primExps 82663
     numPeeks = 41667937
     average position in property list = 1.091
     numPeeks = 3967620
     average position in bucket = 0.838
     contify starting
     contify finished in 2.140
     num functions 3520
     num local functions 42285
     num primExps 82537
     numPeeks = 42529380
     average position in property list = 1.072
     numPeeks = 3967718
     average position in bucket = 0.838
     inline starting
     inline finished in 4.390
     num functions 978
     num local functions 67568
     num primExps 134281
     numPeeks = 44670916
     average position in property list = 1.026
     numPeeks = 3967718
     average position in bucket = 0.838
     remove-unused starting
     remove-unused finished in 0.860
     num functions 978
     num local functions 64888
     num primExps 133203
     numPeeks = 47282825
     average position in property list = 0.971
     numPeeks = 3967731
     average position in bucket = 0.838
     raise-to-jump starting
        inferHandlers starting
        inferHandlers finished in 0.170
     raise-to-jump finished in 3.440
     num functions 978
     num local functions 64832
     num primExps 133178
     numPeeks = 48835397
     average position in property list = 0.944
     numPeeks = 3967731
     average position in bucket = 0.838
     contify starting
     contify finished in 3.080
     num functions 977
     num local functions 64830
     num primExps 133176
     numPeeks = 50122736
     average position in property list = 0.923
     numPeeks = 3967732
     average position in bucket = 0.838
     introduce-loops starting
     introduce-loops finished in 0.050
     num functions 977
     num local functions 64854
     num primExps 133176
     numPeeks = 50297035
     average position in property list = 0.920
     numPeeks = 3967756
     average position in bucket = 0.838
     loop-invariant starting
     loop-invariant finished in 3.010
     num functions 977
     num local functions 62335
     num primExps 126505
     numPeeks = 51676987
     average position in property list = 0.899
     numPeeks = 3967756
     average position in bucket = 0.838
     flatten starting
        analyze starting
        analyze finished in 0.150
     flatten finished in 3.700
     num functions 977
     num local functions 62413
     num primExps 84082
     numPeeks = 53677400
     average position in property list = 0.871
     numPeeks = 3970447
     average position in bucket = 0.838
     redundant starting
     redundant finished in 0.770
     num functions 977
     num local functions 62413
     num primExps 84082
     numPeeks = 54394802
     average position in property list = 0.861
     numPeeks = 3970447
     average position in bucket = 0.838
     remove-unused starting
     remove-unused finished in 2.900
     num functions 977
     num local functions 62119
     num primExps 82433
     numPeeks = 56609014
     average position in property list = 0.829
     numPeeks = 3970460
     average position in bucket = 0.838
      simplify finished in 74.090
   closure convert simplify finished in 74.090
   cps size is 49,689,064 bytes
   numPeeks = 56609014
   average position in property list = 0.829
   numPeeks = 3970460
   average position in bucket = 0.838
   backend starting
      compute representations starting
      compute representations finished in 0.030
      inferHandlers starting
      inferHandlers finished in 0.140
      chunkify starting
      chunkify finished in 2.810
      allocate registers starting
      allocate registers finished in 10.570
      reg size is 4,420 bytes
      numPeeks = 67004509
      average position in property list = 0.916
      numPeeks = 3970460
      average position in bucket = 0.838
   backend finished in 15.630
    size is 59,136,316 bytes
   numPeeks = 68683322
   average position in property list = 0.923
   numPeeks = 3986796
   average position in bucket = 0.838
   x86 code gen starting
      outputC starting
      outputC finished in 0.440
      outputAssembly starting
     translateChunk totals 18.250
     simplify totals 103.800
        verifyLiveInfo totals 31.100
        computeJumpInfo totals 2.680
        elimGoto totals 13.200
           elimIff: 2
           elimSwitch: 17
           elimSimpleGoto totals 2.660
           elimComplexGoto totals 3.980
        verifyJumpInfo totals 1.350
        peepholeBlock_pre totals 2.030
           commuteBinALMD: 476
           elimBinAL0L: 0
           elimBinAL0R: 0
           elimAddSub1: 1787
           elimMDPow2: 183
        toLivenessBlock totals 9.820
        moveHoist totals 14.080
        peepholeLivenessBlock totals 9.840
           elimALCopy: 17039
           elimFltACopy: 20
           elimDeadDsts: 92
           elimSelfMove: 1076
           elimFltSelfMove: 0
           commuteBinALMD: 1059
           commuteFltBinA: 17
           conditionalJump: 3249
        copyPropagate totals 5.190
        peepholeLivenessBlock_minor totals 4.920
           elimDeadDsts_minor: 0
           elimSelfMove_minor: 0
           elimFltSelfMove_minor: 0
        verifyLivenessBlock totals 0.0
        toBlock totals 0.530
        peepholeBlock_post totals 4.930
           elimBinALMDouble: 33
           elimFltBinADouble: 0
           elimCMPTST: 0
        generateTransfers totals 2.290
     allocateRegisters totals 322.390
        toLiveness totals 133.150
        toNoLiveness totals 0.010
        Assembly.allocateRegisters totals 188.560
           Instruction.allocateRegisters totals 98.020
          pre totals 22.920
          post totals 34.130
          allocateOperand totals 24.600
          allocateFltOperand totals 0.0
          allocateFltStackOperands totals 0.0
           Directive.allocateRegisters totals 13.560
     validate totals 0.0
      outputAssembly finished in 452.100
   x86 code gen finished in 496.480
   numPeeks = 73619953
   average position in property list = 0.925
   numPeeks = 4048821
   average position in bucket = 0.841
compile finished in 737.110
gcc -S -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebOiKTi.s /tmp/fileW6a8tX.c
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileVuChSy.o /tmp/filebOiKTi.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filesavxTO.o /tmp/filemC95zY.9.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexn25xG.o /tmp/fileBXQlhQ.8.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileNUFuxn.o /tmp/fileDYOgyZ.7.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexEGwPJ.o /tmp/fileRJxRHU.6.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileWxcryO.o /tmp/fileDhBbVB.5.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileOq5dTS.o /tmp/filey4L0Da.4.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebdLNzJ.o /tmp/filelifDPe.3.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileABQjvN.o /tmp/fileYyPBMd.2.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filer5RlPz.o /tmp/filemVttax.1.s
gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileIn50AH.o /tmp/file4EVnXu.0.s
gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o mlton /tmp/fileVuChSy.o /tmp/fileIn50AH.o /tmp/filer5RlPz.o /tmp/fileABQjvN.o /tmp/filebdLNzJ.o /tmp/fileOq5dTS.o /tmp/fileWxcryO.o /tmp/filexEGwPJ.o /tmp/fileNUFuxn.o /tmp/filexn25xG.o /tmp/filesavxTO.o -L/home/sweeks/mlton/lib -lmlton -lm -lgmp
size mlton
757.02user 7.94system 12:47.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (19269major+397606minor)pagefaults 0swaps
   text       data        bss        dec        hex    filename
3703961     646120      23272    4373353     42bb69    mlton