Most excellent. The new liveness pass that I wrote sped up allocate registers by quite a bit. For a self compile, it cut down the time from 84 seconds to 11 seconds. The pass is based on the liveness algorithm described in section 4.13, page 132, of Morgan's "Building and Optimizing Compiler". BTW, the Dragon book and Muchnick's book provided no help at all on speeding up liveness. They suggest using bit-vectors, which is infeasible for MLton due to the large size of CPS functions. Anyways, here is a description of the new algorithm. Walk over the whole program and 1. Build the predecessor graph of basic blocks. Each basic block records the set of its predecessors and the set of variables live at the beginning of the block. 2. For each variable record the block in which is defined and the list of blocks where it is used. Now, for each variable, propagate the liveness information backwards from uses along basic blocks until the definition block is reached. That's it. The reason why it's so fast is that it processes one variable at a time, and hence the operation to determine if that variable is in the live list for a particular block is constant time -- the variable is either at the head of the list or it's not there. Matthew, maybe you can use some variant of this algorithm, if we don't figure out how to avoid doing liveness twice altogether. Also, here is the log for a self compile with the new pass. The self compile time is down to 765 seconds. This is the first time we're under 800! The funny thing is, the last self compile was at 948 seconds, so the difference can not be explained by this pass alone. The (unchanged) x86 codegen sped up by 59 seconds! My only conjecture is that the backend is leaving around some property lists on labels, which is slowing down the codegen. The new liveness pass doesn't leave quite as many around, so maybe that was helping. I'll look into making sure the backend cleans up after itself. % cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 8 model name : Pentium III (Coppermine) stepping : 3 cpu MHz : 731.480093 cache size : 256 KB fdiv_bug : no hlt_bug : no sep_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm bogomips : 729.09 Compiling mlton (takes a while) time mlton -v -no-polyvariance mlton.cm MLton 20001130 (built Fri Dec 1 15:47:44 2000 on starlinux.epr.com) created this file on Fri Dec 1 15:49:53 2000. Do not edit this file. Flag settings: aux: false chunk: coalesce 2000 contify strategy: Both defines: [NODEBUG,MLton_safe=TRUE,MLton_detectOverflow=TRUE] fixed heap: None indentation: 3 includes: [mlton.h] inline: NonRecursive {product = 320,small = 60} input file: mlton.cm instrument: false instrument Sxml: false keep Cps: false match: left to right messages: true native: true native-commented: 0 native-copy-prop: true native-ieee-fp: false native-move-hoist: true native-optimize: 1 native-split: Some(100000) polyvariance: None print at fun entry: false profile: false show types: false compile starting parse and elaborate starting parse and elaborate finished in 54.500 core-ml size is 89,907,652 bytes numPeeks = 14 average position in property list = 0.0 numPeeks = 2450808 average position in bucket = 0.221 dead starting dead finished in 0.090 basis size is 867,352 bytes numPeeks = 72743 average position in property list = 0.0 numPeeks = 2450808 average position in bucket = 0.221 size = 187411 gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileuOZ0Ia /tmp/fileuiNTnF.c -L/home/sweeks/mlton/lib -lmlton -lm -lgmp /tmp/fileuOZ0Ia /tmp/fileeaXAYV infer starting unification starting unification finished in 1.570 finish infer starting finish infer finished in 20.950 infer finished in 23.290 xml.unsimplified size is 37,952,500 bytes numPeeks = 1077855 average position in property list = 0.000 numPeeks = 2589647 average position in bucket = 0.271 infer simplify starting infer simplify finished in 3.510 xml size is 22,331,948 bytes numPeeks = 4570806 average position in property list = 0.546 numPeeks = 2589647 average position in bucket = 0.271 size = 122292 num types in program = 20501 num types in table = 35249 hash table size is 0 bytes mono starting mono finished in 5.540 mono.unsimplified size is 42,350,988 bytes numPeeks = 9074433 average position in property list = 0.275 numPeeks = 3306823 average position in bucket = 0.601 mono simplify starting mono simplify finished in 5.740 mono size is 34,543,836 bytes numPeeks = 13789800 average position in property list = 0.428 numPeeks = 3306823 average position in bucket = 0.601 size = 191119 num types in program = 12946 num types in table = 64373 hash table size is 0 bytes implement exceptions starting implement exceptions finished in 0.320 sxml.unsimplified size is 35,101,760 bytes numPeeks = 14086561 average position in property list = 0.419 numPeeks = 3309224 average position in bucket = 0.602 implement exceptions simplify starting implement exceptions simplify finished in 2.970 sxml size is 33,016,448 bytes numPeeks = 17578537 average position in property list = 0.470 numPeeks = 3309224 average position in bucket = 0.602 polyvariance starting polyvariance finished in 0.0 sxml.poly size is 33,016,448 bytes numPeeks = 17578537 average position in property list = 0.470 numPeeks = 3309224 average position in bucket = 0.602 size = 178729 num types in program = 12513 num types in table = 64653 hash table size is 0 bytes closure convert starting flow analysis starting flow analysis finished in 0.760 flow size is 4,252 bytes numPeeks = 18621855 average position in property list = 0.443 numPeeks = 3325768 average position in bucket = 0.605 free variables starting free variables finished in 0.380 globalize starting globalize finished in 0.290 convert starting convert finished in 32.310 closure convert finished in 34.180 cps.unsimplified size is 69,048,712 bytes numPeeks = 25297174 average position in property list = 1.679 numPeeks = 3685461 average position in bucket = 0.672 closure convert simplify starting simplify starting num functions 12069 num local functions 138606 num primExps 156489 numPeeks = 25548537 average position in property list = 1.662 numPeeks = 3685461 average position in bucket = 0.672 remove-unused starting remove-unused finished in 4.440 num functions 10422 num local functions 80482 num primExps 139488 numPeeks = 28693549 average position in property list = 1.483 numPeeks = 3685786 average position in bucket = 0.672 leaf-inline starting inline starting inline finished in 4.020 leaf-inline finished in 4.020 num functions 7864 num local functions 57906 num primExps 138322 numPeeks = 30304546 average position in property list = 1.411 numPeeks = 3685786 average position in bucket = 0.672 raise-to-jump starting inferHandlers starting inferHandlers finished in 0.160 raise-to-jump finished in 4.650 num functions 7864 num local functions 57524 num primExps 138295 numPeeks = 31724934 average position in property list = 1.357 numPeeks = 3685786 average position in bucket = 0.672 contify starting contify finished in 3.110 num functions 3668 num local functions 54020 num primExps 130444 numPeeks = 32978353 average position in property list = 1.312 numPeeks = 3689431 average position in bucket = 0.674 constant propagation starting inferHandlers starting inferHandlers finished in 0.140 fixed point starting fixed point finished in 3.620 constant propagation finished in 7.940 num functions 3668 num local functions 53397 num primExps 95704 numPeeks = 35346094 average position in property list = 1.244 numPeeks = 3823941 average position in bucket = 0.823 useless starting analyze starting analyze finished in 5.060 useless finished in 9.970 num functions 3668 num local functions 50915 num primExps 87502 numPeeks = 37873441 average position in property list = 1.186 numPeeks = 3961408 average position in bucket = 0.838 remove-unused starting remove-unused finished in 2.550 num functions 3606 num local functions 49804 num primExps 85347 numPeeks = 39588129 average position in property list = 1.136 numPeeks = 3961500 average position in bucket = 0.838 simplify-types starting fixed point starting fixed point finished in 0.060 simplify-types finished in 2.950 num functions 3606 num local functions 41655 num primExps 82150 numPeeks = 41543485 average position in property list = 1.095 numPeeks = 3966951 average position in bucket = 0.838 poly-equal starting poly-equal finished in 0.170 num functions 3618 num local functions 42299 num primExps 82663 numPeeks = 41667937 average position in property list = 1.091 numPeeks = 3967620 average position in bucket = 0.838 contify starting contify finished in 2.140 num functions 3520 num local functions 42285 num primExps 82537 numPeeks = 42529380 average position in property list = 1.072 numPeeks = 3967718 average position in bucket = 0.838 inline starting inline finished in 4.390 num functions 978 num local functions 67568 num primExps 134281 numPeeks = 44670916 average position in property list = 1.026 numPeeks = 3967718 average position in bucket = 0.838 remove-unused starting remove-unused finished in 0.860 num functions 978 num local functions 64888 num primExps 133203 numPeeks = 47282825 average position in property list = 0.971 numPeeks = 3967731 average position in bucket = 0.838 raise-to-jump starting inferHandlers starting inferHandlers finished in 0.170 raise-to-jump finished in 3.440 num functions 978 num local functions 64832 num primExps 133178 numPeeks = 48835397 average position in property list = 0.944 numPeeks = 3967731 average position in bucket = 0.838 contify starting contify finished in 3.080 num functions 977 num local functions 64830 num primExps 133176 numPeeks = 50122736 average position in property list = 0.923 numPeeks = 3967732 average position in bucket = 0.838 introduce-loops starting introduce-loops finished in 0.050 num functions 977 num local functions 64854 num primExps 133176 numPeeks = 50297035 average position in property list = 0.920 numPeeks = 3967756 average position in bucket = 0.838 loop-invariant starting loop-invariant finished in 3.010 num functions 977 num local functions 62335 num primExps 126505 numPeeks = 51676987 average position in property list = 0.899 numPeeks = 3967756 average position in bucket = 0.838 flatten starting analyze starting analyze finished in 0.150 flatten finished in 3.700 num functions 977 num local functions 62413 num primExps 84082 numPeeks = 53677400 average position in property list = 0.871 numPeeks = 3970447 average position in bucket = 0.838 redundant starting redundant finished in 0.770 num functions 977 num local functions 62413 num primExps 84082 numPeeks = 54394802 average position in property list = 0.861 numPeeks = 3970447 average position in bucket = 0.838 remove-unused starting remove-unused finished in 2.900 num functions 977 num local functions 62119 num primExps 82433 numPeeks = 56609014 average position in property list = 0.829 numPeeks = 3970460 average position in bucket = 0.838 simplify finished in 74.090 closure convert simplify finished in 74.090 cps size is 49,689,064 bytes numPeeks = 56609014 average position in property list = 0.829 numPeeks = 3970460 average position in bucket = 0.838 backend starting compute representations starting compute representations finished in 0.030 inferHandlers starting inferHandlers finished in 0.140 chunkify starting chunkify finished in 2.810 allocate registers starting allocate registers finished in 10.570 reg size is 4,420 bytes numPeeks = 67004509 average position in property list = 0.916 numPeeks = 3970460 average position in bucket = 0.838 backend finished in 15.630 size is 59,136,316 bytes numPeeks = 68683322 average position in property list = 0.923 numPeeks = 3986796 average position in bucket = 0.838 x86 code gen starting outputC starting outputC finished in 0.440 outputAssembly starting translateChunk totals 18.250 simplify totals 103.800 verifyLiveInfo totals 31.100 computeJumpInfo totals 2.680 elimGoto totals 13.200 elimIff: 2 elimSwitch: 17 elimSimpleGoto totals 2.660 elimComplexGoto totals 3.980 verifyJumpInfo totals 1.350 peepholeBlock_pre totals 2.030 commuteBinALMD: 476 elimBinAL0L: 0 elimBinAL0R: 0 elimAddSub1: 1787 elimMDPow2: 183 toLivenessBlock totals 9.820 moveHoist totals 14.080 peepholeLivenessBlock totals 9.840 elimALCopy: 17039 elimFltACopy: 20 elimDeadDsts: 92 elimSelfMove: 1076 elimFltSelfMove: 0 commuteBinALMD: 1059 commuteFltBinA: 17 conditionalJump: 3249 copyPropagate totals 5.190 peepholeLivenessBlock_minor totals 4.920 elimDeadDsts_minor: 0 elimSelfMove_minor: 0 elimFltSelfMove_minor: 0 verifyLivenessBlock totals 0.0 toBlock totals 0.530 peepholeBlock_post totals 4.930 elimBinALMDouble: 33 elimFltBinADouble: 0 elimCMPTST: 0 generateTransfers totals 2.290 allocateRegisters totals 322.390 toLiveness totals 133.150 toNoLiveness totals 0.010 Assembly.allocateRegisters totals 188.560 Instruction.allocateRegisters totals 98.020 pre totals 22.920 post totals 34.130 allocateOperand totals 24.600 allocateFltOperand totals 0.0 allocateFltStackOperands totals 0.0 Directive.allocateRegisters totals 13.560 validate totals 0.0 outputAssembly finished in 452.100 x86 code gen finished in 496.480 numPeeks = 73619953 average position in property list = 0.925 numPeeks = 4048821 average position in bucket = 0.841 compile finished in 737.110 gcc -S -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebOiKTi.s /tmp/fileW6a8tX.c gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileVuChSy.o /tmp/filebOiKTi.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filesavxTO.o /tmp/filemC95zY.9.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexn25xG.o /tmp/fileBXQlhQ.8.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileNUFuxn.o /tmp/fileDYOgyZ.7.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filexEGwPJ.o /tmp/fileRJxRHU.6.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileWxcryO.o /tmp/fileDhBbVB.5.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileOq5dTS.o /tmp/filey4L0Da.4.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filebdLNzJ.o /tmp/filelifDPe.3.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileABQjvN.o /tmp/fileYyPBMd.2.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/filer5RlPz.o /tmp/filemVttax.1.s gcc -c -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o /tmp/fileIn50AH.o /tmp/file4EVnXu.0.s gcc -DNODEBUG -DMLton_safe=TRUE -DMLton_detectOverflow=TRUE -I/home/sweeks/mlton/include -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -o mlton /tmp/fileVuChSy.o /tmp/fileIn50AH.o /tmp/filer5RlPz.o /tmp/fileABQjvN.o /tmp/filebdLNzJ.o /tmp/fileOq5dTS.o /tmp/fileWxcryO.o /tmp/filexEGwPJ.o /tmp/fileNUFuxn.o /tmp/filexn25xG.o /tmp/filesavxTO.o -L/home/sweeks/mlton/lib -lmlton -lm -lgmp size mlton 757.02user 7.94system 12:47.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (19269major+397606minor)pagefaults 0swaps text data bss dec hex filename 3703961 646120 23272 4373353 42bb69 mlton