JIT Compilation
Example #1: Linear Transformation
One might ask how significant (and complex) the effects of value-specific optimization can be in just-in-time compilation. To demonstrate the large number of possible compilation variations, we'll consider an extremely simple case study: the following C function.
static double m;
static double c;
double foo(double x)
{
return m*x + c;
}
This is almost a minimal example of a unary floating-point function, taking a single argument and returning a single floating-point result. To perform its linear scaling transformation, the function relies on two external parameters, "m" and "c", both themselves floating-point values, which are set up in advance. A traditional optimizing compiler for x86 would typically generate the following executable code.
foo: fldl x
fmull m
faddl c
ret
This very short sequence of only three instructions (requiring three loads from memory, a multiplication and an addition) would appear to be optimal in the general case. To a just-in-time compiler, however, there are a large number of potentially better code sequences, depending upon the run-time values of "m" and "c".
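To make this concrete, a just-in-time compiler can inspect the run-time values of "m" and "c" before generating code, and dispatch to a specialized sequence. The C sketch below is purely illustrative; the emit_* helpers are hypothetical stand-ins for routines that would write the x86/387 sequences shown in the rest of this section into a code buffer.
/* Illustrative value-specific dispatch for a JIT; the emit_*
 * helpers are hypothetical code-buffer writers. */
extern void emit_load_zero(void);             /* fldz; ret */
extern void emit_load_one(void);              /* fld1; ret */
extern void emit_load_const(double c);        /* fldl c; ret */
extern void emit_return_arg(void);            /* fldl x; ret */
extern void emit_add_const(double c);         /* fldl c; faddl x; ret */
extern void emit_mul_const(double m);         /* fldl m; fmull x; ret */
extern void emit_general(double m, double c); /* fldl x; fmull m; faddl c; ret */

void jit_compile_foo(double m, double c)
{
    if (m == 0.0 && c == 0.0)      emit_load_zero();
    else if (m == 0.0 && c == 1.0) emit_load_one();
    else if (m == 0.0)             emit_load_const(c);
    else if (m == 1.0 && c == 0.0) emit_return_arg();
    else if (m == 1.0)             emit_add_const(c);
    /* the m == -1.0, 2.0 and -2.0 cases described below
     * would be tested here too */
    else if (c == 0.0)             emit_mul_const(m);
    else                           emit_general(m, c);
}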
When m is zero, the result will always be the constant "c", so we can generate the x86/387 code for returning that constant. The 387 instruction set has special instructions for loading the values zero and one, so these can be used when "c" has the appropriate value.
# m is 0.0 and c is 0.0
foo: fldz
ret
# m is 0.0 and c is 1.0
foo: fld1
ret
Likewise, it turns out to be faster (and shorter) on the 387 to return the constant minus one by loading the value one using the special instruction above and then negating the result.
# m is 0.0 and c is -1.0
foo: fld1
fchs
ret
If the constant c can be represented exactly as a 4-byte floating-point value, it can be stored in memory as a float and automatically extended to double precision as it is read in. This optimization also applies to many of the memory references below, but to save space, I'll only mention it once.
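One plausible way for the compiler to test for this is to round-trip the constant through single precision and check that nothing was lost; a minimal C sketch:
/* A double fits exactly in a 4-byte float when the round trip
 * through float is lossless. (NaN compares unequal to itself,
 * so it conservatively fails the test.) */
static int fits_in_float(double c)
{
    return (double)(float)c == c;
}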
# m is 0.0, and c is (double)(float)c
foo: flds c
ret
Otherwise we should simply load and return the double constant c.
# m is 0.0
foo: fldl c
ret
When m has the value one, we don't actually need to perform the multiplication.
# m is 1.0
foo: fldl c
faddl x
ret
And when m is one and c is zero, there is neither a multiplication nor an addition to perform.
# m is 1.0 and c is 0.0
foo: fldl x
ret
As above, when c is one we can use the "fld1" instruction.
# m is 1.0 and c is 1.0
foo: fld1
faddl x
ret
When c is minus one, we can use a right-subtract instruction instead of loading one and negating it.
# m is 1.0 and c is -1.0
foo: fld1
fsubrl x
ret
When m is two, it turns out to be faster to add the value to itself than to perform the multiplication, which also avoids loading the constant m from memory.
# m is 2.0
foo: fldl x
fadd %st(0),%st
faddl c
ret
Once again, when c is zero we can skip the addition.
# m is 2.0 and c is 0.0
foo: fldl x
fadd %st(0),%st
ret
When m is two and c is two, we can use two additions.
# m is 2.0 and c is 2.0
foo: fld1
faddl x
fadd %st(0),%st
ret
When m is two and c is minus two, we can use an addition and a subtraction.
# m is 2.0 and c is -2.0
foo: fld1
fsubrl x
fadd %st(0),%st
ret
When m is the value minus one, we can again avoid the multiplication.
# m is -1.0
foo: fldl c
fsubl x
ret
And when m is minus one and c has the value zero, we simplify everything to a negation.
# m is -1.0 and c is 0.0
foo: fldl x
fchs
ret
And again, as before when c is one, we can use the "fld1" instruction.
# m is -1.0 and c is 1.0
foo: fld1
fsubl x
ret
Without further commentary, the following cases handle m equal to minus two.
# m is -2.0 and c is 0.0
foo: fldl x
fadd %st(0),%st
fchs
ret
# m is -2.0 and c is 2.0
foo: fld1
fsubl x
fadd %st(0),%st
ret
# m is -2.0 and c is -2.0
foo: fld1
fchs
fsubl x
fadd %st(0),%st
ret
# m is -2.0
foo: fldl x
fadd %st(0),%st
fsubrl c
ret
In the general case for m (values other than zero, one, minus one, two or minus two), we can still skip the addition when c is zero.
# c is 0.0
foo: fldl m
fmull x
ret
In the case where m and c are the same, we can avoid reading the value from memory twice, and make use of the 387 stack registers.
# m is the same as c
foo: fldl mc
fldl x
fmul %st(1),%st
faddp %st,%st(1)
ret
Likewise, when m is the negation of c, we can avoid a load from memory.
# m is -c, and c is -m
foo: fldl m
fldl x
fmul %st(1),%st
fsubp %st, %st(1)
ret
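Detecting these last two variants is again just a matter of comparing the run-time constants before code generation; continuing the hypothetical dispatch sketch from earlier:
/* The m == c and m == -c cases are detected by direct comparison
 * of the run-time constants; the emit helpers are hypothetical. */
extern void emit_shared_constant(double mc);  /* fldl mc; fldl x; ... */
extern void emit_negated_constant(double m);  /* fldl m; fldl x; ...  */

int try_shared_constant(double m, double c)
{
    if (m == c)  { emit_shared_constant(m);  return 1; }
    if (m == -c) { emit_negated_constant(m); return 1; }
    return 0;    /* fall back to the general sequence */
}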
Hopefully, the above helps demonstrate that being aware of the constants used in a floating-point expression can have a significant impact on the most efficient code sequence chosen for its evaluation. A similar analysis of code generation for multi-way branching was presented by the author at the 2008 GCC Developers' Summit [PDF]. If these sorts of variations are possible for trivial expressions, consider their impact on larger performance-critical functions, including dot products, matrix multiplications, regular expression matching, SQL queries and many more.
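As a taste of how this scales up, the C sketch below illustrates the kind of simplification a specializing compiler could apply to a dot product whose coefficient vector is fixed at JIT compile time. The function and its shape are illustrative assumptions; the run-time tests stand in for decisions a code generator would make once, up front.
/* Illustrative only: a dot product specialized against a known
 * coefficient vector. A JIT would emit only the surviving
 * operations rather than performing these tests per element. */
double dot_specialized(const double *x, const double *coeff, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (coeff[i] == 0.0)
            continue;                  /* term vanishes entirely */
        else if (coeff[i] == 1.0)
            sum += x[i];               /* multiplication avoided */
        else if (coeff[i] == -1.0)
            sum -= x[i];               /* negation folded into subtract */
        else
            sum += coeff[i] * x[i];    /* general term */
    }
    return sum;
}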
