JIT Compilation
Example #1: Linear Transformation
One might ask how significant (and complex) the effects of value-specific optimization can be in just-in-time compilation. To demonstrate the large number of compilation variations available, we'll consider an extremely simple case study: the following C function.
static double m;
static double c;

double foo(double x)
{
    return m*x + c;
}
This is almost a minimal example of a unary floating point function, taking a single argument and returning a single floating point result. To perform its linear scaling transformation, this function relies on two external parameters, "m" and "c", which are set up in advance and are both themselves floating point values. A traditional optimizing compiler for x86 would typically generate the following executable code.
foo:    fldl    x
        fmull   m
        faddl   c
        ret
The above sequence of only three instructions (requiring three loads from memory, a multiplication and an addition) would appear to be optimal in the general case. However, to a just-in-time compiler there are a large number of potentially better code sequences, depending upon the run-time values of "m" and "c".
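Before walking through the cases, here is a minimal C sketch of the idea (an illustration only; the variant and helper names such as "specialize" are hypothetical, not part of any real JIT's API): once the run-time values of "m" and "c" are known, a specializer can dispatch to the cheapest variant.

#include <stdio.h>

static double m, c;

/* Hypothetical specialized variants, one per interesting case. */
static double foo_const(double x)    { (void)x; return c; }   /* m == 0.0 */
static double foo_identity(double x) { return x; }            /* m == 1.0, c == 0.0 */
static double foo_offset(double x)   { return x + c; }        /* m == 1.0 */
static double foo_negate(double x)   { return -x; }           /* m == -1.0, c == 0.0 */
static double foo_general(double x)  { return m*x + c; }      /* fallback */

typedef double (*foo_fn)(double);

/* Run whenever m or c changes: inspect the run-time values and
   return the cheapest matching variant. */
static foo_fn specialize(void)
{
    if (m == 0.0)               return foo_const;
    if (m == 1.0 && c == 0.0)   return foo_identity;
    if (m == 1.0)               return foo_offset;
    if (m == -1.0 && c == 0.0)  return foo_negate;
    return foo_general;
}

int main(void)
{
    m = 1.0;
    c = 0.0;
    foo_fn foo = specialize();
    printf("%g\n", foo(3.5));   /* selects foo_identity: prints 3.5 */
    return 0;
}

A real JIT goes further, emitting the specialized machine code directly rather than choosing between precompiled bodies; the remainder of this section enumerates what that emitted code can look like.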
When m is zero, the result will always be the constant "c", so we can generate the x86/387 code for returning that constant. The 387 instruction set has special instructions for loading the values zero and one, so these can be used when "c" has the appropriate value.
# m is 0.0 and c is 0.0
foo:    fldz
        ret

# m is 0.0 and c is 1.0
foo:    fld1
        ret
Likewise, it turns out to be faster/shorter on the 387 to return the constant minus one, by loading the value one using the special instruction above and then negating the result.
# m is 0.0 and c is -1.0
foo:    fld1
        fchs
        ret
If the constant c can be represented exactly as a 4-byte floating point value, it can be stored in memory as a float and automatically extended to double precision as it is read in. This optimization also applies to many of the memory references below, but to save space, I'll only mention it once.
# m is 0.0, and c is (double)(float)c
foo:    flds    c
        ret
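Detecting this case at run time is cheap. The following minimal C sketch (an illustration; the helper name "fits_in_float" is hypothetical) checks whether a double survives the round trip through float:

#include <stdio.h>

/* A double fits exactly in a 4-byte float when narrowing and
   re-widening it loses no information. (NaNs fail the comparison
   and so are conservatively rejected.) */
static int fits_in_float(double c)
{
    return (double)(float)c == c;
}

int main(void)
{
    printf("%d\n", fits_in_float(0.5));  /* 1: 0.5 is exact in both formats */
    printf("%d\n", fits_in_float(0.1));  /* 0: 0.1 rounds differently as a float */
    return 0;
}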
Otherwise we should simply load and return the double constant c.
# m is 0.0.
foo:    fldl    c
        ret
When m has the value one, we don't actually need to perform the multiplication.
# m is 1.0
foo:    fldl    c
        faddl   x
        ret
And when m is one and c is zero, there is neither a multiplication nor an addition to perform.
# m is 1.0 and c is 0.0
foo:    fldl    x
        ret
As above, when c is one we can use the "fld1" instruction.
# m is 1.0 and c is 1.0
foo:    fld1
        faddl   x
        ret
When c is minus one, we can use a reverse-subtract instruction instead of loading one and negating it.
# m is 1.0 and c is -1.0
foo:    fld1
        fsubrl  x               # reverse subtract: x - 1.0
        ret
When m is two, it turns out to be faster to add x to itself rather than perform a multiplication.

# m is 2.0
foo:    fldl    x
        fadd    %st(0),%st      # st(0) += st(0), i.e. 2*x
        faddl   c
        ret
Once again, when c is zero we can skip the addition.

# m is 2.0 and c is 0.0
foo:    fldl    x
        fadd    %st(0),%st
        ret
When m is two and c is two, we can use two additions, since 2x + 2 = (x + 1) + (x + 1).

# m is 2.0 and c is 2.0
foo:    fld1
        faddl   x               # x + 1
        fadd    %st(0),%st      # doubled: 2*x + 2
        ret
When m is two and c is minus two, we can use a subtraction and an addition, since 2x - 2 = (x - 1) + (x - 1).

# m is 2.0 and c is -2.0
foo:    fld1
        fsubrl  x               # x - 1
        fadd    %st(0),%st      # doubled: 2*x - 2
        ret
When m is the value minus one, we can again avoid the multiplication.
# m is -1.0
foo:    fldl    c
        fsubl   x
        ret
And when m is minus one and c has the value zero, we simplify everything to a negation.
# m is -1.0 and c is 0.0
foo:    fldl    x
        fchs
        ret
And again, as before when c is one, we can use the "fld1" instruction.
# m is -1.0 and c is 1.0
foo:    fld1
        fsubl   x
        ret
Without further introduction, the following cases handle m equal to minus two.
# m is -2.0 and c is 0.0
foo:    fldl    x
        fadd    %st(0),%st      # 2*x
        fchs                    # -2*x
        ret

# m is -2.0 and c is 2.0
foo:    fld1
        fsubl   x               # 1 - x
        fadd    %st(0),%st      # doubled: -2*x + 2
        ret

# m is -2.0 and c is -2.0
foo:    fld1
        fchs                    # -1
        fsubl   x               # -1 - x
        fadd    %st(0),%st      # doubled: -2*x - 2
        ret

# m is -2.0
foo:    fldl    x
        fadd    %st(0),%st      # 2*x
        fsubrl  c               # reverse subtract: c - 2*x
        ret
In the general case for m (values other than zero, one, two, minus one or minus two), we can skip the addition when c is zero.
# c is 0.0
foo:    fldl    m
        fmull   x
        ret
In the case where m and c are the same, we can avoid reading the value from memory twice, and make use of the 387 stack registers.
# m is the same as c
foo:    fldl    mc              # load the shared value of m and c once
        fldl    x
        fmul    %st(1),%st
        faddp   %st,%st(1)
        ret
Likewise, when m is the negation of c, we can avoid a load from memory.
# m is -c, and c is -m
foo:    fldl    m
        fldl    x
        fmul    %st(1),%st
        fsubp   %st,%st(1)
        ret
Hopefully, the above helps demonstrate that being aware of the constants used in a floating point expression can have a significant impact on the most efficient code sequence used for its evaluation. A similar analysis of code generation for multi-way branching was presented by the author at the 2008 GCC Developer Conference [PDF]. If these sorts of variations are possible for trivial expressions, consider their impact on larger performance-critical functions, including dot products, matrix multiplications, regular expression matching, SQL queries and many more.