JIT Compilation

Example #1: Linear Transformation

One might ask how significant (and complex) the effects of value-specific optimization can be in just-in-time compilation. To demonstrate the large number of compilation variations available, we'll consider an extremely simple case study: the following C function.

static double m;
static double c;

double foo(double x)
{
  return m*x + c;
}

This is almost a minimal example of a unary floating point function, taking a single argument and returning a single floating point result. The function performs a linear scaling transformation, and relies on two external parameters, "m" and "c", both themselves floating point values set up in advance. A traditional optimizing compiler for x86 would typically generate the following executable code.

foo:    fldl x
        fmull m
        faddl c
        ret

The above very short sequence of only three instructions (requiring three loads from memory, a multiplication and an addition) would appear to be optimal in the general case. However, to a just-in-time compiler there are a large number of potentially better code sequences, depending upon the run-time values of "m" and "c".
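
To give a feel for the shape of that dispatch, the selection logic inside such a just-in-time compiler might look something like the following sketch in C. This is purely an illustration: the choose_template function is hypothetical, only a representative subset of the cases derived below is shown, and a real code generator would append the chosen instructions to an executable buffer rather than return their text.

#include <stdio.h>

/* Hypothetical sketch: select a 387 instruction sequence for
   foo(x) = m*x + c based on the run-time values of m and c.
   Note that these simplifications assume finite arguments: under
   strict IEEE 754, 0.0*x is NaN when x is a NaN or an infinity. */
static const char *choose_template(double m, double c)
{
  if (m == 0.0) {
    if (c == 0.0)  return "fldz; ret";
    if (c == 1.0)  return "fld1; ret";
    if (c == -1.0) return "fld1; fchs; ret";
    return "fldl c; ret";
  }
  if (m == 1.0) {
    if (c == 0.0) return "fldl x; ret";
    return "fldl c; faddl x; ret";
  }
  if (m == -1.0 && c == 0.0)
    return "fldl x; fchs; ret";
  if (c == 0.0)
    return "fldl m; fmull x; ret";
  return "fldl x; fmull m; faddl c; ret";  /* general case */
}

int main(void)
{
  printf("%s\n", choose_template(0.0, 1.0));  /* fld1; ret   */
  printf("%s\n", choose_template(1.0, 0.0));  /* fldl x; ret */
  printf("%s\n", choose_template(3.0, 4.0));  /* general case */
  return 0;
}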

When m is zero, the result will always be the constant "c", so we can generate the x86/387 code for returning that constant. The 387 instruction set has special instructions for loading the values zero and one, so these can be used when "c" has the appropriate value.

# m is 0.0 and c is 0.0
foo:    fldz
        ret
# m is 0.0 and c is 1.0
foo:    fld1
        ret

Likewise, it turns out to be both faster and shorter on the 387 to return the constant minus one by loading the value one using the special instruction above and then negating the result.

# m is 0.0 and c is -1.0
foo:    fld1
        fchs
        ret

If the constant c can be represented exactly as a 4-byte floating point value, it can be stored in memory as a float and automatically extended to double precision as it is read in. This optimization also applies to many of the memory references below, but to save space, I'll only mention it once.

# m is 0.0, and c is (double)(float)c
foo:    flds c
        ret
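
The test itself is cheap. Sketched below purely as an illustration (the fits_in_float helper is hypothetical), a JIT can round-trip the candidate value through float and compare for equality.

#include <stdio.h>

/* Does the double d survive a round trip through float, i.e. is the
   4-byte encoding exact?  (NaN never compares equal to itself, so
   NaNs are conservatively reported as not fitting.) */
static int fits_in_float(double d)
{
  return (double)(float)d == d;
}

int main(void)
{
  printf("%d\n", fits_in_float(0.5));  /* 1: 0.5 is exact as a float */
  printf("%d\n", fits_in_float(0.1));  /* 0: 0.1 rounds differently in
                                          float and double */
  return 0;
}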

When c is not exactly representable as a float, we simply load and return the double constant c.

# m is 0.0
foo:    fldl c
        ret

When m has the value one, we don't actually need to perform the multiplication.

# m is 1.0
foo:    fldl c
        faddl x
        ret

And when m is one and c is zero, there is neither a multiplication nor an addition to perform.

# m is 1.0 and c is 0.0
foo:    fldl x
        ret

As above, when c is one, we can use the special load-one instruction.

# m is 1.0 and c is 1.0
foo:    fld1
        faddl x
        ret

When c is minus one, we can use a right-subtract instruction instead of loading one and negating it.

# m is 1.0 and c is -1.0
foo:    fld1
        fsubrl x
        ret

When m is two, it turns out to be faster to perform an addition rather than a multiplication.

# m is 2.0
foo:    fldl x
        fadd %st(0),%st
        faddl c
        ret

Once again, when c is zero, we can skip the addition.

# m is 2.0 and c is 0.0
foo:    fldl x
        fadd %st(0),%st
        ret

When m is two and c is two, we can use two additions.

# m is 2.0 and c is 2.0
foo:    fld1
        faddl x
        fadd %st(0),%st
        ret

When m is two and c is minus two, we can use an addition and a subtraction.

# m is 2.0 and c is -2.0
foo:    fld1
        fsubrl x
        fadd %st(0),%st
        ret

When m is the value minus one, we can again avoid the multiplication.

# m is -1.0
foo:    fldl c
        fsubl x
        ret

And when m is minus one and c has the value zero, we simplify everything to a negation.

# m is -1.0 and c is 0.0
foo:    fldl x
        fchs
        ret

And again, as before, when c is one we can use the "fld1" instruction.

# m is -1.0 and c is 1.0
foo:    fld1
        fsubl x
        ret

Without further commentary, the following cases handle m equal to minus two.

# m is -2.0 and c is 0.0
foo:    fldl x
        fadd %st(0),%st
        fchs
        ret
# m is -2.0 and c is 2.0
foo:    fld1
        fsubl x
        fadd %st(0),%st
        ret
# m is -2.0 and c is -2.0
foo:    fld1
        fchs
        fsubl x
        fadd %st(0),%st
        ret
# m is -2.0
foo:    fldl x
        fadd %st(0),%st
        fsubrl c
        ret

In the general case for m (values other than zero, one, two, minus one or minus two), we can skip the addition when c is zero.

# c is 0.0
foo:    fldl m
        fmull x
        ret

In the case where m and c hold the same value, we can avoid reading that value from memory twice and instead make use of the 387 stack registers: since m equals c, the result m*x + c is just m*(x + 1). The "mc" below stands for either variable's address.

# m is the same as c
foo:    fldl mc
        fldl x
        fmul %st(1),%st
        faddp %st,%st(1)
        ret

Likewise, when m is the negation of c, we can avoid a load from memory: since c is -m, the result m*x + c is simply m*x - m.

# m is -c, and c is -m
foo:    fldl m
        fldl x
        fmul %st(1),%st
        fsubp %st,%st(1)
        ret

Hopefully, the above helps demonstrate that being aware of the constants used in a floating point expression can have a significant impact on the most efficient code sequence used to evaluate it. A similar analysis for the code generation of multi-way branching was presented by the author at the 2008 GCC Developer Conference [PDF]. If these sorts of variations are possible for trivial expressions, consider their impact on larger performance-critical functions, including dot products, matrix multiplications, regular expression matching, SQL queries and many more.
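
To make that closing point slightly more concrete, here is a rough sketch, in portable C rather than generated machine code, of the same value-specific idea applied to a dot product whose coefficient vector is known at code-generation time: the zero terms can be dropped entirely. The specialize/dot helpers are illustrative stand-ins; an actual JIT would emit straight-line code for the surviving terms instead.

#include <stdio.h>

#define N 8

static int nz_idx[N];
static double nz_coef[N];
static int nz_count;

/* Record the nonzero coefficients once, when their values become
   known; this is the value-specific "compilation" step. */
static void specialize(const double *coef, int n)
{
  nz_count = 0;
  for (int i = 0; i < n; i++)
    if (coef[i] != 0.0) {
      nz_idx[nz_count] = i;
      nz_coef[nz_count] = coef[i];
      nz_count++;
    }
}

/* The "specialized" dot product touches only the surviving terms. */
static double dot(const double *x)
{
  double sum = 0.0;
  for (int k = 0; k < nz_count; k++)
    sum += nz_coef[k] * x[nz_idx[k]];
  return sum;
}

int main(void)
{
  double coef[N] = { 0, 2, 0, 0, 1, 0, 0, -1 };
  double x[N]    = { 1, 2, 3, 4, 5, 6, 7, 8 };
  specialize(coef, N);
  printf("%g\n", dot(x));  /* 2*2 + 1*5 + (-1)*8 = 1 */
  return 0;
}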
