Working our way towards release, we're currently going through a heavy bugfixing phase on the Macintosh version of Eets. One particularly interesting bug was causing level replays and the official solution videos (not really videos, but in-game replays) to play back incorrectly. We had a floating point determinism problem.
Level completion replays and solution videos in Eets work by replaying user input. It makes a lot of sense to do this as user input is, relatively speaking, low frequency. Achieving the same results by recording the state of all the objects in the game would make replay files a *lot* larger. (Incidentally, the same technique can be used in writing a network game, where it is often known as the lockstep technique.)
To be able to play back recorded user input into the game engine and have it play out exactly the same, the engine must be completely deterministic. A couple of key components need to be addressed or things go very wrong.
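As a rough illustration of the replay idea, here is a minimal sketch of feeding recorded input back into a fixed-step update loop. Everything in it (`InputEvent`, `World`, `step`, `run`) is a hypothetical stand-in, not code from Eets, and it assumes the recorded events are sorted by frame. The state is kept in integers so that the floating point question stays out of the picture.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical recorded input: "on this frame, the player pushed by `move`".
struct InputEvent {
    std::uint32_t frame;
    std::int32_t move;
};

// Hypothetical game state, integer-only so every platform agrees on it.
struct World {
    std::int64_t position = 0;
    std::int64_t velocity = 0;
};

// One fixed-size simulation step driven purely by state and input.
void step(World& w, std::int32_t move) {
    w.velocity += move;
    w.position += w.velocity;
}

// Replays `inputs` (assumed sorted by frame) for `frames` steps.
World run(const std::vector<InputEvent>& inputs, std::uint32_t frames) {
    World w;
    std::size_t next = 0;
    for (std::uint32_t f = 0; f < frames; ++f) {
        std::int32_t move = 0;
        // Apply every event recorded for this frame.
        while (next < inputs.size() && inputs[next].frame == f) {
            move += inputs[next].move;
            ++next;
        }
        step(w, move);
    }
    return w;
}
```

Calling `run` twice with the same event list should give the same `World` both times. The replay file only needs the handful of events, not the state of every object on every frame.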
For starters, the engine's random number generators are in fact not random at all. They need to produce the same random-appearing sequence of numbers every time they are given the same seed. Secondly, the mathematics of the gameplay and engine needs to be completely deterministic. This is actually not as easy as it sounds.
Next up, the physics engine needs to be designed from the beginning with determinism in mind. In particular, iterative solvers are likely culprits for breaking determinism. Finally, the floating point math in all of it needs to be deterministic. The floating point math situation is potentially one of the trickiest parts. You frequently have no control over calls out to functions in the operating system and other libraries, and they vary from platform to platform.
One would think floating point math would always yield the same results, but the results actually vary slightly between processors, operating systems, compilers and instruction sets. Typically it's differences in rounding that cause the results to diverge. (Did you know? Banking software typically avoids floats and doubles because of the way they handle rounding.)
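A seeded generator of this kind can be sketched as follows. This is a plain xorshift32 generator chosen for illustration, not the generator Eets uses, and the `DeterministicRng` name and its methods are made up for the example. It relies only on 32-bit integer operations, so the same seed should give the same sequence on any platform.

```cpp
#include <cstdint>

// Illustrative xorshift32 generator; all state lives in one 32-bit integer.
struct DeterministicRng {
    std::uint32_t state;

    // xorshift must never be seeded with zero, so substitute a fixed constant.
    explicit DeterministicRng(std::uint32_t seed)
        : state(seed != 0 ? seed : 0x9E3779B9u) {}

    // Next 32-bit value in the sequence.
    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    // Float in [0, 1). The top 24 bits fit a float exactly and the scale
    // factor is a power of two, so the conversion involves no rounding.
    float next_float() {
        return static_cast<float>(next() >> 8) * (1.0f / 16777216.0f);
    }
};
```

Storing the seed alongside the recorded input is then enough to reproduce every "random" decision during a replay.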
When floating point math was first catered for in silicon, a number of different ways of doing things made it out into the wild. The bulk of the computing public is using x86 type chips these days. In the early days of x86, floating point math was done in software, and understandably it was pretty slow. At some point the x87 co-processors were introduced. They were physically separate processors, and had their own instruction set. That same instruction set still exists today within your average Intel and AMD processor. It still gets used in software today, but there are even more possibilities thrown into the mix. First came MMX (the multimedia instruction set, which itself only handles integers), then MMX2, SSE, SSE2, and finally SSE3. Not to mention 3DNow! and similarly targeted instruction sets. These instruction sets and their corresponding silicon implement floating point math in various ways and to varying degrees.
The Institute of Electrical and Electronics Engineers ratified a standard way of doing floating point math. The standard is known as IEEE 754. Making sure that your floating point math happens in a standardised way goes a long way to reducing the potential for different results.
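One inexpensive sanity check, sketched here in C++ on the assumption that the engine is built with a C++11 or later compiler, is to ask the toolchain whether its `float` and `double` follow the standard and whether intermediate results are kept at the declared precision. The names `kIeee754Types` and `evaluates_at_declared_precision` are invented for the example. `is_iec559` and `FLT_EVAL_METHOD` are standard library facilities.

```cpp
#include <cfloat>
#include <limits>

// True when float and double use the IEEE 754 (IEC 559) formats.
constexpr bool kIeee754Types =
    std::numeric_limits<float>::is_iec559 &&
    std::numeric_limits<double>::is_iec559;

// FLT_EVAL_METHOD == 0 means expressions are evaluated at the precision of
// their type. Other values (for example 2, typical of x87 code generation)
// mean intermediates may carry extra precision.
inline bool evaluates_at_declared_precision() {
    return FLT_EVAL_METHOD == 0;
}
```

Passing both checks does not prove two builds will agree bit for bit, but a mismatch between platforms is a strong hint about where to look first.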
Since most game engines update through time iteratively, a small error early in the piece can create vastly different results down the track.
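To get a feel for how quickly that can happen, here is a toy experiment. The `update` function is a stand-in (a chaotic logistic map step), not anything from a real engine, and `first_divergence` is a made-up helper that runs two copies of the simulation whose starting values differ by a tiny error.

```cpp
#include <cmath>

// Stand-in for one engine update: a logistic map step in its chaotic regime.
double update(double x) {
    return 3.9 * x * (1.0 - x);
}

// Runs two simulations that start `error` apart and returns the first step
// at which they differ by more than `threshold`, or -1 if they never do
// within `max_steps`.
int first_divergence(double x0, double error, int max_steps, double threshold) {
    double a = x0;
    double b = x0 + error;
    for (int i = 0; i < max_steps; ++i) {
        if (std::fabs(a - b) > threshold) {
            return i;
        }
        a = update(a);
        b = update(b);
    }
    return -1;
}
```

Something like `first_divergence(0.3, 1e-12, 1000, 0.1)` reports the step at which a starting error of one part in a trillion has grown into a visible difference. Real game updates are usually less extreme than this map, but the mechanism is the same.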
Mathematical functions like cos, tan and the other trigonometric functions are also common causes of different results between systems. The reason is that they're so-called transcendental functions. Their values can't be computed exactly with a finite number of arithmetic operations, so implementations approximate them, typically by evaluating a series or polynomial. Unfortunately this leaves plenty of scope for implementations to differ.
Some tips on how to locate and fix floating point determinism problems:
- Use modern IEEE 754 compliant processor instruction sets, SSE and up. Many compilers can be told to automatically use them for floating point math, otherwise you can manually use them via compiler intrinsics or assembly code.
- Make sure you know what level of floating point optimisation is being used by the compiler. For the Microsoft compilers, look for problematic switches like /fp:fast. For gcc, look at -mfpmath=sse together with -msse and/or -msse2 (x86 specific), which make the compiler use SSE rather than x87 instructions for floating point math.
- Check results from transcendental functions (tan, sin, cos and their ilk). If they're causing problems, swap in software versions, outside of the system library, that you have control over.
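As an example of the last tip, a software sine might look something like the sketch below. It is only an illustration: `det_sin` is a made-up name, the range reduction is naive and loses accuracy for large arguments, and the polynomial is a plain Taylor series rather than a tuned approximation. It uses only addition, subtraction, multiplication and `std::floor`, so every platform performs the same sequence of basic operations. That assumes the compiler does not rearrange them or fuse multiplies and adds.

```cpp
#include <cmath>

// Software sine built from basic arithmetic only (illustrative sketch).
double det_sin(double x) {
    const double pi = 3.14159265358979323846;
    const double two_pi = 2.0 * pi;
    const double half_pi = 0.5 * pi;

    // Reduce to [-pi, pi] by removing whole turns.
    const double turns = std::floor(x / two_pi + 0.5);
    x -= turns * two_pi;

    // Fold into [-pi/2, pi/2] using sin(pi - x) == sin(x).
    if (x > half_pi) {
        x = pi - x;
    } else if (x < -half_pi) {
        x = -pi - x;
    }

    // Taylor series up to x^15, evaluated with Horner's scheme.
    const double x2 = x * x;
    double p = -1.0 / 1307674368000.0;  // -1/15!
    p = p * x2 + 1.0 / 6227020800.0;    // +1/13!
    p = p * x2 - 1.0 / 39916800.0;      // -1/11!
    p = p * x2 + 1.0 / 362880.0;        // +1/9!
    p = p * x2 - 1.0 / 5040.0;          // -1/7!
    p = p * x2 + 1.0 / 120.0;           // +1/5!
    p = p * x2 - 1.0 / 6.0;             // -1/3!
    p = p * x2 + 1.0;
    return x * p;
}
```

For moderate arguments this agrees with the library sine to well within what gameplay code needs. The point is not extra accuracy but that the same source is compiled into every build, instead of relying on whatever the platform's math library happens to do.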