Building incredible multiplayer games since 2012.

Making Zombies Run (Fast) in Haxe

Dan Ogles

Making Zombies Run (Fast) in Haxe

As we’ve talked about before, we use the Haxe programming language at Proletariat. This allows us to write the same code and translate it to multiple languages: C# in Unity and Javascript for our Node.js servers. One question I often hear from developers when I talk about this is, “isn’t that slow?” There is an expectation that writing in one language and then translating that into another language would introduce all sorts of inefficiencies that would cause your programs to run much slower.

The answer might surprise you: in many cases, Haxe code is actually faster! I’m going to talk about some of the features of Haxe that allow for that, as well as suggest some things that I think Haxe could add or improve on in the future.

Static Typing

Haxe is a statically-typed language, meaning that every variable’s type is known at compile-time. This is beneficial for development because it allows the compiler to find errors that may have been missed, particularly in large and complex codebases. Another advantage, however, is that platforms that are also statically-typed can take advantage of this. If Haxe knows a variable is an “int,” then the C++ output will be an “int.” In general, the datatypes in Haxe map very closely to the datatypes in native targets, with little or no overhead involved. While this can mean occasional subtle differences in behavior between targets, it’s generally a very good thing and indicative of Haxe keeping performance in mind.

Inline Functions

This is the single biggest performance feature in Haxe. Function calls can be expensive, depending on your platform target. Dynamic languages such as Javascript and PHP can incur a lot of function call overhead, because calling a function involves name resolution. This can be a big problem for small functions that get called a lot. In Haxe, you can inline functions, so that the function’s code is placed directly at the point where it is called, avoiding the overhead entirely.

Even for languages that support inlining or may inline functions during JIT compilation, like C++, Java, and C#, using Haxe’s function inlining gives you more direct control.

Avoiding Allocations

Allocating new object instances can be expensive. The Haxe compiler is actually smart enough to avoid memory allocations altogether in some cases. Check out the example here:

class Point {
  	public var x:Int;
	public var y:Int;
	public inline function new(x:Int, y:Int) {
  		this.x = x;
    		this.y = y;
  	}
  	public inline function getLength() : Float {
    		return Math.sqrt(x*x + y*y);	
  	}
    	public inline function toString() return '($x, $y)';
}

class Test {
    static function main() {
	var x = new Point(10, 10);
        trace(x.getLength());
    }
}

The main function apparently allocates a Point object and traces its length. However, take a look at the Javascript code that generates:

Test.main = function() {
	console.log(Math.sqrt(200));
};

What happened? The Haxe compiler was smart enough to realize that it didn’t need to allocate the Point class at all. Furthermore, the call to getLength() was inlined, and it knew that x and y were constant, so could calculate at compile time x*x + y*y = 200. This is a simple example, but shows that Haxe does far more than simple transliteration of your code. It’s smart!

Native Access

At some point, to get full performance, you really need to use some features particular to your native platform. Haxe is cross-platform by default, but gives you plenty of escape hatches if you need them. The Haxe/C++ and Neko targets have the CFFI (C Foreign Function Interface), which allow you to directly call C functions from inside Haxe. The C# target allows you to directly import any .NET class and use it seamlessly. The Javascript target allows you to simply place Javascript code inline with your Haxe code, if you wish. These features are really critical if you’re trying to squeeze that last bit of performance out of your app.

Fast Libraries

There are a bunch of fantastic libraries for Haxe out there that have high performance and support multiple targets. The OpenFL project is a Flash-compatible API, but allows you to target HTML 5 or Native C++ for desktop and mobile (and soon, consoles). Nape is a very fast 2D physics engine. Kha is a new low-level 2D/3D graphics and sound API that supports a wide variety of targets. All these libraries are open-source and available on haxelib, making them easy to install and integrate with.

Open Source

The Haxe compiler and standard library are all Open Source and available on Github. While this obviously doesn’t improve performance by itself, what it does mean is that you can see exactly what the compiler and runtime is doing, and submit pull requests for improvements. Proletariat has contributed a couple of small changes to the compiler in the past that improved the generated code, leading to faster runtime performance.

Overall, Haxe has shown to be a really amazing tool for developing high-performance cross-platform games. That said, there is always room for improvement! Here are some suggestions I have for future versions of Haxe that will help developers write even faster code.

1. Array changes

I would like to see the code data structures like Array become closer to their native counterparts where possible. For instance, right now, this is legal Haxe code:

var x = [];
x[2000] = 1;
return x[3000];

The problem with this is that the native targets must insert bounds checks in both the array access getter and setter, that check against the length of the array. In the getter case, if the index is past the end of the array, null is returned. In the setter case (if the array is not large enough), it will be resized. These bounds checks are often in addition to the bounds checks that the native target does already (e.g., C# will throw an exception if you read past the end of an array).

This behavior is generally not an issue, unless you start iterating through hundreds/thousands of elements. One might argue you should use a Vector in that case, and in some instances that’s true. The downside is that you lose some syntactic sugar, and that Vectors are not resizable. It’s possible to write your own class that wraps a Vector and resizes as necessary, but doing this in an optimal way for each platform would be challenging.

Philosophically, it seems to me that if one of Haxe’s selling points is performance, the built-in data structures should err toward the highest performance available on the target platform. I think it would be preferable to have out-of-bounds array access as “undefined behavior,” possibly throwing an exception when compiling with -debug. Explicit “get”/”set” functions could be provided to provide the old behavior. Alternatively, if the current behavior is desired by most, a compile flag to optionally turn off these checks would be helpful.

Other important Array functions that could be added:

  • reserve: provide a hint to the runtime to pre-allocate memory for the array. This could safely no-op in platforms that don’t support it.
  • shrinkToFit: provide a hint to the runtime to reduce its allocated memory to the current length of the array. Can no-op on platforms that don’t support it.
  • data: Return a platform-specific reference to the underlying buffer.
  • set length/clear: allow you to trim or empty an array, without deallocating memory. This is useful if you want to reuse the underlying buffer.
  • spliceVoid: remove a section of the array, without allocating a separate array to return the removed section
  • removeUnordered: remove an element from the array, without guaranteeing the new ordering of the array. This avoids the need to shift all array elements.
  • removeUnorderedByIndex: avoid searching if you already know the index of the item to remove.

Some of these can be done with helper functions, so maybe don’t need to be in Array itself.

2. Anonymous Structure  Tax

Haxe has a handy feature where you can declare ad-hoc anonymous types, and still allow them to be type checked:

var x = { name:”Dan”, gender:”Male” };
trace(x.age); // Compile error! (Wouldn’t you like to know! ;)

On platforms like C++ and C#, using anonymous typesinstead of classes can have severe performance implications. In those platforms, reflection must be used to access these elements, which is much slower than normal access. This unfortunately means that anonymous types  are generally off-limits in code that needs high performance. The biggest offenders here are Iterator and Iterable: their methods are called via reflection.

One solution tossed around was to make native interfaces for all anonymous types in generated code, and automatically have any class that is compatible with that typedef implement the interface. One problem with this is that it would not work with native or extern classes, since they could not implement these interfaces. I would argue this is the exception rather than the rule, but perhaps a typedef could be marked with something like @:dynamic to indicate that it needs to operate through reflection.

A cheaper solution that would solve more than half the problem is just to make Iterator and Iterable proper interfaces in Haxe, rather than typedefs. 🙂

Memory Layout

The bane of modern processors is indirection: CPUs are so much faster than the memory subsystem that cache misses can easily end up taking the majority of time in an iteration-heavy algorithm. For best performance, it is generally optimal to arrange your data as flatly/linearly as possible in order to keep the CPU cache primed. For more information on the impact memory access can have on runtime performance, and the advantages of Data-Oriented Design, I highly recommend Mike Acton’s talk on the subject.

I think the most illustrative example of the problem here is a Point class:

class Point { var x:Float; var y:Float; }

and I create a Shape class, which has an origin and a type:

class Shape { var origin:Point; var type:ShapeType; }

Then I make a list of all the shapes in the scene:

var shapes : Array<Shape>;

This is how the shapes list looks in memory:

image00

There are multiple layers of indirection here. The shape array in memory is actually an array of pointers, and each Shape has a pointer to a Point. This is the canonical way to represent this in Haxe, and it’s extremely bad for cache coherency. Iterating over the list of shapes and accessing their origins requires the CPU to grab memory from all over the place, completely thrashing the cache. This is how we’d like this to look, for better cache coherency:

image01

In C, these datatypes are known as “structs.” There are (differing) ways to mark a Haxe class as a struct in both the C# and C++ targets. Unfortunately, you lose cross-platform capability if you do this, because being a struct implies copy-on-assignment. This means that assignment of a variable with @:struct behaves differently in Javascript than in C++.

Supporting structs in all language targets would be challenging (or impossible). Many targets do not have a native concept of structs. However, it should be possible to make @:struct classes imply copy-on-assignment for all targets. This would mean that @:struct would work consistently on all targets: in Javascript, assignment of a @:struct variable would actually translate to an auto-generated “copy” function call that would return a new instance. In effect, @:struct would act as a hint for targets that support it, but compatibility with all targets would be preserved.

Language support for structs only solves part of the problem, though. Sometimes you might want a proper class, but still be able to “embed” that class inside another instance. In C++, if you have a member variable of ClassType, the members of ClassType are organized linearly inside the host class. In Haxe and many other languages, a member variable of ClassType is represented as a pointer to an instance somewhere else in memory, where all its member variables reside. This has the unfortunate side-effect of making modularization have an associated performance cost, due to added indirection.

One way to solve this would be to have some metadata hint to the compiler indicating that a member variable be laid out linearly in memory:

class Foo { var x:Int; var y:Int; }

class Bar {
	@:embed var foo : Foo;
}

would actually translate to:
class Bar {
	var __foo__x:Int; var __foo__y:Int;
}

That’s a vast simplification of course; an obvious problem is that no Point actually exists at runtime anymore, meaning this code couldn’t work:

var bar = new Bar();
var foo:Foo = bar.foo; // foo member variable doesn’t actually exist!

Solving this is non-trivial. Instead, Haxe could limit the ability of @:embed’ed member variables, so that they are disallowed as ‘rvalues’ in any expression. In essence, there is no way to retain a reference to them, but you can call methods or access variables on the embedded class.

Allocation and Reducing GC pressure

Haxe is fundamentally a garbage-collected language, which has its advantages and disadvantages. Some might not know that a modern garbage collector can actually be faster than manual memory management in some cases! However, garbage collection in Haxe has a couple issues. First of all, not every target platform has a modern garbage collector—some are quite slow. Secondly, garbage collectors require occasional “stop-the-world” mark and sweep phases that can cause hitches in framerate, which is a problem for games in particular.

One way to reduce the number of allocations in games is via object pools: instead of allocating a new object every time, keep an array of previously allocated objects and reuse them. Once you are done with an object, return it back to the pool for further reuse. This does much to alleviate garbage collection overhead, but it is not without its issues. First, there isn’t a “transparent” way to implement pooling in Haxe. The caller has to know that they must ask the pool for an instance instead of just using “new.” Second, you now have a bunch of objects hanging around that aren’t being used, taking up memory. In some cases, you may need to limit the pool size or even deallocate the pool when you know you won’t be allocating more objects of that type. Third, pooling can cause issues in multithreaded code. You either have to give each thread its own pool, use a critical section to guard the pool, or use some sort of lockless implementation.

Haxe could alleviate some of these issues by adding object pooling to the standard library, with a well-written implementation for each target platform. It would be very nice to have some way of declaratively marking a type as being pooled, so that “new” would allocate from the pool instead of from the heap.

Pooling works great for large, heavyweight objects, but the above issues can cause a lot of overhead for smaller objects (like the aforementioned Point class). One important advantage of @:struct and @:embed support is that it would allow developers to reduce the number of allocations, and thus the amount of work the garbage collector has to do. Since structs are allocated on the stack, or linearly in memory within their host class, they do not require a separate allocation and thus don’t need to be tracked by the garbage collector.

32-bit Floating Point

Haxe’s floating point format is 64-bit double-precision. This is the only floating datatype that is supported across all target platforms, but it has performance implications. Games will typically use 32-bit floating point precision for 3D math, because it’s faster and the extra precision normally isn’t required.

Frustratingly, there are two ways of using 32-bit floats in Haxe: the C++ uses Float32, whereas C# and Java use Single. Neither of these datatypes are allowed on other targets. There is no good way to emulate 32-bit floating point across all platforms. Newer Javascript targets have optimized Math.fround in order to support asm.js, but other dynamic languages like Python and PHP are out of luck. I suggest allowing the Single datatype to be used on all targets, but use double precision when single precision isn’t available. Emscripten, for example, does this by default.

Another annoyance with 32-bit floats in Haxe is that math operations are often promoted to double-precision. For example:

function div2(x:Single) {
  return x / 2.0;
}

Will actually perform the division using 64-bit Floats, rather than 32-bit Singles. Many languages support the “f” postfix on literals to indicate a 32-bit float. I propose Haxe support it as well:

function div2(x:Single) {
  return x / 2.0f;
}

Conclusion

Haxe has a lot going for it: it’s a modern language that is highly adaptable, without throwing performance out the window. In many cases, Haxe will actually generate smarter/faster code than you would’ve written yourself in the target language. I’m looking forward to seeing additional performance improvements Haxe will see in the future!

 

Dan Ogles
Chief Technical Officer

2 Comments
  1. FWIW, you might want to consider creating your own abstract over haxe.ds.Vector instead of using Array, because it gives you finer control over size and such.

  2. @BACK2DOS, It’s true, that would solve solve problems. Unfortunately, we would miss out on some syntactic niceties like array constructors and comprehensions. That could be partially mitigated with macros though, perhaps. Thanks for the suggestion!

Leave a Reply

On Twitter