
LLE vs HLE and their tradeoffs

Author: Alexandro Sanchez · Date: 2018-04-18

Introduction

This article aims to give an intuitive understanding of the terms "Low-Level Emulation" (LLE) and "High-Level Emulation" (HLE) often heard in the emulation scene, their differences and tradeoffs in development/performance costs, and how developers choose one paradigm or the other.

Machines are made of several layers of abstraction, each of them relying on the layer below to perform some particular task. In the context of gaming consoles, you might consider these layers (ordered from higher to lower level):

- Userland: the games and other applications themselves.
- Kernel: the console's operating system, managing resources and exposing services to userland.
- Hardware: the CPU, GPU, memory and other physical devices.

That's where these "low-level" or "high-level" terms come from. Something is more "high-level" when it has more layers of abstraction below it, and more "low-level" when it has more layers of abstraction above it. With so many layers, the terms "low" and "high" can become quite subjective (developers can't even agree about whether some emulators are HLE or LLE). Furthermore, you could go even lower than the hardware level and start thinking about transistors, atoms, etc. as even deeper layers of abstraction. Similarly, there are also even higher levels, like the game scripts that are sometimes used to handle events and dialogues in a game. Of course, for most emulators, these layers are either too low or too high. Why?

Emulation paradigms

Let's tackle this question after giving an intuitive notion of what emulation is. Emulating a system is all about putting a "barrier" between two adjacent layers of abstraction. For instance:

- Placing the barrier between hardware and kernel: the emulator reimplements the hardware, and the original kernel and userland (including the games) run on top of it.
- Placing the barrier between kernel and userland: the emulator reimplements the kernel, and the original userland (the games) runs on top of it.
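To make the two placements concrete, here is a purely illustrative C sketch (the function names and register address are hypothetical, not taken from this article) of how an emulator might handle a guest action at each barrier: a raw hardware register write in the low-level case, and a reimplemented kernel service in the high-level case.

#include <stdint.h>
#include <stdio.h>

/* LLE: the barrier sits between hardware and kernel. The emulator models
 * the hardware, so a guest store to a (hypothetical) GPU command port is
 * trapped and fed into an emulated device state machine. */
void lle_mmio_write(uint32_t addr, uint32_t value) {
    if (addr == 0x1F801810u) {  /* hypothetical GPU command register */
        printf("GPU register write: %08x\n", (unsigned)value);
        /* ...advance the emulated GPU's internal state... */
    }
}

/* HLE: the barrier sits between kernel and userland. The emulator
 * reimplements the kernel, so a guest system call such as "open file"
 * is intercepted and serviced directly with host facilities. */
int hle_syscall_open(const char *guest_path) {
    printf("open(\"%s\") handled by the reimplemented kernel\n", guest_path);
    /* ...translate the guest path and call the host's own file API... */
    return 3;  /* hypothetical guest file descriptor */
}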

Back to the original question: why do emulators always place the barrier at one of these two "hot spots", i.e. LLE (between hardware and kernel) and HLE (between kernel and userland)?

When you place this "emulation" barrier between two layers, you have to reimplement the layer below it (i.e. reimplement the hardware in LLE, reimplement the kernel in HLE), so that the layer(s) above it can execute successfully. This results in two costs that you have to balance: "development time" and "execution time". Let me explain why this balance is important with a few extreme examples of poor balances:

- Going too low (e.g. simulating individual transistors or logic gates): development would be comparatively straightforward, but execution would be unbearably slow.
- Going too high (e.g. reimplementing the game scripts or the game itself): execution would be very fast, but development would amount to rewriting every game by hand.

As you can see, the rule of thumb is: going higher-level incurs larger development costs, and going lower-level incurs larger execution costs. But this is not always the case, and it has frequently led to misconceptions among end users. One of them is wrongly estimating the performance of different emulation paradigms.

Performance myths

Let's debunk some of those performance myths. Assume you want to emulate some machine, and you are learning about its hardware and software to balance "development time" vs. "execution time" and pick the right strategy. How do you estimate those costs, especially "execution time", aside from the naive rule of thumb above? Estimating how fast something will run isn't just about which levels of abstraction you are targeting. The resulting performance will depend on how many "concepts" from your guest machine (i.e. the thing you're trying to emulate) can be mapped onto your host machine (the thing that will run the emulator).

To give you an example, one such "concept" is the MMU. To explain it briefly (in a slightly oversimplified way that will do for the sake of this explanation), the MMU is the component that gives each application access to a slice of RAM by mapping addresses from a "virtual address space" (an imaginary arrangement of memory) to a "physical address space" (the actual RAM). Every time the application accesses memory with some CPU instruction, behind the scenes the MMU translates the virtual address given by the application into a physical one.
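To make this concrete, here is a minimal sketch (plain C, assuming a made-up 32-bit guest with 4 KiB pages; the names are mine, not from this article) of the software-based address translation an emulator performs when it cannot lean on the host MMU: every single guest load and store goes through a lookup like this.

#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS 12                        /* hypothetical 4 KiB guest pages */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES (1u << (32 - PAGE_BITS))  /* hypothetical 32-bit guest */

/* Software page table: maps each guest virtual page to host memory. */
static uint8_t *page_table[NUM_PAGES];

/* Every guest memory access pays the cost of this translation. */
static uint8_t *translate(uint32_t guest_vaddr) {
    uint8_t *page = page_table[guest_vaddr >> PAGE_BITS];
    if (!page) {
        /* Unmapped page: a real emulator would raise a guest page fault. */
        abort();
    }
    return page + (guest_vaddr & (PAGE_SIZE - 1));
}

static uint32_t guest_read32(uint32_t guest_vaddr) {
    return *(uint32_t *)translate(guest_vaddr);
}

static void guest_write32(uint32_t guest_vaddr, uint32_t value) {
    *(uint32_t *)translate(guest_vaddr) = value;
}

This per-access overhead is exactly what disappears when the host's own MMU can be used instead, as described next.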

However, in some scenarios (this depends on MMU quirks, page sizes, etc.), you could use your host computer's own MMU to handle the guest applications' memory accesses directly. One way of accomplishing this is running the guest software in a VM and having a hypervisor let it access a slice of the host computer's physical RAM directly. This removes the need for expensive software-based address translation and results in large performance gains.
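To illustrate the idea (my own sketch, not something described in this article), here is roughly what that looks like with Linux's KVM interface, assuming a Linux host with hardware virtualization support; error handling and vCPU setup are omitted. The key step is KVM_SET_USER_MEMORY_REGION: after it, guest memory accesses are translated by the host's MMU/virtualization hardware rather than by emulator code.

#include <fcntl.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void) {
    const size_t ram_size = 64 * 1024 * 1024;  /* hypothetical 64 MiB of guest RAM */

    int kvm  = open("/dev/kvm", O_RDWR);
    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);

    /* Allocate host memory that will back the guest's physical RAM. */
    void *guest_ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    /* Map it into the guest's physical address space at address 0.
     * From here on, guest accesses to this region go through the host
     * MMU/virtualization hardware, not through emulator software. */
    struct kvm_userspace_memory_region region = {
        .slot            = 0,
        .guest_phys_addr = 0,
        .memory_size     = ram_size,
        .userspace_addr  = (uint64_t)guest_ram,
    };
    ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

    /* ...create a vCPU with KVM_CREATE_VCPU and run the guest... */
    return 0;
}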

Conclusion

By making better use of the host machine's resources, in the MMU and many other areas, you can make even low-level emulation run with acceptable performance. It's no surprise that Sony used this strategy to emulate the PS2 on the PS3, and Microsoft to emulate the Xbox on the Xbox 360 [1] and the Xbox 360 on the Xbox One. The supposed 10x performance slowdown of LLE is a myth, resulting from many oversimplifications and/or implementations that made poor use of the host machine's resources.

Of course, massive slowdowns can still happen: with really heterogeneous architectures, some concepts can be hard to map onto each other, and you might have to resort to software emulation, incurring 10x to 100x performance penalties; but this isn't necessarily always the case. There are no magic "performance penalty" numbers; everything has to be considered on a case-by-case basis, and the only way to estimate what the penalty will be is to get to know both the guest and host systems in detail.