1171 строка
59 KiB
TeX
1171 строка
59 KiB
TeX
%\documentclass[11pt]{proc}
|
|
\documentclass[preprint,10pt]{sigplanconf}
|
|
|
|
\usepackage{amsmath}
|
|
\usepackage{url}
|
|
\usepackage{graphicx}
|
|
|
|
\begin{document}
|
|
|
|
\title{Emscripten: An LLVM-to-JavaScript Compiler}
|
|
\conferenceinfo{Splash '11}{??-2011, Portland.}
|
|
\copyrightyear{2011}
|
|
\copyrightdata{[to be supplied]}
|
|
\titlebanner{} % These are ignored unless
|
|
\preprintfooter{} % 'preprint' option specified.
|
|
\authorinfo{Alon Zakai}
|
|
{Mozilla}
|
|
{azakai@mozilla.com}
|
|
|
|
\maketitle
|
|
|
|
%\title{Emscripten: An LLVM-to-JavaScript Compiler}
|
|
%\subtitle{}
|
|
%\authorinfo{Alon Zakai}
|
|
% {Mozilla}
|
|
% {azakai@mozilla.com}
|
|
%\author{Alon Zakai \\ Mozilla \\ \url{azakai@mozilla.com}}
|
|
%\maketitle
|
|
|
|
\begin{abstract}
|
|
We present Emscripten, a compiler from LLVM (Low Level Virtual Machine) assembly to JavaScript. This
|
|
opens up two avenues for running code written
|
|
in languages other than JavaScript on the web: (1) Compile code directly into LLVM assembly, and
|
|
then compile that into JavaScript using Emscripten, or (2) Compile
|
|
a language's entire runtime into LLVM and then JavaScript, as in the previous
|
|
approach, and then use the compiled runtime to run code written in that language. For example, the
|
|
former approach can work for C and C++, while the latter can work for Python; all three
|
|
examples open up new opportunities for running code on the web.
|
|
|
|
Emscripten itself is written in JavaScript and is available under the MIT
|
|
license (a permissive open source license), at \url{http://www.emscripten.org}.
|
|
As a compiler from LLVM to JavaScript, the challenges in designing
|
|
Emscripten are somewhat the reverse of the norm -- one must go from a low-level
|
|
assembly into a high-level language, and recreate parts of the original
|
|
high-level structure of the code that were lost in the compilation to
|
|
low-level LLVM. We detail the methods used in
|
|
Emscripten to deal with those challenges, and in particular present and prove
|
|
the validity of Emscripten's Relooper
|
|
algorithm, which recreates high-level loop structures from low-level
|
|
branching data.
|
|
\end{abstract}
|
|
|
|
%\category{CR-number}{subcategory}{third-level}
|
|
|
|
%\terms
|
|
%term1, term2
|
|
|
|
%\keywords
|
|
%keyword1, keyword2
|
|
|
|
\bigskip
|
|
|
|
%\copyright 2011 Alon Zakai. License: Creative Commons Attribution-ShareAlike (CC BY-SA), \url{http://creativecommons.org/licenses/by-sa/3.0/}
|
|
|
|
\section{Introduction}
|
|
|
|
Since the mid 1990's, JavaScript~\cite{js} has been present in most web browsers (sometimes
|
|
with minor variations and under slightly different names, e.g., JScript in Internet
|
|
Explorer), and today it is
|
|
well-supported on essentially all web browsers, from desktop browsers like
|
|
Internet Explorer, Firefox, Chrome and Safari, to mobile browsers on smartphones
|
|
and tablets. Together with HTML and CSS, JavaScript forms the standards-based
|
|
foundation of the web.
|
|
|
|
Running other programming languages on the web has been suggested many times,
|
|
and browser plugins have allowed doing so, e.g., via the Java
|
|
and Flash plugins. However, plugins must be manually installed and do not integrate in
|
|
a perfect way with the outside HTML. Perhaps more problematic is that they cannot run
|
|
at all on some platforms, for example, Java and Flash cannot run on iOS devices such as the iPhone
|
|
and iPad. For those reasons, JavaScript remains
|
|
the primary programming language of the web.
|
|
|
|
There are, however, reasonable motivations for running code from
|
|
other programming languages on the web, for example, if one has a large
|
|
amount of existing code already written in another language, or if one
|
|
simply has a strong preference for another language and perhaps is
|
|
more productive in it. As a consequence, there has been work on tools to compile languages
|
|
\textbf{into} JavaScript. Since JavaScript is present in essentially all web
|
|
browsers, by compiling one's language of choice into JavaScript, one
|
|
can still generate content that will run practically everywhere.
|
|
|
|
Examples of the approach of compiling into JavaScript include
|
|
the Google Web Toolkit~\cite{gwt}, which compiles Java into JavaScript;
|
|
Pyjamas\footnote{\url{http://pyjs.org/}}, which compiles Python into JavaScript;
|
|
SCM2JS \cite{hop}, which compiles Scheme to JavaScript,
|
|
Links \cite{links}, which compiles an ML-like language into JavaScript;
|
|
and AFAX \cite{afax}, which compiles F\# to JavaScript;
|
|
see also \cite{ashkenas} for additional examples.
|
|
While useful, such tools usually only allow a subset of the original language to
|
|
be compiled. For example, multithreaded code (with shared memory) is
|
|
not possible on the web, so compiling code of that sort is
|
|
not directly possible. There are also often limitations of the conversion
|
|
process, for example, Pyjamas compiles Python to JavaScript in a nearly
|
|
1-to-1 manner, and as a consequence the underlying semantics are those of JavaScript,
|
|
not Python, so for example division of integers can yield unexpected results
|
|
(it should yield an integer in Python 2.x,
|
|
but in JavaScript and in Pyjamas a floating-point number can be generated).
|
|
|
|
In this paper we present another project along those lines: \textbf{Emscripten},
|
|
which compiles LLVM (Low Level Virtual Machine\footnote{\url{http://llvm.org/}}) assembly into JavaScript.
|
|
LLVM is a compiler project primarily focused on C, C++ and
|
|
Objective-C. It compiles those languages through a \emph{frontend} (the
|
|
main ones of which are Clang and LLVM-GCC) into the
|
|
LLVM intermediary representation (which can be machine-readable
|
|
bitcode, or human-readable assembly), and then passes it
|
|
through a \emph{backend} which generates actual machine code for a particular
|
|
architecture. Emscripten plays the role of a backend which targets JavaScript.
|
|
|
|
By using Emscripten, potentially many languages can be
|
|
run on the web, using one of the following methods:
|
|
\begin{itemize}
|
|
\item Compile \textbf{code} in a language recognized by one of the existing LLVM frontends
|
|
into LLVM, and then compile that
|
|
into JavaScript using Emscripten. Frontends for various languages
|
|
exist, including many of the most popular programming languages such as C and
|
|
C++, and also various new and emerging languages (e.g., Rust\footnote{\url{https://github.com/graydon/rust/}}).
|
|
\item Compile the \textbf{runtime} used to parse and execute code in
|
|
a particular language into LLVM, then compile that into JavaScript using
|
|
Emscripten. It is then possible to run code in that runtime on the web.
|
|
This is a useful approach if
|
|
a language's runtime is written in a language for which an LLVM
|
|
frontend exists, but the language itself has no such frontend. For
|
|
example, there is currently no frontend for Python, however
|
|
it is possible to compile CPython -- the standard implementation of
|
|
Python, written in C -- into JavaScript, and run Python code on that
|
|
(see Section~\ref{sec:examples}).
|
|
\end{itemize}
|
|
|
|
From a technical standpoint, one challenge in designing and implementing
|
|
Emscripten is that it compiles a low-level language -- LLVM assembly -- into
|
|
a high-level one -- JavaScript. This is somewhat the reverse of the usual
|
|
situation one is in when building a compiler, and leads to some unique
|
|
difficulties. For example, to get good performance in JavaScript one must
|
|
use natural JavaScript code flow structures, like loops and ifs, but
|
|
those structures do not exist in LLVM assembly (instead, what is present
|
|
there is a `soup of code fragments': blocks of code with branching information
|
|
but no high-level structure).
|
|
Emscripten must therefore reconstruct a high-level
|
|
representation from the low-level data it receives.
|
|
|
|
In theory that issue could have been avoided by compiling a higher-level
|
|
language into JavaScript. For example, if compiling Java into JavaScript
|
|
(as the Google Web Toolkit does), then one can benefit from the fact
|
|
that Java's loops, ifs and so forth generally have a very direct parallel
|
|
in JavaScript. But of course the downside in that approach is it yields a
|
|
compiler only for Java. In Section~\ref{sec:relooper}
|
|
we present the `Relooper' algorithm, which generates high-level loop structures from the low-level
|
|
branching data present in LLVM assembly. It is similar to loop recovery algorithms used in decompilation
|
|
(see, for example, \cite{Cifuentes98assemblyto}, \cite{pro97}).
|
|
The main difference between the Relooper and standard loop recovery algorithms
|
|
is that the Relooper generates loops in a different language than that which was compiled originally, whereas
|
|
decompilers generally assume they are returning to the original language. The Relooper's
|
|
goal is not to accurately recreate the original source code, but rather to generate
|
|
native JavaScript control flow structures, which can then be implemented
|
|
efficiently in modern JavaScript engines.
|
|
|
|
Another challenge in Emscripten is to maintain accuracy (that is, to
|
|
keep the results of the compiled code the same as the original)
|
|
while not sacrificing performance. LLVM assembly
|
|
is an abstraction of how modern CPUs are programmed for, and its basic
|
|
operations are not all directly possible in JavaScript. For example, if in
|
|
LLVM we are to add two unsigned 8-bit numbers $x$ and $y$, with overflowing (e.g., 255
|
|
plus 1 should give 0), then there is no single operation in JavaScript which
|
|
can do this -- we cannot just write $x+y$, as that would use the normal JavaScript
|
|
semantics. It is possible to emulate a CPU in JavaScript, however doing so
|
|
is very slow. Emscripten's approach is to allow such emulation, but to try to
|
|
use it as little as possible, and to provide tools that help one find out
|
|
which parts of the compiled code actually need such full emulation.
|
|
|
|
We conclude this introduction with a list of this paper's main contributions:
|
|
\begin{itemize}
|
|
\item We describe Emscripten itself, during
|
|
which we detail its approach in compiling LLVM into JavaScript.
|
|
\item We give details of Emscripten's Relooper algorithm, mentioned earlier, which generates
|
|
high-level loop structures from low-level branching data, and prove
|
|
its validity.
|
|
\end{itemize}
|
|
In addition, the following are the main contributions of Emscripten
|
|
itself, that to our knowledge were not previously possible:
|
|
\begin{itemize}
|
|
\item It allows compiling a very large subset of C and C++ code into
|
|
JavaScript, which can then be run on the web.
|
|
\item By compiling their runtimes, it allows running languages such as Python
|
|
on the web (with their normal semantics).
|
|
\end{itemize}
|
|
|
|
The remainder of this paper is structured as follows. In Section~\ref{sec:compapp} we
|
|
describe the approach Emscripten takes to compiling LLVM assembly into JavaScript,
|
|
and show some benchmark data.
|
|
In Section~\ref{sec:emarch} we describe Emscripten's internal design and in
|
|
particular elaborate on the Relooper algorithm.
|
|
In Section~\ref{sec:examples} we give several example uses of
|
|
Emscripten. In Section~\ref{sec:summary} we summarize and give directions for future
|
|
work.
|
|
|
|
\section{Compilation Approach}
|
|
\label{sec:compapp}
|
|
|
|
Let us begin by considering what the challenge is, when we want to compile LLVM assembly
|
|
into JavaScript. Assume we are given the
|
|
following simple example of a C program:
|
|
\begin{verbatim}
|
|
#include <stdio.h>
|
|
int main()
|
|
{
|
|
int sum = 0;
|
|
for (int i = 1; i < 100; i++)
|
|
sum += i;
|
|
printf("1+...+100=%d\n", sum);
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
This program calculates the sum of the integers from 1 to 100. When
|
|
compiled by Clang, the generated LLVM
|
|
assembly code includes the following:
|
|
\label{code:examplellvm}
|
|
\begin{verbatim}
|
|
@.str = private constant [14 x i8]
|
|
c"1+...+100=%d\0A\00"
|
|
|
|
define i32 @main() {
|
|
%1 = alloca i32, align 4
|
|
%sum = alloca i32, align 4
|
|
%i = alloca i32, align 4
|
|
store i32 0, i32* %1
|
|
store i32 0, i32* %sum, align 4
|
|
store i32 1, i32* %i, align 4
|
|
br label %2
|
|
|
|
; <label>:2
|
|
%3 = load i32* %i, align 4
|
|
%4 = icmp slt i32 %3, 100
|
|
br i1 %4, label %5, label %12
|
|
|
|
; <label>:5
|
|
%6 = load i32* %i, align 4
|
|
%7 = load i32* %sum, align 4
|
|
%8 = add nsw i32 %7, %6
|
|
store i32 %8, i32* %sum, align 4
|
|
br label %9
|
|
|
|
; <label>:9
|
|
%10 = load i32* %i, align 4
|
|
%11 = add nsw i32 %10, 1
|
|
store i32 %11, i32* %i, align 4
|
|
br label %2
|
|
|
|
; <label>:12
|
|
%13 = load i32* %sum, align 4
|
|
%14 = call i32 (i8*, ...)*
|
|
@printf(i8* getelementptr inbounds
|
|
([14 x i8]* @.str, i32 0, i32 0),
|
|
i32 %13)
|
|
ret i32 0
|
|
}
|
|
\end{verbatim}
|
|
At first glance, this may look more difficult to translate into
|
|
JavaScript than the original C++. However, compiling C++ in
|
|
general would require writing code to handle preprocessing,
|
|
classes, templates, and all the idiosyncrasies and complexities
|
|
of C++. LLVM assembly, while more verbose in this example, is
|
|
lower-level and simpler to work on. Compiling it also has the benefit we
|
|
mentioned earlier, which
|
|
is one of the main goals of Emscripten, that it allows many languages can
|
|
be compiled into LLVM and not just C++.
|
|
|
|
A detailed overview of LLVM assembly is beyond our scope here (see \url{http://llvm.org/docs/LangRef.html}). Briefly,
|
|
though, the example assembly above can be seen to define a
|
|
function main(), then allocate some values on the stack (alloca),
|
|
then load and store various values (load and store). We do not have
|
|
the high-level code structure as we had in C++ (with a loop), instead
|
|
we have labeled code fragments, called LLVM basic blocks, and code flow moves
|
|
from one to another by branch (br) instructions. (Label 2 is the
|
|
condition check in the loop; label 5 is the body, label 9 is the
|
|
increment, and label 12 is the final part of the function, outside
|
|
of the loop).
|
|
Conditional branches
|
|
can depend on calculations, for example the results of comparing
|
|
two values (icmp). Other numerical operations include addition (add).
|
|
Finally, printf is called (call). The challenge, then, is to convert
|
|
this and things like it into JavaScript.
|
|
|
|
In general, Emscripten's main approach is to translate each line of LLVM
|
|
assembly into JavaScript, 1 to 1, into `normal' JavaScript
|
|
as much as possible. So, for example, an \emph{add} operation becomes
|
|
a normal JavaScript addition, a function call becomes a JavaScript
|
|
function call, etc. This 1 to 1 translation generates JavaScript
|
|
that resembles the original assembly code, for example, the LLVM assembly code shown
|
|
before for main() would be compiled into the following:
|
|
\label{code:example}
|
|
\begin{verbatim}
|
|
function _main() {
|
|
var __stackBase__ = STACKTOP;
|
|
STACKTOP += 12;
|
|
var __label__ = -1;
|
|
while(1) switch(__label__) {
|
|
case -1:
|
|
var $1 = __stackBase__;
|
|
var $sum = __stackBase__+4;
|
|
var $i = __stackBase__+8;
|
|
HEAP[$1] = 0;
|
|
HEAP[$sum] = 0;
|
|
HEAP[$i] = 0;
|
|
__label__ = 0; break;
|
|
case 0:
|
|
var $3 = HEAP[$i];
|
|
var $4 = $3 < 100;
|
|
if ($4) { __label__ = 1; break; }
|
|
else { __label__ = 2; break; }
|
|
case 1:
|
|
var $6 = HEAP[$i];
|
|
var $7 = HEAP[$sum];
|
|
var $8 = $7 + $6;
|
|
HEAP[$sum] = $8;
|
|
__label__ = 3; break;
|
|
case 3:
|
|
var $10 = HEAP[$i];
|
|
var $11 = $10 + 1;
|
|
HEAP[$i] = $11;
|
|
__label__ = 0; break;
|
|
case 2:
|
|
var $13 = HEAP[$sum];
|
|
var $14 = _printf(__str, $13);
|
|
STACKTOP = __stackBase__;
|
|
return 0;
|
|
}
|
|
}
|
|
\end{verbatim}
|
|
Some things
|
|
to take notice of:
|
|
\begin{itemize}
|
|
\item A switch-in-a-loop construction is used in order to let the flow
|
|
of execution move between basic blocks of code in an arbitrary manner: We set
|
|
\emph{\_\_label\_\_} to the (numerical representation of the) label of
|
|
the basic block we want to reach, and do a break, which leads to the proper
|
|
basic block being reached. Inside each basic block, every line of code corresponds to a line of
|
|
LLVM assembly, generally in a very straightforward manner.
|
|
\item Memory is implemented by \emph{HEAP}, a JavaScript array. Reading from
|
|
memory is a read from that array, and writing to memory is a write.
|
|
\emph{STACKTOP} is the current position of the stack. (Note that we
|
|
allocate 4 memory locations for 32-bit integers on the stack, but only
|
|
write to 1 of them. See Section~\ref{sec:lsc} for why.)
|
|
\item LLVM assembly functions become JavaScript functions, and function calls
|
|
are normal JavaScript function calls. In general, we attempt to generate
|
|
as `normal' JavaScript as possible.
|
|
\item We implemented the LLVM \emph{add} operation using simple addition in JavaScript.
|
|
As mentioned earlier, the semantics of that code are not entirely identical to
|
|
those of the original LLVM assembly code (in this case, overflows will have very
|
|
different effects). We will explain Emscripten's approach to that problem in
|
|
Section~\ref{sssec:realworldcode}.
|
|
\end{itemize}
|
|
|
|
\subsection{Performance}
|
|
\label{sec:perf}
|
|
|
|
In this section we will deal with several topics regarding
|
|
Emscripten's approach to generating high-performance JavaScript code.
|
|
|
|
\subsubsection{Load-Store Consistency (LSC)}
|
|
\label{sec:lsc}
|
|
|
|
We saw before that Emscripten's memory usage allocates the usual number
|
|
of bytes on the stack for variables (4 bytes for a 32-bit integer, etc.).
|
|
However, we only wrote values into the first location, which appeared odd.
|
|
We will now see the reason for that.
|
|
|
|
To get there, we must first step back, and note that
|
|
Emscripten does not aim to achieve perfect compatibility with all possible
|
|
LLVM assembly (and correspondingly, with all possible C or C++ code, etc.);
|
|
instead, Emscripten targets a large subset of LLVM assembly code, which is portable
|
|
and does not make crucial assumptions about the underlying CPU architecture
|
|
on which the code is meant to run. That subset is meant to encompass the
|
|
vast majority of real-world code that would be compiled into LLVM,
|
|
while also being compilable into very
|
|
performant JavaScript.
|
|
|
|
More specifically, Emscripten assumes that the LLVM assembly code it is
|
|
compiling has \textbf{Load-Store Consistency} (LSC), which is the requirement that
|
|
after a value with a specific type is written to a memory location, loads from
|
|
that memory location will be of the same type (until a value with a different
|
|
type is written there).
|
|
Normal C and C++
|
|
code generally does so: If $x$ is a variable containing a 32-bit floating
|
|
point number, then both loads and stores of $x$ will be of 32-bit floating
|
|
point values, and not 16-bit unsigned integers or anything else.
|
|
|
|
To see why this is important for performance, consider the following
|
|
C code fragment, which does \emph{not} have LSC:
|
|
\begin{verbatim}
|
|
int x = 12345;
|
|
printf("first byte: %d\n", *((char*)&x));
|
|
\end{verbatim}
|
|
Assuming an architecture with more than 8 bits, this code will read
|
|
the first byte of \emph{x}. (This might, for example, be used to detect the
|
|
endianness of the CPU.) To compile this into JavaScript in
|
|
a way that will run properly, we must do more than a single operation
|
|
for either the read or the write, for example we could do this:
|
|
\begin{verbatim}
|
|
var x_value = 12345;
|
|
var x_addr = stackAlloc(4);
|
|
HEAP[x_addr] = (x_value >> 0) & 255;
|
|
HEAP[x_addr+1] = (x_value >> 8) & 255;
|
|
HEAP[x_addr+2] = (x_value >> 16) & 255;
|
|
HEAP[x_addr+3] = (x_value >> 24) & 255;
|
|
[...]
|
|
printf("first byte: %d\n", HEAP[x_addr]);
|
|
\end{verbatim}
|
|
Here we allocate space for the value of \emph{x} on the stack, and
|
|
store that address in \emph{x\_addr}. The stack itself is part of
|
|
the `memory space', which is the array \emph{HEAP}. In order for
|
|
the read on the final line to give the proper value, we must go to
|
|
the effort of doing 4 store operations, each of the value of a
|
|
particular byte. In other words, \emph{HEAP} is an array of bytes,
|
|
and for each store into memory, we must deconstruct the value into
|
|
bytes.\footnote{Note that we can use JavaScript typed arrays with a shared memory
|
|
buffer, which would work as expected, assuming (1) we are running
|
|
in a JavaScript engine which supports typed arrays, and (2) we
|
|
are running on a CPU with the same architecture as we expect. This
|
|
is therefore dangerous as the generated code may run differently on
|
|
different JavaScript engines and different CPUs.
|
|
Emscripten currently has optional experimental support for typed arrays.}
|
|
|
|
Alternatively, we can store the value in a single operation, and
|
|
deconstruct into bytes as we load. This will be faster in some
|
|
cases and slower in others, but is still more overhead
|
|
than we would like, generally speaking -- for if the code \textbf{does} have
|
|
LSC, then we can translate that code fragment into
|
|
the far more optimal
|
|
\begin{verbatim}
|
|
var x_value = 12345;
|
|
var x_addr = stackAlloc(4);
|
|
HEAP[x_addr] = x_value;
|
|
[...]
|
|
printf("first byte: %d\n", HEAP[x_addr]);
|
|
\end{verbatim}
|
|
(Note that even this can be optimized even more -- we can store
|
|
\emph{x} in a normal JavaScript variable. We will discuss such
|
|
optimizations in Section \ref{sec:codeopt}; for now we are
|
|
just clarifying why it is useful to assume we are compiling code
|
|
that has LSC.)
|
|
|
|
In practice the vast majority of C and C++ code does have LSC. Exceptions
|
|
do exist, however, for example:
|
|
\begin{itemize}
|
|
\item Code that detects CPU features like endianness, the behavior of floats, etc. In general such code can be disabled
|
|
before running it through Emscripten, as it is not actually needed.
|
|
\item \emph{memset} and related functions typically work on values of one kind,
|
|
regardless of the underlying values. For example, memset may write 64-bit
|
|
values on a 64-bit CPU since that is usually faster than writing individual
|
|
bytes. This tends to
|
|
not be a problem, as with \emph{memset} the most common case is setting to
|
|
0, and with \emph{memcpy}, the values end up copied properly anyhow (with
|
|
a proper implementation of \emph{memcpy} in Emscripten's generated code).
|
|
\item Even LSC-obeying C or C++ code may turn into LLVM assembly that does not,
|
|
after being optimized. For example, when storing two 32-bit integers constants into
|
|
adjoining locations in a structure, the optimizer may generate a single
|
|
64-bit store of an appropriate constant. In other words, optimization can
|
|
generate nonportable code, which runs faster on the current CPU, but
|
|
nowhere else. Emscripten currently assumes that optimizations of this form
|
|
are not being used.
|
|
\end{itemize}
|
|
In practice it may be hard to know if code has LSC or not, and requiring
|
|
a time-consuming code audit is obviously impractical. Emscripten therefore has
|
|
a compilation option, SAFE\_HEAP, which generates code that checks that LSC holds, and warns if it
|
|
doesn't. It also warns about other memory-related issues like
|
|
reading from memory before a value was written (somewhat similarly to tools
|
|
like Valgrind\footnote{\url{http://valgrind.org/}}). When such problems are detected, possible solutions are to ignore the issue (if it has no actual
|
|
consequences), or alter the source code.
|
|
|
|
Note that it is somewhat wasteful to allocate 4 memory locations for
|
|
a 32-bit integer, and use only one of them. It is possible to change
|
|
that behavior with the QUANTUM\_SIZE parameter to Emscripten, however,
|
|
the difficulty is that LLVM assembly has hardcoded values that depend on
|
|
the usual memory sizes being used. We are looking into modifications
|
|
to LLVM itself to remedy that.
|
|
|
|
\subsubsection{Emulating Code Semantics}
|
|
\label{sssec:realworldcode}
|
|
|
|
As mentioned in the introduction, the semantics of LLVM assembly and JavaScript are not identical: The former
|
|
is very close to that of a modern CPU, while the latter is a high-level
|
|
dynamic language. Both are of course Turing-complete, so it is possible to
|
|
precisely emulate each in the other, but doing so with good performance is
|
|
more challenging. For example, if we want to convert
|
|
\begin{verbatim}
|
|
add i8 %1, %2
|
|
\end{verbatim}
|
|
(add two 8-bit integers) to JavaScript, then to be completely accurate we must emulate the
|
|
exact same behavior, in particular, we must handle overflows properly, which would not be the case if we just implement
|
|
this as $\%1 + \%2$ in JavaScript. For example, with inputs of $255$ and $1$, the
|
|
correct output is 0, but simple addition in JavaScript will give us 256. We
|
|
can of course emulate the proper behavior by adding additional code.
|
|
This however significantly degrades performance,
|
|
because modern JavaScript engines can often translate something like $z = x + y$ into
|
|
native code containing a single instruction (or very close to that), but if instead we had
|
|
something like $z = (x + y)\&255$ (in order to correct overflows), the JavaScript engine
|
|
would need to generate additional code to perform the AND operation.\footnote{
|
|
In theory, the JavaScript engine could determine that we are implicitly working
|
|
on 8-bit values here, and generate machine code that no longer needs the AND operation.
|
|
However, most or all modern JavaScript engines have just two internal numeric types, doubles and
|
|
32-bit integers. This is so because they are tuned for `normal' JavaScript code
|
|
on the web, which in most cases is served well by just those two types.
|
|
|
|
In addition, even if JavaScript engines did analyze code containing $\&255$, etc.,
|
|
in order to deduce that a variable can be implemented
|
|
as an 8-bit integer, there is a cost to including all the necessary $\&255$ text
|
|
in the script, because code size is a significant factor on the web. Adding even
|
|
a few characters for every single mathematic operation, in a large JavaScript file,
|
|
could add up to a significant increase in download size.}
|
|
|
|
Emscripten's approach to this problem is to allow the generation of both accurate code,
|
|
that is identical in behavior to LLVM assembly, and inaccurate code which is
|
|
faster. In practice, most addition operations in LLVM do not overflow,
|
|
and can simply be translated into $\%1 + \%2$. Emscripten
|
|
provides tools that make it straightforward to find which code does require
|
|
the slower, more accurate code, and to generate that code in those locations, as follows:
|
|
\begin{itemize}
|
|
\item Compile the code using Emscripten with special options that generate runtime checking.
|
|
CHECK\_OVERFLOWS adds runtime checks for integer overflows, CHECK\_SIGNS
|
|
checks for signing issues (the behavior of signed and unsigned integers can
|
|
be different, and JavaScript does not natively support that difference), and
|
|
CHECK\_ROUNDINGS checks for rounding issues (in C and C++, the convention is
|
|
to round towards 0, while in JavaScript there is no simple operation that does
|
|
the same).
|
|
\item Run the compiled code on a representative sample of inputs, and notice which
|
|
lines are warned about by the runtime checks.
|
|
\item Recompile the code, telling Emscripten to add corrections (using CORRECT\_SIGNS, CORRECT\_OVERFLOWS
|
|
or CORRECT\_ROUNDINGS) only on the specific lines that actually need it.
|
|
\end{itemize}
|
|
|
|
This method is not guaranteed to work, as if we do not run on a truly representative
|
|
sample of possible inputs, we may not compile with all necessary corrections. It is
|
|
of course possible to compile with all corrections applied to all the code, to make
|
|
sure things will work properly (this is the default compilation setting), however, in
|
|
practice the procedure above works quite well, and results in code is significantly faster.
|
|
|
|
\subsubsection{Emscripten Code Optimizations}
|
|
\label{sec:codeopt}
|
|
|
|
When comparing the example program from page~\pageref{code:example},
|
|
the generated code was fairly complicated
|
|
and cumbersome, and unsurprisingly it performs quite poorly. There
|
|
are two main reasons for that: First, that the code is simply
|
|
unoptimized -- there are many variables declared when fewer could
|
|
suffice, for example, and second, that the code does not use `normal'
|
|
JavaScript, which JavaScript engines are optimized for -- it
|
|
stores all variables in an array (not normal JavaScript variables),
|
|
and it controls the flow of execution using a switch-in-a-loop, not
|
|
normal JavaScript loops and ifs.
|
|
|
|
Emscripten's approach to generating fast-performing code is as
|
|
follows. Emscripten doesn't do any
|
|
optimizations that can be done by other tools:
|
|
LLVM can be used to perform optimizations before Emscripten, and
|
|
the Closure Compiler\footnote{\url{http://code.google.com/closure/compiler/}}
|
|
can perform optimizations on the generated JavaScript afterwards. Those
|
|
tools will perform standard useful optimizations like removing unneeded variables, dead code elimination,
|
|
function inlining, etc.
|
|
That leaves two major optimizations that are left for Emscripten
|
|
to perform:
|
|
\begin{itemize}
|
|
\item \textbf{Variable nativization}: Convert variables
|
|
that are on the stack -- which is implemented using addresses in the \emph{HEAP} array
|
|
as mentioned earlier -- into native JavaScript variables (that is to say, \emph{var x;} and so forth). In general,
|
|
a variable will be nativized unless it is used
|
|
outside that function, e.g., if its address is taken and stored somewhere
|
|
or passed to another function. When optimizing, Emscripten tries to nativize
|
|
as many variables as possible.
|
|
\item \textbf{Relooping}: Recreate high-level loop and if structures
|
|
from the low-level code block data that appears in LLVM assembly.
|
|
We describe Emscripten's Relooper algorithm in Section~\ref{sec:relooper}.
|
|
\end{itemize}
|
|
|
|
When run with Emscripten's optimizations, the code on page \pageref{code:example} looks
|
|
like this:
|
|
\begin{verbatim}
|
|
function _main() {
|
|
var __label__;
|
|
var $1;
|
|
var $sum;
|
|
var $i;
|
|
$1 = 0;
|
|
$sum = 0;
|
|
$i = 0;
|
|
$2$2: while(1) {
|
|
var $3 = $i;
|
|
var $4 = $3 < 100;
|
|
if (!($4)) { __label__ = 2; break $2$2; }
|
|
var $6 = $i;
|
|
var $7 = $sum;
|
|
var $8 = $7 + $6;
|
|
$sum = $8;
|
|
var $10 = $i;
|
|
var $11 = $10 + 1;
|
|
$i = $11;
|
|
__label__ = 0; continue $2$2;
|
|
}
|
|
var $13 = $sum;
|
|
var $14 = _printf(__str, $13);
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
If in addition the Closure Compiler is run on that output, we get
|
|
\begin{verbatim}
|
|
function K() {
|
|
var a, b;
|
|
b = a = 0;
|
|
a:for(;;) {
|
|
if(!(b < 100)) {
|
|
break a
|
|
}
|
|
a += b;
|
|
b += 1;
|
|
}
|
|
_printf(J, a);
|
|
return 0;
|
|
}
|
|
\end{verbatim}
|
|
which is fairly close to the original C++ (the differences, of
|
|
having the loop's condition inside the loop instead of inside
|
|
the for() expression at the top of the original loop, are not important to performance). Thus, it is possible
|
|
to recreate the original high-level structure of the code that
|
|
was compiled into LLVM assembly.
|
|
|
|
\subsection{Benchmarks}
|
|
\label{sec:benchmarks}
|
|
|
|
We will now take a look at some performance benchmarks:
|
|
|
|
\bigskip
|
|
|
|
\begin{tabular}{ l | c | c | c || c }
|
|
\hline
|
|
\textbf{benchmark} & \textbf{SM} & \textbf{V8} & \textbf{gcc} & \textbf{ratio} \\
|
|
\hline
|
|
fannkuch (10) & 1.158 & \textbf{0.931} & 0.231 & 4.04 \\
|
|
fasta (2100000) & \textbf{1.115} & 1.128 & 0.452 & 2.47 \\
|
|
primes & \textbf{1.443} & 3.194 & 0.438 & 3.29 \\
|
|
raytrace (7,256) & \textbf{1.930} & 2.944 & 0.228 & 8.46 \\
|
|
dlmalloc (400,400) & 5.050 & \textbf{1.880} & 0.315 & 5.97 \\
|
|
\hline
|
|
\end{tabular}
|
|
|
|
\bigskip
|
|
|
|
The first column is the name of the benchmark, and in parentheses any
|
|
parameters used in running it. The source code to all the benchmarks
|
|
can be found at \url{https://github.com/kripken/emscripten/tree/master/tests}
|
|
(each in a separate file with its name, except for `primes', which is
|
|
embedded inside runner.py in the function test\_primes). A brief summary of
|
|
the benchmarks is as follows:
|
|
\begin{itemize}
|
|
\item \textbf{fannkuch} and \textbf{fasta} are commonly-known benchmarks, appearing for example
|
|
on the Computer Language Benchmarks Game\footnote{\url{http://shootout.alioth.debian.org/}}.
|
|
They use a mix of mathematic operations (integer in the former, floating-point in the latter) and memory access.
|
|
\item \textbf{primes} is the simplest benchmark in terms of code. It is basically just a tiny loop that calculates prime numbers.
|
|
\item \textbf{raytrace} is real-world code, from the sphereflake raytracer\footnote{\url{http://ompf.org/ray/sphereflake/}}. This benchmark has a combination of memory access and floating-point math.
|
|
\item \textbf{dlmalloc} (Doug Lea's malloc\footnote{\url{http://en.wikipedia.org/wiki/Malloc#dlmalloc_and_its_derivatives}}) is a well-known real-world implementation of malloc and free. This benchmark does a large amount of calls to malloc and free in an intermixed way, which tests memory access and integer calculations.
|
|
\end{itemize}
|
|
|
|
Returning to the table of results, the second
|
|
column is the elapsed time (in seconds) when running the compiled code (generated using all Emscripten and LLVM
|
|
optimizations as well as the Closure Compiler) in the SpiderMonkey JavaScript
|
|
engine (specifically the JaegerMonkey branch, checked out June 15th, 2011).
|
|
The third column is the elapsed time when running the same JavaScript code in the V8 JavaScript engine
|
|
(checked out Jun 15th, 2011). In both the second and third column lower values
|
|
are better; the best of the two is in bold.
|
|
The fourth column is the elapsed time when running the original code compiled with \emph{gcc -O3},
|
|
using GCC 4.4.4. The last column is the ratio, that is, how much slower the JavaScript code
|
|
(running in the faster of the two engines for that test) is
|
|
when compared to gcc. All the tests were run on a MacBook Pro with
|
|
an Intel i7 CPU clocked at 2.66GHz, running on Ubuntu 10.04.
|
|
|
|
Clearly the results greatly vary by the benchmark, with the generated JavaScript running from 2.47 to 8.46 times
|
|
slower. There are also significant differences between the two JavaScript engines, with each better
|
|
at some of the benchmarks.
|
|
It appears that code that does simple numerical operations -- like
|
|
the primes test -- can run fairly fast, while code that has a lot of memory
|
|
accesses, for example due to using structures -- like the raytrace test --
|
|
will be slower. (The main issue with structures is that Emscripten does not
|
|
`nativize' them yet, as it does to simple local variables.)
|
|
|
|
Being 2.47 to 8.46 times slower than the most-optimized C++ code
|
|
is a significant slowdown, but it is still more than fast enough for
|
|
many purposes, and the main point of course is that the code can run
|
|
anywhere the web can be accessed. Further work on Emscripten is expected to
|
|
improve the speed as well, as are improvements to LLVM, the Closure
|
|
Compiler, and JavaScript engines themselves; see further discussion
|
|
in the Summary.
|
|
|
|
\subsection{Limitations}
|
|
|
|
Emscripten's compilation approach, as has been described in this Section so far,
|
|
is to generate `natural' JavaScript, as close as possible to normal JavaScript
|
|
on the web, so that modern JavaScript engines perform well on it. In particular,
|
|
we try to generate `normal' JavaScript operations, like regular addition and
|
|
multiplication and so forth. This is a very
|
|
different approach than, say, emulating a CPU on a low level, or for the case
|
|
of LLVM, writing an LLVM bitcode interpreter in JavaScript. The latter approach
|
|
has the benefit of being able to run virtually any compiled code, at the cost
|
|
of speed, whereas Emscripten makes a tradeoff in the other direction. We will
|
|
now give a summary of some of the limitations of Emscripten's approach.
|
|
|
|
\begin{itemize}
|
|
\item \textbf{64-bit Integers}: JavaScript numbers are all 64-bit doubles, with engines
|
|
typically implementing them as 32-bit integers where possible for speed.
|
|
A consequence of this is that it is impossible to directly implement
|
|
64-bit integers in JavaScript, as integer values larger than 32 bits will become doubles,
|
|
with only 53 bits for the significand. Thus, when Emscripten uses normal
|
|
JavaScript addition and so forth for 64-bit integers, it runs the risk of
|
|
rounding effects. This could be solved by emulating 64-bit integers,
|
|
but it would be much slower than native code.
|
|
\item \textbf{Multithreading}: JavaScript has Web Workers, which are additional
|
|
threads (or processes) that communicate via message passing. There is no
|
|
shared state in this model, which means that it is not directly possible
|
|
to compile multithreaded code in C++ into JavaScript. A partial solution
|
|
could be to emulate threads, without Workers, by manually controlling
|
|
which blocks of code run (a variation on the switch in a loop construction
|
|
mentioned earlier) and manually switching between threads every so often.
|
|
However, in that case there would not be any utilization
|
|
of additional CPU cores, and furthermore performance would be slow due to not
|
|
using normal JavaScript loops.
|
|
\end{itemize}
|
|
|
|
After seeing these limitations, it is worth noting that some advanced LLVM instructions turn out to be
|
|
surprisingly easy to implement. For example, C++ exceptions are represented in
|
|
LLVM by \emph{invoke} and \emph{unwind}, where \emph{invoke} is a call to a function that will
|
|
potentially trigger an \emph{unwind}, and \emph{unwind} returns to the earliest invoke.
|
|
If one were to implement those
|
|
in a typical compiler, doing so would require careful work. In Emscripten, however,
|
|
it is possible to do so using JavaScript exceptions in a straightforward manner:
|
|
\emph{invoke} becomes a function call wrapped in a \emph{try} block, and \emph{unwind}
|
|
becomes \emph{throw}. This is a case where compiling to a high-level language turns
|
|
out to be quite convenient.
|
|
|
|
\section{Emscripten's Architecture}
|
|
\label{sec:emarch}
|
|
|
|
In the previous section we saw a general overview of Emscripten's approach
|
|
to compiling LLVM assembly code into JavaScript. We will now get into more detail
|
|
into how Emscripten itself is implemented.
|
|
|
|
Emscripten is written in JavaScript. The primary reason for that decision
|
|
was convenience: Two simple examples of the benefits of that approach are that (1)
|
|
the compiler can create JavaScript objects that represent constant structures from the original
|
|
assembly code, and convert them to a string using JSON.stringify()
|
|
in a trivial manner,
|
|
and (2) the compiler can simplify numerical operations by simply
|
|
eval()ing the code (so ``1+2'' would become ``3'', etc.). In both examples,
|
|
the development of Emscripten was made simpler by having the exact same environment
|
|
during compilation as the executing code will have. This also helps in more
|
|
complex ways, for example when the same code needs to be run at compile time
|
|
and at runtime, and makes various dynamic compilation techniques possible in the future.
|
|
|
|
Emscripten's compilation has three main phases:
|
|
\begin{itemize}
|
|
\item The \textbf{intertyper}, which converts from LLVM assembly into
|
|
Emscripten's internal representation.
|
|
\item The \textbf{analyzer}, which inspects the internal representation
|
|
and generates various useful information for the final phase,
|
|
including type and variable information, stack usage analysis,
|
|
optional data for optimizations
|
|
(variable nativization and relooping), etc.
|
|
\item The \textbf{jsifier}, which does the final conversion of the
|
|
internal representation plus additional analyzed data into JavaScript.
|
|
\end{itemize}
|
|
|
|
\subsection{The Runtime Environment}
|
|
\label{sec:runtime}
|
|
|
|
Code generated from Emscripten is meant to run in a JavaScript engine,
|
|
typically in a web browser. This has implications for the kind of
|
|
runtime environment we can generate for it, for example, there is no
|
|
direct access to the local filesystem.
|
|
|
|
Emscripten comes with a partial implementation of a C library,
|
|
mostly written from scratch in JavaScript, with parts compiled from an
|
|
existing C library\footnote{newlib, \url{http://sourceware.org/newlib/}}. Some aspects of the runtime environment, as
|
|
implemented in that C library, are:
|
|
\begin{itemize}
|
|
\item An emulated filesystem is available, with files stored in memory.
|
|
\item Emscripten allows writing pixel data to an HTML5 canvas element,
|
|
using a subset of the SDL API. That is, one can write an application in C or C++ using
|
|
SDL, and that same application can be compiled normally and run
|
|
locally, or compiled using Emscripten and run on the web. See, for
|
|
example, Emscripten's raytracing demo at \url{http://syntensity.com/static/raytrace.html}.
|
|
\item \emph{sbrk()} is implemented using the \emph{HEAP} array which
|
|
was mentioned previously. This allows a normal \emph{malloc()}
|
|
implementation written in C to be compiled to JavaScript.
|
|
\end{itemize}
|
|
|
|
\subsection{The Relooper: Recreating high-level loop structures}
|
|
\label{sec:relooper}
|
|
|
|
The Relooper the most complex module in Emscripten. It receives
|
|
a `soup of blocks', which is a set of labeled fragments of code, each
|
|
ending with a branch operation, and the goal is to generate normal
|
|
high-level JavaScript code flow structures such as loops and ifs.
|
|
Generating such code structures is essential to producing good-performing code,
|
|
since JavaScript engines are tuned to run such code very quickly (for
|
|
example, a tracing JIT as in SpiderMonkey will only trace normal loops).
|
|
|
|
Returning to the LLVM assembly code on page~\pageref{code:examplellvm}, it
|
|
has the following structure (where arrows denote potential paths of execution):
|
|
|
|
\includegraphics[width=80mm]{graph.png}
|
|
|
|
In this simple example, it is fairly straightforward to see that a natural way to implement it
|
|
using normal loop structures is
|
|
\newpage
|
|
\begin{verbatim}
|
|
ENTRY
|
|
while (true) do
|
|
2
|
|
if (condition) break
|
|
5
|
|
9
|
|
12
|
|
\end{verbatim}
|
|
In general though, this is not always easy or even practical -- there may
|
|
not be a straightforward high-level loop structure corresponding to the low-level one, if
|
|
for example the original C code relied heavily on \emph{goto} instructions.
|
|
In practice, however, almost all real-world C and C++ code tends to
|
|
be amenable to loop recreation.
|
|
|
|
We now begin to describe the Relooper algorithm. As mentioned before, it takes as input a `soup of labeled LLVM blocks' as described above,
|
|
and generates a structured set of Emscripten code blocks, which are each a set of LLVM blocks
|
|
with some logical structure. For simplicity we call LLVM blocks `labels' and Emscripten
|
|
blocks `blocks' in the following.
|
|
|
|
There are three types of Emscripten blocks:
|
|
\begin{itemize}
|
|
\item \textbf{Simple block}: A block with
|
|
\begin{itemize}
|
|
\item One \textbf{Internal} label, and
|
|
\item a \textbf{Next}
|
|
block, which the internal label branches to. The block is later
|
|
translated simply into the code for that label, and the Next
|
|
block appears right after it.
|
|
\end{itemize}
|
|
\item \textbf{Loop}: A block that represents a basic loop, comprised of
|
|
two internal sub-blocks:
|
|
\begin{itemize}
|
|
\item \textbf{Inner}: A block that will appear inside
|
|
the loop, i.e., when execution reaches the end of that block,
|
|
flow will return to the beginning. Typically a loop will contain
|
|
a conditional \emph{break} defining where it is exited. When we
|
|
exit, we reach the Next block, below.
|
|
\item \textbf{Next}: A block that will appear just outside
|
|
the loop, in other words, that will be reached when the loop is exited.
|
|
\end{itemize}
|
|
\item \textbf{Multiple}: A block that represents a divergence into several
|
|
possible branches, that eventually rejoin. A Multiple block can
|
|
implement an `if', an `if-else', a `switch', etc. It is comprised of:
|
|
\begin{itemize}
|
|
\item \textbf{Handled blocks}: A set of blocks to which execution can
|
|
enter. When we reach the multiple block, we check which of them
|
|
should execute, and go there. When execution of that block is
|
|
complete, or if none of the handled blocks was selected for
|
|
execution, we proceed to the Next block, below.
|
|
\item \textbf{Next}: A block that will appear just after the Handled blocks,
|
|
in other words, that will be reached after code flow
|
|
exits the Handled blocks.
|
|
\end{itemize}
|
|
\end{itemize}
|
|
|
|
To clarify these definitions, the example LLVM assembly code we have
|
|
been working with could be represented in a natural way as
|
|
\begin{verbatim}
|
|
Simple
|
|
entry
|
|
Loop
|
|
Simple
|
|
2
|
|
Simple
|
|
5
|
|
Simple
|
|
9
|
|
null
|
|
Simple
|
|
12
|
|
null
|
|
\end{verbatim}
|
|
where the first indented line in a Simple block is the Internal label in that Simple block,
|
|
the second indented line is its Next block, and so forth.
|
|
|
|
Continuing to describe the Relooper algorithm, we will use the term `entry' to
|
|
mean a label that can be reached immediately in a block. In other
|
|
words, a block consists of labels $l_1,..,l_n$, and the entries
|
|
are a subset of those labels, specifically the ones that execution
|
|
can directly reach when we reach that block. With that
|
|
definition, the Relooper algorithm can
|
|
then be written as follows:
|
|
|
|
\begin{itemize}
|
|
\item \textbf{Receive a set of labels and which of them are entry points.}
|
|
We wish to create a block comprised of all those labels.
|
|
\item \textbf{Calculate, for each label, which other labels it \emph{can}
|
|
reach}, i.e., which labels we are able to reach if we start
|
|
at the current label and follow one of the possible paths
|
|
of execution.
|
|
\item \textbf{If we have a single entry, and cannot return to it (by some other
|
|
label later on branching to it) then create a Simple block}, with the entry
|
|
as its internal label, and the Next block comprised of all
|
|
the other labels. The entries for the Next block are the entries
|
|
to which the internal label can branch.
|
|
\item \textbf{If we can return to all of the entries, create a
|
|
Loop block}, whose Inner block is comprised of all labels that
|
|
can reach one of the entries, and whose Next block is
|
|
comprised of all the others. The entry labels for the current
|
|
block become entry labels for the Inner block (note that
|
|
they must be in the Inner block by definition, as each one can reach
|
|
itself). The Next block's entry labels are all the labels
|
|
in the Next block that can be reached by the Inner block.
|
|
\item \textbf{If we have more than one entry, try to create a Multiple block}: For each entry, find all
|
|
the labels it reaches that cannot be reached by any other
|
|
entry. If at least one entry has such labels, return a
|
|
Multiple block, whose Handled blocks are blocks for those
|
|
labels (and whose entries are those labels), and whose Next block is all the rest.
|
|
Entries for the next block are entries that did not become part of the Handled
|
|
blocks, and also labels that can be reached from the Handled blocks.
|
|
\item \textbf{If we could not create a Multiple block, then create a Loop block as described above}
|
|
(see proof below of why creating a Loop block is possible, i.e., why the labels contain a loop).
|
|
\end{itemize}
|
|
Note that we first create a Loop only if we must, then try to create a
|
|
Multiple, then create a Loop if we have no other choice. We could have slightly simplified this in
|
|
various ways, but the algorithm as presented above has given overall better
|
|
results in practice, in terms of the `niceness' of the shape of the
|
|
generated code, both subjectively and at least in some simple benchmarks.
|
|
|
|
Additional details of the algorithm include
|
|
\begin{itemize}
|
|
\item The technical mechanism by which execution flow is controlled in the generated code involves
|
|
the \emph{\_\_label\_\_} variable, mentioned earlier. Whenever we enter a block with more than one
|
|
entry, we set \emph{\_\_label\_\_} before we branch into it, and we
|
|
check its value when we enter that block. So, for example, when we
|
|
create a Loop block, its Next block can have multiple entries -- any
|
|
label to which we branch out from the loop. By creating a Multiple
|
|
block after the loop, we can enter the proper label when the loop is
|
|
exited. (Having a \emph{\_\_label\_\_} variable does add some overhead,
|
|
but it greatly simplifies the problem that the Relooper needs to solve
|
|
and allows us to only need three kinds of blocks as described above.
|
|
Of course, it is possible to optimize
|
|
away writes and reads to \emph{\_\_label\_\_} in many or even most cases.)
|
|
\item As the Relooper processes labels, it replaces branch
|
|
instructions accordingly. For example, when we create a Loop
|
|
block, then all branch instructions to the outside of the loop are
|
|
converted into \emph{break} commands (since a break instruction in JavaScript
|
|
will indeed get us to outside of the loop), and all branch
|
|
instructions to the beginning of the loop are converted into
|
|
\emph{continue} commands, etc. Those commands are then
|
|
ignored when called recursively to generate the Inner block (that is,
|
|
the \emph{break} and \emph{continue}
|
|
commands are guaranteed, by the semantics of JavaScript, to get us to
|
|
where we need to go -- they do not need any further work
|
|
for them to work properly).
|
|
\item Emscripten also does an additional pass after what has been
|
|
described thus far, which was the first pass. The first pass is guaranteed to produce valid output (see below), while
|
|
the second pass takes that valid output and optimizes it, by
|
|
making minor changes such as removing
|
|
\emph{continue} commands that occur at the very end of loops
|
|
(where they are not needed), etc.
|
|
\end{itemize}
|
|
|
|
We now turn to an analysis of the Relooper algorithm. It is straightforward to see that the output of the algorithm,
|
|
assuming it completes successfully -- that is, that if finishes in finite time, and does
|
|
not run into an error in the last part (where it is claimed that
|
|
if we reach it we can return to at least one of the entry labels) --
|
|
is correct in the sense of code execution being carried out
|
|
as in the original data. We will now prove that the algorithm must
|
|
in fact complete successfully.
|
|
|
|
First, note that if we
|
|
successfully create a block, then we simplify the remaining
|
|
problem, where the `complexity' of the problem for our purposes
|
|
here is the sum of labels plus the sum of branching operations:
|
|
\begin{itemize}
|
|
\item This is trivial for Simple blocks (since we now have a Next block
|
|
which is strictly smaller).
|
|
\item It is true for Loop blocks simply by removing branching
|
|
operations (there must be a branching back to an entry, which
|
|
becomes a \emph{continue}).
|
|
\item For Multiple blocks, if the Next block is non-empty then we have split into strictly
|
|
smaller blocks (in number of labels) than before. If the next block
|
|
is empty, then since we built the Multiple block from a set of labels
|
|
with more than one entry, then the Handled blocks are strictly smaller
|
|
than the current one.
|
|
\end{itemize}
|
|
We have seen that whenever we successfully create a block, we simplify the remaining problem
|
|
as defined above, which means that we must eventually halt successfully (since
|
|
we strictly decrease a nonnegative integer).
|
|
The remaining issue is whether we can reach a situation where we \emph{cannot}
|
|
successfully create a block, which is if we reach the final part of the relooper algorithm, but cannot create a
|
|
Loop block there. For that to occur, we must not be able
|
|
to return to any of the entries (or else we would create a Loop
|
|
block). Assume that indeed we cannot return to any of the entries. But if that is so, we can create a Multiple
|
|
block with Handled blocks that each include one of the entries (and possibly additional labels as well), since each entry
|
|
label cannot be reached from any other label by our assumption earlier, thus
|
|
contradicting that assumption
|
|
and concluding the proof.
|
|
|
|
(We have not, of course, proven that the shape of the blocks is optimal
|
|
in any sense. However, even if it is possible to optimize them further, the Relooper
|
|
already gives a very substantial speedup due to the move from the switch-in-a-loop
|
|
construction to more natural JavaScript code flow structures.)
|
|
|
|
\section{Example Uses}
|
|
\label{sec:examples}
|
|
|
|
Emscripten has been run successfully on several real-world codebases. We present
|
|
some examples here to give an idea of the various opportunities made possible
|
|
by Emscripten.
|
|
\begin{itemize}
|
|
\item \textbf{Python}: It is possible to run variants of Python on
|
|
the web in various ways, including Pyjamas, IronPython on SilverLight and
|
|
Jython in Java. However, all of these are slightly nonstandard in the
|
|
Python code they run, while the latter two also require plugins to be
|
|
installed. With Emscripten, on the other hand, it is possible to compile
|
|
CPython itself -- the standard, reference implementation of Python -- and
|
|
run that on the web, which allows running standard Python code. An online
|
|
demo is available at \url{http://syntensity.com/static/python.html}.
|
|
(Another example of a language runtime that Emscripten can convert to run
|
|
on the web is Lua; an online demo is available at \url{http://syntensity.com/static/lua.html}.)
|
|
\item \textbf{Poppler and FreeType}: Poppler\footnote{\url{http://poppler.freedesktop.org/}} is an open source PDF
|
|
rendering library. In combination with FreeType\footnote{\url{http://www.freetype.org/}}, an open source font
|
|
engine, it can be used to render PDF files. By compiling it with Emscripten,
|
|
PDF files can be viewed on the web, without the use of plugins or external
|
|
applications. An online demo is available at \url{http://syntensity.com/static/poppler.html}
|
|
\item \textbf{Bullet}: The Bullet Physics library\footnote{\url{http://bulletphysics.org/wordpress/}} is
|
|
an open source physics engine, used in many open source and proprietary applications. An online
|
|
demo is available at \url{http://syntensity.com/static/bullet.html}, showing a physics
|
|
simulation of falling blocks that uses Bullet compiled to JavaScript. Bullet has in the
|
|
past been ported to JavaScript\footnote{\url{http://pl4n3.blogspot.com/2010/07/bulletjs-javascript-physics-engine.html}}, by porting JBullet (a port of Bullet to Java). The main difference in the approaches is that with Emscripten, there is no need for
|
|
time-consuming manual conversion of C++ to Java and then to JavaScript, and consequently,
|
|
the latest Bullet code can be run in JavaScript and not an earlier version (JBullet lags
|
|
several versions behind the latest Bullet release).
|
|
\end{itemize}
|
|
|
|
\section{Summary}
|
|
\label{sec:summary}
|
|
|
|
We presented Emscripten, an LLVM-to-JavaScript compiler, which opens up
|
|
numerous opportunities for running code written in languages other
|
|
than JavaScript on the web, including some not previously possible.
|
|
Emscripten can be used to, among other
|
|
things, compile real-world C and C++ code and run that on the web. In
|
|
addition, by compiling the runtimes of languages which are implemented in C and C++,
|
|
we can run them on the web as well, for example Python and Lua.
|
|
|
|
Perhaps the largest future goal of Emscripten is to improve the performance of
|
|
the generated code. As we have seen, speeds of around $1/10$th that of
|
|
GCC are possible, which is already good enough for many purposes, but
|
|
can be improved much more. The code Emscripten generates will become faster
|
|
`for free' as JavaScript engines get
|
|
faster, and also by improvements in the optimizations done by LLVM and the Closure
|
|
Compiler. However there is also a lot of room for additional optimizations in
|
|
Emscripten itself, in particular in how it nativizes variables and structures,
|
|
which can potentially lead to very significant speedups.
|
|
|
|
When we compile a language's entire runtime into JavaScript, as mentioned
|
|
before, there is another way to improve performance.
|
|
Assume that we are compiling a C or C++ runtime of a language
|
|
into JavaScript, and that that runtime uses JIT compilation to generate machine code. Typically
|
|
code generators for JITs are written for the main CPU architectures, today
|
|
x86, x86\_64 and ARM. However, it would be possible for a JIT to
|
|
generate JavaScript instead. Thus, the runtime would be compiled using
|
|
Emscripten, and at runtime it would pass the JIT-generated JavaScript to
|
|
\emph{eval}. In this
|
|
scenario, JavaScript is used as a low-level intermediate representation in
|
|
the runtime, and the final conversion to machine code is left to the underlying
|
|
JavaScript engine. This approach can potentially allow languages that
|
|
greatly benefit from a JIT (such as Java, Lua, etc.) to be run on the web
|
|
efficiently.
|
|
|
|
Getting back to the issue of high-performing code in general, it is worth comparing
|
|
Emscripten to Portable Native Client (\cite{pnacl},~\cite{nacl}), a project in development which aims
|
|
to allow an LLVM-like format to be distributed and run securely
|
|
on the web, with speed comparable to native code.
|
|
|
|
Both Emscripten
|
|
and PNaCl aim to allow code written in languages like C and C++ to
|
|
be run on the web, but in very different ways: Emscripten compiles code into JavaScript, and
|
|
PNaCl compiles into an LLVM-like format which is then
|
|
run in a special PNaCl runtime. As a consequence, Emscripten's generated code can run on all web browsers,
|
|
since it is standard JavaScript, while PNaCl's generated code
|
|
requires the PNaCl runtime to be installed; another major
|
|
difference is that JavaScript engines do not yet run code at
|
|
near-native speeds, while PNaCl does. In a broad summary, Emscripten's
|
|
approach allows the code to be run in more places, while PNaCl's
|
|
allows the code to run faster.
|
|
|
|
However, as mentioned earlier, improvements in JavaScript engines and compiler
|
|
technology may narrow the speed
|
|
gap. Also, when considering the speed of JavaScript engines, for purposes of Emscripten we do not need to
|
|
care about \emph{all} JavaScript, but only the kind generated by
|
|
Emscripten. Such code is \textbf{implicitly statically typed}, that is,
|
|
types are not mixed, despite JavaScript in general allowing assigning, e.g., an
|
|
integer to a variable and later a floating point value or even an object to that same variable. Implicitly statically
|
|
typed code can be statically analyzed and converted into
|
|
machine code that has no runtime type checks at all. While such
|
|
static analysis can be time-consuming, there are practical ways for
|
|
achieving similar results quickly, such as tracing and type inference, which
|
|
would help on such code very significantly, and are already in use
|
|
or being worked on in mainstream JavaScript engines (e.g., SpiderMonkey). As
|
|
a consequence, it may soon be possible to run code written in languages such as
|
|
C and C++ on the web with near-native speed.
|
|
|
|
%\appendix
|
|
%\section{Appendix Title}
|
|
%This is the text of the appendix, if you need one.
|
|
|
|
\acks
|
|
|
|
We thank the following people for their contributions to Emscripten: David LaPalomento, Daniel Heres, Brian Crowder, Brian McKenna, dglead and tuba.
|
|
|
|
\bibliographystyle{abbrvnat}
|
|
|
|
% The bibliography should be embedded for final submission.
|
|
|
|
\begin{thebibliography}{}
|
|
\softraggedright
|
|
|
|
\bibitem[Ashkenas. (2011)]{ashkenas} J. Ashkenas.
|
|
List of languages that compile into JavaScript. Available at \url{https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS}.
|
|
Retrieved April 2011.
|
|
|
|
\bibitem[Cifuentes et~al. (1998)]{Cifuentes98assemblyto} C. Cifuentes, D. Simon and A. Fraboulet.
|
|
Assembly to High-Level Language Translation. In Int. Conf. on Softw. Maint, pp. 228--237, IEEE-CS Press, 1998.
|
|
|
|
\bibitem[Cooper et~al. (2006)]{links} E. Cooper, S. Lindley, P. Wadler and J. Yallop.
|
|
Links: Web programming without tiers. In 5th International Symposium on Formal Methods for Components and Objects (FMCO), 2006.
|
|
|
|
\bibitem[Donovan et~al. (2010)]{pnacl} A. Donovan, R. Muth, B. Chen and D. Sehr.
|
|
PNaCl: Portable Native Client Executables. Available at
|
|
\url{http://nativeclient.googlecode.com/svn/data/site/pnacl.pdf}. Retrieved April 2011.
|
|
|
|
\bibitem[Flanagan (2006)]{js} D. Flanagan.
|
|
JavaScript: The Definitive Guide. O'Reilly Media, 2006.
|
|
|
|
\bibitem[Loitsch et~al. (2008)]{hop} F. Loitsch and M. Serrano.
|
|
Hop Client-Side Compilation. In Trends in Functional Programming, vol. 8, pp. 141--158, Seton Hall University, Intellect Bristol, 2008.
|
|
|
|
\bibitem[Petříček et~al (2011)]{afax} T. Petříček and D. Syme.
|
|
AFAX: Rich client/server web applications in F\#. Draft. Available at \url{http://tomasp.net/academic/fswebtools.aspx}. Retrieved April 2011.
|
|
|
|
\bibitem[Prabhakar (2007)]{gwt} C. Prabhakar.
|
|
Google Web Toolkit: GWT Java Ajax Programming. Packt Publishing, 2007.
|
|
|
|
\bibitem[Proebsting et~al. (1997)]{pro97} T.A. Proebsting and S. A. Watterson.
|
|
Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?) In Third USENIX Conference on Object-Oriented Technologies and Systems (COOTS), 1997.
|
|
|
|
\bibitem[Yee et~al. (2009)]{nacl} B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S.
|
|
Okasaka, N. Narula, and N. Fullagar. Native Client: A Sandbox for
|
|
Portable, Untrusted x86 Native Code. In IEEE Symposium on
|
|
Security and Privacy, May 2009.
|
|
|
|
\end{thebibliography}
|
|
|
|
\end{document}
|
|
|