hints & tips, links, news // 2011.06.01 08:07:02 [hh]

"Axis of Eval": "Emscripten" turns C/C++ code into "JavaScript"

Once again something quite improbably nerdy: after Adobe presented a labs project for turning C++ into ActionScript, there is now also an open-source project that can turn C or C++ into JavaScript. The compiler "Emscripten" by Alon Zakai ("kripken"), published under the MIT license, converts LLVM bytecode into JavaScript. Demos show, for example, "Doom", "Lua" and "Python" in the browser. The stated goal is to be able to run arbitrary code on the web. Screen2.0 has all the background.

Alon Zakai on his project: "Emscripten is an LLVM-to-JavaScript compiler. It takes LLVM bitcode (which can be generated from C/C++, using llvm-gcc or clang, or any other language that can be converted into LLVM) and compiles that into JavaScript, which can be run on the web (or anywhere else JavaScript can run)."

Using Emscripten, you can

Questions and answers from Alon Zakai

What is this project's goal? To allow people to run code on the web, no matter what language it's written in.

How are you trying to do that? Emscripten's strategy is to compile LLVM bytecode into JavaScript, which then allows any language that can be compiled into LLVM to run on the web.

Why are you doing this? The web is standards-based, cross-platform, runs everywhere from PCs to iPads, and has numerous independent compatible implementations. It's arguably the best platform to develop for, for those reasons. But it could be even more developer-friendly: While JavaScript (when used well!) is an excellent language, lots of people want to code in other languages.

What is the status of this project? A substantial amount of C/C++, including large projects like CPython, Poppler and Bullet, compiles and runs properly; see the demos.
While a lot of stuff already works, expect to find plenty of bugs and rough edges - compiling some random C/C++ code might not currently work due to various small issues. Please file a bug if you encounter code that doesn't compile!

What C/C++ code is expected to be compilable later? The goal is to compile almost all C/C++ code out there, but not all of it - that would require running a bit-for-bit LLVM emulator, which would be slow, and for most C++ code out there, completely unjustified.
An example of code that will not work in this approach: void func(int* y) { char* x = (char*)y; x += 17; y = (int*)x; *y = 10; }. This will confuse Emscripten, but you really don't want to write this kind of C/C++ code anyhow, generally speaking - it isn't portable. If you use pointers 'normally' (including casting to void*, casting to parent/child classes, etc. - stuff that is portable), things will work.
Other limitations are the obvious ones - we will be running in a JavaScript engine. So no threads and so forth (although we may emulate them later on).
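To make the pointer remark in the answer above concrete, here is a small sketch in C. The first function is the FAQ's own example, reproduced only for contrast and never meant to be called. The other two are hypothetical helpers (set_element and read_through_void are names invented for this illustration) that stay within what the FAQ calls 'normal' pointer use: indexing in whole elements and a round trip through void*. It is a sketch of the distinction, not a statement about what any particular Emscripten version accepts.

```c
#include <assert.h>
#include <stddef.h>

/* The FAQ's problematic pattern, shown for contrast only (do not call it):
   it advances an int pointer by 17 bytes, which is not a multiple of
   sizeof(int), and then writes through the misaligned result. */
void func_unportable(int *y) {
    char *x = (char *)y;
    x += 17;
    y = (int *)x;
    *y = 10;
}

/* "Normal" pointer use: address the array in whole elements.
   set_element is a hypothetical helper for this illustration. */
void set_element(int *values, size_t length, size_t index, int value) {
    assert(index < length); /* stay inside the array */
    values[index] = value;
}

/* Round trip through void*, which the FAQ lists as portable.
   read_through_void is likewise a hypothetical helper. */
int read_through_void(void *opaque, size_t index) {
    int *values = (int *)opaque;
    return values[index];
}
```

With an array of four ints, set_element(data, 4, 2, 10) followed by read_through_void(data, 2) reads back the value just written, and no pointer ever lands between two elements.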

How is memory management done? You can use dlmalloc, a popular malloc implementation - it's written in C, and can be compiled into JavaScript using Emscripten (in fact it is one of our automated tests). With that, you will basically be using the same kind of memory management your C/C++ code would use normally.
We also tested with another memory model, of multiple dynamic heaps and letting the JavaScript GC take care of freeing memory automatically. This seems to be slower, so it isn't being focused on currently.
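As a rough illustration of "the same kind of memory management your C/C++ code would use normally", the following sketch uses nothing but the standard malloc and free. The function name sum_via_heap is made up for this example; the assumption, taken from the answer above, is that a malloc implementation such as dlmalloc is compiled in alongside the program, so the C source itself needs no changes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: fills a heap buffer with 0..count-1 and returns
   the sum of its elements, or -1 if the allocation fails. */
long sum_via_heap(size_t count) {
    long *buf;
    long total = 0;
    size_t i;

    if (count == 0)
        return 0; /* nothing to allocate */

    /* Guard against overflow in the size computation below. */
    assert(count <= (size_t)-1 / sizeof *buf);

    buf = malloc(count * sizeof *buf);
    if (buf == NULL)
        return -1;

    for (i = 0; i < count; i++)
        buf[i] = (long)i;
    for (i = 0; i < count; i++)
        total += buf[i];

    free(buf); /* explicit free, exactly as in native C */
    return total;
}
```

The point of the sketch is that allocation and release stay explicit, in contrast to the garbage-collected multi-heap model mentioned above.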

How fast can the generated code get? Right now generated code is 10x slower than gcc -O3 - which is not great, but not that bad either. This will get better with improvements in LLVM, JavaScript optimizers like the Closure Compiler, and JavaScript engines themselves, so expect the generated code to get much faster. There are also plans for additional optimizations that Emscripten itself can perform, which will help significantly as well.

What is the compiler written in? JavaScript. Paralleling the language we are generating code for has various benefits, for example, if we determine some expression can be known at compile time, we can do that immediately in the compiler; otherwise we can simply JSON.stringify() it for the generated code to solve at runtime. Also, (nice) JavaScript is cool.
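The "evaluate now or emit for later" decision described above can be sketched in a few lines. The sketch below is written in C to stay consistent with the other code in this post, so it illustrates only the decision itself; it is not Emscripten's code, which is JavaScript, and all names in it (Expr, try_fold, emit) are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { EXPR_CONST, EXPR_VAR, EXPR_ADD } ExprKind;

/* A tiny, hypothetical expression tree. */
typedef struct Expr {
    ExprKind kind;
    int value;              /* used by EXPR_CONST */
    const char *name;       /* used by EXPR_VAR */
    const struct Expr *lhs; /* used by EXPR_ADD */
    const struct Expr *rhs; /* used by EXPR_ADD */
} Expr;

/* Returns 1 and stores the result in *out if the expression is fully
   known at compile time; returns 0 otherwise. */
int try_fold(const Expr *e, int *out) {
    int a, b;
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_CONST:
        *out = e->value;
        return 1;
    case EXPR_ADD:
        if (try_fold(e->lhs, &a) && try_fold(e->rhs, &b)) {
            *out = a + b;
            return 1;
        }
        return 0;
    default:
        return 0; /* variables are only known at runtime */
    }
}

/* Writes the text to emit into buf: a literal if the expression folds,
   otherwise the expression itself for the generated code to evaluate.
   Returns the number of characters written, as snprintf does. */
int emit(const Expr *e, char *buf, size_t size) {
    int folded;
    if (try_fold(e, &folded))
        return snprintf(buf, size, "%d", folded);
    switch (e->kind) {
    case EXPR_VAR:
        return snprintf(buf, size, "%s", e->name);
    case EXPR_ADD: {
        char left[64], right[64];
        emit(e->lhs, left, sizeof left);
        emit(e->rhs, right, sizeof right);
        return snprintf(buf, size, "(%s + %s)", left, right);
    }
    default:
        return snprintf(buf, size, "?");
    }
}
```

For the tree 2 + 3 the emitter writes the literal 5; for (2 + 3) + x it folds the constant part and leaves the rest to runtime, writing (5 + x).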

Isn't it better just to write JavaScript code? Why compile LLVM into JavaScript? By all means write new JavaScript code. Emscripten is just another option to have, and will hopefully be useful if you have a lot of C/C++ code that you don't want to rewrite from scratch. You can still write web applications normally, but Emscripten lets you integrate existing C/C++ code when useful.

How does this relate to similar projects? There are some sort-of-similar things, for example GWT compiles Java into JavaScript, Pyjamas compiles Python into JavaScript, and there was talk of some Adobe project for compiling LLVM into ActionScript for running on Flash. I (kripken) don't know much about any of these projects though.
You can also compare the goals of Emscripten to Google NaCl. NaCl has two main benefits: reusing existing C/C++ code on the web, and running that code at native speed. Emscripten aims to provide the first of those two benefits, while hoping that JavaScript engines will get so fast that the second benefit will pretty much be achieved as well, and while doing so in a web-friendly way: Emscripten generates JavaScript that runs on any browser on any platform and device.

Where does Emscripten itself run? It's tested on Linux. In theory it should work anywhere you can get JavaScript and LLVM working (plus Python for the automatic test runner and for [...]); however, non-Linux platforms currently have problems (see our issue tracker). Help with supporting other platforms would be very welcome.
(That's for the compiler. Of course, the generated code is valid JavaScript, so it will run anywhere JavaScript can run.)

What frontends for LLVM can be used? / What can be used to generate LLVM bitcode for Emscripten to then compile into JavaScript? Both Clang and llvm-gcc are supported. Clang is simpler to set up, but might not be able to compile all the C++ code that llvm-gcc does (however it does compile all of our test code successfully).
Using frontends for other languages should be possible, but hasn't really been focused on yet. We would be very happy if someone did, though!

Is this really a compiler? Isn't it better described as a translator? Well, a compiler is usually defined as a program that transforms source code written in one programming language into another, which is what Emscripten does. A translator is a more specific term that is usually used for compilation between high-level languages, which isn't exactly applicable. On the other hand a decompiler is something that translates a low-level language to a higher-level one, so that might technically be a valid description, but it sounds odd since we aren't going back to the original language we compiled from (C/C++, most likely) but into something else (JavaScript).

The name of the project sounds weird to me. I don’t know why; it’s a perfectly cromulent word.

Direct link: kripken/emscripten/wiki