Rustc_codegen_cranelift (cg_clif) is an alternative backend for rustc that I have been working on for the past two years. It uses the Cranelift code generator. LLVM is optimized for output quality at the cost of compilation speed, even when optimizations are disabled. Cranelift, by contrast, is optimized for compilation speed while producing executables that are almost as fast as those produced by LLVM with optimizations disabled. This has the potential to reduce the compilation times of rustc in debug mode.

Since the last progress report there have been 150 commits.

Achievements in the past three months

Git subtree

In rust#77975, cg_clif was added as a git subtree to the main Rust repo. This PR makes it possible to compile cg_clif as part of rustc. As already mentioned in “Fixing bootstrap of rustc using cg_clif”, it is even possible to bootstrap rustc completely using cg_clif, without LLVM. All you have to do is add "cranelift" to the codegen-backends array in config.toml (or replace "llvm" in the array entirely if you don’t want to compile the LLVM backend).

Lazy compilation in jit mode

It is now possible to select the lazy jit mode using $cg_clif_dir/build/cargo.sh lazy-jit. In this mode, functions are only compiled when they are first called. This has the potential to significantly improve the startup time of a program. Each function still has to be codegened when it is first called. However, a significant amount of code is expected to be needed only when an error occurs or when the program is used in certain ways, so in a typical run that code never has to be compiled at all.
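As a rough illustration of the idea, the sketch below models a JIT that compiles a function on its first call and caches the result. It does not show cg_clif’s actual mechanism, which works at the machine-code level. `LazyJit`, `compile`, `double` and `handle_error` are hypothetical names invented for this example, and "compilation" is simulated by looking up an ordinary Rust function pointer.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a function's generated machine code.
type CompiledFn = fn(i64) -> i64;

fn double(x: i64) -> i64 {
    x * 2
}

/// A function that would normally only run on an error path.
fn handle_error(x: i64) -> i64 {
    -x
}

/// Simulated codegen: a real JIT would invoke the code generator here.
fn compile(name: &str) -> CompiledFn {
    match name {
        "double" => double as CompiledFn,
        "handle_error" => handle_error as CompiledFn,
        _ => panic!("unknown function `{name}`"),
    }
}

/// A toy JIT that compiles each function lazily, on its first call.
struct LazyJit {
    /// Functions compiled so far, keyed by name.
    compiled: HashMap<&'static str, CompiledFn>,
    /// Names in the order they were compiled, to make the laziness observable.
    compile_log: Vec<&'static str>,
}

impl LazyJit {
    fn new() -> Self {
        LazyJit { compiled: HashMap::new(), compile_log: Vec::new() }
    }

    /// Call `name`, compiling it first if this is its first call.
    fn call(&mut self, name: &'static str, arg: i64) -> i64 {
        if let Some(&f) = self.compiled.get(name) {
            return f(arg); // fast path: already compiled
        }
        let f = compile(name); // slow path: codegen on first call
        self.compiled.insert(name, f);
        self.compile_log.push(name);
        f(arg)
    }
}
```

Calling `double` twice compiles it only once, and `handle_error` is never compiled unless it is actually called, which is where the startup-time savings come from.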

Thanks to @flodiebold for the suggestion back in February.

This mode is not enabled by default. Trying to lazily compile a function from any thread other than the main rustc thread results in an ICE, and this will remain the case until parallel rustc is enabled by default.

SIMD

Several new SIMD intrinsics have been implemented.

  • commit 22c9623: Implement simd_reduce_{add,mul}_{,un}ordered
  • commit 47ff2e0: Implement float simd comparisons
  • commit d2eeed4: Implement more simd_reduce_* intrinsics
  • commit e99f78a: Make simd_extract panic at runtime on non-const index again
  • commit d95d03a: Support #[repr(simd)] on array wrappers
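To make the semantics of some of these intrinsics concrete, here is a scalar model written with plain arrays. It only sketches what the operations compute, not how cg_clif lowers them to Cranelift IR. The function names are made up. The sketch assumes that the ordered reduction starts from an explicit accumulator. It also assumes that comparisons follow the usual mask convention: all bits set (-1) for true lanes and 0 for false lanes.

```rust
/// Ordered reduction: add the lanes strictly left to right, starting from `acc`.
/// Float addition is not associative, so this order is part of the result.
fn reduce_add_ordered<const N: usize>(lanes: [f32; N], acc: f32) -> f32 {
    lanes.iter().fold(acc, |sum, &x| sum + x)
}

/// Unordered reduction: the backend may pick any association order.
/// A pairwise tree is used here as one legal (and vectorizer-friendly) choice.
fn reduce_add_unordered<const N: usize>(lanes: [f32; N]) -> f32 {
    fn tree(xs: &[f32]) -> f32 {
        match xs.len() {
            0 => 0.0,
            1 => xs[0],
            n => tree(&xs[..n / 2]) + tree(&xs[n / 2..]),
        }
    }
    tree(&lanes)
}

/// Lane-wise float "less than" producing an integer mask:
/// -1 (all bits set) where a[i] < b[i], otherwise 0. Comparisons with NaN are false.
fn simd_lt_mask<const N: usize>(a: [f32; N], b: [f32; N]) -> [i32; N] {
    std::array::from_fn(|i| if a[i] < b[i] { -1 } else { 0 })
}
```

With the input `[1e8, 1.0, -1e8, 1.0]`, the ordered version returns 1.0 while the pairwise version returns 0.0. The two variants can legitimately differ, which is why they are separate intrinsics.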

Runtime performance

A variety of peephole optimizations have been added to cg_clif. Combined, they probably resulted in a runtime speedup of about 5%. In addition, now that wasmtime#1080 has been fixed, it has become possible to enable Cranelift’s own optimizations.

  • commit 3f47f93: Enable Cranelift optimizations when optimizing
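This post does not list the individual peephole optimizations, so the following is only a generic illustration of the technique. It uses a tiny hypothetical IR: `Inst` is not Cranelift IR, and `fold_constant_branches` is not cg_clif code. The rule it implements replaces a conditional branch whose condition is a known constant with an unconditional jump.

```rust
use std::collections::HashMap;

/// A tiny, hypothetical SSA-style IR used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Inst {
    /// Define value `dst` as the constant `imm`.
    Iconst { dst: u32, imm: i64 },
    /// Branch to `then_block` if value `cond` is non-zero, else to `else_block`.
    Brif { cond: u32, then_block: u32, else_block: u32 },
    /// Unconditional jump to `block`.
    Jump { block: u32 },
}

/// Peephole pass: a branch on a value known to be constant becomes a plain jump.
/// In SSA form each value is defined exactly once, so remembering
/// "value -> constant" while scanning forward is sound.
fn fold_constant_branches(insts: &mut [Inst]) {
    let mut known: HashMap<u32, i64> = HashMap::new();
    for inst in insts.iter_mut() {
        match *inst {
            Inst::Iconst { dst, imm } => {
                known.insert(dst, imm);
            }
            Inst::Brif { cond, then_block, else_block } => {
                if let Some(&c) = known.get(&cond) {
                    let block = if c != 0 { then_block } else { else_block };
                    *inst = Inst::Jump { block };
                }
            }
            Inst::Jump { .. } => {}
        }
    }
}
```

Each rewrite like this is small, but removing needless branches and comparisons across a whole crate is the kind of change that can add up to a measurable speedup.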

Challenges

While several important things are currently missing, I am confident that I will be able to implement a significant portion of them in 2021.

ABI compatibility

There are many remaining ABI incompatibilities. I will need to rework cg_clif to reuse rustc_target::abi::call::FnAbi. To make this easier, I am currently refactoring the ABI handling code on the rustc side, and part of this refactor has already landed.

  • issue #10: C abi compatability
  • rust#79067: Refactor the abi handling code a bit
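The core difficulty is that caller and callee must classify each argument the same way. If they disagree, a call between code compiled by cg_clif and code compiled by LLVM (or a C compiler) breaks. The sketch below is a deliberately simplified, hypothetical model of such a classification. The `PassMode` variants loosely echo those in rustc_target::abi::call but drop their attributes and other variants. `ArgLayout` is invented for this example, and the size thresholds are illustrative, not any real target’s rules.

```rust
/// Simplified, hypothetical model of how a single argument is passed.
#[derive(Debug, PartialEq, Eq)]
enum PassMode {
    /// Zero-sized: not passed at all.
    Ignore,
    /// A single scalar passed in one register.
    Direct,
    /// Two scalars passed in two registers (e.g. a fat pointer like `&str`).
    Pair,
    /// Passed as a pointer to a copy in memory.
    Indirect,
}

/// Hypothetical summary of an argument's layout.
struct ArgLayout {
    /// Size in bytes.
    size: u64,
    /// 1 for a scalar, 2 for a scalar pair, 0 for any other aggregate.
    scalars: u8,
}

/// One shared classification function. If every backend uses the same one,
/// callers and callees cannot disagree about how an argument is passed.
fn classify(arg: &ArgLayout) -> PassMode {
    match (arg.size, arg.scalars) {
        (0, _) => PassMode::Ignore,
        (size, 1) if size <= 8 => PassMode::Direct,
        (size, 2) if size <= 16 => PassMode::Pair,
        _ => PassMode::Indirect,
    }
}
```

Reusing FnAbi follows the same design idea: rustc computes the pass mode once, and every backend consumes that result. Otherwise each backend keeps its own copy of these rules, and the copies can drift out of sync.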

Switch to the new backend framework of Cranelift

Cranelift is currently switching to a new backend framework. This framework produces faster code and supports AArch64. However, it does not yet support 128-bit integers, which are necessary to compile libcore. There is a draft PR by @cfallin that makes it possible to compile cg_clif with the new framework. There is a miscompilation of simple-raytracer in release mode though, and it is currently unknown whether it is related to this PR.
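To give a sense of what 128-bit integer support involves: a backend without native 128-bit registers can split each value into two 64-bit halves and operate on those. The sketch below shows this splitting for addition in plain Rust. It illustrates the general technique and is not taken from Cranelift or the draft PR. `add_u128_via_u64` is a hypothetical name.

```rust
/// Add two 128-bit integers using only 64-bit operations, the way a backend
/// without native 128-bit support can lower `u128`/`i128` addition.
fn add_u128_via_u64(a: u128, b: u128) -> u128 {
    // Split each operand into its low and high 64-bit halves.
    let (a_lo, a_hi) = (a as u64, (a >> 64) as u64);
    let (b_lo, b_hi) = (b as u64, (b >> 64) as u64);
    // Add the low halves; `carry` is true if the sum overflowed 64 bits.
    let (lo, carry) = a_lo.overflowing_add(b_lo);
    // Add the high halves plus the carry. Wrapping matches u128 wrap-around.
    let hi = a_hi.wrapping_add(b_hi).wrapping_add(carry as u64);
    // Reassemble the two halves into one 128-bit value.
    ((hi as u128) << 64) | lo as u128
}
```

Two's-complement addition is identical for signed and unsigned values, so the same lowering also covers `i128`. Multiplication, division and shifts need more involved sequences, which is part of why full support takes real work.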

Atomics

Atomic instructions are currently emulated using a global lock. This is very inefficient and only works when pthreads are available. The new-style backends for Cranelift have native support for atomic instructions, and I will switch to those once I can use the new-style backends.

  • wasmtime#2077: Implement Wasm Atomics for Cranelift/newBE/aarch64.
  • wasmtime#2149: This patch fills in the missing pieces needed to support wasm atomics…
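The sketch below shows what "emulated using a global lock" means and why it is slow. Every emulated atomic operation, on every address, has to take the same lock, so unrelated atomics serialize against each other. This is an illustration under assumptions: the text above says the real emulation depends on pthreads, whereas this sketch uses `std::sync::Mutex`. `EmulatedAtomicU64` and `GLOBAL_ATOMIC_LOCK` are hypothetical names.

```rust
use std::cell::UnsafeCell;
use std::sync::Mutex;

/// One lock shared by *every* emulated atomic in the program.
static GLOBAL_ATOMIC_LOCK: Mutex<()> = Mutex::new(());

/// An emulated 64-bit atomic: a plain value only touched under the global lock.
struct EmulatedAtomicU64 {
    value: UnsafeCell<u64>,
}

// SAFETY: every access to `value` happens while holding GLOBAL_ATOMIC_LOCK.
unsafe impl Sync for EmulatedAtomicU64 {}

impl EmulatedAtomicU64 {
    const fn new(v: u64) -> Self {
        EmulatedAtomicU64 { value: UnsafeCell::new(v) }
    }

    /// Emulated `fetch_add`: returns the previous value.
    fn fetch_add(&self, delta: u64) -> u64 {
        let _guard = GLOBAL_ATOMIC_LOCK.lock().unwrap();
        // SAFETY: the global lock is held, so no other access is concurrent.
        unsafe {
            let old = *self.value.get();
            *self.value.get() = old.wrapping_add(delta);
            old
        }
    }

    /// Emulated `compare_exchange`: Ok(previous) on success, Err(current) on failure.
    fn compare_exchange(&self, expected: u64, new: u64) -> Result<u64, u64> {
        let _guard = GLOBAL_ATOMIC_LOCK.lock().unwrap();
        // SAFETY: the global lock is held.
        unsafe {
            let cur = *self.value.get();
            if cur == expected {
                *self.value.get() = new;
                Ok(cur)
            } else {
                Err(cur)
            }
        }
    }

    /// Emulated atomic load: even a plain read has to take the lock.
    fn load(&self) -> u64 {
        let _guard = GLOBAL_ATOMIC_LOCK.lock().unwrap();
        // SAFETY: the global lock is held.
        unsafe { *self.value.get() }
    }
}
```

A native atomic instruction performs the same read-modify-write in a single instruction without any shared lock, which is why switching to the new-style backends matters here.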

SIMD

Many vendor intrinsics remain unimplemented. However, the new portable SIMD project will likely use platform intrinsics exclusively, and there are far fewer of those than there are LLVM intrinsics used to implement all the vendor intrinsics in core::arch. In addition, platform intrinsics are architecture independent, so they only have to be implemented once.

  • issue #171: std::arch SIMD intrinsics
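The sketch below illustrates why architecture-independent operations are cheaper for a backend to support. A single generic lane-wise add serves any element type and lane count. `simd_add_generic` is a hypothetical scalar model, not the real `simd_add` intrinsic, and the two wrappers are made-up, target-flavored names. They only show that differently named per-target operations can all reduce to the same generic one.

```rust
use std::ops::Add;

/// Hypothetical scalar model of a generic lane-wise add: lane i of the
/// result is a[i] + b[i], for any element type and lane count.
fn simd_add_generic<T, const N: usize>(a: [T; N], b: [T; N]) -> [T; N]
where
    T: Copy + Add<Output = T>,
{
    std::array::from_fn(|i| a[i] + b[i])
}

/// Hypothetical x86-flavored wrapper: four f32 lanes.
fn x86_like_add_ps(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    simd_add_generic(a, b)
}

/// Hypothetical AArch64-flavored wrapper: four i32 lanes.
fn arm_like_vaddq_s32(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    simd_add_generic(a, b)
}
```

A backend that implements the generic operation once gets every such use for free, instead of implementing one LLVM intrinsic per architecture and per vector shape.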

Cleanup during stack unwinding on panics

Cranelift currently doesn’t have support for cleanup during stack unwinding.
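"Cleanup during stack unwinding" refers to behavior like the following. With Rust’s default `panic=unwind` strategy, the destructors of live locals run while a panic unwinds through their frames, and the compiler has to emit code for that. The example below uses only standard library APIs (`catch_unwind`, `Drop`). `Guard`, `CLEANED_UP` and the two functions are names invented for the illustration. This is the behavior a backend needs cleanup support to provide.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by `Guard::drop`, so we can observe whether cleanup happened.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// A value whose destructor is the "cleanup" that must run during unwinding.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

fn panics_while_holding_guard() {
    let _guard = Guard;
    panic!("boom");
    // `_guard` is dropped by the unwinder, not by normal control flow.
}

/// Returns true if the panic was caught and `Guard`'s destructor ran on the way out.
fn cleanup_ran_during_unwind() -> bool {
    CLEANED_UP.store(false, Ordering::SeqCst);
    let result = panic::catch_unwind(panics_while_holding_guard);
    result.is_err() && CLEANED_UP.load(Ordering::SeqCst)
}
```

The panic message is still printed to stderr by the default panic hook. `catch_unwind` only stops the panic from propagating further.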

Windows support

Various issues remain.

Maintenance

While there have been several PRs by other people, I am the only person who has contributed more than a few changes to cg_clif.

Thanks to @jyn514 for giving feedback on this post.