It’s been quite a while since the last progress report: there have been 393 commits since then.

Achievements in the past ten months

Migrating away from rust-ar

Since the start, archive file reading and writing has been done by the rust-ar crate. While it has been very useful, a couple of limitations necessitate moving away from it. First, it doesn’t support writing symbol tables. While I managed to implement support for this for the GNU and BSD variants of the archive format, it doesn’t work on macOS, which requires running ranlib there. That is slower than writing the symbol table while creating the archive file. Second, my changes to support symbol table writing haven’t been merged into rust-ar, which means that cg_clif has to depend on my own fork. If I accidentally deleted my fork, cg_clif would be broken. In addition, depending on a fork doesn’t play nicely with vendoring, which is necessary for building Rust offline. And finally, rust-ar is not actively maintained.

To migrate away from it, I first switched archive file reading from rust-ar to the newly introduced archive file support in the object crate. I’m now working on integrating a Rust port of LLVM’s archive writer into rustc so that all backends can share the same code.

#1155: Remove the ar git dependency
1da5054: Use the object crate for archive reading during archive building
rust-lang/rust#97485: Rewrite LLVM’s archive writer in Rust
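
To make "symbol table" a bit more concrete, here is a minimal sketch of the body of a GNU-style archive symbol table (the special member named `/`): a big-endian symbol count, one big-endian member offset per symbol, then the NUL-terminated symbol names. The helper name `gnu_symbol_table_body` and the offsets are made up for illustration, the member header and padding are left out, and this is not the code used by rust-ar, the object crate, or the LLVM port.

```rust
/// Build the body of a GNU-style archive symbol table.
///
/// `symbols` pairs each symbol name with the byte offset (from the start of
/// the archive) of the header of the member that defines it.
/// Hypothetical helper: the 60-byte member header and any trailing padding
/// are intentionally omitted.
fn gnu_symbol_table_body(symbols: &[(&str, u32)]) -> Vec<u8> {
    let mut body = Vec::new();

    // Number of symbols as a 4-byte big-endian integer.
    body.extend_from_slice(&(symbols.len() as u32).to_be_bytes());

    // One 4-byte big-endian member offset per symbol.
    for (_, offset) in symbols {
        body.extend_from_slice(&offset.to_be_bytes());
    }

    // The symbol names, NUL-terminated, in the same order as the offsets.
    for (name, _) in symbols {
        body.extend_from_slice(name.as_bytes());
        body.push(0);
    }

    body
}
```

A real writer also has to account for the fact that the member offsets depend on the size of the symbol table itself, since the table is stored at the start of the archive.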

Multi-threading support

Currently cg_clif does everything on a single thread, unlike cg_llvm, which runs optimizations and emits object files in parallel. This means that, depending on how many codegen units can be compiled in parallel, cg_llvm can finish in less time than cg_clif. I have been slowly working on refactorings that will allow Cranelift to compile codegen units on background threads. These refactorings are necessary because currently a function is compiled immediately after it has been translated to Cranelift IR.

9089c30: Remove TyCtxt dependency from UnwindContext
5f6c59e: Pass only the Function to write_clif_file
78b6571: Split compile_fn out of codegen_fn
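
The general shape these refactorings aim for can be sketched roughly as follows: translation stays on the main thread, and the translated function is handed to a background thread for compilation. Everything here is a stand-in. The types `TranslatedFn` and `CompiledFn`, the function `compile_in_background`, and the bodies of `translate_fn` and `compile_fn` are hypothetical and only borrow their flavor from the commit titles above; none of it is the actual cg_clif code.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a function that has already been translated to Cranelift IR.
struct TranslatedFn {
    name: String,
    ir: String,
}

/// Stand-in for the machine code produced for one function.
struct CompiledFn {
    name: String,
    code_len: usize,
}

/// Phase 1 (hypothetical): needs access to compiler state, so it runs on the
/// main thread.
fn translate_fn(name: &str) -> TranslatedFn {
    TranslatedFn {
        name: name.to_string(),
        ir: format!("function %{}() {{ return }}", name),
    }
}

/// Phase 2 (hypothetical): only needs the translated function, so it can run
/// on any thread.
fn compile_fn(func: TranslatedFn) -> CompiledFn {
    CompiledFn {
        name: func.name,
        code_len: func.ir.len(),
    }
}

/// Translate on the calling thread and compile on one background thread.
fn compile_in_background(names: &[&str]) -> Vec<CompiledFn> {
    let (tx, rx) = mpsc::channel::<TranslatedFn>();

    // The worker compiles functions in the order in which they arrive.
    let worker = thread::spawn(move || rx.into_iter().map(compile_fn).collect::<Vec<_>>());

    for name in names {
        tx.send(translate_fn(name)).expect("worker hung up");
    }
    // Closing the channel lets the worker's iterator finish.
    drop(tx);

    worker.join().expect("worker panicked")
}
```

Calling `compile_in_background(&["foo", "bar"])` returns the two compiled stand-ins in submission order. The point of the split is that the work item sent over the channel no longer borrows any compiler state.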

SIMD

There have been a lot of fixes for portable-simd (the unstable core::simd module). Some of these also benefit stdarch (the core::arch module).

a8be7ea: Implement new simd_shuffle signature
d288c69: Implement simd_reduce_{min,max} for floats
dd288d2: Fix vector types containing an array field with mir opts enabled
037aafb: Fix simd type validation
f3d97cc: Fix saturating float casts test
3c030e2: Fix NaN handling of simd float min and max operations
11007c0: Use fma(f) libm function for simd_fma intrinsic

Inline assembly

@nbdd0121 implemented support for register classes in PR #1206. Previously only fixed register constraints were supported.

I also fixed a couple of bugs in an attempt to compile Philipp Oppermann’s blog os. Many things are still missing for that to work, though.

#1206: Improve inline asm support
1222192: Use cgu name instead of function name as base for inline asm wrapper name
efdbd88: Ensure inline asm wrapper name never starts with a digit
#1204: Full asm!() support
#1208: Support compiling blog os

Misc bug fixes

f74cf39: Fix crash when struct argument size is not a multiple of the pointer size
97e5045: Fix taking address of truly unsized type field of unsized adt
f3fc94f: Fix #[track_caller] with MIR inlining
f52162f: Fix #[track_caller] location for function chains
74b9232: Fix assert_assignable for array types
7a10059: Fix symbol tables in case of multiple object files with the same name

Usage changes

There have been two big changes to the way cg_clif is used. First, the cargo wrapper executable has been renamed to cargo-clif. This is necessary on Windows, as otherwise the cargo wrapper would invoke itself when running cargo, because Windows puts the current working directory in the search path for executables. It also allows invoking the wrapper as cargo clif if you add the cg_clif build directory to your $PATH. The second change is that cg_clif is now always run using the -Zcodegen-backend rustc argument. This matches what happens when cg_clif is built as part of rustc. Previously a wrapper cg_clif executable was used, which used rustc_driver to run rustc with cg_clif as the backend. This change is only visible when you use cg_clif/rustc directly without the cargo-clif wrapper. Using cargo-clif is advised.

0dd3d28: Rename cargo executable to cargo-clif
#1225: Use -Zcodegen-backend instead of a custom rustc driver

Perf optimizations

Both build time and runtime performance should be improved by several percent due to a couple of optimizations. A small improvement comes from Cranelift’s new support for cold blocks. These are placed at the end of the function to enable more efficient usage of the instruction cache and to reduce branch mispredictions, which slightly improves runtime performance. A much bigger improvement is the replacement of a lot of print+trap combinations with just a trap. While the prints have been very useful for debugging miscompilations, they also bloat compiled binaries a lot (up to ~30% improvement from removing them!). Given that miscompilations in cg_clif are quite rare nowadays, I removed most debug prints. The final improvement comes from Cranelift switching to a new register allocator. This has improved build time by up to 7% and should also have improved runtime performance a bit.

90f8aef: Mark cold blocks
#1220: Replace a lot of print+trap with plain trap
bytecodealliance/wasmtime#3989: Switch Cranelift over to regalloc2
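
As a conceptual illustration of what "placed at the end of the function" means, the sketch below reorders a list of blocks so that hot blocks keep their relative order and cold blocks follow them. The `Block` type and `layout_blocks` function are hypothetical and do not reflect how Cranelift actually represents or lays out blocks.

```rust
/// Hypothetical stand-in for a basic block with a "cold" marker.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    name: &'static str,
    cold: bool,
}

/// Move all cold blocks behind the hot ones, keeping the relative order
/// within each group.
fn layout_blocks(blocks: Vec<Block>) -> Vec<Block> {
    // `partition` puts blocks for which the predicate is true in the first
    // collection, so the first one holds the cold blocks.
    let (cold, hot): (Vec<Block>, Vec<Block>) = blocks.into_iter().partition(|b| b.cold);
    hot.into_iter().chain(cold).collect()
}
```

With a rarely taken panic path marked as cold, the commonly executed blocks end up next to each other in memory.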

Challenges

Windows support with the MSVC toolchain

Cranelift doesn’t yet support TLS for COFF/PE object files. This means that it is not currently possible to compile for MSVC, unlike for MinGW, which uses pthread keys to implement TLS.

issue wasmtime#1885: [Cranelift] Add COFF TLS support
issue #997: Windows support

SIMD

Many vendor intrinsics remain unimplemented. The new portable SIMD project will, however, likely exclusively use so-called “platform intrinsics”, of which there are far fewer than the LLVM intrinsics used to implement all vendor intrinsics in core::arch. In addition, “platform intrinsics” are the common denominator between the platforms supported by rustc, so they only have to be implemented once in cg_clif itself, and in fact most have already been implemented. Cranelift does need a definition for each platform when native SIMD is used, but emulating “platform intrinsics” using scalar instructions is pretty easy.

issue #171: std::arch SIMD intrinsics
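
To show why scalar emulation is easy, here is a minimal sketch that applies a scalar operation to each lane of two fixed-size vectors. The helper `simd_binop` is hypothetical and works on plain Rust arrays. The real emulation in cg_clif operates on Cranelift IR values instead, so treat this purely as an illustration of the lane-by-lane idea.

```rust
/// Emulate a lane-wise binary SIMD operation with scalar code.
/// Hypothetical helper: vectors are modelled as plain arrays.
fn simd_binop<T: Copy, const N: usize>(
    a: [T; N],
    b: [T; N],
    op: impl Fn(T, T) -> T,
) -> [T; N] {
    // Start from a copy of `a` and overwrite each lane with the result.
    let mut out = a;
    for i in 0..N {
        out[i] = op(a[i], b[i]);
    }
    out
}
```

For example, `simd_binop([1, 2, 3, 4], [10, 20, 30, 40], |x: i32, y: i32| x + y)` adds the lanes pairwise, and passing `f32::max` as `op` gives a lane-wise float maximum in which a NaN lane yields the other operand.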

Cleanup during stack unwinding on panics

Cranelift currently doesn’t have support for cleanup during stack unwinding.

issue wasmtime#1677: Support cleanup during unwinding

Contributing

Contributions are always appreciated. Feel free to take a look at good first issues and ping me (@bjorn3) for help on either the relevant GitHub issue or, preferably, on the rust-lang Zulip if you get stuck.