There has been a ton of progress since the last progress report, with 303 commits since then. @afonso360 has contributed a lot to improve Windows and AArch64 support. (Thanks a lot for that!)
Achievements in the past four months
Windows support with the MSVC toolchain
Windows support with the MSVC toolchain has been added by @afonso360. This required a Cranelift change to add COFF-based TLS support, a rewrite of the bash test scripts in Rust (as Windows doesn’t have bash), inline stack probing in Cranelift (stack probing is necessary on Windows to grow the stack) and finally a couple of minor changes to make the tests run on Windows. There are still a couple of issues though. For example, the JIT mode crashes, and Bevy gets miscompiled, causing it to crash at runtime. An investigation into the Bevy miscompilation is ongoing.
- #1252: Move test script to y.rs
- #1253: Fix no_sysroot testsuite for MSVC environments
- bytecodealliance/wasmtime#4546: cranelift: Add COFF TLS Support
- bytecodealliance/wasmtime#4747: cranelift: Add inline stack probing for x64
- #1249: Miscompilation of Bevy with MSVC
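Stack probing matters on Windows because the OS only commits new stack pages when the guard page just below the committed stack is touched, so a function with a frame larger than a page has to touch each page in order before using it. The sketch below only simulates which offsets below the incoming stack pointer such an inline probe sequence would touch; the helper name `probe_offsets` and the 4 KiB page size are assumptions for illustration, and this is not Cranelift's actual lowering.

```rust
/// Hypothetical helper: offsets below the incoming stack pointer that an
/// inline stack-probe sequence would touch, one per page, for a frame of
/// `frame_size` bytes. Assumes the guard page is `page_size` bytes.
fn probe_offsets(frame_size: usize, page_size: usize) -> Vec<usize> {
    assert!(page_size > 0, "page size must be non-zero");
    // A frame that fits in one page needs no probes: any access into it is
    // at most one page below the already-committed stack.
    if frame_size <= page_size {
        return Vec::new();
    }
    // Touch every page in order, closest first, so each access lands at most
    // one page beyond the previous one and the guard page moves down stepwise.
    (1..=frame_size / page_size).map(|i| i * page_size).collect()
}

fn main() {
    // A 10 KiB frame with 4 KiB pages.
    println!("{:?}", probe_offsets(10 * 1024, 4096));
}
```

Emitting the probes inline avoids a call to a separate probe routine for every large frame, at the cost of a few extra instructions in the function prologue.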
Gankra’s abi-cafe (previously abi-checker) now runs on CI. This uncovered a couple of ABI incompatibilities between cg_clif and cg_llvm. Some were bugs in cg_clif, while others had to be fixed in Cranelift.
- #1255: Add abi-checker to y.rs and run it on CI
- 45b6cd6a8a2a3b364d22d4fabc0d72f9e37e3e50: Fix a crash for 11 single byte fields passed through the C abi
- bytecodealliance/wasmtime#4634: Fix sret for AArch64
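To illustrate the kind of case abi-cafe exercises, here is a sketch of a C-ABI function taking a struct with 11 single-byte fields, the shape involved in the crash fixed by 45b6cd6. The names `ElevenBytes` and `sum_fields` are hypothetical. On x86-64 System V, for example, such a struct is passed in two integer registers, the second only partly filled, which is easy to get subtly wrong. abi-cafe compiles caller and callee with different backends and checks that every field arrives intact; in this sketch both sides are built by the same compiler.

```rust
/// Hypothetical struct with 11 single-byte fields: size 11, alignment 1.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ElevenBytes {
    pub a: u8, pub b: u8, pub c: u8, pub d: u8, pub e: u8, pub f: u8,
    pub g: u8, pub h: u8, pub i: u8, pub j: u8, pub k: u8,
}

/// Callee using the C calling convention; sums every field so that a
/// mis-passed byte would change the result.
pub extern "C" fn sum_fields(s: ElevenBytes) -> u32 {
    [s.a, s.b, s.c, s.d, s.e, s.f, s.g, s.h, s.i, s.j, s.k]
        .iter()
        .map(|&x| u32::from(x))
        .sum()
}

fn main() {
    let s = ElevenBytes { a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9, j: 10, k: 11 };
    println!("{}", sum_fields(s));
}
```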
Linux on AArch64 now passes the full test suite of cg_clif. It is not tested in CI, so it is possible that support will regress in the future.
Basic s390x support
Basic support for IBM’s s390x architecture has been added by @uweigand. There is no testing on CI and there are still some test failures.
- #1260: Ignore ptr_bitops_tagging test on s390x
- issue #1258: s390x test failure due to unsupported stack realignment
- issue #1259: Enabling s390x on CI
The LLVM backend has supported multi-threading during compilation from LLVM IR to object files since 2014. While the frontend is not parallelized, this can still give a non-trivial perf boost. Until recently cg_clif didn’t support this, so it took longer to compile, especially on machines with many cores. After about two weeks of significant refactoring all over cg_clif, I was able to implement multi-threading support in cg_clif too. It was a lot of effort, but it was well worth it: there are now almost no cases where cg_llvm compiles faster than cg_clif.
[Figure: perf results]
- #1264: Refactorings for enabling parallel compilation (part 1)
- #1266: Refactorings for enabling parallel compilation (part 2)
- #1271: Support compiling codegen units in parallel
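The overall shape of this change can be sketched as follows: the frontend still runs serially, but once the crate has been split into codegen units, each one can be turned into an object file independently. In this sketch `CodegenUnit` and `compile_cgu` are hypothetical stand-ins rather than cg_clif's real types, "compiling" just serializes function names, and one thread is spawned per unit for simplicity.

```rust
use std::thread;

/// Hypothetical stand-in for a codegen unit: a named group of functions.
struct CodegenUnit {
    name: String,
    functions: Vec<String>,
}

/// Hypothetical stand-in for backend work: "lowers" a CGU into an object
/// file, here just the function names separated by newlines.
fn compile_cgu(cgu: &CodegenUnit) -> (String, Vec<u8>) {
    let mut object = Vec::new();
    for f in &cgu.functions {
        object.extend_from_slice(f.as_bytes());
        object.push(b'\n');
    }
    (format!("{}.o", cgu.name), object)
}

/// Compiles every CGU on its own scoped thread and collects the results in
/// the original CGU order, so the linker input stays deterministic.
fn compile_parallel(cgus: &[CodegenUnit]) -> Vec<(String, Vec<u8>)> {
    thread::scope(|s| {
        let handles: Vec<_> = cgus
            .iter()
            .map(|cgu| s.spawn(move || compile_cgu(cgu)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("codegen thread panicked"))
            .collect()
    })
}

fn main() {
    let cgus = vec![
        CodegenUnit { name: "cgu0".into(), functions: vec!["main".into()] },
        CodegenUnit { name: "cgu1".into(), functions: vec!["foo".into(), "bar".into()] },
    ];
    for (name, obj) in compile_parallel(&cgus) {
        println!("{name}: {} bytes", obj.len());
    }
}
```

A real backend would bound the number of concurrent jobs instead of spawning one thread per unit, but the key property is the same: units share no mutable state, so they can be compiled in any order and joined at the end.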
While implementing multi-threading, I was able to remove the partial linking hack that was used to support inline assembly and incremental compilation at the same time. This hack was incompatible with macOS. Now that it is no longer necessary, inline assembly works on macOS too.
I implemented a couple of intrinsics used by portable SIMD. Only a few, including simd_arith_offset, are missing now. Note that a large portion of core::arch is still unimplemented.
- #1277: Implement a couple of portable simd intrinsics
Many vendor intrinsics remain unimplemented. However, the new portable SIMD project will likely exclusively use so-called “platform intrinsics”, of which there are far fewer than the LLVM intrinsics used to implement all vendor intrinsics in core::arch. In addition, “platform intrinsics” are the common denominator between the platforms supported by rustc, so they only have to be implemented once in cg_clif itself, and in fact most have already been implemented. Cranelift does need a per-platform implementation when native SIMD instructions are used, but emulating “platform intrinsics” using scalar instructions is pretty easy.
- issue #171: std::arch SIMD intrinsics
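Scalar emulation of platform intrinsics is easy because they are lane-wise: each output lane depends only on the matching input lanes. The sketch below shows that idea on plain arrays rather than Cranelift IR; `lanewise`, `simd_add_i32` and `simd_mul_f32` are hypothetical names, and integer lanes are assumed to wrap on overflow, as integer arithmetic on portable SIMD vectors does.

```rust
/// Applies a scalar binary operation to each pair of lanes.
fn lanewise<T: Copy, const N: usize>(a: [T; N], b: [T; N], op: impl Fn(T, T) -> T) -> [T; N] {
    std::array::from_fn(|lane| op(a[lane], b[lane]))
}

/// Hypothetical scalar emulation of `simd_add` for i32 lanes (wrapping).
fn simd_add_i32<const N: usize>(a: [i32; N], b: [i32; N]) -> [i32; N] {
    lanewise(a, b, i32::wrapping_add)
}

/// Hypothetical scalar emulation of `simd_mul` for f32 lanes.
fn simd_mul_f32<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
    lanewise(a, b, |x, y| x * y)
}

fn main() {
    println!("{:?}", simd_add_i32([1, 2, i32::MAX, -4], [10, 20, 1, 4]));
    println!("{:?}", simd_mul_f32([1.5, 2.0], [2.0, 3.0]));
}
```

Because the lane count is a const generic, one definition covers every vector width, which mirrors why platform intrinsics only need to be implemented once rather than once per vendor intrinsic.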
Cleanup during stack unwinding on panics
Cranelift currently doesn’t have support for cleanup during stack unwinding.
- issue wasmtime#1677: Support cleanup during unwinding
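“Cleanup” here means running destructors for values that are still live in the frames a panic unwinds through. The sketch below shows the behavior Rust code relies on, as seen when compiled with `panic=unwind` by a backend that supports unwinding cleanup; the `Guard` type and `CLEANED_UP` flag are hypothetical and only exist to observe whether the destructor ran.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by `Guard::drop`, so we can observe whether cleanup ran.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// Hypothetical guard whose destructor must run while unwinding.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

fn work() {
    let _guard = Guard;
    panic!("something went wrong");
}

fn main() {
    // catch_unwind stops the unwind here; on the way up, the cleanup code
    // generated for `work` drops `_guard`.
    let result = panic::catch_unwind(work);
    assert!(result.is_err());
    assert!(CLEANED_UP.load(Ordering::SeqCst));
}
```

Without cleanup support in the code generator, the panic could still propagate, but the `Drop` impl of `_guard` would be skipped, leaking resources or leaving shared state inconsistent.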
Distributing as rustup component
There is progress towards distributing cg_clif as a rustup component, but there are still things to be done. https://github.com/bjorn3/rustc_codegen_cranelift/milestone/2 lists the remaining things I know of.
Contributions are always appreciated. Feel free to take a look at good first issues and ping me (@bjorn3) for help if you get stuck, either on the relevant GitHub issue or, preferably, on the rust-lang Zulip.