|
| 1 | +--- |
| 2 | +title: Dropping RISC-V support |
| 3 | +layout: post |
| 4 | +excerpt_separator: <!--more--> |
| 5 | +--- |
| 6 | + |
| 7 | +The next set of images will drop RISC-V support. The builder is |
| 8 | +currently still going but within the next few days it will stop, |
| 9 | +and the repositories will stay in place but frozen. |
| 10 | + |
| 11 | +Nothing will change in packaging (the build profile will remain, |
| 12 | +template support where present will remain, cross-toolchains will |
| 13 | +remain) but there will be no more updates to the repo for the |
| 14 | +foreseeable future. |
| 15 | + |
| 16 | +<!--more--> |
| 17 | + |
| 18 | +## The situation |
| 19 | + |
| 20 | +The initial plumbing for RISC-V was added to the distro in July 2021, |
| 21 | +with repos following later in the year, i.e. it has been there almost from the |
| 22 | +start. During all this time, the builds have been done on an x86_64 |
| 23 | +machine using `qemu-user` binfmt emulation, coupled with |
| 24 | +transparent `cbuild` support for this. |
| 25 | + |
| 26 | +The reason for doing it this way was that there wasn't any hardware |
| 27 | +fast enough for us to use; I had obtained a SiFive HiFive |
| 28 | +Unmatched board in October 2021 and it proved to be useless for |
| 29 | +builds as its performance is similar to that of a Raspberry Pi 3. |
| 30 | +Other boards came later, but none of them improved on that front |
| 31 | +significantly enough. |
| 32 | + |
| 33 | +This was expected to be a temporary state that would resolve itself |
| 34 | +within 2-3 years; it is now Q1 2025, and the options are the following: |
| 35 | + |
| 36 | +* HiFive P550 that was released recently has performance similar to |
| 37 | + Raspberry Pi 4 and is unsuitable for the task; this board was originally |
| 38 | + supposed to be released several years ago as part of the SiFive and Intel |
| 39 | + collaboration (Horse Creek) but now got released with a Chinese SoC instead |
| 40 | +* Milk-V Pioneer is a board with 64 out-of-order cores; it is the only one of |
| 41 | + its kind, with the cores being supposedly similar to something like ARM |
| 42 | + Cortex-A72. This would be enough in theory, however these boards are hard |
| 43 | + to get here (especially with Sophgo having some trouble, new US sanctions, |
| 44 | + and Mouser pulling all the Milk-V products) and from the information that |
| 45 | + is available to me, it is rather unstable, receives very little support, |
| 46 | + and is riddled with various hardware problems. |
| 47 | +* Things based on Spacemit K1 (e.g. Milk-V Jupiter) have an 8-core SoC that |
| 48 | + is technically an out-of-order design, but in practice the per-core |
| 49 | + performance is reportedly even worse than the JH7110, so it is unsuitable. |
| 50 | +* Boards based on JH7110 (e.g. VisionFive 2, the new Framework board etc.) |
| 51 | + utilize 4 U74 cores (same configuration as my HiFive Unmatched) that are |
| 52 | + simple in-order designs and therefore are unsuitable (similar to RPi3). |
| 53 | +* My HiFive Unmatched, which is the same situation as above. |
| 54 | +* Other available cores are usually much worse than any of the above. |
| 55 | + |
| 56 | +The promising option (Milk-V Oasis with 16 SiFive P670 cores) that was |
| 57 | +first announced in 2023 ultimately ended up being canned due to issues |
| 58 | +the SoC vendor has, and nobody has ever seen a single production chip, |
| 59 | +let alone a board. As far as I can tell, no other options are coming up. |
| 60 | + |
| 61 | +It is unsustainable to stick with the current situation with the emulator. |
| 62 | +Doing so has numerous problems: |
| 63 | + |
| 64 | +* We could never actually run tests on the packages being built, because |
| 65 | + the emulator is unreliable and will result in false positive failures. |
| 66 | + Disabling tests conditionally for RISC-V is not a viable option because |
| 67 | + the failures are not RISC-V issues and will always happen with emulation, so |
| 68 | + all the RISC-V packages were being built without tests. |
| 69 | +* It is very slow, being by far the slowest builder in our fleet. It is |
| 70 | + still several times faster than e.g. the JH7110 would build things. The |
| 71 | + performance is actually rather variable; things that can parallelize |
| 72 | + really well run at a fairly reasonable speed due to being able to spawn |
| 73 | + many emulators, while things like configure scripts that are |
| 74 | + single-threaded and fork a lot run very slowly. Either way, overall, it is much |
| 75 | + slower than any of the other builders, despite RISC-V being, until the |
| 76 | + introduction of LoongArch64 builds, the only architecture with no LTO. |
| 77 | +* Most importantly, it is unreliable. The `qemu` emulator likes to hang |
| 78 | + during various workloads, with the emulator going into sleep state and |
| 79 | + remaining there forever. When that happens, the builds have to be |
| 80 | + manually canceled and restarted (it is not deterministic). This used |
| 81 | + to be worse before some fixes, but even with the latest version of |
| 82 | + the emulator it still happens, particularly during Go builds (since |
| 83 | + we rebuild every Go program upon toolchain updates for secfixes, |
| 84 | + any such rebuild can require many manual cancelations and restarts). |
| 85 | +* It burns a ton of power for how slow it is, because it fully loads |
| 86 | + a beefy x86 machine, and I'm not happy at all about that. |
| 87 | + |
| 88 | +At this point, to have a relatively sustainable base, we'd need a board |
| 89 | +that is at least as powerful as a Raspberry Pi 5. It would still be |
| 90 | +the slowest builder in the fleet, but it would likely be faster than |
| 91 | +the current emulation arrangement while also being more reliable. |
| 92 | + |
| 93 | +However, the industry does not seem to be interested in producing such |
| 94 | +machines and for the most part focuses on embedded (low-end) as well as |
| 95 | +things entirely irrelevant to a distro (AI/NPU etc.) that do not help |
| 96 | +at all; at this point I don't think we can wait any longer, especially |
| 97 | +as no remedy has been announced. |
| 98 | + |
| 99 | +We have no such problem with the other architectures; obviously x86 and |
| 100 | +ARM are at this point mainstream and this does not surprise anyone, but |
| 101 | +even the likes of LoongArch have perfectly acceptable hardware (not the |
| 102 | +fastest, but also not a bottleneck) that performs reliably. |
| 103 | + |
| 104 | +## Will RISC-V support be reintroduced? |
| 105 | + |
| 106 | +If acceptable build hardware is released and is reasonably available to |
| 107 | +us, the architecture will be reintroduced. |
| 108 | + |
| 109 | +If that happens, the repositories will be rebuilt from scratch, as if |
| 110 | +it were a new architecture, with a process similar to how it was recently done |
| 111 | +with LoongArch64. It will be a tier-2 architecture with enforced tests |
| 112 | +and without LTO just like LoongArch64. |
| 113 | + |
| 114 | +However, whether or when that will happen is currently a big unknown |
| 115 | +due to such hardware not existing and nothing being even announced. |
| 116 | + |
| 117 | +Nothing will change in the other architecture support. The new tier |
| 118 | +list will be: |
| 119 | + |
| 120 | +* Tier 1 for `aarch64`, `ppc64le`, and `x86_64` |
| 121 | +* Tier 2 for `loongarch64` |
| 122 | +* Tier 3 for `ppc64` and `ppc` |
| 123 | + |
| 124 | +There is also some chance of ARMv7 and ARMv6 32-bit repositories being |
| 125 | +introduced within the next few months, as we may be moving to |
| 126 | +an oversized Ampere Altra machine for all ARM builds (right now AArch64 |
| 127 | +is served by a Hetzner Cloud VM and can't take any more load). This is |
| 128 | +not yet set in stone, however. |