Binderless Hosts: redroid via QEMU/TCG micro-VM
Status: proof-of-concept validated (2026-06)
redroid (Android 11) was booted to sys.boot_completed=1 inside a
QEMU TCG guest (no KVM, no host binder) on a locked-down
Firecracker host, and it held a stable 5-minute idle (no process
flapping, no reboot) once CONFIG_PSI=y was added. This doc records
the why, the reproducible recipe, the debugging log, and the
proposed backend/fallback design. It was written as the spec for a
vm backend before any of it shipped in the CLI; the binder switch and
the micro-VM backend have since landed (see §7 and §8).
See also: Running in CI / without kernel access and Device backends.
1. The problem
Beetroot's Redroid backend runs Android userspace directly on the host's Linux kernel; that is the whole point of a container, since there is no second kernel. Android's IPC is binder, a kernel driver, so Redroid can only run where the host kernel provides binder, either:
- built in (CONFIG_ANDROID_BINDER_IPC=y), or
- as a loadable binder_linux module (Waydroid/anbox-modules DKMS).
With neither, Redroid cannot run at all.
On a host with no binder and no way to add it, Beetroot is dead in the water. Concretely this includes:
- hardened CI runners and managed cloud sandboxes booted with nomodule or signed-module enforcement;
- the Claude-Code-on-the-web execution environment this PoC was built in: a Firecracker microVM whose kernel cmdline contains nomodule, whose kernel config has # CONFIG_ANDROID_BINDER_IPC is not set, and which has no /dev/kvm.
In that environment every "just load the module" path is closed:
init_module(2) returns ENOSYS, there is no /lib/modules, and
mount -t binder fails with unknown filesystem type.
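The two cheapest signals for "this host is closed" can be read without touching the kernel. The sketch below is a minimal illustration, assuming you pass in the text of /proc/cmdline and /proc/filesystems; the function names are hypothetical, not Beetroot's API. It does not reproduce the init_module(2) ENOSYS check, which needs an actual load attempt.

```python
from pathlib import Path


def module_loading_blockers(cmdline: str, release: str,
                            modules_root: Path = Path("/lib/modules")) -> list:
    """Reasons why loading binder_linux cannot work here (empty list = not ruled out)."""
    blockers = []
    if "nomodule" in cmdline.split():
        blockers.append("kernel cmdline contains nomodule")
    if not (modules_root / release).is_dir():
        blockers.append(f"{modules_root / release} does not exist")
    return blockers


def has_filesystem(proc_filesystems: str, name: str) -> bool:
    """True if /proc/filesystems lists `name`; binderfs registers itself as "binder"."""
    for line in proc_filesystems.splitlines():
        fields = line.split()  # e.g. ["nodev", "binder"] or ["ext4"]
        if fields and fields[-1] == name:
            return True
    return False
```

On a live host you would call module_loading_blockers(Path("/proc/cmdline").read_text(), os.uname().release) and has_filesystem(Path("/proc/filesystems").read_text(), "binder"); the PoC environment fails both.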
Why not implement binder in userspace?
A userspace binder shim (CUSE/FUSE, or seccomp-unotify +
userfaultfd intercepting the ioctl/mmap ABI) is possible, and
gVisor's Sentry once had one. But it is a multi-month research
project, it risks tripping Android's watchdog timeouts, and gVisor
removed theirs as an unmaintained burden. We chose the pragmatic path:
bring our own kernel in a VM.
2. The capability ladder
From fastest/cheapest to slowest, the ways to give Redroid a binder:
| Rank | Mechanism | Speed | Works when |
|---|---|---|---|
| 1 | Host kernel has CONFIG_ANDROID_BINDER_IPC=y | native | host already supports it |
| 2 | Load out-of-tree binder_linux (DKMS) | native | modules allowed + headers present |
| 3 | Nested VM w/ binder kernel, KVM-accelerated | near-native | /dev/kvm present |
| 4 | Nested VM w/ binder kernel, TCG (pure emulation) | ~5–20× slower | always (last resort) |
Ranks 1–2 are "use the host." Ranks 3–4 are "bring our own kernel." This PoC proves rank 4, the hardest case — if that works, rank 3 (same thing + KVM) is strictly easier and faster.
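Read as data, the ladder is an ordered list of (rank, predicate) pairs where the first satisfied rung wins. This is an illustrative sketch with hypothetical names; it deliberately models only "what is fastest here" and ignores the §7 policy that rank 4 must never be engaged silently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HostCaps:
    binder_builtin: bool  # host kernel has CONFIG_ANDROID_BINDER_IPC=y
    binder_module: bool   # binder_linux can be loaded (modules allowed + headers)
    kvm: bool             # /dev/kvm usable for the nested VM


# Fastest/cheapest first, mirroring the table above.
LADDER: list[tuple[int, str, Callable[[HostCaps], bool]]] = [
    (1, "host binder (built in)", lambda c: c.binder_builtin),
    (2, "host binder (binder_linux module)", lambda c: c.binder_module),
    (3, "nested VM, KVM", lambda c: c.kvm),
    (4, "nested VM, TCG", lambda c: True),  # pure emulation always works
]


def fastest_rank(caps: HostCaps) -> tuple[int, str]:
    """First rung of the ladder this host can use."""
    for rank, mechanism, works in LADDER:
        if works(caps):
            return rank, mechanism
    raise AssertionError("unreachable: rank 4 always works")
```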
3. What we proved
3.1 Binder is usable under TCG (not just present)
A guest kernel built with binder, booted under TCG, exposes binderfs and — critically — routes a real cross-process transaction:
- dynamic device creation via binder-control (BINDER_CTL_ADD) ✅
- a server process became the context manager (handle 0, the role servicemanager plays at Android boot) ✅
- a client sent BC_TRANSACTION → driver → server BR_TRANSACTION, server BC_REPLY → client BR_REPLY, payloads verified ✅ (BINDER_VERSION protocol 8)
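The first and last checks above boil down to two ioctls. The Python sketch below shows how they are encoded, assuming the x86_64 generic _IOWR layout and the uapi structs struct binderfs_device (a 256-byte name plus major/minor) and struct binder_version; the helper names are hypothetical, the calls need root inside the guest, and the transaction round-trip itself (mmap plus BINDER_WRITE_READ) is out of scope here.

```python
import fcntl
import os
import struct

# Generic Linux ioctl encoding used on x86_64: dir << 30 | size << 16 | type << 8 | nr
_IOC_WRITE, _IOC_READ = 1, 2


def _IOWR(kind: str, nr: int, size: int) -> int:
    return ((_IOC_READ | _IOC_WRITE) << 30) | (size << 16) | (ord(kind) << 8) | nr


# struct binderfs_device { char name[BINDERFS_MAX_NAME + 1]; __u32 major; __u32 minor; }
BINDERFS_DEVICE = struct.Struct("=256sII")  # BINDERFS_MAX_NAME is 255
BINDER_CTL_ADD = _IOWR("b", 1, BINDERFS_DEVICE.size)
# struct binder_version { __s32 protocol_version; }
BINDER_VERSION = _IOWR("b", 9, struct.calcsize("=i"))


def binder_ctl_add(name: str, control: str = "/dev/binderfs/binder-control"):
    """Create /dev/binderfs/<name> through binder-control; returns (major, minor)."""
    request = BINDERFS_DEVICE.pack(name.encode(), 0, 0)
    fd = os.open(control, os.O_RDWR | os.O_CLOEXEC)
    try:
        reply = fcntl.ioctl(fd, BINDER_CTL_ADD, request)  # kernel fills in major/minor
    finally:
        os.close(fd)
    _, major, minor = BINDERFS_DEVICE.unpack(reply)
    return major, minor


def binder_protocol_version(device: str = "/dev/binderfs/binder") -> int:
    """BINDER_VERSION ioctl; the PoC guest reported protocol 8."""
    fd = os.open(device, os.O_RDWR | os.O_CLOEXEC)
    try:
        reply = fcntl.ioctl(fd, BINDER_VERSION, struct.pack("=i", 0))
    finally:
        os.close(fd)
    return struct.unpack("=i", reply)[0]
```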
3.2 Full redroid boot under TCG
```
Host: Firecracker VM — nomodule, no /dev/kvm, no host binder
└─ QEMU 8.2 TCG (thread=multi, -cpu max, -smp <host cores>, -m 8192)
   └─ Linux 6.12.9 (custom; binder + binderfs + BPF + cgroup/overlay/virtio built-in)
      └─ containerd + dockerd (static 27.5.1)
         └─ docker run --privileged redroid/redroid:11.0.0-latest
            └─ sys.boot_completed = 1 @ t≈150s
```
Evidence from logcat: system_server, ActivityManager starting app
processes, Zygote forking children, SurfaceFlinger+hwcomposer
VSYNC, adbd/JDWP. Full boot ≈ 2.5 min under pure emulation.
4. Reproducible recipe
All artifacts were assembled under /root/binder-poc in the PoC. The
durable knowledge is the kernel config delta, the guest init
contract, and the QEMU invocation.
4.1 Kernel (build on the host; runs as the guest kernel)
Start from make defconfig (x86_64), then enable — all built-in
(=y), never =m, because the guest rootfs ships no modules:
```
# Binder (the whole point)
CONFIG_ANDROID=y
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ANDROID_BINDERFS=y
CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
# Container runtime (Docker/containerd/runc)
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_BPF=y
CONFIG_OVERLAY_FS=y
CONFIG_EXT4_FS=y
CONFIG_BRIDGE=y
CONFIG_VETH=y
CONFIG_TUN=y
CONFIG_NETFILTER=y
CONFIG_NF_NAT=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_NAT=y
CONFIG_NETFILTER_XT_TARGET_MASQUERADE=y
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y
# bpf() syscall: REQUIRED, because cgroup-v2 device control is eBPF-only.
# Without this, runc dies: bpf_prog_query(BPF_CGROUP_DEVICE) ENOSYS.
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
# Guest I/O
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_INITRD=y
# Stability (see §5 lmkd): pressure-stall info for lmkd / low-memory-killer
CONFIG_PSI=y
```
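Kconfig silently drops options whose dependencies are unmet, so it is worth checking the final .config (after make olddefconfig) rather than trusting the fragment. A minimal sketch, assuming standard .config syntax; REQUIRED_BUILTIN is a hypothetical subset of the list above, not the in-tree fragment.

```python
from __future__ import annotations

import re

# Hypothetical subset of the fragment above that must end up built in.
REQUIRED_BUILTIN = [
    "CONFIG_ANDROID_BINDER_IPC", "CONFIG_ANDROID_BINDERFS", "CONFIG_CGROUP_BPF",
    "CONFIG_BPF_SYSCALL", "CONFIG_OVERLAY_FS", "CONFIG_VIRTIO_BLK", "CONFIG_PSI",
]


def config_problems(config_text: str, required: list[str]) -> dict[str, str]:
    """Map each required option that is not =y in a kernel .config to what went wrong."""
    values: dict[str, str] = {}
    for line in config_text.splitlines():
        line = line.strip()
        if m := re.fullmatch(r"(CONFIG_\w+)=(.*)", line):
            values[m.group(1)] = m.group(2)
        elif m := re.fullmatch(r"# (CONFIG_\w+) is not set", line):
            values[m.group(1)] = "n"
    problems = {}
    for option in required:
        value = values.get(option)
        if value is None:
            problems[option] = "missing (dropped by Kconfig?)"
        elif value == "m":
            problems[option] = "=m, but the guest rootfs ships no modules"
        elif value != "y":
            problems[option] = f"={value}"
    return problems
```

An empty result from config_problems(open(".config").read(), REQUIRED_BUILTIN) means every listed option is built in.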
ashmem is gone — Android 11+ only
The ashmem driver was removed from mainline in Linux 5.18.
Android ≤10 needs /dev/ashmem; Android 11+ falls back to
memfd_create (CONFIG_MEMFD_CREATE=y). On a modern kernel you
must use a redroid 11.0.0+ image. (We tried 8.1 first — it
cannot boot on 6.12.)
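The ashmem rule is simple enough to check before booting anything. A small sketch with hypothetical helper names, encoding only the two facts above (ashmem left mainline in 5.18; Android 11+ uses memfd_create). On older kernels it cannot tell whether ashmem was actually configured in.

```python
import re

ASHMEM_REMOVED_IN = (5, 18)  # mainline dropped the ashmem driver in Linux 5.18
FIRST_MEMFD_ANDROID = 11     # Android 11+ falls back to memfd_create


def kernel_major_minor(release: str) -> tuple:
    """'6.12.9', '5.15.0-91-generic' or '6.12-rc1' -> (major, minor)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def redroid_image_compatible(kernel_release: str, android_major: int) -> bool:
    """False when the image needs /dev/ashmem but the guest kernel is too new to provide it."""
    if kernel_major_minor(kernel_release) >= ASHMEM_REMOVED_IN:
        return android_major >= FIRST_MEMFD_ANDROID
    return True  # older kernel: ashmem may exist, but only if it was configured in
```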
4.2 Guest rootfs
Lightweight by design (for a fast TCG boot): busybox-static plus the Docker
static binary bundle (dockerd, containerd, containerd-shim-runc-v2,
runc, docker, ctr, docker-proxy, docker-init), packed into a
raw ext4 image with mke2fs -d <tree> (no loop mount needed). The
image can be edited after the fact with debugfs -w -R "write …"
followed by sif <inode> mode 0100755 (set_inode_field, marking the
file as an executable regular file).
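The in-place edit can be generated as a short sequence of debugfs requests. This sketch is hypothetical tooling, not Beetroot's builder: it assumes debugfs's write refuses to overwrite an existing file (hence the rm first), and it only targets files in the image root, such as the guest's /init, to keep write's destination a plain name.

```python
from __future__ import annotations

import stat

# S_IFREG | 0o755 == 0o100755: a regular file with rwxr-xr-x, the value `sif ... mode` sets.
EXEC_REGULAR_FILE = stat.S_IFREG | 0o755


def debugfs_replace_argvs(image: str, host_file: str, name: str) -> list[list[str]]:
    """argv lists that replace file `name` in the image root with `host_file`, then mark it executable."""
    if "/" in name:
        raise ValueError("sketch only handles files in the image root, e.g. 'init'")

    def req(command: str) -> list[str]:
        return ["debugfs", "-w", "-R", command, image]

    return [
        req(f"rm {name}"),                                # write will not overwrite an existing file
        req(f"write {host_file} {name}"),                 # copy the host file into the unmounted image
        req(f"sif {name} mode 0{EXEC_REGULAR_FILE:o}"),   # set_inode_field: mode 0100755
    ]
```

Run each argv with subprocess.run on the unmounted image; debugfs reports request errors on stderr, so check its output as well as the exit status.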
4.3 Guest init contract (PID 1)
Ordered, with the hard-won correctness notes inline:
- Mount proc, sysfs, devtmpfs, devpts, tmpfs on /dev/shm, tmpfs on /run AND /var/run (the rootfs /var/run is not a symlink to /run, and a stale docker.pid there makes dockerd refuse to start), cgroup2 on /sys/fs/cgroup, and binder on /dev/binderfs.
- Start containerd standalone first and wait for /run/containerd/containerd.sock (generous timeout; TCG is slow). Do not rely on dockerd's managed containerd: its 15 s startup timeout is blown under TCG.
- Start dockerd --containerd=/run/containerd/containerd.sock --iptables=false --bridge=none. Wait on real readiness (docker version returns a server version), not on socket existence.
- docker rm -f redroid (clear any stale container from a prior boot; /var/lib/docker is persistent).
- docker run -d --privileged --name redroid --network none redroid/redroid:11.0.0-latest androidboot.redroid_gpu_mode=guest …
  - --name must be ≥2 chars (r is rejected by docker's regex).
  - --network none pairs with --bridge=none (no iptables needed).
  - gpu_mode=guest means software rendering (no GPU under TCG).
- Poll docker inspect -f '{{.State.Status}}' (fail fast if the container is not running) and getprop sys.boot_completed.
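The readiness rules above translate into three small primitives: a generous poller, a "real" dockerd readiness probe, and a boot poll that fails fast. The shipped guest init is a shell script (guest-init.sh); this is a hypothetical Python rendering of the same contract, and all function names are illustrative.

```python
from __future__ import annotations

import re
import subprocess
import time
from typing import Callable

# Docker's container-name rule: an alphanumeric, then ONE OR MORE of [a-zA-Z0-9_.-],
# which is why a one-character name like "r" is rejected.
_DOCKER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def valid_container_name(name: str) -> bool:
    return _DOCKER_NAME.fullmatch(name) is not None


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 1.0,
               what: str = "condition") -> None:
    """Poll `check` until it is true; timeouts should be generous, everything is slow under TCG."""
    deadline = time.monotonic() + timeout_s
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up after {timeout_s:g}s waiting for {what}")
        time.sleep(interval_s)


def _docker(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["docker", *args], capture_output=True, text=True)


def dockerd_ready() -> bool:
    """Real readiness: the daemon reports a server version (socket existence is not enough)."""
    r = _docker("version", "--format", "{{.Server.Version}}")
    return r.returncode == 0 and r.stdout.strip() != ""


def android_booted(name: str = "redroid") -> bool:
    """One poll of the boot loop; raises at once if the container is not running."""
    r = _docker("inspect", "-f", "{{.State.Status}}", name)
    status = r.stdout.strip() if r.returncode == 0 else "missing"
    if status != "running":
        logs = _docker("logs", name)
        raise RuntimeError(f"container {name} is {status}:\n{logs.stdout}{logs.stderr}")
    return _docker("exec", name, "getprop", "sys.boot_completed").stdout.strip() == "1"
```

Typical use: wait_until(lambda: os.path.exists("/run/containerd/containerd.sock"), 300, what="containerd"), then wait_until(dockerd_ready, 300, 2, "dockerd"), then wait_until(android_booted, 1800, 5, "sys.boot_completed"); the timeout values are placeholders.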
4.4 QEMU invocation
```shell
qemu-system-x86_64 \
  -M q35 -accel tcg,thread=multi,tb-size=1024 -cpu max -smp "$(nproc)" -m 8192 \
  -nographic -display none -no-reboot \
  -kernel bzImage \
  -drive file=rootdisk.img,format=raw,if=virtio \
  -append "console=ttyS0 root=/dev/vda rw init=/init panic=1 mitigations=off"
```
thread=multi (MTTCG) is the single biggest TCG lever: it runs one host
thread per guest vCPU. -cpu max exposes the SSE4/AVX extensions that ART
and bionic expect. -smp is pinned to the host's physical core count
(vm.smp: auto, the default). The vm-rnd-log §B.5 sweep found this to be
the optimum: boot time improves with vCPUs up to the host's core count
and then regresses, so a logical-CPU count would oversubscribe a
hyperthreaded host. mitigations=off drops the guest's speculative-execution
barriers, which are pure overhead (emulated under TCG, real under KVM) for
a throwaway, single-tenant research sandbox. On a host with /dev/kvm, swap
-accel tcg,thread=multi for -accel kvm (with -cpu host) for near-native
speed (rank 3).
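The same invocation as an argv builder makes the rank-3/rank-4 switch explicit. This is a hypothetical sketch, not VmDeviceBackend's code; the optional ADB forward assumes adbd is reachable on guest port 5555, as §8 describes for the shipped backend, and uses QEMU's user-mode hostfwd syntax.

```python
from __future__ import annotations

from typing import Optional

CMDLINE = "console=ttyS0 root=/dev/vda rw init=/init panic=1 mitigations=off"


def qemu_argv(kernel: str, rootfs: str, smp: int, accel: str = "tcg",
              memory_mib: int = 8192, adb_host_port: Optional[int] = None) -> list[str]:
    """Mirror of the invocation above, switching accelerator flags for rank 3 vs rank 4."""
    if accel == "kvm":
        accel_args = ["-accel", "kvm", "-cpu", "host"]                          # rank 3
    elif accel == "tcg":
        accel_args = ["-accel", "tcg,thread=multi,tb-size=1024", "-cpu", "max"]  # rank 4, MTTCG
    else:
        raise ValueError(f"accel must be 'kvm' or 'tcg', got {accel!r}")
    argv = ["qemu-system-x86_64", "-M", "q35", *accel_args,
            "-smp", str(smp), "-m", str(memory_mib),
            "-nographic", "-display", "none", "-no-reboot",
            "-kernel", kernel,
            "-drive", f"file={rootfs},format=raw,if=virtio",
            "-append", CMDLINE]
    if adb_host_port is not None:
        # QEMU user-mode networking: host 127.0.0.1:<port> -> guest port 5555 (adbd)
        argv += ["-netdev", f"user,id=net0,hostfwd=tcp:127.0.0.1:{adb_host_port}-:5555",
                 "-device", "virtio-net-pci,netdev=net0"]
    return argv
```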
5. Debugging log (every blocker was plumbing, not physics)
This is the most reusable knowledge — the failure→fix chain:
| # | Symptom | Root cause | Fix |
|---|---|---|---|
| 1 | host can't load binder | nomodule, init_module=ENOSYS | build guest kernel w/ binder =y |
| 2 | "is binder even usable under TCG?" | unknown | proved cross-process transaction |
| 3 | docker run → "See 'docker run --help'" | error text truncated by tail -1; --name r too short (docker names need ≥2 chars) | rename to redroid; never tail the error |
| 4 | dockerd: "timeout waiting for containerd to start" | managed-containerd 15 s timeout blown under TCG | run containerd standalone, wait on its socket |
| 5 | dockerd: "delete /var/run/docker.pid: PID N still running" | /var/run persistent, stale pidfile, PID reused | tmpfs over /var/run |
| 6 | runc: bpf_prog_query(BPF_CGROUP_DEVICE) ENOSYS | cgroup-v2 device ctrl is eBPF-only; CONFIG_BPF_SYSCALL off | enable CONFIG_BPF_SYSCALL |
| 7 | "container name /redroid already in use" | prior boot left a created container on disk | docker rm -f before run |
| → | sys.boot_completed=1 | — | ✅ |
Resolved: lmkd flapping → CONFIG_PSI=y
Before PSI, lmkd (the low-memory-killer daemon) crash-looped
("critical process 'lmkd' exited 4 times") because it opens
/proc/pressure/memory to arm PSI triggers, and that file does not
exist without PSI. boot_completed was still reached, but left
unaddressed, those restarts can eventually trigger an Android reboot.
Adding CONFIG_PSI=y fixed it: a re-run confirmed
/proc/pressure/{cpu,io,memory} were present, lmkd stayed running
across a 5-minute idle with zero critical exits, and the container
never rebooted (POC_RESULT: STABLE_IDLE_OK).
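A quick PSI sanity check inside the guest is to confirm the three files exist and parse. A minimal sketch assuming the standard /proc/pressure line format ("some avg10=… avg60=… avg300=… total=…", plus a "full" line); the helper names are hypothetical.

```python
from __future__ import annotations

import os


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse /proc/pressure/<resource>: lines like 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    parsed: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()  # kind is "some" or "full"
        parsed[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return parsed


def psi_files_present(root: str = "/proc/pressure") -> dict[str, bool]:
    """The three files that appear once CONFIG_PSI=y; lmkd needs memory."""
    return {r: os.path.exists(os.path.join(root, r)) for r in ("cpu", "io", "memory")}
```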
Debugging methodology that worked
- Never tail -1 an error stream: it hid the real docker message behind "See 'docker run --help'." for three iterations.
- Fail fast: a poll loop that waits 30 min on a dead container looks like a TCG hang and wastes a whole VM cycle. Check State.Status != running and bail with docker logs.
- Cache across iterations: the redroid image lives in the persistent /var/lib/docker, so skip docker load after the first boot; iterations drop from ~25 min to ~3 min.
- Edit the image in place with debugfs; don't rebuild the 16 GB disk to change one script.
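Two of these habits fit in a few lines of helper code: surface the whole stderr on failure, and skip docker load when the persistent image store already has the image. A hypothetical sketch; the runner parameter exists only so the logic can be exercised without Docker.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(argv: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(argv), capture_output=True, text=True)


def run_checked(argv: Sequence[str], runner: Runner = _run) -> str:
    """Run a command; on failure raise with the ENTIRE stderr, never just its last line."""
    result = runner(argv)
    if result.returncode != 0:
        raise RuntimeError(f"{' '.join(argv)} exited {result.returncode}:\n{result.stderr}")
    return result.stdout


def ensure_image(image: str, tarball: str, runner: Runner = _run) -> bool:
    """docker load only when the persistent /var/lib/docker lacks the image; True if it loaded."""
    if runner(["docker", "image", "inspect", image]).returncode == 0:
        return False  # cached from an earlier boot: skip the slow load
    run_checked(["docker", "load", "-i", tarball], runner)
    return True
```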
6. Performance
- Minimal kernel + busybox boot under TCG: ~3 s.
- Full redroid Android 11 boot to boot_completed: ~150 s under TCG (-smp 4, MTTCG, 8 GiB). Android feels the emulation (ActivityManager: Slow operation: 5142ms) but completes.
- The feared watchdog/ANR boot-loop risk did not materialize during boot. With KVM (rank 3) boot time should approach native redroid speed.
- Shipped boot-speed levers (on top of MTTCG and the KVM fast path): -smp auto-sizes to the host's physical core count (vm.smp: auto, the default; HT siblings are collapsed and the count is capped by CPU affinity, which is the vm-rnd-log §B.5 optimum, since a logical-CPU count would oversubscribe a hyperthreaded host), and the guest kernel cmdline carries mitigations=off, dropping speculative-execution barriers that are pure overhead for an ephemeral single-tenant sandbox (emulated work under TCG, real serialization under KVM).
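"Physical cores, HT siblings collapsed, capped by affinity" can be computed from /proc/cpuinfo. This is one plausible sketch, not the shipped vm.smp code: it assumes x86-style "physical id"/"core id" fields and falls back to counting logical CPUs when they are absent.

```python
from __future__ import annotations


def physical_core_count(cpuinfo: str, allowed_cpus: set[int]) -> int:
    """Distinct (physical id, core id) pairs among the CPUs this process may run on.

    Hyperthread siblings share a core id, so they collapse to one entry; CPUs outside
    the affinity mask (e.g. os.sched_getaffinity(0)) are ignored.
    """
    cores: set[tuple] = set()
    record: dict[str, str] = {}
    for line in cpuinfo.splitlines() + [""]:  # trailing "" flushes the last record
        if line.strip():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
            continue
        if "processor" in record and int(record["processor"]) in allowed_cpus:
            # Without topology fields, fall back to counting the logical CPU itself.
            cores.add((record.get("physical id"), record.get("core id", record["processor"])))
        record = {}
    return max(1, len(cores))
```

On Linux: physical_core_count(open("/proc/cpuinfo").read(), os.sched_getaffinity(0)).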
7. Proposed Beetroot integration (the backend + fallback design)
This is the design question this PoC unblocks: how should Beetroot choose between using the host's binder and falling back to the VM?
Shipped: the binder config switch
The config switch landed as a top-level binder: auto | host | vm
key on InstanceConfig (see config reference),
wired into the beetroot up preflight and the host.binder doctor
row. The naming below predates that work — read redroid_backend as
the shipped binder key (native ≈ the auto/host host-binder
path). The refinements that shipped: auto keeps the lenient
warn-and-proceed behaviour, host is the strict fail-fast variant,
and vm is the explicit opt-in. The micro-VM engine itself
(everything vm needs to actually boot) shipped in issue #44:
selecting vm now dispatches beetroot up to the QEMU launcher
(VmDeviceBackend) described in §8 below. KVM detection is a runtime
concern (probed live from /dev/kvm), so no separate vm_accel config
key was needed — the optional vm.accel knob (auto/kvm/tcg) lives
in the new vm: config block alongside vm.kernel / vm.rootfs /
vm.smp / vm.memory_mib.
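The shipped semantics of the binder key (lenient auto, strict host, explicit vm) can be summarized as a small decision function. A hypothetical sketch over a parsed beetroot.yaml dict; the backend labels "redroid"/"vm" and the default of auto are assumptions, not Beetroot's internals.

```python
from __future__ import annotations

from typing import Optional

BINDER_MODES = ("auto", "host", "vm")
ACCEL_MODES = ("auto", "kvm", "tcg")


def plan_up(config: dict, host_has_binder: bool) -> tuple[str, Optional[str]]:
    """Return (backend, warning) for `beetroot up`, following the shipped binder: semantics."""
    binder = config.get("binder", "auto")
    accel = (config.get("vm") or {}).get("accel", "auto")
    if binder not in BINDER_MODES:
        raise ValueError(f"binder must be one of {BINDER_MODES}, got {binder!r}")
    if accel not in ACCEL_MODES:
        raise ValueError(f"vm.accel must be one of {ACCEL_MODES}, got {accel!r}")
    if binder == "vm":
        return "vm", None  # explicit opt-in: hand off to the micro-VM launcher
    if host_has_binder:
        return "redroid", None  # host binder present: the native container path
    if binder == "host":
        raise RuntimeError("binder: host, but this host has no kernel binder")  # strict
    return "redroid", "no host binder detected; proceeding anyway (binder: auto)"  # lenient
```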
7.1 Principle
Silent automation is a gift when it is cheap and correct; it is a trap when it is expensive. Make the fast paths invisible; make the expensive path a deliberate, loud choice — never a silent surprise.
A fully-automatic ladder that quietly falls all the way to TCG would "work" but leave a developer staring at a 2½-minute boot with no idea why it is 20× slower than their colleague's. That is exactly the frustration to design out.
7.2 Recommended model: auto-detect, auto-use cheap paths, gate the expensive one
Pre-ship naming below — read against the shipped keys
The redroid_backend / vm_accel / native / module names in §7.2
through §7.6 are pre-ship design naming that never shipped under those
spellings. The keys that actually shipped are the top-level
binder: auto | host | vm plus a vm.accel (auto | kvm | tcg) knob in
the vm: config block. Translate as you read: redroid_backend →
binder, vm_accel → vm.accel; the native/module discriminator
values collapsed into the host-binder path that auto/host select.
Add a config key (see config reference):
```yaml
redroid_backend: auto   # auto | native | module | vm
vm_accel: auto          # auto | kvm | tcg (only consulted for `vm`)
```
auto resolution order:
- native: the host kernel has binder → use it. (transparent, fast)
- module: binder_linux is loadable → load it and use it. (transparent, fast, near-native)
- STOP. If neither works, do not silently start a VM. beetroot up fails fast with an actionable error:
```
error: this host has no kernel binder and the binder_linux module
       could not be loaded (nomodule / no matching headers).
       Redroid needs binder. Options:
         • enable CONFIG_ANDROID_BINDER_IPC / load binder_linux on the host, or
         • opt into the emulated micro-VM backend: redroid_backend: vm
           (slow without KVM on this host: /dev/kvm absent → expect ~Nm boots)
       Run `beetroot doctor` for the full capability report.
```
The VM backend is opt-in (redroid_backend: vm). The cost is
real, so the choice is explicit. But once chosen it is not nagged on
every command — it is configured.
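The §7.2 resolution order, written with the pre-ship names it uses, might look like this. It is a sketch of the design as proposed, not shipped code: the shipped binder: auto warns and proceeds instead of stopping (see the note at the top of §7).

```python
class NoBinderError(RuntimeError):
    """Raised instead of silently starting a VM (the STOP step)."""


def resolve_redroid_backend(setting: str, host_binder: bool, module_loadable: bool) -> str:
    """Pre-ship design: auto tries native, then module, then stops with an actionable error."""
    if setting not in ("auto", "native", "module", "vm"):
        raise ValueError(f"unknown redroid_backend: {setting!r}")
    if setting == "vm":
        return "vm"  # the expensive path is only ever an explicit choice
    if setting in ("auto", "native") and host_binder:
        return "native"
    if setting in ("auto", "module") and module_loadable:
        return "module"
    raise NoBinderError("this host has no kernel binder and binder_linux could not be loaded; "
                        "set redroid_backend: vm to opt into the emulated micro-VM")
```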
7.3 Make the expensive path loud when engaged
When redroid_backend: vm is active, beetroot up prints a one-time,
unmissable banner:
```
▶ backend: emulated micro-VM (no host binder)
  acceleration: TCG (software) — /dev/kvm not available
  first boot is slow (~2–3 min); this is expected, not a hang.
  pin `vm_accel: kvm` on a host with nested virt for near-native speed.
```
and beetroot doctor always reports the resolved backend + why:
```
binder:        absent (CONFIG_ANDROID_BINDER_IPC not set)
binder_linux:  not loadable (nomodule)
/dev/kvm:      absent
→ selected backend: vm (tcg)   [reason: no host binder; KVM unavailable]
```
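Rendering both outputs is string formatting over the resolved state. A hypothetical sketch: the banner is one quiet line on KVM and the loud block on TCG (the split §8 says shipped), and the doctor helper reproduces the "selected backend" row shown above; it uses the shipped vm.accel spelling.

```python
from __future__ import annotations


def vm_banner(accel: str) -> list[str]:
    """Banner for `beetroot up` on the VM path: one quiet line on KVM, a loud block on TCG."""
    if accel == "kvm":
        return ["▶ backend: micro-VM (no host binder), KVM-accelerated"]
    return [
        "▶ backend: emulated micro-VM (no host binder)",
        "  acceleration: TCG (software); /dev/kvm not available",
        "  first boot is slow (~2–3 min); this is expected, not a hang.",
        "  pin `vm.accel: kvm` on a host with nested virt for near-native speed.",
    ]


def doctor_selected_line(backend: str, accel: str, reasons: list[str]) -> str:
    """Final doctor row: the resolved backend plus why it was chosen."""
    return f"→ selected backend: {backend} ({accel}) [reason: {'; '.join(reasons)}]"
```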
7.4 Why not fully-manual?
Forcing every developer to hand-pick a backend punishes the 95 % whose
host does have binder (ranks 1–2) with needless config. Detection is
cheap and reliable (read /proc/filesystems, /sys/…, modprobe -n,
stat /dev/kvm), so the cheap-and-correct paths should "just work".
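The probes really are cheap. A minimal sketch with hypothetical names: it treats a registered binderfs as "host has binder" (a kernel with binder but without binderfs would need an extra check), and a successful modprobe -n dry run only shows the module resolves, not that nomodule or signature enforcement will let it load.

```python
from __future__ import annotations

import os
import shutil
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class HostProbe:
    binderfs: bool           # "binder" listed in /proc/filesystems
    module_resolvable: bool  # `modprobe -n binder_linux` succeeded
    kvm: bool                # /dev/kvm openable read+write


def probe_host(proc_filesystems: str = "/proc/filesystems", kvm_dev: str = "/dev/kvm") -> HostProbe:
    try:
        with open(proc_filesystems) as f:
            filesystems = {line.split()[-1] for line in f if line.strip()}
    except OSError:
        filesystems = set()
    resolvable = False
    modprobe = shutil.which("modprobe")
    if modprobe:
        # -n is a dry run: it resolves the module but inserts nothing, so True here is
        # necessary but not sufficient for an actual load.
        result = subprocess.run([modprobe, "-n", "binder_linux"], capture_output=True)
        resolvable = result.returncode == 0
    return HostProbe("binder" in filesystems, resolvable, os.access(kvm_dev, os.R_OK | os.W_OK))
```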
7.5 Why not fully-automatic?
Because rank 4 (TCG) has a 5–20× cost. Auto-engaging it silently trades a clear, one-time error message for recurring, mysterious slowness — the worst trade in UX terms. The boundary is drawn exactly at the expensive step.
7.6 Summary
| Path | Cost | Engagement |
|---|---|---|
| native binder | free | automatic, silent |
| load binder_linux | ~free | automatic, silent |
| VM + KVM | small | opt-in (vm), auto-accelerated, banner once |
| VM + TCG | large | opt-in (vm), loud banner, doctor explains |
8. Roadmap
- Stabilize: ✅ done. CONFIG_PSI=y gives a clean 5-minute idle (no lmkd flapping, no reboot). The kernel config fragment is now pinned in-tree at src/beetroot/templates/vm/kernel.config so the build is reproducible.
- Package: ✅ done (issue #44). The kernel config fragment (src/beetroot/templates/vm/kernel.config) and guest init (src/beetroot/templates/vm/guest-init.sh) ship as package data (so beetroot build --vm-kernel works from a uv tool install wheel); the rootfs builder is pure Python (build_rootfs in src/beetroot/builder.py, the former docker/vm/build-rootfs.sh), and beetroot build --vm-kernel wraps the kernel build + rootfs assembly.
- VmDeviceBackend: ✅ done (issue #44). Implemented against the DeviceBackend Protocol. up boots the micro-VM and forwards ADB; down powers it off (SIGTERM to the pidfile'd QEMU); the guest's redroid port 5555 is mapped through QEMU user-net hostfwd to the instance's stride-allocated host ADB port.
- doctor + banners: ✅ done (issue #44). beetroot up prints the §7.3 capability banner (quiet on KVM, loud on TCG) and beetroot doctor runs vm.process + vm.accel + vm.qemu + vm.artifacts + adb.connect rows (issue #191/#164). binder: auto/host keep their existing behaviour.
- KVM fast path: ✅ done (issue #44). detect_accel probes /dev/kvm (read+write) and prefers -accel kvm (-cpu host); auto falls back to TCG, and an explicit kvm request on a host without /dev/kvm errors loudly. The whole guest stack is x86_64, and KVM only virtualizes the host's native architecture, so detect_accel is host-arch-aware (issue #190): on a non-x86_64 host an explicit kvm is a cross-arch error and auto resolves to TCG regardless of /dev/kvm (a stray native /dev/kvm can't accelerate the x86_64 guest). beetroot modes reports the KVM path (and the native binder: host/auto redroid path) as unsupported on a non-x86_64 host, while the TCG path stays reachable cross-arch (even more slowly).
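The detect_accel rules above can be sketched as a pure function of the request, the host architecture, and /dev/kvm accessibility. This is an illustration of the described behaviour, not the shipped function; the signature and the AccelError name are hypothetical.

```python
from __future__ import annotations

import os
import platform
from typing import Optional


class AccelError(RuntimeError):
    pass


def detect_accel(requested: str = "auto", machine: Optional[str] = None,
                 kvm_dev: str = "/dev/kvm") -> str:
    """Resolve vm.accel to "kvm" or "tcg" for an x86_64 guest."""
    if requested not in ("auto", "kvm", "tcg"):
        raise ValueError(f"vm.accel must be auto, kvm or tcg, got {requested!r}")
    if requested == "tcg":
        return "tcg"
    machine = machine or platform.machine()
    native = machine == "x86_64"  # KVM only runs guests of the host's own architecture
    kvm_usable = native and os.access(kvm_dev, os.R_OK | os.W_OK)
    if requested == "kvm":
        if not native:
            raise AccelError(f"vm.accel: kvm is a cross-arch request (host {machine}, guest x86_64)")
        if not kvm_usable:
            raise AccelError(f"vm.accel: kvm requested but {kvm_dev} is missing or not read+write")
        return "kvm"
    return "kvm" if kvm_usable else "tcg"  # auto: fall back; the banner makes TCG loud
```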
Usage
On a host with no kernel binder driver (beetroot doctor shows
host.binder as unsupported):
```shell
# 1. Build the guest kernel (binder + cgroup + bpf + PSI) and rootfs.
beetroot build --vm-kernel
# 2. Create an instance, opt into the micro-VM, point it at the artifacts.
beetroot create alpha
# edit alpha/beetroot.yaml:
#   binder: vm
#   vm:
#     kernel: ~/.cache/beetroot/vm/bzImage
#     rootfs: ~/.cache/beetroot/vm/rootdisk.img
#     accel: auto        # auto | kvm | tcg
beetroot apply alpha     # flips the registry kind to the vm backend
# 3. Boot it. On a KVM host this is near-native; without /dev/kvm it falls
#    back to TCG (~5-20x slower first boot — expected, not a hang).
beetroot up alpha
beetroot shell alpha
```
The kernel/rootfs paths can also come from the BEETROOT_VM_KERNEL /
BEETROOT_VM_ROOTFS environment variables (and the QEMU binary from
BEETROOT_QEMU_BIN) instead of the vm: block. The expensive TCG path is
never engaged automatically — binder: vm is always an explicit opt-in.
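Resolving each artifact is a lookup chain. The doc does not state which source wins when both are set; this sketch assumes the vm: block takes precedence over the environment variable, and the helper name is hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

ENV_OVERRIDES = {
    "kernel": "BEETROOT_VM_KERNEL",
    "rootfs": "BEETROOT_VM_ROOTFS",
    "qemu": "BEETROOT_QEMU_BIN",
}


def resolve_vm_input(key: str, vm_block: Mapping[str, str],
                     env: Mapping[str, str] = os.environ,
                     default: Optional[str] = None) -> str:
    """vm: block value first, then the environment variable, then `default` (assumed order)."""
    value = vm_block.get(key) or env.get(ENV_OVERRIDES[key]) or default
    if not value:
        raise FileNotFoundError(f"no {key} configured (set ${ENV_OVERRIDES[key]} or the vm: block)")
    return os.path.expanduser(value)
```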
Prebuilt artifacts (skip the compile + the bake)
beetroot build --vm-kernel does not always build locally. Both of its
artifacts are fetched as prebuilt GitHub release assets when one matching the
local inputs exists, falling back to a local build otherwise:
- The kernel (beetroot.kernel_download) fetches a prebuilt bzImage (~12 MiB) keyed on the pinned kernel version + a fingerprint of the bundled kernel.config, instead of the ~7-min source compile.
- The rootfs (beetroot.rootfs_download, issue #79) fetches a zstd-compressed ext4 image instead of pulling and baking a ~2 GiB redroid image into /var/lib/docker locally (which needs a running Docker daemon). The asset is keyed per Android version + a composite fingerprint over the three inputs that determine the baked bytes: the Android major version, the pinned Docker static-bundle version, and guest-init.sh. Change any of those and the fingerprint changes, the prebuilt no longer matches, and the build falls back to a local bake, so you can never boot a stale prebuilt rootfs.
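A composite fingerprint only needs to hash the three inputs unambiguously. This sketch is not the shipped scheme: the SHA-256 choice, the length-prefixing, and the 12-character truncation are all assumptions, and the tag helper just mirrors the vm-rootfs-<ver>-<fp> naming.

```python
import hashlib


def rootfs_fingerprint(android_major: int, docker_bundle_version: str, guest_init: bytes) -> str:
    """Composite digest over the three inputs that determine the baked rootfs bytes."""
    h = hashlib.sha256()
    for part in (str(android_major).encode(), docker_bundle_version.encode(), guest_init):
        h.update(len(part).to_bytes(8, "big"))  # length-prefix so input boundaries can't blur
        h.update(part)
    return h.hexdigest()[:12]  # hypothetical short form for the release tag


def rootfs_release_tag(android_major: int, fingerprint: str) -> str:
    return f"vm-rootfs-{android_major}-{fingerprint}"
```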
Each prebuilt lives in its own per-fingerprint immutable release
(vm-kernel-<ver>-<fp> / vm-rootfs-<ver>-<fp>), published by
.github/workflows/vm-kernel-release.yml and
.github/workflows/rootfs-release.yml. The fetch path downloads the asset plus
a .sha256 sidecar, verifies the digest before decompressing, then writes
the image atomically (the rootfs fetch also writes the .android-version
marker so the version-skew check treats a fetched image identically to a baked
one). Pass --from-source to skip both fetches and build the kernel and rootfs
locally.
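The verify-then-decompress-then-rename order is what keeps a truncated or tampered download from ever reaching the artifact path. A minimal sketch, assuming the sidecar holds sha256sum-style output or a bare digest; zstd decoding is left to a caller-supplied decode hook, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from typing import Callable


def expected_sha256(sidecar: str) -> str:
    """Digest from a .sha256 sidecar: sha256sum output ("<hex>  <name>") or a bare hex digest."""
    fields = sidecar.split()
    digest = fields[0].lower() if fields else ""
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("malformed .sha256 sidecar")
    return digest


def install_verified(blob: bytes, sidecar: str, dest: str,
                     decode: Callable[[bytes], bytes] = lambda b: b) -> None:
    """Verify the downloaded bytes, only then decode (e.g. zstd), then publish atomically."""
    want, got = expected_sha256(sidecar), hashlib.sha256(blob).hexdigest()
    if got != want:
        raise ValueError(f"sha256 mismatch: sidecar says {want}, download is {got}")
    data = decode(blob)  # decompression runs only on verified bytes
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dest)), prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, dest)  # same directory, so the rename is atomic on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```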