Micro-VM R&D log: Stage A (kernel + minimal rootfs under TCG)
Status: Stage A validated (2026-06-15)
The vendored kernel-config fragment and the QEMU/TCG invocation from binderless-hosts-qemu-tcg.md were taken from spec to a real, reproducible build and boot on a binderless, KVM-less host. A pinned mainline kernel was built from source, a minimal busybox rootfs was booted under pure TCG, and binderfs was confirmed mounted with the three configured device nodes. This log records the exact toolchain, the build, the working invocation, measured timings, and the one correction the fragment needed. Stage B (docker + redroid in the guest) is not yet run — the artifacts are ready for it.
Asset paths
The micro-VM build assets (`kernel.config`, `adbprobe.c`) now live under
`src/beetroot/templates/vm/`; they were moved there from `docker/vm/` in
#77. Live recipes below point at the new location; any remaining
`docker/vm/` mentions are historical R&D narrative that predates the move.
Environment
- Host: 4 CPU cores, 15 GiB RAM, ~30 GiB free on `/`. Running as root.
- No `/dev/kvm`, no `vmx`/`svm` CPU flags → nested virt unavailable. All measurements are pure TCG (software emulation), the rank-4 worst case. This matches the PoC's Firecracker/TCG host.
1. Toolchain (exact packages + versions)
Installed via apt-get on Ubuntu 24.04 (noble):
| Package | Version |
|---|---|
| qemu-system-x86 | 1:8.2.2+ds-0ubuntu1.16 (QEMU 8.2.2 — matches PoC "QEMU 8.2") |
| build-essential | 12.10ubuntu1 (gcc 13.3.0, GNU ld 2.42) |
| flex | 2.6.4-8.2build1 |
| bison | 2:3.8.2+dfsg-1build2 |
| libelf-dev | 0.190-1.1ubuntu0.1 |
| libssl-dev | 3.0.13-0ubuntu3.11 |
| bc | 1.07.1-3ubuntu4 |
| cpio | 2.15+dfsg-1ubuntu2 |
| busybox-static | 1:1.36.1-6ubuntu3.1 |
| e2fsprogs | 1.47.0-2.4~exp1ubuntu4.1 (mke2fs 1.47.0) |
| xz-utils | 5.6.1+really5.4.5-1ubuntu0.3 |
GNU Make 4.3.
2. Kernel (pinned, reproducible build)
- Pinned version: Linux 6.12.9 (`cdn.kernel.org/pub/linux/kernel/v6.x/`), matching the `~6.12.x` the design doc references. This is roadmap item 1 ("pin the kernel build").
- Config recipe (exactly as the vendored fragment intends):
make defconfig # x86_64 base
./scripts/kconfig/merge_config.sh -m .config src/beetroot/templates/vm/kernel.config
make olddefconfig
make -j4 bzImage
- Build time: 450 s (7.5 min) on 4 cores, `-j4`. Output: 14 MiB bzImage.
- Final `.config` verification (all `=y`, confirmed present after `olddefconfig`): `CONFIG_ANDROID_BINDER_IPC`, `CONFIG_ANDROID_BINDERFS`, `CONFIG_PSI`, `CONFIG_BPF_SYSCALL`, `CONFIG_BPF_JIT`, `CONFIG_VIRTIO_BLK`, `CONFIG_VIRTIO_PCI`, `CONFIG_VIRTIO_CONSOLE`, `CONFIG_OVERLAY_FS`, `CONFIG_EXT4_FS`, `CONFIG_CGROUP_BPF`, `CONFIG_USER_NS`, `CONFIG_DEVTMPFS_MOUNT`, `CONFIG_MEMFD_CREATE` (all set).
Correction to src/beetroot/templates/vm/kernel.config
`CONFIG_ANDROID=y` is stale and was removed. On modern kernels there is no `CONFIG_ANDROID` symbol: the umbrella config was dropped and `drivers/android/Kconfig` now opens straight on `menu "Android"` with `ANDROID_BINDER_IPC` as a top-level entry. Verified absent from the 6.12.9 `.config` even after requesting it (silent no-op). `ANDROID_BINDER_IPC` enables binder on its own. The fragment now carries a comment explaining this instead of the dead line.
No other fragment options needed correcting — binder, binderfs, PSI, bpf, and the virtio set all took effect as written.
3. Minimal rootfs (Stage-A: busybox + init, no docker yet)
Per §4.2/§4.3 ordering, a deliberately tiny rootfs to validate boot + binderfs before layering docker/redroid (Stage B):
- `busybox-static` (host's `/bin/busybox`, 1.36.1) at `/bin/busybox`, applets self-installed at boot via `busybox --install -s`.
- `/init` (PID 1): mounts `proc`, `sysfs`, `devtmpfs`, `devpts`, `tmpfs` (`/run`), `cgroup2` (`/sys/fs/cgroup`), then `mount -t binder binder /dev/binderfs`, prints confirmation markers, and `poweroff -f` (so automated runs self-terminate; an interactive build would `exec sh` here instead).
- Packed as a 256 MiB raw ext4 image with `mke2fs -q -t ext4 -d <tree>`: no loop mount, no root needed, exactly the §4.2 technique.
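The packing step can be sketched as a small Python helper that assembles the `mke2fs` command line. This is a hedged illustration, not the project's builder: the function name `mke2fs_argv` and the example paths are hypothetical, and the trailing image-path and size operands follow the `device [fs-size]` form from the `mke2fs` manual rather than anything quoted in this log.

```python
from typing import List


def mke2fs_argv(tree: str, image: str, size: str = "256M") -> List[str]:
    """Build an mke2fs command that populates an ext4 image from a directory.

    `-d <tree>` copies the directory into the new filesystem at creation
    time, so no loop device or mount is involved.
    The image path and size operands are this sketch's assumption about
    how the command is completed.
    """
    return ["mke2fs", "-q", "-t", "ext4", "-d", tree, image, size]


def render(argv: List[str]) -> str:
    """Join the argv into a printable command line (no shell quoting)."""
    return " ".join(argv)
```

Passing the result to `subprocess.run(..., check=True)` would execute it; keeping the builder separate from the runner makes the command easy to inspect or log first.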
4. Working QEMU invocation (TCG) + boot result
qemu-system-x86_64 \
-M q35 -accel tcg,thread=multi,tb-size=1024 -cpu max -smp 4 -m 8192 \
-nographic -display none -no-reboot \
-kernel bzImage \
-drive file=minroot.img,format=raw,if=virtio \
-append "console=ttyS0 root=/dev/vda rw init=/init panic=1"
This is the §4.4 invocation verbatim and it boots cleanly — no kernel
warnings, BUGs, or oops. virtio_blk brings up /dev/vda, ext4 mounts r/w.
binderfs confirmation (the whole point)
BINDERFS_MOUNT_OK
crw------- 1 0 0 247, 1 binder
crw------- 1 0 0 247, 0 binder-control
crw------- 1 0 0 247, 2 hwbinder
crw------- 1 0 0 247, 3 vndbinder
drwxr-xr-x 2 0 0 features/
--- /proc/filesystems ---
nodev binder
All three devices from CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"
appear, plus binder-control (the BINDER_CTL_ADD dynamic-device endpoint
that Android's servicemanager setup relies on) and the features/ dir.
PSI_OK: /proc/pressure/{cpu,io,memory} all present (the lmkd-stability
lever from §5). cgroup2 controllers: cpuset cpu io memory hugetlb pids rdma
misc.
5. Measured timings (pure TCG)
Kernel→init→binderfs-ready, measured inside the guest via /proc/uptime at
the init markers; host wall is the full QEMU launch→poweroff cycle.
| Invocation | guest uptime to init-ready | host wall (boot+poweroff) | binderfs |
|---|---|---|---|
| `tcg,thread=multi -smp 4` | 3.46–3.65 s | 4.65–4.85 s | OK |
| `tcg,thread=single -smp 4` | 3.99–4.06 s | 5.17–5.29 s | OK |
| `tcg,thread=multi -smp 2` | 3.13 s | 4.28 s | OK |
| `tcg,thread=multi -smp 1` | 3.00–3.42 s | 4.54 s | OK |
This matches the doc's "minimal busybox boot under TCG: ~3 s."
Perf-lever direction (honest reading)
- MTTCG helps when vCPU count is fixed: at `-smp 4`, `thread=multi` is ~0.4–0.5 s faster than `thread=single` (3.46–3.65 s vs 3.99–4.06 s). Confirms `thread=multi` is the right default; the §4.4 claim holds.
- More vCPUs did NOT help this trivial boot: `-smp 1`/`2` were slightly faster than `-smp 4` (3.0–3.1 s vs 3.5 s) because a busybox boot has almost no parallel guest work, so extra-vCPU SMP bring-up + cross-vCPU sync is pure overhead. The `-smp 4` payoff is expected to appear only under a parallel workload (ART/Zygote/system_server during the real redroid boot), i.e. it is a Stage B measurement, not visible on a bare-init boot. Do not read this as "drop to -smp 1"; read it as "the SMP lever needs a real workload to show its value."
6. Artifacts (scratch, never committed)
- Kernel source: `/home/user/vm-rnd/linux-6.12.9/`
- bzImage: `/home/user/vm-rnd/bzImage` (14 MiB, Linux 6.12.9 SMP)
- Final kernel `.config`: `/home/user/vm-rnd/linux-6.12.9/.config`
- Minimal rootfs: `/home/user/vm-rnd/minroot.img` (256 MiB ext4)
- Serial console logs: `/home/user/vm-rnd/boot-*.log`
7. Blockers / notes
- `/usr/bin/time` is not installed by default on the host: the first build invocation failed with `rc=127` before `make` even ran. Re-run with shell `date` arithmetic for timing. (Not a recipe issue; a host-tooling note.)
- No hard walls hit. Kernel built first try (after the `time` fix); rootfs booted first try; binderfs mounted first try.
8. Readiness for Stage B
The bzImage + minimal rootfs are a known-good base. Stage B (add the Docker
static bundle + guest-init.sh + a redroid 11.0.0 image and drive it to
sys.boot_completed=1) can build on /home/user/vm-rnd/bzImage. The kernel
already has every config the full stack needs (bpf syscall for runc, cgroup2,
overlay, PSI for lmkd, memfd for ashmem-less Android 11+), so no kernel rebuild
should be required to proceed — only swapping the minimal rootfs for the full
build-rootfs.sh output.
Micro-VM R&D log: Stage B (full stack: docker + redroid Android 11 under TCG)
Status: Stage B validated (2026-06-15): full redroid boot, ADB path partially working
The full stack — busybox + a static docker/containerd/runc bundle + a
baked-in redroid Android 11 image + guest-init.sh as PID 1 — was built
on top of the Stage A bzImage and booted under pure TCG on the same
binderless, KVM-less host. redroid reached sys.boot_completed=1 in ~100 s
(reproducibly, across 10 boots), fully offline (no internet in the guest).
An SMP sweep on the real redroid boot quantified the MTTCG/SMP payoff that
Stage A's trivial busybox boot could not show. The host→guest ADB path was
diagnosed hop-by-hop: every layer except docker's port-publish machinery works;
the corrected, docker-bypassing networking model (socat relay into redroid's
netns) is recorded below, including the open last-hop hardening item.
B.1 Image acquisition (exactly how, fully offline guest)
Note (post-#82): `11.0.0-latest` is the Android version this R&D log originally validated, and it remains `guest-init.sh`'s historical fallback. The builder's default baked Android version is now 14: `build_rootfs` derives the redroid image from `android_version` (default `config.DEFAULT_ANDROID_VERSION = 14`) via `config.vm_redroid_image`, so a default `beetroot build --vm-kernel` and a default `beetroot create` agree. The `11.0.0-latest` references below describe the originally-validated R&D stack, not today's default.
The guest boots redroid with --network none (see B.4) and has no
internet, so the image must be present on disk before first boot.
- Pulled on the host with a docker daemon. The host had the `docker` CLI + `containerd` but no running daemon; we started one (`dockerd &` runs fine on this kernel, it just can't run redroid for lack of binder) and ran `docker pull redroid/redroid:11.0.0-latest`. Correction: the plain tag `redroid/redroid:11.0.0` does not exist on Docker Hub; the valid tags are `11.0.0-latest`, `11.0.0-<date>`, etc. `guest-init.sh` / `build-rootfs.sh` use `11.0.0-latest` (matching the design doc).
- Baked into the guest `/var/lib/docker`, not `docker load`-ed at boot. We `docker save`d the image (815 MiB tar), then loaded it into a staging `/var/lib/docker` using the same static docker bundle version the guest runs (27.5.1) so the `overlay2` on-disk layout is byte-compatible with the guest's dockerd. That staged data-root (2.0 GiB) is `cp -a`'d into the rootfs tree. Result: in the guest, `dockerd` becomes ready in ~1 s and the image is already present, with no in-guest `docker load` (which would be minutes under TCG). If a host has no docker daemon to pull with, `build-rootfs.sh` accepts a pre-made tarball via `REDROID_TAR=` (e.g. from `skopeo copy docker://… docker-archive:…`).
B.2 Rootfs corrections (build-rootfs.sh)
Note: the rootfs builder described here was originally the shell script `docker/vm/build-rootfs.sh`. It has since been ported to typed, unit-tested Python as `build_rootfs` in `src/beetroot/builder.py`; every correction below (the `REDROID_TAR=` escape hatch, the `cp -a` of the staged data-root, the `11.0.0-latest` tag, the iptables-legacy + socat staging) is preserved there.
The vendored builder was close but would not have produced a bootable stack. Corrections made and validated:
| # | Issue in the vendored script | Fix |
|---|---|---|
| 1 | No redroid image baked in → guest needs internet / a slow in-guest `docker load` | Bake the image into `/var/lib/docker` via a staging dockerd of the same static bundle version (overlay2 layout matches) |
| 2 | dockerd's bridge driver needs an `iptables` binary; the static docker bundle ships none → `failed to create NAT chain DOCKER: iptables not found` | Stage `iptables-legacy` (the kernel has the legacy xt backend, `CONFIG_NF_TABLES` is off) + its shared libs + the `ld-linux` loader |
| 3 | busybox applet list was a tiny hand-picked set (missing `poweroff`, `nsenter`, `netstat`, `udhcpc`, `ip`, `ps`, …) | Ship the single busybox binary; guest-init runs `busybox --install -s` at boot to lay down all applets |
| 4 | `socat` (the ADB relay, see B.4) not present | Stage `socat` + its libs |
| 5 | 16 GiB image (wasteful) | 8 GiB is ample (2.1 GiB image + overlay scratch) |
| 6 | busybox fetched from busybox.net 1.35.0 | Use the host's known-good busybox-static (1.36.x); overridable via BUSYBOX_BIN |
The mke2fs -d <tree> technique (Stage A) was kept; repacking the 8 GiB image
takes ~10–20 s, and debugfs -w edits /init in place for fast iteration.
B.3 Guest boot sequence (guest-init.sh): what actually worked
The §4.3 contract booted first try with no §5-class blockers on the boot path itself (the Stage A kernel already had bpf-syscall, cgroup2, overlay, PSI, memfd). The exact, validated ordering:
- mount `proc` / `sysfs` / `devtmpfs` (+EBUSY-tolerant, kernel auto-mounts it) / `devpts` / `tmpfs` (`/dev/shm`, `/run`, `/var/run`) / `cgroup2` / `binderfs`; `ip link set lo up`.
- `containerd` standalone → socket up in ~2 s.
- `dockerd --containerd=… --iptables=false --bridge=none` → ready in ~1 s (image pre-baked; no load).
- `docker rm -f redroid` (stale cleanup) → `docker run -d --privileged --name redroid --network none redroid/redroid:11.0.0-latest androidboot.redroid_gpu_mode=guest ro.adb.secure=0 service.adb.tcp.port=5555 persist.adb.tcp.port=5555`.
- poll `docker inspect .State.Status == running` (fail-fast) + `docker exec redroid getprop sys.boot_completed`.
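The final polling step can be sketched in Python as follows. This renders the loop's logic only and is not the code in `guest-init.sh` (which is a shell script); `wait_for_boot`, `run_cmd`, and the timeout and interval defaults are hypothetical names and values chosen for illustration. The command runner, sleep, and clock are injectable so the logic can be exercised without docker.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cmd(argv: Sequence[str]) -> str:
    """Run a command and return its stripped stdout (empty string on failure)."""
    done = subprocess.run(argv, capture_output=True, text=True, check=False)
    return done.stdout.strip()


def wait_for_boot(
    run: Runner = run_cmd,
    name: str = "redroid",
    timeout_s: float = 600.0,   # assumed budget, not a value from the log
    interval_s: float = 2.0,    # assumed poll interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll until sys.boot_completed is 1; return the elapsed seconds."""
    start = clock()
    while clock() - start < timeout_s:
        # Fail fast: if the container is no longer running, waiting is pointless.
        status = run(["docker", "inspect", "-f", "{{.State.Status}}", name])
        if status != "running":
            raise RuntimeError(f"container {name!r} is {status or 'missing'}")
        prop = run(["docker", "exec", name, "getprop", "sys.boot_completed"])
        if prop == "1":
            return clock() - start
        sleep(interval_s)
    raise TimeoutError(f"{name!r} did not report boot_completed in {timeout_s} s")
```

Checking the container state before the property keeps a crashed container from being mistaken for a slow boot, which matters when a single boot takes on the order of 100 s under TCG.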
Evidence: Android init brings up surfaceflinger, bootanim, system_server,
zygote, adbd; sys.boot_completed=1 and sys.bootstat.first_boot_completed=1
set; bootanim exits cleanly. No lmkd flapping (PSI present from Stage A).
B.4 Host→guest ADB networking: the critical finding
This is the VmDeviceBackend's whole purpose, so it was driven to a precise,
hop-by-hop diagnosis. The QEMU user-net hostfwd=tcp::<port>-:5555 assumption
in the Python build_qemu_argv is correct — host:port reaches the guest main
netns :5555 (verified: a raw TCP connect to the host-forwarded port succeeds).
The hard part is the last hop, guest → redroid's adbd. What was learned:
- redroid's adbd is USB-gadget-only by default. `persist.sys.usb.config=adb`, no `service.adb.tcp.port` → adbd opens no TCP listener. The image's adbd binary does support TCP (strings: `using tcp port=%d`, `service.adb.tcp.port`). Setting `service.adb.tcp.port=5555` + `stop/start adbd` makes adbd bind `:::5555` (IPv6 wildcard); with `net.ipv6.bindv6only=0` it accepts IPv4 too (a direct CNXN probe to `127.0.0.1:5555` and `[::1]:5555` inside the container both get a valid `CNXN` reply, so adbd is healthy). `ro.adb.secure=0` is already set in the image, so no host adb key is needed.
- `--network none` is the only redroid networking mode that boots cleanly. Tested all three:
  - `--network none` → boots in ~100 s, stable (adbd in an isolated netns).
  - `--network host` → netd/zygote crash-loop (zygote SIGKILL'd every ~44 s, never reaches boot_completed): the guest main netns has no usable address for Android's netd to manage. A §5-class blocker; do not use host net.
  - `-p 5555:5555` (docker publish) → boots, the whole publish chain is wired (`docker port` shows `0.0.0.0:5555`, docker-proxy binds the guest main netns, iptables NAT `DNAT … to:172.17.0.2:5555` present) but the byte path fails: a real ADB `CNXN` through docker-proxy gets `Connection reset by peer`. Per-hop isolation with a static CNXN probe (`adbprobe`) pinned it:

    | Hop | Path | Result |
    |---|---|---|
    | A | container → adbd `[::1]:5555` | CNXN OK |
    | A2 | container → adbd `127.0.0.1:5555` | CNXN OK (dual-stack) |
    | B | guest in-netns → in-container socat `:15555` → adbd | CNXN OK |
    | C | guest main-netns → docker-proxy `:5555` | connects, then RESET |

    So adbd and an in-netns socat relay both work; docker-proxy's userspace forward is the broken hop on this minimal kernel.
- The pure-DNAT fallback needs a kernel feature we don't have. `dockerd --userland-proxy=false` (which would drop docker-proxy and rely on iptables DNAT alone) refuses to start: it needs `/proc/sys/net/bridge/bridge-nf-call-iptables`, i.e. `CONFIG_BRIDGE_NETFILTER` (br_netfilter), absent from the Stage A kernel; it also tried (and failed) to create an `ip6tables nat` table (no `IP6_NF_NAT` in-kernel). Setting `--ip6tables=false` clears the ip6 error and dockerd does come up with `--userland-proxy=false`, but the IPv4 DNAT publish still did not deliver to adbd from the host under TCG in the time available.
- Corrected networking model (docker-bypassing socat relay). Because docker's port machinery is unusable here, the validated direction is to keep redroid on `--network none` (stable boot) and bridge ADB outside docker: a socat relay that `nsenter`s into redroid's network namespace and forwards the QEMU hostfwd target (guest `:5555`) to adbd. `guest-init.sh` does exactly this. The in-netns hop (B above) is proven; the cross-process socat chaining still needed hardening at the end of Stage B. ✅ RESOLVED in Stage C (see §C below): with two fixes (an EXEC wrapper script + bringing up the guest eth0) the relay now carries a full host→guest ADB session.
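A CNXN probe of the kind used for the per-hop isolation can be sketched in Python. This is not the project's `adbprobe.c`; it is a minimal illustration based on the public ADB wire format (a 24-byte little-endian header of six 32-bit fields, followed by the payload). The function names, the `host::` identity string, and the 4096-byte `maxdata` value are choices made for this sketch.

```python
import socket
import struct

A_CNXN = 0x4E584E43             # b"CNXN" read as a little-endian uint32
A_VERSION = 0x01000000          # protocol version that still uses the checksum
MAX_PAYLOAD = 4096              # maxdata advertised by this probe (assumed)
HEADER = struct.Struct("<6I")   # command, arg0, arg1, length, checksum, magic


def build_cnxn(identity: bytes = b"host::\x00") -> bytes:
    """Build a CONNECT message: header plus the system-identity payload."""
    checksum = sum(identity) & 0xFFFFFFFF      # ADB's "crc" is a byte sum
    magic = A_CNXN ^ 0xFFFFFFFF                # command with all bits flipped
    header = HEADER.pack(A_CNXN, A_VERSION, MAX_PAYLOAD,
                         len(identity), checksum, magic)
    return header + identity


def parse_header(raw: bytes):
    """Return (command, payload_length) after validating the magic field."""
    command, _arg0, _arg1, length, _check, magic = HEADER.unpack(raw[:HEADER.size])
    if magic != command ^ 0xFFFFFFFF:
        raise ValueError("bad magic in ADB header")
    return command, length


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Send CNXN and report whether the peer answers with its own CNXN."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_cnxn())
            raw = b""
            while len(raw) < HEADER.size:
                chunk = sock.recv(HEADER.size - len(raw))
                if not chunk:
                    return False               # peer closed before replying
                raw += chunk
    except OSError:
        return False                           # refused, reset, or timed out
    command, _length = parse_header(raw)
    return command == A_CNXN
```

Under these assumptions, `probe("127.0.0.1", 5555)` run at each hop distinguishes "TCP connects" from "adbd actually answers", which is the distinction hop C turned on: the connection succeeded but the reply never arrived.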
Implication for the Python build_qemu_argv
No change to the hostfwd is required: hostfwd=tcp::<adb_port>-:5555 (host port
→ guest main netns :5555) is the correct target — the guest-side socat relay
listens there. The backend should not assume docker -p/--network host
will surface adbd; the relay model in guest-init.sh is the contract. If a
later iteration prefers docker-native publishing, the kernel config must gain
CONFIG_BRIDGE_NETFILTER=y + CONFIG_IP6_NF_IPTABLES/NAT=y (then
--userland-proxy=false works and the relay can be dropped).
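As a hedged sketch of how the forward fits into the argv, the function below reuses the Stage A flags and adds a user-net NIC carrying the `hostfwd` rule. It is not the project's `build_qemu_argv`: the name `build_qemu_argv_sketch`, its parameters, and the use of `-nic user,…` (which leaves the NIC model at the machine default) are assumptions of this illustration.

```python
from typing import List

GUEST_ADB_PORT = 5555  # the guest-side socat relay listens here


def build_qemu_argv_sketch(
    kernel: str,
    rootfs: str,
    adb_port: int,
    smp: int = 4,
    mem_mib: int = 8192,
) -> List[str]:
    """Assemble a TCG QEMU command line with host:adb_port -> guest:5555."""
    return [
        "qemu-system-x86_64",
        "-M", "q35",
        "-accel", "tcg,thread=multi,tb-size=1024",
        "-cpu", "max",
        "-smp", str(smp),
        "-m", str(mem_mib),
        "-nographic", "-display", "none", "-no-reboot",
        "-kernel", kernel,
        "-drive", f"file={rootfs},format=raw,if=virtio",
        # Assumed NIC wiring: user-mode networking with one TCP forward.
        "-nic", f"user,hostfwd=tcp::{adb_port}-:{GUEST_ADB_PORT}",
        "-append", "console=ttyS0 root=/dev/vda rw init=/init panic=1",
    ]
```

Only the host side of the forward varies per VM; the guest side stays fixed at `:5555` because that is where the relay listens, independent of how redroid itself is networked.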
B.5 Performance: real redroid boot + SMP/MTTCG sweep
Wall-clock to sys.boot_completed=1, pure TCG, thread=multi (MTTCG), 8 GiB,
redroid --network none. boot_seconds is measured inside the guest from the
docker run to boot_completed; host_wall is the full QEMU launch→poweroff.
| `-smp` | `boot_seconds` (guest) | host wall |
|---|---|---|
| 2 | 180 s | 198 s |
| 4 | ~100 s (99–103 across runs) | 120 s |
| 8 | 113 s | 134 s |
- This is the headline number: ~100 s to `boot_completed` at `-smp 4`, beating the design doc's "~150 s under TCG" estimate by about a third.
- The SMP/MTTCG payoff Stage A could not show is now visible. Stage A's trivial busybox boot got slower with more vCPUs (pure SMP bring-up overhead, no parallel work). The real redroid boot scales the other way: `-smp 4` (= the host's physical core count) is ~1.8× faster than `-smp 2` because ART/Zygote/`system_server` boot is genuinely parallel. `-smp 8` regresses slightly vs 4 (8 vCPU threads oversubscribe 4 physical cores → cross-thread TCG sync overhead). Conclusion: pin `-smp` to the host's physical core count; `thread=multi` remains the right default. On a KVM host (rank 3) this boot should approach native redroid speed.
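The ratios quoted for the sweep follow directly from the table. A quick check in Python, where the dictionary restates the measured guest `boot_seconds` and uses 100 s as the `-smp 4` figure (the helper name `slowdown_vs_best` is hypothetical):

```python
from typing import Dict

# Guest boot_seconds per -smp value, as measured in the sweep table.
BOOT_SECONDS: Dict[int, float] = {2: 180.0, 4: 100.0, 8: 113.0}


def slowdown_vs_best(samples: Dict[int, float]) -> Dict[int, float]:
    """Return each configuration's boot time relative to the fastest one."""
    best = min(samples.values())
    return {smp: round(seconds / best, 2) for smp, seconds in samples.items()}


ratios = slowdown_vs_best(BOOT_SECONDS)
# -smp 2 takes 1.8x as long as -smp 4; -smp 8 takes 1.13x as long.
```

The asymmetry is the useful part: undersubscribing the host (2 vCPUs on 4 cores) costs far more than oversubscribing it (8 on 4), but the physical core count is still the fastest setting measured.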
Shipped as the default. This finding is now the vm.smp: auto default
(config.Vm.smp): qemu.resolve_smp("auto") pins -smp to the host's
physical core count via qemu.host_physical_cores() — distinct
(physical id, core id) pairs in /proc/cpuinfo, capped by
os.sched_getaffinity (affinity/cgroup-aware, so it is correct inside a
constrained CI container too). Counting physical cores rather than logical
CPUs is what makes this match the conclusion above: on a hyperthreaded host
a logical count (os.process_cpu_count()) would pick -smp 8 on a 4c/8t
box and hit the very oversubscription regression §B.5 measured. The
validated -smp 4 runs above were on a 4-physical-core host, which auto
reproduces exactly — so the measured numbers still describe the shipped
default on that class of host, while other hosts now get the right -smp
without hand-tuning.
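The counting rule described above can be sketched in Python. This is an illustration of the technique, not the shipped `qemu.host_physical_cores()` or `qemu.resolve_smp()`: the function names, the fallback to the affinity count when `/proc/cpuinfo` carries no topology fields, and the helper `detect_host_smp` are all assumptions of this sketch.

```python
import os
from typing import Optional, Set, Tuple


def count_physical_cores(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text."""
    pairs: Set[Tuple[str, str]] = set()
    physical_id: Optional[str] = None
    core_id: Optional[str] = None
    # A trailing empty line guarantees the last processor stanza is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                pairs.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return len(pairs)


def resolve_smp_auto(cpuinfo_text: str, affinity_count: int) -> int:
    """Physical cores, capped by the CPUs this process may actually run on."""
    cores = count_physical_cores(cpuinfo_text)
    if cores == 0:
        # Assumed fallback: no topology fields exposed, so trust affinity.
        cores = affinity_count
    return max(1, min(cores, affinity_count))


def detect_host_smp() -> int:
    """Linux-only convenience wrapper around the two functions above."""
    with open("/proc/cpuinfo", encoding="utf-8") as handle:
        text = handle.read()
    return resolve_smp_auto(text, len(os.sched_getaffinity(0)))
```

On a 4-core/8-thread host each core id appears twice in `/proc/cpuinfo`, so the set collapses eight logical CPUs to four pairs; in a container pinned to two CPUs, the affinity cap then lowers the result to 2.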
B.6 Artifacts (scratch, never committed)
- Full rootfs tree: `/home/user/vm-rnd/fullroot/` (busybox + static docker + iptables-legacy + socat + baked `/var/lib/docker`).
- Full rootfs image: `/home/user/vm-rnd/rootdisk.img` (8 GiB ext4).
- Redroid image tarball: `/home/user/vm-rnd/redroid-11.tar` (815 MiB); staged data-root: `/home/user/vm-rnd/stage-dockerroot/` (2.0 GiB, overlay2).
- Static CNXN probe used for hop isolation: `/home/user/vm-rnd/adbprobe[.c]`.
- Serial logs: `/home/user/vm-rnd/stageB-boot*.log` (boot1 = first full boot, `--network none`, success; boot12 = the per-hop ADB isolation; boot13/14 = DNAT attempts; smp{2,4,8} = the sweep).
B.7 Honest status / open items
- ✅ docker (containerd + dockerd) comes up in-guest in ~3 s total (baked image).
- ✅ redroid Android 11 reaches `sys.boot_completed=1` reproducibly (~100 s).
- ✅ adbd reachable on TCP inside the container (IPv4 and IPv6 CNXN verified).
- ✅ SMP/MTTCG payoff quantified (`-smp 4` optimal on a 4-core host).
- ✅ Host→guest ADB end-to-end CONFIRMED in Stage C (§C below). From the host: `adb connect localhost:5555` → `device`; `adb shell getprop sys.boot_completed` → `1`. The socat netns-bridge relay was hardened (no kernel rebuild needed). docker's own `-p` / `--network host` paths remain ruled out on this kernel (docker-proxy reset / netd crash-loop), as documented above.
Micro-VM R&D log: Stage C (host→guest ADB end-to-end, proven)
Status: Stage C validated (2026-06-15) — real adb session from host to guest redroid
The one open Stage B item — making the host's adb connect localhost:<port>
actually reach redroid's adbd — is closed. From the host, against the
QEMU hostfwd port, adb shell getprop sys.boot_completed returns 1,
ro.product.cpu.abi is x86_64, and uname -a reports the guest's Linux
6.12.9 kernel. Two fixes turned the "connects-but-no-bytes" relay into a
working end-to-end path; neither needed a kernel rebuild.
C.1 The two real blockers (and why the prior diagnosis missed them)
Stage B left the relay "connected but did not relay bytes." Driving it to a working session uncovered two independent faults, plus one measurement error that had masked them:
- socat `EXEC:` chokes on commas (the relay never actually ran). The Stage B relay passed the inner command inline: `EXEC:"nsenter -t PID -n socat - TCP6:[::1]:5555"`. socat parses the `EXEC:` address's argument with commas as option separators, so it died with `EXEC: wrong number of parameters (3 instead of 1)`; the listener was up but every accepted connection's child failed instantly. Fix: write the inner command to a parameterless wrapper script (`/run/adb-relay-inner.sh`) and `EXEC:` that. (Also: the script must live in a dir that exists. The first attempt wrote to `/usr/local/bin`, absent from the rootfs, so the heredoc silently failed and EXEC hit "No such file or directory". Use `/run`, a tmpfs we mount unconditionally, and `mkdir` before the heredoc.)
- The guest `eth0` was never brought up, so the QEMU hostfwd had nowhere to land. This was the true last-hop blocker. QEMU user-net `hostfwd` delivers host traffic to the guest's eth0 address (the SLIRP default `10.0.2.15`), not to guest loopback. `guest-init.sh` only ran `ip link set lo up`, so `eth0` stayed down and every host SYN was silently dropped, even though the outer socat was listening on `0.0.0.0:5555` and the in-guest relay was perfectly healthy. Fix: bring up `eth0` with the user-net address in `mount_pseudo_filesystems`:
- Measurement error that hid both: the in-guest self-probe used `socat - TCP:127.0.0.1:5555