Szymon Kaliski

Home Server on NixOS, Sandboxing in MicroVMs, and Feedback Loops for LLMs

Hi!

A pretty nerdy update this time around — I replaced my Raspberry Pi 4 with a Mini PC running NixOS, I started using MicroVMs for sandboxing on said Mini PC, I have some thoughts about feedback loops for LLMs, and a couple of side-project updates.


If you missed the last issue, I'm an independent consultant now, currently exploring at Google Creative Lab ↗.

I currently have some capacity for advising, and I'm always happy to discuss future projects!

Reach out if you're interested in working together: hi@szymonkaliski.com


Home Server on NixOS

I've written about my smart home setup ↗ before. Back then, the whole thing was controlled by a Raspberry Pi 4, which recently started to feel a bit constraining — on one hand I had an always-on server in my home, on the other, I couldn't really do much with it.

The issues started with a slowly degrading SD card. That isn't a big deal in itself, since I could just clone my stuff onto a new one, but I hadn't updated Raspbian since I originally flashed it, so by now I couldn't really install anything new.

At the same time, I was afraid of how problematic it would be to set everything back up from scratch if I updated. I also got somewhat spoiled by running Nix on my MacBook: it's just way too nice to go back to a normal way of doing things. And while I could put NixOS on my Raspberry Pi, it felt like asking for extra troubleshooting.

I was on the fence about upgrading for weeks, but the rising RAM prices (and conversations with a friend ↗) pushed me over the edge. I got myself an N350 in a chunky passively-cooled aluminium case ↗:

At first I thought this would be overkill, but once I started thinking about all the things I could do there, I was pretty happy I went with the beefy option.

NixOS installation was surprisingly easy. I went with the graphical installer ↗ booted from a USB stick, clicked through a couple of screens, selected "no thank you" for the final graphical environment, and was ready to go.

The first step of the setup was a fun bootstrapping problem. I already have my dotfiles online ↗, with a simple script which — in theory — sets everything up, but then it turned out git is not in the NixOS shell by default. I ended up having to write this fun command:

nix --extra-experimental-features "nix-command flakes" \
  run nixpkgs#git -- clone https://github.com/szymonkaliski/home-configuration.git

...and then went through a cycle of running ./setup.sh, failing, fixing what's failing, and doing it again. This took a couple of attempts, but now I can be sure I can stand up a new machine in a single bash script, which makes me much less worried about upgrading the hardware at some point in the future.

The declarative and git-tracked configuration of the whole machine is really great. There's no need to sudoedit random files in /etc/, and note down (or fail to do so) what I did. Every change can be easily rolled back, and the system is, in a sense, stateless — which makes experimentation super "cheap" — I can try out a new service for a while, and if I don't like it, remove a couple of lines of nix and rebuild the system. LLMs are pretty good at writing these configs too, so most of my interaction with the machine is now like "hey LLM can you make sure this script runs every hour?"

MicroVMs

NixOS enables another cool trick — microvm.nix ↗ — a way to easily manage high-performance VMs (which are much better at isolation from the host OS than Docker).

My config creates four small ephemeral VMs with a bare-bones user environment: just enough so direnv can automatically evaluate flake.nix, and I can safely run with --dangerously-skip-permissions. This allows me to let the Agents run free, even when my laptop is closed, and not worry about anything breaking.

I can also run the same project multiple times in parallel! There's no messing around with ports, setting up multiple DB instances, etc. — the same configuration with the same npm run dev, just in a full-on separate virtual machine.

I use a small microvm script which allows me to do things like cd some-project && microvm start, which picks up the first free VM, mounts the current directory into /workspace, starts the machine, ssh-es into it, and enters /workspace so I'm ready to go (in about 15s).
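The "pick the first free VM and mount the current directory" step can be sketched as a small planning function. This is only an illustration of the logic, not the actual microvm script: the VmSlot and StartPlan shapes, the planStart name, and the VM names are all assumptions, and the sketch deliberately stops short of invoking any real VM or ssh commands.

```typescript
// Hypothetical description of a VM slot; names and fields are illustrative only.
interface VmSlot {
  name: string;
  busy: boolean;
}

// What a start would involve, expressed as data rather than real commands.
interface StartPlan {
  vm: string;
  mountSource: string;
  mountTarget: string;
  workdir: string;
}

// Pick the first VM that is not in use and describe how the project gets mounted.
// Returns null when every VM is already taken.
function planStart(slots: VmSlot[], projectDir: string): StartPlan | null {
  for (const slot of slots) {
    if (!slot.busy) {
      return {
        vm: slot.name,
        mountSource: projectDir,
        mountTarget: "/workspace",
        workdir: "/workspace",
      };
    }
  }
  return null;
}
```

Keeping the decision separate from the side effects (starting the VM, opening the ssh session) makes the selection logic easy to check on its own.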

Thanks to Tailscale ↗, I can open http://vm-1:3000 wherever I am, and that will get me not only into my Mini PC, but also into the VM running inside of it.

Feedback Loops for LLMs

The "Agents running free" leads me to a game I started playing recently: what kind of ad-hoc tools and feedback loops can I build for the task at hand, so the model has a very concrete thing to iterate against, easy visibility into its state, and clear success criteria. Some examples below.

MirrorState has been effective at letting the models reason about the application state — any sort of complex Chrome/Playwright MCP calls are replaced with reading from a single JSON blob (this assumes you have the page open in a browser, so it's a more hands-on workflow). It takes a bit of AGENT.md massaging to explain that the state is bi-directionally synced with a file, but then I have a pretty good time with the model using it as a way to "see" and "act," especially if I get myself into a convoluted state while using the app we're working on.
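The general idea of state that is synced with a JSON file in both directions can be sketched roughly as follows. This is not MirrorState's actual API: the MirroredState class, the TextFile interface, and the in-memory memoryFile helper are assumptions made up for illustration, and a real setup would also need file watching and a bridge to the browser.

```typescript
// Minimal stand-in for a file on disk, so the sketch runs without any I/O.
interface TextFile {
  read(): string;
  write(contents: string): void;
}

// Hypothetical container whose contents are mirrored into a JSON file.
class MirroredState<T> {
  private state: T;
  private file: TextFile;

  constructor(initial: T, file: TextFile) {
    this.state = initial;
    this.file = file;
    this.pushToFile();
  }

  get(): T {
    return this.state;
  }

  // App-side change: update the state, then mirror it into the file.
  set(next: T): void {
    this.state = next;
    this.pushToFile();
  }

  // File-side change (for example, the model edited the JSON): load it back.
  pullFromFile(): T {
    this.state = JSON.parse(this.file.read()) as T;
    return this.state;
  }

  private pushToFile(): void {
    this.file.write(JSON.stringify(this.state, null, 2));
  }
}

// In-memory "file" used only for this example.
function memoryFile(): TextFile {
  let contents = "";
  return {
    read: () => contents,
    write: (next: string) => {
      contents = next;
    },
  };
}
```

With something like this in place, "seeing" is reading the file and "acting" is writing to it, which is a much smaller surface for a model than a sequence of browser automation calls.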

Another spin on this idea is making a separate /playground page with random bits of debug UI, and letting the model see it and modify it as it goes. Usually it takes a bit of extra encouragement to also put JSON.stringify-ed blobs of data in there, but once a correct "point of view" on the problem is created, the model usually has everything it needs to figure it out.
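A minimal sketch of what such a debug dump might look like, assuming the playground simply renders named pieces of state as pretty-printed JSON. The renderDebugPanel and escapeHtml helpers are hypothetical names, not code from an actual project.

```typescript
// Escape characters that would otherwise be interpreted as HTML.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render each named piece of state as a labelled, pretty-printed JSON block.
function renderDebugPanel(state: Record<string, unknown>): string {
  return Object.keys(state)
    .map((name) => {
      // JSON.stringify yields undefined for values like functions; show that explicitly.
      const json = JSON.stringify(state[name], null, 2) ?? "undefined";
      return `<section><h3>${escapeHtml(name)}</h3><pre>${escapeHtml(json)}</pre></section>`;
    })
    .join("\n");
}
```

The returned string could be assigned to a container's innerHTML on the /playground page, so the same values are visible to both the human and the model.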

For example here I was iterating on some tooltip positioning, where it has to take into account its own size, the selection (dashed line), and the size of the viewport:

I'm somewhat confident I could have figured out the conditions myself while the model was getting stuck, but it was fun to see how just having all the possible states visible at once let it work through the issues. This is interestingly related to Solving Things Visually: seeing the problem correctly can make it way easier to solve, and sometimes the seeing is as simple as throwing in some extra debug UI (which with LLMs is basically "free").
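To give a sense of the kind of conditions involved, here is one possible placement rule, sketched under assumptions: the tooltip is centered above the selection, flips below when there is no room, and is clamped to the viewport. The placeTooltip function, its types, and the 8px default gap are illustrative choices, not the rules the actual project ended up with.

```typescript
// Hypothetical types for illustration; none of these names come from the original project.
interface Rect { x: number; y: number; width: number; height: number }
interface Size { width: number; height: number }
interface Placement { x: number; y: number; side: "above" | "below" }

// Clamp a value into [min, max]; if the range is empty, fall back to min.
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(value, Math.max(min, max)));
}

function placeTooltip(selection: Rect, tooltip: Size, viewport: Size, gap = 8): Placement {
  // Center horizontally on the selection, then keep the tooltip inside the viewport.
  const centeredX = selection.x + selection.width / 2 - tooltip.width / 2;
  const x = clamp(centeredX, 0, viewport.width - tooltip.width);

  // Prefer placing the tooltip above the selection.
  const aboveY = selection.y - gap - tooltip.height;
  if (aboveY >= 0) {
    return { x, y: aboveY, side: "above" };
  }

  // Not enough room above: flip below, still keeping it inside the viewport.
  const belowY = selection.y + selection.height + gap;
  const y = clamp(belowY, 0, viewport.height - tooltip.height);
  return { x, y, side: "below" };
}
```

Calling this for a grid of selections (top edge, right edge, center, and so on) and drawing every result side by side is roughly the "all possible states at once" view described above.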

Similarly, figuring out how to "partition" the application into smaller chunks of functionality can be very useful. For example, I sometimes create a separate sub-project where we develop a library with a stable API which the main project then imports, so the model on the main one doesn't get bogged down in the details of some complex but well-known thing.

When building my own Agents for various things, I had very good results by creating a small CLI where each of the sub-tools could be executed by the model that's helping me. This way I can judge the quality of the responses, and iterate on them with the model, without having to go through the whole application loop, which sped up the work on, say, a "summarization" or "data extraction" tool. I let the model run the CLI wrapper, we both look at the responses, I say I'd prefer them to be a bit different, and we continue.
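The shape of such a CLI wrapper can be sketched as a small dispatcher that runs one sub-tool at a time. Everything here is a placeholder: the tools table, the runCli function, and the toy "summarize" and "extract-numbers" implementations stand in for real model-backed steps.

```typescript
// Hypothetical sub-tools: stand-ins for real "summarization" or "data extraction" steps.
type Tool = (input: string) => string;

const tools: Record<string, Tool> = {
  // Placeholder summarizer: keeps only the first sentence.
  summarize: (input) => {
    const end = input.search(/[.!?]/);
    return end === -1 ? input : input.slice(0, end + 1);
  },
  // Placeholder extractor: returns every number found in the input as a JSON array.
  "extract-numbers": (input) =>
    JSON.stringify((input.match(/-?\d+(\.\d+)?/g) ?? []).map(Number)),
};

// Run a single sub-tool in isolation: the first argument is the tool name,
// the remaining arguments are joined into its input.
function runCli(args: string[]): string {
  const [name, ...rest] = args;
  if (name === undefined || !Object.prototype.hasOwnProperty.call(tools, name)) {
    return `usage: <tool> <input>\navailable tools: ${Object.keys(tools).join(", ")}`;
  }
  return tools[name](rest.join(" "));
}
```

In a Node script this could be wired up by passing the command-line arguments (everything after the script name) to runCli and printing the result, which gives the model and the human the same plain-text output to look at.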

Finally, creating ad-hoc MCP tools for the thing we're working on is always an option. For web-based things this usually makes no sense with MirrorState/Chrome/Playwright MCP already existing, but when I was working on React Native code, it was useful to let the model click around and take screenshots of the application to test and debug it.

Smaller Explorations

I continued exploring some of the "Codegen Environments" ideas, mainly adding a way to combine smaller tools into larger wholes.

A box can be drawn around multiple elements, which hides the arrows (treated as "implementation details"), and optionally any extra interface elements created to solve a given problem:

The "boxed" tools can still be connected with each other using arrows, which can point to either the box as a whole or a single interface element:

This reminds me of some thoughts from Crosscut on "Encapsulation without lamination" ↗. Combining things and exposing only a subset of the internals is incredibly powerful, but seems to always be done in a way where "the whole" becomes the only thing you can interact with.

This also opens up an interesting philosophical question — the act of "boxing" could easily be fractal, so what then is interface and what is implementation? The tools seem to morph from one to the other, or maybe hold a superposition of both, depending on which context they are in...

I also experimented with a pen-based interface, inspired by explorations from Alex Chen ↗:

(check out the Bluesky thread for more videos ↗)

My version, instead of Nano Banana, uses the Flash 3 model with no thinking budget so it responds very quickly, even if often "weirdly". I send the model a composite of my scribbles and the model's previous outputs. As a response, the model generates JavaScript canvas code, which is added on top of the canvas, and we continue.

The feeling of using this is quite interesting: the combination of human input, machine-styled responses ("Computer Voice" ↗), and quick generation is something I have not seen before in this space. This, of course, smells of Programmable Ink. It's also very fun, but of unclear usefulness at the moment, so let me know if you have any ideas where to take this further!

Worth Checking Out

What I've been reading lately:

On the web: