Goblin Mode, or spinning up VMs for feral agents

I’ve been working on a new tool for spinning up development environments. It’s called Goblin Mode. It spins up a new virtual machine (VM) on Hetzner and configures everything so that you can just SSH in and start developing.

It is something that I’ve wanted to have for a while now, but the arrival of the coding agents got me to implement it.

Agents like Claude Code and Codex are excellent. I’d like to run them in YOLO mode / --dangerously-skip-permissions, where they do not ask for permission to run commands, write files, search the web, and so on.

However, my laptop has too much ambient authority for that. I’m logged in to all kinds of services and I’ve got a lot of important data there. I don’t want an agent to mess with it. A potential – but by no means the only – solution is to use a separate VM with just enough authority to get the job done. The agents can then go goblin mode there.

I created an alpha version of the tool and I’ve been using it for a while. It’s too specific to me and my workflows to be published right now, but it’s still interesting enough to talk about.

How does it work?

I had two goals:

The environment should not need any manual setup. Running gob up should be all you need to receive an environment that works.
It should be cheap to run. By “cheap” I mean something like a few euros per month for my typical use. For commercial development, a much higher price would be acceptable, but I want to use this for my hobby projects, too.

I ended up with a command-line tool implemented in Rust. When you run gob up, it does a few things:

It spins up a project-specific VM on Hetzner. Hetzner’s CX43 instances are available for less than two cents per hour and they’re beefy enough for compiling Rust projects. This is cheap enough for me not to have to think about it unless I leave the box running all the time.

It connects the box to Tailscale. Tailscale is a VPN solution that is really nice to use. Thanks to its MagicDNS, the box gets an easy-to-remember hostname that I can SSH to. For the projects that launch network servers, Goblin Mode uses Tailscale Serve to expose the service over HTTPS. This is great for developing web applications.

It installs the toolchain. Goblin Mode detects whether the project is a Python project or a Rust project (my two main programming languages) and installs the appropriate tools. Some tools, like git, are always installed, and you can add extra packages via a config file.

This is implemented with cloud-init. It’s a standard YAML config file for newly provisioned servers. Many cloud providers, Hetzner included, support it. Goblin Mode generates a list of packages to install and commands to run and adds them to cloud-init when launching the VM.

What is a bit annoying about this is that there are so many ways to install packages. I run Debian and you can get many but not all packages via apt. Some packages, like Claude Code, are best installed by curl | bash and for others I resort to cargo-binstall. It would be nice to have a single solution - is Nix the answer?
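The toolchain step can be sketched roughly as follows. This is a minimal illustration, not Goblin Mode's actual code: the marker files used for detection, the package lists, and the function names (detect_project, apt_packages, render_cloud_config) are all assumptions made for the example. The only cloud-init features it relies on are the standard #cloud-config header and the package_update, packages, and runcmd keys.

```rust
use std::path::Path;

/// Project kinds this sketch distinguishes (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProjectKind {
    Rust,
    Python,
    Unknown,
}

/// Guess the project kind from marker files in the repository root.
/// Which files to look for is an assumption of this sketch.
fn detect_project(root: &Path) -> ProjectKind {
    if root.join("Cargo.toml").is_file() {
        ProjectKind::Rust
    } else if root.join("pyproject.toml").is_file() || root.join("requirements.txt").is_file() {
        ProjectKind::Python
    } else {
        ProjectKind::Unknown
    }
}

/// Always-installed packages, then per-language ones, then extras from a
/// config file. Duplicates are skipped so the list stays stable.
fn apt_packages(kind: ProjectKind, extra: &[&str]) -> Vec<String> {
    let mut pkgs: Vec<String> = vec!["git".to_string(), "curl".to_string()];
    match kind {
        ProjectKind::Rust => pkgs.push("build-essential".to_string()),
        ProjectKind::Python => {
            pkgs.push("python3".to_string());
            pkgs.push("python3-venv".to_string());
        }
        ProjectKind::Unknown => {}
    }
    for p in extra {
        if !pkgs.iter().any(|have| have.as_str() == *p) {
            pkgs.push((*p).to_string());
        }
    }
    pkgs
}

/// Render a cloud-config document with packages to install and commands to run.
fn render_cloud_config(packages: &[String], commands: &[String]) -> String {
    let mut out = String::from("#cloud-config\npackage_update: true\npackages:\n");
    for p in packages {
        out.push_str(&format!("  - {}\n", p));
    }
    if !commands.is_empty() {
        out.push_str("runcmd:\n");
        for c in commands {
            // Single-quoted YAML scalar: a literal ' is escaped by doubling it.
            out.push_str(&format!("  - '{}'\n", c.replace('\'', "''")));
        }
    }
    out
}

fn main() {
    let kind = detect_project(Path::new("."));
    let packages = apt_packages(kind, &["ripgrep"]);
    // Tools that are not in apt go in as commands, e.g. the rustup one-liner.
    let commands = match kind {
        ProjectKind::Rust => vec![
            "curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y".to_string(),
        ],
        _ => Vec::new(),
    };
    print!("{}", render_cloud_config(&packages, &commands));
}
```

Keeping the rendering separate from detection means the same function can take packages and commands from any source, including the extra packages from the config file.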

It sets up dotfiles. Can you imagine doing anything without your custom zsh config? Me neither.

In practice this is implemented by cloning your dotfiles git repository on the VM and running the included installation script in it.

It sets up the git repo. You’re expected to run gob up in the git repository for the project you’re developing. Goblin Mode sets up a new repo on the VM, pushes the contents of the local repo there and copies the configuration for remotes over.
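One way the repository step could be done with plain git and SSH is sketched below. It only plans the commands as argument lists and does not run anything. The function name plan_repo_sync, the host name, and the directory layout are hypothetical, and the approach itself (a non-bare repo on the VM with receive.denyCurrentBranch set to updateInstead, so that a push updates the checked-out working tree) is an assumption about how such a sync can work, not a description of what Goblin Mode actually does.

```rust
/// One command to run, as an argv vector.
type Argv = Vec<String>;

fn argv(parts: &[&str]) -> Argv {
    parts.iter().map(|s| s.to_string()).collect()
}

/// Plan the commands that mirror the local repository onto the VM.
///
/// `host` is the VM's SSH name, `dir` a path relative to the remote home
/// directory, `branch` the currently checked-out local branch, and `remotes`
/// the (name, url) pairs of the local repository. Note that ssh hands its
/// arguments to the remote shell, so this sketch assumes all values are
/// free of spaces and shell metacharacters.
fn plan_repo_sync(host: &str, dir: &str, branch: &str, remotes: &[(&str, &str)]) -> Vec<Argv> {
    // scp-style URL: a relative path resolves against the remote home directory.
    let target = format!("{}:{}", host, dir);
    let mut plan = vec![
        // Create a non-bare repo on the VM whose initial branch matches ours.
        argv(&["ssh", host, "git", "init", "--initial-branch", branch, dir]),
        // Allow pushes to update the checked-out branch and its working tree.
        argv(&["ssh", host, "git", "-C", dir, "config", "receive.denyCurrentBranch", "updateInstead"]),
        // Push the current branch from the laptop to the VM.
        argv(&["git", "push", target.as_str(), branch]),
    ];
    // Recreate the local remotes on the VM so the agent can fetch and push.
    for &(name, url) in remotes {
        plan.push(argv(&["ssh", host, "git", "-C", dir, "remote", "add", name, url]));
    }
    plan
}

fn main() {
    // All values here are placeholders.
    let remotes = [("origin", "git@forge.example.com:me/project.git")];
    for cmd in plan_repo_sync("project-box", "project", "main", &remotes) {
        println!("{}", cmd.join(" "));
    }
}
```

Returning a plan instead of executing it keeps the logic easy to test and makes a dry-run mode trivial.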

This has been enough for my projects right now. I’ve added a few convenience commands, too:

gob mosh to mosh to the VM. In practice I always use mosh instead of SSH.
gob zed to open the project on the VM in Zed via its remote development support.

Problems

Spinning up VMs is slow. gob up takes something like two minutes. It’s a bit annoying - you think “alright, I’m going to work a bit on project X” and then you have to wait for a few minutes while the server boots up.

I’m not sure if there’s any great solution to this that does not involve running more capacity than you need.

Also, while Hetzner is great, you aren’t guaranteed to get a cheap cloud VM when you need it. I’ve seen a bunch of these:

Error: Failed to create server (412): {
 "error": {
  "code": "resource_unavailable",
  "message": "error during placement",
  "details": {}
 }
}

Logging in to Claude Code is annoying. If you want to use Claude Code, you’ll have to go through the browser-based flow to log in. I’m not sure if this could be automated. Likely yes if you use the API; not sure if you use a Claude subscription.

Restricting GitHub permissions is annoying. GitHub does not make it easy to grant just the right amount of permissions to the development VM. Ideally the VM would be able to check out the project, work with the issues and create PRs, but not merge to main. With fine-grained personal access tokens (PATs) and bot accounts this should be possible, assuming you have a paid-for GitHub plan, but you can’t create PATs via GitHub’s API.

For now my solution has been to use a self-hosted Forgejo instead of GitHub. It’s not ideal, but it has enabled me to experiment without fear.

Product management is still needed. I had a bright idea that I’ll just dump my improvement ideas into Forgejo issues and let the agents work on them. I created a script that loops through the issues and prompts an agent to solve each one and to create a PR.

A few hours later I had a dozen PRs and 5k lines of Rust to review. The PRs were good but I quickly realized that my ideas were half-assed. They were nice-to-haves that could be great one day but that I did not yet need. I didn’t want to maintain the code, so I left most of the PRs unmerged.

So, yeah, you still need to have a proper vision.
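For reference, the issue loop mentioned above can be sketched like this. It is a dry run that only builds the command for each issue. The Issue struct, the prompt wording, and the sample issues are made up for illustration, fetching issues from Forgejo is left out entirely, and the agent invocation (a non-interactive -p prompt plus --dangerously-skip-permissions) is an assumption about how one might call Claude Code, not the script I actually used.

```rust
/// Minimal stand-in for an issue fetched from the forge (hypothetical shape).
struct Issue {
    number: u64,
    title: String,
    body: String,
}

/// Turn an issue into a prompt for the agent. The wording is illustrative.
fn build_prompt(issue: &Issue) -> String {
    format!(
        "Work on issue #{}: {}\n\n{}\n\nWhen the change is ready, push a branch and open a pull request that references the issue.",
        issue.number,
        issue.title,
        issue.body.trim()
    )
}

/// Build the argv for one agent run. The flags are an assumption: a
/// non-interactive prompt via -p, without permission prompts.
fn agent_argv(agent: &str, issue: &Issue) -> Vec<String> {
    vec![
        agent.to_string(),
        "-p".to_string(),
        build_prompt(issue),
        "--dangerously-skip-permissions".to_string(),
    ]
}

fn main() {
    // Placeholder issues; a real script would load these from the forge's API.
    let issues = vec![
        Issue { number: 1, title: "First example issue".to_string(), body: "Details go here.".to_string() },
        Issue { number: 2, title: "Second example issue".to_string(), body: "More details.".to_string() },
    ];
    for issue in &issues {
        let argv = agent_argv("claude", issue);
        // Dry run: print what would be executed instead of spawning the agent.
        println!("would run: {} {} <prompt for #{}> {}", argv[0], argv[1], issue.number, argv[3]);
    }
}
```

The loop is the easy part. Deciding which issues deserve to be in the list is where the work is.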

Successes

I’ve enjoyed the tool more than I expected. It’s nice when you can spin up a new environment with one command and everything actually works (modulo Claude Code auth). It has also had a couple of benefits that I didn’t expect:

If you want to demo a web service to your friends, you can expose it to the Internet with Tailscale Funnel. Start the service in tmux, leave the development VM running, close your laptop, and your friends can still access it. It’s not a great production setup, but it’s excellent for trying things out.
If you need more capacity, provision a bigger VM. My laptop has a paltry 16 GB of RAM and I needed more for some data crunching, so I launched a big VM and got the job done.

Related work

I know of at least four projects with similar goals:

devcontainers use Docker containers to create an ephemeral dev environment locally.
gmab (Give Me A Box) is a CLI tool that spins up ephemeral cloud boxes.
exe.dev is a neat commercial service that gives you virtual machines for development. My project is sort of “we have exe.dev at home”.
Sprites by Fly.io is another commercial take on the same idea.
