Swarm Messaging System
LLMs should function as a universal computer interface, not just to single computers but to entire distributed systems. Given sudo and SSH access, Claude or another frontier LLM could act as a system administrator and maintain 100+ computers.
For security, the controller should run on a single machine with a touch-disabled hardware security key that stores the SSH key. Ideally local models like DeepSeek or Qwen are used for control-layer commands, escalating to frontier models when a problem is too complex to solve.
Since programming is often a loop of editing, building, running, and inspecting, you could also develop dozens of different software projects simultaneously. The main expected difficulties are (1) visibility into a complex system and (2) UX for guiding agents when they inevitably get stuck.
The intent for this design is an interface, not an automated system. A human is expected to be available for the system to escalate errors to, and to monitor agents.
An agentic solution like Claude Code can often edit for 20-30 minutes by itself. But it may:
- Leave unnecessary test files, which need to be cleaned up.
- “Reward-hack” automated tests, which need to be reviewed.
- Be responsible for only a small fragment of a much larger plan that a higher-level agent is executing.
This system wraps the tool so that it exposes a networked Port. Humans, chat tools, LLM agents, and workflow systems can all communicate with this port. So instead of a single developer running a dozen tmux sessions, you could run Claude from a dozen Slack channels and bring in friends or coworkers as needed to collaborate.
LLM agents also need to use services like issue trackers, wikis, git repositories, etc. Protocols like MCP or OpenAPI already give a general interface for any specific agent, but the Swarm Messaging System should allow agents to discover useful services themselves as needed.
The remainder of this document is an implementation plan for a user interface intended to be a prototype of this idea.
Terminology
A Domain is a collection of Hosts, each exposing Ports, having a Host Identity, and belonging to a Subnet. A Port(α) carries Messages of type α. Ports have a Direction and a Port Identity. Direction is either Send-only or Receive-only. Port Identity is an integer. A Duplex Port(αi,αo) is a pair of ports of type αi and αo sharing the same identity: αi is receive-only, and αo is send-only. A port should use one of the standard nanomsg communication patterns:
PAIR - one-to-one communication
BUS - many-to-many communication
REQREP - requests are sequenced and paired with responses
PUBSUB - messages are replicated to all subscribers
PIPELINE - load-balances messages from many sources to many destinations
SURVEY - messages replicated to everyone, with responses collected
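The port vocabulary above can be written down as plain data types. The following is a minimal sketch, not the design's actual schema: the names `Port`, `DuplexPort`, `make_duplex`, and the use of a string for the message type α are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    SEND_ONLY = "send-only"
    RECEIVE_ONLY = "receive-only"


class Pattern(Enum):
    # The six nanomsg communication patterns listed above.
    PAIR = "pair"
    BUS = "bus"
    REQREP = "reqrep"
    PUBSUB = "pubsub"
    PIPELINE = "pipeline"
    SURVEY = "survey"


@dataclass(frozen=True)
class Port:
    identity: int        # Port Identity is an integer
    message_type: str    # assumption: the type α is represented by its name
    direction: Direction
    pattern: Pattern


@dataclass(frozen=True)
class DuplexPort:
    receive: Port  # carries αi, receive-only
    send: Port     # carries αo, send-only

    def __post_init__(self) -> None:
        # Enforce the three constraints from the definition of a Duplex Port.
        if self.receive.identity != self.send.identity:
            raise ValueError("both halves of a duplex port must share an identity")
        if self.receive.direction is not Direction.RECEIVE_ONLY:
            raise ValueError("the αi half must be receive-only")
        if self.send.direction is not Direction.SEND_ONLY:
            raise ValueError("the αo half must be send-only")


def make_duplex(identity: int, in_type: str, out_type: str, pattern: Pattern) -> DuplexPort:
    """Hypothetical helper that builds both halves with one identity."""
    return DuplexPort(
        receive=Port(identity, in_type, Direction.RECEIVE_ONLY, pattern),
        send=Port(identity, out_type, Direction.SEND_ONLY, pattern),
    )
```

Validating in the constructor means a malformed duplex port cannot exist at all, which keeps later compatibility checks simple.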
An Interface is a collection of ports. Two interfaces are Compatible if their corresponding ports are compatible (same type, compatible direction).
A Domain Controller is a host that owns data that must be globally consistent: the registry of all dynamically generated types, agent credentials, and permissions & authorization.
A Broadcast is a message sent to every host on the same network.
A subnet can be Internal, Local, or Networked. An internal subnet does not use OS networking at all; a local subnet uses a single process that binds to many localhost IPs; and a networked subnet uses internet routing & protocols. A host can likewise be internal (process-local), local (OS-local), or networked. Internal subnets do not use nanomsg; they use something language-specific.
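The Compatible relation between interfaces, as defined above, could be checked roughly as follows. This sketch makes two assumptions the text does not spell out: "corresponding" ports are matched by position in an ordered list, and "compatible direction" means one side sends while the other receives. The `Port` class here is a simplified, hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Port:
    message_type: str  # name of the message type carried by the port
    direction: str     # "send" or "receive"


def ports_compatible(a: Port, b: Port) -> bool:
    # Same type, and exactly one side sends while the other receives.
    return a.message_type == b.message_type and {a.direction, b.direction} == {"send", "receive"}


def interfaces_compatible(left: Sequence[Port], right: Sequence[Port]) -> bool:
    # Assumption: ports correspond by position, so lengths must match.
    if len(left) != len(right):
        return False
    return all(ports_compatible(a, b) for a, b in zip(left, right))
```

For example, an interface that sends `Message` and receives `ConversationEntry` is compatible with one that receives `Message` and sends `ConversationEntry`, but not with a copy of itself.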
Every host should expose a Discovery Port using the SURVEY pattern. This lets agents dynamically query the local subnet topology.
A type can be Linear, meaning it cannot be replicated or dropped by the network. Demands and exclusive resources should never be duplicated, except in debugging contexts like tracing.
A type is Generable if it is semantically meaningful for anyone to construct a value. Ex: A username is generable, but an access token is not.
An RPC port is a REQREP port where the requests represent RPC calls.
ECS
A Bare Host or Entity is a host with no ports and no data - it only has identity. Entities can have Components added to them that include standard ports/data. Then Systems look for combinations of components and activate, handling messages, reacting to events, and creating new ports.
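The entity/component/system relationship described above can be sketched in a few lines. All names here (`Entity`, `System`, `active_systems`, and the component names in the usage note) are hypothetical, and the activation rule shown, "a system is active when all of its required components are present", is one plausible reading of the text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List


@dataclass
class Entity:
    identity: int  # a bare entity has identity and nothing else
    components: Dict[str, Any] = field(default_factory=dict)  # component name -> data

    def add_component(self, name: str, data: Any = None) -> None:
        self.components[name] = data


@dataclass(frozen=True)
class System:
    name: str
    requires: FrozenSet[str]  # component names this system needs

    def is_active(self, entity: Entity) -> bool:
        # Active only when every required component is present on the entity.
        return self.requires.issubset(entity.components)


def active_systems(entity: Entity, systems: List[System]) -> List[str]:
    return [s.name for s in systems if s.is_active(entity)]
```

Adding a `Timer` component to a bare entity would activate a system that requires `{"Timer"}`, while a system that requires both `Timer` and `LLM` stays dormant until the second component arrives.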
Common Types
- Message: Text message, with identity.
- ConversationEntry: Either a message, or tool call, or tool response.
- Event: Events can have before, around, or after modifiers applied by systems.
- Received Message
- Sent Message
- Component Field Changed
- Shell: A Shell is a duplex port that receives Commands and sends values of type {exitCode :: Integer, stdout :: Text, stderr :: Text}. A shell can run on a host machine, in a container, or be virtual.
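As an illustration of the Shell type, here is a minimal sketch of the host-machine case: one Command in, one {exitCode, stdout, stderr} record out. Representing a Command as an argument list and the names `ShellResult` and `run_command` are assumptions; container-backed and virtual shells would need a different backend behind the same result type.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShellResult:
    exit_code: int
    stdout: str
    stderr: str


def run_command(argv: List[str], timeout: float = 30.0) -> ShellResult:
    """Run one command on the host and package its outcome as a ShellResult."""
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return ShellResult(completed.returncode, completed.stdout, completed.stderr)


if __name__ == "__main__":
    # Uses the current Python interpreter so the demo does not depend on a specific OS shell.
    print(run_command([sys.executable, "-c", "print('hello from the shell port')"]))
```

Passing an argument list rather than a single string avoids shell interpolation of untrusted input, which matters when the commands come from an LLM agent.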
Discovery System
Creates standard Discovery Port 5000
Returns:
+ Host(ip, pid, hostname, cwd)
+ List of components (and their data)
+ List of systems (active or suspended, component requirements)
+ List of ports (id + type)
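A discovery reply with the four parts listed above might be assembled and serialized like this. The field names, the JSON encoding, and the helper names `describe_self` and `encode` are assumptions for the sake of the sketch; the document does not fix a wire format.

```python
import json
import os
import socket
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class HostInfo:
    ip: str
    pid: int
    hostname: str
    cwd: str


@dataclass
class DiscoveryReply:
    host: HostInfo
    components: Dict[str, Any] = field(default_factory=dict)    # name -> component data
    systems: List[Dict[str, Any]] = field(default_factory=list)  # name, active flag, requirements
    ports: List[Dict[str, Any]] = field(default_factory=list)    # id + type


def describe_self(ip: str, components: Dict[str, Any],
                  systems: List[Dict[str, Any]], ports: List[Dict[str, Any]]) -> DiscoveryReply:
    # pid, hostname, and cwd come from the running process; ip is supplied by the caller.
    host = HostInfo(ip=ip, pid=os.getpid(), hostname=socket.gethostname(), cwd=os.getcwd())
    return DiscoveryReply(host, components, systems, ports)


def encode(reply: DiscoveryReply) -> str:
    # asdict() recurses into the nested HostInfo dataclass.
    return json.dumps(asdict(reply), sort_keys=True)
```

A host would send `encode(describe_self(...))` as its response to each survey on the Discovery Port.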
Control System
Creates an entity Control Port.
Control commands:
+ AddComponent(name)
+ GetComponent(id)
+ SetComponentField(id, name, value)
+ CreateEvent(event)
+ EnableSystem(id)
+ DisableSystem(id)
+ Stop() -- Stop responding to events
+ Start() -- Start responding to events
+ Step() -- Handle exactly one event
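The Stop, Start, and Step commands suggest an event loop that can be paused and single-stepped. The sketch below shows one way this could behave. It assumes that events created while stopped are queued rather than dropped, which the text does not specify, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque


class EventLoop:
    def __init__(self, handler: Callable[[Any], None]) -> None:
        self.handler = handler
        self.queue: Deque[Any] = deque()
        self.running = True

    def create_event(self, event: Any) -> None:
        # Assumption: events are queued while stopped and handled later.
        self.queue.append(event)
        if self.running:
            self._drain()

    def stop(self) -> None:
        """Stop responding to events."""
        self.running = False

    def start(self) -> None:
        """Start responding to events, including any queued while stopped."""
        self.running = True
        self._drain()

    def step(self) -> bool:
        """Handle exactly one event; return False if none was pending."""
        if not self.queue:
            return False
        self.handler(self.queue.popleft())
        return True

    def _drain(self) -> None:
        while self.running and self.queue:
            self.step()
```

A debugger could call `stop()`, inject events with `create_event`, and then call `step()` repeatedly while inspecting component state between events.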
Tracing
Creates a Trace Port that replicates every event out, except for events on the Trace Port itself. Used by the debugger or loggers.
RPC
Data: List of (Name, Signature, Function or function symbol)
Ports:
+ RPC
When a request is received, check the value against the signature, dispatch to the implementation, and respond with the result.
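The check-dispatch-respond cycle described above could look like the following. This is a sketch under assumptions: a signature is modeled as a tuple of Python types for positional arguments, and the request and response shapes (`name`, `args`, `ok`, `result`, `error`) are invented for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Registry entry: name -> (signature, function). The signature is a tuple of
# expected argument types; this representation is an assumption.
Registry = Dict[str, Tuple[Tuple[type, ...], Callable[..., Any]]]


def dispatch(registry: Registry, request: Dict[str, Any]) -> Dict[str, Any]:
    name = request.get("name")
    args = request.get("args", [])
    if name not in registry:
        return {"ok": False, "error": f"unknown procedure: {name}"}
    signature, function = registry[name]
    # Check the value against the signature before calling the implementation.
    if len(args) != len(signature):
        return {"ok": False, "error": f"{name} expects {len(signature)} argument(s)"}
    for position, (value, expected) in enumerate(zip(args, signature)):
        if not isinstance(value, expected):
            return {"ok": False, "error": f"argument {position} of {name} must be {expected.__name__}"}
    return {"ok": True, "result": function(*args)}
```

Rejecting a malformed request before dispatch keeps type errors at the port boundary instead of inside the implementation.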
Error
Data:
+ pauseOnError: bool
Ports:
+ Error
Any exceptions or error events are broadcast to the error port.
Timer
Data:
+ Frequency(seconds)
+ Offset
Generates a timer event at the configured interval.
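One way to schedule the Timer component's events is to compute the next firing time from its two data fields. The sketch assumes that Frequency(seconds) is the interval between events and that Offset shifts the whole schedule, so events fire at offset, offset + period, offset + 2·period, and so on. The function name `next_fire_time` is hypothetical.

```python
import math


def next_fire_time(now: float, period: float, offset: float = 0.0) -> float:
    """Return the earliest scheduled time that is not before `now`."""
    if period <= 0:
        raise ValueError("period must be positive")
    if now <= offset:
        return offset
    # Number of whole periods needed to reach or pass `now`.
    ticks = math.ceil((now - offset) / period)
    return offset + ticks * period
```

Computing the schedule from absolute times rather than sleeping for `period` after each event avoids drift when a handler takes a long time to run.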
Switch
Data: List of static links(local port <-> remote (host,port))
When a message is received, check for a static link and forward the message to it.
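The Switch's forwarding rule amounts to a table lookup. In this sketch, the `send` callback stands in for whatever transport actually delivers the message, and the class and parameter names are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

Remote = Tuple[str, int]  # (host, port)


class Switch:
    def __init__(self, links: Dict[int, Remote], send: Callable[[str, int, Any], None]) -> None:
        self.links = dict(links)  # static links: local port id -> remote (host, port)
        self.send = send          # assumption: transport is injected by the caller

    def on_message(self, local_port: int, message: Any) -> bool:
        """Forward the message if a static link exists; report whether it was forwarded."""
        target = self.links.get(local_port)
        if target is None:
            return False
        host, port = target
        self.send(host, port, message)
        return True
```

Injecting the transport keeps the component usable on internal, local, and networked subnets alike, since only the `send` function changes.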
LLM
Data:
+ Model
Ports:
+ Message(Duplex)
Container
Data:
+ image
+ Status(Running, Stopped, Starting)
RPC:
+ Start
+ Stop
Claude
Wraps Claude Code.
Data:
+ git repository
There should be a standard container that installs Claude Code, holds GitHub credentials, and runs a wrapper service.
Ports:
+ Duplex(Message, ConversationEntry)
LLM Agent
System:
+ Watches every event type and reacts to it.
Tools:
+ ReplyMessage(message, response)
+ SendMessage(port, message)
+ Ignore()
Toolbox
Equivalent to RPC, but additionally made available to the LLM agent.
MCP Receiver
Data:
+ MCP SSE URL
+ MCP Server Cached Outputs
Ports:
+ MCP(receive-only)
Connects to an MCP server and downloads the available prompts, tools, resources, etc. LLM Agents can have all of this added to their context.
MCP Transmitter
Ports:
+ MCP(send-only)
Allows tools like Claude Desktop or Claude Code to connect to this component as an MCP server.
This port should be networked to be useful.
Debugger
The debugger is an infinite canvas built with React Flow that shows all hosts and ports, grouped by subnet and colored by port type. From it you can create, destroy, inspect, message, trace, and single-step any host. The frontend can run a simulation of most components by itself, but Docker containers and other host-specific features require a backend or live system.
Deployment
It should be possible to deploy this system to a Kubernetes cluster. There are several options:
- Every entity becomes a separate container: The most direct interpretation. It works, but can be unwieldy.
- Processes are combined into a single container: A process A with ports a,b,c and a process B with ports d,e,f can be run in a single container that exposes ports a,b,c,d,e,f.
- Entities are “stacked” onto a single process as much as possible: A process A with ports a,b,c and a process B with ports d,e,f can be run in a single process that exposes ports a,b,c,d,e,f.
- Entities are stacked into a single process using internal linking: Suppose a process A with ports a,b,c and a process B with ports d,e,f expose only a and d externally, while b is connected to e and c is connected to f. If A and B use the same programming language, this linking can be done “internally” without going through OS networking.
For both development and deployment, it is preferable to use as little networking as possible.
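For the internal-linking deployment option, a planner needs to work out which ports still have to be exposed once some are wired together inside the process. A minimal sketch, using hypothetical names and a simple dictionary representation of processes and links:

```python
from typing import Dict, Iterable, List, Tuple


def exposed_ports(processes: Dict[str, List[str]],
                  internal_links: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return, per process, the ports that are not consumed by an internal link."""
    linked = {port for link in internal_links for port in link}
    return {
        name: [port for port in ports if port not in linked]
        for name, ports in processes.items()
    }


if __name__ == "__main__":
    # The example from the text: b is connected to e and c to f, leaving a and d external.
    plan = exposed_ports({"A": ["a", "b", "c"], "B": ["d", "e", "f"]}, [("b", "e"), ("c", "f")])
    print(plan)
```

With no internal links the function returns every port, which corresponds to the plain "stacked" option where all of a,b,c,d,e,f are exposed.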