Complete system design: agent runtime, automation, memory, monitoring, and file serving
The OpenClaw infrastructure is built around five core concerns: inference (AI models), automation (workflows), memory (continuity), monitoring (observability), and file serving (asset delivery). All components run in Docker containers on a single VPS with proper network isolation.
Internet users connect via HTTPS. All traffic is encrypted and routed through a single entry point: Nginx reverse proxy.
Single reverse proxy with 4 virtual hosts (subdomains). Routes traffic to: OpenClaw Gateway, n8n Automation, Conductor Dashboard, File Server.
Three primary workloads: the agent runtime (Gateway), automation engine (n8n), and observability dashboard (Conductor). Each connects to PostgREST.
Single PostgreSQL instance. All agent state, memory, summaries, and metrics are stored here. PostgREST exposes it via REST API.
Local LLM (Llama 3.2) running in another container. Used by agents for embeddings and reasoning. No external API dependency.
Agents write to a shared workspace volume. A dedicated read-only file server (Nginx) exposes files via HTTPS for instant access via URL.
The Conductor Dashboard is the observability layer. It collects data from three sources (Gateway events, JSONL logs, PostgreSQL queries) and presents a unified view of agent activity, health metrics, task queue, and system memory.
Live stream of agent messages, tool calls, and responses. Shows timestamp, user, channel, and full message content. Searchable and filterable.
Real-time system status: Gateway uptime, PostgreSQL connection pool, token usage, error rate. Trends over the last 24 hours with alerts.
Kanban-style task queue. Agent-generated TODO items, PRs to review, research items. Drag-drop between Backlog, In Progress, Done.
Deep dive into agent memory. View HOT rows in memory_kv, session summaries, conversation history, and knowledge base documents.
Scheduled task history. Full-backup status, log rotation, TTL cleanup. Shows last run time, next run time, and success/failure.
S3 backup status, restore options, backup history. Automated daily full backups. Incremental snapshots available.
Poller listens on OpenClaw Gateway for messages, tool calls, and agent state changes in real-time.
Reader tails the JSONL session logs (one per agent session) and extracts structured session data.
Engine executes time-series queries on PostgreSQL (metrics, tasks, health) and formats results for dashboard display.
Results are rendered in the web dashboard: activity feed, health chart, task list, memory browser.
Every 30 minutes, the Gateway fires a heartbeat—a proactive maintenance cycle that flushes memory, archives logs, commits changes to Git, and checks system health. This keeps context window bounded and ensures data durability.
Sessions older than 30 days are archived. HOT rows in memory_kv (above size threshold) are pruned. Recent summaries are cached.
Active session summary is written to PostgreSQL (conversation_summaries table) with 30-day TTL. Used for cross-session recall.
JSONL logs older than N days are compressed and archived to S3. Recent logs stay in workspace for fast access.
All workspace changes (diagrams, docs, scripts, configs) are staged and committed to the main repo. Push to origin/main.
Verify all critical services are running: PostgreSQL, Redis, n8n, Ollama. Log metrics: CPU, memory, disk, network I/O.
Remove temporary files, vacuum PostgreSQL, prune Docker images, rotate nginx logs. Keep disk usage below 80%.
Memory rows are upserted (with TTL), session summaries inserted, cleanup queries executed (vacuuming, index maintenance).
Git commit writes to .git/, log rotation writes to logs/ archive, session dumps to memory/YYYY-MM-DD.md.
Compressed old logs uploaded to s3://openclaw-logs-archive/. Backups synced to s3://openclaw-backups/.
The agent writes files to the workspace volume (diagrams, research documents, images, etc.). A dedicated read-only file server exposes these at HTTPS, making them instantly accessible via URL. No upload, no transfer—just a shared volume.
/diagrams/ — Mermaid/Graphviz SVG + PNG renders of architecture, workflows, data models, decision trees.
/research/ — Deep-research markdown exports, PDF summaries, literature reviews, curated resources.
/slides/ — Presentation HTML (Keynote-style dark theme), JSON slide definitions, speaker notes.
/images/ — Generated images (DALL-E, Replicate), screenshots, profile pictures, brand assets.
/docs/ — Auto-generated docs (API reference, schema diagrams), README renders, checklists.
/video/ — Thumbnail previews, metadata files, caption SRT files, streaming manifests.
All files are served over HTTPS via files.nerdbox.com:
https://files.nerdbox.com/diagrams/openclaw-infrastructure-architecture.html
https://files.nerdbox.com/research/my-research-topic.md
https://files.nerdbox.com/slides/presentation.html
https://files.nerdbox.com/images/generated-image.png
Documentation, diagrams, presentations. Accessible to anyone. Use this for public-facing content.
Research notes, drafts, personal documents. Stored in workspace but NOT served via file server.
File server runs read-only. No upload capability. Auth layer handled by reverse proxy (if needed).
Security is built into every layer: network isolation, authentication, rate limiting, audit logging, and secret management.
Single entry point. HTTPS/TLS enforced. Rate limiting, DDoS protection, bot detection. Nginx ModSecurity optional.
All services on internal bridge network. No direct internet access except through Nginx. SSH/telnet not exposed.
Host firewall: SSH (22), HTTP (80), HTTPS (443) only. All other ports blocked. Fail2ban for brute-force protection.
Workspace volume mounted read-write to agent, read-only to file server. Database volume encrypted. Backups encrypted at rest.
PostgREST uses JWT. All HTTP requests include Authorization header. Token expiry: 24 hours. Rotation on login.
PostgreSQL user/password auth. Separate roles for services (readonly, readwrite). No direct SQL access from agents.
Internal services auth via private network. n8n ↔ Gateway via internal webhook. PostgREST ↔ Database via role-based access.
Dashboard admin login. SSH key-based (no password). Sudo for privileged ops. All admin actions logged to audit table.
PostgreSQL database volumes encrypted with LUKS2. S3 backups encrypted with AWS KMS. Secrets stored in .env (not in git).
All HTTP traffic upgraded to HTTPS. TLS 1.2+. Certificates via Let's Encrypt (auto-renewed). Internal traffic unencrypted (trusted network).
.env file (not committed). Loaded at container startup. Secrets: GitHub token, OpenAI API key, pCloud token, database password.
All agent actions logged: messages sent, files written, queries executed, external API calls. Stored in PostgreSQL audit_log table. 90-day retention.
Daily full backup (2 AM UTC). Incremental snapshots every 6 hours. Retention: 30 days full backups, 7 days incremental. Off-site (S3).
Restore from backup tar: uncompress, overlay workspace volume, restart containers. Full restore time: <5 minutes. RTO <5min, RPO <6h.
All code, docs, configs in Git. Heartbeat auto-commits changes hourly. Easy rollback to any commit. GitHub private repo.
Health checks every 5 min. Alerts on: service down, disk >80%, memory >85%, error rate >5%, backup failure.
Each service gets only the permissions it needs. Database roles are granular. File permissions: 644 (public), 600 (private).
Multiple layers: firewall, rate limiting, input validation, SQL prepared statements, CSRF tokens, CSP headers.
Docker images pinned to specific versions. Dependencies reviewed before pull. Source code on GitHub is version-controlled. Any unauthorized change is auditable.