Security Hardening

Production Security Checklist

Before going live, verify every item:

Secrets & Encryption

All secrets generated with openssl rand -hex 32 (not default values)
.env.prod file permissions set to 600 (owner-only)
.env.prod is in .gitignore and never committed
JWT_SECRET is at least 32 characters
APP_ENCRYPTION_KEY and MFA_ENCRYPTION_KEY are separate keys (never the same value)
AGENT_ENROLLMENT_SECRET rotated after initial enrollment batch
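
Several of the items above can be checked mechanically. The following is a minimal sketch, not part of Breeze itself: the helper names (generate_secret, parse_env, check_env_file) are hypothetical, and it assumes the env file uses plain KEY=VALUE lines. secrets.token_hex(32) yields 64 hex characters, the same shape as the output of openssl rand -hex 32.

```python
import os
import secrets
import stat


def generate_secret() -> str:
    """Return 64 hex characters (32 random bytes), like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env_file(path: str) -> list:
    """Return a list of checklist violations found in the env file at `path`."""
    problems = []
    # Owner-only read/write corresponds to mode 600.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        problems.append(f"permissions are {mode:o}, expected 600")
    with open(path, encoding="utf-8") as handle:
        env = parse_env(handle.read())
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET is shorter than 32 characters")
    app_key = env.get("APP_ENCRYPTION_KEY")
    mfa_key = env.get("MFA_ENCRYPTION_KEY")
    if app_key and app_key == mfa_key:
        problems.append("APP_ENCRYPTION_KEY and MFA_ENCRYPTION_KEY share a value")
    return problems
```

Calling check_env_file(".env.prod") before a deploy returns an empty list when these three checks pass. It does not verify the .gitignore entry or secret rotation.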

Network

Only ports 80/443 (and optionally 3478 for TURN) exposed publicly
PostgreSQL bound to 127.0.0.1 (not 0.0.0.0)
Redis bound to 127.0.0.1 (not 0.0.0.0)
Redis password authentication enabled via REDIS_PASSWORD (set in docker-compose and included in REDIS_URL)
Grafana/Prometheus accessible only via localhost or VPN
SSH key-only authentication (no password auth)
UFW or iptables configured

TLS

Caddy auto-TLS configured with valid domain and ACME email
HSTS header enabled with includeSubDomains; preload
No self-signed certificates in production

Container Security

no-new-privileges: true on all containers (default in prod compose)
cap_drop: ALL on all containers
API and Web containers run with read_only: true rootfs
Resource limits (cpus, mem_limit, pids_limit) set
Non-root container users (UID 1001)
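
As an illustration of auditing these settings, the sketch below inspects one Compose service definition that has already been parsed into a Python dict (for example by a YAML loader of your choice). The function name audit_service and the require_read_only flag are hypothetical. It assumes the hardening options appear under the standard Compose keys security_opt, cap_drop, read_only, cpus, mem_limit, pids_limit, and user.

```python
def audit_service(name: str, service: dict, require_read_only: bool = False) -> list:
    """Return hardening gaps for one parsed Compose service definition."""
    gaps = []
    # Compose accepts "no-new-privileges:true"; tolerate a space after the colon.
    security_opt = [str(opt).replace(" ", "") for opt in service.get("security_opt", [])]
    if "no-new-privileges:true" not in security_opt:
        gaps.append(f"{name}: no-new-privileges is not set")
    if "ALL" not in service.get("cap_drop", []):
        gaps.append(f"{name}: cap_drop does not include ALL")
    if require_read_only and service.get("read_only") is not True:
        gaps.append(f"{name}: root filesystem is not read-only")
    for key in ("cpus", "mem_limit", "pids_limit"):
        if key not in service:
            gaps.append(f"{name}: {key} is not set")
    # The user field may be "uid" or "uid:gid"; only the uid part matters here.
    user = str(service.get("user", ""))
    if user.split(":")[0] in ("", "0", "root"):
        gaps.append(f"{name}: runs as root or has no explicit user")
    return gaps
```

Pass require_read_only=True for the API and Web services, which the checklist expects to run with a read-only root filesystem.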

Authentication

MFA (TOTP) enabled for all admin accounts
Roles that should require MFA have Force MFA turned on. Users in a force-MFA role get a 428 Precondition Required response until they enroll a TOTP device; the dashboard then redirects them through a forced-enrollment page before any other workflow becomes available.
Registration disabled in production (ENABLE_REGISTRATION=false) after initial setup
Rate limiting active on login endpoints
Session timeout configured (SESSION_MAX_AGE)
Session revocation is fail-closed — revoked sessions stay revoked even if Redis is unavailable
Refresh tokens use family-based reuse detection: replaying a previously rotated refresh token immediately revokes every token in that family, which logs the user out of that session chain.
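
To make family-based reuse detection concrete, here is a minimal in-memory sketch. It is not Breeze's implementation: the class name RefreshTokenStore and its methods are hypothetical, and a real store would persist token hashes rather than raw tokens. Each login starts a family, each rotation replaces the family's single active token, and presenting any older token from the family revokes the whole family.

```python
import secrets
from typing import Optional


class RefreshTokenStore:
    """In-memory sketch of refresh-token rotation with family-based reuse detection."""

    def __init__(self) -> None:
        self._family_of = {}  # every token ever issued -> its family id
        self._active = {}     # family id -> the one token currently valid

    def _mint(self, family: str) -> str:
        token = secrets.token_urlsafe(32)
        self._family_of[token] = family
        self._active[family] = token
        return token

    def issue(self) -> str:
        """Start a new family (for example at login) and return its first token."""
        return self._mint(secrets.token_hex(8))

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a token for a new one; on reuse, revoke the family and return None."""
        family = self._family_of.get(token)
        if family is None:
            return None  # token was never issued by this store
        if self._active.get(family) != token:
            # A rotated-out token was replayed (or the family is already revoked):
            # drop the active token so no member of the family works any more.
            self._active.pop(family, None)
            return None
        return self._mint(family)
```

After a replay, the legitimate holder's newest token is also rejected, which forces a fresh login for that family.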

Agent Security

Agent tokens stored as SHA-256 hashes (automatic for new enrollments)
Agent token rotation tested (POST /agents/:id/rotate-token) — both old and new tokens are valid for a 5-minute grace period, and the agent picks up the new token on its next heartbeat with no downtime
Config file permissions: 0750 for /etc/breeze/, 0640 for agent.yaml, 0600 for secrets.yaml
Agent rate limiting enabled (120 req/60s per agent via Redis)
Enrollment keys set with expiry and usage limits
Cross-tenant probe detection enabled — if an agent token is used to access a device in another tenant, the token is automatically suspended and re-enrollment is blocked until an admin reviews the device.
Source-IP tracking active: every heartbeat records the agent’s source IP, and an agent.source.ip.changed audit event fires when it changes, which can surface token theft or a NAT change.
Consider enabling Cloudflare mTLS for zero-trust agent auth
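
The hashed storage and the rotation grace period described above can be sketched as follows. This is an illustrative model under stated assumptions, not the platform's code: AgentTokenRecord and hash_token are hypothetical names, and the clock is passed in explicitly so the 5-minute window is easy to test.

```python
import hashlib
import hmac
import secrets
import time

GRACE_SECONDS = 300  # the 5-minute grace period from the checklist


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest that is stored instead of the raw token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class AgentTokenRecord:
    """Holds only token hashes; the previous hash stays valid during the grace period."""

    def __init__(self, token: str) -> None:
        self.current_hash = hash_token(token)
        self.previous_hash = None
        self.previous_expires_at = 0.0

    def rotate(self, now=None) -> str:
        """Issue a new token and keep the old one valid for GRACE_SECONDS."""
        now = time.time() if now is None else now
        new_token = secrets.token_urlsafe(32)
        self.previous_hash = self.current_hash
        self.previous_expires_at = now + GRACE_SECONDS
        self.current_hash = hash_token(new_token)
        return new_token

    def is_valid(self, token: str, now=None) -> bool:
        """Accept the current token, or the previous one while the grace period lasts."""
        now = time.time() if now is None else now
        candidate = hash_token(token)
        if hmac.compare_digest(candidate, self.current_hash):
            return True
        return (
            self.previous_hash is not None
            and now < self.previous_expires_at
            and hmac.compare_digest(candidate, self.previous_hash)
        )
```

Because only digests are stored, a database leak does not expose usable agent tokens, and hmac.compare_digest avoids timing differences during comparison.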

Outbound Request Safety (SSRF)

Outbound integrations (webhooks, DNS providers, SSO discovery) flow through the platform’s SSRF guard, which blocks private/loopback ranges and cloud metadata hostnames unless an explicit allowlist entry permits them.
partners.settings and sites.settings columns are AES-256-GCM encrypted at rest — secrets stored here (provider credentials, integration tokens) never leave the database in plaintext.
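
The kind of check an SSRF guard performs can be sketched with the standard library. Everything named here is an assumption for illustration: the METADATA_HOSTS set, the function names, and the allowlist shape are hypothetical and do not describe the platform's actual guard.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical blocklist of cloud metadata hostnames.
METADATA_HOSTS = {"metadata.google.internal", "metadata", "instance-data"}


def resolve_host(host: str) -> list:
    """Resolve a hostname to its IP address strings."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_blocked_address(address: str) -> bool:
    """True for private, loopback, link-local, and other non-public addresses."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # unparseable addresses are rejected
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_unspecified or ip.is_reserved or ip.is_multicast)


def is_url_allowed(url: str, allowlist=frozenset(), resolver=resolve_host) -> bool:
    """Decide whether an outbound request to `url` should be permitted."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower().rstrip(".")
    if host in allowlist:
        return True  # an explicit allowlist entry overrides the blocks below
    if host in METADATA_HOSTS:
        return False
    try:
        ipaddress.ip_address(host)
        addresses = [host]           # the URL used an IP literal
    except ValueError:
        try:
            addresses = resolver(host)
        except OSError:
            return False             # unresolvable hosts are rejected
    return bool(addresses) and not any(is_blocked_address(a) for a in addresses)
```

A check like this only helps if the subsequent request connects to the same address that was vetted. Otherwise a DNS answer that changes between the check and the request can bypass it.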

Monitoring

Prometheus metrics endpoint protected with bearer token
Alert rules configured for error rates and infrastructure
Audit logging enabled (automatic for all mutating operations)
Log aggregation configured (Loki)

Firewall Configuration

# UFW example
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Only if using TURN for WebRTC:
# sudo ufw allow 3478/tcp
# sudo ufw allow 3478/udp
sudo ufw enable

Audit Logging

All mutating operations are automatically logged with:

Field	Description
`actorType`	`user`, `api_key`, `agent`, or `system`
`actorId`	User ID or device ID
`action`	Operation performed
`resource`	Target resource type
`resourceId`	Target resource ID
`details`	JSON payload of changes
`ipAddress`	Client IP address
`timestamp`	ISO 8601 timestamp
`checksum`	SHA-256 of the canonical row payload
`prev_checksum`	Checksum of the previous row in this organization’s chain

Tamper evidence

The audit_log table is append-only at the database level. Database triggers refuse UPDATE, DELETE, and TRUNCATE operations against audit rows — not even a superuser can quietly edit history. Each row also carries a prev_checksum that links to the previous audit row in the same organization, producing a per-org SHA-256 hash chain. Verifying the chain end-to-end detects any insertion, deletion, or alteration between two timestamps.
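
The hash chain can be illustrated with a short sketch. The canonical encoding is an assumption: this example hashes sorted-key JSON of every field except checksum, and anchors the first row on a hypothetical all-zero GENESIS value. The real canonical payload and anchor are defined by the platform and may differ.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # hypothetical anchor for the first row in a chain


def row_checksum(row: dict) -> str:
    """SHA-256 over a canonical encoding of the row, excluding its own checksum."""
    payload = {k: v for k, v in row.items() if k != "checksum"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_row(chain: list, fields: dict) -> dict:
    """Append a row that links to the previous row's checksum."""
    prev = chain[-1]["checksum"] if chain else GENESIS
    row = {**fields, "prev_checksum": prev}
    row["checksum"] = row_checksum(row)
    chain.append(row)
    return row


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first row that fails verification, or None if intact."""
    expected_prev = GENESIS
    for index, row in enumerate(chain):
        if row["prev_checksum"] != expected_prev or row["checksum"] != row_checksum(row):
            return index
        expected_prev = row["checksum"]
    return None
```

Editing a row changes its recomputed checksum, and deleting or inserting a row breaks the prev_checksum link of the row that follows, so either kind of tampering is reported at the first affected position.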

Retention pruning is the one legitimate path that removes audit rows. It requires both the breeze_audit_admin Postgres role and the breeze.allow_audit_retention='1' session GUC; pruning re-anchors the chain on the surviving rows so the integrity check still passes after old data ages out. Both controls are managed by the platform’s audit retention worker — operators do not run pruning by hand.

Query audit logs via the API:

curl -H "Authorization: Bearer $TOKEN" \
  "https://breeze.yourdomain.com/api/v1/audit?resource=devices&action=delete"

Rate Limiting

Breeze implements Redis-backed sliding window rate limiting:

Endpoint	Limit	Window
Login	5 attempts	5 minutes
API (per user)	100 requests	60 seconds
Agent (per device)	120 requests	60 seconds
Agent (per organization)	600 requests	60 seconds
Enrollment	10 attempts	60 seconds

The per-organization agent limit is configurable via AGENT_ORG_RATE_LIMIT_PER_MIN and caps total fleet traffic for any single tenant. When exceeded, the API returns 429 with Retry-After: 60; agents respect this header and back off automatically.
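
For readers unfamiliar with the technique, the sketch below shows a sliding-window limiter in its simplest form (a log of timestamps per key). It is an in-memory stand-in, not the Redis-backed implementation, and the class name SlidingWindowLimiter is hypothetical. Unlike the API, which returns a fixed Retry-After: 60 for the organization limit, this sketch computes the time until the oldest request leaves the window.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def check(self, key: str, now=None):
        """Return (allowed, retry_after_seconds) and record the request if allowed."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Discard timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest accepted request frees a slot once it leaves the window.
        retry_after = math.ceil(hits[0] + self.window - now)
        return False, max(retry_after, 1)
```

With SlidingWindowLimiter(5, 300) keyed by client IP, the sixth login attempt inside five minutes is refused, and a caller can map the refusal to a 429 response with the returned value as Retry-After.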