Security Hardening
Production Security Checklist
Before going live, verify every item:
Secrets & Encryption
- All secrets generated with `openssl rand -hex 32` (not default values)
- `.env.prod` file permissions set to `600` (owner-only)
- `.env.prod` is in `.gitignore` and never committed
- `JWT_SECRET` is at least 32 characters
- Separate encryption keys for `APP_ENCRYPTION_KEY` and `MFA_ENCRYPTION_KEY`
- `AGENT_ENROLLMENT_SECRET` rotated after initial enrollment batch
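The secrets items above can be scripted. The following is a minimal sketch, assuming the four variable names in this checklist are the only secrets to generate. `write_prod_secrets` is a hypothetical helper, not part of Breeze, and a real `.env.prod` holds other settings that this sketch does not write.

```bash
# Hypothetical helper: writes freshly generated secrets to a new env file.
# Refuses to overwrite an existing file so live secrets are never clobbered.
write_prod_secrets() {
  local env_file="$1" name
  if [ -e "$env_file" ]; then
    echo "refusing to overwrite $env_file" >&2
    return 1
  fi
  (
    umask 077   # create the file owner-only from the start
    for name in JWT_SECRET APP_ENCRYPTION_KEY MFA_ENCRYPTION_KEY AGENT_ENROLLMENT_SECRET; do
      # 32 random bytes, hex-encoded: 64 characters, a distinct value per key
      printf '%s=%s\n' "$name" "$(openssl rand -hex 32)"
    done > "$env_file"
  )
  chmod 600 "$env_file"
}
```

Calling `write_prod_secrets ./.env.prod` and then `ls -l ./.env.prod` should show `-rw-------`. Each key gets its own random value, which keeps `APP_ENCRYPTION_KEY` and `MFA_ENCRYPTION_KEY` separate.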
Network
- Only ports 80/443 (and optionally 3478 for TURN) exposed publicly
- PostgreSQL bound to `127.0.0.1` (not `0.0.0.0`)
- Redis bound to `127.0.0.1` (not `0.0.0.0`)
- Redis password authentication enabled via `REDIS_PASSWORD` (set in docker-compose and included in `REDIS_URL`)
- Grafana/Prometheus accessible only via localhost or VPN
- SSH key-only authentication (no password auth)
- UFW or iptables configured
- Caddy auto-TLS configured with valid domain and ACME email
- HSTS header enabled with `includeSubDomains; preload`
- No self-signed certificates in production
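One way to spot-check the PostgreSQL and Redis binding items is to scan the host's listening sockets. The sketch below assumes the default ports (5432 for PostgreSQL, 6379 for Redis) and the column layout of `ss -ltnH`. `find_public_listeners` is a hypothetical helper name.

```bash
# Reads `ss -ltnH` output on stdin and prints every listener on the given
# comma-separated ports whose local address is not loopback.
find_public_listeners() {
  awk -v ports="$1" '
    BEGIN { n = split(ports, p, ","); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    {
      addr = $4                          # column 4 is "Local Address:Port"
      port = addr; sub(/.*:/, "", port)  # keep only what follows the last colon
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if ((port in want) && host != "127.0.0.1" && host != "[::1]") print addr
    }'
}
```

Running `ss -ltnH | find_public_listeners 5432,6379` on the host should print nothing when both services are bound to loopback. Any line it prints is a listener reachable on a non-loopback interface.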
Container Security
- `no-new-privileges: true` on all containers (default in prod compose)
- `cap_drop: ALL` on all containers
- API and Web containers run with `read_only: true` rootfs
- Resource limits (`cpus`, `mem_limit`, `pids_limit`) set
- Non-root container users (UID 1001)
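The container items can be checked against running containers with `docker inspect`. This is a rough sketch rather than an official audit tool: `INSPECT_FORMAT` and `check_hardening` are hypothetical names, and the checks only pattern-match the text that the Go template prints.

```bash
# One line per container: name, read-only rootfs flag, dropped capabilities,
# security options, and the configured user.
INSPECT_FORMAT='{{.Name}} readonly={{.HostConfig.ReadonlyRootfs}} capdrop={{.HostConfig.CapDrop}} secopt={{.HostConfig.SecurityOpt}} user={{.Config.User}}'

# Reads lines produced with INSPECT_FORMAT and prints one finding per gap.
check_hardening() {
  local line name
  while read -r line; do
    name="${line%% *}"
    case "$line" in *"capdrop=[ALL]"*) ;; *) echo "$name: capabilities not dropped" ;; esac
    case "$line" in *no-new-privileges*) ;; *) echo "$name: no-new-privileges missing" ;; esac
    # An empty user, 0, or root all mean the process runs as root.
    case "$line" in *"user=0"*|*"user=root"*|*"user=") echo "$name: runs as root" ;; esac
  done
}
```

A typical invocation would be `docker ps -q | xargs docker inspect --format "$INSPECT_FORMAT" | check_hardening`. The read-only flag is printed but not enforced, since the checklist requires it only for the API and Web containers.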
Authentication
- MFA (TOTP) enabled for all admin accounts
- Roles that should require MFA have Force MFA turned on. Users in a force-MFA role get a `428 Precondition Required` response until they enroll a TOTP device; the dashboard then redirects them through a forced-enrollment page before any other workflow becomes available.
- Registration disabled in production (`ENABLE_REGISTRATION=false`) after initial setup
- Rate limiting active on login endpoints
- Session timeout configured (`SESSION_MAX_AGE`)
- Session revocation is fail-closed: revoked sessions stay revoked even if Redis is unavailable
- Refresh tokens use family-based reuse detection: replaying a previously rotated refresh token immediately revokes every other token in that family, logout included.
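Family-based reuse detection is easier to follow as a small state machine. The sketch below is an in-memory illustration of the general technique, not Breeze's implementation. The token names, the `TOKENS` record format, and all function names are hypothetical.

```bash
TOKENS=""     # one "token family state" record per line
NEXT_ID=0
NEW_TOKEN=""  # set by issue_token so callers avoid a subshell

# token_field TOKEN COLUMN: prints the family (2) or state (3) of a token.
token_field() {
  printf '%s\n' "$TOKENS" | awk -v t="$1" -v c="$2" '$1 == t { print $c }'
}

# issue_token FAMILY: creates an active token in the family.
issue_token() {
  NEXT_ID=$((NEXT_ID + 1))
  NEW_TOKEN="rt-$NEXT_ID"
  TOKENS="$TOKENS
$NEW_TOKEN $1 active"
}

# set_state COLUMN VALUE STATE: updates every record whose COLUMN equals VALUE.
set_state() {
  TOKENS="$(printf '%s\n' "$TOKENS" |
    awk -v c="$1" -v v="$2" -v s="$3" 'NF == 3 { if ($c == v) $3 = s; print }')"
}

# rotate_token TOKEN: returns 0 and sets NEW_TOKEN, or returns 1 on rejection.
rotate_token() {
  local state family
  state="$(token_field "$1" 3)"
  family="$(token_field "$1" 2)"
  case "$state" in
    active)
      set_state 1 "$1" rotated
      issue_token "$family"
      ;;
    rotated)
      # Replay of an already rotated token: revoke the whole family.
      set_state 2 "$family" revoked
      return 1
      ;;
    *) return 1 ;;
  esac
}
```

After `issue_token fam-a` and one `rotate_token`, presenting the original token a second time fails and leaves every token in `fam-a` revoked, while tokens in other families stay active.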
Agent Security
- Agent tokens stored as SHA-256 hashes (automatic for new enrollments)
- Agent token rotation tested (`POST /agents/:id/rotate-token`): both old and new tokens are valid for a 5-minute grace period, and the agent picks up the new token on its next heartbeat with no downtime
- Config file permissions: `0750` for `/etc/breeze/`, `0640` for `agent.yaml`, `0600` for `secrets.yaml`
- Agent rate limiting enabled (120 req/60s per agent via Redis)
- Enrollment keys set with expiry and usage limits
- Cross-tenant probe detection enabled: if an agent token is used to access a device in another tenant, the token is automatically suspended and re-enrollment is blocked until an admin reviews the device.
- Source-IP tracking active: every heartbeat records the agent’s source IP, and an `agent.source.ip.changed` audit event fires when it shifts, surfacing token theft or NAT changes.
- Consider enabling Cloudflare mTLS for zero-trust agent auth
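The config file permissions from the list above can be applied in one step. This sketch assumes `agent.yaml` and `secrets.yaml` both live directly inside the config directory. `apply_agent_permissions` is a hypothetical helper, and file ownership is left unchanged because the checklist does not specify it.

```bash
# apply_agent_permissions DIR: sets the modes named in the checklist.
apply_agent_permissions() {
  local dir="$1"
  chmod 0750 "$dir"               # directory: owner full, group read/traverse
  chmod 0640 "$dir/agent.yaml"    # config: owner read/write, group read
  chmod 0600 "$dir/secrets.yaml"  # secrets: owner only
}
```

On an agent host this would be run as `sudo sh -c '. ./perms.sh; apply_agent_permissions /etc/breeze'` or similar, followed by `ls -ld /etc/breeze /etc/breeze/*.yaml` to confirm the result.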
Outbound Request Safety (SSRF)
- Outbound integrations (webhooks, DNS providers, SSO discovery) flow through the platform’s SSRF guard, which blocks private/loopback ranges and cloud metadata hostnames unless an explicit allowlist entry permits them.
- `partners.settings` and `sites.settings` columns are AES-256-GCM encrypted at rest: secrets stored here (provider credentials, integration tokens) never leave the database in plaintext.
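To make the SSRF rule concrete, here is a simplified sketch of the decision it describes. It is not the platform's guard: the `SSRF_ALLOWLIST` variable, its space-separated format, and both function names are assumptions. It covers IPv4 only and expects the caller to pass the address the hostname actually resolved to.

```bash
# Space-separated hosts that may bypass the guard (hypothetical format).
SSRF_ALLOWLIST="${SSRF_ALLOWLIST:-}"

# Returns 0 when a well-formed dotted-quad IPv4 address is unspecified,
# loopback, private (RFC 1918), or link-local. The list is not exhaustive.
is_blocked_ipv4() {
  local a b rest
  a="${1%%.*}"; rest="${1#*.}"; b="${rest%%.*}"
  case "$a" in
    0|10|127) return 0 ;;
    169) if [ "$b" -eq 254 ]; then return 0; fi ;;  # includes 169.254.169.254
    172) if [ "$b" -ge 16 ] && [ "$b" -le 31 ]; then return 0; fi ;;
    192) if [ "$b" -eq 168 ]; then return 0; fi ;;
  esac
  return 1
}

# outbound_allowed HOST RESOLVED_IP: returns 0 when the request may proceed.
outbound_allowed() {
  local host="$1" ip="$2" entry
  for entry in $SSRF_ALLOWLIST; do
    if [ "$entry" = "$host" ]; then return 0; fi    # explicit allowlist wins
  done
  case "$host" in
    localhost|metadata.google.internal) return 1 ;; # example metadata hostname
  esac
  if is_blocked_ipv4 "$ip"; then return 1; fi
  return 0
}
```

With an empty allowlist, `outbound_allowed internal.example 10.0.0.5` fails; after `SSRF_ALLOWLIST="internal.example"` the same call succeeds. Checking the resolved address rather than the hostname matters because a public-looking name can point at a private address.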
Monitoring
- Prometheus metrics endpoint protected with bearer token
- Alert rules configured for error rates and infrastructure
- Audit logging enabled (automatic for all mutating operations)
- Log aggregation configured (Loki)
Firewall Configuration
```bash
# UFW example
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Only if using TURN for WebRTC:
# sudo ufw allow 3478/tcp
# sudo ufw allow 3478/udp
sudo ufw enable
```
Audit Logging
All mutating operations are automatically logged with:
| Field | Description |
|---|---|
| `actorType` | `user`, `api_key`, `agent`, or `system` |
| `actorId` | User ID or device ID |
| `action` | Operation performed |
| `resource` | Target resource type |
| `resourceId` | Target resource ID |
| `details` | JSON payload of changes |
| `ipAddress` | Client IP address |
| `timestamp` | ISO 8601 timestamp |
| `checksum` | SHA-256 of the canonical row payload |
| `prev_checksum` | Checksum of the previous row in this organization’s chain |
Tamper evidence
The `audit_log` table is append-only at the database level. Database triggers refuse `UPDATE`, `DELETE`, and `TRUNCATE` operations against audit rows, so not even a superuser can quietly edit history. Each row also carries a `prev_checksum` that links to the previous audit row in the same organization, producing a per-org SHA-256 hash chain. Verifying the chain end-to-end detects any insertion, deletion, or alteration between two timestamps.
Retention pruning is the one legitimate path that removes audit rows. It requires both the `breeze_audit_admin` Postgres role and the `breeze.allow_audit_retention='1'` session GUC; pruning re-anchors the chain on the surviving rows so the integrity check still passes after old data ages out. Both controls are managed by the platform’s audit retention worker; operators do not run pruning by hand.
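The chain check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's verifier: the tab-separated row layout, the `prev_checksum|payload` hashing rule, and the non-empty anchor value are all invented for the example. The source only says that `checksum` covers the canonical row payload.

```bash
# Hex SHA-256 of a string (sha256sum is GNU coreutils; macOS has `shasum -a 256`).
sha256_hex() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }

# verify_chain ANCHOR
# Reads rows oldest-first on stdin as: prev_checksum<TAB>checksum<TAB>payload.
# ANCHOR is the prev_checksum expected on the first row (assumed non-empty).
verify_chain() {
  local expected_prev="$1" prev sum payload row=0 tab
  tab="$(printf '\t')"
  while IFS="$tab" read -r prev sum payload; do
    row=$((row + 1))
    # A removed or inserted row breaks the link to the previous checksum.
    if [ "$prev" != "$expected_prev" ]; then
      echo "row $row: prev_checksum does not match the previous row"
      return 1
    fi
    # Assumption: the hashed input is "<prev_checksum>|<payload>".
    if [ "$(sha256_hex "${prev}|${payload}")" != "$sum" ]; then
      echo "row $row: checksum does not match the payload"
      return 1
    fi
    expected_prev="$sum"
  done
  echo "ok: $row rows"
}
```

Feeding the function an unmodified export prints `ok: N rows`. Editing a payload trips the checksum comparison on that row, and dropping a row trips the link comparison on the row that follows it.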
Query audit logs via the API:
```bash
curl -H "Authorization: Bearer $TOKEN" \
  "https://breeze.yourdomain.com/api/v1/audit?resource=devices&action=delete"
```
Rate Limiting
Breeze implements Redis-backed sliding-window rate limiting:
| Endpoint | Limit | Window |
|---|---|---|
| Login | 5 attempts | 5 minutes |
| API (per user) | 100 requests | 60 seconds |
| Agent (per device) | 120 requests | 60 seconds |
| Agent (per organization) | 600 requests | 60 seconds |
| Enrollment | 10 attempts | 60 seconds |
The per-organization agent limit is configurable via AGENT_ORG_RATE_LIMIT_PER_MIN and caps total fleet traffic for any single tenant. When exceeded, the API returns 429 with Retry-After: 60; agents respect this header and back off automatically.
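For readers unfamiliar with the sliding-window approach, the sketch below shows the idea with an in-memory list standing in for Redis. It is a generic illustration, not Breeze's code: `HITS` and `allow_request` are hypothetical names, and the current time is passed in explicitly so the behavior is easy to follow.

```bash
HITS=""   # space-separated epoch seconds of accepted requests

# allow_request LIMIT WINDOW NOW
# Returns 0 and records the hit when fewer than LIMIT requests were accepted
# in the last WINDOW seconds; returns 1 (think HTTP 429) otherwise.
# Call it directly, not inside $(...), so that HITS persists between calls.
allow_request() {
  local limit="$1" window="$2" now="$3" kept="" count=0 t
  for t in $HITS; do
    # Keep only hits that are still inside the window.
    if [ "$t" -gt $((now - window)) ]; then
      kept="$kept $t"
      count=$((count + 1))
    fi
  done
  if [ "$count" -ge "$limit" ]; then
    HITS="$kept"
    return 1
  fi
  HITS="$kept $now"
  return 0
}
```

With the login limits from the table (5 attempts per 5 minutes), five calls such as `allow_request 5 300 1000` through `allow_request 5 300 1004` succeed, a sixth at `1005` is rejected, and a call at `1300` succeeds again because the hit at `1000` has left the window. Unlike a fixed window, the limit never resets all at once; capacity returns one request at a time as old hits expire.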