Provisioning
Stand up cloud infrastructure end-to-end from a single wizard. ZopNight's provisioner service creates VPCs, Kubernetes clusters, VMs, datastores, registries, and load balancers across AWS, GCP, and Azure — using the same credentials you connected for discovery and scheduling. It also runs auto-remediation workflows triggered from a recommendation's Remediate button.
Where these routes live
Surfaces
The provisioning UI surfaces at `/zopday`. It shares authentication, RBAC, teams, and audit logs with the main ZopNight app — same login, same org, same role model.

Two job kinds, one state machine
Internally the provisioner runs two job kinds on the same provisioning_jobs table:
- `provisioning` — creates and tears down clusters, VMs, datastores, networks, registries, load balancers, and Helm components.
- `remediation` — orchestrates auto-remediation workflows kicked off from a recommendation. See Auto-Remediation Jobs below.
No OpenTofu, no remote agents — the service talks to cloud SDKs (AWS, GCP, Azure) directly, and to the target cluster via the Helm and Kubernetes Go SDKs. After a fresh cluster is up, the provisioner calls config's RegisterSpace gRPC so the cluster appears as a Deployment Space automatically, then calls the discoverer to kick off resource sync.
What you can provision
| Category | AWS | GCP | Azure |
|---|---|---|---|
| Kubernetes | EKS | GKE | AKS |
| VMs / compute | EC2 + ASG + ALB / NLB | Compute Engine + MIG + GCLB | VM Scale Sets + Azure LB |
| Datastores | RDS (MySQL / Postgres), ElastiCache (Redis) | Cloud SQL (MySQL / Postgres), Memorystore (Redis) | Database for MySQL / Postgres, Azure Cache for Redis |
| Registry | ECR | Artifact Registry | ACR |
| Networking | VPC, Subnets, VPC Peering | VPC, Subnets, DNS Zones | VNet, Subnets |
| In-cluster components | Helm-based installer for ingress, observability, cert-manager, and platform add-ons (all clouds) | | |
How it works
- Pick a space — choose a connected cloud account and a region. ZopNight discovers existing VPCs, subnets, and DNS zones in the background so the wizard can offer real options instead of free-text fields.
- Pick a resource type — cluster, VM, datastore, or component.
- Configure — instance size, version, networking, backup retention, etc. Sensible defaults are pre-filled.
- Review — ZopNight shows the IAM operations it will perform and the estimated monthly cost before any action runs.
- Provision — the provisioner applies the changes through cloud SDKs; progress streams to the UI; an audit log entry is recorded under the `provisioning` product scope. On success, freshly created clusters auto-register as spaces and resources flow into Reports + Recommendations.
Credentials never leave the cloud
RBAC
Provisioning ships with four policy categories so platform teams can delegate provisioning without granting full tenancy admin:
| Policy | What it grants |
|---|---|
space:view / create / update / delete | Manage Deployment Spaces (cloud account + region pairs targeted for provisioning) |
provisioning:view / create / update / delete | Run, monitor, modify, and tear down provisioning jobs |
service:view / create / update / delete | Manage in-cluster Helm components attached to a space |
deployment:view / create / update / delete | Manage application deployments orchestrated through config + deployer |
Default System roles inherit these policies; custom roles need explicit grants. See Roles & Permissions for how to assemble them.
Provisioning jobs
`/provisioning-jobs`
Submit a new provisioning job (cluster, VM, datastore, network, or component) into a space.

Kubernetes cluster:

```bash
curl -X POST https://zopnight.com/api/provisioning-jobs \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"spaceID": "spc_abc123",
"kind": "kubernetes",
"config": {
"name": "platform-prod",
"version": "1.30",
"nodeGroups": [
{ "name": "default", "instanceType": "m5.large", "min": 2, "max": 6 }
],
"vpcID": "vpc-0abc",
"subnetIDs": ["subnet-0a", "subnet-0b"]
}
}'
```

Datastore (Postgres):

```bash
curl -X POST https://zopnight.com/api/provisioning-jobs \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"spaceID": "spc_abc123",
"kind": "datastore",
"config": {
"engine": "postgres",
"version": "16.3",
"instanceClass": "db.t3.medium",
"allocatedStorageGB": 50,
"multiAZ": true,
"backupRetentionDays": 7
}
}'
```

VM pool with a load balancer:

```bash
curl -X POST https://zopnight.com/api/provisioning-jobs \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"spaceID": "spc_abc123",
"kind": "vm_pool",
"config": {
"name": "checkout-edge",
"instanceType": "m5.large",
"min": 2,
"max": 6,
"createLoadBalancer": true,
"listenerPorts": [80, 443]
}
}'
```

`/provisioning-jobs`
List provisioning jobs in a space, with status.
`/provisioning-jobs/{id}`
Get a job's full state — phase, per-step output, and any errors.

`/provisioning-jobs/{id}`
Update lightweight job fields (name, tags) without re-running the pipeline.

`/provisioning-jobs/{id}`
Tear down a previously created cluster, VM pool, datastore, or component.

`/provisioning-jobs/{id}/retry`
Retry a failed job from the last successful step. Useful for transient failures (rate-limited cloud API, throttled subnet creation).
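A minimal sketch of a retry call, assuming the endpoint accepts an empty POST body since the job already holds its configuration; the job ID is a placeholder:

```bash
# Retry a failed provisioning job from its last successful step.
# Assumes the retry endpoint takes no request body; job_7f3 is a placeholder ID.
curl -X POST https://zopnight.com/api/provisioning-jobs/job_7f3/retry \
  -H "Authorization: Bearer <token>"
```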
Databases under a job
Datastore jobs are the parent of one or more databases — logical schemas the application code will connect to. Each database has its own credentials and is provisioned by issuing CREATE DATABASE + CREATE USER against the parent server.
`/provisioning-jobs/{jobID}/databases`
Create a database on the parent datastore.

```bash
curl -X POST "https://zopnight.com/api/provisioning-jobs/job_abc/databases" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{ "name": "checkout", "owner": "checkout_app" }'/provisioning-jobs/{jobID}/databasesList databases under a job.
`/provisioning-jobs/{jobID}/databases/{dbID}`
Get a database with its connection metadata (server, port, secret reference).

`/provisioning-jobs/{jobID}/databases/{dbID}`
Drop the database and revoke its user.
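A hedged sketch of fetching a database's connection metadata — the IDs are placeholders, and the fields noted in the comment reflect the metadata listed above rather than a confirmed response schema:

```bash
# Fetch one database under a datastore job; IDs are placeholders.
# The response is expected to include the server endpoint, port, and a secret
# reference for credentials (field names are assumptions, not a confirmed schema).
curl https://zopnight.com/api/provisioning-jobs/job_abc/databases/db_checkout \
  -H "Authorization: Bearer <token>"
```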
Connections
A connection is a stored set of credentials (host, port, user, secret reference) an application fetches at deploy time. Connections live either at job scope (one set of credentials shared across all databases on the server) or at database scope (one set per database).
Job-scoped
`/provisioning-jobs/{jobID}/connections`
Create a job-scoped connection.

`/provisioning-jobs/{jobID}/connections`
List job-scoped connections.

`/provisioning-jobs/{jobID}/connections/{connID}`
Get a job-scoped connection by ID.

`/provisioning-jobs/{jobID}/connections/{connID}`
Delete a job-scoped connection (does not drop the database user).
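A hedged sketch of creating a job-scoped connection. The payload fields (`name`, `user`, `secretRef`) are assumptions drawn from the connection model described above (host, port, user, secret reference), not a confirmed request schema:

```bash
# Create a connection shared by every database on the parent server.
# Field names are illustrative assumptions; host and port are typically
# derived from the parent job rather than supplied by the caller.
curl -X POST https://zopnight.com/api/provisioning-jobs/job_abc/connections \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "name": "checkout-shared", "user": "checkout_app", "secretRef": "vault://db/checkout" }'
```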
Database-scoped
`/provisioning-jobs/{jobID}/databases/{dbID}/connections`
Create a connection scoped to a specific database.

`/provisioning-jobs/{jobID}/databases/{dbID}/connections`
List connections scoped to a database.

`/provisioning-jobs/{jobID}/databases/{dbID}/connections/{connID}`
Get a database-scoped connection.

`/provisioning-jobs/{jobID}/databases/{dbID}/connections/{connID}`
Delete a database-scoped connection.
Component Catalog
The catalog is the curated list of installable Helm components (ingress controllers, observability stacks, cert-manager, etc.). The provisioner installs them into the space's cluster via the Helm Go SDK.
`/catalog/components`
List available catalog components and their default values.

```json
{
"data": [
{
"id": "ingress-nginx",
"name": "NGINX Ingress Controller",
"category": "ingress",
"version": "4.10.0",
"description": "Production-ready ingress controller backed by NGINX."
},
{
"id": "cert-manager",
"name": "cert-manager",
"category": "tls",
"version": "1.15.0",
"description": "Automated TLS certificate issuance via ACME."
}
]
}
```

`/provisioning-jobs/install-component`
Install a catalog component into a space's cluster.

```bash
curl -X POST https://zopnight.com/api/provisioning-jobs/install-component \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"spaceID": "spc_abc123",
"componentID": "ingress-nginx",
"values": {
"controller.service.type": "LoadBalancer"
}
}'
```

Auto-Remediation Jobs
The provisioner also drives the remediation workflows triggered from a recommendation's Remediate button. Workflows are stored in the same provisioning_jobs table (with kind = remediation) and execute through a step-kind dispatcher. The full audit trail is kept in a dedicated remediation_audit table, of which the reconciler is the sole writer.
Step kinds
- precondition — verifies the cloud resource is in the expected state before any change is made.
- approval — parks the workflow in `awaiting_user` for destructive ops; admins approve or reject from the wizard or via the API.
- execute — delegates Start/Stop to the executor.
- pause_service — provisioner-owned pause for resources that don't have a Stop primitive (Synapse Pool, ML Compute, Data Factory, Cognitive Services, Databricks Cluster on Azure; Dataproc on GCP — terminal, not reversible).
- validate — final state check after the cloud action.
Executor boundary
The provisioner owns the workflow itself (preconditions, approvals, provisioner-owned pauses, and validation) and delegates the actual Start/Stop cloud actions to the executor via the execute step kind.
`/remediation-jobs`
Start a remediation workflow for a specific recommendation.

```bash
curl -X POST https://zopnight.com/api/remediation-jobs \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"recommendationID": "rec_abc123",
"subjectType": "resource",
"subjectID": "i-0123456789abcdef0"
}'
```

`/remediation-jobs/{id}/cancel`
Cancel a running or paused remediation workflow.
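A minimal sketch of a cancel call, assuming the endpoint takes an empty POST body; the job ID is a placeholder:

```bash
# Cancel a running or paused remediation workflow; rem_42 is a placeholder ID.
curl -X POST https://zopnight.com/api/remediation-jobs/rem_42/cancel \
  -H "Authorization: Bearer <token>"
```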
Step approval / rejection
Two equivalent endpoint shapes: address a step by its stable UUID (recommended — survives template revisions), or by the step's name in the template.
`/remediation-jobs/{jobID}/steps-by-id/{stepID}/approve`
Approve an approval-gated step by stable step ID.

`/remediation-jobs/{jobID}/steps-by-id/{stepID}/reject`
Reject an approval-gated step by stable step ID. The workflow transitions to terminal cancelled.

`/remediation-jobs/{id}/steps/{stepName}/approve`
Approve by step name (legacy form, kept for backward compatibility).

`/remediation-jobs/{id}/steps/{stepName}/reject`
Reject by step name (legacy form).
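A hedged sketch of approving a gated step by its stable UUID. The IDs are placeholders, and the `reason` field is an assumption inferred from the reason shown in the audit timeline below, not a confirmed request schema:

```bash
# Approve an approval-gated step using its stable step ID (survives template revisions).
# IDs are placeholders; the "reason" field is an assumption mirrored in the audit trail.
curl -X POST https://zopnight.com/api/remediation-jobs/rem_42/steps-by-id/9b2f3c1e-0d4a-4f7b-8a6e-1c2d3e4f5a6b/approve \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "Confirmed off-hours window" }'
```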
`/remediation-jobs/{id}/audit`
Step-level audit timeline (created / approved / rejected / completed / failed) with actor + reason. Backed by the dedicated remediation_audit table.

```json
{
"data": [
{ "stepName": "precondition", "eventType": "completed", "actor": "system", "createdAt": "2026-04-29T11:00:00Z" },
{ "stepName": "approval", "eventType": "approved", "actor": "alice@acme.com", "reason": "Confirmed off-hours window", "createdAt": "2026-04-29T11:02:30Z" },
{ "stepName": "execute", "eventType": "completed", "actor": "system", "createdAt": "2026-04-29T11:03:15Z" }
]
}
```

Failure model
When a step fails, the provisioner classifies the error into one of three buckets so the wizard can render the right CTA:
- `user_action` — customer must fix (permission, quota, billing). The response carries a `fixHint` and a `consoleUrl` deep-link. Not retryable.
- `transient` — cloud-side throttling or eventual consistency. Retryable; the wizard offers a "Remediate again" button.
- `system` — our code or an unsupported API. Not retryable; recorded for follow-up.
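A hedged sketch of what a `user_action` failure might look like when fetching a job — the overall response shape and the `errorClass` field name are assumptions; only `fixHint` and `consoleUrl` come from the description above:

```bash
# Fetch a failed remediation job; rem_42 is a placeholder ID.
# The commented response shape is illustrative: "errorClass" is an assumed field name,
# while fixHint and consoleUrl are the fields described in the failure model above.
curl https://zopnight.com/api/remediation-jobs/rem_42 \
  -H "Authorization: Bearer <token>"
# {
#   "status": "failed",
#   "errorClass": "user_action",
#   "fixHint": "Grant the connected role permission to stop the instance",
#   "consoleUrl": "https://console.aws.amazon.com/iam/..."
# }
```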
Approval-required notifications are routed to org admins via the notification service (remediation.approval_required template). Terminal success/failure emails are gated behind the REMEDIATION_NOTIFY_TERMINAL environment variable to avoid noise during high-volume remediation runs.
VPC Peering on Architecture
VPC peering connections are first-class on the Architecture canvas alongside transit gateways and interconnects. When you provision a cluster or datastore in a peered VPC, the link is rendered automatically and click-throughs land on the discovered peering connection's detail.
See Resources for the full discovered-resource model and Deployment Spaces for cluster registration.