Node Pool
This feature is only supported in Crafting Enterprise Edition (self-hosted).
A Node Pool defines a group of compute nodes with specific hardware configuration (CPU, memory, storage, GPU) in the underlying cloud provider. Node pools provide the resources for sandbox workloads that have special requirements.
Node pools are created and managed via the System Admin Dashboard.
Selector Names
Workloads are matched to node pools using Selector Names — arbitrary strings assigned to a node pool. A node pool must be assigned at least one selector name, or be marked as default.
The default node pool is matched with workloads that do not specify a selector name in their schedule_spec.
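As a minimal illustration (the empty spec here is only a sketch; real workload specs would carry other fields), a workload whose schedule_spec contains no selector falls through to the default node pool:

```yaml
# No selector block: this workload is scheduled on the default node pool.
schedule_spec: {}
```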
Example: a node pool with selector names dev and dev-large can be targeted by workloads using either:

schedule_spec:
  selector:
    name: dev

or

schedule_spec:
  selector:
    name: dev-large
Multiple node pools can share the same selector name, making them backups of each other. If one node pool fails to scale out, another with the same selector name can handle the workloads.
Best practice: Express selector names as purposes (e.g. dev, gpu-inference, deps-small) rather than using node pool names directly. This decouples workload definitions from infrastructure specifics.
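Under this convention, a GPU-equipped node pool might carry the selector name gpu-inference, and workloads target the purpose rather than the pool:

```yaml
schedule_spec:
  selector:
    name: gpu-inference   # purpose-based name; the backing node pool can change freely
```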
Workload Kinds
A node pool can restrict which kinds of workloads it accepts:
- Workspaces only
- Dependencies and Containers only
- All workload kinds
As a best practice, create separate default node pools for workspaces and for dependencies/containers, as they have significantly different resource requirements.
Boot Disk and Swap
Every node in a pool requires a configured boot disk size. Additionally, Crafting requires swap to be enabled on the underlying nodes, using either a swap file or a swap device (when local attached volumes are supported by the cloud provider).
Without swap, Dynamic Resource Control is disabled. Workloads will be killed when physical memory is under pressure rather than being managed gracefully via swap.
Max Workloads per Node
This value limits the number of workloads that can be scheduled on a single node. It is a critical configuration for the auto-scaling algorithm — the scheduler uses it to determine when nodes are at capacity and new nodes need to be provisioned.
Auto Scaling
Crafting uses a custom auto-scaling algorithm designed for the needs of development environments, where developers expect near-instant sandbox readiness.
Pre-Scaling Policy
Waiting until nodes are full before scaling out leads to delays when new sandboxes are created. Crafting pre-scales using two parameters:
- Minimum available workloads: The number of additional workloads that can be scheduled immediately. The auto-scaler provisions enough nodes to maintain this headroom.
- Minimum nodes: Ensures a minimum number of nodes are always available, regardless of current workload count or resource usage.
The total number of nodes can be capped with a max node count to control costs.
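The interaction of these parameters with the max-workloads-per-node limit can be sketched as a simplified model (this is illustrative only, not Crafting's actual algorithm; all names are hypothetical):

```python
import math

def desired_node_count(current_workloads: int,
                       max_workloads_per_node: int,
                       min_available_workloads: int,
                       min_nodes: int,
                       max_nodes: int) -> int:
    """Nodes needed so that `min_available_workloads` additional workloads
    could be scheduled immediately, clamped to [min_nodes, max_nodes]."""
    needed_capacity = current_workloads + min_available_workloads
    nodes = math.ceil(needed_capacity / max_workloads_per_node)
    return min(max(nodes, min_nodes), max_nodes)

# 17 running workloads, 8 workloads per node, headroom for 5 more:
# ceil(22 / 8) = 3 nodes, raised to min_nodes if larger, capped at max_nodes.
print(desired_node_count(17, 8, 5, 2, 10))  # -> 3
```

Note how the max node count acts purely as a cost cap: even if the headroom target cannot be met, the pool never grows past it.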
These parameters can be given different values for different time windows to better match actual usage patterns.
Scale-In
The auto-scaler continuously releases nodes that have no scheduled workloads. A more aggressive scale-in mode can be enabled for specific time windows — this reschedules running workloads onto fewer nodes, releasing the vacated nodes. Rescheduling involves stopping workloads and restarting them on a different node, causing a brief period of unavailability.
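The gain from aggressive scale-in can be illustrated with a simplified consolidation model (illustrative only; the actual rescheduling logic is not described here):

```python
import math

def nodes_to_release(workloads_per_node: list[int],
                     max_workloads_per_node: int) -> int:
    """How many nodes could be vacated by packing all running
    workloads onto as few nodes as possible."""
    total = sum(workloads_per_node)
    in_use = sum(1 for w in workloads_per_node if w > 0)
    needed = math.ceil(total / max_workloads_per_node)
    return max(in_use - needed, 0)

# Four nodes, each half full (4 of 8 slots): the 16 workloads fit on
# 2 nodes, so 2 nodes can be drained and released.
print(nodes_to_release([4, 4, 4, 4], 8))  # -> 2
```

Each workload moved this way incurs the stop/restart gap described above, which is why this mode is best confined to off-hours.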
Recommendation: Disable aggressive scale-in during working hours and enable it only during off-hours.
Workloads from sandboxes in the Mission Critical state are never rescheduled. See Auto Suspension.
Disabling Auto Scaling
Auto scaling can be disabled for a node pool; in that case, a fixed node count must be specified explicitly.
See Also
- Schedule Specification — how workloads specify node pool requirements
- Dynamic Resource Control — memory and swap management within nodes
- System Admin Dashboard — creating and configuring node pools
- Auto Suspension — pinned and mission critical sandboxes
- Multi Region and Multi Cloud — node pools across multiple regions