Node Pool

This feature is only supported in Crafting Enterprise Edition (self-hosted).

A Node Pool defines a group of compute nodes with a specific hardware configuration (CPU, memory, storage, GPU) in the underlying cloud provider. Node pools supply the compute resources for sandbox workloads that have special requirements.

Node pools are created and managed via the System Admin Dashboard.

Selector Names

Workloads are matched to node pools using Selector Names: arbitrary strings assigned to a node pool. Each node pool must either be assigned at least one selector name or be marked as default.

The default node pool is matched with workloads that do not specify a selector name in their schedule_spec.
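The matching rule can be sketched in Python as below. This is a minimal illustration, not Crafting's scheduler: the `NodePool` class and the `matching_pools` function are hypothetical. Only the rule itself comes from this page: a selector name matches every pool that lists it, and a workload without a selector name matches the default pool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodePool:
    # Hypothetical model of a node pool; field names are illustrative only.
    name: str
    selector_names: set = field(default_factory=set)
    is_default: bool = False


def matching_pools(pools: list, selector: Optional[str]) -> list:
    """Return the node pools eligible for a workload's schedule_spec selector name."""
    if selector is None:
        # No selector name in schedule_spec: only default pools match.
        return [p for p in pools if p.is_default]
    return [p for p in pools if selector in p.selector_names]


pools = [
    NodePool("general", is_default=True),
    NodePool("dev-large-pool", {"dev", "dev-large"}),
]
print([p.name for p in matching_pools(pools, "dev-large")])  # ['dev-large-pool']
print([p.name for p in matching_pools(pools, None)])  # ['general']
```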

Example: a node pool with selector names dev and dev-large can be targeted by workloads using either:

schedule_spec:
  selector:
    name: dev

or

schedule_spec:
  selector:
    name: dev-large

Multiple node pools can share the same selector name, making them backups of each other. If one node pool fails to scale out, another with the same selector name can handle the workloads.
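A minimal sketch of this backup behavior, assuming the scheduler simply tries each node pool that shares the selector name in a fixed order until one can provide capacity. The ordering, the `schedule_with_backup` function, and the `try_scale_out` callback are all hypothetical; this page does not specify how Crafting chooses among pools with the same selector name.

```python
from typing import Callable, Dict, List, Optional


def schedule_with_backup(
    selector: str,
    pools_by_selector: Dict[str, List[str]],
    try_scale_out: Callable[[str], bool],
) -> Optional[str]:
    """Try each pool sharing `selector` in order; return the first one that scales out."""
    for pool in pools_by_selector.get(selector, []):
        # Hypothetical callback: True if this pool could add capacity.
        if try_scale_out(pool):
            return pool
    return None  # No pool with this selector name could provide capacity.


pools = {"gpu-inference": ["gpu-pool-a", "gpu-pool-b"]}
# Simulate gpu-pool-a failing to scale out; the backup pool takes the workload.
print(schedule_with_backup("gpu-inference", pools, lambda p: p != "gpu-pool-a"))  # gpu-pool-b
```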

Best practice: Express selector names as purposes (e.g. dev, gpu-inference, deps-small) rather than using node pool names directly. This decouples workload definitions from infrastructure specifics.

Workload Kinds

A node pool can restrict which kinds of workloads it accepts.

As a best practice, create separate default node pools for workspaces and for dependencies/containers, as they have significantly different resource requirements.
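A sketch of kind-based filtering with separate default pools for workspaces and for dependencies/containers. The kind names `workspace`, `dependency`, and `container` are assumptions inferred from the recommendation above, and the `Pool` class and `default_pool_for` function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class WorkloadKind(Enum):
    # Hypothetical kind names, inferred from the workspace vs. dependency/container split.
    WORKSPACE = "workspace"
    DEPENDENCY = "dependency"
    CONTAINER = "container"


@dataclass(frozen=True)
class Pool:
    name: str
    is_default: bool
    accepted_kinds: FrozenSet[WorkloadKind]


def default_pool_for(kind: WorkloadKind, pools: List[Pool]) -> Optional[str]:
    """Return the first default pool that accepts this workload kind."""
    for pool in pools:
        if pool.is_default and kind in pool.accepted_kinds:
            return pool.name
    return None


pools = [
    Pool("workspaces-default", True, frozenset({WorkloadKind.WORKSPACE})),
    Pool("deps-default", True, frozenset({WorkloadKind.DEPENDENCY, WorkloadKind.CONTAINER})),
]
print(default_pool_for(WorkloadKind.CONTAINER, pools))  # deps-default
```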

Boot Disk and Swap

A boot disk size must be configured for the nodes in every pool. In addition, Crafting requires swap to be enabled on the underlying nodes, using either a swap file or a swap device (the latter when the cloud provider supports locally attached volumes).

Without swap, Dynamic Resource Control is disabled: workloads are killed when physical memory comes under pressure instead of being managed gracefully via swap.
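To confirm that a Linux node actually has swap enabled, one option is to read `SwapTotal` from `/proc/meminfo`, which reports `0 kB` when no swap file or device is active. The sketch below is a generic Linux check, not a Crafting tool; the function names are illustrative.

```python
def swap_total_kib(meminfo_text: str) -> int:
    """Return SwapTotal (in KiB) parsed from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo")


def node_has_swap(path: str = "/proc/meminfo") -> bool:
    """True if the local Linux node reports any active swap."""
    with open(path) as f:
        return swap_total_kib(f.read()) > 0


sample = "MemTotal:       16303912 kB\nSwapTotal:       2097148 kB\n"
print(swap_total_kib(sample))  # 2097148
```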

Max Workloads per Node

This value limits the number of workloads that can be scheduled on a single node. It is critical to the auto-scaling algorithm: the scheduler uses it to determine when nodes are at capacity and when new nodes need to be provisioned.
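Counting only workloads (and ignoring CPU and memory), the minimum number of nodes for a given load is a ceiling division by this limit. This is a simplified sketch; the real scheduler may weigh other factors, and `nodes_required` and `node_is_full` are hypothetical helpers.

```python
def nodes_required(workload_count: int, max_workloads_per_node: int) -> int:
    """Minimum node count when each node holds at most max_workloads_per_node workloads."""
    if max_workloads_per_node <= 0:
        raise ValueError("max_workloads_per_node must be positive")
    # Integer ceiling division: a partially filled node still counts as a whole node.
    return (workload_count + max_workloads_per_node - 1) // max_workloads_per_node


def node_is_full(scheduled: int, max_workloads_per_node: int) -> bool:
    """A node at its limit cannot accept more workloads, so new ones need another node."""
    return scheduled >= max_workloads_per_node


print(nodes_required(25, 10))  # 3
print(node_is_full(10, 10))  # True
```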

Auto Scaling

Crafting uses a custom auto-scaling algorithm designed for the needs of development environments, where developers expect near-instant sandbox readiness.

Pre-Scaling Policy

Waiting until nodes are full before scaling out leads to delays when new sandboxes are created. Crafting pre-scales using two parameters:

The total number of nodes can be capped with a max node count to control costs.
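The sketch below illustrates the general idea of pre-scaling under a cost cap. The two pre-scaling parameters Crafting uses are not named on this page, so `spare_slots` here is a hypothetical headroom setting and should not be read as either of them; only the max node cap corresponds to a setting described above.

```python
def target_node_count(
    running_workloads: int,
    max_workloads_per_node: int,
    spare_slots: int,  # Hypothetical: free workload slots to keep ready in advance.
    max_nodes: int,  # Cap on the total node count, to control costs.
) -> int:
    """Nodes to provision so that `spare_slots` free slots exist, capped at max_nodes."""
    needed_slots = running_workloads + spare_slots
    # Ceiling division: a partially used node still counts as a whole node.
    nodes = (needed_slots + max_workloads_per_node - 1) // max_workloads_per_node
    return min(nodes, max_nodes)


print(target_node_count(25, 10, 8, 10))  # 4: 33 slots needed
print(target_node_count(95, 10, 20, 10))  # 10: would need 12, capped
```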

Different values can be configured for different time windows to better match actual usage patterns.
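Selecting settings per time window might look like the following sketch. The `Window` class, its fields (including the hypothetical `spare_slots`), and the fallback behavior are assumptions for illustration; windows that cross midnight are handled explicitly.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass(frozen=True)
class Window:
    # Hypothetical schedule entry: settings apply from start (inclusive) to end (exclusive).
    start: time
    end: time
    spare_slots: int
    max_nodes: int


def active_window(windows: List[Window], now: time, fallback: Window) -> Window:
    """Return the first window containing `now`, or `fallback` if none does."""
    for w in windows:
        if w.start <= w.end:
            inside = w.start <= now < w.end
        else:
            # Window wraps past midnight, e.g. 20:00 to 06:00.
            inside = now >= w.start or now < w.end
        if inside:
            return w
    return fallback


working_hours = Window(time(8), time(18), spare_slots=20, max_nodes=50)
overnight = Window(time(20), time(6), spare_slots=2, max_nodes=10)
default = Window(time(0), time(0), spare_slots=5, max_nodes=20)
print(active_window([working_hours, overnight], time(23), default).max_nodes)  # 10
```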

Scale-In

The auto-scaler continuously releases nodes that have no scheduled workloads. A more aggressive scale-in mode can be enabled for specific time windows. In this mode, running workloads are rescheduled onto fewer nodes and the vacated nodes are released. Because rescheduling stops workloads and restarts them on a different node, affected workloads are briefly unavailable.

Recommendation: Disable aggressive scale-in during working hours and enable it only during off-hours.

Workloads from sandboxes whose Mission Critical state is active are never rescheduled. See Auto Suspension.
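The two scale-in modes can be sketched as a planning function. This is an illustrative greedy heuristic (vacate the least-loaded nodes first, placing each moved workload on the fullest node that still has room), not Crafting's actual algorithm. Capacity is modeled only as the max workloads per node, and `plan_scale_in` and its data layout are hypothetical. Nodes hosting a Mission Critical workload are never vacated, since such workloads are never rescheduled.

```python
from typing import Dict, List, Set, Tuple


def plan_scale_in(
    nodes: Dict[str, List[Tuple[str, bool]]],  # node -> [(workload, mission_critical)]
    capacity: int,  # max workloads per node
    aggressive: bool,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (nodes to release, {workload: destination node})."""
    load = {n: len(ws) for n, ws in nodes.items()}
    released: List[str] = []
    receivers: Set[str] = set()
    moves: Dict[str, str] = {}
    # Nodes hosting a Mission Critical workload are never vacated.
    candidates = sorted(
        (n for n, ws in nodes.items() if not any(critical for _, critical in ws)),
        key=lambda n: load[n],
    )
    for node in candidates:
        if load[node] > 0 and not aggressive:
            continue  # Default mode only releases nodes with no workloads.
        if node in receivers:
            continue  # Keep nodes that received workloads, to avoid moving them twice.
        targets = [n for n in nodes if n != node and n not in released]
        if sum(capacity - load[n] for n in targets) < load[node]:
            continue  # Not enough free slots elsewhere to vacate this node.
        for workload, _ in nodes[node]:
            # Rescheduling: place the workload on the fullest target with room.
            dest = max((n for n in targets if load[n] < capacity), key=lambda n: load[n])
            moves[workload] = dest
            load[dest] += 1
            receivers.add(dest)
        load[node] = 0
        released.append(node)
    return released, moves


cluster = {
    "n1": [("a", False)],
    "n2": [("b", False), ("c", False)],
    "n3": [("d", True), ("e", False)],  # "d" belongs to a Mission Critical sandbox
    "n4": [],
}
print(plan_scale_in(cluster, 4, aggressive=False))  # (['n4'], {})
print(plan_scale_in(cluster, 4, aggressive=True))  # (['n4', 'n1'], {'a': 'n2'})
```

Packing onto the fullest node keeps other nodes emptier, so they are more likely to become releasable later.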

Disabling Auto Scaling

Auto scaling can be disabled for a node pool; in that case, a fixed node count must be specified explicitly.
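The configuration rules described on this page can be collected into a validation sketch. The `NodePoolConfig` fields (such as `boot_disk_gb`) and the `validate` function are hypothetical and do not reflect the System Admin Dashboard's actual form or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodePoolConfig:
    # Hypothetical fields mirroring the settings described on this page.
    name: str
    selector_names: List[str] = field(default_factory=list)
    is_default: bool = False
    boot_disk_gb: Optional[int] = None
    autoscaling_enabled: bool = True
    fixed_node_count: Optional[int] = None


def validate(cfg: NodePoolConfig) -> List[str]:
    """Return a list of rule violations; an empty list means the config passes."""
    errors = []
    if not cfg.selector_names and not cfg.is_default:
        errors.append("needs at least one selector name or must be marked default")
    if cfg.boot_disk_gb is None or cfg.boot_disk_gb <= 0:
        errors.append("boot disk size must be configured")
    if not cfg.autoscaling_enabled and cfg.fixed_node_count is None:
        errors.append("fixed node count is required when auto scaling is disabled")
    return errors


print(validate(NodePoolConfig("workspaces-default", is_default=True, boot_disk_gb=100)))  # []
```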

See Also