Every account has a running capacity, given by the number of processing engines available to run jobs. Each engine processes one input item at a time, so the number of available engines determines the maximum number of input items a job can process in parallel. Inputs that are not being processed wait in the input queue until an engine picks them up.
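The relationship between engine count and parallelism can be illustrated with a small simulation. This is a minimal sketch, not the platform's scheduler: the function name `simulate`, the first-in-first-out queue order, and the per-item durations are all assumptions made for illustration.

```python
import heapq
from collections import deque


def simulate(durations, engines):
    """Return a (start, finish) pair for each input item, in input order.

    Assumptions (hypothetical, for illustration only): each engine handles
    one item at a time, and waiting items are served first-in-first-out.
    """
    if engines < 1:
        raise ValueError("at least one engine is required")
    queue = deque(enumerate(durations))
    # Min-heap holding the time at which each engine next becomes free.
    free_at = [0.0] * engines
    heapq.heapify(free_at)
    schedule = [None] * len(durations)
    while queue:
        index, duration = queue.popleft()
        start = heapq.heappop(free_at)  # earliest available engine
        finish = start + duration
        schedule[index] = (start, finish)
        heapq.heappush(free_at, finish)
    return schedule


# Five items of equal length on two engines: two run at once, the rest wait.
schedule = simulate([2, 2, 2, 2, 2], engines=2)
```

With two engines, the fifth item cannot start until both earlier pairs have finished. With five engines, every item would start immediately.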
Set a model version’s processing capacity to control how many of the account’s processing engines that model can use. If all the processing engines are busy running models, new job requests wait in the queue until engines become available again.
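The interaction between the account's capacity and a model version's capacity can be sketched as a simple bookkeeping class. Everything here is hypothetical: the `Account` class, its method names, and the model identifiers are illustrative assumptions, not part of any real API.

```python
class Account:
    """Toy model of account-level and per-model engine limits (hypothetical)."""

    def __init__(self, total_engines):
        self.total_engines = total_engines
        self.capacity = {}  # model version -> maximum engines it may use
        self.in_use = {}    # model version -> engines it is currently using

    def set_capacity(self, model_version, max_engines):
        self.capacity[model_version] = max_engines
        self.in_use.setdefault(model_version, 0)

    def free_engines(self):
        return self.total_engines - sum(self.in_use.values())

    def try_acquire(self, model_version):
        """Claim one engine; return False if the request has to wait."""
        under_cap = self.in_use[model_version] < self.capacity[model_version]
        if under_cap and self.free_engines() > 0:
            self.in_use[model_version] += 1
            return True
        return False

    def release(self, model_version):
        """Return one engine to the account when an item finishes."""
        if self.in_use[model_version] > 0:
            self.in_use[model_version] -= 1


account = Account(total_engines=3)
account.set_capacity("model-a", 2)  # placeholder identifiers
account.set_capacity("model-b", 2)

# model-a is capped at 2 engines even though the account has 3.
granted_a = [account.try_acquire("model-a") for _ in range(3)]
# model-b gets the one remaining engine, then has to wait.
granted_b = [account.try_acquire("model-b") for _ in range(2)]
```

In this sketch a request can be held back for two different reasons: the model version has reached its own capacity, or the account has no free engines left. Once `release` is called, a waiting request can succeed on its next attempt.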
Nodes
Nodes are virtual or physical machines that provide the resources needed to run processing engines. Each node has a set amount of CPU, GPU, and memory, as well as a maximum number of processing engines that can be scheduled onto it. See the Kubernetes documentation for more details. The resources available on a node include: