Split Runners Registry

1. Usage

The Split Runners Registry manages and deploys split runner servers. It supports registering, querying, updating, and deleting split runner instances, which run model split APIs on Kubernetes clusters.

Steps to Use the Registry:

  1. Create a Split Runner
    Use the POST /split-runner endpoint to create a new Split Runner instance. This will:
      - Deploy a new split runner server to the Kubernetes cluster using the provided cluster_k8s_config.
      - Register the split runner instance in the registry.

  2. Get a Split Runner
    Use the GET /split-runner/{runner_id} endpoint to retrieve details of an existing split runner instance by its ID.

  3. Update a Split Runner
    Use the PUT /split-runner/{runner_id} endpoint to update an existing split runner's details.

  4. Delete a Split Runner
    Use the DELETE /split-runner/{runner_id} endpoint to delete a split runner instance from the registry and Kubernetes.

  5. Query Split Runners
    Use the POST /split-runners endpoint to query and filter split runners based on specific criteria.
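The five operations above map onto five HTTP requests. The following is a minimal Python sketch of a thin client that only builds those requests with the standard library. The class name SplitRunnerRegistryClient, its method names, and the base URL are illustrative assumptions and not part of the registry itself.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


class SplitRunnerRegistryClient:
    """Hypothetical helper that builds requests for the registry endpoints."""

    def __init__(self, base_url: str) -> None:
        # base_url is an assumption, e.g. "http://localhost:8001"
        self.base_url = base_url.rstrip("/")

    def _request(self, method: str, path: str,
                 body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        # Only requests with a body carry a JSON payload and content type.
        data = json.dumps(body).encode("utf-8") if body is not None else None
        headers = {"Content-Type": "application/json"} if body is not None else {}
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    def _runner_path(self, runner_id: str) -> str:
        # Percent-encode the ID so it is safe to place in the URL path.
        return "/split-runner/" + urllib.parse.quote(runner_id, safe="")

    def create(self, spec: Dict[str, Any]) -> urllib.request.Request:
        return self._request("POST", "/split-runner", spec)

    def get(self, runner_id: str) -> urllib.request.Request:
        return self._request("GET", self._runner_path(runner_id))

    def update(self, runner_id: str, update: Dict[str, Any]) -> urllib.request.Request:
        return self._request("PUT", self._runner_path(runner_id), update)

    def delete(self, runner_id: str) -> urllib.request.Request:
        return self._request("DELETE", self._runner_path(runner_id))

    def query(self, query_filter: Dict[str, Any]) -> urllib.request.Request:
        return self._request("POST", "/split-runners", query_filter)


def send(request: urllib.request.Request) -> Any:
    """Send a built request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


client = SplitRunnerRegistryClient("http://localhost:8001")
req = client.get("runner-id")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the endpoint mapping easy to inspect without a running registry; send(client.get("runner-id")) would perform the actual call.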


2. API Documentation

2.1 POST /split-runner

Description:
Creates a new Split Runner instance and deploys it to the Kubernetes cluster.

Request Body:

{
  "cluster_k8s_config": { /* Kubernetes config dict */ },
  "split_runner_public_host": "<server-url>",
  "split_runner_metadata": { "key": "value" },
  "split_runner_tags": ["tag1", "tag2"]
}

Response:

{
  "success": true,
  "data": {
    "message": "SplitRunner created",
    "id": "runner-id"
  }
}

cURL Command:

curl -X POST http://<server-url>:8001/split-runner \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_k8s_config": { /* Kubernetes config dict */ },
    "split_runner_public_host": "192.168.0.106",
    "split_runner_metadata": {"key": "value"},
    "split_runner_tags": ["tag1", "tag2"]
  }'
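Every response shown in this document wraps its payload in an envelope with success and data fields. As a rough sketch, a caller might unwrap that envelope as follows. The shape of a failed response is not documented here, so treating "success": false as an error is an assumption, and RegistryError and unwrap are hypothetical names.

```python
import json
from typing import Any


class RegistryError(RuntimeError):
    """Hypothetical error raised when the envelope does not report success."""


def unwrap(response_text: str) -> Any:
    """Return the "data" payload of a registry response envelope."""
    envelope = json.loads(response_text)
    # Assumption: a failed call sets "success" to false (or omits it).
    if not envelope.get("success"):
        raise RegistryError(f"registry call failed: {envelope!r}")
    return envelope.get("data")


created = unwrap(
    '{"success": true, "data": {"message": "SplitRunner created", "id": "runner-id"}}'
)
runner_id = created["id"]  # ID to use with the /split-runner/{runner_id} endpoints
print(runner_id)
```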

2.2 GET /split-runner/{runner_id}

Description:
Retrieves a Split Runner instance by its ID.

Response:

{
  "success": true,
  "data": {
    "split_runner_id": "runner-id",
    "split_runner_public_url": "http://split-runner-url",
    "split_runner_metadata": { "key": "value" },
    "split_runner_public_host": "<server-url>",
    "split_runner_tags": ["tag1", "tag2"]
  }
}

cURL Command:

curl -X GET http://<server-url>:8001/split-runner/runner-id

2.3 PUT /split-runner/{runner_id}

Description:
Updates the details of a Split Runner instance.

Request Body:

{
  "$set": {
    "split_runner_metadata.version": "2.0"
  },
  "$addToSet": {
    "split_runner_tags": "auto-scaled"
  }
}

Response:

{
  "success": true,
  "data": {
    "message": "SplitRunner updated"
  }
}

cURL Command:

curl -X PUT http://<server-url>:8001/split-runner/runner-id \
  -H "Content-Type: application/json" \
  -d '{
    "$set": {
      "split_runner_metadata.version": "2.0"
    },
    "$addToSet": {
      "split_runner_tags": "auto-scaled"
    }
  }'
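The operator names in the request body match MongoDB's $set and $addToSet update operators. Assuming the registry applies them with those semantics (an assumption, since the backing store is not stated here), the effect on a stored runner document can be sketched locally as below. apply_update is a hypothetical helper that covers only these two operators.

```python
import copy
from typing import Any, Dict, Tuple


def _parent_and_leaf(doc: Dict[str, Any], dotted_path: str) -> Tuple[Dict[str, Any], str]:
    """Walk a dotted path, creating intermediate dicts, and return (parent, last key)."""
    *parents, leaf = dotted_path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    return node, leaf


def apply_update(doc: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with $set and $addToSet applied (assumed semantics)."""
    result = copy.deepcopy(doc)
    # $set: overwrite (or create) the field at each dotted path.
    for path, value in update.get("$set", {}).items():
        parent, leaf = _parent_and_leaf(result, path)
        parent[leaf] = value
    # $addToSet: append to the array only if the value is not already present.
    for path, value in update.get("$addToSet", {}).items():
        parent, leaf = _parent_and_leaf(result, path)
        items = parent.setdefault(leaf, [])
        if value not in items:
            items.append(value)
    return result


runner = {
    "split_runner_metadata": {"key": "value"},
    "split_runner_tags": ["tag1", "tag2"],
}
update = {
    "$set": {"split_runner_metadata.version": "2.0"},
    "$addToSet": {"split_runner_tags": "auto-scaled"},
}
updated = apply_update(runner, update)
print(updated)
```

Under these assumed semantics the update is idempotent: sending the same body twice leaves a single "auto-scaled" tag and the same version value.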

2.4 DELETE /split-runner/{runner_id}

Description:
Deletes a Split Runner instance by its ID.

Response:

{
  "success": true,
  "data": {
    "message": "SplitRunner deleted"
  }
}

cURL Command:

curl -X DELETE http://<server-url>:8001/split-runner/runner-id

2.5 POST /split-runners

Description:
Queries split runners based on a filter.

Request Body:

{
  "split_runner_metadata.framework": "transformers",
  "split_runner_tags": { "$in": ["llm"] }
}

Response:

{
  "success": true,
  "data": [
    {
      "split_runner_id": "runner-id",
      "split_runner_public_url": "http://host:32286",
      "split_runner_metadata": { "framework": "transformers" },
      "split_runner_public_host": "<server-url>",
      "split_runner_tags": ["llm", "split"]
    }
  ]
}

cURL Command:

curl -X POST http://<server-url>:8001/split-runners \
  -H "Content-Type: application/json" \
  -d '{
    "split_runner_metadata.framework": "transformers",
    "split_runner_tags": { "$in": ["llm"] }
  }'
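The filter uses a dotted field path and an $in operator, which resemble MongoDB query syntax. Assuming MongoDB-style matching (an assumption, not confirmed by this document), the following sketch shows which runner documents such a filter would select. matches is a hypothetical helper that handles only plain equality and $in.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel for "field not present"


def _lookup(doc: Dict[str, Any], dotted_path: str) -> Any:
    """Resolve a dotted path such as "split_runner_metadata.framework"."""
    node: Any = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def matches(doc: Dict[str, Any], query_filter: Dict[str, Any]) -> bool:
    """Return True if doc satisfies every condition in the filter (assumed semantics)."""
    for path, condition in query_filter.items():
        actual = _lookup(doc, path)
        if actual is _MISSING:
            return False
        if isinstance(condition, dict) and "$in" in condition:
            # For array fields, one element in the allowed list is enough.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(item in condition["$in"] for item in candidates):
                return False
        elif actual != condition:
            return False
    return True


runners = [
    {"split_runner_id": "a",
     "split_runner_metadata": {"framework": "transformers"},
     "split_runner_tags": ["llm", "split"]},
    {"split_runner_id": "b",
     "split_runner_metadata": {"framework": "transformers"},
     "split_runner_tags": ["vision"]},
]
query_filter = {
    "split_runner_metadata.framework": "transformers",
    "split_runner_tags": {"$in": ["llm"]},
}
selected = [r["split_runner_id"] for r in runners if matches(r, query_filter)]
print(selected)
```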

Model Layers Registry

The Model Layers Registry stores information about the individual model layers produced by model splitting. It enables reuse by tracking layer hashes, which blocks can reference in their metadata. This lets the system detect whether a block is already running with an existing split, so layers can be shared instead of being created again.

To ensure uniqueness and consistency, the MD5 hash of each model layer must be computed and stored as the model_layer_hash. This hash serves as the primary key in the model layers registry and is the basis for identifying and matching reusable layers across different model instantiations.
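A minimal sketch of computing such a hash with Python's hashlib is shown below. How a layer is serialized is not specified here, so the byte stream is a stand-in, and compute_model_layer_hash is a hypothetical helper name. The hash is only useful for matching if the serialization is deterministic, which is assumed.

```python
import hashlib
import io
from typing import BinaryIO


def compute_model_layer_hash(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a serialized layer, read in chunks."""
    digest = hashlib.md5()
    # Reading in chunks avoids loading a large layer file into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a serialized layer; a real caller would open the layer file in "rb" mode.
layer_bytes = io.BytesIO(b"abc")
model_layer_hash = compute_model_layer_hash(layer_bytes)
print(model_layer_hash)  # 900150983cd24fb0d6963f7d28e17f72
```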

Model Layers Registry schema:

from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class ModelLayerObject:
    model_layer_hash: str = ''
    model_asset_id: str = ''
    model_component_registry_uri: str = ''
    model_layer_public_url: str = ''
    model_layer_metadata: List[Dict[str, Any]] = field(default_factory=list)
    model_layer_rank: int = 0
    model_world_size: int = 0

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model_layer_hash | string | yes | Primary Key. MD5 hash of the serialized model layer. Used for uniqueness. |
| model_asset_id | string | yes | ID of the original model asset from which this layer was generated. |
| model_component_registry_uri | string | yes | URI pointing to the component spec or metadata used for this model layer. |
| model_layer_public_url | string | yes | Public URL where the layer artifact is hosted (e.g., S3, HTTP). |
| model_layer_metadata | array[dict] | no | Arbitrary metadata attached to this layer (e.g., shape, precision, config). |
| model_layer_rank | integer | yes | Rank/index of this layer in the full model pipeline (used in splits). |
| model_world_size | integer | yes | Total number of model splits (parallel components) this layer belongs to. |
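
The schema above can be exercised with a small in-memory sketch that validates a record and looks up an existing layer by hash before creating a new one. The registry's real storage and API are not described in this section, so the dict-based registry, validate, and find_reusable are hypothetical. The check that ranks are 0-based and smaller than model_world_size is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelLayerObject:
    model_layer_hash: str = ''
    model_asset_id: str = ''
    model_component_registry_uri: str = ''
    model_layer_public_url: str = ''
    model_layer_metadata: List[Dict[str, Any]] = field(default_factory=list)
    model_layer_rank: int = 0
    model_world_size: int = 0


# String fields marked "yes" in the Required column of the schema table.
REQUIRED_STRING_FIELDS = (
    "model_layer_hash",
    "model_asset_id",
    "model_component_registry_uri",
    "model_layer_public_url",
)


def validate(layer: ModelLayerObject) -> List[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = [f"{name} is required"
                for name in REQUIRED_STRING_FIELDS if not getattr(layer, name)]
    # Assumption: ranks are 0-based and must be below model_world_size.
    if not 0 <= layer.model_layer_rank < layer.model_world_size:
        problems.append("model_layer_rank must be in [0, model_world_size)")
    return problems


def find_reusable(registry: Dict[str, ModelLayerObject],
                  layer_hash: str) -> Optional[ModelLayerObject]:
    """Look up an already registered layer by its primary key, the MD5 hash."""
    return registry.get(layer_hash)


layer = ModelLayerObject(
    model_layer_hash="900150983cd24fb0d6963f7d28e17f72",
    model_asset_id="example-asset",                       # illustrative value
    model_component_registry_uri="registry://example",    # illustrative value
    model_layer_public_url="http://example.com/layer-0",  # illustrative value
    model_layer_rank=0,
    model_world_size=2,
)
registry: Dict[str, ModelLayerObject] = {}
if not validate(layer) and find_reusable(registry, layer.model_layer_hash) is None:
    registry[layer.model_layer_hash] = layer  # first time this layer is seen

reused = find_reusable(registry, "900150983cd24fb0d6963f7d28e17f72")
print(reused is layer)
```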