Split Runners Registry
1. Usage
The Split Runners Registry is designed to manage and deploy split runner servers. It allows the registration, querying, updating, and deletion of split runner instances, which run model split APIs on Kubernetes clusters.
Steps to Use the Registry:
- Create a Split Runner
  Use the POST /split-runner endpoint to create a new split runner instance. This will:
  - deploy a new split runner server to the Kubernetes cluster using the provided cluster_k8s_config;
  - register the split runner instance in the registry.
- Get a Split Runner
  Use the GET /split-runner/{runner_id} endpoint to retrieve the details of an existing split runner instance by its ID.
- Update a Split Runner
  Use the PUT /split-runner/{runner_id} endpoint to update an existing split runner's details.
- Delete a Split Runner
  Use the DELETE /split-runner/{runner_id} endpoint to delete a split runner instance from both the registry and the Kubernetes cluster.
- Query Split Runners
  Use the POST /split-runners endpoint to query and filter split runners based on specific criteria.
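The five operations above map onto plain HTTP calls. The following is a minimal sketch of a client that uses only the Python standard library. It assumes the server listens at http://<server-url>:8001 as in the cURL examples of the API documentation; the class and method names (SplitRunnerClient, create, get, update, delete, query, send) are hypothetical and not part of the registry's own code.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


class SplitRunnerClient:
    """Hypothetical stdlib-only client for the Split Runners Registry (sketch)."""

    def __init__(self, base_url: str) -> None:
        # e.g. "http://<server-url>:8001"; a trailing slash is dropped for clean joins
        self.base_url = base_url.rstrip("/")

    def _build(self, method: str, path: str,
               body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        # JSON-encode the body (if any) and set the matching Content-Type header
        data = None if body is None else json.dumps(body).encode("utf-8")
        headers = {} if body is None else {"Content-Type": "application/json"}
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    @staticmethod
    def _runner_path(runner_id: str) -> str:
        # Percent-encode the ID so characters such as "/" cannot alter the path
        return "/split-runner/" + urllib.parse.quote(runner_id, safe="")

    def create(self, payload: Dict[str, Any]) -> urllib.request.Request:
        return self._build("POST", "/split-runner", payload)

    def get(self, runner_id: str) -> urllib.request.Request:
        return self._build("GET", self._runner_path(runner_id))

    def update(self, runner_id: str, update_doc: Dict[str, Any]) -> urllib.request.Request:
        return self._build("PUT", self._runner_path(runner_id), update_doc)

    def delete(self, runner_id: str) -> urllib.request.Request:
        return self._build("DELETE", self._runner_path(runner_id))

    def query(self, filter_doc: Dict[str, Any]) -> urllib.request.Request:
        return self._build("POST", "/split-runners", filter_doc)

    @staticmethod
    def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
        # Execute the call and decode the JSON envelope ({"success": ..., "data": ...});
        # urlopen raises urllib.error.HTTPError for non-2xx status codes
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

For example, SplitRunnerClient.send(client.get("runner-id")) would issue the same request as the GET cURL command. Keeping request construction separate from sending makes the builders easy to inspect and test without a live server.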
2. API Documentation
2.1 POST /split-runner
Description:
Creates a new Split Runner instance and deploys it to the Kubernetes cluster.
Request Body:
{
"cluster_k8s_config": { /* Kubernetes config dict */ },
"split_runner_public_host": "<server-url>",
"split_runner_metadata": { "key": "value" },
"split_runner_tags": ["tag1", "tag2"]
}
Response:
{
"success": true,
"data": {
"message": "SplitRunner created",
"id": "runner-id"
}
}
cURL Command:
curl -X POST http://<server-url>:8001/split-runner \
-H "Content-Type: application/json" \
-d '{
"cluster_k8s_config": { /* Kubernetes config dict */ },
"split_runner_public_host": "192.168.0.106",
"split_runner_metadata": {"key": "value"},
"split_runner_tags": ["tag1", "tag2"]
}'
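All the responses documented here share the same envelope: a success flag and a data payload. A small helper can unwrap that envelope and fail loudly when success is false. This is a sketch: how the server reports errors is not documented here, so the RegistryError exception and the assumption that a failed call still returns a parsable envelope are both hypothetical.

```python
import json
from typing import Any


class RegistryError(RuntimeError):
    """Hypothetical exception raised when the registry reports failure."""


def unwrap(response_text: str) -> Any:
    # Parse the {"success": ..., "data": ...} envelope used by the registry responses
    envelope = json.loads(response_text)
    if not envelope.get("success", False):
        # Assumption: a failed call still returns an envelope, possibly with details in "data"
        raise RegistryError(f"registry call failed: {envelope.get('data')!r}")
    return envelope.get("data")


# Example: extract the new runner's ID from the POST /split-runner response
created = unwrap('{"success": true, "data": {"message": "SplitRunner created", "id": "runner-id"}}')
runner_id = created["id"]
```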
2.2 GET /split-runner/{runner_id}
Description:
Retrieves a Split Runner instance by its ID.
Response:
{
"success": true,
"data": {
"split_runner_id": "runner-id",
"split_runner_public_url": "http://split-runner-url",
"split_runner_metadata": { "key": "value" },
"split_runner_public_host": "<server-url>",
"split_runner_tags": ["tag1", "tag2"]
}
}
cURL Command:
curl -X GET http://<server-url>:8001/split-runner/runner-id
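On the client side, the data object returned by GET /split-runner/{runner_id} can be loaded into a typed record. The SplitRunner dataclass below is hypothetical: its fields simply mirror the keys in the sample response, and unknown keys are ignored so that a server adding new fields does not break the client.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class SplitRunner:
    # Field names mirror the keys in the GET /split-runner/{runner_id} response
    split_runner_id: str = ""
    split_runner_public_url: str = ""
    split_runner_public_host: str = ""
    split_runner_metadata: Dict[str, Any] = field(default_factory=dict)
    split_runner_tags: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SplitRunner":
        # Keep only the keys this dataclass declares; silently drop the rest
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


runner = SplitRunner.from_dict({
    "split_runner_id": "runner-id",
    "split_runner_public_url": "http://split-runner-url",
    "split_runner_metadata": {"key": "value"},
    "split_runner_public_host": "<server-url>",
    "split_runner_tags": ["tag1", "tag2"],
    "extra_field": 1,  # hypothetical unknown key, ignored by from_dict
})
```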
2.3 PUT /split-runner/{runner_id}
Description:
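Updates the details of a Split Runner instance. The request body is a MongoDB-style update document: $set sets a field (dotted paths such as split_runner_metadata.version address nested keys), and $addToSet adds a value to an array field unless it is already present.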
Request Body:
{
"$set": {
"split_runner_metadata.version": "2.0"
},
"$addToSet": {
"split_runner_tags": "auto-scaled"
}
}
Response:
{
"success": true,
"data": {
"message": "SplitRunner updated"
}
}
cURL Command:
curl -X PUT http://<server-url>:8001/split-runner/runner-id \
-H "Content-Type: application/json" \
-d '{
"$set": {
"split_runner_metadata.version": "2.0"
},
"$addToSet": {
"split_runner_tags": "auto-scaled"
}
}'
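The registry's storage backend is not described here, so the following is only an in-memory sketch of the $set and $addToSet semantics, useful for predicting what the example request does to a runner record. apply_update is a hypothetical helper, and it deliberately handles only these two operators.

```python
import copy
from typing import Any, Dict, Tuple


def _parent(doc: Dict[str, Any], dotted: str) -> Tuple[Dict[str, Any], str]:
    # Walk "a.b.c" down to the dict that holds "c", creating nested dicts as needed
    *parents, leaf = dotted.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    return node, leaf


def apply_update(doc: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    unsupported = set(update) - {"$set", "$addToSet"}
    if unsupported:
        raise ValueError(f"operators not handled by this sketch: {sorted(unsupported)}")
    result = copy.deepcopy(doc)  # leave the caller's record untouched
    for path, value in update.get("$set", {}).items():
        node, leaf = _parent(result, path)
        node[leaf] = value
    for path, value in update.get("$addToSet", {}).items():
        node, leaf = _parent(result, path)
        items = node.setdefault(leaf, [])
        if value not in items:  # set semantics: never add a duplicate
            items.append(value)
    return result


runner = {"split_runner_metadata": {"key": "value"}, "split_runner_tags": ["tag1", "tag2"]}
update_doc = {
    "$set": {"split_runner_metadata.version": "2.0"},
    "$addToSet": {"split_runner_tags": "auto-scaled"},
}
updated = apply_update(runner, update_doc)
```

Because $addToSet ignores values that are already present, sending the same update twice leaves the tags unchanged, which makes retries of this request safe under the assumed semantics.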
2.4 DELETE /split-runner/{runner_id}
Description:
Deletes a Split Runner instance by its ID, removing it from both the registry and the Kubernetes cluster.
Response:
{
"success": true,
"data": {
"message": "SplitRunner deleted"
}
}
cURL Command:
curl -X DELETE http://<server-url>:8001/split-runner/runner-id
2.5 POST /split-runners
Description:
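Queries split runners that match a filter document. The filter supports dotted paths into nested fields (for example split_runner_metadata.framework) and operators such as $in.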
Request Body:
{
"split_runner_metadata.framework": "transformers",
"split_runner_tags": { "$in": ["llm"] }
}
Response:
{
"success": true,
"data": [
{
"split_runner_id": "runner-id",
"split_runner_public_url": "http://host:32286",
"split_runner_metadata": { "framework": "transformers" },
"split_runner_public_host": "<server-url>",
"split_runner_tags": ["llm", "split"]
}
]
}
cURL Command:
curl -X POST http://<server-url>:8001/split-runners \
-H "Content-Type: application/json" \
-d '{
"split_runner_metadata.framework": "transformers",
"split_runner_tags": { "$in": ["llm"] }
}'
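The filter reads like a MongoDB query: a dotted key matches a nested field, and {"$in": [...]} matches when the field equals one of the listed values or, for an array field such as split_runner_tags, when any of its elements is listed. How the server actually evaluates filters is not documented here; the in-memory matcher below is a sketch of those assumed semantics for the two forms used in the example (plain equality and $in) and nothing more. matches is a hypothetical helper, and the second runner record is made-up sample data.

```python
from typing import Any, Dict, List

_MISSING = object()


def _get(doc: Dict[str, Any], dotted: str) -> Any:
    # Resolve "a.b.c" against nested dicts; return _MISSING if any step is absent
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def _equals(actual: Any, expected: Any) -> bool:
    # An array field matches a scalar if any element equals it
    if isinstance(actual, list) and not isinstance(expected, list):
        return expected in actual
    return actual == expected


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    # Every condition must hold (implicit AND across keys)
    for path, cond in flt.items():
        actual = _get(doc, path)
        if actual is _MISSING:
            return False
        if isinstance(cond, dict) and "$in" in cond:
            if not any(_equals(actual, v) for v in cond["$in"]):
                return False
        elif not _equals(actual, cond):
            return False
    return True


runners: List[Dict[str, Any]] = [
    {"split_runner_id": "a", "split_runner_metadata": {"framework": "transformers"},
     "split_runner_tags": ["llm", "split"]},
    {"split_runner_id": "b", "split_runner_metadata": {"framework": "other"},
     "split_runner_tags": ["llm"]},
]
query = {"split_runner_metadata.framework": "transformers", "split_runner_tags": {"$in": ["llm"]}}
hits = [r["split_runner_id"] for r in runners if matches(r, query)]
```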
Model Layers Registry
The Model Layers Registry stores information about the individual model layers produced by model splitting. It makes layers reusable by tracking layer hashes, which blocks can reference in their metadata. With those hashes, the system can detect whether a block is already running a given split and share it instead of creating the same layer again.
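The reuse check described above amounts to a lookup keyed by layer hash: before creating a layer, look its hash up and, if a record already exists, reuse it. The following in-memory sketch illustrates the idea; the LayerIndex class and its get_or_create method are hypothetical stand-ins for whatever queries the real registry exposes.

```python
from typing import Callable, Dict, Tuple


class LayerIndex:
    """Hypothetical in-memory stand-in for the model layers registry."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, dict] = {}

    def get_or_create(self, layer_hash: str, create: Callable[[], dict]) -> Tuple[dict, bool]:
        # Return (record, reused); create() runs only when the hash is not yet known
        existing = self._by_hash.get(layer_hash)
        if existing is not None:
            return existing, True
        record = create()
        self._by_hash[layer_hash] = record
        return record, False


index = LayerIndex()
created_count = 0


def make_layer() -> dict:
    # Placeholder for the expensive work of creating and hosting a layer
    global created_count
    created_count += 1
    return {"model_layer_public_url": "https://example.com/layers/0"}


first, reused_first = index.get_or_create("900150983cd24fb0d6963f7d28e17f72", make_layer)
second, reused_second = index.get_or_create("900150983cd24fb0d6963f7d28e17f72", make_layer)
```

The second call finds the hash and returns the existing record without calling make_layer again.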
To ensure uniqueness and consistency, the MD5 hash of each model layer must be computed and stored as model_layer_hash. This hash serves as the primary key in the model layers registry and is the basis for identifying and matching reusable layers across different model instantiations.
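Computing model_layer_hash only needs the layer's serialized bytes. The sketch below uses Python's hashlib and reads the artifact in chunks so that large layers need not fit in memory. This document does not specify how a layer is serialized or which file is hashed, so the function names and the assumption that the hash covers a single serialized byte stream are illustrative. Note that hashes only match across model instantiations if every producer serializes a layer to identical bytes, so serialization must be deterministic.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_bytes(data: bytes) -> str:
    # Hex digest for a layer already serialized in memory
    return hashlib.md5(data).hexdigest()


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    # Feed the serialized layer to MD5 in chunks (1 MiB by default)
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Example: chunked hashing of an in-memory buffer gives the same digest as one-shot hashing
buffer_hash = md5_of_stream(io.BytesIO(b"abc"), chunk_size=2)
```

For a layer on disk, the same function works on a file opened in binary mode, e.g. with open(path, "rb") as f: md5_of_stream(f), where path is whatever file holds the serialized layer. MD5 serves here as an identity key for deduplication, not as a security control.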
Model Layer Registry schema:
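from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelLayerObject:
    model_layer_hash: str = ''              # MD5 hex digest of the serialized layer (primary key)
    model_asset_id: str = ''                # ID of the source model asset
    model_component_registry_uri: str = ''  # URI of the component spec or metadata
    model_layer_public_url: str = ''        # where the layer artifact is hosted
    model_layer_metadata: List[Dict[str, Any]] = field(default_factory=list)  # optional metadata
    model_layer_rank: int = 0               # index of this layer within the split
    model_world_size: int = 0               # total number of splits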
Field | Type | Required | Description |
---|---|---|---|
model_layer_hash | string | yes | Primary key. MD5 hash of the serialized model layer; used for uniqueness. |
model_asset_id | string | yes | ID of the original model asset from which this layer was generated. |
model_component_registry_uri | string | yes | URI pointing to the component spec or metadata used for this model layer. |
model_layer_public_url | string | yes | Public URL where the layer artifact is hosted (e.g., S3, HTTP). |
model_layer_metadata | array[dict] | no | Arbitrary metadata attached to this layer (e.g., shape, precision, config). |
model_layer_rank | integer | yes | Rank/index of this layer in the full model pipeline (used in splits). |
model_world_size | integer | yes | Total number of model splits (parallel components) this layer belongs to. |
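The Required column can be enforced on the client before a record is written. The following sketch redeclares ModelLayerObject so it runs on its own and reports missing required fields. Two further checks are assumptions drawn from the field descriptions, not documented registry rules: that model_layer_hash is a 32-character lowercase hex string (the length of an MD5 hex digest), and that ranks are 0-based so 0 <= model_layer_rank < model_world_size. validate_layer is a hypothetical helper.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelLayerObject:
    model_layer_hash: str = ''
    model_asset_id: str = ''
    model_component_registry_uri: str = ''
    model_layer_public_url: str = ''
    model_layer_metadata: List[Dict[str, Any]] = field(default_factory=list)
    model_layer_rank: int = 0
    model_world_size: int = 0


_MD5_HEX = re.compile(r"[0-9a-f]{32}")
_REQUIRED_STRINGS = ("model_layer_hash", "model_asset_id",
                     "model_component_registry_uri", "model_layer_public_url")


def validate_layer(layer: ModelLayerObject) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = [f"{name} is empty" for name in _REQUIRED_STRINGS if not getattr(layer, name)]
    # Assumption: the hash is stored as a lowercase MD5 hex digest
    if layer.model_layer_hash and not _MD5_HEX.fullmatch(layer.model_layer_hash):
        problems.append("model_layer_hash is not a 32-character lowercase hex MD5 digest")
    # Assumption: ranks are 0-based indices into the split
    if layer.model_world_size < 1:
        problems.append("model_world_size must be at least 1")
    elif not 0 <= layer.model_layer_rank < layer.model_world_size:
        problems.append("model_layer_rank must satisfy 0 <= rank < model_world_size")
    return problems


layer = ModelLayerObject(
    model_layer_hash="900150983cd24fb0d6963f7d28e17f72",
    model_asset_id="asset-123",                                 # hypothetical values
    model_component_registry_uri="registry://components/example",
    model_layer_public_url="https://example.com/layers/0",
    model_layer_rank=0,
    model_world_size=2,
)
issues = validate_layer(layer)
```

Returning a list of problems rather than raising on the first one lets a caller report every defect in a record at once.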