Inference Server Registry
The Inference Server Registry is used to list all inference servers for search and discovery purposes. Users can add their inference servers to this global registry if they want them to be publicly accessible.
Schema:
Here is the dataclass used to represent an inference server in the registry:
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class InferenceServer:
    inference_server_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inference_server_name: str = ''
    inference_server_metadata: Dict[str, str] = field(default_factory=dict)
    inference_server_tags: List[str] = field(default_factory=list)
    inference_server_public_url: str = ''
Here is an explanation of each field in InferenceServer:
- inference_server_id: str - A unique identifier for the inference server, generated automatically using uuid.uuid4().
- inference_server_name: str - A human-readable name assigned to the inference server.
- inference_server_metadata: Dict[str, str] - A dictionary storing additional metadata about the server, such as model type, framework, or hardware specifications.
- inference_server_tags: List[str] - A list of tags associated with the inference server, used for categorization and search.
- inference_server_public_url: str - A publicly accessible URL where users can interact with the inference server.
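As a quick illustration of the defaults described above, the following sketch re-declares the dataclass so that it is self-contained, creates two servers, and serializes one of them with dataclasses.asdict. The example values (names, tags, URL) are made up for illustration and are not taken from a real deployment.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class InferenceServer:
    inference_server_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inference_server_name: str = ''
    inference_server_metadata: Dict[str, str] = field(default_factory=dict)
    inference_server_tags: List[str] = field(default_factory=list)
    inference_server_public_url: str = ''


# Illustrative values only (assumptions, not real servers).
server_a = InferenceServer(
    inference_server_name="inference-server-us-east-1",
    inference_server_metadata={"region": "us-east-1", "provider": "AWS"},
    inference_server_tags=["NLP"],
    inference_server_public_url="https://us-east-1.inference.example.com",
)
server_b = InferenceServer(inference_server_name="inference-server-us-west-2")

# default_factory gives every instance its own ID and its own containers,
# so mutating one server's tags does not affect another's.
server_b.inference_server_tags.append("LLM")

# asdict() turns the dataclass into a plain dict that can be sent as JSON.
payload = json.dumps(asdict(server_a), indent=2)
print(payload)
```

Because the ID comes from a default factory, callers normally omit it and let each instance generate its own.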
Inference Server Registry APIs
The registry exposes the following APIs. In the examples, inference_server_metadata carries server location data (e.g., region, availability zone, provider).
1. Register an Inference Server
Endpoint:
POST /inference_server
Description:
Registers a new inference server in the system.
Example Request
curl -X POST <server-url>/inference_server \
-H "Content-Type: application/json" \
-d '{
"inference_server_name": "inference-server-us-east-1",
"inference_server_metadata": {
"region": "us-east-1",
"availability_zone": "us-east-1a",
"provider": "AWS",
"cluster_id": "cluster-123",
"quota_management_data": {
"requests_per_second_total": 100,
"requests_per_second_per_session": 10,
"requests_per_second_per_block_id": 10,
"requests_per_session_id": 1000,
"requests_per_block_id": 100
}
},
"inference_server_tags": ["NLP", "Transformer", "BERT"],
"inference_server_public_url": "https://us-east-1.inference.example.com"
}'
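The same registration call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: REGISTRY_URL and build_register_request are hypothetical names introduced here, and the response format is not documented in this section, so the sending step is left commented out.

```python
import json
import urllib.request

# Hypothetical base URL (assumption); substitute the address of your registry.
REGISTRY_URL = "http://localhost:8080"


def build_register_request(base_url: str, server: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /inference_server request."""
    body = json.dumps(server).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/inference_server",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request(
    REGISTRY_URL,
    {
        "inference_server_name": "inference-server-us-east-1",
        "inference_server_metadata": {"region": "us-east-1", "provider": "AWS"},
        "inference_server_tags": ["NLP", "Transformer", "BERT"],
        "inference_server_public_url": "https://us-east-1.inference.example.com",
    },
)
print(request.get_method(), request.full_url)

# To actually send it (requires a running registry):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```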
2. Get an Inference Server
Endpoint:
GET /inference_server/<server_id>
Description:
Fetches details of a specific inference server by its ID.
Example Request
curl -X GET <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd
3. Update an Inference Server
Endpoint:
PUT /inference_server/<server_id>
Description:
Updates an inference server's details using MongoDB update operators such as $set and $push.
Example Request: Change Region & Provider
curl -X PUT <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd \
-H "Content-Type: application/json" \
-d '{
"$set": {
"inference_server_metadata.region": "us-west-2",
"inference_server_metadata.provider": "Google Cloud"
}
}'
Example Request: Add a New Tag
curl -X PUT <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd \
-H "Content-Type: application/json" \
-d '{
"$push": {
"inference_server_tags": "LLM"
}
}'
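To make the effect of the two update bodies concrete, here is a small local emulation of $set (with dotted paths) and $push applied to a plain dictionary. It is a simplified sketch written for illustration, not the registry's or MongoDB's implementation; apply_update is a hypothetical helper and it supports only these two operators.

```python
import copy
from typing import Any, Dict


def apply_update(document: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `document` with `$set` and `$push` applied."""
    result = copy.deepcopy(document)
    for operator, changes in update.items():
        if operator not in ("$set", "$push"):
            raise ValueError(f"unsupported operator in this sketch: {operator}")
        for dotted_path, value in changes.items():
            # "a.b.c" -> walk into "a" and "b", then modify key "c".
            *parents, leaf = dotted_path.split(".")
            target = result
            for key in parents:
                # Create intermediate objects as needed.
                target = target.setdefault(key, {})
            if operator == "$set":
                target[leaf] = value
            else:
                # "$push" appends to an array, creating it if absent.
                target.setdefault(leaf, []).append(value)
    return result


server = {
    "inference_server_metadata": {"region": "us-east-1", "provider": "AWS"},
    "inference_server_tags": ["NLP", "Transformer", "BERT"],
}
server = apply_update(server, {"$set": {
    "inference_server_metadata.region": "us-west-2",
    "inference_server_metadata.provider": "Google Cloud",
}})
server = apply_update(server, {"$push": {"inference_server_tags": "LLM"}})
print(server)
```

Dotted paths let an update touch a single metadata key without resending or overwriting the rest of the metadata dictionary.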
4. Delete an Inference Server
Endpoint:
DELETE /inference_server/<server_id>
Description:
Deletes an inference server from the registry.
Example Request
curl -X DELETE <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd
5. Query Inference Servers
Endpoint:
POST /inference_servers
Description:
Retrieves a list of inference servers matching specific criteria using MongoDB-style queries.
Example Query: Find All Servers in us-east-1a Using GPUs
curl -X POST <server-url>/inference_servers \
-H "Content-Type: application/json" \
-d '{
"inference_server_metadata.availability_zone": "us-east-1a",
"inference_server_metadata.gpu_available": true
}'
Example Query: Find All Servers Hosted on AWS
curl -X POST <server-url>/inference_servers \
-H "Content-Type: application/json" \
-d '{
"inference_server_metadata.provider": "AWS"
}'
Example Query: Find Servers With NLP or LLM Tags
curl -X POST <server-url>/inference_servers \
-H "Content-Type: application/json" \
-d '{
"inference_server_tags": { "$in": ["NLP", "LLM"] }
}'
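Example Query: Find Servers Registered On or After 2025-01-01
This query assumes each record stores a registered_at timestamp, a field that does not appear in the InferenceServer dataclass above.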
curl -X POST <server-url>/inference_servers \
-H "Content-Type: application/json" \
-d '{
"registered_at": { "$gte": "2025-01-01T00:00:00Z" }
}'
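To make the filter semantics of the query examples concrete, here is a minimal local sketch of how such filters select records. It is a simplified emulation written for illustration, not the registry's or MongoDB's implementation: it handles only dotted-path equality, $in, and $gte. The sample records, the helper names (get_path, matches, find), and the registered_at values stored as ISO 8601 strings are all assumptions.

```python
from typing import Any, Dict, List

_MISSING = object()


def get_path(document: Dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path such as 'inference_server_metadata.provider'."""
    current: Any = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(document: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every condition in `query` holds (conditions are ANDed)."""
    for path, condition in query.items():
        value = get_path(document, path)
        if value is _MISSING:
            return False
        # An array field matches if any of its elements matches.
        candidates: List[Any] = value if isinstance(value, list) else [value]
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator == "$in":
                    ok = any(c in operand for c in candidates)
                elif operator == "$gte":
                    ok = any(c >= operand for c in candidates)
                else:
                    raise ValueError(f"unsupported operator in this sketch: {operator}")
                if not ok:
                    return False
        elif condition not in candidates:
            return False
    return True


# Made-up sample records (assumptions for illustration).
servers = [
    {
        "inference_server_name": "inference-server-us-east-1",
        "inference_server_metadata": {
            "availability_zone": "us-east-1a", "provider": "AWS", "gpu_available": True,
        },
        "inference_server_tags": ["NLP", "Transformer", "BERT"],
        "registered_at": "2025-03-10T12:00:00Z",
    },
    {
        "inference_server_name": "inference-server-us-west-2",
        "inference_server_metadata": {
            "availability_zone": "us-west-2b", "provider": "Google Cloud", "gpu_available": False,
        },
        "inference_server_tags": ["Vision"],
        "registered_at": "2024-11-05T08:30:00Z",
    },
]


def find(query: Dict[str, Any]) -> List[str]:
    """Return the names of the sample servers that match `query`."""
    return [s["inference_server_name"] for s in servers if matches(s, query)]


print(find({"inference_server_metadata.provider": "AWS"}))
print(find({"inference_server_tags": {"$in": ["NLP", "LLM"]}}))
print(find({"registered_at": {"$gte": "2025-01-01T00:00:00Z"}}))
```

The $gte comparison here works only because the timestamps are strings in one fixed ISO 8601 format, which sort lexicographically in time order. A registry that stores real date values would compare them differently.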
Summary of API Endpoints
Method | Endpoint | Description |
---|---|---|
POST | /inference_server | Create a new inference server |
GET | /inference_server/<server_id> | Get details of a specific inference server |
PUT | /inference_server/<server_id> | Update an inference server using MongoDB format |
DELETE | /inference_server/<server_id> | Delete an inference server |
POST | /inference_servers | Query inference servers with filters |