Inference Server Registry

The Inference Server Registry is used to list all inference servers for search and discovery purposes. Users can add their inference servers to this global registry if they want them to be publicly accessible.

Schema:

Here is the dataclass used to represent an inference server in the registry:

import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class InferenceServer:
    inference_server_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inference_server_name: str = ''
    # Metadata values may be nested (see the registration example below),
    # so the value type is Any rather than str.
    inference_server_metadata: Dict[str, Any] = field(default_factory=dict)
    inference_server_tags: List[str] = field(default_factory=list)
    inference_server_public_url: str = ''

Here's an explanation of each field in InferenceServer:

  1. inference_server_id: str
     A unique identifier for the inference server, generated automatically using uuid.uuid4().

  2. inference_server_name: str
     A human-readable name assigned to the inference server.

  3. inference_server_metadata: Dict[str, Any]
     A dictionary storing additional metadata about the server, such as model type, framework, or hardware specifications.

  4. inference_server_tags: List[str]
     A list of tags associated with the inference server, which help with categorization and searchability.

  5. inference_server_public_url: str
     A publicly accessible URL where users can interact with the inference server.
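As a quick illustration of the schema in use (the field values here are invented for illustration), an InferenceServer can be constructed and serialized to the JSON-ready dict used by the API examples below:

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

@dataclass
class InferenceServer:
    inference_server_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inference_server_name: str = ''
    inference_server_metadata: Dict[str, Any] = field(default_factory=dict)
    inference_server_tags: List[str] = field(default_factory=list)
    inference_server_public_url: str = ''

# Construct a server record; the id is generated automatically.
server = InferenceServer(
    inference_server_name="inference-server-us-east-1",
    inference_server_metadata={"region": "us-east-1", "provider": "AWS"},
    inference_server_tags=["NLP", "BERT"],
    inference_server_public_url="https://us-east-1.inference.example.com",
)

# asdict() produces the dict that would be sent as a JSON request body.
record = asdict(server)
```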

Inference server registry APIs

Here are all the APIs. In the examples below, inference_server_metadata holds server location data (e.g., region, availability zone, provider).


1. Register an Inference Server

Endpoint:
POST /inference_server

Description:
Registers a new inference server in the system.

Example Request

curl -X POST <server-url>/inference_server \
     -H "Content-Type: application/json" \
     -d '{
           "inference_server_name": "inference-server-us-east-1",
           "inference_server_metadata": {
               "region": "us-east-1",
               "availability_zone": "us-east-1a",
               "provider": "AWS",
               "cluster_id": "cluster-123",
               "quota_management_data": {
                    "requests_per_second_total": 100,
                    "requests_per_second_per_session": 10,
                    "requests_per_second_per_block_id": 10,
                    "requests_per_session_id": 1000,
                    "requests_per_block_id": 100
                }
           },
           "inference_server_tags": ["NLP", "Transformer", "BERT"],
           "inference_server_public_url": "https://us-east-1.inference.example.com"
         }'
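The same request can be built from Python. The sketch below assembles the JSON body from the curl example and prepares (but does not send) the POST request; the build_registration_payload helper and the server_url value are illustrative assumptions, not part of the API:

```python
import json
import urllib.request

def build_registration_payload(name: str, metadata: dict, tags: list, public_url: str) -> dict:
    """Assemble the JSON body expected by POST /inference_server."""
    return {
        "inference_server_name": name,
        "inference_server_metadata": metadata,
        "inference_server_tags": tags,
        "inference_server_public_url": public_url,
    }

payload = build_registration_payload(
    name="inference-server-us-east-1",
    metadata={"region": "us-east-1", "availability_zone": "us-east-1a", "provider": "AWS"},
    tags=["NLP", "Transformer", "BERT"],
    public_url="https://us-east-1.inference.example.com",
)

# Prepare the POST request; replace server_url with your registry's address.
server_url = "https://registry.example.com"
req = urllib.request.Request(
    f"{server_url}/inference_server",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit the registration to a live registry.
```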

2. Get an Inference Server

Endpoint:
GET /inference_server/<server_id>

Description:
Fetches details of a specific inference server by its ID.

Example Request

curl -X GET <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd

3. Update an Inference Server

Endpoint:
PUT /inference_server/<server_id>

Description:
Updates an inference server's details using MongoDB update syntax.

Example Request: Change Region & Provider

curl -X PUT <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd \
     -H "Content-Type: application/json" \
     -d '{
           "$set": {
               "inference_server_metadata.region": "us-west-2",
               "inference_server_metadata.provider": "Google Cloud"
           }
         }'

Example Request: Add a New Tag

curl -X PUT <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd \
     -H "Content-Type: application/json" \
     -d '{
           "$push": {
               "inference_server_tags": "LLM"
           }
         }'
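To make the MongoDB-style update semantics concrete, here is a minimal local sketch of how $set (with dotted paths) and $push transform a server document. The apply_update helper is purely illustrative; the real registry applies these operators on the server side:

```python
def apply_update(doc: dict, update: dict) -> dict:
    """Apply a minimal subset of MongoDB update operators ($set, $push)
    to a plain dict, resolving dotted paths like 'a.b' into nested keys."""
    for path, value in update.get("$set", {}).items():
        *parents, leaf = path.split(".")
        target = doc
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    for field_name, value in update.get("$push", {}).items():
        # Sketch handles $push only on top-level list fields.
        doc.setdefault(field_name, []).append(value)
    return doc

doc = {
    "inference_server_metadata": {"region": "us-east-1", "provider": "AWS"},
    "inference_server_tags": ["NLP"],
}
apply_update(doc, {"$set": {"inference_server_metadata.region": "us-west-2"}})
apply_update(doc, {"$push": {"inference_server_tags": "LLM"}})
```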

4. Delete an Inference Server

Endpoint:
DELETE /inference_server/<server_id>

Description:
Deletes an inference server from the registry.

Example Request

curl -X DELETE <server-url>/inference_server/d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd

5. Query Inference Servers

Endpoint:
POST /inference_servers

Description:
Retrieves a list of inference servers matching specific criteria using MongoDB-style queries.


Example Query: Find All Servers in us-east-1a Using GPUs

curl -X POST <server-url>/inference_servers \
     -H "Content-Type: application/json" \
     -d '{
           "inference_server_metadata.availability_zone": "us-east-1a",
           "inference_server_metadata.gpu_available": true
         }'

Example Query: Find All Servers Hosted on AWS

curl -X POST <server-url>/inference_servers \
     -H "Content-Type: application/json" \
     -d '{
           "inference_server_metadata.provider": "AWS"
         }'

Example Query: Find Servers With NLP or LLM Tags

curl -X POST <server-url>/inference_servers \
     -H "Content-Type: application/json" \
     -d '{
           "inference_server_tags": { "$in": ["NLP", "LLM"] }
         }'
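The query documents above can be understood through a small local matcher. The sketch below supports exact matches on dotted paths and the $in operator (for list fields such as tags, $in matches when any listed value is present); it is an illustration of the semantics, not the registry's implementation:

```python
def get_path(doc: dict, path: str):
    """Resolve a dotted path like 'a.b' against nested dicts; None if missing."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

def matches(doc: dict, query: dict) -> bool:
    """Minimal MongoDB-style matcher: exact values and the $in operator."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict) and "$in" in condition:
            candidates = condition["$in"]
            if isinstance(value, list):
                if not any(v in candidates for v in value):
                    return False
            elif value not in candidates:
                return False
        elif value != condition:
            return False
    return True

servers = [
    {"inference_server_metadata": {"provider": "AWS"}, "inference_server_tags": ["NLP"]},
    {"inference_server_metadata": {"provider": "GCP"}, "inference_server_tags": ["Vision"]},
]
aws = [s for s in servers if matches(s, {"inference_server_metadata.provider": "AWS"})]
tagged = [s for s in servers if matches(s, {"inference_server_tags": {"$in": ["NLP", "LLM"]}})]
```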

Example Query: Find Servers Registered After 2025-01-01

curl -X POST <server-url>/inference_servers \
     -H "Content-Type: application/json" \
     -d '{
           "registered_at": { "$gte": "2025-01-01T00:00:00Z" }
         }'

Summary of API Endpoints

Method   Endpoint                         Description
POST     /inference_server                Create a new inference server
GET      /inference_server/<server_id>    Get details of a specific inference server
PUT      /inference_server/<server_id>    Update an inference server using MongoDB format
DELETE   /inference_server/<server_id>    Delete an inference server
POST     /inference_servers               Query inference servers with filters
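Putting the endpoints together, a thin Python client might look like the following sketch. The RegistryClient name and base URL are assumptions; each method returns a prepared urllib request (to be sent with urllib.request.urlopen against a live registry) rather than a parsed response, for brevity:

```python
import json
import urllib.request

class RegistryClient:
    """Thin client covering the registry endpoints summarized above."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def _build(self, method: str, path: str, body: dict = None) -> urllib.request.Request:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        headers = {"Content-Type": "application/json"} if body is not None else {}
        return urllib.request.Request(
            f"{self.base_url}{path}", data=data, headers=headers, method=method
        )

    def register(self, server: dict):                 # POST /inference_server
        return self._build("POST", "/inference_server", server)

    def get(self, server_id: str):                    # GET /inference_server/<server_id>
        return self._build("GET", f"/inference_server/{server_id}")

    def update(self, server_id: str, update: dict):   # PUT /inference_server/<server_id>
        return self._build("PUT", f"/inference_server/{server_id}", update)

    def delete(self, server_id: str):                 # DELETE /inference_server/<server_id>
        return self._build("DELETE", f"/inference_server/{server_id}")

    def query(self, query: dict):                     # POST /inference_servers
        return self._build("POST", "/inference_servers", query)

client = RegistryClient("https://registry.example.com")
req = client.update(
    "d4e8c3f0-7c8d-4a2e-b23e-2d4b6789abcd",
    {"$push": {"inference_server_tags": "LLM"}},
)
# urllib.request.urlopen(req) would execute the call against a live registry.
```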