
Reference: https://pytorch.org/serve/management_api.html#register-a-model
---------------------------------------------------
TORCHSERVE REST API
---------------------------------------------------
# API Description:  
curl -X OPTIONS http://localhost:8080

# Health check API:  
curl http://localhost:8080/ping

[Response]
{
  "health": "healthy!"
}

# Predictions API  
POST /predictions/{model_name}
POST /predictions/{model_name}/{version}
 
Inference call example:
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg  # download the cat image
curl http://localhost:8080/predictions/resnet-18 -T kitten_small.jpg               # inference call, method 1
curl http://localhost:8080/predictions/resnet-18 -F "data=@kitten_small.jpg"  # inference call, method 2

[Response]
{
    "class": "n02123045 tabby, tabby cat",
    "probability": 0.42514491081237793
}

--- Call with a specific version
curl http://localhost:8080/predictions/resnet-18/2.0 -T kitten_small.jpg
curl http://localhost:8080/predictions/resnet-18/2.0 -F "data=@kitten_small.jpg"

# Explanations API
POST /explanations/{model_name}
curl http://127.0.0.1:8080/explanations/mnist -T examples/image_classifier/mnist/test_data/0.png

[Response]
  [
    [
      [
        [
          0.004570948731989492,
          0.006216969640322402,
          0.008197565423679522,
          ...
        ]
      ]
    ]
  ]

-----------------------------------------------------------------------
[Note] KFServing Inference API
POST /v1/models/{model_name}:predict
curl -H "Content-Type: application/json" --data @kubernetes/kfserving/kf_request_json/mnist.json http://127.0.0.1:8080/v1/models/mnist:predict

[Response]
{
  "predictions": [
    2
  ]
}

KFServing Explanations API

POST /v1/models/{model_name}:explain
curl -H "Content-Type: application/json" --data @kubernetes/kfserving/kf_request_json/mnist.json http://127.0.0.1:8080/v1/models/mnist:explain
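
Both the :predict and :explain calls post the same request file. A rough sketch of its shape, assuming the KFServing V1 prediction protocol (the actual contents of mnist.json depend on how the handler expects its input):

{
  "instances": [
    {
      "data": "<model input, e.g. a base64-encoded image>"
    }
  ]
}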
---------------------------------------------------------------------------------

# Register a model
POST /models
curl -X POST  "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar" 
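
The register call also accepts optional query parameters such as model_name, initial_workers, batch_size, max_batch_delay, and synchronous; a sketch with illustrative values:

curl -X POST "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar&initial_workers=1&synchronous=true"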

# Scale workers
PUT /models/{model_name}

curl -v -X PUT "http://localhost:8081/models/noop?min_worker=3"

[Response]
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 42adc58e-6956-4198-ad07-db6c620c4c1e
< content-length: 47
< connection: keep-alive

{
  "status": "Processing worker updates..."
}

curl -v -X PUT "http://localhost:8081/models/noop?min_worker=3&synchronous=true"

[Response]
< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: b72b1ea0-81c6-4cce-92c4-530d3cfe5d4a
< content-length: 63
< connection: keep-alive

{
  "status": "Workers scaled to 3 for model: noop"
}

curl -v -X PUT "http://localhost:8081/models/noop/2.0?min_worker=3&synchronous=true"

[Response]
< HTTP/1.1 200 OK
< content-type: application/json
< x-request-id: 3997ccd4-ae44-4570-b249-e361b08d3d47
< content-length: 77
< connection: keep-alive

{
  "status": "Workers scaled to 3 for model: noop, version: 2.0"
}

# Describe a model
GET /models/{model_name}
curl http://localhost:8081/models/noop

[Response]
[
    {
      "modelName": "noop",
      "modelVersion": "1.0",
      "modelUrl": "noop.mar",
      "engine": "Torch",
      "runtime": "python",
      "minWorkers": 1,
      "maxWorkers": 1,
      "batchSize": 1,
      "maxBatchDelay": 100,
      "workers": [
        {
          "id": "9000",
          "startTime": "2018-10-02T13:44:53.034Z",
          "status": "READY",
          "gpu": false,
          "memoryUsage": 89247744
        }
      ]
    }
]
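
To describe every registered version of a model in one call, the same endpoint also accepts "all" in place of a specific version (per the management API docs):

curl http://localhost:8081/models/noop/all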

# Unregister a model
-----------------------
DELETE /models/{model_name}/{version}
curl -X DELETE http://localhost:8081/models/noop/1.0

[Response]
{
  "status": "Model \"noop\" unregistered"
}

# List models
-----------------------
GET /models
curl "http://localhost:8081/models"

# API Description
------------------
# To view all inference APIs:
curl -X OPTIONS http://localhost:8080
# To view all management APIs:
curl -X OPTIONS http://localhost:8081

# Set Default Version
---------------------
PUT /models/{model_name}/{version}/set-default
curl -v -X PUT http://localhost:8081/models/noop/2.0/set-default

# METRICS API
-----------------
The Metrics API listens on port 8082 and is only accessible from localhost by default. To change this, see the TorchServe Configuration documentation. The default metrics endpoint returns Prometheus-formatted metrics. You can query metrics with curl requests, or point a Prometheus server at the endpoint and use Grafana for dashboards.
By default this API is enabled; it can be disabled by setting enable_metrics_api=false in the TorchServe config.properties file.
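
A minimal config.properties sketch for the settings mentioned above (property names follow the TorchServe configuration docs; the address shown is the default and is only illustrative):

metrics_address=http://127.0.0.1:8082
enable_metrics_api=true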

curl http://127.0.0.1:8082/metrics

[Response]
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 1990.348
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 2032.411
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
...

curl "http://127.0.0.1:8082/metrics?name[]=ts_inference_latency_microseconds&name[]=ts_queue_latency_microseconds" --globoff

[Response]
# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 1990.348
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 2032.411
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
...

# Prometheus server
--------------------------------------------------
Download and install Prometheus, create a minimal prometheus.yml config file as below, and run:
./prometheus --config.file=prometheus.yml

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'torchserve'
    static_configs:
    - targets: ['localhost:8082']  # TorchServe metrics endpoint

Navigate to http://localhost:9090/ in a browser to execute queries and create graphs.
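
For example, a query like the following (using one of the counters returned by the /metrics endpoint above; the 1m window is only an illustration) plots the per-second inference request rate:

rate(ts_inference_requests_total[1m])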

Prometheus server --> Grafana <--> WebBrowser

# Grafana 
---------------------------------
[Download]
https://grafana.com/grafana/download
https://grafana.com/grafana/download?platform=windows

[Windows]
C:\Program Files\GrafanaLabs\grafana\bin
grafana-server start

[Linux]
sudo systemctl daemon-reload && sudo systemctl enable grafana-server && sudo systemctl start grafana-server
Default credentials: admin / admin

[Web browser]
http://localhost:3000/
