...

Metric | Metric available? | Exposed via Prometheus endpoint? | Comment
Availability of policy-api service

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)

Exposed by policy-api healthcheck and policy-pap consolidated healthcheck.

Latency

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all CRUD endpoints exposed by policy-api.

Sample s3p numbers for policy-api stress tests.

Successful API request counter

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

Prometheus query for the number of successful API calls per minute

Failed API request counter

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

Prometheus query for the number of API calls with non-2xx status codes per minute
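The split between the successful and failed request counters can be illustrated with a minimal Python sketch. It is not the policy-api implementation: the function name `per_minute_counts` and the sample request log are hypothetical, and the only rule taken from the table is that 2xx responses count as successes and everything else as failures, bucketed per minute.

```python
from collections import Counter


def per_minute_counts(requests):
    """Bucket (unix_seconds, http_status) pairs into per-minute counters.

    Returns two Counters keyed by the start of the minute (in seconds):
    one for 2xx responses, one for every other status code.
    """
    success = Counter()
    failure = Counter()
    for timestamp, status in requests:
        minute_start = int(timestamp) // 60 * 60
        if 200 <= status < 300:
            success[minute_start] += 1
        else:
            failure[minute_start] += 1
    return success, failure


# Hypothetical request log: (seconds since epoch, HTTP status code).
sample_requests = [(0, 200), (10, 201), (59, 404), (60, 500), (61, 200)]
success, failure = per_minute_counts(sample_requests)
```

In a Prometheus setup the same per-minute view would normally be derived from a monotonically increasing counter at query time; the sketch only shows the classification rule.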

...

Metric | Metric available? | Exposed via Prometheus endpoint? | Comment
Availability of the policy-pap service

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)

policy-pap healthcheck API

Successful API request counter

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all the endpoints exposed by policy-pap.

Sample s3p numbers for policy-pap stress tests. 

Failed API request counter

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all the endpoints exposed by policy-pap.

Number of API calls with non-2xx status codes per minute

Latency

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all the endpoints exposed by policy-pap.

Policy deployment statistics

policyDeployFailureCount
policyDeploySuccessCount
totalPolicyDeployCount

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)

Sample:

GET /policy/pap/v1/statistics
{
    "code": 200,
    "policyDeployFailureCount": 0,
    "policyDeploySuccessCount": 0,
    "policyDownloadFailureCount": 0,
    "policyDownloadSuccessCount": 0,
    "totalPdpCount": 0,
    "totalPdpGroupCount": 0,
    "totalPolicyDeployCount": 0,
    "totalPolicyDownloadCount": 0
}
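As a rough illustration of how a client might consume this response, the sketch below parses a statistics body and derives a deployment failure ratio. The function name `deploy_summary` and the non-zero counter values are hypothetical; only the field names come from the sample above.

```python
import json

# Hypothetical response body; field names follow the sample above,
# the non-zero values are made up for illustration.
SAMPLE_BODY = """
{
    "code": 200,
    "policyDeployFailureCount": 1,
    "policyDeploySuccessCount": 3,
    "totalPolicyDeployCount": 4
}
"""


def deploy_summary(body):
    """Extract the deployment counters and compute the failure ratio."""
    stats = json.loads(body)
    total = stats["totalPolicyDeployCount"]
    failed = stats["policyDeployFailureCount"]
    succeeded = stats["policyDeploySuccessCount"]
    # Guard against division by zero when nothing has been deployed yet.
    failure_ratio = failed / total if total else 0.0
    return {
        "total": total,
        "succeeded": succeeded,
        "failed": failed,
        "failure_ratio": failure_ratio,
    }


summary = deploy_summary(SAMPLE_BODY)
```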


...

Metric | Metric available? | Exposed via Prometheus endpoint? | Comment
Availability of the policy-distribution service

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)

Exposed by policy-distribution healthcheck and consolidated policy-pap healthcheck

Successful API request counter

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all the endpoints exposed by policy-distribution.

Sample s3p numbers for policy-distribution stress tests. 

Failed API request counter

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all the endpoints exposed by policy-distribution.

Number of API calls with non-2xx status codes per minute

Latency

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

To be implemented for all the endpoints exposed by policy-distribution.

Policy distribution statistics

distributions
distribution_complete_ok
distribution_complete_fail
downloads
downloads_ok
downloads_error

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)
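To make "exposed via Prometheus endpoint" concrete, here is a minimal sketch that renders counters like the ones listed above in the Prometheus text exposition format. The helper `render_counters` and the counter values are hypothetical, and the names a real policy-distribution endpoint emits may differ from the bare statistic names used here.

```python
def render_counters(counters):
    """Render a name -> value mapping as Prometheus text-format counters."""
    lines = []
    for name, value in counters.items():
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


# Hypothetical values; the keys mirror the statistic names listed above.
distribution_stats = {
    "distributions": 5,
    "distribution_complete_ok": 4,
    "distribution_complete_fail": 1,
    "downloads": 5,
    "downloads_ok": 5,
    "downloads_error": 0,
}

exposition = render_counters(distribution_stats)
```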

...

Metric | Metric available? | Exposed via Prometheus endpoint? | Comment
Availability of policy-apex-pdp

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)

Exposed by policy-apex-pdp healthcheck and policy-pap consolidated healthcheck.

TOSCA Policy Deployment counter (per apex-pdp instance)

policyDeployCount
policyDeploySuccessCount
policyDeployFailCount

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)

Exposed by policy-pap statistics

GET /policy/pap/v1/statistics/defaultGroup/apex
{
  "defaultGroup": {
    "apex": [
      {
        "pdpInstanceId": "devdev-policy-apex-pdp-0",
        "timeStamp": "2021-09-07T20:10:52.242Z",
        "pdpGroupName": "defaultGroup",
        "pdpSubGroupName": "apex",
        "policyDeployCount": 2,
        "policyDeploySuccessCount": 2,
        "policyDeployFailCount": 0,
        "policyExecutedCount": 0,
        "policyExecutedSuccessCount": 0,
        "policyExecutedFailCount": 0,
        "engineStats": [
          {
            "engineId": "NSOApexEngine-0:0.0.1",
            "engineWorkerState": "READY",
            "engineTimeStamp": 1630550345549,
            "eventCount": 0,
            "lastExecutionTime": 0,
            "averageExecutionTime": 0,
            "upTime": 0,
            "lastEnterTime": 0,
            "lastStart": 1630550345549
          },
          ......
        ]
      }
    ]
  }
}
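Because the counters are reported per apex-pdp instance, a consumer that wants group-wide totals has to sum them. The sketch below assumes the group / subgroup / instance-list nesting shown in the sample; the function name `total_deployments` and the second instance (`example-apex-pdp-1`) are hypothetical.

```python
DEPLOY_KEYS = ("policyDeployCount", "policyDeploySuccessCount", "policyDeployFailCount")


def total_deployments(stats):
    """Sum the deployment counters over every instance in the response."""
    totals = dict.fromkeys(DEPLOY_KEYS, 0)
    for subgroups in stats.values():            # e.g. "defaultGroup"
        for instances in subgroups.values():    # e.g. "apex"
            for instance in instances:
                for key in DEPLOY_KEYS:
                    totals[key] += instance.get(key, 0)
    return totals


# First instance taken from the sample above; the second is invented.
sample_stats = {
    "defaultGroup": {
        "apex": [
            {
                "pdpInstanceId": "devdev-policy-apex-pdp-0",
                "policyDeployCount": 2,
                "policyDeploySuccessCount": 2,
                "policyDeployFailCount": 0,
            },
            {
                "pdpInstanceId": "example-apex-pdp-1",
                "policyDeployCount": 3,
                "policyDeploySuccessCount": 2,
                "policyDeployFailCount": 1,
            },
        ]
    }
}

totals = total_deployments(sample_stats)
```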

TOSCA Policy Execution counter (per apex-pdp instance)

# of policies executed
# of policies executed with success status
# of policies executed with a failure status

*Note: the statistics currently display APEX policy counters.

NoNoYesYes

Engine stats (by engineID per apex-pdp instance)

eventCount: number of APEX events processed
engineWorkerState: possible values defined in AxEngineState
upTime: time that has elapsed since the policy engine was started
averageExecutionTime: average time taken to process an APEX policy
lastExecutionTime: time taken to process the last APEX policy
lastStart: time at which the policy engine was last started; upTime is derived from this metric

Metric available? Yes
Exposed via Prometheus endpoint? Yes (previously No)
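The relationship between lastStart and upTime described above can be sketched as a simple subtraction. This assumes both values are epoch milliseconds, as the sample `lastStart` value suggests; the function name `up_time_ms` is hypothetical and this is not the APEX engine's own code.

```python
def up_time_ms(last_start_ms, now_ms):
    """Elapsed time since the engine was last started, in milliseconds.

    Clamped at zero so that clock skew between the engine and the caller
    never yields a negative uptime.
    """
    return max(0, now_ms - last_start_ms)


# lastStart from the sample response, observed one minute later.
last_start = 1630550345549
elapsed = up_time_ms(last_start, last_start + 60_000)
```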

Count of events processed (per engine thread, per apex-pdp instance)

# of incoming trigger events processed by policy-apex-pdp
# of incoming trigger events processed successfully by policy-apex-pdp
# of incoming trigger events processed by policy-apex-pdp that resulted in a failure

*Note: the statistics currently display the APEX event counters processed by the engine.

Metric available? Yes (previously No)
Exposed via Prometheus endpoint? Yes (previously No)

Latency

NoYesYes

Time taken for processing an incoming APEX event 

*Note: the statistics currently display the execution time for processing an APEX policy, which is a measure of system saturation and is sufficient.

Kafka consumer lag

Metric available? No
Exposed via Prometheus endpoint? No

Can be implemented outside of the Policy FWK.

Monitor increases in Kafka consumer lag for the Kafka/dmaap-message-router topics related to apex-pdp.
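Consumer lag is conventionally the gap between a partition's latest offset and the consumer group's committed offset. The sketch below works on plain dictionaries rather than a Kafka client, so it makes no claim about how the offsets are collected; the topic name `apex-trigger-topic`, the function names, and all offset values are hypothetical.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: latest offset minus committed offset.

    Both arguments map (topic, partition) to an offset. A partition with
    no committed offset is treated as committed at 0.
    """
    return {
        partition: max(0, end - committed_offsets.get(partition, 0))
        for partition, end in end_offsets.items()
    }


def lag_increased(previous, current):
    """True when the total lag grew between two observations."""
    return sum(current.values()) > sum(previous.values())


# Two hypothetical observations of the same topic, taken some time apart.
first = consumer_lag(
    {("apex-trigger-topic", 0): 100, ("apex-trigger-topic", 1): 50},
    {("apex-trigger-topic", 0): 95, ("apex-trigger-topic", 1): 50},
)
second = consumer_lag(
    {("apex-trigger-topic", 0): 180, ("apex-trigger-topic", 1): 70},
    {("apex-trigger-topic", 0): 120, ("apex-trigger-topic", 1): 70},
)
growing = lag_increased(first, second)
```

A sustained increase across several observations, rather than a single comparison, is usually the more meaningful alert condition.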

...