...
System metrics that apply to all Policy components
...
These metrics have been available and exposed via a Prometheus endpoint since the Istanbul release.
Note: Standard metrics are already exposed for Policy DB (MariaDB) via common charts.
Metric | Prometheus Query | Metric available? | Exposed via Prometheus endpoint? | Comment
---|---|---|---|---
Memory usage | jvm_memory_bytes_used | Yes | Yes | Available in Istanbul
CPU usage | process_cpu_seconds_total | Yes | Yes |
JVM threads | jvm_threads_current | Yes | Yes |
Process uptime | time() - process_start_time_seconds | Yes | Yes |
Garbage collectors | GCs per second: rate(jvm_gc_collection_seconds_count[1m]); average GC time: rate(jvm_gc_collection_seconds_sum[1m]) / rate(jvm_gc_collection_seconds_count[1m]) | Yes | Yes |
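The garbage-collector queries rely on two counters: `jvm_gc_collection_seconds_count` (number of collections) and `jvm_gc_collection_seconds_sum` (total seconds spent collecting). As a rough illustration of the arithmetic, the sketch below derives the same quantities from two scrapes taken some seconds apart. The function names and sample values are hypothetical, and Prometheus's own `rate()` also handles counter resets and extrapolation, which this sketch does not.

```python
def per_second_rate(earlier, later, interval_seconds):
    """Average per-second increase of a counter between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (later - earlier) / interval_seconds


def gc_summary(count_t0, count_t1, sum_t0, sum_t1, interval_seconds):
    """Return (collections per second, average seconds per collection)."""
    count_rate = per_second_rate(count_t0, count_t1, interval_seconds)
    sum_rate = per_second_rate(sum_t0, sum_t1, interval_seconds)
    # Avoid dividing by zero when no collection happened in the interval.
    avg_gc_time = sum_rate / count_rate if count_rate > 0 else 0.0
    return count_rate, avg_gc_time


def uptime_seconds(process_start_time_seconds, now_seconds):
    """Uptime is the current time minus the recorded start time."""
    return now_seconds - process_start_time_seconds


# Hypothetical counter values from two scrapes taken 60 seconds apart.
gcs_per_second, avg_gc_seconds = gc_summary(
    count_t0=120, count_t1=126, sum_t0=3.0, sum_t1=3.3, interval_seconds=60
)
```

With these made-up values, six collections took about 0.3 seconds in total over one minute, so the rate is roughly 0.1 collections per second and the average collection takes roughly 0.05 seconds.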
Key metrics for Policy API
Metric | Metric available? | Exposed via Prometheus endpoint? | Comment
---|---|---|---
Availability of policy-api service | Yes | No | Exposed by the policy-api healthcheck and the policy-pap consolidated healthcheck.
Latency | No | No | To be implemented for all CRUD endpoints exposed by policy-api. Sample S3P numbers are available from the policy-api stress tests.
Request rate (API requests per minute) | No | No | Number of API calls per minute
Failure rate (API errors per minute) | No | No | Number of API calls per minute that return a non-2xx status code
SSL certificate expiry time | No | No |
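The request-rate and failure-rate metrics are not yet implemented. The following is only a minimal sketch of how per-minute counts could be derived from a stream of responses, treating any status code outside the 2xx range as a failure. The class `ApiCallWindow` and its methods are hypothetical and are not part of policy-api.

```python
from collections import deque


class ApiCallWindow:
    """Counts API calls and non-2xx responses within a sliding time window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._calls = deque()  # (timestamp, is_failure) pairs, oldest first

    def record(self, timestamp, status_code):
        """Store one API call; anything outside 200-299 counts as a failure."""
        is_failure = not (200 <= status_code < 300)
        self._calls.append((timestamp, is_failure))

    def _evict(self, now):
        # Drop calls that are older than the window ending at `now`.
        while self._calls and self._calls[0][0] <= now - self.window_seconds:
            self._calls.popleft()

    def rates(self, now):
        """Return (requests, failures) seen in the window ending at `now`."""
        self._evict(now)
        failures = sum(1 for _, failed in self._calls if failed)
        return len(self._calls), failures


# Hypothetical traffic: (timestamp in seconds, HTTP status code).
window = ApiCallWindow()
for ts, status in [(0.0, 200), (10.0, 201), (20.0, 404), (50.0, 500), (65.0, 200)]:
    window.record(ts, status)
requests_per_minute, failures_per_minute = window.rates(now=70.0)
```

In a Prometheus setup, the more usual design is to expose monotonically increasing counters and let a `rate()` query compute the per-minute figures; the sliding window here only makes the definition of the two metrics concrete.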
...
Metric | Metric available? | Exposed via Prometheus endpoint? | Comment
---|---|---|---
Availability of the policy-pap service | Yes | No | policy-pap healthcheck API
Status of PDPs as registered with policy-pap | Yes | No | policy-pap consolidated healthcheck API
Request rate (API requests per minute) | No | No | To be implemented for all the endpoints exposed by policy-pap. Sample S3P numbers are available from the policy-pap stress tests.
Failure rate (API errors per minute) | No | No | To be implemented for all the endpoints exposed by policy-pap. Number of API calls per minute that return a non-2xx status code
Latency | No | No | To be implemented for all the endpoints exposed by policy-pap.
Policy deployment statistics (policyDeployFailureCount) | Yes | No |
SSL certificate expiry time | No | No | HTTPS is disabled for the entire Policy framework
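Latency is listed as still to be implemented. One common way to expose latency to Prometheus is a histogram with cumulative buckets; the sketch below shows that bookkeeping under assumed bucket boundaries. The class name, the boundaries, and the sample durations are all hypothetical and do not describe how policy-pap will implement the metric.

```python
import bisect


class LatencyHistogram:
    """Cumulative-bucket histogram in the style of a Prometheus histogram."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot for the implicit +Inf bucket.
        self._bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total_seconds = 0.0
        self.count = 0

    def observe(self, seconds):
        """Record one request duration in the first bucket whose bound >= it."""
        index = bisect.bisect_left(self.upper_bounds, seconds)
        self._bucket_counts[index] += 1
        self.total_seconds += seconds
        self.count += 1

    def cumulative_buckets(self):
        """Return [(upper_bound_label, cumulative_count), ...] ending at +Inf."""
        labels = [str(bound) for bound in self.upper_bounds] + ["+Inf"]
        running = 0
        result = []
        for label, bucket_count in zip(labels, self._bucket_counts):
            running += bucket_count
            result.append((label, running))
        return result


# Hypothetical bucket boundaries and request durations, in seconds.
histogram = LatencyHistogram([0.1, 0.5, 1.0])
for duration in [0.05, 0.1, 0.3, 0.7, 2.0]:
    histogram.observe(duration)
buckets = histogram.cumulative_buckets()
```

Each bucket counts every observation less than or equal to its upper bound, so the final `+Inf` bucket always equals the total number of observations. Average latency follows from `total_seconds / count`.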
...