CPS-2533: Investigate the JVM Dashboard and the metrics published by MicroMeter

References

https://lf-onap.atlassian.net/wiki/spaces/DW/pages/78643234

Issues & Decisions

  1. Issue: Which version of Grafana to use for the CPS Docker Compose setup (used when running locally). The Nordix version is used when running nightly/scheduled jobs.

    • Notes: See the pros/cons below. This really only affects local runs. It is not clear that we need to align, and it may be better to keep the lightweight version locally.

    • Decision: Dec 12, 2024 @Toine Siebelink @Halil Cakal. Use the grafana/grafana image for the local CPS docker-compose, since there is no licensing issue (Jan 13, 2025).

  2. Issue: Write access to the Grafana dashboards, etc., for other team members.

    • Notes: Needed for at least a few team members.

    • Decision: Dec 12, 2024 @Toine Siebelink @Halil Cakal. Committers can have ADMIN rights on Nordix Grafana (Jan 13, 2025).

Modifying the Grafana Dashboard

Step 1: Log In to Grafana

  1. Open your web browser and navigate to your Grafana instance.

  2. Enter your username and password to log in.

Step 2: Access the Dashboards Section

  1. In the left-hand menu, click the Dashboard icon (a four-square grid icon).

  2. Click + Create, or navigate to New Dashboard.

  3. Click + Add visualization.

  4. Select Prometheus as the data source.

Step 3: Add a New Panel

  1. Click the Add a new panel button to open the panel editor.

  2. In the panel editor, there are three main divisions:

    • Visualization

    • Data Queries

    • Panel Options

  3. In the panel, define:

    • Time range: set the time range to the last 30 minutes.

    • Query: select the metric cps_ncmp_dmi_get_seconds_count, then add a Rate operation with a range of $__rate_interval and a Sum aggregation.

      • This should produce a query like: sum(rate(cps_ncmp_dmi_get_seconds_count{}[$__rate_interval]))

    • Label filters (optional): filter by instance, job, and method.

    • Title: enter a title for the panel in the "Panel Title" field, e.g. Dmi Get.

    • Unit: in the panel options, search for the unit and select Throughput > counts/sec.

    • Visualization: choose how the data is displayed: Time series.

    • Apply: click Apply to save the panel.
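The query turns the ever-increasing counter cps_ncmp_dmi_get_seconds_count into invocations per second. To show roughly what rate() and sum() compute, here is a minimal Python sketch that assumes each series is a list of (timestamp, value) samples. It adds up the increases between consecutive samples, treats a drop in value as a counter reset (for example after a restart), and divides by the elapsed time. Prometheus's real rate() also extrapolates to the edges of the range window, so its results can differ slightly. The sample data is made up.

```python
from typing import List, Tuple

Series = List[Tuple[float, float]]  # (timestamp in seconds, counter value)


def simple_rate(samples: Series) -> float:
    """Approximate per-second increase of a counter over the sampled window.

    Simplified PromQL rate(): handles counter resets but does not
    extrapolate to the window boundaries like Prometheus does.
    """
    if len(samples) < 2:
        return 0.0  # at least two samples are needed to measure an increase
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # Counter reset (e.g. application restart): counting restarted from zero
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def summed_rate(series: List[Series]) -> float:
    """Like sum(rate(...)): add up the rates of all series (one per label combination)."""
    return sum(simple_rate(s) for s in series)


# Hypothetical cps_ncmp_dmi_get_seconds_count samples for two instances, scraped every 15 s
instance_a = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
instance_b = [(0, 500), (15, 530), (30, 30), (45, 60), (60, 90)]  # reset between 15 s and 30 s
print(summed_rate([instance_a, instance_b]))  # 2.0 + 2.0 invocations/s -> 4.0
```

Summing the per-series rates is what sum(rate(...)) does when several instances report the metric. This is why adding label filters (instance, job, method) changes the plotted total.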

Step 4: Add More Panels (Optional)

  1. To add more panels, click the Add panel button or the "+" icon in the dashboard toolbar.

  2. Repeat the process of defining queries, visualization types, and configurations.

Step 5: Save the Dashboard

  1. Once you've added all necessary panels, click the Save dashboard button (floppy disk icon) at the top of the page.

  2. Provide a name for your dashboard e.g. CPS Metrics

  3. Optionally, choose a folder to organize your dashboards, or keep it in the General folder.

  4. Click Save.

Note: If you are using the docker-compose setup from the CPS repository, the new dashboard is not saved on your local machine, because the Grafana container is terminated and re-created on the next run. To keep the dashboard for future use, add its JSON model to the CPS code repository.
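The JSON model can be copied from the UI (Share > Export, or the JSON Model tab in Dashboard settings), or fetched through Grafana's HTTP API. Below is a minimal Python sketch of the API route. It assumes Grafana is reachable at http://localhost:3000 (Grafana's default port; the CPS docker-compose mapping may differ) and that a service account token with read access is available. The UID cps-metrics and the output file name are hypothetical, and this sketch does not cover where in the CPS repository the file belongs.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_dashboard(grafana_url: str, uid: str, token: str) -> Dict[str, Any]:
    """Fetch a dashboard through Grafana's HTTP API (GET /api/dashboards/uid/<uid>)."""
    request = urllib.request.Request(
        f"{grafana_url}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_provisioning_json(api_response: Dict[str, Any]) -> str:
    """Keep only the dashboard model (drop instance-specific 'meta') and clear its numeric id."""
    dashboard = dict(api_response["dashboard"])  # shallow copy; the input is not modified
    dashboard["id"] = None  # database id of the exporting Grafana instance; the uid stays as identifier
    return json.dumps(dashboard, indent=2, sort_keys=True)


# Hypothetical usage against a local Grafana:
# response = fetch_dashboard("http://localhost:3000", "cps-metrics", "<service-account-token>")
# with open("cps-metrics.json", "w", encoding="utf-8") as f:
#     f.write(to_provisioning_json(response))

sample_response = {
    "dashboard": {"id": 7, "uid": "cps-metrics", "title": "CPS Metrics", "panels": []},
    "meta": {"folderTitle": "General"},
}
print(to_provisioning_json(sample_response))
```

Setting id to None writes "id": null into the file. The uid is kept, so the dashboard retains a stable identifier each time the container is re-created.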

Step 6: Customize Dashboard Settings (Optional)

  1. Click the Dashboard settings (gear icon) to customize settings such as:

    • Dashboard time range and refresh interval.

    • Variables for dynamic queries.

    • Permissions for specific users or teams.

Step 7: Share Your Dashboard (Optional)

  1. To share the dashboard, click the Share button (arrow icon).

  2. You can:

    • Generate a shareable link.

    • Embed the dashboard in an iframe.

    • Export the dashboard as JSON.

Step 8: Explore or Test

  1. Use the time selector at the top right to test your dashboard with different time ranges.

  2. Check that the panels display data as expected.
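If a panel stays empty, you can run the same expression against Prometheus directly, bypassing Grafana. Below is a minimal Python sketch using the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). It assumes Prometheus is reachable at http://localhost:9090 (its default port). Prometheus does not understand Grafana's $__rate_interval, so the example substitutes a fixed [1m] range. That value is an assumption and should span at least a few scrape intervals.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(prometheus_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    # Each result carries "value": [<unix timestamp>, "<value as string>"]
    return [(item["metric"], float(item["value"][1])) for item in payload["data"]["result"]]


def query(prometheus_url: str, expr: str) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(prometheus_url, expr)) as response:
        return parse_vector(response.read().decode("utf-8"))


# Example call (needs a running Prometheus; $__rate_interval replaced by a fixed 1m range):
# print(query("http://localhost:9090", "sum(rate(cps_ncmp_dmi_get_seconds_count[1m]))"))
```

An empty result list means no series matched. The usual causes are a wrong metric name or label filter, or an endpoint that has not been called since the application started.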

All Available Micrometer Metrics

A screenshot from Nordix Grafana shows the dashboard available to CPS.

CPS Grafana Dashboards

Prefix

Included in Dashboard

Metrics

Description

JVM

JVM

jvm_buffer_count_buffers
jvm_buffer_total_capacity_bytes
jvm_classes_unloaded_classes_total
jvm_gc_live_data_size_bytes
jvm_gc_memory_allocated_bytes_total
jvm_gc_overhead_percent
jvm_gc_pause_seconds_max
jvm_info
jvm_memory_max_bytes
jvm_memory_used_bytes
jvm_threads_live_threads
jvm_threads_started_threads_total

Micrometer in Spring Boot publishes JVM metrics such as memory usage (heap, non-heap, and buffer pools), garbage collection activity (count and time per collector), and thread states (live, daemon, and non-daemon), providing critical insights into application performance.

PostgreSQL

PostgreSQL Statistics

Database version, size, shared buffer, effective cache
Connection / Transaction Statistics
Read Stats, Change Stats
Cache Hit Rates
Deadlocks, Locked Tables, Temp File

Reveals many PostgreSQL database statistics, such as database version, size, and cache usage.

https://lf-onap.atlassian.net/browse/CPS-2585

application

N/A

application_ready_time_seconds

Time taken for the application to become fully ready to serve requests.

CPS

N/A

cps_data_persistence_service_datanode_batch_get_seconds
cps_data_persistence_service_datanode_get_seconds
cps_data_persistence_service_datanode_query_seconds
cps_data_service_datanode_batch_get_seconds
cps_data_service_datanode_child_save_seconds
cps_data_service_datanode_descendants_batch_update_seconds
cps_data_service_datanode_get_seconds
cps_data_service_datanode_query_seconds
cps_data_service_list_element_save_seconds
cps_dataupdate_events_publish_seconds
cps_module_persistence_schemaset_store_seconds
cps_module_service_module_reference_query_by_attribute_seconds
cps_module_service_schemaset_create_seconds
cps_utils_yangparser_nodedata_with_parent_parse_seconds
cps_yangschema_cache_gauge
cps_yangtextschemasourceset_build_seconds

Monitors the custom metrics enabled to track CPS services/methods during development.

Note: Divide CPS and NCMP metrics into two different categories. Jan 13, 2025

NCMP

Data REST Interfaces

cps_ncmp_inventory_controller_update_seconds
cps_ncmp_cmhandle_state_update_batch_seconds (can be ignored since the LCM State dashboard is already in place)
cps_ncmp_lcm_events_publish_seconds


cps_ncmp_controller_get_seconds
cps_ncmp_dmi_get_seconds
cps_ncmp_inventory_module_references_from_dmi_seconds


cps_ncmp_inventory_persistence_datanode_get_seconds

Monitors the custom metrics enabled to track NCMP services/methods during development.

By default, metrics for methods annotated with @Timed expose three series:

_count: the number of method invocations (a counter).

_sum: the cumulative time spent in the method across all invocations. Divide by _count to get the average duration.

_max: the longest single invocation duration observed within a recent, decaying time window.

Jan 20, 2025: In the Grooming Meeting, it was decided to have two dashboards: Inventory (NetworkCmProxyInventoryController.java) and Data (NetworkCmProxyController.java).

https://lf-onap.atlassian.net/browse/CPS-2573

NCMP Inventory and LCM States

Inventory REST Interfaces

ADVISED
LOCKED
READY
DELETING
DELETED

NCMP metrics tracking the number of cmHandles by their LCM State.

Start the initial dashboard development with these metrics. Also, include them under NCMP when dividing the metrics into CPS and NCMP (Jan 13, 2025).

https://lf-onap.atlassian.net/browse/CPS-2567

process

N/A

process_cpu_usage
process_files_open_files
process_uptime_seconds

Monitors process resource usage (CPU, open files) and stability (uptime).

spring

N/A

spring_data_repository_invocations_seconds_count
spring_data_repository_invocations_seconds_sum
spring_kafka_listener_seconds_max
spring_kafka_template_seconds_count
spring_kafka_template_seconds_sum

Spring-specific metrics monitor repository and Kafka operations for detailed performance analysis.

task

N/A

tasks_scheduled_execution_active_seconds_active_count
tasks_scheduled_execution_active_seconds_max
tasks_scheduled_execution_seconds_max

Provides insights into the execution performance of scheduled tasks.

cache

N/A

cache_eviction_weight_total
cache_gets_total
cache_puts_total
cache_size

Tracks the cache that holds YangTextSchemaSourceSet objects, which is responsible for some cached operations on YANG modules.

disk

N/A

disk_free_bytes

Measures the available disk space in bytes on the storage device where the application is running.

executor

N/A

executor_active_threads
executor_pool_core_threads
executor_pool_size_threads
executor_queued_tasks

Monitors thread pool activity, such as active threads, core and current pool size, and queued tasks awaiting execution.

hikaricp

Cps Database Pool

hikaricp_connections
hikaricp_connections_acquire_seconds_max
hikaricp_connections_active
hikaricp_connections_creation_seconds_max
hikaricp_connections_idle
hikaricp_connections_min
hikaricp_connections_timeout_total
hikaricp_connections_usage_seconds_max

Tracks database connection pool performance: total, active, idle, and minimum connections; connection acquisition, creation, and usage times; and timeouts.

These metrics will be considered for a dashboard such as Datasource (Grooming meeting, Jan 27, 2025).

https://lf-onap.atlassian.net/browse/CPS-2592

HTTP

Inventory and Data REST Interfaces

http_client_requests_active_seconds_active_count
http_client_requests_active_seconds_max
http_client_requests_seconds_max
http_server_requests_active_seconds_active_count
http_server_requests_active_seconds_max
http_server_requests_seconds_max

Monitors HTTP client and server metrics, such as active request counts, request latencies, maximum request duration, and total client request time.

https://lf-onap.atlassian.net/browse/CPS-2567

https://lf-onap.atlassian.net/browse/CPS-2573

JDBC

Cps Database Pool

jdbc_connections_active
jdbc_connections_max

Monitors the number of currently active database connections and the maximum number allowed.

https://lf-onap.atlassian.net/browse/CPS-2592

jetty

N/A

jetty_connections_bytes_in_bytes_count
jetty_connections_bytes_in_bytes_sum
jetty_connections_bytes_out_bytes_max
jetty_connections_current_connections
jetty_connections_messages_in_messages_total
jetty_connections_request_seconds_count
jetty_connections_request_seconds_sum
jetty_threads_config_max
jetty_threads_current
jetty_threads_jobs

Monitors internal Jetty server parameters, such as inbound and outbound connection bytes, current connections, message counts, request processing times, and thread usage.

system

N/A

system_cpu_count
system_load_average_1m

Tracks the number of CPU cores available to the JVM and the system load average over the last minute.
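The table above can be cross-checked against a running application. The Python sketch below parses the Prometheus text format that Micrometer publishes. It assumes the output is served at Spring Boot Actuator's default /actuator/prometheus path, which may differ in CPS. The sketch groups metric names by prefix, treating cps_ncmp_* as NCMP to mirror the CPS/NCMP split. It then computes the average duration of each @Timed series as _sum / _count. That average covers everything since application start. For a recent average in Grafana, divide the rates instead: rate(cps_ncmp_dmi_get_seconds_sum[$__rate_interval]) / rate(cps_ncmp_dmi_get_seconds_count[$__rate_interval]). The sample text and the exampleMethod label are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]
Sample = Tuple[str, Labels, float]

# Matches label pairs such as area="heap" (allows escaped quotes inside values)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text-format lines into (name, sorted labels, value) samples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            label_text, rest = rest.rsplit("}", 1)
            labels: Labels = tuple(sorted(LABEL_RE.findall(label_text)))
        else:
            name, rest = line.split(None, 1)
            labels = ()
        value = float(rest.split()[0])  # an optional timestamp may follow the value
        samples.append((name, labels, value))
    return samples


def group_by_prefix(samples: List[Sample]) -> Dict[str, List[str]]:
    """Group metric names by prefix; cps_ncmp_* is reported as 'ncmp' to mirror the CPS/NCMP split."""
    groups: Dict[str, set] = defaultdict(set)
    for name, _, _ in samples:
        prefix = "ncmp" if name.startswith("cps_ncmp_") else name.split("_", 1)[0]
        groups[prefix].add(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}


def timer_averages(samples: List[Sample]) -> Dict[Tuple[str, Labels], float]:
    """Average duration in seconds per timer series (_sum / _count since application start)."""
    counts: Dict[Tuple[str, Labels], float] = {}
    sums: Dict[Tuple[str, Labels], float] = {}
    for name, labels, value in samples:
        if name.endswith("_seconds_count"):
            counts[(name[: -len("_count")], labels)] = value
        elif name.endswith("_seconds_sum"):
            sums[(name[: -len("_sum")], labels)] = value
    return {key: sums[key] / count for key, count in counts.items() if key in sums and count > 0}


# Hypothetical excerpt of /actuator/prometheus output
SAMPLE = """\
# TYPE cps_ncmp_dmi_get_seconds summary
cps_ncmp_dmi_get_seconds_count{method="exampleMethod"} 4.0
cps_ncmp_dmi_get_seconds_sum{method="exampleMethod"} 0.5
cps_ncmp_dmi_get_seconds_max{method="exampleMethod"} 0.25
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.2E7
process_uptime_seconds 120.5
"""

samples = parse_exposition(SAMPLE)
print(group_by_prefix(samples))
print(timer_averages(samples))
```

Splitting labels with a regular expression, rather than on whitespace, keeps label values that contain spaces (such as G1 Eden Space) intact.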

The Differences between the 'grafana' and 'grafana-oss' Images

Jira Link: https://lf-onap.atlassian.net/browse/CPS-2544

Feature/Aspect: grafana/grafana vs grafana/grafana-oss

License
    • grafana/grafana: Proprietary (Grafana Enterprise features included)
    • grafana/grafana-oss: Open Source (Grafana OSS)

Features
    • grafana/grafana: Includes Enterprise features (e.g., advanced security, data source plugins, and reporting)
    • grafana/grafana-oss: Core OSS features only, fewer plugins and integrations

Cost
    • grafana/grafana: Requires an Enterprise license for some features
    • grafana/grafana-oss: Free to use under the AGPLv3 license

Plugins
    • grafana/grafana: Supports all plugins, including Enterprise ones
    • grafana/grafana-oss: Limited to community and OSS plugins

Usage Scenario
    • grafana/grafana: Suitable for enterprises needing premium features and support
    • grafana/grafana-oss: Ideal for open-source enthusiasts or basic use cases

Maintenance
    • grafana/grafana: Regular updates and official support are available
    • grafana/grafana-oss: Community-supported, regular updates

Docker Image Size
    • grafana/grafana: Larger due to bundled Enterprise features
    • grafana/grafana-oss: Smaller, leaner image