Performance testing CPS/NCMP using metrics
Abstract
This document describes analyzing performance of CPS/NCMP using Prometheus metrics and analyzing performance of SQL queries.
This guide will use case studies of analyzing performance of some NCMP operations, and be able to answer the following questions:
What methods take the most time?
What percentage of time is spent in the Database, Hazelcast, Java, and DMI?
Additionally, there are some instructions on doing deeper performance analysis of SQL queries.
Gathering metrics on a laptop
Disabling Turbo Boost on CPU
Some CPUs have a feature called dynamic frequency scaling which allows the CPU to run at a higher clock speed when running demanding tasks. Intel processors call this Turbo Boost. While the feature can improve performance, it can make testing very unstable, as the system may overheat and throttle when running sustained loads. This is because Turbo Boost allows the system to exceed Thermal Design Power (TDP).
Additionally, the server running NCMP KPI Jenkins pipeline job has clock speed of around 2.5 GHz. On a laptop with i9, the max clock speed is 2.6 GHz with turbo off, 5.0 GHz with turbo on. Disabling it will give a better approximation of server performance.
It is needed to disable Turbo Boost if running performance tests on a laptop, otherwise test results will fluctuate wildly between runs. Here are instructions to disable it on Dell XPS laptops: Tutorial on disable Turbo boost within BIOS | Dell Ireland
Running k6 tests on a laptop
Running the k6 tests is straightforward. From the command line, go to the cps repo and execute:
./k6-tests/run-k6-tests.sh
As can be seen below, the script will install needed dependencies, and kick of the KPI test suite:
---> install-deps.sh
Installing dependencies
docker-compose already installed
Docker Compose version v2.29.2
k6 already installed
k6 v0.51.0 (go1.22.3, linux/amd64)
Extensions:
github.com/mostafa/xk6-kafka v0.26.0, k6/x/kafka [js]
Test profile provided: kpi
Spinning off the CPS and NCMP containers for kpi testing...
[+] Running 9/9
✔ Network kpi_default Created 0.2s
✔ Container kpi-dbpostgresql Healthy 6.1s
✔ Container kpi-ncmp-dmi-plugin-demo-and-csit-stub Healt... 11.6s
✔ Container kpi-zookeeper Healthy 11.1s
✔ Container kpi-postgres-exporter Healthy 2.1s
✔ Container kpi-cps-and-ncmp-2 Healthy 37.1s
✔ Container kpi-cps-and-ncmp-1 Healthy 37.1s
✔ Container kpi-kafka Healthy 12.5s
✔ Container kpi-nginx-loadbalancer Healthy 37.5s
Build information:
{"git":{"commit":{"message":{"short":"Enhance batch operation performance with AlternateIdMatcher"},"user":{"name":"leventecsanyi","email":"levente.csanyi@est.tech"},"id":{"full":"20358e890a54627d6f6962c462d1a55ea24d44a7"}},"branch":"levente","build":{"time":"2025-03-19T11:13:39.039Z","version":"3.6.2-SNAPSHOT"}}}
Running kpi performance tests...
After around 30 minutes, you will see a test report - note the CSV report can be copy-pasted into a spreadsheet for further analysis if needed.:
-- BEGIN CSV REPORT
#,Test Name,Unit,Fs Requirement,Current Expectation,Actual
0,HTTP request failures for all tests,rate of failed requests,0,0,0.000
1,Registration of CM-handles,CM-handles/second,22,90,137.451
2,De-registration of CM-handles,CM-handles/second,22,140,391.481
3a,CM-handle ID search with No filter,milliseconds,2000,300,242.793
3b,CM-handle ID search with Module filter,milliseconds,2000,120,65.597
3c,CM-handle ID search with Property filter,milliseconds,2000,750,520.572
3d,CM-handle ID search with Cps Path filter,milliseconds,2000,750,536.860
3e,CM-handle ID search with Trust Level filter,milliseconds,2000,3000,445.227
4a,CM-handle search with No filter,milliseconds,15000,3000,3003.934
4b,CM-handle search with Module filter,milliseconds,15000,4000,3458.808
4c,CM-handle search with Property filter,milliseconds,15000,4500,3975.444
4d,CM-handle search with Cps Path filter,milliseconds,15000,4500,3914.949
4e,CM-handle search with Trust Level filter,milliseconds,15000,7000,4051.910
5a,NCMP overhead for Synchronous single CM-handle pass-through read,milliseconds,40,20,18.244
5b,NCMP overhead for Synchronous single CM-handle pass-through read with alternate id,milliseconds,40,40,25.716
6a,NCMP overhead for Synchronous single CM-handle pass-through write,milliseconds,40,20,19.702
6b,NCMP overhead for Synchronous single CM-handle pass-through write with alternate id,milliseconds,40,40,25.995
7,Legacy batch read operation,events/second,150,300,648.992
-- END CSV REPORT
Finally, the docker environment will be shut down:
Stopping, Removing containers and volumes for kpi tests...
[+] Running 9/9
✔ Container kpi-ncmp-dmi-plugin-demo-and-csit-stub Remov... 0.4s
✔ Container kpi-kafka Removed 1.0s
✔ Container kpi-nginx-loadbalancer Removed 0.4s
✔ Container kpi-postgres-exporter Removed 0.3s
✔ Container kpi-cps-and-ncmp-2 Removed 2.6s
✔ Container kpi-cps-and-ncmp-1 Removed 2.5s
✔ Container kpi-zookeeper Removed 0.6s
✔ Container kpi-dbpostgresql Removed 0.5s
✔ Network kpi_default Removed 0.3s
Tips & Tricks for running quick tests
To Do: Document which variables to change for quicker test runs: 20k, removing shutdown delay, lowering initial module delay, lowering test durations, etc.
Analyzing SQL queries in Postgresql
During performance analysis of operations in CPS/NCMP, it will often be the case that much time is spent in database queries. Postgres can be used to analyze performance using the EXPLAIN ANALYZE command. But in order to run those queries used in NCMP, it’s necessary to populate the database with realistic data. We will do this by running the k6 tests to do CM-handle registration.
Populating the database with CM-handles
A k6 script will be used to populate the database with CM-handles. Start docker-compose using kpi profile, and run the k6 script to register CM-handles:
docker-compose -f docker-compose/docker-compose.yml --profile dmi-stub --project-name kpi up --wait
k6 run ./k6-tests/ncmp/register-cmhandles-only.js -e TEST_PROFILE=kpi
The system will now be running and populated with CM-handles.
Connecting to the database
You will need a Postgresql client for this. On Ubuntu Linux, you can install the postgresql client:
sudo apt-get install postgresql-client
On command line, connect to the database using psql to localhost port 5432:
psql -h 127.0.0.1 -p 5432 -U cps cpsdb
The default password is ‘cps’.
You can verify that the database is populated by examining the dataspace/anchor/fragment tables. For example:
cpsdb=# select * from dataspace;
id | name
----+-----------------
1 | CPS-Admin
2 | NCMP-Admin
4 | NFP-Operational
(3 rows)
cpsdb=# select count(*) from anchor;
count
-------
50003
(1 row)
Analyzing SQL queries
Let’s say you wish to analyze the performance of the query to get a dataspace by name. First run the query to make sure it returns the expected result:
cpsdb=# SELECT * FROM dataspace WHERE name = 'NCMP-Admin';
id | name
----+------------
2 | NCMP-Admin
(1 row)
Now you may analyze performance with EXPLAIN ANALYZE:
cpsdb=# EXPLAIN ANALYZE SELECT * FROM dataspace WHERE name = 'NCMP-Admin';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Scan using uq_name on dataspace (cost=0.15..8.17 rows=1 width=36) (actual time=0.026..0.027 rows=1 loops=1)
Index Cond: (name = 'NCMP-Admin'::text)
Planning Time: 0.076 ms
Execution Time: 0.055 ms
(4 rows)
In the above case, it can be seen that database is using and Index Scan to find the row, which is very efficient. You can also see the time taken to plan and execute the query.
Shutting down test deployment
When you are finished analysis, you can shut down the system using docker-compose down:
docker-compose -f docker-compose/docker-compose.yml --profile dmi-stub --project-name kpi down
Profiling code with reports of @Timed annotations
Within the cps repo, there is a folder called ‘test-tools’ which contains some scripts useful for generating reports.
Generating reports
Running the generate-metrics-report.sh script will generate a report and output the filename of it:
cd test-tools/
./generate-metrics-report.sh
metrics-reports/metrics-2025-03-18T10:40:21+00:00.tsv
Opening the generated file in a spreadsheet will show a report of exposed Prometheus metrics from CPS/NCMP. It shows the count (how many times the method was called) and the sum (total amount of time spent in that method):
Subtracting reports
It is useful to be able to subtract two reports to generate a delta report. Exposed metrics are cumulative from the application startup, but many times you will only want metrics during a given time. For example, when testing NCMP operations, you may want to exclude set up time like CM-handle registration, and only include metrics while running a test case.
Given you have two reports, you can use the script ‘subtract-metrics-reports.py’ to generate a delta report:
cd test-tools/
./generate-metrics-report.sh
metrics-reports/metrics-2025-03-18T10:40:21+00:00.tsv
./generate-metrics-report.sh
metrics-reports/metrics-2025-03-18T10:42:22+00:00.tsv
./subtract-metrics-reports.py -a metrics-reports/metrics-2025-03-18T10:42:22+00:00.tsv \
-b metrics-reports/metrics-2025-03-18T10:40:21+00:00.tsv \
-o my-report.tsv
-a specifies the later report (after)
-b specifies the earlier report (before)
-o specifies the name of the output file
Note you are likely to encounter an error if you are running two instances of NCMP, as each instance have their own metrics. It is recommended to change the docker-compose.yml to have 1 replica of cps-and-ncmp container to prevent this.
Generating report for a single test case
The basic flow is this:
Modify docker-compose.yml to set 1 replica for cps-and-ncmp container.
Modify the k6 config (k6-tests/ncmp/config/kpi.json) to remove scenarios you are not interested in.
Run the k6 tests using k6-tests/run-k6-tests.sh
Wait until registration is finished (e.g. monitoring docker logs: docker logs -f kpi-nginx-loadbalancer)
Take a metrics report using generate-metrics-report.sh
Wait a few minutes
Take another metrics report using generate-metrics-report.sh
Subtract the two reports using subtract-metrics-reports.py
Review the results.
Adding new metrics
The metric reports will only show metrics that are exposed in the application. If you want to add new metrics, you may add @Timed annotations to methods in the Java code and recompile. For example:
@Timed(value = "cps.data.service.datanode.query", description = "Time taken to query data nodes")
Case Studies
Note: many code changes were made since these two analyses were made. As such, they are just examples and do not reflect current performance.
Example Analysis: NCMP overhead for Synchronous single CM-handle pass-through read with alternate id
Modify the k6 test config so it only includes the scenario of passthrough read with alternate ID - just delete all the others from k6-tests/ncmp/config/kpi.son:
"scenarios": {
"passthrough_read_alt_id_scenario": {
"executor": "constant-arrival-rate",
"exec": "passthroughReadAltIdScenario",
"rate": 25,
"timeUnit": "1s",
"duration": "15m",
"preAllocatedVUs": 10,
"startTime": "200ms"
}
},
"thresholds": {
"http_req_failed": ["rate == 0"],
Then follow the outlined steps in Performance testing CPS/NCMP using metrics | Generating report for a single test case
The output file will look something like this:
Method | Count | Sum |
http_server_requests_seconds{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/ncmp/v1/ch/{cm-handle}/data/ds/{datastore-name}"} | 1203 | 395.00 |
cps_ncmp_controller_get_seconds{class="org.onap.cps.ncmp.rest.controller.NetworkCmProxyController",exception="none",method="getResourceDataForCmHandle"} | 1202 | 392.99 |
http_client_requests_seconds{client_name="ncmp-dmi-plugin-demo-and-csit-stub",error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/dmi/v1/ch/{cmHandleId}/data/ds/{datastore}?resourceIdentifier={resourceIdentifier}"} | 1202 | 364.85 |
cps_ncmp_dmi_get_seconds{class="org.onap.cps.ncmp.impl.data.DmiDataOperations",exception="none",method="getResourceDataFromDmi"} | 1203 | 27.20 |
cps_data_service_datanode_query_seconds{class="org.onap.cps.impl.CpsQueryServiceImpl",exception="none",method="queryDataNodes"} | 1203 | 22.51 |
spring_data_repository_invocations_seconds{exception="None",method="findByAnchorAndCpsPath",repository="FragmentRepository",state="SUCCESS"} | 1203 | 18.50 |
cps_data_service_datanode_get_seconds{class="org.onap.cps.impl.CpsDataServiceImpl",exception="none",method="getDataNodes"} | 1203 | 4.23 |
cps_data_persistence_service_datanode_get_seconds{class="org.onap.cps.ri.CpsDataPersistenceServiceImpl",exception="none",method="getDataNodes"} | 1203 | 4.22 |
spring_data_repository_invocations_seconds{exception="None",method="prefetchDescendantsOfFragmentEntities",repository="FragmentRepository",state="SUCCESS"} | 2406 | 2.12 |
spring_data_repository_invocations_seconds{exception="None",method="findByName",repository="DataspaceRepository",state="SUCCESS"} | 2405 | 2.02 |
spring_data_repository_invocations_seconds{exception="None",method="findByDataspaceAndName",repository="AnchorRepository",state="SUCCESS"} | 2405 | 1.67 |
spring_data_repository_invocations_seconds{exception="None",method="findByAnchorIdAndXpathIn",repository="FragmentRepository",state="SUCCESS"} | 1203 | 0.54 |
From the above, nearly all time is spent in the http server call to /ncmp/v1/ch/{cm-handle}/data/ds/{datastore-name} - which is the Rest endpoint for the passthrough data request. Following this, most of the time is in NetworkCmProxyController. The third most frequent
Example Analysis: CM-handle search with Property filter
Rerunning above with only the k6 scenario cm_handle_search_property_scenario will give output like this:
Method | Count | Sum |
http_server_requests_seconds{error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/ncmp/v1/ch/searches"} | 8 | 33.77 |
cps_data_service_datanode_batch_get_seconds{class="org.onap.cps.impl.CpsDataServiceImpl",exception="none",method="getDataNodesForMultipleXpaths"} | 9 | 29.69 |
cps_data_persistence_service_datanode_batch_get_seconds{class="org.onap.cps.ri.CpsDataPersistenceServiceImpl",exception="none",method="getDataNodesForMultipleXpaths"} | 9 | 29.69 |
spring_data_repository_invocations_seconds{exception="None",method="prefetchDescendantsOfFragmentEntities",repository="FragmentRepository",state="SUCCESS"} | 66 | 18.17 |
spring_data_repository_invocations_seconds{exception="None",method="findByAnchorAndCpsPath",repository="FragmentRepository",state="SUCCESS"} | 58 | 7.35 |
cps_data_service_datanode_query_seconds{class="org.onap.cps.impl.CpsQueryServiceImpl",exception="none",method="queryDataNodes"} | 50 | 3.78 |
spring_data_repository_invocations_seconds{exception="None",method="findByAnchorIdAndXpathIn",repository="FragmentRepository",state="SUCCESS"} | 8 | 2.25 |
spring_data_repository_invocations_seconds{exception="None",method="findByName",repository="DataspaceRepository",state="SUCCESS"} | 66 | 0.07 |
spring_data_repository_invocations_seconds{exception="None",method="findByDataspaceAndName",repository="AnchorRepository",state="SUCCESS"} | 66 | 0.04 |
As can be seen:
33.77 seconds were spent executing /ncmp/v1/ch/searches
29.69 seconds were spent in CpsDataService getDataNodesForMultipleXpaths.
Thus 88% of the total request time was in the getDataNodesForMultipleXpaths method.