Performance Testing CPS & NCMP using Metrics-Driven Analysis
- 1 Abstract
- 2 Prerequisites for Gathering Metrics on a Laptop
- 3 Running k6 tests
- 3.1 Running k6 test suite locally
- 3.2 Tips & Tricks
- 3.2.1 Running endurance test suite locally
- 3.2.2 How to Run k6 test suite with Kafka Support on Windows locally
- 3.2.2.1 Overview
- 3.2.2.2 Prerequisites
- 3.2.2.3 Install jq for Windows
- 3.2.2.4 Verify Your Setup
- 3.2.2.5 Step 1: Build a Custom k6 with Kafka Support
- 3.2.2.6 Step 2: (Optional) Add Custom k6 to System PATH
- 3.2.2.7 Step 3: Run k6 Tests
- 3.2.3 Importing k6 summary report to a spreadsheet
- 3.2.4 Quick tests: When you are in a hurry to make a report
- 4 Profiling with VisualVM
- 5 Analyzing SQL queries in Postgresql
- 6 Profiling with reports of @Timed annotations
Abstract
This document describes analyzing the performance of CPS/NCMP using metrics-based analysis.
This guide uses case studies of analyzing the performance of some NCMP operations, and shows how to answer questions like:
What methods take the most time?
What percentage of time is spent in the Database, Hazelcast, Java, and DMI?
Additionally, there are instructions on doing deeper performance analysis using code profiling and analysis of SQL queries.
Prerequisites for Gathering Metrics on a Laptop
Disabling Turbo Boost on CPU
Turbo Boost must be disabled when running performance tests on a laptop; otherwise test results will fluctuate wildly between runs. Here are instructions to disable it on Dell XPS laptops: https://www.dell.com/support/contents/en-ie/videos/videoplayer/tutorial-on-disable-turbo-boost-within-bios/6176626459001
Background on Turbo Boost and Performance Testing
Some CPUs have a feature called dynamic frequency scaling which allows the CPU to run at a higher clock speed when running demanding tasks. Intel processors call this “Turbo Boost”. While the feature can improve performance, it can make testing unreliable, as the system may overheat and throttle when running sustained loads. This is because Turbo Boost allows the system to temporarily exceed Thermal Design Power (TDP).
Additionally, the server running NCMP’s KPI Jenkins job has a clock speed of around 2.5 GHz. On a modern laptop with an i9 processor, the max clock speed is 2.6 GHz with turbo off, but 5.0 GHz with turbo on (twice as fast). Disabling Turbo will give a better approximation of real-world server performance.
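On Linux with an Intel CPU, you can check whether Turbo Boost is currently disabled by reading the intel_pstate sysfs flag. This is a minimal sketch; the path is driver-specific and will not exist on AMD systems or other cpufreq drivers:

```python
from pathlib import Path

def turbo_boost_status(sysfs_flag="/sys/devices/system/cpu/intel_pstate/no_turbo"):
    """Report Turbo Boost state via the intel_pstate 'no_turbo' flag.

    '1' in the flag means turbo is disabled; '0' means enabled.
    Returns 'unknown' when the flag is absent (non-Intel CPU or other driver).
    """
    flag = Path(sysfs_flag)
    if not flag.exists():
        return "unknown"
    return "disabled" if flag.read_text().strip() == "1" else "enabled"

print(turbo_boost_status())
```

Running this before a test session is a quick sanity check that the BIOS setting actually took effect.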
Running k6 tests
Running k6 test suite locally
Running the k6 tests is straightforward. From the command line, go to the cps repo and execute:
./k6-tests/run-k6-tests.sh
As can be seen below, the script will install needed dependencies, and kick off the KPI test suite:
---> install-deps.sh
Installing dependencies
docker-compose already installed
Docker Compose version v2.29.2
k6 already installed
k6 v0.51.0 (go1.22.3, linux/amd64)
Extensions:
github.com/mostafa/xk6-kafka v0.26.0, k6/x/kafka [js]
Test profile provided: kpi
Spinning off the CPS and NCMP containers for kpi testing...
[+] Running 9/9
✔ Network kpi_default Created 0.2s
✔ Container kpi-dbpostgresql Healthy 6.1s
✔ Container kpi-ncmp-dmi-plugin-demo-and-csit-stub Healt... 11.6s
✔ Container kpi-zookeeper Healthy 11.1s
✔ Container kpi-postgres-exporter Healthy 2.1s
✔ Container kpi-cps-and-ncmp-2 Healthy 37.1s
✔ Container kpi-cps-and-ncmp-1 Healthy 37.1s
✔ Container kpi-kafka Healthy 12.5s
✔ Container kpi-nginx-loadbalancer Healthy 37.5s
Build information:
{"git":{"commit":{"message":{"short":"Enhance batch operation performance with AlternateIdMatcher"},"user":{"name":"leventecsanyi","email":"levente.csanyi@est.tech"},"id":{"full":"20358e890a54627d6f6962c462d1a55ea24d44a7"}},"branch":"levente","build":{"time":"2025-03-19T11:13:39.039Z","version":"3.6.2-SNAPSHOT"}}}
Running kpi performance tests...
After around 30 minutes, you will see a test report. Note the CSV report can be copy-pasted into a spreadsheet for further analysis if needed:
-- BEGIN CSV REPORT
#,Test Name,Unit,Fs Requirement,Current Expectation,Actual
0,HTTP request failures for all tests,rate of failed requests,0,0,0.000
1,Registration of CM-handles,CM-handles/second,22,90,137.451
2,De-registration of CM-handles,CM-handles/second,22,140,391.481
3a,CM-handle ID search with No filter,milliseconds,2000,300,242.793
3b,CM-handle ID search with Module filter,milliseconds,2000,120,65.597
3c,CM-handle ID search with Property filter,milliseconds,2000,750,520.572
3d,CM-handle ID search with Cps Path filter,milliseconds,2000,750,536.860
3e,CM-handle ID search with Trust Level filter,milliseconds,2000,3000,445.227
4a,CM-handle search with No filter,milliseconds,15000,3000,3003.934
4b,CM-handle search with Module filter,milliseconds,15000,4000,3458.808
4c,CM-handle search with Property filter,milliseconds,15000,4500,3975.444
4d,CM-handle search with Cps Path filter,milliseconds,15000,4500,3914.949
4e,CM-handle search with Trust Level filter,milliseconds,15000,7000,4051.910
5a,NCMP overhead for Synchronous single CM-handle pass-through read,milliseconds,40,20,18.244
5b,NCMP overhead for Synchronous single CM-handle pass-through read with alternate id,milliseconds,40,40,25.716
6a,NCMP overhead for Synchronous single CM-handle pass-through write,milliseconds,40,20,19.702
6b,NCMP overhead for Synchronous single CM-handle pass-through write with alternate id,milliseconds,40,40,25.995
7,Legacy batch read operation,events/second,150,300,648.992
-- END CSV REPORT
Finally, the docker environment will be shut down:
Stopping, Removing containers and volumes for kpi tests...
[+] Running 9/9
✔ Container kpi-ncmp-dmi-plugin-demo-and-csit-stub Remov... 0.4s
✔ Container kpi-kafka Removed 1.0s
✔ Container kpi-nginx-loadbalancer Removed 0.4s
✔ Container kpi-postgres-exporter Removed 0.3s
✔ Container kpi-cps-and-ncmp-2 Removed 2.6s
✔ Container kpi-cps-and-ncmp-1 Removed 2.5s
✔ Container kpi-zookeeper Removed 0.6s
✔ Container kpi-dbpostgresql Removed 0.5s
✔ Network kpi_default Removed 0.3s
Tips & Tricks
Running endurance test suite locally
Instead of the default KPI suite, you may run the endurance test suite using the script:
./k6-tests/run-k6-tests.sh endurance
How to Run k6 test suite with Kafka Support on Windows locally
This guide provides step-by-step instructions to run k6 performance tests with Kafka support on Windows using the xk6 ecosystem.
Overview
The Kafka extension for k6 is part of the xk6 ecosystem, which requires building a custom k6 binary with the desired extension (xk6-kafka).
Prerequisites
Ensure the following are installed on your system:
Go: Download Go
Git: Download Git
jq: Command-line JSON processor
Install jq for Windows
Go to the jq GitHub Releases page.
Download:
jq-win64.exe for 64-bit systems
jq-win32.exe for 32-bit systems
Rename the downloaded file to jq.exe (optional but recommended).
Move jq.exe to a folder included in your system’s PATH (e.g., C:\Windows\System32 or C:\Program Files\jq\).
Verify Your Setup
Open a new PowerShell or Command Prompt window and run:
go version
git --version
jq --version
Step 1: Build a Custom k6 with Kafka Support
Open PowerShell.
Install the xk6 CLI:
go install go.k6.io/xk6/cmd/xk6@latest
Build the custom k6 binary with Kafka support:
xk6 build --with github.com/mostafa/xk6-kafka@latest
Note: This will generate a k6.exe file in your current directory with Kafka support built in.
Step 2: (Optional) Add Custom k6 to System PATH
To use k6.exe globally without navigating to its directory each time:
Move the binary to a folder already in your PATH (e.g., C:\Tools\k6\)
Or, add the folder containing k6.exe to Environment Variables > Path
Step 3: Run k6 Tests
Running the k6 tests is straightforward:
Open Git Bash or any compatible terminal.
Navigate to the cps repository directory:
cd path/to/cps
Execute the test runner script:
./k6-tests/run-k6-tests.sh kpi
or
./k6-tests/run-k6-tests.sh endurance
The k6 tests should then start running:
If you have Docker Desktop, you should be able to see the running containers there, where logs are also visible.
Importing k6 summary report to a spreadsheet
In the above section, it was shown that running the k6 suite using ./k6-tests/run-k6-tests.sh will produce a CSV report on the command line: copy the text between the -- BEGIN CSV REPORT and -- END CSV REPORT tags. (Note the Jenkins jobs also include such reports at https://jenkins.nordix.org/job/onap-cps-performance-test-k6/)
The report can be copied and pasted into your spreadsheet application. For LibreOffice Calc, pasting the output will give a prompt for text import - ensure that “comma” is the selected separator:
After importing, the data should be in table form like below:
If you have problems pasting into a table, try the “Paste Special > Unformatted Text” option.
Once you are able to import into a spreadsheet, you can use spreadsheet capabilities to produce nice comparative reports!
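If you want to automate this instead of copy-pasting, the CSV table can be extracted from captured console output. A minimal sketch (the function name is illustrative, not part of the cps repo):

```python
def extract_csv_report(console_output: str) -> str:
    """Return the CSV table found between the report markers in k6 console output."""
    lines = console_output.splitlines()
    start = lines.index("-- BEGIN CSV REPORT") + 1
    end = lines.index("-- END CSV REPORT")
    return "\n".join(lines[start:end])
```

For example, capture a run with `./k6-tests/run-k6-tests.sh | tee k6-output.log`, then pass the file contents to this function and write the result to a .csv file for import.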
Quick tests: When you are in a hurry to make a report
It can take roughly 30 minutes to run the whole k6 test suite. If you want to do quick comparisons of patches, you can adjust different parameters to speed up the test suite so it runs in under 10 minutes.
Reduce container shutdown delay
By default, the k6 KPI test suite will wait 5 minutes after the suite completes. This was done so that the system can be observed for a while after the test suite finishes.
To reduce this delay, in k6-tests/ncmp/config/kpi.json, change containerCoolDownTimeInSeconds to a small value like 10.
Decrease test duration
By default, the k6 KPI test suite will run a mixed-load test for 15 minutes, while the endurance test suite will run for 2 hours - not including test setup/teardown times (registration/de-registration)!
To reduce the main test durations, edit k6-tests/ncmp/config/kpi.json, and replace all instances of “15m” with “5m” or similar.
Even a 5-minute test should be reflective of overall performance. For example, the test suite does 5 CM-handle ID searches every 2 seconds, so in 5 minutes we do 750 searches! For pass-through data operations, we do 25 reads per second, so in 5 minutes we do 7500 operations! A lot of operations happen in 5 minutes.
Reduce number of CM-handles
To speed up testing, you may want to test with fewer CM-handles. In k6-tests/ncmp/common/utils.js, TOTAL_CM_HANDLES controls the number of handles used in the test suite. Reducing this from 50000 to 20000 will give faster registration/de-registration, while still being useful for local performance comparisons.
Disable DMI Stub’s initial module processing delay
k6 tests use the DMI stub implementation, which simulates delays for various operations. This includes a delay for module sync. In docker-compose/docker-compose.yml, change the variable MODULE_INITIAL_PROCESSING_DELAY_MS to 0 to disable this delay. By default, it is 180000 ms (3 minutes).
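The kpi.json tweaks above can also be applied programmatically before a quick comparison run. A sketch, assuming the config uses the field names shown in this guide (the helper name is illustrative):

```python
import json

def speed_up_kpi_config(config_text: str) -> str:
    """Apply the quick-test tweaks to the contents of k6-tests/ncmp/config/kpi.json:
    a short container cooldown and 5-minute scenario durations."""
    config = json.loads(config_text)
    config["containerCoolDownTimeInSeconds"] = 10  # reduce post-suite observation delay
    for scenario in config.get("scenarios", {}).values():
        if scenario.get("duration") == "15m":
            scenario["duration"] = "5m"  # shorten the main mixed-load phase
    return json.dumps(config, indent=2)
```

Remember to revert these changes (or keep the original file aside) before producing final KPI numbers.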
Profiling with VisualVM
VisualVM is a visual tool integrating commandline JDK tools and lightweight profiling capabilities.
Setup VisualVM
Download & Run VisualVM
Download VisualVM from the official website: https://visualvm.github.io/
Run VisualVM. For example, on a Linux system where VisualVM is unzipped to the “Downloads” folder:
~/Downloads/visualvm_22/bin/visualvm
Running it will present the startup screen:
Expose JMX Remote Port in docker-compose
In docker-compose.yml, add parameters to enable JMX, and expose the JMX port 9010:
cps-and-ncmp:
ports:
- ${CPS_PORT_RANGE:-8698-8699}:8080
- ${CPS_CORE_JMX_PORT:-9010-9011}:9010
environment:
JAVA_TOOL_OPTIONS: >
-XX:InitialRAMPercentage=70.0 -XX:MaxRAMPercentage=70.0
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
Now start the docker-compose environment:
docker-compose -f docker-compose/docker-compose.yml --profile dmi-stub --project-name kpi up --wait
Attach VisualVM to CPS
The first time you run it, do “File → Add JMX Connection”, and add “localhost:9010” with no security:
Double-click “org.onap.cps.Application” and see the overview:
Confirm VisualVM is working
You can also browse through the tabs, such as Overview, Monitor, Threads, and Sampler:
Profiling CPS with VisualVM
Experimenting with Profiling
Start the k6 KPI test suite:
./k6-tests/run-k6-tests.sh
Wait for the system to start, then attach VisualVM to CPS/NCMP when available. You should see activity in VisualVM while the test suite is running. For example:
You may monitor docker logs to see which REST requests are being sent:
docker logs -f kpi-nginx-loadbalancer
Profiling Code with VisualVM
In VisualVM’s Sampler, change the settings to only include classes from “org.onap.cps.**”:
Ensure the sampler is stopped.
Select the “Settings” box on the right side, as in the screenshot below.
Only profile classes in the org.onap.cps package:
See that there are options to “Profile only classes” or “Do not profile classes”.
It is more useful to only profile classes under the org.onap.cps package. To include all classes in that package and sub-packages, use the ** operator: “org.onap.cps.**”.
It could be useful in some cases to add additional packages like for Spring Data or Hazelcast.
Now start sampling CPU - after setting the above package filter and setting parameters such as sort by “Total Time (CPU)”, you should see which Java methods take the most time:
In the above analysis, CpsLoggingAspectService is consuming around 5% of overall CPU - even though trace logging is not enabled! This is a great candidate for optimization!
Analyzing SQL queries in Postgresql
During performance analysis of operations in CPS/NCMP, it will often be the case that much time is spent in database queries. Postgres can be used to analyze query performance using the EXPLAIN ANALYZE command. But in order to run the queries used by NCMP, it's necessary to populate the database with realistic data. We will do this by running the k6 tests to do CM-handle registration.
Populating the database with CM-handles
A k6 script will be used to populate the database with CM-handles. Start docker-compose using kpi profile, and run the k6 script to register CM-handles:
docker-compose -f docker-compose/docker-compose.yml --project-name kpi up --wait
k6 run ./k6-tests/ncmp/register-cmhandles-only.js -e TEST_PROFILE=kpi
The system will now be running and populated with CM-handles.
Connecting to the database
You will need a Postgresql client for this. On Ubuntu Linux, you can install the postgresql client:
sudo apt-get install postgresql-client
On the command line, connect to the database using psql to localhost port 5432:
psql -h 127.0.0.1 -p 5432 -U cps cpsdb
The default password is ‘cps’.
You can verify that the database is populated by examining the dataspace/anchor/fragment tables. For example:
cpsdb=# select * from dataspace;
id | name
----+-----------------
1 | CPS-Admin
2 | NCMP-Admin
4 | NFP-Operational
(3 rows)
cpsdb=# select count(*) from anchor;
count
-------
50003
(1 row)
Analyzing SQL queries
Let’s say you wish to analyze the performance of the query to get a dataspace by name. First run the query to make sure it returns the expected result:
cpsdb=# SELECT * FROM dataspace WHERE name = 'NCMP-Admin';
id | name
----+------------
2 | NCMP-Admin
(1 row)
Now you may analyze performance with EXPLAIN ANALYZE:
cpsdb=# EXPLAIN ANALYZE SELECT * FROM dataspace WHERE name = 'NCMP-Admin';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Scan using uq_name on dataspace (cost=0.15..8.17 rows=1 width=36) (actual time=0.026..0.027 rows=1 loops=1)
Index Cond: (name = 'NCMP-Admin'::text)
Planning Time: 0.076 ms
Execution Time: 0.055 ms
(4 rows)
In the above case, it can be seen that the database is using an Index Scan to find the row, which is very efficient. You can also see the time taken to plan and execute the query.
Shutting down test deployment
When you have finished your analysis, you can shut down the system using docker-compose down:
docker-compose -f docker-compose/docker-compose.yml --project-name kpi down
Identify longest-running queries
SQL query statistics need to be enabled.
Enable SQL statement statistics
In docker-compose.yml, change the startup command of the postgres container to load the pg_stat_statements module:
dbpostgresql:
command: postgres -c shared_preload_libraries=pg_stat_statements
Now enable the extension in the postgres initialization file. Add this line to docker-compose/config/postgres-init.sql:
CREATE EXTENSION pg_stat_statements;
Find the slowest SQL queries
SELECT
mean_exec_time,
max_exec_time,
min_exec_time,
stddev_exec_time,
calls,
LEFT(query, 10000)
FROM
pg_stat_statements
ORDER BY
mean_exec_time DESC LIMIT 3;
mean_exec_time | max_exec_time | min_exec_time | stddev_exec_time | calls | query
--------------------+--------------------+--------------------+--------------------+-------+-------------------------------------------------------------------
120.00796527777777 | 160.84067299999998 | 95.650754 | 16.42441218152074 | 18 | WITH RECURSIVE ancestors AS ( +
| | | | | SELECT parentFragment.* FROM fragment parentFragment +
| | | | | WHERE parentFragment.id IN ( +
| | | | | SELECT parent_id FROM fragment WHERE anchor_id = $1 AND (+
| | | | | (xpath LIKE $2) +
| | | | | OR +
| | | | | (xpath LIKE $3||$10 AND xpath NOT LIKE $4||$11) +
| | | | | ) +
| | | | | AND (attributes ->> $12 = $13) ) +
| | | | | UNION +
| | | | | SELECT fragment.* +
| | | | | FROM fragment +
| | | | | JOIN ancestors ON ancestors.parent_id = fragment.id +
| | | | | ) +
| | | | | SELECT DISTINCT (attributes -> $5) FROM ancestors WHERE +
| | | | | jsonb_exists(ancestors.attributes, $6) AND( +
| | | | | (xpath LIKE $7) +
| | | | | OR +
| | | | | (xpath LIKE $8||$14 AND xpath NOT LIKE $9||$15) +
| | | | | )
75.89297900000001 | 123.357365 | 26.776759000000002 | 26.243277705782784 | 14 | WITH RECURSIVE ancestors AS ( +
| | | | | SELECT parentFragment.* FROM fragment parentFragment +
| | | | | WHERE parentFragment.id IN ( +
| | | | | SELECT parent_id FROM fragment WHERE anchor_id = $1 AND (+
| | | | | (xpath LIKE $2) +
| | | | | OR +
| | | | | (xpath LIKE $3||$8 AND xpath NOT LIKE $4||$9) +
| | | | | ) +
| | | | | AND (attributes ->> $10 = $11) ) +
| | | | | UNION +
| | | | | SELECT fragment.* +
| | | | | FROM fragment +
| | | | | JOIN ancestors ON ancestors.parent_id = fragment.id +
| | | | | ) +
| | | | | SELECT * FROM ancestors WHERE( +
| | | | | (xpath LIKE $5) +
| | | | | OR +
| | | | | (xpath LIKE $6||$12 AND xpath NOT LIKE $7||$13) +
| | | | | )
61.071241 | 61.071241 | 61.071241 | 0 | 1 | CREATE DATABASE "cpsdb"
Alternatively, you can output everything to a CSV file:
\copy (SELECT mean_exec_time, max_exec_time, min_exec_time, stddev_exec_time, calls, left(query, 10000) AS query FROM pg_stat_statements ORDER BY mean_exec_time DESC) TO '/tmp/slow_queries.csv' WITH CSV DELIMITER ',' HEADER;
You can then open the CSV in a spreadsheet:
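If you prefer scripting over a spreadsheet, the exported CSV can be inspected directly. A minimal sketch, assuming the header produced by the \copy command above (the function name is illustrative):

```python
import csv
import io

def top_queries(csv_text: str, n: int = 3):
    """Return the n statements with the highest mean_exec_time from the
    CSV exported from pg_stat_statements above."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["mean_exec_time"]), reverse=True)
    return rows[:n]
```

For example: `top_queries(open("/tmp/slow_queries.csv").read())`.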
Profiling with reports of @Timed annotations
Within the cps repo, there is a folder called ‘test-tools’ which contains some scripts useful for generating reports.
Generating reports
Running the generate-metrics-report.sh script will generate a report and output its filename:
cd test-tools/
./generate-metrics-report.sh
metrics-reports/metrics-2025-03-18T10:40:21+00:00.tsv
Opening the generated file in a spreadsheet will show a report of exposed Prometheus metrics from CPS/NCMP. It shows the count (how many times the method was called) and the sum (total amount of time spent in that method):
Subtracting reports
It is useful to be able to subtract two reports to generate a delta report. Exposed metrics are cumulative from application startup, but often you will only want metrics for a given time window. For example, when testing NCMP operations, you may want to exclude setup time like CM-handle registration, and only include metrics while running a test case.
Given you have two reports, you can use the script ‘subtract-metrics-reports.py’ to generate a delta report:
cd test-tools/
./generate-metrics-report.sh
metrics-reports/metrics-2025-03-18T10:40:21+00:00.tsv
./generate-metrics-report.sh
metrics-reports/metrics-2025-03-18T10:42:22+00:00.tsv
./subtract-metrics-reports.py -a metrics-reports/metrics-2025-03-18T10:42:22+00:00.tsv \
-b metrics-reports/metrics-2025-03-18T10:40:21+00:00.tsv \
-o my-report.tsv
-a specifies the later report (after)
-b specifies the earlier report (before)
-o specifies the name of the output file
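Conceptually, the delta report is a per-metric subtraction of the two cumulative snapshots. A sketch of that logic follows (an illustration of what is computed, not a copy of subtract-metrics-reports.py; a row layout of (metric, count, sum) is assumed):

```python
def subtract_metrics(after_rows, before_rows):
    """Subtract 'before' cumulative values from 'after' values per metric.

    Each row is (metric_name, count, sum). Metrics missing from the
    'before' report are treated as starting from zero.
    """
    before = {name: (float(count), float(total)) for name, count, total in before_rows}
    delta = []
    for name, count, total in after_rows:
        count0, total0 = before.get(name, (0.0, 0.0))
        delta.append((name, float(count) - count0, float(total) - total0))
    return delta
```

Because metrics are monotonically increasing counters, the delta is only meaningful when both snapshots come from the same application instance, which is why a single replica is recommended below.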
Note you are likely to encounter an error if you are running two instances of NCMP, as each instance has its own metrics. It is recommended to change docker-compose.yml to have 1 replica of the cps-and-ncmp container to prevent this.
Generating report for a single test case
The basic flow is this:
Modify docker-compose.yml to set 1 replica for cps-and-ncmp container.
Modify the k6 config (k6-tests/ncmp/config/kpi.json) to remove scenarios you are not interested in.
Run the k6 tests using k6-tests/run-k6-tests.sh
Wait until registration is finished (e.g. monitoring docker logs: docker logs -f kpi-nginx-loadbalancer)
Take a metrics report using generate-metrics-report.sh
Wait a few minutes
Take another metrics report using generate-metrics-report.sh
Subtract the two reports using subtract-metrics-reports.py
Review the results.
Adding new metrics
The metric reports will only show metrics that are exposed in the application. If you want to add new metrics, you may add @Timed annotations to methods in the Java code and recompile. For example:
@Timed(value = "cps.data.service.datanode.query", description = "Time taken to query data nodes")
Example Analysis: CM-handle search with Property filter
Modify the k6 test config so it only includes the cm_handle_search_property_scenario - just delete all the other scenarios from k6-tests/ncmp/config/kpi.json:
"scenarios": {
"cm_handle_search_property_scenario": {
"executor": "constant-arrival-rate",
"exec": "cmHandleSearchPropertyScenario",
"rate": 1,
"timeUnit": "15s",
"duration": "5m",
"preAllocatedVUs": 1,
"startTime": "6s"
}
},
"thresholds": {
"http_req_failed": ["rate == 0"],Then follow the outlined steps in Performance Testing CPS & NCMP using Metrics-Driven Analysis | Generating report for a single test case
The output file will look something like this:
Method | Count | Sum |
http_server_requests_seconds{error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/ncmp/v1/ch/searches"} | 8 | 33.77 |
cps_data_service_datanode_batch_get_seconds{class="org.onap.cps.impl.CpsDataServiceImpl",exception="none",method="getDataNodesForMultipleXpaths"} | 9 | 29.69 |
cps_data_persistence_service_datanode_batch_get_seconds{class="org.onap.cps.ri.CpsDataPersistenceServiceImpl",exception="none",method="getDataNodesForMultipleXpaths"} | 9 | 29.69 |
spring_data_repository_invocations_seconds{exception="None",method="prefetchDescendantsOfFragmentEntities",repository="FragmentRepository",state="SUCCESS"} | 66 | 18.17 |
spring_data_repository_invocations_seconds{exception="None",method="findByAnchorAndCpsPath",repository="FragmentRepository",state="SUCCESS"} | 58 | 7.35 |
cps_data_service_datanode_query_seconds{class="org.onap.cps.impl.CpsQueryServiceImpl",exception="none",method="queryDataNodes"} | 50 | 3.78 |
spring_data_repository_invocations_seconds{exception="None",method="findByAnchorIdAndXpathIn",repository="FragmentRepository",state="SUCCESS"} | 8 | 2.25 |
spring_data_repository_invocations_seconds{exception="None",method="findByName",repository="DataspaceRepository",state="SUCCESS"} | 66 | 0.07 |
spring_data_repository_invocations_seconds{exception="None",method="findByDataspaceAndName",repository="AnchorRepository",state="SUCCESS"} | 66 | 0.04 |
As can be seen:
33.77 seconds were spent executing /ncmp/v1/ch/searches
29.69 seconds were spent in CpsDataService getDataNodesForMultipleXpaths
3.78 seconds were spent in the query for the property: queryDataNodes
Thus 88% of the total request time was in the getDataNodesForMultipleXpaths method.
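That percentage follows directly from the Sum column of the report:

```python
# Share of the /ncmp/v1/ch/searches request time spent in
# getDataNodesForMultipleXpaths, using the Sum values from the report above.
total_request_seconds = 33.77  # http_server_requests for /ncmp/v1/ch/searches
multi_xpath_seconds = 29.69    # cps_data_service getDataNodesForMultipleXpaths

share = multi_xpath_seconds / total_request_seconds * 100
print(f"{share:.0f}%")  # → 88%
```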