CPS-2444: Add Endurance tests for NCMP
- 1 References
- 2 Issues & Decisions
- 3 Requirements
- 3.1 Functional
- 3.2 Error Handling
- 3.3 Characteristics
- 4 Out of Scope
- 5 Suggested Tasks
- 6 Solution Proposal
- 6.1 Agree and Define new ‘Suite’ (js)
- 6.2 Visualizing the ENDURANCE test results
- 6.2.1 Grafana
- 6.2.1.1 Permanent Storage Alternatives
- 6.2.1.1.1 Prometheus
- 6.2.2 GNUPlot
References
CPS-2444: Add Endurance tests for NCMP (In Progress)
Issues & Decisions
# | Issue | Notes | Decision |
---|---|---|---|
1 | Agree on endurance performance KPI | The aim of the study is the early detection of bugs similar to CPS-2430: CPS-NCMP: Runs out of heap space and restarts (Closed), or of any future memory leak. We should therefore identify the specific functions currently used in k6 that need to be improved. These functions will be used in the Endurance test suite. | CPS needs to run all the test cases according to the FS, not according to what was tested in CPS-2430 (those tests ran outside of the FS). Less focus on the KPI for this endurance run. @Kolawole Adebisi-Adeolokun @Halil Cakal |
2 | Agree on the test environment | Blocked by CPS-2463: Two docker-compose deployments simultaneously (Closed) | |
3 | Visualization of memory trend | The most convenient way to represent the memory usage trend in a GUI is Grafana. | Agreed to use Grafana, mainly because it allows storage for longer periods, in line with our A/C. @Kolawole Adebisi-Adeolokun @Halil Cakal Note: A PoC for the GNUPlot option has been made; please see the relevant Jira ticket for details. @Daniel Hanrahan @Halil Cakal @Toine Siebelink Oct 29, 2024. CPS-2466: Add Grafana support to visualize memory usage pattern (Open) |
4 | Grafana access externally (if Grafana is selected in #3, this issue should be discussed; otherwise ignore) | Team Kraken support is needed either way. | Since Grafana is the preferred option, access issues shall be discussed with team Kraken. @Halil Cakal (consider the Jenkins plug-in option as well) |
5 | Permanent storage (DB) for Grafana (if Grafana is selected in #3, this issue should be discussed; otherwise ignore) | Investigate the solution strategy for the storage, e.g. a permanent service running 24/7 … | @Halil Cakal to investigate how to tune the volume |
Requirements
Functional
# | Interface | Requirement | Additional Information | Signoff |
---|---|---|---|---|
1 | N/A | Configurable duration (longevity) | | Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink |
2 | N/A | Grafana support | | Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink |
3 | N/A | Parallel run | The new k6 suite should be able to run in parallel with the existing ones | Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink |
4 | N/A | A new Jenkins job | The new Jenkins job should be independent of the current ones | Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink |
Error Handling
# | Scenario | Expected Behavior | Notes | Signoff |
---|---|---|---|---|
1 | N/A | N/A | | |
Characteristics
# | Parameter | Expectation | Notes | Signoff |
---|---|---|---|---|
1 | longevity - configurable duration | Should be able to identify any memory leakage | | |
Out of Scope
Reporting the reason for memory leakage if any.
Troubleshooting if the Endurance Tests cannot run in parallel with existing ones or cause issues in other tests on the physical test server.
Suggested Tasks
Description | Jira |
---|---|
Add new test profile ‘Endurance’ | CPS-2464: Add new k6 endurance-performance installment scripts (Closed) |
Add new Jenkins job to run endurance test | CPS-2465: Add new Jenkins job to run endurance test (In Progress) |
Add Grafana support to visualize memory usage pattern | CPS-2466: Add Grafana support to visualize memory usage pattern (Open) |
Two docker-compose deployments simultaneously | CPS-2463: Two docker-compose deployments simultaneously (Closed) |
Agree and Define new ‘Suite’ (js) | |
Solution Proposal
Agree and Define new ‘Suite’ (js)
The existing k6 performance tests are sufficient to detect memory leaks, so no new k6 test will be added. @Daniel Hanrahan @Halil Cakal Oct 29, 2024
There should be a new test suite. @Toine Siebelink @Kolawole Adebisi-Adeolokun @Halil Cakal Nov 11, 2024 CPS-2493: Agree and Define new 'Suite' (js) (Submitted)
HOW: Daniel and I decided to define a new test suite (endurance). @Daniel Hanrahan @Halil Cakal Nov 14, 2024
Have one ncmp-test-runner.js with two configs, kpi.json and endurance.json, and move the scenarios and thresholds settings into these JSON configs.
The endurance test should run all tests in parallel. The scenarios, executor types, and VUs in the Endurance suite should be the same as in the KPI suite, except for the legacy_batch_consume_scenario: its executor type should be changed to constant-arrival-rate at 1 request/second. @Toine Siebelink @Daniel Hanrahan @Kolawole Adebisi-Adeolokun @Halil Cakal Nov 14, 2024
As an example:
{
  "hosts": {
    "ncmpBaseUrl": "http://localhost:8883",
    "dmiStubUrl": "http://ncmp-dmi-plugin-demo-and-csit-stub:8092",
    "kafkaBootstrapServer": "localhost:9092"
  },
  "kafka": {
    "legacyBatchTopic": "legacy_batch_topic"
  },
  "scenarios": {
    "passthrough_read_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughReadScenario",
      "vus": 2,
      "duration": "15m"
    }
  },
  "thresholds": {
    "http_req_failed": ["rate == 0"],
    "cmhandles_created_per_second": ["avg >= 22"],
    "cmhandles_deleted_per_second": ["avg >= 22"],
    "ncmp_overhead_passthrough_read": ["avg <= 40"]
  }
}
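The idea of a single runner fed by per-suite JSON configs can be sketched roughly as follows. This is a minimal illustration, not the actual ncmp-test-runner.js: the function name buildK6Options, the config path, and the TEST_PROFILE environment variable are assumptions made for the example. Only the scenarios and thresholds keys come from the config shown above.

```javascript
// Sketch: turn a parsed suite config (kpi.json or endurance.json) into k6 options.
// In a real k6 script the file would be read in the init context, for example:
//   const config = JSON.parse(open(`./config/${__ENV.TEST_PROFILE}.json`));
//   export const options = buildK6Options(config);
// The path and the TEST_PROFILE variable name are assumptions.
function buildK6Options(config) {
  if (!config.scenarios || Object.keys(config.scenarios).length === 0) {
    throw new Error('config must define at least one scenario');
  }
  // Only the k6-relevant parts are passed on; hosts and kafka settings stay in config.
  return {
    scenarios: config.scenarios,
    thresholds: config.thresholds || {},
  };
}

// Abridged copy of the example config above, inlined so the sketch runs standalone.
const kpiConfig = JSON.parse(`{
  "hosts": { "ncmpBaseUrl": "http://localhost:8883" },
  "scenarios": {
    "passthrough_read_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughReadScenario",
      "vus": 2,
      "duration": "15m"
    }
  },
  "thresholds": {
    "http_req_failed": ["rate == 0"],
    "ncmp_overhead_passthrough_read": ["avg <= 40"]
  }
}`);

const kpiOptions = buildK6Options(kpiConfig);
console.log(Object.keys(kpiOptions.scenarios));
```

Keeping the runner free of hard-coded scenarios means a new suite only needs a new JSON file, which fits the requirement that the endurance job stays independent of the existing ones.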
The current KPI and ENDURANCE (proposed) test parameters (Last updated Nov 14, 2024)
Test Stage | KPI Scenario Name | KPI Unit | KPI Executor Type | KPI VUs | KPI Duration | ENDURANCE Scenario Name | ENDURANCE VUs | ENDURANCE Executor Type | ENDURANCE Duration |
---|---|---|---|---|---|---|---|---|---|
setup | create_cm_handles | CM-handles/second | N/A | N/A | 20m | create_cm_handles | N/A | N/A | 20m |
scenario | passthrough_read_scenario | overhead | constant-vus | 2 | 15m | passthrough_read_scenario | 2 | constant-vus | 1h |
scenario | passthrough_read_alt_id_scenario | overhead | constant-vus | 2 | 15m | passthrough_read_alt_id_scenario | 2 | constant-vus | 1h |
scenario | passthrough_write_scenario | overhead | constant-vus | 2 | 15m | passthrough_write_scenario | 2 | constant-vus | 1h |
scenario | passthrough_write_alt_id_scenario | overhead | constant-vus | 2 | 15m | passthrough_write_alt_id_scenario | 2 | constant-vus | 1h |
scenario | cm_handle_id_search_nofilter_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_nofilter_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_id_search_module_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_module_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_id_search_property_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_property_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_id_search_cpspath_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_cpspath_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_id_search_trustlevel_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_trustlevel_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_search_nofilter_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_nofilter_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_search_module_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_module_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_search_property_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_property_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_search_cpspath_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_cpspath_scenario | 1 | constant-vus | 1h |
scenario | cm_handle_search_trustlevel_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_trustlevel_scenario | 1 | constant-vus | 1h |
scenario | legacy_batch_produce_scenario | milliseconds | shared-iterations | 2 | N/A | legacy_batch_produce_scenario | 1 (1 req/sec) | constant-arrival-rate | |
scenario | legacy_batch_consume_scenario | events/second | per-vu-iterations | 1 | | | | | |
teardown | delete_cm_handles | CM-handles/second | N/A | N/A | 20m | delete_cm_handles | N/A | | 10m |
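The mapping from the KPI parameters to the proposed ENDURANCE parameters could be derived mechanically rather than maintained by hand. The sketch below is an illustration under stated assumptions, not the agreed implementation: the helper names are hypothetical, preAllocatedVUs: 1 is a guess based on the "1 (1 req/sec)" entry, and the exec name legacyBatchProduceScenario is assumed. The text names legacy_batch_consume_scenario for the executor change while the table shows it on the legacy_batch_produce_scenario row, so the scenario name is left as a parameter.

```javascript
// Build a k6 constant-arrival-rate scenario firing 1 iteration per second.
// preAllocatedVUs is required by this executor; the value 1 is an assumption.
function toArrivalRateScenario(scenario, duration) {
  return {
    executor: 'constant-arrival-rate',
    exec: scenario.exec,
    rate: 1,
    timeUnit: '1s',
    duration,
    preAllocatedVUs: 1,
  };
}

// Derive endurance scenarios from KPI scenarios:
// - the named scenario switches to constant-arrival-rate,
// - constant-vus scenarios keep their VUs but run for the endurance duration,
// - anything else is copied unchanged.
function deriveEnduranceScenarios(kpiScenarios, arrivalRateScenarioName, duration = '1h') {
  const result = {};
  for (const [name, scenario] of Object.entries(kpiScenarios)) {
    if (name === arrivalRateScenarioName) {
      result[name] = toArrivalRateScenario(scenario, duration);
    } else if (scenario.executor === 'constant-vus') {
      result[name] = { ...scenario, duration };
    } else {
      result[name] = { ...scenario };
    }
  }
  return result;
}

// Two rows of the KPI table as input; exec name of the second one is assumed.
const kpiScenarios = {
  passthrough_read_scenario: {
    executor: 'constant-vus', exec: 'passthroughReadScenario', vus: 2, duration: '15m',
  },
  legacy_batch_produce_scenario: {
    executor: 'shared-iterations', exec: 'legacyBatchProduceScenario', vus: 2,
  },
};

const enduranceScenarios = deriveEnduranceScenarios(kpiScenarios, 'legacy_batch_produce_scenario');
console.log(JSON.stringify(enduranceScenarios, null, 2));
```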
Visualizing the ENDURANCE test results
As mentioned in issues/decisions, there are two alternative ways of representing memory trends: Grafana and GnuPlot.
Grafana
EST has its own Prometheus and Grafana (externally accessible, no need to install GlobalProtect), which can be configured to show cps-and-ncmp memory trends.
Link to EST Grafana: https://monitoring.nordix.org/login
Through prometheus.yml (of EST), a new scrape_configs entry for the cps-and-ncmp microservice can be added.
scrape_configs:
  - job_name: 'cps-and-ncmp'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets:
          - 'cps-and-ncmp:8080'  # replace with <physical-server-ip:port>
Also, the dashboard provider and dashboard config (jvm-micrometer-dashboard.json) can be added.
providers:
  - name: default
    orgId: 1
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
Then, the trend of G1 Old Gen space can be observed as seen below:
Permanent Storage Alternatives
Prometheus
Prometheus can be configured with a persistent volume to retain data.
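With retained data, the stored samples could also be read back outside Grafana through the Prometheus HTTP API range-query endpoint (/api/v1/query_range). The sketch below only builds the request URL. The base URL is a placeholder, the function name is hypothetical, and the metric selector assumes the standard Micrometer JVM metric jvm_memory_used_bytes with the id label "G1 Old Gen"; the actual metric names exposed by cps-and-ncmp should be checked against /actuator/prometheus.

```javascript
// Assumed PromQL selector for the G1 Old Gen pool of the cps-and-ncmp scrape job.
const OLD_GEN_QUERY = 'jvm_memory_used_bytes{job="cps-and-ncmp",area="heap",id="G1 Old Gen"}';

// Build a Prometheus range-query URL.
// startSec and endSec are Unix timestamps in seconds; stepSec is the resolution in seconds.
function buildOldGenRangeQueryUrl(prometheusBaseUrl, startSec, endSec, stepSec) {
  const params = new URLSearchParams({
    query: OLD_GEN_QUERY,
    start: String(startSec),
    end: String(endSec),
    step: String(stepSec),
  });
  return `${prometheusBaseUrl}/api/v1/query_range?${params.toString()}`;
}

// Example: one hour of samples at 60-second resolution (placeholder host).
const rangeUrl = buildOldGenRangeQueryUrl('http://prometheus.example:9090', 1731578400, 1731582000, 60);
console.log(rangeUrl);
```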
GNUPlot
GnuPlot can also draw a plot for only G1 Old Gen space.
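Whichever tool draws the plot, the memory trend itself can be summarized numerically. One simple option, shown here only as a sketch and not as a method agreed in this study, is the least-squares slope of the G1 Old Gen samples over time. The function name and the sample values are made up for illustration.

```javascript
// Least-squares slope of [timestampSeconds, usedBytes] samples, in bytes per second.
function leastSquaresSlope(samples) {
  const n = samples.length;
  if (n < 2) {
    throw new Error('need at least two samples');
  }
  let sumT = 0;
  let sumV = 0;
  for (const [t, v] of samples) {
    sumT += t;
    sumV += v;
  }
  const meanT = sumT / n;
  const meanV = sumV / n;
  let numerator = 0;
  let denominator = 0;
  for (const [t, v] of samples) {
    numerator += (t - meanT) * (v - meanV);
    denominator += (t - meanT) ** 2;
  }
  if (denominator === 0) {
    throw new Error('timestamps must not all be equal');
  }
  return numerator / denominator;
}

// Illustrative samples: usage grows by 60 bytes every 60 seconds.
const growing = [[0, 100], [60, 160], [120, 220]];
// Illustrative samples: usage stays flat.
const flat = [[0, 500], [60, 500], [120, 500]];

console.log(leastSquaresSlope(growing), leastSquaresSlope(flat));
```

A positive slope on its own does not prove a leak, since old-generation usage normally rises between garbage collections. The value is more meaningful over a long run, which is the reason for the configurable endurance duration.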