CPS-2444: Add Endurance tests for NCMP

References

CPS-2444: Add Endurance tests for NCMP (In Progress)

Issues & Decisions

Issue

Notes 

Decision

1

Agree on endurance performance KPI

The aim of the study is the early detection of bugs similar to CPS-2430: CPS-NCMP: Runs out of heap space and restarts (Closed), or of any future memory leak.

Thus, we should identify the specific k6 function/s that need to be improved. These function/s will be used in the Endurance test suite.

Actions:

  1. @Daniel Hanrahan and @Lee Anjella Macabuhay can shed light on this item since they were resolving the bug.

  2. After identifying the test function/s, what can we functionally do to improve them?
    The reproduction of the bug includes two more data operations that are not available in the current k6 suite:

    1. Passthrough-write (PATCH)

    2. Passthrough (DELETE) / Delete Resource Data

  3. Regarding parallelism, how many VUs should be assigned to these tests?
    The suggested number of VUs with 'constant-vus' executor:

    1. Passthrough-read (READ): 4

    2. Passthrough-write (POST): 4

    3. Passthrough-write (PATCH): 4

    4. Passthrough (DELETE) / Delete Resource Data: 4

CPS needs to run all the test cases according to the FS, not according to what was tested in CPS-2430 (seeing that tests outside of the FS were run there). Less focus on the KPI for this endurance run. @Kolawole Adebisi-Adeolokun @Halil Cakal

2

Agree on the test environment

Blocked by CPS-2463: Two docker-compose deployments simultaneously (Closed)

  1. If a new test server can be provided, there is no need for a secondary docker-compose deployment descriptor (port and container name changes).

  2. If no new server can be provided, a secondary docker-compose deployment descriptor will be necessary.

    1. Three different pipelines will then run on the physical test server:

      1. CPS performance tests (Groovy-based)

      2. NCMP performance tests (k6-based)

      3. NCMP Endurance tests (k6-based, long-running)

  3. Team Kraken could set up a test environment for us.

  1. No parallelism. Hold a conversation with Team Kraken to discuss option #3. @Halil Cakal

  2. After a discussion with Team Kraken, it was decided to have two docker-compose deployments on the test server. @Daniel Hanrahan @Halil Cakal @Toine Siebelink Oct 29, 2024

    1. The PoC has been done on Nordix.
      PoC: two docker-compose deployments using an environment file

  3. The number of available executors should be sufficient to run up to three jobs on the physical performance test server. @Daniel Hanrahan @Toine Siebelink @Kolawole Adebisi-Adeolokun @Halil Cakal Nov 14, 2024

3

Visualization of memory trend

The most convenient way to represent the memory usage trend in a GUI is Grafana.

  1. Grafana

  2. GNUPlot (third-party library currently used for k6 and Groovy test visualization)

    1. If Gnuplot is used to visualize memory usage, which parameters, logs, or output should be its input?

    2. Issue #4 (external access) will no longer apply, since the Jenkins HTML Publisher can handle it.

Agreed to use Grafana, mainly because it allows storage for longer periods, in line with our A/C. @Kolawole Adebisi-Adeolokun @Halil Cakal

Note: A PoC for the GNUPlot option has been made; please see the relevant Jira ticket details. @Daniel Hanrahan @Halil Cakal @Toine Siebelink Oct 29, 2024 CPS-2466: Add Grafana support to visualize memory usage pattern (Open)

4

Grafana access externally (If Grafana is selected at #3 then this issue should be discussed, otherwise ignore)

  1. Make local GUI link accessible externally

  2. Give server access to the CPS team

Team Kraken support is needed either way.

Since Grafana is the preferred option, access issues shall be discussed with Team Kraken @Halil Cakal (consider the Jenkins plug-in option as well).

  • Discussed with Team Kraken and will revisit this option after having a secondary docker-compose deployment and their pipelines. @Daniel Hanrahan @Halil Cakal @Toine Siebelink Oct 29, 2024

5

Permanent storage (DB) for Grafana (If Grafana is selected at #3 then this issue should be discussed, otherwise, ignore)

Investigate the solution strategy for the storage, e.g. a permanent service running 24/7 …

@Halil Cakal to investigate how to tune the volume

Requirements

Functional

Interface

Requirement

Additional Information

Signoff

1

N/A

Configurable duration (longevity)

  • The endurance test should run for 2 hours

  • The duration should be configurable

Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink

2

N/A

The Grafana support

  • Memory usage should be visualized

  • The history should be kept for the last seven days e.g. in Grafana storage/DB

Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink

3

N/A

Parallel run

The new k6 suite should be able to run in parallel with the existing ones

Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink

4

N/A

A new Jenkins job

The new Jenkins job should be independent of the current ones

Oct 21, 2024 @Kolawole Adebisi-Adeolokun @Toine Siebelink

Error Handling

Scenario

Expected Behavior

Notes

Signoff

1

N/A

N/A

 

 

Characteristics

Parameter

Expectation

Notes

Signoff

1

longevity - configurable duration

Should be able to identify any memory leakage

 

 

Out of Scope

  • Reporting the reason for memory leakage if any.

  • Troubleshooting if the Endurance tests cannot run in parallel with the existing ones or cause issues in other tests on the physical test server.

 

Suggested Tasks

Description

Jira

Add new k6 performance test suite

CPS-2464: Add new k6 endurance-performance installment scripts (Submitted)

Add new Jenkins job to run endurance test

CPS-2465: Add new Jenkins job to run endurance test (In Progress)

Add Grafana support to visualize memory usage pattern

CPS-2466: Add Grafana support to visualize memory usage pattern (Open)

Two docker-compose deployments simultaneously

CPS-2463: Two docker-compose deployments simultaneously (Closed)

Add new k6 test suite

CPS-2493: Agree and Define new 'Suite' (js) (Submitted)

Solution Proposal

Agree and Define new ‘Suite’ (js)

  1. The existing k6 performance tests are sufficient to detect memory leaks, so no new k6 tests will be added. @Daniel Hanrahan @Halil Cakal Oct 29, 2024
    There should be a new test suite. @Toine Siebelink @Kolawole Adebisi-Adeolokun @Halil Cakal Nov 11, 2024 CPS-2493: Agree and Define new 'Suite' (js) (Submitted)
    HOW: Daniel and I decided to define a new test suite (endurance). @Daniel Hanrahan @Halil Cakal Nov 14, 2024
    Have one ncmp-test-runner.js with different configs, kpi.json and endurance.json, moving the scenarios and thresholds settings into the JSON configs.
    The endurance test should run all tests in parallel. The Endurance suite should use the same scenarios, executor types, and VUs as the KPI suite, except for the legacy_batch_consume_scenario, whose executor type should be changed to constant-arrival-rate with 1 req/second. @Toine Siebelink @Daniel Hanrahan @Kolawole Adebisi-Adeolokun @Halil Cakal Nov 14, 2024
    As an example:

    {
      "hosts": {
        "ncmpBaseUrl": "http://localhost:8883",
        "dmiStubUrl": "http://ncmp-dmi-plugin-demo-and-csit-stub:8092",
        "kafkaBootstrapServer": "localhost:9092"
      },
      "kafka": {
        "legacyBatchTopic": "legacy_batch_topic"
      },
      "scenarios": {
        "passthrough_read_scenario": {
          "executor": "constant-vus",
          "exec": "passthroughReadScenario",
          "vus": 2,
          "duration": "15m"
        }
      },
      "thresholds": {
        "http_req_failed": ["rate == 0"],
        "cmhandles_created_per_second": ["avg >= 22"],
        "cmhandles_deleted_per_second": ["avg >= 22"],
        "ncmp_overhead_passthrough_read": ["avg <= 40"]
      }
    }
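The idea of one runner with per-suite JSON configs can be sketched as below. This is only an illustration under assumptions: `buildOptions` and the `durationOverride` parameter are hypothetical names, not part of the agreed design, and the config object is a trimmed copy of the example above. In a real k6 script the config would typically be read in the init context with `JSON.parse(open(...))` and the result exported as `options`.

```javascript
// Hypothetical helper: turns a parsed suite config (kpi.json or endurance.json)
// into the object a k6 script would export as `options`.
function buildOptions(suiteConfig, durationOverride) {
  const scenarios = {};
  for (const [name, scenario] of Object.entries(suiteConfig.scenarios)) {
    // Shallow copy so the loaded config is not mutated.
    const copy = { ...scenario };
    // Only scenarios that already define a duration can have it overridden.
    if (durationOverride && 'duration' in copy) {
      copy.duration = durationOverride;
    }
    scenarios[name] = copy;
  }
  return { scenarios, thresholds: { ...suiteConfig.thresholds } };
}

// Trimmed copy of the example config above.
const exampleConfig = {
  scenarios: {
    passthrough_read_scenario: {
      executor: 'constant-vus',
      exec: 'passthroughReadScenario',
      vus: 2,
      duration: '15m',
    },
  },
  thresholds: {
    http_req_failed: ['rate == 0'],
    ncmp_overhead_passthrough_read: ['avg <= 40'],
  },
};

// Same scenarios and thresholds, but with a configurable (longer) duration.
const enduranceOptions = buildOptions(exampleConfig, '2h');
console.log(JSON.stringify(enduranceOptions, null, 2));
```

Keeping the override optional means the same helper could serve both suites: without a second argument the durations from the JSON file are used unchanged.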

The current KPI and ENDURANCE (proposed) test parameters (last updated Nov 14, 2024)

| Test Stage | KPI Scenario Name | Unit | KPI Executor Type | KPI VUs | KPI Duration | ENDURANCE Scenario Name | ENDURANCE VUs | ENDURANCE Executor Type | ENDURANCE Duration |
|---|---|---|---|---|---|---|---|---|---|
| setup | create_cm_handles | CM-handles/second | N/A | N/A | 20m | create_cm_handles | N/A | N/A | 20m |
| scenario | passthrough_read_scenario | overhead | constant-vus | 2 | 15m | passthrough_read_scenario | 2 | constant-vus | 1h |
| scenario | passthrough_read_alt_id_scenario | overhead | constant-vus | 2 | 15m | passthrough_read_alt_id_scenario | 2 | constant-vus | 1h |
| scenario | passthrough_write_scenario | overhead | constant-vus | 2 | 15m | passthrough_write_scenario | 2 | constant-vus | 1h |
| scenario | passthrough_write_alt_id_scenario | overhead | constant-vus | 2 | 15m | passthrough_write_alt_id_scenario | 2 | constant-vus | 1h |
| scenario | cm_handle_id_search_nofilter_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_nofilter_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_id_search_module_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_module_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_id_search_property_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_property_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_id_search_cpspath_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_cpspath_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_id_search_trustlevel_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_id_search_trustlevel_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_search_nofilter_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_nofilter_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_search_module_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_module_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_search_property_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_property_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_search_cpspath_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_cpspath_scenario | 1 | constant-vus | 1h |
| scenario | cm_handle_search_trustlevel_scenario | milliseconds | constant-vus | 1 | 15m | cm_handle_search_trustlevel_scenario | 1 | constant-vus | 1h |
| scenario | legacy_batch_produce_scenario | milliseconds | shared-iterations | 2 | N/A | legacy_batch_produce_scenario | 1 (1 req/sec) | constant-arrival-rate | 1h |
| scenario | legacy_batch_consume_scenario | events/second | per-vu-iterations | 1 | | | | | |
| teardown | delete_cm_handles | CM-handles/second | N/A | N/A | 20m | delete_cm_handles | N/A | | 10m |
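The mapping from KPI scenarios to ENDURANCE scenarios in the table above could be expressed roughly as follows. This is a sketch under assumptions: `toEnduranceScenarios` is a hypothetical helper, the `exec` names for the search and legacy batch scenarios and the `iterations` value are made up for illustration, and `preAllocatedVUs` is simply taken from the KPI VU count. The `constant-arrival-rate` fields (`rate`, `timeUnit`, `duration`, `preAllocatedVUs`) are the standard k6 options for that executor.

```javascript
// Small KPI subset based on the table above (exec names and iterations are illustrative).
const kpiScenarios = {
  passthrough_read_scenario: {
    executor: 'constant-vus', exec: 'passthroughReadScenario', vus: 2, duration: '15m',
  },
  cm_handle_id_search_nofilter_scenario: {
    executor: 'constant-vus', exec: 'cmHandleIdSearchNoFilterScenario', vus: 1, duration: '15m',
  },
  legacy_batch_produce_scenario: {
    executor: 'shared-iterations', exec: 'legacyBatchProduceScenario', vus: 2, iterations: 100,
  },
};

// Hypothetical helper: derive endurance scenarios from the KPI ones.
function toEnduranceScenarios(kpi, duration, arrivalRateNames) {
  const endurance = {};
  for (const [name, scenario] of Object.entries(kpi)) {
    if (arrivalRateNames.includes(name)) {
      // Switch to a fixed request rate: 1 iteration started per second.
      endurance[name] = {
        executor: 'constant-arrival-rate',
        exec: scenario.exec,
        rate: 1,
        timeUnit: '1s',
        duration,
        preAllocatedVUs: scenario.vus, // assumption: reuse the KPI VU count
      };
    } else if (scenario.executor === 'constant-vus') {
      // Same executor and VUs, only the duration is extended.
      endurance[name] = { ...scenario, duration };
    } else {
      // Other executors are copied unchanged.
      endurance[name] = { ...scenario };
    }
  }
  return endurance;
}

const enduranceScenarios = toEnduranceScenarios(
  kpiScenarios, '1h', ['legacy_batch_produce_scenario']);
console.log(JSON.stringify(enduranceScenarios, null, 2));
```

The list of scenarios switched to `constant-arrival-rate` is passed in as an argument, so it can follow whichever legacy batch scenario is finally agreed.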

Visualizing the ENDURANCE test results

As mentioned in issues/decisions, there are two alternative ways of representing memory trends: Grafana and GNUPlot.

  1. Grafana
    EST has its own externally accessible Prometheus and Grafana:
    https://monitoring.nordix.org/login
    http://monitoring.est.tech/

    Through prometheus.yml (of EST), a new scrape_configs entry for the cps-and-ncmp microservice can be added:

    scrape_configs:
      - job_name: 'cps-and-ncmp'
        metrics_path: '/actuator/prometheus'
        scrape_interval: 5s
        static_configs:
          - targets:
              - 'cps-and-ncmp:8080'  # replace by <physical-server-ip:port>
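Once the service is scraped, the seven-day memory history could also be fetched directly from the Prometheus HTTP API. The sketch below only builds the request URL and makes several assumptions: the base URL is a placeholder, `buildOldGenRangeQueryUrl` is a hypothetical helper, and the metric `jvm_memory_used_bytes` with the `id="G1 Old Gen"` label is what Micrometer usually exposes on `/actuator/prometheus`, which should be verified against the actual endpoint output.

```javascript
// Hypothetical helper: builds a Prometheus range-query URL for G1 Old Gen usage.
function buildOldGenRangeQueryUrl(prometheusBaseUrl, endEpochSeconds, days, stepSeconds) {
  const params = new URLSearchParams({
    // Assumed Micrometer metric and labels; check them on the real endpoint.
    query: 'jvm_memory_used_bytes{area="heap",id="G1 Old Gen"}',
    start: String(endEpochSeconds - days * 24 * 60 * 60),
    end: String(endEpochSeconds),
    step: String(stepSeconds),
  });
  return `${prometheusBaseUrl}/api/v1/query_range?${params.toString()}`;
}

// Placeholder host and an arbitrary end timestamp; 7 days of history, one point every 5 minutes.
const rangeQueryUrl = buildOldGenRangeQueryUrl(
  'http://prometheus.example:9090', 1731542400, 7, 300);
console.log(rangeQueryUrl);
```

A five-minute step keeps a seven-day window at about two thousand points per series, which stays well within what a single range query normally returns.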

 

The trend of G1 Old Gen space can be observed as seen below:

image-20241104-112456.png
  • Gnuplot also allows one to draw a plot for only G1 Old Gen space, as we used it for our K6 plots.
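
Whichever tool draws the plot, a rising G1 Old Gen trend can also be flagged numerically. The following is a minimal sketch, not an agreed acceptance criterion: it fits a least-squares line through memory samples and reports the slope. The function name and the sample values are invented for illustration, and because Old Gen usage normally drops after each collection, a positive slope is only a hint that a closer look is needed.

```javascript
// Least-squares slope of memory usage over time (value units per second).
// Each sample is { t: secondsSinceStart, value: usedMemory }.
function leastSquaresSlope(samples) {
  const n = samples.length;
  if (n < 2) throw new Error('need at least two samples');
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanV = samples.reduce((sum, p) => sum + p.value, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (const p of samples) {
    numerator += (p.t - meanT) * (p.value - meanV);
    denominator += (p.t - meanT) ** 2;
  }
  if (denominator === 0) throw new Error('timestamps must not all be equal');
  return numerator / denominator;
}

// Invented samples: usage in MB taken once per minute.
const growingUsage = [
  { t: 0, value: 100 },
  { t: 60, value: 110 },
  { t: 120, value: 120 },
  { t: 180, value: 130 },
];
const stableUsage = [
  { t: 0, value: 100 },
  { t: 60, value: 100 },
  { t: 120, value: 100 },
];

const growingSlope = leastSquaresSlope(growingUsage); // grows by 10 MB per minute
const stableSlope = leastSquaresSlope(stableUsage);   // flat series
console.log(growingSlope > 0, stableSlope === 0);
```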