Content Comparison

...

Issue

Notes

Decision

1

Agree on endurance performance KPI

The aim of the study is the early detection of bugs similar to Jira CPS-2430, or of any memory leakage, in the future.

Thus, we should identify the specific function(s) currently used in k6 that need to be improved. These function(s) will be used in the Endurance test suite.

Actions:

Daniel Hanrahan and Lee Anjella Macabuhay can shed light on this item since they resolved the bug.
After identifying the test function(s), what can we functionally do to improve them?
The reproduction of the bug includes two more data operations that are not available in the current k6 suite:
1. Passthrough-write (PATCH)
2. Passthrough (DELETE) / Delete Resource Data
Regarding parallelism, how many VUs should be assigned to these tests?
The suggested number of VUs with the 'constant-vus' executor:
1. Passthrough-read (READ): 4
2. Passthrough-write (POST): 4
3. Passthrough-write (PATCH): 4
4. Passthrough (DELETE) / Delete Resource Data: 4

CPS needs to run all the test cases according to the FS, not according to what was tested in CPS-2430 (since the tests run there were outside the FS). Less focus on the KPI for this endurance run. Kolawole Adebisi-Adeolokun Halil Cakal

2

Agree on the test environment

Blocked by Jira CPS-2463

If a new test server can be provided, there is no need for a secondary docker-compose deployment descriptor (with port and container name changes).
If no new server can be provided, a secondary docker-compose deployment descriptor will be necessary.
1. There will be 3 different pipelines running on the physical test server:
  1. CPS performance tests (groovy-based)
  2. NCMP performance tests (k6-based)
  3. NCMP Endurance tests (k6-based, long-running)
Team Kraken could set up the test environment for us.

~~No parallelism. A conversation with Kraken to discuss option #3~~ Halil Cakal
After a discussion with Team Kraken, it was decided to have two docker-compose deployments on the test server. Daniel Hanrahan Halil Cakal Toine Siebelink 29 Oct 2024
1. The PoC has been done on Nordix.
  The PoC uses two docker-compose deployments with an environment file.
The number of available executors should be sufficient to run up to three jobs on the physical performance test server. Daniel Hanrahan Toine Siebelink Kolawole Adebisi-Adeolokun Halil Cakal 14 Nov 2024

3

Visualization of memory trend

The most convenient way to represent the memory usage trend in a GUI is Grafana.

Grafana
GNUPlot (3PP library currently used for the k6 and Groovy test visualization)
1. If Gnuplot is used to visualize memory usage, which parameters, logs, or output should be fed into it?
2. Issue #4 (external access) would no longer apply, since Jenkins' HTML Publisher can publish the plots.

Agreed to use Grafana, mainly because it allows storage for longer periods, in line with our A/C. Kolawole Adebisi-Adeolokun Halil Cakal

The same decision (proceeding with Grafana) was made by Daniel Hanrahan and Halil Cakal during the daily stand-up. 25 Nov 2024

Note: A PoC for the GNUPlot option has been made; see Jira CPS-2466 for details. Daniel Hanrahan Halil Cakal Toine Siebelink 29 Oct 2024

4

External access to Grafana (to be discussed only if Grafana is selected in #3; otherwise ignore)

Make local GUI link accessible externally
Give server access to the CPS team

Team Kraken support is needed either way.

Since Grafana is the preferred option, access issues shall be discussed with Team Kraken (consider the Jenkins plug-in option as well). Halil Cakal

Discussed with Team Kraken; this option will be revisited once the secondary docker-compose deployment and its pipelines are in place. Daniel Hanrahan Halil Cakal Toine Siebelink 29 Oct 2024

5

Permanent storage (DB) for Grafana (to be discussed only if Grafana is selected in #3; otherwise ignore)

Investigate the storage strategy, e.g. a permanent service running 24/7 …

Halil Cakal to investigate how to tune the volume
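The VU suggestion in issue 1 can be expressed as a k6 `scenarios` object. The sketch below is a minimal illustration, not the agreed suite: `executor`, `exec`, `vus` and `duration` are standard k6 scenario options, `passthroughReadScenario` and `passthroughWriteScenario` are taken from kpi.json, while the PATCH and DELETE scenario names and exec functions are hypothetical (no such functions exist in the current k6 suite). The `'2h'` duration is borrowed from the longevity parameter below.

```javascript
// Sketch: k6 constant-vus scenarios for the four passthrough operations in issue 1.
// passthrough_patch_scenario / passthrough_delete_scenario and their exec names are hypothetical.
const SUGGESTED_VUS = 4;

const operations = [
  { name: 'passthrough_read_scenario', exec: 'passthroughReadScenario' },     // READ
  { name: 'passthrough_write_scenario', exec: 'passthroughWriteScenario' },   // POST
  { name: 'passthrough_patch_scenario', exec: 'passthroughPatchScenario' },   // PATCH (hypothetical)
  { name: 'passthrough_delete_scenario', exec: 'passthroughDeleteScenario' }, // DELETE (hypothetical)
];

function buildPassthroughScenarios(duration) {
  const scenarios = {};
  for (const op of operations) {
    // constant-vus keeps a fixed number of VUs looping over the exec function for the whole duration.
    scenarios[op.name] = {
      executor: 'constant-vus',
      exec: op.exec,
      vus: SUGGESTED_VUS,
      duration: duration,
    };
  }
  return scenarios;
}

// In a k6 script, this object would become options.scenarios.
const scenarios = buildPassthroughScenarios('2h');
console.log(JSON.stringify(scenarios, null, 2));
```

With four scenarios at 4 VUs each, the passthrough load alone would keep 16 VUs busy in parallel, which is worth keeping in mind next to the existing KPI and Groovy pipelines on the same server.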

...

	Parameter	Expectation	Notes	Signoff
1	longevity - configurable duration	Should be able to identify any memory leakage	2 hours
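One way to make the longevity duration configurable is to override the `duration` of every time-bound scenario before the options are handed to k6. This is a sketch under assumptions: the duration arrives as a simple string such as `'2h'` (inside k6 it could come from `__ENV`, e.g. `k6 run -e DURATION=2h ...`, where the variable name `DURATION` is hypothetical), and iteration-based scenarios that have no `duration` are left untouched.

```javascript
// Sketch: override the duration of every scenario that has one.
// The '2h' example mirrors the 2-hour longevity parameter; DURATION is a hypothetical variable name.
const DURATION_PATTERN = /^\d+(ms|s|m|h)$/; // single-unit durations only (no "1h30m")

function withDuration(scenarios, duration) {
  if (!DURATION_PATTERN.test(duration)) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  const result = {};
  for (const [name, scenario] of Object.entries(scenarios)) {
    // Copy each scenario so the input config is not mutated.
    result[name] = 'duration' in scenario ? { ...scenario, duration } : { ...scenario };
  }
  return result;
}

// In k6 the value could be read as: const duration = __ENV.DURATION || '2h';
const base = {
  passthrough_read_scenario: { executor: 'constant-vus', exec: 'passthroughReadScenario', vus: 2, duration: '15m' },
  legacy_batch_consume_scenario: { executor: 'per-vu-iterations', exec: 'legacyBatchConsumeScenario', vus: 1, iterations: 1 },
};
const endurance = withDuration(base, '2h');
console.log(endurance.passthrough_read_scenario.duration);
```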

Out of Scope

Reporting the reason for memory leakage if any.
Troubleshooting if the endurance tests cannot run in parallel with the existing ones (the 'kpi' test and the 'groovy-based' test) or cause issues in other tests on the CPS physical test server.

Suggested Tasks

Description	Jira
Add new k6 performance test suite	CPS-2464
Add new Jenkins job to run endurance test	https://lf-onap.atlassian.net/issues/?jql=parent%3DCPS-2444%20ORDER%20BY%20rank

Solution Proposal

Agree and Define new ‘Suite’ (js)

~~The existing K6 performance tests are enough to detect any memory leakage so that no new K6 test will be added.~~ Daniel Hanrahan Halil Cakal 29 Oct 2024
There should be a new test suite. Toine Siebelink Kolawole Adebisi-Adeolokun Halil Cakal 11 Nov 2024
Jira: CPS-

Add Grafana support to visualize memory usage pattern	CPS-2466
Two docker-compose deployments simultaneously	CPS-2463
Add new k6 test suite	CPS-2493
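Running two docker-compose deployments side by side with an environment file mostly comes down to giving the second deployment its own host ports and container names. The sketch below renders such an environment file. It assumes compose files that interpolate `${VAR:-default}` values the way the Prometheus service on this page does with `PROMETHEUS_PORT` and `PROMETHEUS_CONTAINER_NAME`; the other variable names and the port offset of 100 are hypothetical.

```javascript
// Sketch: render an .env file for a secondary docker-compose deployment.
// Only the PROMETHEUS_* names appear on this page; the rest and the offset are hypothetical.
const baseDeployment = {
  ports: { NCMP_PORT: 8883, KAFKA_PORT: 9092, PROMETHEUS_PORT: 9090 },
  containerNames: { NCMP_CONTAINER_NAME: 'cps-and-ncmp', PROMETHEUS_CONTAINER_NAME: 'prometheus' },
};

function renderEnvFile(deployment, portOffset, suffix) {
  const lines = [];
  for (const [key, port] of Object.entries(deployment.ports)) {
    // Shift each host port so both deployments can bind on the same server.
    lines.push(`${key}=${port + portOffset}`);
  }
  for (const [key, name] of Object.entries(deployment.containerNames)) {
    // Suffix container names to avoid name clashes between the two deployments.
    lines.push(`${key}=${name}-${suffix}`);
  }
  return lines.join('\n') + '\n';
}

const envFile = renderEnvFile(baseDeployment, 100, 'endurance');
console.log(envFile);
```

The generated file could then be passed to a separately named compose project, e.g. `docker compose --env-file endurance.env -p endurance up -d` (the file and project names are illustrative), so that networks and volumes of the two stacks do not collide either.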

Solution Proposal

Agree and Define new ‘Suite’ (js)

...

Jira: CPS-2493

...

Code Block

language	json

{
  "hosts": {
    "ncmpBaseUrl": "http://localhost:8883",
    "dmiStubUrl": "http://ncmp-dmi-plugin-demo-and-csit-stub:8092",
    "kafkaBootstrapServer": "localhost:9092"
  },
  "kafka": {
    "legacyBatchTopic": "legacy_batch_topic"
  },
  "scenarios": {
    "passthrough_read_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughReadScenario",
      "vus": 2,
      "duration": "15m"
    }
  },
  "thresholds": {
    "http_req_failed": ["rate == 0"],
    "cmhandles_created_per_second": ["avg >= 22"],
    "cmhandles_deleted_per_second": ["avg >= 22"],
    "ncmp_overhead_passthrough_read": ["avg <= 40"]
  }
}
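The `thresholds` section uses k6 threshold expressions of the form `aggregation operator value`, e.g. `avg <= 40` or `rate == 0`. To illustrate what those expressions mean, the sketch below re-evaluates them in plain JavaScript against precomputed metric statistics. It is not k6's implementation (k6 evaluates thresholds itself), percentile forms such as `p(95)` are not handled, and the sample statistics are hypothetical values chosen for the example, not measurements.

```javascript
// Sketch: evaluate k6-style threshold expressions ("avg <= 40") against precomputed stats.
// Illustrative only; percentile aggregations like p(95) are not supported by this parser.
const OPERATORS = {
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
  '==': (a, b) => a === b,
  '!=': (a, b) => a !== b,
};

function checkThreshold(expression, stats) {
  const match = expression.match(/^\s*(\w+)\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$/);
  if (!match) throw new Error(`Cannot parse threshold: ${expression}`);
  const [, aggregation, operator, limit] = match;
  if (!(aggregation in stats)) throw new Error(`Missing aggregation: ${aggregation}`);
  return OPERATORS[operator](stats[aggregation], Number(limit));
}

function evaluateThresholds(thresholds, metrics) {
  const failures = [];
  for (const [metric, expressions] of Object.entries(thresholds)) {
    for (const expression of expressions) {
      // A metric without any stats cannot be checked, so it is reported as a failure here.
      if (!(metric in metrics) || !checkThreshold(expression, metrics[metric])) {
        failures.push(`${metric}: ${expression}`);
      }
    }
  }
  return failures;
}

// Hypothetical stats for illustration, not measured results.
const thresholds = {
  http_req_failed: ['rate == 0'],
  cmhandles_created_per_second: ['avg >= 22'],
  ncmp_overhead_passthrough_read: ['avg <= 40'],
};
const metrics = {
  http_req_failed: { rate: 0 },
  cmhandles_created_per_second: { avg: 25.3 },
  ncmp_overhead_passthrough_read: { avg: 43.1 },
};
// Reports the passthrough read overhead threshold as failed.
console.log(evaluateThresholds(thresholds, metrics));
```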


Jira: CPS-2493

HOW: Daniel and I decided to define a new test suite (endurance.json). Daniel Hanrahan Halil Cakal 14 Nov 2024
Have one ncmp-test-runner.js with different configs, kpi.json and endurance.json, and move the scenario and threshold settings into the JSON configs.
The endurance test should run all scenarios in parallel, using the same scenarios, executor types, and VUs as the KPI suite, except for the legacy_batch_produce_scenario.
The executor type for this scenario should be changed to constant-arrival-rate with 1 request/second. Toine Siebelink Daniel Hanrahan Kolawole Adebisi-Adeolokun Halil Cakal 14 Nov 2024
An example kpi.json:

Code Block

language	json

{
  "hosts": {
    "ncmpBaseUrl": "http://localhost:8883",
    "dmiStubUrl": "http://ncmp-dmi-plugin-demo-and-csit-stub:8092",
    "kafkaBootstrapServer": "localhost:9092"
  },
  "scenarios": {
    "passthrough_read_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughReadScenario",
      "vus": 2,
      "duration": "15m"
    },
    "passthrough_read_alt_id_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughReadAltIdScenario",
      "vus": 2,
      "duration": "15m"
    },
    "passthrough_write_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughWriteScenario",
      "vus": 2,
      "duration": "15m"
    },
    "passthrough_write_alt_id_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughWriteAltIdScenario",
      "vus": 2,
      "duration": "15m"
    },
    "cm_handle_id_search_nofilter_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleIdSearchNoFilterScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_search_nofilter_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleSearchNoFilterScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_id_search_module_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleIdSearchModuleScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_search_module_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleSearchModuleScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_id_search_property_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleIdSearchPropertyScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_search_property_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleSearchPropertyScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_id_search_cpspath_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleIdSearchCpsPathScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_search_cpspath_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleSearchCpsPathScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_id_search_trustlevel_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleIdSearchTrustLevelScenario",
      "vus": 1,
      "duration": "15m"
    },
    "cm_handle_search_trustlevel_scenario": {
      "executor": "constant-vus",
      "exec": "cmHandleSearchTrustLevelScenario",
      "vus": 1,
      "duration": "15m"
    },
    "legacy_batch_produce_scenario": {
      "executor": "shared-iterations",
      "exec": "legacyBatchProduceScenario",
      "vus": 2,
      "iterations": 100
    },
    "legacy_batch_consume_scenario": {
      "executor": "per-vu-iterations",
      "exec": "legacyBatchConsumeScenario",
      "vus": 1,
      "iterations": 1
    }
  },
  "thresholds": {
    "http_req_failed": ["rate == 0"],
    "cmhandles_created_per_second": ["avg >= 22"],
    "cmhandles_deleted_per_second": ["avg >= 22"],
    "ncmp_overhead_passthrough_read": ["avg <= 40"],
    "ncmp_overhead_passthrough_write": ["avg <= 40"],
    "ncmp_overhead_passthrough_read_alt_id": ["avg <= 40"],
    "ncmp_overhead_passthrough_write_alt_id": ["avg <= 40"],
    "id_search_nofilter_duration": ["avg <= 2000"],
    "id_search_module_duration": ["avg <= 2000"],
    "id_search_property_duration": ["avg <= 2000"],
    "id_search_cpspath_duration": ["avg <= 2000"],
    "id_search_trustlevel_duration": ["avg <= 2000"],
    "cm_search_nofilter_duration": ["avg <= 15000"],
    "cm_search_module_duration": ["avg <= 15000"],
    "cm_search_property_duration": ["avg <= 15000"],
    "cm_search_cpspath_duration": ["avg <= 15000"],
    "cm_search_trustlevel_duration": ["avg <= 15000"],
    "legacy_batch_read_cmhandles_per_second": ["avg >= 150"]
  }
}
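With the scenarios and thresholds moved into kpi.json and endurance.json, the runner only needs to read the selected file and hand both sections to k6. Inside k6 that would typically be `JSON.parse(open(...))` in the init context, with the file chosen through `__ENV`; because `open` and `__ENV` only exist inside k6, that part is kept in a comment. The sketch below shows the mapping plus a sanity check that every `exec` names a known function. The `TEST_PROFILE` variable, the `./config/` path and the check itself are assumptions, not the actual ncmp-test-runner.js.

```javascript
// Sketch of how a single runner could consume either kpi.json or endurance.json.
// Inside k6 (init context) the config could be loaded with something like:
//   const testConfig = JSON.parse(open(`./config/${__ENV.TEST_PROFILE}.json`));
// TEST_PROFILE and the ./config/ path are hypothetical.

function buildOptions(testConfig, availableExecFunctions) {
  const scenarios = testConfig.scenarios || {};
  for (const [name, scenario] of Object.entries(scenarios)) {
    // Each scenario's "exec" must name a function exported by the runner script.
    if (!availableExecFunctions.includes(scenario.exec)) {
      throw new Error(`Scenario ${name} references unknown exec function ${scenario.exec}`);
    }
  }
  // k6 reads scenarios and thresholds from the exported "options" object.
  return { scenarios, thresholds: testConfig.thresholds || {} };
}

// A trimmed-down kpi.json for illustration.
const kpiConfig = {
  scenarios: {
    passthrough_read_scenario: { executor: 'constant-vus', exec: 'passthroughReadScenario', vus: 2, duration: '15m' },
    legacy_batch_consume_scenario: { executor: 'per-vu-iterations', exec: 'legacyBatchConsumeScenario', vus: 1, iterations: 1 },
  },
  thresholds: { http_req_failed: ['rate == 0'] },
};

// In k6 this would be: export const options = buildOptions(testConfig, [...exported function names]);
const options = buildOptions(kpiConfig, ['passthroughReadScenario', 'legacyBatchConsumeScenario']);
console.log(Object.keys(options.scenarios));
```

Failing fast on an unknown `exec` name catches a typo in endurance.json before a long-running job starts, rather than at scenario start time.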

The current KPI and ENDURANCE (proposed) test parameters (last updated 14 Nov 2024)

	KPI					ENDURANCE
Test Stage	Scenario Name	Unit	Executor Type	VUs	Duration	Scenario Name	VUs	Executor Type	Duration
setup	create_cm_handles	CM-handles/second	N/A	N/A	20m	create_cm_handles	N/A	N/A	20m
scenario	passthrough_read_scenario	overhead	constant-vus	2	15m	passthrough_read_scenario	2	constant-vus	1h
	passthrough_read_alt_id_scenario	overhead	constant-vus	2		passthrough_read_alt_id_scenario	2	constant-vus
	passthrough_write_scenario	overhead	constant-vus	2		passthrough_write_scenario	2	constant-vus
	passthrough_write_alt_id_scenario	overhead	constant-vus	2		passthrough_write_alt_id_scenario	2	constant-vus
	cm_handle_id_search_nofilter_scenario	milliseconds	constant-vus	1		cm_handle_id_search_nofilter_scenario	1	constant-vus
	cm_handle_id_search_module_scenario	milliseconds	constant-vus	1		cm_handle_id_search_module_scenario	1	constant-vus
	cm_handle_id_search_property_scenario	milliseconds	constant-vus	1		cm_handle_id_search_property_scenario	1	constant-vus
	cm_handle_id_search_cpspath_scenario	milliseconds	constant-vus	1		cm_handle_id_search_cpspath_scenario	1	constant-vus
	cm_handle_id_search_trustlevel_scenario	milliseconds	constant-vus	1		cm_handle_id_search_trustlevel_scenario	1	constant-vus
	cm_handle_search_nofilter_scenario	milliseconds	constant-vus	1		cm_handle_search_nofilter_scenario	1	constant-vus
	cm_handle_search_module_scenario	milliseconds	constant-vus	1		cm_handle_search_module_scenario	1	constant-vus
	cm_handle_search_property_scenario	milliseconds	constant-vus	1		cm_handle_search_property_scenario	1	constant-vus
	cm_handle_search_cpspath_scenario	milliseconds	constant-vus	1		cm_handle_search_cpspath_scenario	1	constant-vus
	cm_handle_search_trustlevel_scenario	milliseconds	constant-vus	1		cm_handle_search_trustlevel_scenario	1	constant-vus
	legacy_batch_produce_scenario	milliseconds	shared-iterations	2	N/A	legacy_batch_produce_scenario	1 (1 req/s)	constant-arrival-rate
	legacy_batch_consume_scenario	events/second	per-vu-iterations	1	N/A
teardown	delete_cm_handles	CM-handles/second	N/A	N/A	20m	delete_cm_handles	N/A		10m
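The differences between the two columns of the table can be read as a transformation of the KPI scenarios. Time-bound constant-vus scenarios keep their executor and VUs but run for 1h. legacy_batch_produce_scenario switches to constant-arrival-rate at 1 request per second. legacy_batch_consume_scenario has no ENDURANCE counterpart. The sketch below encodes exactly those rules; `rate`, `timeUnit`, `duration` and `preAllocatedVUs` are k6's constant-arrival-rate options. Deriving endurance.json programmatically, rather than maintaining it by hand as agreed, is only an illustration.

```javascript
// Sketch: derive ENDURANCE scenarios from KPI scenarios, following the table above.
function toEnduranceScenarios(kpiScenarios, duration) {
  const endurance = {};
  for (const [name, scenario] of Object.entries(kpiScenarios)) {
    if (name === 'legacy_batch_consume_scenario') {
      continue; // no ENDURANCE counterpart in the table
    }
    if (name === 'legacy_batch_produce_scenario') {
      // Open model: k6 starts 1 iteration per second for the whole run.
      endurance[name] = {
        executor: 'constant-arrival-rate',
        exec: scenario.exec,
        rate: 1,
        timeUnit: '1s',
        duration,
        preAllocatedVUs: 1,
      };
      continue;
    }
    if (scenario.executor === 'constant-vus') {
      // Same executor and VUs, longer duration.
      endurance[name] = { ...scenario, duration };
    }
    // Other executors are not covered by the table and are skipped here.
  }
  return endurance;
}

const kpiScenarios = {
  passthrough_read_scenario: { executor: 'constant-vus', exec: 'passthroughReadScenario', vus: 2, duration: '15m' },
  legacy_batch_produce_scenario: { executor: 'shared-iterations', exec: 'legacyBatchProduceScenario', vus: 2, iterations: 100 },
  legacy_batch_consume_scenario: { executor: 'per-vu-iterations', exec: 'legacyBatchConsumeScenario', vus: 1, iterations: 1 },
};
const enduranceScenarios = toEnduranceScenarios(kpiScenarios, '1h');
console.log(JSON.stringify(enduranceScenarios, null, 2));
```

With `preAllocatedVUs: 1`, an iteration that takes longer than one second leaves k6 without a free VU, and it will report dropped iterations; raising `maxVUs` would be the usual remedy if that shows up.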

Visualizing the ENDURANCE test results

As mentioned in the issues/decisions above, there are two alternative ways of representing memory trends: Grafana and GNUPlot.

Grafana

Nordix has its own Prometheus and Grafana (externally accessible, no need to install GlobalProtect), which can be configured to show cps-and-ncmp memory trends.
Link to Nordix Grafana: https://monitoring.nordix.org/login

A new scrape_configs entry for the cps-and-ncmp microservice can be added to the Nordix prometheus.yml:

Code Block

language	yaml

scrape_configs:
- job_name: 'cps-and-ncmp'
  metrics_path: '/actuator/prometheus'
  scrape_interval: 5s
  static_configs:
    - targets:
      - 'cps-and-ncmp:8080' # replace with <physical-server-ip:port>
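The scrape job above polls `/actuator/prometheus`, which returns metrics in the Prometheus text exposition format. Before wiring up a dashboard, it can help to confirm that the G1 Old Gen series is actually exposed. The sketch below picks the `jvm_memory_used_bytes` sample with `area="heap"` and `id="G1 Old Gen"` (the same labels the query further down uses) out of such a payload. The parser is deliberately simplified: it assumes label values contain no escaped quotes or commas. The sample payload is illustrative and its values are not measurements.

```javascript
// Sketch: extract one sample from a Prometheus text-format payload.
// Simplified parser: assumes label values without escaped quotes or commas.
function parseLine(line) {
  const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/);
  if (!match) return null;
  const [, name, labelText, value] = match;
  const labels = {};
  if (labelText) {
    for (const pair of labelText.split(',')) {
      const eq = pair.indexOf('=');
      // Skips empty pairs, e.g. from a trailing comma inside the braces.
      if (eq > 0) labels[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim().replace(/^"|"$/g, '');
    }
  }
  return { name, labels, value: Number(value) };
}

function findSample(payload, metricName, wantedLabels) {
  for (const line of payload.split('\n')) {
    if (line.startsWith('#') || line.trim() === '') continue; // skip HELP/TYPE comments
    const sample = parseLine(line);
    if (!sample || sample.name !== metricName) continue;
    const matches = Object.entries(wantedLabels).every(([k, v]) => sample.labels[k] === v);
    if (matches) return sample.value;
  }
  return undefined;
}

// Illustrative payload (values are made up, not measurements).
const payload = [
  '# HELP jvm_memory_used_bytes The amount of used memory',
  '# TYPE jvm_memory_used_bytes gauge',
  'jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 2.5165824E7',
  'jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 1.34217728E8',
].join('\n');

console.log(findSample(payload, 'jvm_memory_used_bytes', { area: 'heap', id: 'G1 Old Gen' }));
```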

Also, the dashboard provider and dashboard config (jvm-micrometer-dashboard.json) can be added.

Code Block

language	yaml

apiVersion: 1
providers:
  - name: default
    orgId: 1
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true

Then, the trend of G1 Old Gen space can be observed as seen below:

...
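Besides inspecting the G1 Old Gen graph by eye, one possible way to turn the trend into a number is a least-squares slope over `[timestamp, value]` samples. This is a hypothetical heuristic that was not agreed on this page. The samples would have the shape of the `values` array returned by Prometheus' `query_range` API. A slope that stays clearly positive over a long window hints at a leak, but Old Gen usage is naturally saw-toothed between collections, so any "clearly positive" limit would have to be calibrated on real runs. The samples below are illustrative, not measurements.

```javascript
// Sketch: least-squares slope (bytes per second) of memory samples.
// samples: array of [unixSeconds, valueAsString] pairs, as in a Prometheus query_range result.
function slopeBytesPerSecond(samples) {
  const points = samples.map(([t, v]) => [Number(t), Number(v)]);
  const n = points.length;
  if (n < 2) throw new Error('Need at least two samples');
  const meanT = points.reduce((sum, [t]) => sum + t, 0) / n;
  const meanV = points.reduce((sum, [, v]) => sum + v, 0) / n;
  let covariance = 0;
  let variance = 0;
  for (const [t, v] of points) {
    covariance += (t - meanT) * (v - meanV);
    variance += (t - meanT) ** 2;
  }
  if (variance === 0) throw new Error('All samples share the same timestamp');
  return covariance / variance;
}

// Illustrative samples (not measurements): +1 MiB every 60 s.
const samples = [
  [1700000000, '100000000'],
  [1700000060, '101048576'],
  [1700000120, '102097152'],
];
const slope = slopeBytesPerSecond(samples);
console.log(`Old Gen trend: ${(slope * 3600 / 1024 / 1024).toFixed(1)} MiB/hour`);
```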

Permanent Storage Alternatives

Prometheus

Prometheus can be configured with a persistent volume so that its data is retained.

This is an example docker-compose service configuration for Prometheus:

Code Block

language	yaml

services:
  prometheus:
    container_name: ${PROMETHEUS_CONTAINER_NAME:-prometheus}
    image: prom/prometheus:latest
    # Retention is set with a command-line flag; Prometheus does not read it from an environment variable.
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME:-15d}
    ports:
      - ${PROMETHEUS_PORT:-9090}:9090
    restart: always
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      # Named volume so the TSDB data survives container restarts and re-creation
      - prometheus_data:/prometheus
    healthcheck:
      test: [ "CMD-SHELL", "wget --spider --quiet --tries=1 --timeout=10 http://localhost:9090/-/healthy || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
    profiles:
      - monitoring

volumes:
  prometheus_data:
    driver: local

GNUPlot

As an alternative, a plot image can be created from Prometheus data with Gnuplot. For example, the following plots the Old Generation heap usage of the last 15 minutes:

Code Block

language	bash

# Query the last 15 minutes of G1 Old Gen heap usage at a 15 s resolution (GNU date syntax)
curl -G 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode 'query=jvm_memory_used_bytes{area="heap",id="G1 Old Gen",instance="cps-and-ncmp:8080",job="cps-and-ncmp"}' \
  --data-urlencode "start=$(date -u -d '15 minutes ago' +%s)" \
  --data-urlencode "end=$(date -u +%s)" \
  --data-urlencode 'step=15s' > heap_usage.json
# Convert the [timestamp, "value"] pairs to CSV; tonumber keeps the values unquoted for gnuplot
jq -r '.data.result[0].values[] | [.[0], (.[1] | tonumber)] | @csv' heap_usage.json > heap_usage.csv
# Plot the series as a PNG with a time-formatted x-axis
echo "set terminal png; set output 'heap_usage.png'; set datafile separator ','; set xdata time; set timefmt '%s'; set format x '%H:%M:%S'; plot 'heap_usage.csv' using 1:2 with lines title 'Old Gen Heap Usage'" | gnuplot
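The same `query_range` request can be parameterized to cover a whole endurance run instead of the last 15 minutes. The sketch below builds the request URL for an arbitrary window. It keeps the 15s step where possible and otherwise picks the smallest step that stays within Prometheus' limit of 11,000 points per series for range queries. The host and the 2-hour window in the example are assumptions.

```javascript
// Sketch: build a Prometheus query_range URL for the Old Gen heap series over a given window.
const MAX_POINTS = 11000; // Prometheus rejects range queries where (end - start) / step exceeds this
const MIN_STEP_SECONDS = 15;

function buildQueryRangeUrl(baseUrl, query, endSeconds, windowSeconds) {
  const startSeconds = endSeconds - windowSeconds;
  // Smallest step (at least 15s) that keeps the number of points within the limit.
  const step = Math.max(MIN_STEP_SECONDS, Math.ceil(windowSeconds / MAX_POINTS));
  const url = new URL('/api/v1/query_range', baseUrl);
  url.search = new URLSearchParams({
    query,
    start: String(startSeconds),
    end: String(endSeconds),
    step: `${step}s`,
  }).toString();
  return url.toString();
}

const query = 'jvm_memory_used_bytes{area="heap",id="G1 Old Gen",instance="cps-and-ncmp:8080",job="cps-and-ncmp"}';
const now = Math.floor(Date.now() / 1000);
console.log(buildQueryRangeUrl('http://localhost:9090', query, now, 2 * 3600));
```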

However, it was decided not to proceed with this alternative (Daniel Hanrahan and Halil Cakal, CPS Team 2 daily stand-up, 25 Nov 2024).

Version	Old Version 31	New Version Current
Changes made by	Halil Cakal	Halil Cakal
Saved on	Nov 21, 2024	Dec 04, 2024
