...
Issue | Notes | Decision | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Agree on endurance performance KPI | The aim of the study is the early detection of bugs similar to
Thus, we should identify the specific function(s) currently used in k6 that need to be improved. These function(s) will be used in the Endurance test suite. Actions:
| CPS needs to run all the test cases according to the FS, not according to what was tested in cps-2430 (the tests run there went outside the FS). Less focus on the KPI for this endurance run. Kolawole Adebisi-Adeolokun Halil Cakal | ||||||||
2 | Agree on the test environment | Blocked by
|
| ||||||||
3 | Visualization of memory trend | The most convenient way to represent the memory usage trend in a GUI is Grafana.
| Agreed to use Grafana, mainly because it allows storage for longer periods, in line with our A/C. Kolawole Adebisi-Adeolokun Halil Cakal The same decision (proceeding with Grafana) was made by Daniel Hanrahan and Halil Cakal during the daily stand-up. Note: A PoC for the Gnuplot option has been made; please see the relevant Jira ticket details. Daniel Hanrahan Halil Cakal Toine Siebelink
| ||||||||
4 | Grafana access externally (If Grafana is selected at #3 then this issue should be discussed, otherwise ignore) |
Team Kraken support is needed either way. | Since Grafana is the preferred option, access issues shall be discussed with team Kraken. Halil Cakal (consider the Jenkins plug-in option as well)
| ||||||||
5 | Permanent storage (DB) for Grafana (If Grafana is selected at #3 then this issue should be discussed, otherwise ignore) | Investigate the solution strategy for the storage, e.g. a permanent service running 24/7 … | Halil Cakal to investigate how to tune the volume |
...
Parameter | Expectation | Notes | Signoff | |
---|---|---|---|---|
1 | longevity - configurable duration | Should be able to identify any memory leakage | 2 hours |
Out of Scope
Reporting the reason for memory leakage if any.
Troubleshooting if the 'endurance' test cannot run in parallel with the existing ones ('kpi' test and 'groovy based' test) or causes issues in other tests on the CPS physical test server.
Suggested Tasks
...
Description
...
Jira
...
Add new k6 performance test suite
...
...
Add new Jenkins job to run endurance test
...
https://lf-onap.atlassian.net/issues/?jql=parent%3DCPS-2444%20ORDER%20BY%20rank
Solution Proposal
Agree and Define new ‘Suite’ (js)
The existing k6 performance tests are enough to detect any memory leakage, so no new k6 test will be added. Daniel Hanrahan Halil Cakal
There should be a new test suite. Toine Siebelink Kolawole Adebisi-Adeolokun Halil Cakal Jira: CPS-
...
Add Grafana support to visualize memory usage pattern
...
...
Two docker-compose deployments simultaneously
...
...
Add new k6 test suite
...
Solution Proposal
Agree and Define new ‘Suite’ (js)
...
...
Code Block | ||
---|---|---|
| ||
{
  "hosts": {
    "ncmpBaseUrl": "http://localhost:8883",
    "dmiStubUrl": "http://ncmp-dmi-plugin-demo-and-csit-stub:8092",
    "kafkaBootstrapServer": "localhost:9092"
  },
  "kafka": {
    "legacyBatchTopic": "legacy_batch_topic"
  },
  "scenarios": {
    "passthrough_read_scenario": {
      "executor": "constant-vus",
      "exec": "passthroughReadScenario",
      "vus": 2,
      "duration": "15m"
    }
  },
  "thresholds": {
    "http_req_failed": ["rate == 0"],
    "cmhandles_created_per_second": ["avg >= 22"],
    "cmhandles_deleted_per_second": ["avg >= 22"],
    "ncmp_overhead_passthrough_read": ["avg <= 40"]
  }
}
...
2493
HOW: Daniel and I decided to define a new test suite (endurance.json). Daniel Hanrahan Halil Cakal
There will be a single ncmp-test-runner.js with different configs, kpi.json and endurance.json; the scenario and threshold settings move into the JSON configs.
The endurance test should run all tests in parallel. The scenarios, executor types, and VUs should be the same in the Endurance suite as in the KPI suite, except for the legacy_batch_consume_scenario.
The executor type for this scenario should be changed to constant-arrival-rate with 1 req/second. Toine Siebelink Daniel Hanrahan Kolawole Adebisi-Adeolokun Halil Cakal
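As a minimal sketch of the approach described above, the endurance scenarios could be derived from the KPI scenarios instead of being maintained by hand. The function name `toEnduranceScenarios`, its parameters, and the `preAllocatedVUs: 1` value are hypothetical and not part of the CPS code base. Which legacy batch scenario moves to constant-arrival-rate is passed in by the caller; the usage below picks the produce scenario as listed in the parameter table, which is an assumption.

```javascript
// Sketch only: derive endurance scenarios from a kpi.json-style "scenarios" object.
// toEnduranceScenarios and its options are hypothetical names used for illustration.
function toEnduranceScenarios(kpiScenarios, { duration, arrivalRateScenario }) {
  const endurance = {};
  for (const [name, scenario] of Object.entries(kpiScenarios)) {
    if (name === arrivalRateScenario) {
      // "1 req/second" expressed with k6's constant-arrival-rate executor options
      endurance[name] = {
        executor: 'constant-arrival-rate',
        exec: scenario.exec,
        rate: 1,
        timeUnit: '1s',
        duration,
        preAllocatedVUs: 1, // assumed value; constant-arrival-rate needs a VU pool
      };
    } else if (scenario.executor === 'constant-vus') {
      // same executor and VUs as the KPI suite, only the duration changes
      endurance[name] = { ...scenario, duration };
    } else {
      // iteration-based executors take no "duration" option, so copy them unchanged
      endurance[name] = { ...scenario };
    }
  }
  return endurance;
}

// Usage with a small excerpt of the KPI scenarios
const kpiScenarios = {
  passthrough_read_scenario: { executor: 'constant-vus', exec: 'passthroughReadScenario', vus: 2, duration: '15m' },
  legacy_batch_produce_scenario: { executor: 'shared-iterations', exec: 'legacyBatchProduceScenario', vus: 2, iterations: 100 },
  legacy_batch_consume_scenario: { executor: 'per-vu-iterations', exec: 'legacyBatchConsumeScenario', vus: 1, iterations: 1 },
};
const enduranceScenarios = toEnduranceScenarios(kpiScenarios, {
  duration: '1h',
  arrivalRateScenario: 'legacy_batch_produce_scenario', // assumption, taken from the table
});
console.log(JSON.stringify(enduranceScenarios, null, 2));
```

The input object is not modified, so the same KPI definition can still be used for the KPI run.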
As an example of kpi.json:
Code Block |
---|
{
  "hosts": {
    "ncmpBaseUrl": "http://localhost:8883",
    "dmiStubUrl": "http://ncmp-dmi-plugin-demo-and-csit-stub:8092",
    "kafkaBootstrapServer": "localhost:9092"
  },
  "scenarios": {
    "passthrough_read_scenario": { "executor": "constant-vus", "exec": "passthroughReadScenario", "vus": 2, "duration": "15m" },
    "passthrough_read_alt_id_scenario": { "executor": "constant-vus", "exec": "passthroughReadAltIdScenario", "vus": 2, "duration": "15m" },
    "passthrough_write_scenario": { "executor": "constant-vus", "exec": "passthroughWriteScenario", "vus": 2, "duration": "15m" },
    "passthrough_write_alt_id_scenario": { "executor": "constant-vus", "exec": "passthroughWriteAltIdScenario", "vus": 2, "duration": "15m" },
    "cm_handle_id_search_nofilter_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchNoFilterScenario", "vus": 1, "duration": "15m" },
    "cm_handle_search_nofilter_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchNoFilterScenario", "vus": 1, "duration": "15m" },
    "cm_handle_id_search_module_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchModuleScenario", "vus": 1, "duration": "15m" },
    "cm_handle_search_module_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchModuleScenario", "vus": 1, "duration": "15m" },
    "cm_handle_id_search_property_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchPropertyScenario", "vus": 1, "duration": "15m" },
    "cm_handle_search_property_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchPropertyScenario", "vus": 1, "duration": "15m" },
    "cm_handle_id_search_cpspath_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchCpsPathScenario", "vus": 1, "duration": "15m" },
    "cm_handle_search_cpspath_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchCpsPathScenario", "vus": 1, "duration": "15m" },
    "cm_handle_id_search_trustlevel_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchTrustLevelScenario", "vus": 1, "duration": "15m" },
    "cm_handle_search_trustlevel_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchTrustLevelScenario", "vus": 1, "duration": "15m" },
    "legacy_batch_produce_scenario": { "executor": "shared-iterations", "exec": "legacyBatchProduceScenario", "vus": 2, "iterations": 100 },
    "legacy_batch_consume_scenario": { "executor": "per-vu-iterations", "exec": "legacyBatchConsumeScenario", "vus": 1, "iterations": 1 }
  },
  "thresholds": {
    "http_req_failed": ["rate == 0"],
    "cmhandles_created_per_second": ["avg >= 22"],
    "cmhandles_deleted_per_second": ["avg >= 22"],
    "ncmp_overhead_passthrough_read": ["avg <= 40"],
    "ncmp_overhead_passthrough_write": ["avg <= 40"],
    "ncmp_overhead_passthrough_read_alt_id": ["avg <= 40"],
    "ncmp_overhead_passthrough_write_alt_id": ["avg <= 40"],
    "id_search_nofilter_duration": ["avg <= 2000"],
    "id_search_module_duration": ["avg <= 2000"],
    "id_search_property_duration": ["avg <= 2000"],
    "id_search_cpspath_duration": ["avg <= 2000"],
    "id_search_trustlevel_duration": ["avg <= 2000"],
    "cm_search_nofilter_duration": ["avg <= 15000"],
    "cm_search_module_duration": ["avg <= 15000"],
    "cm_search_property_duration": ["avg <= 15000"],
    "cm_search_cpspath_duration": ["avg <= 15000"],
    "cm_search_trustlevel_duration": ["avg <= 15000"],
    "legacy_batch_read_cmhandles_per_second": ["avg >= 150"]
  }
}
The current KPI and ENDURANCE (proposed) test parameters (Last updated )
Test Stage | KPI: Scenario Name | KPI: Unit | KPI: Executor Type | KPI: VUs | KPI: Duration | ENDURANCE: Scenario Name | ENDURANCE: VUs | ENDURANCE: Executor Type | ENDURANCE: Duration |
---|---|---|---|---|---|---|---|---|---|
setup | create_cm_handles | CM-handles/second | N/A | N/A | 20m | create_cm_handles | N/A | N/A | 20m |
scenario | passthrough_read_scenario | overhead | constant-vus | 2 | 15m | passthrough_read_scenario | 2 | constant-vus | 1h |
| passthrough_read_alt_id_scenario | overhead | constant-vus | 2 | | passthrough_read_alt_id_scenario | 2 | constant-vus | |
| passthrough_write_scenario | overhead | constant-vus | 2 | | passthrough_write_scenario | 2 | constant-vus | |
| passthrough_write_alt_id_scenario | overhead | constant-vus | 2 | | passthrough_write_alt_id_scenario | 2 | constant-vus | |
| cm_handle_id_search_nofilter_scenario | milliseconds | constant-vus | 1 | | cm_handle_id_search_nofilter_scenario | 1 | constant-vus | |
| cm_handle_id_search_module_scenario | milliseconds | constant-vus | 1 | | cm_handle_id_search_module_scenario | 1 | constant-vus | |
| cm_handle_id_search_property_scenario | milliseconds | constant-vus | 1 | | cm_handle_id_search_property_scenario | 1 | constant-vus | |
| cm_handle_id_search_cpspath_scenario | milliseconds | constant-vus | 1 | | cm_handle_id_search_cpspath_scenario | 1 | constant-vus | |
| cm_handle_id_search_trustlevel_scenario | milliseconds | constant-vus | 1 | | cm_handle_id_search_trustlevel_scenario | 1 | constant-vus | |
| cm_handle_search_nofilter_scenario | milliseconds | constant-vus | 1 | | cm_handle_search_nofilter_scenario | 1 | constant-vus | |
| cm_handle_search_module_scenario | milliseconds | constant-vus | 1 | | cm_handle_search_module_scenario | 1 | constant-vus | |
| cm_handle_search_property_scenario | milliseconds | constant-vus | 1 | | cm_handle_search_property_scenario | 1 | constant-vus | |
| cm_handle_search_cpspath_scenario | milliseconds | constant-vus | 1 | | cm_handle_search_cpspath_scenario | 1 | constant-vus | |
| cm_handle_search_trustlevel_scenario | milliseconds | constant-vus | 1 | | cm_handle_search_trustlevel_scenario | 1 | constant-vus | |
| legacy_batch_produce_scenario | milliseconds | shared-iterations | 2 | N/A | legacy_batch_produce_scenario | 1 (1 req/sec) | constant-arrival-rate | |
| legacy_batch_consume_scenario | events/second | per-vu-iterations | 1 | | | | | |
teardown | delete_cm_handles | CM-handles/second | N/A | N/A | 20m | delete_cm_handles | N/A | | 10m |
Visualizing the ENDURANCE test results
As mentioned in issues/decisions, there are two alternative ways of representing memory trends: Grafana and GnuPlot.
Grafana
Nordix has its own Prometheus and Grafana (externally accessible, no need to install GlobalProtect), and they can be configured to show cps-and-ncmp memory trends.
Link to Nordix Grafana: https://monitoring.nordix.org/login
A new scrape_configs entry for the cps-and-ncmp microservice can be added to the Nordix prometheus.yml.
Code Block | ||
---|---|---|
| ||
scrape_configs:
  - job_name: 'cps-and-ncmp'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets:
          - 'cps-and-ncmp:8080' # replace with <physical-server-ip:port>
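Before relying on the scrape job, it can help to confirm that the `/actuator/prometheus` endpoint actually exposes the Old Gen metric. The following is a rough sketch of reading one sample out of the Prometheus text exposition format. The helper name `findSample` is hypothetical and the sample text is made up for illustration; it is not output captured from cps-and-ncmp.

```javascript
// Sketch only: find one sample in Prometheus text exposition output.
// findSample is a hypothetical helper, not part of any CPS or k6 code.
function findSample(expositionText, metricName, wantedLabels) {
  for (const line of expositionText.split('\n')) {
    // skip "# HELP" / "# TYPE" comments and other metrics
    if (line.startsWith('#') || !line.startsWith(metricName + '{')) continue;
    const close = line.lastIndexOf('}');
    if (close === -1) continue;
    const labelText = line.slice(metricName.length + 1, close);
    const labels = {};
    for (const match of labelText.matchAll(/(\w+)="((?:[^"\\]|\\.)*)"/g)) {
      labels[match[1]] = match[2];
    }
    const matches = Object.entries(wantedLabels).every(([key, value]) => labels[key] === value);
    if (matches) {
      // the sample value is the first token after the closing brace
      return Number(line.slice(close + 1).trim().split(/\s+/)[0]);
    }
  }
  return undefined; // metric or label combination not exposed
}

// Made-up exposition text in the style of a Micrometer endpoint
const sampleText = [
  '# HELP jvm_memory_used_bytes The amount of used memory',
  '# TYPE jvm_memory_used_bytes gauge',
  'jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 4.194304E7',
  'jvm_memory_used_bytes{area="heap",id="G1 Old Gen"} 1.5E8',
].join('\n');

const oldGenBytes = findSample(sampleText, 'jvm_memory_used_bytes', { area: 'heap', id: 'G1 Old Gen' });
console.log(oldGenBytes);
```

A result of `undefined` would suggest that the metric is not exposed under those labels, in which case the Grafana panel would stay empty.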
Also, the dashboard provider and dashboard config (jvm-micrometer-dashboard.json) can be added.
Code Block | ||
---|---|---|
| ||
providers:
  - name: default
    orgId: 1
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
Then, the trend of G1 Old Gen space can be observed as seen below:
...
Permanent Storage Alternatives
Prometheus
Prometheus can be configured with a persistent volume so that data is retained.
This is an example service config for Prometheus:
Code Block | ||
---|---|---|
| ||
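services:
  prometheus:
    container_name: ${PROMETHEUS_CONTAINER_NAME:-prometheus}
    image: prom/prometheus:latest
    ports:
      - ${PROMETHEUS_PORT:-9090}:9090
    restart: always
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    # Retention is set with a command-line flag; Prometheus does not read it from an environment variable.
    # Overriding the command replaces the image defaults, so the config and storage paths are repeated here.
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME:-15d}
    healthcheck:
      test: [ "CMD-SHELL", "wget --spider --quiet --tries=1 --timeout=10 http://localhost:9090/-/healthy || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
    profiles:
      - monitoring
volumes:
  prometheus_data:
    driver: local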
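To help tune the volume size, the Prometheus storage documentation gives a rough sizing rule: disk space is approximately retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2 bytes). The sketch below applies that rule. The 15-day retention and 5-second scrape interval come from the configs on this page, while the count of 1000 active series is purely a hypothetical placeholder to be replaced with the real number.

```javascript
// Sketch only: rough Prometheus disk estimate using the documented rule of thumb
//   disk ≈ retention_seconds * ingested_samples_per_second * bytes_per_sample
function estimatePrometheusDiskBytes({ retentionDays, scrapeIntervalSeconds, activeSeries, bytesPerSample = 2 }) {
  const retentionSeconds = retentionDays * 24 * 60 * 60;
  // every active series yields one sample per scrape
  const samplesPerSecond = activeSeries / scrapeIntervalSeconds;
  return retentionSeconds * samplesPerSecond * bytesPerSample;
}

const estimatedBytes = estimatePrometheusDiskBytes({
  retentionDays: 15,        // default of PROMETHEUS_RETENTION_TIME above
  scrapeIntervalSeconds: 5, // scrape_interval of the cps-and-ncmp job
  activeSeries: 1000,       // hypothetical; check the real series count in Prometheus
});
console.log(`${(estimatedBytes / 1024 ** 2).toFixed(0)} MiB`);
```

The result is only an order-of-magnitude figure; the real footprint depends on how many series the actuator endpoint exposes and how well the samples compress.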
GNUPlot
As an alternative, it is possible to create an image (plot) using Prometheus and Gnuplot. For example, to record the last 15 minutes of Old Generation heap usage, the following can be used:
Code Block |
---|
# Fetch the last 15 minutes of G1 Old Gen heap usage from Prometheus (one sample every 15s)
curl -G 'http://localhost:9090/api/v1/query_range' \
  --data-urlencode 'query=jvm_memory_used_bytes{area="heap",id="G1 Old Gen",instance="cps-and-ncmp:8080",job="cps-and-ncmp"}' \
  --data-urlencode "start=$(date -u -d '15 minutes ago' +%s)" \
  --data-urlencode "end=$(date -u +%s)" \
  --data-urlencode 'step=15s' > heap_usage.json
# Convert the [timestamp, "value"] pairs to CSV, with the value as an unquoted number
jq -r '.data.result[0].values[] | [.[0], (.[1] | tonumber)] | @csv' heap_usage.json > heap_usage.csv
# Plot the CSV as a PNG with time on the x-axis
echo "set terminal png; set output 'heap_usage.png'; set datafile separator ','; set xdata time; set timefmt '%s'; set format x '%H:%M:%S'; plot 'heap_usage.csv' using 1:2 with lines title 'Old Gen Heap Usage'" | gnuplot
However, it was decided not to proceed with this alternative (Daniel Hanrahan and Halil Cakal, during the CPS Team 2 Daily).
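Whichever visualization is used, the same `query_range` data could also be checked numerically for an upward trend. The sketch below fits a least-squares line through the samples and returns the slope in bytes per second. The function name `heapGrowthBytesPerSecond` and the sample values are hypothetical. Old Gen usage normally follows a sawtooth pattern because of garbage collection, so a positive slope over a short window is only a hint, not proof of a leak.

```javascript
// Sketch only: least-squares slope of heap usage over time.
// "values" follows the Prometheus query_range shape: [[unixSeconds, "valueAsString"], ...]
function heapGrowthBytesPerSecond(values) {
  const points = values.map(([timestamp, value]) => [Number(timestamp), Number(value)]);
  if (points.length < 2) {
    throw new Error('need at least two samples');
  }
  const meanT = points.reduce((sum, [t]) => sum + t, 0) / points.length;
  const meanV = points.reduce((sum, [, v]) => sum + v, 0) / points.length;
  let covariance = 0;
  let variance = 0;
  for (const [t, v] of points) {
    covariance += (t - meanT) * (v - meanV);
    variance += (t - meanT) ** 2;
  }
  if (variance === 0) {
    throw new Error('all samples share the same timestamp');
  }
  return covariance / variance;
}

// Made-up samples, 15 seconds apart, growing by 150 bytes per step
const growing = [[0, '100'], [15, '250'], [30, '400']];
const flat = [[0, '5'], [15, '5'], [30, '5']];
console.log(heapGrowthBytesPerSecond(growing), heapGrowthBytesPerSecond(flat));
```

In practice the slope would be computed over the whole endurance run and compared against a tolerance agreed by the team, since a small positive drift can be normal.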