CPS-2794 Improving KPI test error logging

CPS-2794 Improving KPI test error logging

Limitations

Limitation

What does this mean to us

Limitation

What does this mean to us

1

Stopping one scenario in k6 test run is not supported

  • use of methods such as fail() or threshold abortOnFail aborts the current VU iteration not only one scenario

  1. we can not stop just one scenario and continue on to the next scenarios

  2. we can abort the whole test run only

2

Each VU has its own states and cannot access other VUs states (no shared mutable state) as it runs on its own JS runtime

  1. if we wanted to control the number of errors for the whole k6 test run (all scenarios) i.e. only 10 errors for the whole test run and abort, we cannot create a global variable to control this.

3

Environmental variables are read only for each VU

  1. we cannot use env variables as a shared variable for each VU (as a work around for limitation #2)

4

Threshold metrics are only evaluated after the k6 test run

  1. use of metric such as http_req_failed or a custom metric cannot be used to abort the whole k6 test run or scenario

Options

Note: Options below does not include use of an external client to output logs to or to create a shared variable

Option Description

Pros

Cons

Notes

Option Description

Pros

Cons

Notes

1

Use console.error('Logging message') to log every errors when response is != 200

image-20250517-133212.png

 

  1. Can modify to control what part of response is logged

    1. response.url

    2. response.status

    3. response.body

  1. used on its own will overload logs if all errors are logged

 

2

Use console.error() + log errors to a file

image-20250517-132407.png

 

image-20250517-132455.png
  1. Only errors are saved into a file which means if there are no errors, file cant be as big

    1. Options reference | Grafana k6 documentation

  2. Log file can be discarded next day in our Jenkins workspace or when we choose to discard it

    1. log has timestamp which allows us to relate which build it happened on

  1. Possibility of having huge file for multiple errors if not controlled

  2. Does not support multiple log output files

    1. info level logs will be missed if specified only errors

    2. see example where multiple log output files is used to separate info and error logs

image-20250517-134417.png

image-20250517-134353.png

 

3

Use fail() when response is not OK

Sample result

image-20250517-142103.png

 

  1. Marks the current iteration failed

  2. Useful for non rest call

    1. a failed rest call automatically updates http_req_failed

  3. Can log error message with it

  1. Aborts only the current iteration for the VU, not the entire scenario.

  2. the VU is not stopped and continues onto the next iteration after

 

4

Limiting error log counts per VUs

i.e. each VU iteration

set limit = 5 set countedErrors = 0 check countedErrors if countedErrors > limit then do rest call countedErrors++ if countedErrors == limit then log ‘next iteration aborted …’ else do nothing or fail()
  1. Controlled logs, won’t create a huge file filled with errors

  2. if ‘do nothing’ when error limit is reached, continues other iterations and scenarios without adding to the metrics (stops VU from performing any actions)

  3. if used ‘fail()’ then the threshold "http_req_failed" is still updated on the next iterations

  1. if ‘do nothing’ the number of iterationsCompleted on the k6 instance summary will include the iterations where no actual rest call was made

 

5

Use exec.test.abort() to abort ALL test scenarios (all k6 tests) and do k6 test teardown() immediately

 

Below example shows that for an error limit of 5 per VU - when one VU reach its limit, whole k6 test run is cancelled. The action is logged at an error level and detail of which test caused it.

image-20250517-140210.png
  1. Can be used when error limit on one VU is reached

  2. Can abort with a message as well

  3. Can be used to control on specific error types , i.e.

    1. if response code is 400 then we can do option #4

    2. if response code is 500 then abort all test

  1. Error limit can only be tracked per VU which means that one VU reaching error limit can abort ALL test

Would be great if a shared mutable variable between VUs is available

6

Combination of the following:

  1. console.error() / console.warning()

  2. limiting logs per VUs based on status code i.e.

    1. errorsLimit = 3

    2. warningsLimit = 10

  3. if errors limit hit exec.test.abort()

  4. if warning limit hit do nothing, don’t log next warnings

  1. we control the tolerated errors and warnings

  2. does not abort all test due to warnings

  3. we can see which test hit an error limit

 

 


Note: for constant-arrival-rate , maxVus = preAllocatedVUs if maxVus is not set (Constant arrival rate | Grafana k6 documentation)

Scenario

KPI Scenario Properties

Iterations

Error limit per VU

 

Notes

Scenario

KPI Scenario Properties

Iterations

Error limit per VU

 

Notes

1

Registration of CM-handles

 

 

 

 

 

2

De-registration of CM-handles

 

 

 

 

 

3

CM-handle ID search with No filter

"cm_handle_id_search_nofilter_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleIdSearchNoFilterScenario", "rate": 1, "timeUnit": "2600ms", "duration": "15m", "preAllocatedVUs": 2, "startTime": "0ms" },

~ 346

5

 

 

4

CM-handle ID search with Module filter

"cm_handle_id_search_module_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleIdSearchModuleScenario", "rate": 1, "timeUnit": "2600ms", "duration": "15m", "preAllocatedVUs": 2, "startTime": "400ms" },

~ 346

5

 

 

5

CM-handle ID search with Property filter

"cm_handle_id_search_property_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleIdSearchPropertyScenario", "rate": 1, "timeUnit": "2600ms", "duration": "15m", "preAllocatedVUs": 2, "startTime": "800ms" },

~ 346

5

 

 

6

CM-handle ID search with Cps Path filter

"cm_handle_id_search_cpspath_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleIdSearchCpsPathScenario", "rate": 1, "timeUnit": "2600ms", "duration": "15m", "preAllocatedVUs": 2, "startTime": "1200ms" },

~ 346

5

 

 

7

CM-handle ID search with Trust Level filter

"cm_handle_id_search_trustlevel_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleIdSearchTrustLevelScenario", "rate": 1, "timeUnit": "2600ms", "duration": "15m", "preAllocatedVUs": 2, "startTime": "1600ms" },

~ 346

5

 

 

8

CM-handle search with No filter

"cm_handle_search_nofilter_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleSearchNoFilterScenario", "rate": 1, "timeUnit": "24s", "duration": "15m", "preAllocatedVUs": 2, "startTime": "0s" },

~ 37

5

 

 

9

CM-handle search with Module filter

"cm_handle_search_module_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleSearchModuleScenario", "rate": 1, "timeUnit": "24s", "duration": "15m", "preAllocatedVUs": 2, "startTime": "3s" },

~ 37

5

 

 

10

CM-handle search with Property filter

"cm_handle_search_property_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleSearchPropertyScenario", "rate": 1, "timeUnit": "24s", "duration": "15m", "preAllocatedVUs": 2, "startTime": "6s" },

~ 37

5

 

 

11

CM-handle search with Cps Path filter

"cm_handle_search_cpspath_scenario": { "executor": "constant-arrival-rate", "exec": "cmHandleSearchCpsPathScenario", "rate": 1, "timeUnit": "24s", "duration": "15m", "preAllocatedVUs": 2, "startTime": "9s" },

~ 37

5

 

 

12

NCMP overhead for Synchronous single CM-handle pass-through read with alternate id

"passthrough_read_alt_id_scenario": { "executor": "constant-arrival-rate", "exec": "passthroughReadAltIdScenario", "rate": 25, "timeUnit": "1s", "duration": "15m", "preAllocatedVUs": 5, "maxVUs": 25, "startTime": "200ms" },

~ 22,500

5

 

 

13

NCMP overhead for Synchronous single CM-handle pass-through write with alternate id

"passthrough_write_alt_id_scenario": { "executor": "constant-arrival-rate", "exec": "passthroughWriteAltIdScenario", "rate": 13, "timeUnit": "1s", "duration": "15m", "preAllocatedVUs": 5, "maxVUs": 25, "startTime": "600ms" },

~ 11,700

5

 

 

14

Legacy batch read operation

"legacy_batch_consume_scenario": { "executor": "per-vu-iterations", "exec": "legacyBatchConsumeScenario", "vus": 1, "iterations": 1, "maxDuration": "16m" },

1

1

 

 

15

 

"produceCmAvcBackGroundLoadAtPeakRate": { "executor": "constant-arrival-rate", "exec": "produceAvcEventsScenario", "rate": 11, "timeUnit": "1s", "duration": "15m", "preAllocatedVUs": 11, "maxVUs": 12, "gracefulStop": "10s" }

~ 9,900

-

 

 

 

error limit on endurance tests ?

Scenario

Endurance Scenario Properties

Iterations

Error limit per VU

 

Notes

Scenario

Endurance Scenario Properties

Iterations

Error limit per VU

 

Notes

1

Registration of CM-handles

 

 

 

 

 

2

De-registration of CM-handles

 

 

 

 

 

3

CM-handle ID search with No filter

"cm_handle_id_search_nofilter_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchNoFilterScenario", "vus": 1, "duration": "2h" },

 

 

 

 

4

CM-handle ID search with Module filter

"cm_handle_id_search_module_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchModuleScenario", "vus": 1, "duration": "2h" },

 

 

 

 

5

CM-handle ID search with Property filter

"cm_handle_id_search_property_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchPropertyScenario", "vus": 1, "duration": "2h" },

 

 

 

 

6

CM-handle ID search with Cps Path filter

"cm_handle_id_search_cpspath_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchCpsPathScenario", "vus": 1, "duration": "2h" },

 

 

 

 

7

CM-handle ID search with Trust Level filter

"cm_handle_id_search_trustlevel_scenario": { "executor": "constant-vus", "exec": "cmHandleIdSearchTrustLevelScenario", "vus": 1, "duration": "2h" },

 

 

 

 

8

CM-handle search with No filter

"cm_handle_search_nofilter_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchNoFilterScenario", "vus": 1, "duration": "2h" },

 

 

 

 

9

CM-handle search with Module filter

"cm_handle_search_module_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchModuleScenario", "vus": 1, "duration": "2h" },

 

 

 

 

10

CM-handle search with Property filter

"cm_handle_search_property_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchPropertyScenario", "vus": 1, "duration": "2h" },

 

 

 

 

11

CM-handle search with Cps Path filter

"cm_handle_search_cpspath_scenario": { "executor": "constant-vus", "exec": "cmHandleSearchCpsPathScenario", "vus": 1, "duration": "2h" },

 

 

 

 

12

NCMP overhead for Synchronous single CM-handle pass-through read with alternate id

"passthrough_read_alt_id_scenario": { "executor": "constant-vus", "exec": "passthroughReadAltIdScenario", "vus": 4, "duration": "2h" },

 

 

 

 

13

NCMP overhead for Synchronous single CM-handle pass-through write with alternate id

"passthrough_write_alt_id_scenario": { "executor": "constant-vus", "exec": "passthroughWriteAltIdScenario", "vus": 4, "duration": "2h" },

 

 

 

 

14

Legacy batch produce operation

"legacy_batch_produce_scenario": { "executor": "constant-arrival-rate", "exec": "legacyBatchProduceScenario", "duration": "2h", "rate": 1, "timeUnit": "1s", "preAllocatedVUs": 1 }