Setup
Environment:
OS: Zorin OS 16.2
RAM: 32 GB
CPU: Intel® Core™ i7-10610U CPU @ 1.80GHz × 8
Data:
Included in ZIP file (at bottom)
- All data is stored under 1 anchor
- Under /openroadm-devices there is a list of 10,000 openroadm-device[..] entries
- Tree size per 'device': 86 fragments
- Size per device: 333 KB
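The fragment counts used in the whole-data-tree table follow directly from these figures: every device contributes 86 fragments, so the total is the device count times 86. A minimal bash sketch of that arithmetic (`fragment_count` is a hypothetical helper name):

```shell
# Hypothetical helper: total number of fragments stored for a given number of
# devices, assuming every openroadm-device subtree has 86 fragments.
FRAGMENTS_PER_DEVICE=86

fragment_count() {
  local devices=$1
  echo $(( devices * FRAGMENTS_PER_DEVICE ))
}

# Print the fragment count for each data set size used in the tests
for devices in 1000 2000 5000 10000; do
  printf '%s devices -> %s fragments\n' "$devices" "$(fragment_count "$devices")"
done
```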
Single large object request
Query: cps/api/v1/dataspaces/openroadm/anchors/owb-msa221-anchor/node?xpath=/openroadm-devices/openroadm-device[@device-id='C201-7-13A-5A1']&include-descendants=true
Durations are the average of 100 measurements (fetching 1 object out of many).
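The xpath in this query contains `[`, `]`, `@`, `'` and `=`. curl treats unescaped `[...]` in a URL as a glob, and some servers reject raw brackets, so percent-encoding the xpath before building the URL is one way to send it safely. A minimal bash sketch, assuming ASCII-only input; `urlencode` and `device_url` are hypothetical helper names, and the default base URL is taken from the test script further down (an assumption about the deployment):

```shell
# Percent-encode every character outside the RFC 3986 unreserved set.
# Assumes ASCII input (multi-byte characters are not handled).
urlencode() {
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case "$c" in
      [[:alnum:].~_-]) out+=$c ;;                      # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;        # e.g. [ -> %5B
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: URL that fetches one device with all its descendants.
# CPS_BASE_URL is an assumption (host/port of the local test deployment).
CPS_BASE_URL=${CPS_BASE_URL:-http://localhost:8883}

device_url() {
  local xpath="/openroadm-devices/openroadm-device[@device-id='$1']"
  printf '%s/cps/api/v1/dataspaces/openroadm/anchors/owb-msa221-anchor/node?xpath=%s&include-descendants=true\n' \
    "$CPS_BASE_URL" "$(urlencode "$xpath")"
}
```

Usage, for example: `curl --silent --fail --user <user>:<password> "$(device_url C201-7-13A-5A1)"`. Alternatively, curl can do the encoding itself with `--get --data-urlencode "xpath=..."`, or globbing can be switched off with `--globoff`.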
Patch | Devices | E2E duration (s) | Fragment Query duration (s) | Service Overhead (s) |
---|---|---|---|---|
1) Baseline | 1,000 | 0.045 | 0.023 | 0.022 |
1) | 2,000 | 0.054 | 0.035 | 0.018 |
1) | 5,000 | 0.144 | 0.117 | 0.027 |
1) | 10,000 | 0.290 | 0.260 | 0.030 |
2) https://gerrit.onap.org/r/c/cps/+/133511/2 | 1,000 | 0.054 | 0.053 | 0.001 |
2) | 2,000 | 0.100 | 0.100 | 0.000 |
2) | 5,000 | 0.229 | 0.229 | 0.000 |
2) | 10,000 | 0.213 | 0.212 | 0.000 |
3) https://gerrit.onap.org/r/c/cps/+/133511/12 | 1,000 | 0.020 | 0.016 | 0.004 |
3) | 2,000 | 0.030 | 0.026 | 0.003 |
3) | 5,000 | 0.113 | 0.108 | 0.005 |
3) | 10,000 | 0.100 | 0.096 | 0.003 |
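The Service Overhead column is the E2E duration minus the Fragment Query duration. A small awk-based sketch that recomputes it; the `devices|e2e|query` input format and the `service_overhead` name are assumptions for illustration, not the format of the original data sheets:

```shell
# Recompute "Service Overhead (s)" = E2E duration - Fragment Query duration.
# Input: one "devices|e2e_seconds|query_seconds" line per measurement.
service_overhead() {
  awk -F'|' '{ printf "%s|%.3f\n", $1, $2 - $3 }'
}

# Baseline rows from the table above
printf '%s\n' '1,000|0.045|0.023' '10,000|0.290|0.260' | service_overhead
```

Small differences from the table (e.g. 0.019 instead of 0.018 for 2,000 devices) are presumably due to the table being computed from unrounded averages.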
Observations (patch 3)
- Is 'findByAnchorAndCspPath' being used? (It shouldn't be?!)
- Query time increases until the list size reaches 6,000 elements and then levels off
Whole data tree as one request
1 object containing all nodes as descendants (mainly one big list)
Query: cps/api/v1/dataspaces/openroadm/anchors/owb-msa221-anchor/node?xpath=/openroadm-device&include-descendants=true
All queries were run 10 times
Patch | Devices | E2E duration (s) | Fragment Query duration (s) | Service duration (s) | Object Size (MB) | Object Size (# Fragments) |
---|---|---|---|---|---|---|
1) Baseline | 1,000 | 11.8 | <0.1* | 12 | 0.3 | 86,000 |
1) | 2,000 | 28.5 | <0.1* | 28 | 0.7 | 172,000 |
1) | 5,000 | 87.0 | <0.1* | 86 | 1.7 | 430,000 |
1) | 10,000 | 201.0 | <0.1* | 201 | 3.3 | 860,000 |
2) | 1,000 | 0.5 | 0.2 | 0.3 | 0.3 | 86,000 |
2) | 2,000 | 1.0 | 0.4 | 0.6 | 0.7 | 172,000 |
2) | 5,000 | 2.5 | 1.1 | 1.4 | 1.7 | 430,000 |
2) | 10,000 | 7.0 | 2.9 | 4.0 | 3.3 | 860,000 |
3) | 1,000 | 3.0 | 1.3 | 1.7 | 0.3 | 86,000 |
3) | 2,000 | 5.5 | 2.3 | 3.2 | 0.7 | 172,000 |
3) | 5,000 | 11.0 | 5.4 | 5.6 | 1.7 | 430,000 |
3) | 10,000 | 25.4 | 11.7 | 13.6 | 3.3 | 860,000 |
* Only the initial Hibernate query is measured; Hibernate lazily fetches the remaining data later, which is reflected in the E2E time.
Observations:
- Patchset #2 performed better than the latest patch! This needs further comparison; Daniel Hanrahan will follow up.
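To compare the patches independently of data set size, the E2E duration can be converted into fragments retrieved per second (fragments ÷ E2E duration). This derived metric is not part of the original measurements; the sketch below only recomputes it from the 10,000-device rows of the table above (the `fragments_per_second` name and the row labels are hypothetical):

```shell
# Fragments retrieved per second = number of fragments / E2E duration (s).
# Input: one "label|fragments|e2e_seconds" line per table row.
fragments_per_second() {
  awk -F'|' '{ printf "%s|%.0f\n", $1, $2 / $3 }'
}

# 10,000-device rows: baseline, patchset 2 and the latest patch
printf '%s\n' 'baseline|860000|201.0' 'patchset-2|860000|7.0' 'latest-patch|860000|25.4' |
  fragments_per_second
```

On these rows, patchset 2 retrieves roughly 3.6 times as many fragments per second as the latest patch (25.4 s / 7.0 s).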
Get nodes in parallel
Fetch 1 device from a database with 10,000 devices
Parallel curl commands were launched from Bash; each thread executed 10 sequential requests with no delay in between, and the average response times are reported.
Query: cps/api/v1/dataspaces/openroadm/anchors/owb-msa221-anchor/node?xpath=/openroadm-devices/openroadm-device[@device-id='C201-7-13A-5A1']&include-descendants=true
Patch: https://gerrit.onap.org/r/c/cps/+/133511/12
Threads | E2E duration (s) | Success Ratio | Fragment Query duration (s) |
---|---|---|---|
1 | 0.082 | 100% | 0.2 |
2 | 0.091 | 100% | 0.1 |
3 | 0.120 | 100% | 0.1 |
5 | 0.3 | 100% | 0.2 |
10 | 0.3 | 99.9% | 0.3 |
20 | 0.5 | 99.5% | 0.5 |
50 | 1.0 | 99.4% | 1.0 |
100 | 2.3 | 99.7% | 2.3 |
200 | 7.6 | 99.7% | 6.2 |
500 | 17.1 | 41.4% | 13.8 |
1,000 | 15.3 (many connection errors) | 26.0% | 11.9 |
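The measurement procedure (N threads, each firing 10 sequential requests, reporting average response time and success ratio) can be sketched in bash as below. This is a hypothetical reconstruction, not the script used for the table: `run_parallel` and `summarize` are illustrative names, a request counts as successful only on HTTP 200, and the average includes failed requests (both assumptions).

```shell
# Summarize "<http_code> <seconds>" lines: request count, average time over
# all requests (failed ones included, an assumption) and success ratio,
# where success means HTTP status 200 (also an assumption).
summarize() {
  awk '{ n++; total += $2; if ($1 == 200) ok++ }
       END {
         if (n == 0) exit 1
         printf "requests=%d avg=%.3f success=%.1f%%\n", n, total / n, 100 * ok / n
       }'
}

# Hypothetical driver: start THREADS background subshells, each running the
# given command REQS times in sequence; each call must print one
# "<http_code> <seconds>" line, e.g. with curl write-out variables:
#   curl -s -o /dev/null -w '%{http_code} %{time_total}\n' "$URL"
run_parallel() {
  local threads=$1 reqs=$2 dir t r
  shift 2
  dir=$(mktemp -d)
  for (( t = 0; t < threads; t++ )); do
    (
      for (( r = 0; r < reqs; r++ )); do "$@"; done
    ) > "$dir/thread-$t.log" &
  done
  wait                                   # block until all background jobs finish
  cat "$dir"/thread-*.log | summarize
  rm -rf "$dir"
}
```

For the 10-thread row, the call would look like `run_parallel 10 10 curl -s -o /dev/null -w '%{http_code} %{time_total}\n' "$URL"`, with `$URL` holding the query above.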
Graphs:
- Average E2E Execution Time
- Internal Method Counts (total)
Observations
- From 10 parallel threads (each sending 10 sequential requests) onwards, the client cannot always connect and time-out errors occur (success ratio < 100%)
- Sequential requests are fired faster than the responses arrive, so from the database's perspective they are almost parallel requests as well
- The database probably already becomes the bottleneck with 2 threads, which effectively fire a total of 20 calls in quick succession. The DB connection pool/internals are known to slow down at 12 or more 'parallel' requests
Get 1,000 nodes in parallel with varying thread count
In this test, 1,000 requests are sent using curl with a varying thread count (the maximum number of concurrent transfers, set with the --parallel-max option).
echo -e "Threads\tTime"
for threads in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 20 30 40 50; do
  echo -n -e "$threads\t"
  /usr/bin/time -f "%e" curl --silent --output /dev/null --fail --show-error \
    --header "Authorization: Basic Y3BzdXNlcjpjcHNyMGNrcyE=" \
    --get "http://localhost:8883/cps/api/v1/dataspaces/openroadm/anchors/owb-msa221-anchor/node?xpath=/openroadm-devices/openroadm-device\[@device-id='C201-7-[1-25]A-[1-40]A1'\]&include-descendants=true" \
    --parallel --parallel-max $threads --parallel-immediate
done
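In the script above, curl expands the two numeric ranges in the URL into 25 × 40 = 1,000 separate requests. One quick way to list which device IDs are fetched is the equivalent bash brace expansion. `device_ids` is a hypothetical helper, and it assumes, as the query example `C201-7-13A-5A1` suggests, that the loaded data set uses this ID pattern:

```shell
# Enumerate the device IDs addressed by the curl URL glob
# 'C201-7-[1-25]A-[1-40]A1' using the equivalent bash brace expansion.
device_ids() {
  printf '%s\n' C201-7-{1..25}A-{1..40}A1
}

device_ids | wc -l        # 25 * 40 = 1000 IDs
device_ids | head -n 3    # first few IDs
```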
Results
Threads | Time (s) | Speedup | Comments |
---|---|---|---|
1 | 140.4 | 1.0 | |
2 | 71.6 | 2.0 | 2 threads is 2x faster than 1 thread |
3 | 48.5 | 2.9 | |
4 | 37.2 | 3.8 | |
5 | 31.0 | 4.5 | |
6 | 26.6 | 5.3 | |
7 | 23.8 | 5.9 | |
8 | 21.6 | 6.5 | |
9 | 20.0 | 7.0 | |
10 | 18.7 | 7.5 | 10 threads is 7.5x faster than 1 thread |
11 | 17.7 | 7.9 | |
12 | 16.8 | 8.4 | The test machine has exactly 12 logical CPU cores |
13 | 16.7 | 8.4 | |
14 | 16.7 | 8.4 | |
15 | 16.8 | 8.4 | |
20 | 16.8 | 8.4 | |
30 | 16.7 | 8.4 | |
40 | 16.8 | 8.4 | |
50 | 16.7 | 8.4 |
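The Speedup column is the 1-thread time divided by the n-thread time (e.g. 140.4 / 18.7 ≈ 7.5 for 10 threads). The sketch below derives it from the tab-separated output of the loop above. It also adds parallel efficiency (speedup ÷ threads), which is not in the original table but makes the flattening beyond 12 threads easier to see. The `add_speedup` name and the `bench.sh` file name are hypothetical:

```shell
# Append Speedup (T1 / Tn) and parallel efficiency (Speedup / threads) to the
# "Threads<TAB>Time" output of the loop above, e.g.:
#   ./bench.sh 2>&1 | add_speedup     (bench.sh is a hypothetical file name)
# The first data row is taken as the 1-thread baseline.
add_speedup() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print $0, "Speedup", "Efficiency"; next }  # header line
    base == "" { base = $2 }                             # 1-thread time
    {
      speedup = base / $2
      printf "%s\t%s\t%.1f\t%.0f%%\n", $1, $2, speedup, 100 * speedup / $1
    }'
}
```

Because `/usr/bin/time` writes to stderr, the loop's output has to be captured with `2>&1`; any curl error messages (`--show-error`) would end up in the same stream and need filtering first.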
Observations
- Performance increases nearly linearly with increasing thread count, up to the number of CPU cores.
- Performance stops increasing when the number of threads equals the number of CPU cores (expected).
- Verbose statistics show that each individual request takes around 0.14 seconds regardless of thread count (with multiple CPU cores, the requests really are processed in parallel)
Data sheets