Planning and Tuning Compute Resources for ODL Scalability

jhartley / at - luminanetworks \ com – for any questions

The information compiled here is based on months of production data collected across a wide range of use-cases and a large number of Java Garbage Collection (GC) scenarios and outages.
As this guidance evolves, additional updates will be made in line with future CCSDK/SDNC and related project releases.


The general formula for ODL resource planning is as follows:
cores = 0.5 * GB.RAM,
heap = 0.75 * GB.RAM,
parallel GC threads = 0.75 * cores.

For example:
32GB RAM allocated to VM
...thus cores = 16,
Java MAX & MIN memory = 24G,
parallel GC threads = 12.
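
A minimal shell sketch of the same arithmetic (integer math, assuming RAM is expressed in whole GB; the variable names are illustrative):
RAM_GB=32                        # total RAM allocated to the VM, in GB
CORES=$(( RAM_GB / 2 ))          # cores = 0.5 * GB.RAM
HEAP_GB=$(( RAM_GB * 3 / 4 ))    # heap = 0.75 * GB.RAM
GC_THREADS=$(( CORES * 3 / 4 ))  # parallel GC threads = 0.75 * cores
echo "cores=${CORES} heap=${HEAP_GB}G gc_threads=${GC_THREADS}"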

You can always ADD cores for CPU-intensive applications, but avoid reducing cores, or GC cycles will take too long to complete.

Similarly, you can always REDUCE the MAX & MIN heap as a fraction of total RAM, from 0.75 down to 0.50, if you are running a heavy application load.


Using this formula, guidelines for instance sizing are as follows:

Small:
RAM: 8 GB
CPU: 4 cores
Heap: 6 GB
GC p.threads: 3
example EXTRA_JAVA_OPTS: "-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=3 -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -Xloggc:/{{your/path}}/gc-%t.log"

Medium:
RAM: 16 GB
CPU: 8 cores
Heap: 12 GB
GC p.threads: 6
example EXTRA_JAVA_OPTS: "-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=6 -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -Xloggc:/{{your/path}}/gc-%t.log"

Large:
RAM: 32 GB
CPU: 16 cores
Heap: 24 GB
GC p.threads: 12
example EXTRA_JAVA_OPTS: "-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=12 -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -Xloggc:/{{your/path}}/gc-%t.log"

XL:
RAM: 64 GB
CPU: 32 cores
Heap: 48 GB
GC p.threads: 24
example EXTRA_JAVA_OPTS: "-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -Xloggc:/{{your/path}}/gc-%t.log"


Storage needs to be persistent in all cases, otherwise controller restarts take much longer, even for transient-data use-cases.
The exact speed and throughput characteristics of the storage need to match the workload; 500 MBps+ is a good rule of thumb for highly active clustered MD-SAL applications, but this scales up and down in a highly use-case-specific fashion. A non-contentious storage path/bus is the 2nd most important characteristic (ex: TOR/TOA switches where the storage path and the VNF dataplane path share a LAG are a disaster in the making -- VNFs can often survive this; controllers and DBs cannot).
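As a sanity check before going live, sequential write throughput on the controller's data mount can be measured directly; this sketch assumes the fio tool is installed and that /{{your/path}} is the mount that will hold the ODL data/journal:
fio --name=odl-seqwrite --filename=/{{your/path}}/fio.test --rw=write --bs=1M --size=4G --numjobs=1 --direct=1 --group_reporting   # direct I/O, bypasses the page cache
rm /{{your/path}}/fio.test   # remove the test file afterwards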


In virtualized environments, it is critically important to give ODL (and any network controller) the same "dedicated non-blocking" resource allocation priority as VNFs and other network devices. Shared/oversubscribed cores, RAM, and network interfaces will result in failures and inconsistency.
For VMs (KVM, OS, ESXi, etc.), use the following (see the sketch after this list):
CPU pinning
1GB hugepages
SR-IOV VFs for high-throughput use-cases
NUMA node alignment for the above
(Ex: using these hypervisor features, we have examples of ODL instances in production far larger than XL above.)
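A rough sketch of inspecting/applying some of these settings on a KVM host (assumes libvirt and numactl are installed; the domain name "odl-vm" and the core numbers are hypothetical, and 1GB hugepages are usually better reserved at boot via the kernel command line, e.g. hugepagesz=1G hugepages=N):
numactl --hardware                                                    # confirm NUMA topology before pinning
echo 24 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages   # attempt runtime reservation of 24 x 1GB hugepages
virsh vcpupin odl-vm 0 0                                              # pin vCPU 0 to host core 0 (repeat per vCPU)
virsh vcpupin odl-vm 1 1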
For containers, use the following (see the sketch after this list):
resource requests AT the sizing levels above, not below
resource limits may optionally be higher, but limit = request is fine
Do not add more containers to the same pod without also increasing requests and limits to cover those workloads
...otherwise, consider a multi-pod architecture instead of potentially oversubscribing the controller with additional application containers.
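For Kubernetes-managed deployments, one way to express "request = limit at the sizing level" (a sketch only; the deployment and container names are hypothetical, and the values match the Large profile above):
kubectl set resources deployment opendaylight -c opendaylight --requests=cpu=16,memory=32Gi --limits=cpu=16,memory=32Gi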


Look in the controller/bin directory for the "setenv.pre" file (the exact name depends on the ODL release version) to edit JAVA_MAX_MEM, JAVA_MIN_MEM, and EXTRA_JAVA_OPTS (among others). These are then consumed by the various start/stop/... scripts. As of ODL Fluorine, you should also set ENABLE_PREFERRED_GC=false to ensure ODL does not automatically configure the default CMS collector (which performs poorly with very large heaps).
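As an illustration only (a sketch for the Large profile; confirm the exact variable names and whether they must be exported in your release's scripts), setenv.pre might contain:
export JAVA_MAX_MEM=24G
export JAVA_MIN_MEM=24G
export EXTRA_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=12 -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication"
export ENABLE_PREFERRED_GC=false   # Fluorine and later: keep ODL from auto-configuring CMS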

The setenv.post file can be used to unset JAVA_MAX_MEM and JAVA_MIN_MEM so commands like "stop" and "status" don't throw bogus errors when a large heap is in use.
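For example (again a sketch), setenv.post could simply contain:
unset JAVA_MAX_MEM
unset JAVA_MIN_MEM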

[Optional Java option to pre-touch (preallocate) all heap pages at startup rather than just reserving them: -XX:+AlwaysPreTouch -- it makes start times slower, but guarantees the host is healthy and actually has the RAM.]

NOTES RE: Logging and I/O-amplified full-GC slowdowns:
Be sure that your GC logging (as with all ODL logging) is redirected to a separate partition/mount from your programs and data, so GC/Java/Karaf are not left waiting on a system-level write() call.
In general, you should never leave any ODL-related logging in TRACE mode in Production systems due to the intense I/O that can be created.
The same logic applies to any other background I/O on each system; it is good to monitor this via sar.
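For example, assuming the sysstat package is installed:
sar -d -p 5 12    # per-block-device I/O, 12 samples at 5-second intervals
sar -b 5 12       # overall I/O and transfer-rate statistics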

In Java 10, more parallelization was added to full-GC cycles (parallel full GC for G1); however, Java 10 has not been vetted by the ODL community. Plans are to begin vetting Java 11 during/after the Sodium release timeframe.