Tips on tuning akka.conf and the ...datastore.cfg files for clustering
2019-05-13: Original post
jhartley / at - luminanetworks \ com – for any questions
Goals for akka.conf:
Specify THIS cluster member's resolvable FQDN or IP address. (Tip: Use FQDNs, and ensure they're resolvable in your env.)
List all cluster members in the seed-nodes list.
Tune optional variables, noting that the defaults for many of these are far too low.
Keep this file ~identical on all instances; only the "roles" and "hostname" are unique to this member.
Example of a 3-node configuration, tuned:
odl-cluster-data {
  akka {
    loglevel = ""
    remote {
      netty.tcp {
        hostname = "odl1.region.customer.com"
        port = 2550
      }
      use-passive-connections = off
    }
    actor {
      debug {
        autoreceive = on
        lifecycle = on
        unhandled = on
        fsm = on
        event-stream = on
      }
    }
    cluster {
      seed-nodes = [
        "akka.tcp://opendaylight-cluster-data@odl1.region.customer.com:2550",
        "akka.tcp://opendaylight-cluster-data@odl2.region.customer.com:2550",
        "akka.tcp://opendaylight-cluster-data@odl3.region.customer.com:2550"
      ]
      seed-node-timeout = 15s
      roles = ["member-1"]
    }
    persistence {
      journal-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
      snapshot-store-plugin-fallback {
        circuit-breaker {
          max-failures = 10
          call-timeout = 90s
          reset-timeout = 30s
        }
        recovery-event-timeout = 90s
      }
    }
  }
}
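As noted above, only the "hostname" and "roles" lines differ from member to member. As a sketch, on the second member (using the example FQDNs from above) the two differences would be:

  # in akka.remote.netty.tcp
  hostname = "odl2.region.customer.com"
  # in akka.cluster
  roles = ["member-2"]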
Goals for org.opendaylight.controller.cluster.datastore.cfg:
This is a plain key=value properties file (not HOCON); if a key appears more than once, the later entry replaces the earlier one.
The goal here is to significantly reduce the race conditions seen when all members of a cluster start at once, and when a freshly restarted or "cleaned" member rejoins.
# Note: some sites use a batch-size of 1; that is not reflected here.
persistent-actor-restart-min-backoff-in-seconds=10
persistent-actor-restart-max-backoff-in-seconds=40
persistent-actor-restart-reset-backoff-in-seconds=20
shard-transaction-commit-timeout-in-seconds=120
shard-isolated-leader-check-interval-in-millis=30000
operation-timeout-in-seconds=120
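For orientation, here is the same block again with one-line comments on what each knob appears to control; the descriptions are my informal reading of the key names, not authoritative documentation:

# Backoff window (min/max/reset) used when restarting a failed persistent shard actor.
persistent-actor-restart-min-backoff-in-seconds=10
persistent-actor-restart-max-backoff-in-seconds=40
persistent-actor-restart-reset-backoff-in-seconds=20
# How long a shard transaction commit may take before it is failed.
shard-transaction-commit-timeout-in-seconds=120
# How often a shard leader checks whether it has become isolated from its followers.
shard-isolated-leader-check-interval-in-millis=30000
# Timeout for individual datastore operations before they are failed.
operation-timeout-in-seconds=120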
Goals for module-shards.conf:
Name which members retain copies of which data shards.
These shard name fields are the "friendly" names assigned to the explicit namespaces in modules.conf.
In a K8S/Swarm environment, it's easiest to keep this identical on all members. Unique shard replication (or isolation) strategies are for another document/discussion, and require non-trivial planning.
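For context, those friendly names come from modules.conf, which maps each one to a YANG namespace. A sketch of the stock entries is below; the namespaces shown are the usual OpenDaylight defaults, so verify them against the modules.conf shipped with your distribution:

modules = [
  {
    name = "inventory"
    namespace = "urn:opendaylight:inventory"
    shard-strategy = "module"
  },
  {
    name = "topology"
    namespace = "urn:TBD:params:xml:ns:yang:network-topology"
    shard-strategy = "module"
  }
]

With those names in place, the three-member module-shards.conf looks like this: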
module-shards = [
  {
    name = "default"
    shards = [
      {
        name = "default"
        replicas = [
          "member-1"
          "member-2"
          "member-3"
        ]
      }
    ]
  },
  {
    name = "topology"
    shards = [
      {
        name = "topology"
        replicas = [
          "member-1"
          "member-2"
          "member-3"
        ]
      }
    ]
  },
  {
    name = "inventory"
    shards = [
      {
        name = "inventory"
        replicas = [
          "member-1"
          "member-2"
          "member-3"
        ]
      }
    ]
  }
]
Thus, for example, it would be perfectly legitimate to have a single simple entry that includes ONLY "default", if desired. In that case there would only be default-config and default-operational, plus some of the auto-created shards.
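If you wanted that minimal layout, the whole module-shards.conf could shrink to a sketch like this (same three members as in the example above):

module-shards = [
  {
    name = "default"
    shards = [
      {
        name = "default"
        replicas = [
          "member-1"
          "member-2"
          "member-3"
        ]
      }
    ]
  }
]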