SDN-C Site Health Determination
Casablanca
This functionality was introduced in the Casablanca release.
In the Beijing release, the Kubernetes dashboard was suggested for monitoring the general health of a site (see 5. Install and Use Kubernetes UI).
Overview
For an operator or PROM to decide whether one site should be made active over another, it must first be determined whether a given site is able to process messaging.
Manually checking site health
To manually check the health of a site, the operator can run the sdnc.monitor script from the Kubernetes master of the site in question. The release name is a required argument; the namespace defaults to onap if not specified.
sdnc.monitor
ubuntu@k8s-s2-master:~/oom/kubernetes/sdnc/resources/geo/bin$ ./sdnc.monitor dev
healthy
This version of the script is a wrapper: it uses kubectl to remotely access the PROM pod and run the sdnc.monitor script there, which performs the actual health checks on the components in the site.
Alternatively, the sdnc.monitor script available in the PROM pod can be run directly:
sdnc.monitor
root@dev-prom-6485f566fb-hdhzs:/app/bin# ./sdnc.monitor
healthy
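The wrapper behaviour described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual sdnc.monitor source: the function names build_exec_command and run_remote_monitor are hypothetical, the pod name is assumed to be already known (how the real wrapper locates the PROM pod from the release name is not covered here), and the script path /app/bin/sdnc.monitor is taken from the prompt shown in the example above.

```python
import subprocess
from typing import List


def build_exec_command(pod_name: str,
                       namespace: str = "onap",
                       script_path: str = "/app/bin/sdnc.monitor") -> List[str]:
    """Build a `kubectl exec` command that runs the monitor script in a pod.

    Hypothetical helper: the pod name must be supplied by the caller.
    """
    return ["kubectl", "--namespace", namespace,
            "exec", pod_name, "--", script_path]


def run_remote_monitor(pod_name: str, namespace: str = "onap") -> str:
    """Run the in-pod script via kubectl and return its trimmed stdout."""
    result = subprocess.run(build_exec_command(pod_name, namespace),
                            capture_output=True, text=True, check=False)
    return result.stdout.strip()
```

Calling run_remote_monitor("dev-prom-6485f566fb-hdhzs") from a host with kubectl access would, under these assumptions, return whatever the in-pod script prints.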
Advanced health reporting
To help troubleshoot an unhealthy site, include the --debug argument. It shows which health checks are passing and which are failing and, for failing checks, the health check output, which helps identify the root cause.
The use of consul in component health checks
The consul health checks that are selected for site health are specified in the prom pod's values.yaml file, e.g. ~/oom/kubernetes/sdnc/prom/values.yaml.
prom values.yaml
config:
  ...
  healthChecks:
    # All top-level checks must pass
    - "Health Check: SDNC - SDN Host"
    - "Health Check: SDNC"
    - "Health Check: SDNC ODL Cluster"
    - "Health Check: SDNC Portal"
    # Within nested lists, only one must pass
    - - "Health Check: SDNC-SDN-CTL-DB-01"
      - "Health Check: SDNC-SDN-CTL-DB-02"
In the above example, the first four health checks (three for OpenDaylight and one for the admin portal) must all pass, as well as at least one MySQL port check. Short-circuit evaluation is used to determine site health in as few consul queries as possible.
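The evaluation rule for the healthChecks list can be illustrated with a short sketch. This is not the PROM implementation: site_is_healthy, make_stub, and the check_passes callback are hypothetical names, and the callback stands in for whatever consul query reports a single check's status. The sketch only shows the logic: every top-level entry must pass, a nested list passes if any one of its members passes, and evaluation stops as soon as the result is known.

```python
from typing import Callable, List, Sequence, Set, Tuple, Union

HealthCheckEntry = Union[str, Sequence[str]]


def site_is_healthy(health_checks: Sequence[HealthCheckEntry],
                    check_passes: Callable[[str], bool]) -> bool:
    """Return True if all top-level entries pass (nested lists: any one)."""
    for entry in health_checks:
        if isinstance(entry, str):
            # Top-level check: one failure makes the site unhealthy,
            # so stop without querying the remaining checks.
            if not check_passes(entry):
                return False
        elif not any(check_passes(name) for name in entry):
            # Nested list: any() stops at the first passing member.
            return False
    return True


# The list from the values.yaml example above.
HEALTH_CHECKS: List[HealthCheckEntry] = [
    "Health Check: SDNC - SDN Host",
    "Health Check: SDNC",
    "Health Check: SDNC ODL Cluster",
    "Health Check: SDNC Portal",
    ["Health Check: SDNC-SDN-CTL-DB-01", "Health Check: SDNC-SDN-CTL-DB-02"],
]


def make_stub(failing: Set[str]) -> Tuple[Callable[[str], bool], List[str]]:
    """Hypothetical stand-in for a consul query; records each name asked."""
    queried: List[str] = []

    def check_passes(name: str) -> bool:
        queried.append(name)
        return name not in failing

    return check_passes, queried


if __name__ == "__main__":
    stub, queried = make_stub({"Health Check: SDNC-SDN-CTL-DB-01"})
    print(site_is_healthy(HEALTH_CHECKS, stub))  # True: DB-02 still passes
    print(len(queried))                          # 6 queries were needed
```

When every check passes, the second database check is never queried, and when an early top-level check fails, none of the later checks are queried, which is the saving that short-circuit evaluation provides.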