SDN-C Geo-Redundancy Notifications

Casablanca

This functionality was first introduced in the Casablanca release.

Overview

If the DMaaP Message Router is available and configured in the ONAP system, PROM can use it to send health and failover notifications for other applications or operators to consume. This helps ensure that the operations team is promptly aware of any problems encountered. Although SDN-C fails over automatically to the standby site, someone will still need to investigate the failed site. In addition, if a catastrophic failure occurs (one that requires reconfiguring the ODL cluster), manual intervention is always required once the failed site is restored.

Health notifications

When PROM detects that the state of the local site has transitioned from one state (e.g. 'healthy') to another (e.g. 'unhealthy'), a health-change notification is generated and delivered to the SDNC-GEO-REDUNDANCY topic in DMaaP:

health-change
[ "{\"site\":\"sdnc01\",\"type\":\"health-change\",\"status\":\"unhealthy\",\"timestamp\":\"2018-05-08 16:47:53.137170\",\"deployment\":\"test_onap\"}" ]

The notification shows which site the health is being reported for, the new health of the site, when the new state was discovered, and the name of the ONAP deployment the site is part of.
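
As the example above shows, each message is a JSON array whose elements are themselves JSON-encoded strings, so a consumer has to decode twice. The following is a minimal sketch of that decoding in Python, using the sample payload from this page. The helper names decode_notifications and parse_timestamp are hypothetical, and the timestamp formats are inferred from the examples on this page rather than from a published schema.

```python
import json
from datetime import datetime

# Sample payload copied from the health-change example on this page.
raw = r'''[ "{\"site\":\"sdnc01\",\"type\":\"health-change\",\"status\":\"unhealthy\",\"timestamp\":\"2018-05-08 16:47:53.137170\",\"deployment\":\"test_onap\"}" ]'''


def decode_notifications(body):
    """Decode the outer JSON array, then each inner JSON-encoded string."""
    return [json.loads(item) for item in json.loads(body)]


def parse_timestamp(value):
    """Hypothetical helper: accept timestamps with or without microseconds.

    Both forms appear in the examples on this page (assumption: no other
    formats are emitted).
    """
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


events = decode_notifications(raw)
for event in events:
    when = parse_timestamp(event["timestamp"])
    print(event["site"], event["type"], event["status"], when.isoformat())
```

Trying both timestamp formats keeps the same helper usable for failover notifications, whose example timestamp carries no fractional seconds.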

Additionally, PROM will send a "health-initial" type message when the PROM pod is started:

health-initial
[ "{\"site\":\"sdnc01\",\"type\":\"health-initial\",\"status\":\"healthy\",\"timestamp\":\"2018-05-08 16:44:48.873850\",\"deployment\":\"test_onap\"}" ]


Failover event notifications

When a site failover occurs, either through operator intervention or automatically via PROM (see SDN-C Site Failover), PROM generates a failover notification and delivers it to the SDNC-GEO-REDUNDANCY topic in DMaaP:

failover
[ "{\"site\":\"sdnc02\",\"type\":\"failover\",\"status\":\"success\",\"deployment\":\"test_onap\",\"timestamp\":\"2018-05-08 15:00:04\"}" ]

The notification shows which site was made active due to failover, whether or not the failover activity was successful, the name of the ONAP deployment the site is part of and the time at which the failover occurred.
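
A consumer of the SDNC-GEO-REDUNDANCY topic will typically want to branch on the "type" field. The sketch below shows one possible way to turn a decoded notification into a one-line summary for an operator. The function name summarize is hypothetical, and because only the "success" status appears in the failover example, any other status value is simply reported as-is (an assumption, not documented behaviour).

```python
import json

# Sample payload copied from the failover example on this page.
raw = r'''[ "{\"site\":\"sdnc02\",\"type\":\"failover\",\"status\":\"success\",\"deployment\":\"test_onap\",\"timestamp\":\"2018-05-08 15:00:04\"}" ]'''


def summarize(event):
    """Hypothetical helper: one-line operator summary of a notification."""
    kind = event.get("type")
    site = event.get("site")
    status = event.get("status")
    if kind == "failover":
        # Only "success" is shown in the example; other values are passed
        # through unchanged because their exact spelling is not documented.
        if status == "success":
            return f"failover to {site} succeeded"
        return f"failover to {site} reported status {status!r}"
    if kind in ("health-initial", "health-change"):
        return f"{site} is {status} ({kind})"
    return f"unrecognized notification type {kind!r} for {site}"


# The payload is an array of JSON-encoded strings, so decode twice.
events = [json.loads(item) for item in json.loads(raw)]
for event in events:
    print(summarize(event))
```

Falling through to an "unrecognized" summary rather than raising keeps the consumer running if PROM emits a notification type not covered here.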

Catastrophic failover

A similar event is generated in the case where a catastrophic failure has affected a site, resulting in the need to drastically reconfigure the ODL cluster:

catastrophic failover