Related Jira(s)

CPS-1415

Assumptions

	Assumption	Notes
1	When a DMI restarts, all cm-handles related to that DMI will be considered to have trust-level COMPLETE	Temporary assumption until the 'Audit' function has been implemented

#	Issue	Notes	Decision
1	How fast should CPS (and DB) be able to process the maximum number of heartbeat failures?	Is 60K really realistic? If ENM goes down we should get a notification for each node; do we?	PoC has shown 60 seconds is reasonable
2	Restart of NCMP	Should/can this be handled?
3	Does the DMI Plugin provide NCMP with a health check URL during registration, or do we just rely on the default one provided by Spring Boot Actuator?	Document the contract. It's just the interface that matters, not the implementation.
4	Look for the dmi data service (dmiDataPlugin) for the healthcheck.

Description

What might trigger a cmhandle to go to STALE?

dmi plugin identifies that the device is no longer contactable
dmi plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself.

#	Interface	Requirement	Additional Information
1	CPS-NCMP-E-05	The 'trustlevel' is visible on the same methods that currently expose the 'cm handle state'	can be new or existing (preferred) endpoint
2	CPS-NCMP-E-05	CM Handles can be queried (filter condition) on 'trustlevel'	using a new 'trustLevel' condition (cannot use cpsPath condition)
3	CPS-NCMP-E-05	Once a CM Handle is registered, the trust-level for that CM Handle should be reported to be 'COMPLETE'
4	CPS-NCMP-E-05	Once DMI (plugin) is detected to be down the trust-level for all affected CM Handles should be reported to be 'NONE'	It might not need to be persisted....
5	CPS-NCMP-I-01.e	DMI plugin can report the current trustlevel of a single cm handle id	i.e. the DMI can tell NCMP the trustlevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored

#	Error Scenario	Expected behavior
1	NCMP restart (all instances)	To be discussed, not sure if it can/should be handled. Trustlevels should be 'NONE' and need to be restored using an audit-request (not in scope)

#	Parameter	Expectation	Notes
1	dmi-down detection speed	30 seconds
2	device heartbeat frequency (message emitted by DMI plugin for each device)	60 seconds
3	maximum supported devices (by NCMP)	60,000	Given #2 and #3 this means NCMP needs to process 60,000 messages / minute!
4	maximum number of cm-handles reported down by DMI in one request and/or per minute	30,000 / minute	a peak can be processed within 60 seconds
5	time to process all trustlevel updates for DMI-down and/or peak load by DMI	1 second
6	If trustlevel is incorporated into the search endpoints, their speed should not be impacted
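
The load implied by the table can be checked with a quick calculation. The sketch below only restates the figures from rows 2 to 4; the variable names are illustrative and not part of any NCMP configuration.

```python
# Figures taken from the parameter table above.
MAX_DEVICES = 60_000                    # row 3: maximum supported devices
HEARTBEAT_INTERVAL_SECONDS = 60         # row 2: one heartbeat message per device
PEAK_DOWN_REPORTS_PER_MINUTE = 30_000   # row 4: cm-handles reported down by DMI

# Each device emits one message per heartbeat interval.
messages_per_minute = MAX_DEVICES * 60 // HEARTBEAT_INTERVAL_SECONDS
messages_per_second = messages_per_minute / 60
peak_down_reports_per_second = PEAK_DOWN_REPORTS_PER_MINUTE / 60

print(messages_per_minute)            # 60000
print(messages_per_second)            # 1000.0
print(peak_down_reports_per_second)   # 500.0
```

In other words, the steady-state expectation is roughly 1,000 heartbeat messages per second across all NCMP instances.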

This epic will only introduce trustlevels NONE and COMPLETE. PARTIAL and POOR may be added later as below.
Re-registration, i.e. resolving trustlevel degradation, is not in scope of this epic
NCMP will not send notifications on trustlevel changes for external consumers

Interface	Name	Trigger	Description	Type	Endpoint or Topic	Schema
1	HealthCheck	30 second interval (configurable)	NCMP is to perform a health check against each of the DMI Plugins	REST	http://'$1'/manage/health/readiness This endpoint will be the standard health check endpoint provided by Spring Boot Actuator. We don't store it anywhere. We just document it for now.
2	CMHandle trust level change	The trust level of a CMHandle managed by a DMI Plugin has changed	data contains {trustLevel: ENUM}; event id is the cmhandle id	Kafka	TBD	<cloudEvents-header> id : <cmhandleId> type : org.onap.cm.events.trustlevel-notification data : { trustlevel : "COMPLETE" }
3	TrustLevel Request	Client Request	TrustLevel is to be returned based on the values in above Maps	REST	TBD
4	Document the health check endpoint	30 second interval (configurable)	Document the standard health check endpoint URL provided by the DMI plugins. We rely on the standard URLs and do not store them anywhere.	REST
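
As an illustration of interface 2, the sketch below builds an event carrying only the fields listed in the schema column (id, type, data). The function name and the example cm handle id are hypothetical, and the topic is left out because it is still TBD. A full CloudEvents envelope carries further attributes such as `source` and `specversion`, which the table does not specify and which are therefore omitted here.

```python
import json

# Event type as given in the schema column of the interface table.
TRUST_LEVEL_EVENT_TYPE = "org.onap.cm.events.trustlevel-notification"
# This epic only introduces NONE and COMPLETE.
ALLOWED_TRUST_LEVELS = {"NONE", "COMPLETE"}


def build_trust_level_event(cm_handle_id: str, trust_level: str) -> dict:
    """Hypothetical helper: the event id is the cm handle id."""
    if trust_level not in ALLOWED_TRUST_LEVELS:
        raise ValueError(f"unsupported trust level: {trust_level}")
    return {
        "id": cm_handle_id,
        "type": TRUST_LEVEL_EVENT_TYPE,
        "data": {"trustlevel": trust_level},
    }


# "my-cm-handle" is a made-up id used for illustration only.
event = build_trust_level_event("my-cm-handle", "NONE")
print(json.dumps(event, indent=2))
```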

NCMP is checking every DMI Plugin for health at interface 1 every 30 seconds using the DMI Trust Map
IF a DMI Plugin goes down, that DMI Plugin's trust level is updated to NONE in the DMI Trust Map
IF a DMI Plugin comes back up, Trust level is set back to COMPLETE.
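
The health check cycle described above can be sketched as follows. This is a minimal illustration, not the NCMP implementation: it assumes the '$1' placeholder in the URL is the DMI plugin's service name, it models the DMI Trust Map as a plain dictionary, and it takes the HTTP probe as a function argument (`is_ready`, a hypothetical name) so the logic can be shown without a network call. Scheduling the refresh every 30 seconds is left out.

```python
from enum import Enum
from typing import Callable, Dict


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


def readiness_url(dmi_service_name: str) -> str:
    # Assumption: '$1' in the documented URL stands for the DMI service name.
    return f"http://{dmi_service_name}/manage/health/readiness"


def refresh_dmi_trust_map(dmi_trust_map: Dict[str, TrustLevel],
                          is_ready: Callable[[str], bool]) -> None:
    """Run one health check cycle over every DMI plugin in the map."""
    for dmi_service_name in list(dmi_trust_map):
        up = is_ready(readiness_url(dmi_service_name))
        dmi_trust_map[dmi_service_name] = (
            TrustLevel.COMPLETE if up else TrustLevel.NONE
        )


# Usage with a stubbed probe: "dmi-2" is reported as down.
dmi_trust_map = {"dmi-1": TrustLevel.COMPLETE, "dmi-2": TrustLevel.COMPLETE}
down_urls = {readiness_url("dmi-2")}
refresh_dmi_trust_map(dmi_trust_map, lambda url: url not in down_urls)
print(dmi_trust_map["dmi-2"].value)  # NONE
```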

More details of health check URL can be accessed via:
CPS-1857 Document watchdog job impl. with health check URL

It is the responsibility of the DMI Plugins to update NCMP about the heartbeats (HBs) of CMHandles
Through interface 2, DMI Plugins will provide a Kafka event when the trustworthiness state of a CMHandle changes.
1. NCMP receives this event and updates the Untrustworthy CMHandles Set accordingly
Needs to be able to handle a throughput of 60,000 State changes per minute for 2 instances
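
A minimal sketch of how NCMP might apply such an event to the Untrustworthy CMHandles Set, assuming the event has already been deserialized into a dictionary with the `id` and `data.trustlevel` fields from the interface table. The function name is hypothetical and Kafka consumption itself is not shown.

```python
from typing import Set


def apply_trust_level_event(untrustworthy_cm_handles: Set[str],
                            event: dict) -> None:
    """Add or remove a cm handle based on the reported trust level."""
    cm_handle_id = event["id"]  # the event id is the cm handle id
    trust_level = event["data"]["trustlevel"]
    if trust_level == "NONE":
        untrustworthy_cm_handles.add(cm_handle_id)
    elif trust_level == "COMPLETE":
        # discard() is a no-op if the handle was never marked untrustworthy
        untrustworthy_cm_handles.discard(cm_handle_id)
    else:
        raise ValueError(f"unsupported trust level: {trust_level}")


# Example: a heartbeat failure followed by a recovery ("ch-1" is made up).
untrustworthy = set()
apply_trust_level_event(untrustworthy, {"id": "ch-1", "data": {"trustlevel": "NONE"}})
print(untrustworthy)  # {'ch-1'}
apply_trust_level_event(untrustworthy, {"id": "ch-1", "data": {"trustlevel": "COMPLETE"}})
print(untrustworthy)  # set()
```

Using a set keeps both operations constant time per event, which matters at the stated throughput.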

Body of request to be discussed: will the request provide a DMI or a list of CMHandles?
Interface 3
NCMP will first check DMI Trust Map for the CMHandle
1. If that DMI which is managing the CMHandle is marked as untrustworthy then we return NONE without checking the Untrustworthy CMHandles Map
2. If that DMI is trustworthy, we check the individual CMHandles Map, if the CMHandle is in the Map then return NONE.
Logically: IF (DMITrustMap.getDMIPlugin.getTrustLevel == NONE) RETURN NONE
1. ELSE IF (UntrustworthyCMHandlesMap.getDMIPlugin.contains(CMHandle)) RETURN NONE
2. ELSE RETURN COMPLETE
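
The lookup order above can be written out as a small runnable sketch. It assumes the DMI Trust Map is a dictionary keyed by DMI plugin name and the Untrustworthy CMHandles Map is a dictionary from DMI plugin name to a set of cm handle ids; all names are illustrative. How an unknown DMI plugin should be treated is not stated in this page, so the sketch simply lets the dictionary lookup raise `KeyError` in that case.

```python
from enum import Enum
from typing import Dict, Set


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


def resolve_trust_level(dmi_plugin: str,
                        cm_handle_id: str,
                        dmi_trust_map: Dict[str, TrustLevel],
                        untrustworthy_cm_handles: Dict[str, Set[str]]) -> TrustLevel:
    # Step 1: an untrustworthy DMI makes all of its cm handles untrustworthy.
    if dmi_trust_map[dmi_plugin] is TrustLevel.NONE:
        return TrustLevel.NONE
    # Step 2: the DMI is up, so check the individual cm handle.
    if cm_handle_id in untrustworthy_cm_handles.get(dmi_plugin, set()):
        return TrustLevel.NONE
    return TrustLevel.COMPLETE


# Made-up data: dmi-2 is down, and ch-2 on dmi-1 has a heartbeat failure.
dmi_trust_map = {"dmi-1": TrustLevel.COMPLETE, "dmi-2": TrustLevel.NONE}
untrustworthy_cm_handles = {"dmi-1": {"ch-2"}}

print(resolve_trust_level("dmi-1", "ch-1", dmi_trust_map, untrustworthy_cm_handles).value)  # COMPLETE
print(resolve_trust_level("dmi-1", "ch-2", dmi_trust_map, untrustworthy_cm_handles).value)  # NONE
print(resolve_trust_level("dmi-2", "ch-3", dmi_trust_map, untrustworthy_cm_handles).value)  # NONE
```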