
Related Jira(s)

CPS-1415

Assumptions


# | Assumption | Notes
1 | When a DMI restarts, all cm-handles related to that DMI will be considered to have trust-level 'COMPLETE'. | Temporary assumption until the 'Audit' function has been implemented.

Issues and Decisions

# | Issue | Notes | Decision
1 | How fast should CPS (and the DB) be able to process the maximum number of heartbeat failures? | Is 60K really realistic? If ENM goes down, we would get a notification for each node; do we? | A PoC has shown that 60 seconds is reasonable.
2 | Restart of NCMP | Should/can this be handled? |

Description

  1. Define scenarios which cause a CM Handle to go stale
  2. Implement changes to support tracking of CM Handle Freshness/Staleness

What might trigger a CM Handle to go STALE?

  1. The DMI plugin identifies that the device is no longer contactable.
  2. The DMI plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself.

Requirements

Functional

# | Interface | Requirement | Additional Information | Sign-off
1 | CPS-NCMP-E-05 | The 'trustlevel' is visible on the same methods on which the 'cm handle state' currently is. | Can be a new or an existing (preferred) endpoint. |
2 | CPS-NCMP-E-05 | CM Handles can be queried (filter condition) on 'trustlevel'. | Using a new 'trustLevel' condition (the cpsPath condition cannot be used). |
3 | CPS-NCMP-E-05 | Once a CM Handle is registered, the trust-level for that CM Handle should be reported as 'COMPLETE'. | |
4 | CPS-NCMP-E-05 | Once a DMI (plugin) is detected to be down, the trust-level for all affected CM Handles should be reported as 'NONE'. | It might not need to be persisted. |
5 | CPS-NCMP-I-01.e | A DMI plugin can report the current trustlevel of a single cm handle id. | i.e. the DMI can tell NCMP that the trustlevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored. |

Error Handling

# | Error Scenario | Expected behavior
1 | NCMP restart (all instances) | To be discussed; not sure if it can/should be handled. Trust levels should be 'NONE' and need to be restored using an audit request (not in scope).

Characteristics

# | Parameter | Expectation | Notes | Sign-off
1 | DMI-down detection speed | 30 seconds | |
2 | Device heartbeat frequency (message emitted by the DMI plugin for each device) | 60 seconds | |
3 | Maximum supported devices (by NCMP) | 60,000 | Given #2 and #3, NCMP needs to process 60,000 messages per minute! |
4 | Maximum number of cm-handles reported down by a DMI in one request and/or per minute | 30,000 per minute | A peak can be processed within 60 seconds. |
5 | Processing time of all trust levels for DMI-down and/or peak load from a DMI | 1 second | |
6 | Search endpoint speed if the trust level is incorporated into the search endpoints | Should not be impacted | |
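The message rate in row 3 follows from rows 2 and 3. A minimal sketch of that arithmetic, assuming every supported device emits exactly one heartbeat message per heartbeat interval and that the load is split evenly over the 2 NCMP instances mentioned under 'CMHandles HB' (the even split and the helper name `required_rates` are assumptions for illustration):

```python
# Hypothetical helper: derive the message rates NCMP must sustain from the
# characteristics table. Assumes one heartbeat message per device per interval
# and an even split of the load across NCMP instances.

def required_rates(devices: int, heartbeat_interval_s: int, instances: int) -> dict:
    per_minute = devices * 60 // heartbeat_interval_s
    per_second = devices / heartbeat_interval_s
    return {
        "messages_per_minute": per_minute,
        "messages_per_second": per_second,
        "messages_per_second_per_instance": per_second / instances,
    }


if __name__ == "__main__":
    rates = required_rates(devices=60_000, heartbeat_interval_s=60, instances=2)
    print(rates)
    # {'messages_per_minute': 60000, 'messages_per_second': 1000.0, 'messages_per_second_per_instance': 500.0}
```

In other words, 60,000 messages per minute is roughly 1,000 messages per second overall, or about 500 per second per instance if the load really is evenly balanced.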



Out-of-Scope

  1. This epic will only introduce the trust levels NONE and COMPLETE. PARTIAL and POOR may be added later.
  2. Re-registration, i.e. resolving trust-level degradation, is not in scope of this epic.
  3. NCMP will not send notifications on trust-level changes to external consumers.

High Level Interactions

[Diagram: Staleness Freshness Overview]

# | Interface Name | Trigger | Description | Type | Endpoint or Topic | Schema
1 | HealthCheck | 30-second interval (configurable) | NCMP performs a health check against each of the DMI Plugins. | REST | http://'$1'/manage/health/readiness |
2 | CMHandle trust level change | The trust level of a CMHandle managed by a DMI Plugin has changed. | Data contains {trustLevel: ENUM}; the event id is the cmhandle id. | Kafka | TBD | See schema below
3 | TrustLevel Request | Client request | The TrustLevel is returned based on the values in the DMI Trust Map and the Untrustworthy CMHandles Map. | REST | TBD |

Schema for interface 2:

  <cloudEvents-header>
    id : <cmhandleId>
    type : org.onap.cm.events.trustlevel-notification
    data : {
      trustlevel : "COMPLETE"
    }
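As a sketch of the interface 2 payload, the helpers below mirror the schema above: the event id carries the cm handle id, the type is org.onap.cm.events.trustlevel-notification, and data carries the trust level. The TrustLevel enum only holds NONE and COMPLETE, in line with the Out-of-Scope section. The helper names and the plain JSON dict are assumptions for illustration, not the agreed schema; the remaining CloudEvents header attributes are left out because the schema only shows a placeholder for them.

```python
import json
from enum import Enum


class TrustLevel(str, Enum):
    # Only the two levels introduced by this epic; PARTIAL and POOR may follow later.
    NONE = "NONE"
    COMPLETE = "COMPLETE"


EVENT_TYPE = "org.onap.cm.events.trustlevel-notification"


def build_trust_level_event(cm_handle_id: str, trust_level: TrustLevel) -> str:
    """Serialize a trust level notification (hypothetical helper).

    Only the fields listed in the schema are included; the other
    CloudEvents header attributes are left out of this sketch.
    """
    event = {
        "id": cm_handle_id,  # the event id is the cm handle id
        "type": EVENT_TYPE,
        "data": {"trustlevel": trust_level.value},
    }
    return json.dumps(event)


def parse_trust_level_event(raw: str) -> tuple[str, TrustLevel]:
    """Return (cm handle id, trust level).

    Raises ValueError for an unexpected event type or an unknown trust level.
    """
    event = json.loads(raw)
    if event.get("type") != EVENT_TYPE:
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    return event["id"], TrustLevel(event["data"]["trustlevel"])
```

Rejecting unknown trust levels at parse time keeps a future PARTIAL or POOR value from being silently treated as COMPLETE before NCMP supports it.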

Managing TrustLevels

DMI Plugins

  1. NCMP checks every DMI Plugin for health through interface 1 every 30 seconds, using the DMI Trust Map.
  2. If a DMI Plugin goes down, that DMI Plugin's trust level is updated to NONE in the DMI Trust Map.
  3. If a DMI Plugin comes back up, its trust level is set back to COMPLETE.
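The three DMI health check steps can be sketched as one polling pass over the DMI Trust Map. A minimal sketch, assuming the map is keyed by DMI plugin base URL and that a plugin counts as up when GET <base>/manage/health/readiness answers HTTP 200 (the path comes from interface 1; the timeout and the function names are hypothetical). The probe is injectable so the logic can be exercised without a live plugin, and running the pass every 30 seconds is left to the caller's scheduler.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

TrustLevel = str  # "NONE" or "COMPLETE"


def http_readiness_probe(dmi_base_url: str, timeout_s: float = 5.0) -> bool:
    """Return True when the DMI plugin's readiness endpoint answers HTTP 200."""
    url = f"{dmi_base_url.rstrip('/')}/manage/health/readiness"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


def refresh_dmi_trust_map(
    dmi_trust_map: Dict[str, TrustLevel],
    probe: Callable[[str], bool] = http_readiness_probe,
) -> Dict[str, TrustLevel]:
    """One health check pass: mark each DMI plugin COMPLETE if up, NONE if down.

    Returns only the entries whose trust level changed in this pass.
    """
    changed = {}
    for dmi in dmi_trust_map:
        new_level = "COMPLETE" if probe(dmi) else "NONE"
        if dmi_trust_map[dmi] != new_level:
            dmi_trust_map[dmi] = new_level  # updating a value while iterating keys is safe
            changed[dmi] = new_level
    return changed
```

Returning only the changed entries keeps the per-pass work proportional to the number of DMI plugins whose state actually flipped, which is what any downstream reaction (for example logging) would need.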

CMHandles HB

  1. It is the responsibility of the DMI Plugins to update NCMP about the heartbeats (HBs) of CMHandles.
  2. Through interface 2, DMI Plugins will publish a Kafka event when the trustworthiness state of a CMHandle changes.
    1. NCMP receives this event and updates the Untrustworthy CMHandles Set accordingly.
  3. NCMP needs to be able to handle a throughput of 60,000 state changes per minute with 2 instances.
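On the NCMP side, applying interface 2 events reduces to set maintenance: NONE adds the cm handle to the Untrustworthy CMHandles Set and COMPLETE removes it, so each event is an average O(1) operation. A minimal sketch, assuming events have already been deserialized into (cm handle id, trust level) pairs; the Kafka consumer wiring and sharing the set between NCMP instances are not shown, and the class name is hypothetical.

```python
from typing import Iterable, Set, Tuple


class UntrustworthyCmHandles:
    """Hypothetical in-memory holder for the Untrustworthy CMHandles Set."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def apply(self, cm_handle_id: str, trust_level: str) -> None:
        if trust_level == "NONE":
            self._ids.add(cm_handle_id)
        elif trust_level == "COMPLETE":
            self._ids.discard(cm_handle_id)  # no error if it was never untrustworthy
        else:
            raise ValueError(f"unsupported trust level: {trust_level!r}")

    def apply_batch(self, events: Iterable[Tuple[str, str]]) -> int:
        """Apply events in arrival order and return how many were processed."""
        count = 0
        for cm_handle_id, trust_level in events:
            self.apply(cm_handle_id, trust_level)
            count += 1
        return count

    def contains(self, cm_handle_id: str) -> bool:
        return cm_handle_id in self._ids

    def __len__(self) -> int:
        return len(self._ids)
```

Because events are applied in arrival order, a NONE followed by a COMPLETE for the same cm handle leaves it trusted; this relies on per-cm-handle ordering, which Kafka only guarantees within a partition (for example when events are keyed by cm handle id).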

Reading Trust Level

  1. The request body is to be discussed: will the request provide a DMI or a list of CMHandles?
  2. The request is served through interface 3.
  3. NCMP will first check the DMI Trust Map for the DMI managing the CMHandle.
    1. If the DMI managing the CMHandle is marked as untrustworthy, return NONE without checking the Untrustworthy CMHandles Map.
    2. If that DMI is trustworthy, check the Untrustworthy CMHandles Map; if the CMHandle is in the Map, return NONE.
  4. Logically:
    1. IF (DMITrustMap.getDMIPlugin.getTrustLevel == NONE) RETURN NONE
    2. ELSE IF (UntrustworthyCMHandlesMap.getDMIPlugin.contains(CMHandle)) RETURN NONE
    3. ELSE RETURN COMPLETE
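The lookup order above can be sketched as follows. This assumes the DMI Trust Map is keyed by DMI plugin, that NCMP can resolve which DMI plugin manages a cm handle (modelled here as a plain dict), and that the untrustworthy cm handles are held as a set of ids; all names are hypothetical. A DMI absent from the DMI Trust Map is treated as COMPLETE, in line with requirement 3, which is a design choice this page does not settle. The second helper shows how the same rule could back the 'trustLevel' filter condition from requirement 2.

```python
from typing import Dict, Iterable, List, Set


def get_trust_level(
    cm_handle_id: str,
    dmi_of_cm_handle: Dict[str, str],
    dmi_trust_map: Dict[str, str],
    untrustworthy_cm_handles: Set[str],
) -> str:
    """Return "NONE" or "COMPLETE" for one cm handle (hypothetical helper)."""
    dmi = dmi_of_cm_handle[cm_handle_id]  # KeyError for an unknown cm handle
    # 1. A DMI plugin that is down makes every cm handle it manages untrustworthy;
    #    the per-cm-handle set is not consulted in that case.
    if dmi_trust_map.get(dmi, "COMPLETE") == "NONE":
        return "NONE"
    # 2. Otherwise fall back to the individual cm handle heartbeat state.
    if cm_handle_id in untrustworthy_cm_handles:
        return "NONE"
    return "COMPLETE"


def filter_by_trust_level(
    cm_handle_ids: Iterable[str],
    wanted: str,
    dmi_of_cm_handle: Dict[str, str],
    dmi_trust_map: Dict[str, str],
    untrustworthy_cm_handles: Set[str],
) -> List[str]:
    """Keep the cm handles whose computed trust level equals `wanted`."""
    return [
        ch
        for ch in cm_handle_ids
        if get_trust_level(ch, dmi_of_cm_handle, dmi_trust_map, untrustworthy_cm_handles) == wanted
    ]
```

Checking the DMI first means a DMI-down event never has to touch the per-cm-handle set, which matches the requirement that all affected CM Handles report NONE without persisting one entry per cm handle.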