
Related Jira(s)

  • CPS-1415
  • CPS-1638

Partial Demo

This demo fully covers functional requirements Req. 2 and Req. 6 and partially covers Req. 4.

CPS User Story Demos

Assumptions


# | Assumption | Notes
1

Issues and Decisions

# | Issue | Notes | Decision
1 | How fast should CPS (and the DB) be able to process the maximum number of heartbeat failures? | Is 60K really realistic? If ENM goes down, we would get a notification for each node, wouldn't we? | PoC has shown that 60 seconds is reasonable.
2 | Restart of NCMP | Should/can this be handled? | As of now, this case is not being considered.
3 | Does the DMI plugin provide NCMP with a health check URL during registration, or should NCMP just rely on the default one provided by Spring Boot Actuator? | Document the contract. Only the interface matters, not the implementation. | Spring Boot Actuator interface.
4 | Error during cmHandle registration |

Notes: If an error occurs during registration, what trustLevel should the cmHandle be set to in each of the following scenarios?

  1. The user has provided an initial trustLevel of 'COMPLETE' (this information could be minutes old!)
  2. The user has provided an initial trustLevel of 'NONE'
  3. The user has NOT provided a (valid) initial trustLevel

Decision: Agreed to leave as is. If a notification arrives for a node that is already registered, the other notifications can be processed separately.

Team Notes:

[Team]

When the trustLevel was provided as 'COMPLETE' or 'NONE' and the registration fails, the trustLevel is still set to the provided value, regardless of the current state of the cm handle (DELETING/DELETED, ADVISED, READY, LOCKED).

5 | Module sync watchdog issues/error scenarios |

Notes:

If the cmHandle is set to NONE/incomplete, module sync will automatically retry (is this acceptable?).

If the module sync fails, we will still send a COMPLETE message (is this acceptable?).

Registering all cmHandles could take up to 20 minutes. What should happen if the last sync fails, given that the notification would have been sent 20 minutes earlier?

Decision: When the cm handle state is:

DELETING/DELETED: no trustLevel notification update

ADVISED: no trustLevel notification update

READY: trustLevel notification update

LOCKED: trustLevel notification update

Team Notes:

12/10/2023 [Team]

Notifications SHALL only be sent when the cm handle is in the READY or LOCKED state, regardless of the report from the DMI.

Do we still update the cache? Yes.


6 | When a cm handle's trustLevel stays the same | Should that cm handle ID be included in notifications or not? | No. If the trustLevel stays the same, there is no change and no notification is sent.

Team Notes:

12/10/2023

Scenario: DMI plugin up/down

The previous trustLevel of the cm handle should be considered for notifications.
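The decisions recorded in rows 5 and 6 (notify only for cm handles that are READY or LOCKED, always update the cache, and send nothing when the trustLevel is unchanged) can be sketched as below. This is a minimal illustration only: the function names, the plain dict standing in for the cache, and the assumption that a cm handle with no cached entry counts as COMPLETE are all hypothetical, not NCMP's actual implementation.

```python
from enum import Enum
from typing import Dict


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


class CmHandleState(Enum):
    ADVISED = "ADVISED"
    READY = "READY"
    LOCKED = "LOCKED"
    DELETING = "DELETING"
    DELETED = "DELETED"


# 12/10/2023 decision: notifications only for cm handles that are READY or LOCKED.
NOTIFIABLE_STATES = frozenset({CmHandleState.READY, CmHandleState.LOCKED})


def should_notify(state: CmHandleState, previous: TrustLevel, new: TrustLevel) -> bool:
    """Return True if a trustLevel notification should be sent."""
    if state not in NOTIFIABLE_STATES:
        return False
    # Row 6: an unchanged trustLevel does not produce a notification.
    return previous is not new


def handle_trust_level_report(cache: Dict[str, TrustLevel], cm_handle_id: str,
                              state: CmHandleState, new: TrustLevel) -> bool:
    """Update the cache (always) and decide whether to notify."""
    # Hypothetical default: a cm handle with no cached entry is treated as COMPLETE.
    previous = cache.get(cm_handle_id, TrustLevel.COMPLETE)
    cache[cm_handle_id] = new  # row 5: the cache is updated even without a notification
    return should_notify(state, previous, new)
```

Keeping the state check separate from the cache update mirrors the team note that the cache is still updated when no notification is sent.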


Description

  1. Define scenarios which cause a CM Handle to go stale.
  2. Implement changes to support tracking of CM Handle Freshness/Staleness.

What might cause a cmHandle to go STALE?

  1. The DMI plugin identifies that the device is no longer contactable.
  2. The DMI plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself.

Requirements

Functional

# | Interface | Requirement | Additional Information | Sign-off
1 | CPS-NCMP-E-05 | The 'trustLevel' is visible on all REST methods that currently include the 'cm handle state'. | Applies to the existing endpoints.

2 | CPS-NCMP-E-05 | CM Handles can be queried (filter condition) on 'trustLevel' using a new 'trustLevel' condition (the cpsPath condition cannot be used).

3 | CPS-NCMP-I-01 | Once a CM Handle is registered (TBD which state exactly?), the trustLevel for that CM Handle should be reported as 'COMPLETE'. |

During registration, the DMI plugin can report an initial trustLevel.

If the initial trustLevel is not 'COMPLETE', it should be considered a trustLevel change (see Req. 5).

The initial trustLevel is backward compatible: if it is not set, the trustLevel is assumed to be 'COMPLETE'.

For a new cm handle whose trustLevel is 'COMPLETE', this is NOT considered a change and no notification should be sent.

4 | CPS-NCMP-E-05 | Once the DMI plugin is detected to be down, the trustLevel for all affected CM Handles should be set to 'NONE'. This will also lead to many notifications as per Req. 5. |

It might not need to be persisted.

This might lead to a high number (20K) of notifications (need to discuss capabilities).

5 | CPS-NCMP-E-05.e | NCMP shall send a notification when the trustLevel changes. |

Notifications are sent externally via Kafka.

Many small notifications or one bulk notification? Agreed: many notifications, one for each cm handle.

6 | CPS-NCMP-I-01.e (REST or async TBD) | The DMI plugin can report the current trustLevel of a single cm handle (or of a collection of cm handle ids?), i.e. the DMI can tell NCMP that the trustLevel is 'NONE' when a node heartbeat failure is detected and 'COMPLETE' once it is restored. |

Again, this should lead to notifications on the external interface as per Req. 5.
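Req. 3's rules for the initial trustLevel reported at registration could be expressed as in the sketch below. It assumes the DMI plugin sends the value as an optional string; resolve_initial_trust_level is a hypothetical helper, and rejecting unknown values with ValueError is an assumption, since issue 4 leaves the "no valid initial trustLevel" case open.

```python
from enum import Enum
from typing import Optional, Tuple


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


def resolve_initial_trust_level(reported: Optional[str]) -> Tuple[TrustLevel, bool]:
    """Return (initial trustLevel, whether it counts as a trustLevel change)."""
    if reported is None:
        # Backward compatible: no initial trustLevel means COMPLETE, and no notification.
        return TrustLevel.COMPLETE, False
    level = TrustLevel(reported)  # assumption: unknown values raise ValueError
    # A new cm handle that starts as COMPLETE is not a change (Req. 3);
    # any other initial level is handled as a trustLevel change (Req. 5).
    return level, level is not TrustLevel.COMPLETE
```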

 

Error Handling

# | Error Scenario | Expected Behavior | Sign-off
1 | NCMP restart | Options:
  • TrustLevels stay as they were before the restart? (This might depend on how much time has elapsed.)
  • (preferred) TrustLevels (all instances) should be 'NONE' and need to be restored using an audit request (not in scope).

To be discussed; not sure if it can/should be handled.

If NCMP restarts, the trustLevel should go to COMPLETE. Otherwise there is no way of getting out of the NONE state.

Audit was agreed to be handled in a separate epic. Prioritise the audit epic.

Team Notes:

12/10/2023

If all instances of NCMP restart (fresh start), there would be nothing in the cache.

Characteristics

# | Parameter | Expectation | Notes | Sign-off
1 | DMI-down detection speed | 60 seconds (TBD) | This is a configurable value. Agreed: it should be aligned with the device heartbeat.
2 | Device heartbeat frequency (message emitted by the DMI plugin for each device) | 60 seconds | Can be removed: out of scope for this epic.
3 | Maximum supported devices (by NCMP) | 60,000 | Given #2 and #3, this means NCMP needs to process 60,000 messages per minute! Can be removed: separate epic, out of scope for this epic.
4 | Maximum number of cm handles reported down by DMI in one request and/or per minute | 30,000 / minute | This looks like an 'ENM down' scenario; not sure if it should be handled this way. A peak can be processed within 60 seconds.
5 | Processing time of all trustLevel changes for DMI-down and/or peak load reported by DMI | 1 second | Agreed to go with 30,000 / minute as per #4.
6 | If trustLevel is incorporated into the search endpoints, the speed should not be impacted | 30 seconds | Speed should not be affected. Agreed: across 60,000 cmHandles. Open for improvement with respect to performance.
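As a sanity check on the figures above, the per-second rates behind characteristics #2 to #4 work out as follows. The constants come from the table; splitting the load evenly across the 2 NCMP instances mentioned under CMHandle Heartbeat is an assumption.

```python
DEVICES = 60_000                # #3: maximum supported devices
HEARTBEAT_PERIOD_S = 60         # #2: one heartbeat per device every 60 seconds
DMI_DOWN_PEAK_PER_MIN = 30_000  # #4: agreed peak of cm handles reported down per minute
NCMP_INSTANCES = 2              # assumption: load split evenly across 2 instances


def per_second(count: int, window_s: int) -> float:
    """Convert a count over a time window into a per-second rate."""
    return count / window_s


heartbeat_rate = per_second(DEVICES, HEARTBEAT_PERIOD_S)      # all devices, every minute
peak_rate = per_second(DMI_DOWN_PEAK_PER_MIN, 60)             # agreed DMI-down peak
per_instance_rate = per_second(DEVICES, 60) / NCMP_INSTANCES  # 60,000/min over 2 instances

print(heartbeat_rate, peak_rate, per_instance_rate)  # 1000.0 500.0 500.0
```

So the agreed 30,000 / minute peak corresponds to 500 messages per second, half the rate implied by a 60-second heartbeat from all 60,000 devices.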

     


Out-of-Scope

1. This epic will only introduce trustLevels NONE and COMPLETE. PARTIAL and POOR may be added later.
2. Re-registration, i.e. resolving trustLevel degradation, is not in scope of this epic.

High Level Interactions

[Diagram: Staleness Freshness Overview]

Interface | Name | Trigger | Description | Type | Endpoint or Topic | Schema
1 | HealthCheck | 30-second interval (configurable) | NCMP performs a health check against each of the DMI plugins. | REST | http://<dmiPluginServiceName>/manage/health/readiness |

This endpoint will be the standard health check endpoint provided by Spring Boot Actuator. We don't store it anywhere; we just document it for now.


2 | CMHandle trust level change | The trust level of a CMHandle managed by a DMI plugin has changed. | The data contains {trustLevel: ENUM}; the event id is the cmhandle id in the Kafka header. | Kafka (TBD) | Kafka topic: dmi-device-heartbeat |

<cloudEvents-header>
  id   : <cmhandleId>
  type : org.onap.cm.events.trustlevel-notification
data : {
  trustLevel : "COMPLETE/NONE"
}

3 | CMHandle Query API with trustLevel query condition (TrustLevel Request) | Client request | CmHandles are to be returned based on the values in the trust maps above. | REST | TBD

    ...

CMHandle Trust Map | REST |
  1. http://<host>:<port>/ncmp/v1/ch/id-searches
  2. http://<host>:<port>/ncmp/v1/ch/searches

{
  "cmHandleQueryParameters": [
    {
      "conditionName": "cmHandleWithTrustLevel",
      "conditionParameters": [ {"trustLevel": "COMPLETE"} ]
    }
  ]
}

4 | Notification on Trust Level Change | NCMP | NCMP sends a notification upon trust level changes. | Kafka | Kafka topic: cm-events |

<cloudEvents-headers>
"data": {
   "attributeValueChange": [  # Mandatory
        {
         "attributeName"     : "trustLevel",
         "oldAttributeValue" : "NONE",
         "newAttributeValue" : "COMPLETE"
        }
    ]
}

Managing TrustLevel

DMI Plugins

1. NCMP checks the health of every DMI plugin via interface 1 every 30 seconds and tracks the result in the DMI Trust Map.
2. If a DMI plugin goes down, that DMI plugin's health status is updated to NONE in the DMI Trust Map.
  a. The CM handles corresponding to that DMI plugin should be set to NONE.
3. If a DMI plugin comes back up, its health status is set back to COMPLETE.

    ...

    1. for that DMI plugin only.

  More details on the health check URL are available in:
  CPS-1857 Document watchdog job impl. with health check URL
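One watchdog cycle over the DMI plugins could be sketched as follows. The health probe is injected as a callable (in practice it might issue an HTTP GET against the health check URL of interface 1); the function names, the plain dicts standing in for the DMI Trust Map and CM Handle Trust Map, and the cm-handle-by-DMI lookup are hypothetical. A change is reported for a cm handle only when its effective trust level (the minimum of the DMI and cm handle levels, as described under Query CM Handle with Trust Level) actually changes.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class TrustLevel(Enum):
    NONE = "NONE"
    COMPLETE = "COMPLETE"


_RANK = {TrustLevel.NONE: 0, TrustLevel.COMPLETE: 1}

Change = Tuple[str, TrustLevel, TrustLevel]  # (cmHandleId, old, new)


def effective(dmi_level: TrustLevel, cm_handle_level: TrustLevel) -> TrustLevel:
    """The effective trust level is the lower of the two."""
    return min(dmi_level, cm_handle_level, key=_RANK.__getitem__)


def run_health_check_cycle(
    dmi_trust_map: Dict[str, TrustLevel],
    cm_handles_by_dmi: Dict[str, List[str]],
    cm_handle_trust_map: Dict[str, TrustLevel],
    is_healthy: Callable[[str], bool],
) -> List[Change]:
    """Probe every DMI plugin once and return the effective changes to notify."""
    changes: List[Change] = []
    for dmi, old_dmi in list(dmi_trust_map.items()):
        new_dmi = TrustLevel.COMPLETE if is_healthy(dmi) else TrustLevel.NONE
        if new_dmi is old_dmi:
            continue  # no change for this DMI plugin
        dmi_trust_map[dmi] = new_dmi
        for cm_handle_id in cm_handles_by_dmi.get(dmi, []):
            # Hypothetical default: a cm handle missing from the map is COMPLETE.
            own = cm_handle_trust_map.get(cm_handle_id, TrustLevel.COMPLETE)
            old_eff, new_eff = effective(old_dmi, own), effective(new_dmi, own)
            if old_eff is not new_eff:
                changes.append((cm_handle_id, old_eff, new_eff))
    return changes
```

Comparing effective levels avoids a duplicate NONE notification for a cm handle that was already untrustworthy before its DMI plugin went down.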

CMHandle Heartbeat

1. It is the responsibility of the DMI plugins to update NCMP about the heartbeat of each CMHandle.
2. Through interface 2, DMI plugins will provide a Kafka event when the trustworthiness state of a CMHandle changes.
  a. NCMP receives this event and updates the CM Handle Trust Map accordingly.
3. NCMP needs to be able to handle a throughput of 60,000 state changes per minute for 2 instances.
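Handling one interface 2 event on the NCMP side might look like the sketch below. It assumes the Kafka headers and record value have already been read into a dict and raw JSON bytes, that the payload field is named trustLevel, and that a cm handle with no entry is COMPLETE; apply_trust_level_event is hypothetical and no Kafka client is involved.

```python
import json
from typing import Dict, Optional, Tuple

VALID_LEVELS = {"COMPLETE", "NONE"}


def apply_trust_level_event(
    cm_handle_trust_map: Dict[str, str],
    headers: Dict[str, str],
    value: bytes,
) -> Optional[Tuple[str, str, str]]:
    """Apply one DMI trust level event; return (cmHandleId, old, new) if it changed."""
    cm_handle_id = headers["id"]  # interface 2: the event id is the cm handle id
    new = json.loads(value)["trustLevel"]
    if new not in VALID_LEVELS:
        raise ValueError(f"unsupported trustLevel: {new!r}")
    old = cm_handle_trust_map.get(cm_handle_id, "COMPLETE")  # assumed default
    cm_handle_trust_map[cm_handle_id] = new
    # An unchanged trustLevel yields nothing to notify.
    return None if old == new else (cm_handle_id, old, new)
```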

    ...

Query CM Handle with Trust Level

1. Body of the request

  To be discussed: will the request provide a DMI or a list of CMHandles?

  The request body will be in the following format:

  Search Trust Level Request Body:
      {
        "cmHandleQueryParameters": [
          {
              "conditionName": "cmHandleWithTrustLevel",
              "conditionParameters": [ {"trustLevel": "COMPLETE"} ]
          }
        ]
      }


  There are two endpoints that will be subject to the query:
  http://<host>:<port>/ncmp/v1/ch/id-searches
  http://<host>:<port>/ncmp/v1/ch/searches

2. Uses interface 3.
3. NCMP will first check the trust level query parameters to determine which trust level (NONE, COMPLETE) is being searched.
  a. Then, the trust levels of both the DMI and the CM handle should be compared, and the minimum of the two (the effective trust level) must be selected.
     Logically: IF the DMI trust level is NONE, RETURN NONE (the CM Handle Trust Map need not be checked); ELSE IF the CM handle is marked NONE in the CM Handle Trust Map, RETURN NONE; ELSE RETURN COMPLETE.
  b. If the target trust level (from the request) is equal to the effective trust level (obtained in step a.), the cm handle should be included in the response.
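The effective-trust-level filtering in step 3 can be sketched as follows, using the request body format shown in step 1. All function names and the in-memory maps are hypothetical, and missing map entries are assumed to be COMPLETE.

```python
from typing import Dict, List

RANK = {"NONE": 0, "COMPLETE": 1}


def effective_trust_level(dmi_level: str, cm_handle_level: str) -> str:
    """Minimum of the DMI and cm handle trust levels."""
    return min(dmi_level, cm_handle_level, key=RANK.__getitem__)


def target_trust_level(request_body: dict) -> str:
    """Read the trustLevel from the cmHandleWithTrustLevel condition."""
    for parameter in request_body["cmHandleQueryParameters"]:
        if parameter["conditionName"] == "cmHandleWithTrustLevel":
            return parameter["conditionParameters"][0]["trustLevel"]
    raise ValueError("request has no cmHandleWithTrustLevel condition")


def search_by_trust_level(
    request_body: dict,
    cm_handle_to_dmi: Dict[str, str],
    dmi_trust_map: Dict[str, str],
    cm_handle_trust_map: Dict[str, str],
) -> List[str]:
    """Return ids of cm handles whose effective trust level matches the request."""
    target = target_trust_level(request_body)
    matches = []
    for cm_handle_id, dmi in cm_handle_to_dmi.items():
        level = effective_trust_level(
            dmi_trust_map.get(dmi, "COMPLETE"),                # assumed default
            cm_handle_trust_map.get(cm_handle_id, "COMPLETE"),  # assumed default
        )
        if level == target:
            matches.append(cm_handle_id)
    return matches
```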

Notifications on Trust Level Changes

NCMP will send timely notifications via the Kafka interface whenever a device's trust level changes.

Proposal for the Notification Schema

kafka-key: cmHandleId (Note: when publishing the notification, use the cmHandleId as the message key. This enables clients to read the most recent message/state once log compaction is triggered.)

Cloud Event Definition


# | Element | Name | Parent | Type | Mandatory | Description | Format | (example) Value
1 | Header | id | | String | Yes | Random id for the cloud event header; a UUID is suggested. | |
2 | | source | | String | Yes | Source of information | ncmp.<cmhandle-id> | ncmp.12ac34e43556e
3 | | specversion | | String | Yes | Cloud event spec version | fixed value | 1.0
4 | | type | | String | Yes | Type of event | fixed value | trustLevelChangeEvent
5 | | dataschema | | String | Yes | Data schema | fixed value | org.onap.cps.ncmp.events.cmhandle.TrustLevelChangeEvent:1.0.0
6 | | correlationid | | String | Yes | The cmHandle being notified. The value will be similar to the one in the source field. | <cmhandle-id> |
7 | Payload | data | | Object | Yes | The actual data payload; details are provided below. | 3GPP TS 28.532 standard |
8 | | attributeName | data | String | Yes | The attribute that has changed. | <name> | trustLevel
9 | | oldAttributeValue | data | String | No | The old value of the attribute that has changed. | | COMPLETE
10 | | newAttributeValue | data | String | No | The new value of the attribute that has changed. | | NONE
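A notification following the proposed schema could be assembled as in the sketch below, keyed by cmHandleId so that log compaction keeps the latest state per cm handle. It only builds the key, header attributes and payload as plain Python values, using the payload shape from interface 4; actually publishing them (for example as ce_-prefixed headers under the CloudEvents Kafka binary content mode) is out of scope, and build_trust_level_notification is hypothetical.

```python
import uuid
from typing import Dict, Tuple


def build_trust_level_notification(
    cm_handle_id: str, old: str, new: str
) -> Tuple[str, Dict[str, str], dict]:
    """Return (kafka key, cloud event attributes, data) for one trustLevel change."""
    key = cm_handle_id  # compaction keeps the most recent event per cm handle
    attributes = {
        "id": str(uuid.uuid4()),            # random id; a UUID is suggested
        "source": f"ncmp.{cm_handle_id}",
        "specversion": "1.0",
        "type": "trustLevelChangeEvent",
        "dataschema": "org.onap.cps.ncmp.events.cmhandle.TrustLevelChangeEvent:1.0.0",
        "correlationid": cm_handle_id,
    }
    data = {
        "attributeValueChange": [
            {
                "attributeName": "trustLevel",
                "oldAttributeValue": old,
                "newAttributeValue": new,
            }
        ]
    }
    return key, attributes, data
```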