Issues and Decisions
# | Issue | Notes | Decision |
---|---|---|---|
1 | How fast should CPS (and the DB) be able to process the maximum rate of heartbeat failures? | Is 60K really realistic? If ENM goes down, should we get a notification for each node? | |
Description
- Define scenarios which cause a CM Handle to go stale
- Implement changes to support tracking of CM Handle Freshness/Staleness
What might trigger a CM Handle to go STALE?
- The DMI Plugin identifies that the device is no longer contactable
- The DMI Plugin identifies that an underlying device manager managing the device (node) is out of sync with the device itself
Requirements
Functional
# | Interface | Requirement | Additional Information |
---|---|---|---|
1 | CPS-NCMP-I-01 | A REST endpoint to allow DMI Plugin reregistration. A Kafka interface for the DMI Plugin to provide trust level state changes for a CM Handle | Reregistration reregisters all CMHandles managed by a DMI Plugin. The Kafka interface schema allows for the CMHandle as the id and the trust level as the only value in data |
2 | DMI-I-01 | A REST endpoint to trigger DMI Plugin reregistration | Asynchronous interaction to trigger the DMI Plugin to call the endpoint in CPS-NCMP-I-01 with a reregistration |
Error Handling
# | Error Scenario | Expected behavior |
---|---|---|
1 | DMI Plugin goes down | CMHandles managed by that DMI have trust level NONE. When the DMI comes back up, a reregistration process occurs and the CMHandles are then individually assessed for trust level. |
2 | Node goes down | The DMI Plugin informs NCMP of the trust level state change. The DMI will notify NCMP of any change to a CMHandle's trust level. |
Capabilities
- Reregistration, once a day, same requirement as first-time registration
- Single-node heartbeat failures: 30,000 per minute per instance
- Kafka should be able to handle 2 instances, i.e. 60K notifications in one minute
- What is expected/realistic here from CPS-NCMP? See open issue #1
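To put the figures above in per-second terms, the arithmetic can be sketched as follows. The only inputs are the 30,000 per minute per instance and 2 instances quoted in the list. The per-event time budget assumes strictly sequential processing, which is an illustrative assumption and not a stated requirement.

```python
# Figures taken from the capabilities list above.
FAILURES_PER_MINUTE_PER_INSTANCE = 30_000
INSTANCES = 2

# Combined notification rate across both instances.
total_per_minute = FAILURES_PER_MINUTE_PER_INSTANCE * INSTANCES
total_per_second = total_per_minute / 60

# Assumption: events are handled one at a time on a single consumer.
budget_ms_per_event = 1000 / total_per_second

print(total_per_minute)     # 60000
print(total_per_second)     # 1000.0
print(budget_ms_per_event)  # 1.0
```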
Scope
- Currently only supporting NONE and COMPLETE. PARTIAL and POOR may be added later as below.
Reregistration
- This process occurs when the DMI Plugin Availability is down and then comes back up.
- NCMP makes a synchronous call to the DMI Plugin (New Audit Endpoint) to trigger a reregistration
- DMI Plugin then reregisters its CMHandles with NCMP (new reregistration endpoint?)
- NCMP then compares the CMHandles which are being reregistered with the CMHandles which already exist.
- CMHandles which are in NCMP but not in the DMI reregistration request are kept with trust level NONE
- What happens if there is a conflict between the old and new properties of a CMHandle? Just take the new properties?
- New CMHandles could be registered
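The comparison step above amounts to a set difference between the CMHandles NCMP already knows and those in the reregistration request. The following is a minimal sketch of that idea. The names `ReconciliationResult` and `reconcile` are hypothetical and not part of NCMP. It only classifies handle ids and leaves the open question about conflicting properties unanswered.

```python
from dataclasses import dataclass, field


@dataclass
class ReconciliationResult:
    to_create: set = field(default_factory=set)  # new to NCMP
    to_update: set = field(default_factory=set)  # known to NCMP and reregistered
    untrusted: set = field(default_factory=set)  # known to NCMP but not reregistered


def reconcile(existing_ids, reregistered_ids):
    """Classify CM handle ids after a DMI Plugin reregistration (hypothetical helper)."""
    existing = set(existing_ids)
    reregistered = set(reregistered_ids)
    return ReconciliationResult(
        to_create=reregistered - existing,
        to_update=reregistered & existing,
        untrusted=existing - reregistered,  # these stay at trust level NONE
    )


result = reconcile({"ch-1", "ch-2", "ch-3"}, {"ch-2", "ch-3", "ch-4"})
print(sorted(result.to_create))  # ['ch-4']
print(sorted(result.to_update))  # ['ch-2', 'ch-3']
print(sorted(result.untrusted))  # ['ch-1']
```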
High Level Interactions
Interface | Name | Trigger | Description | Type | Endpoint or Topic | Schema |
---|---|---|---|---|---|---|
1 | HealthCheck | 30 second interval (configurable) | NCMP is to perform a health check against each of the DMI Plugins | REST | http://'$1'/manage/health/readiness | |
2 | Reregistration request | DMI Plugin has gone down and comes back up | NCMP makes a call to that DMI Plugin telling it to reregister | REST | TBD | |
3 | Reregistration | DMI Plugin received a reregistration request | DMI Plugin makes a call to NCMP to reregister its CM Handles | REST | /v1/ch/reregistration | { "dmiPlugin": "my-dmi-plugin", "dmiModelPlugin": "my-dmi-model-plugin", "cmHandles": [ { "cmHandle": "my-cm-handle", "publicCmHandleProperties": { "key": "my-property" }, "cmHandleProperties": { "key": "my-property" } }, { "cmHandle": "my-cm-handle", "publicCmHandleProperties": { "key": "my-property" }, "cmHandleProperties": { "key": "my-property" } } ], "dmiDataPlugin": "my-dmi-data-plugin" } |
4 | CMHandle trust level change | A CMHandle managed by DMI Plugin's trust level has changed | data contains {trustLevel: ENUM} event id is cmhandle id | Kafka | TBD | <cloudEvents-header> id : <cmhandleId> type : org.onap.cm.events.trustlevel-notification data : { trustlevel : "COMPLETE" } |
5 | TrustLevel Request | Client Request | TrustLevel is to be returned based on the values in the DMI Trust Map and the Untrustworthy CMHandles Map | REST | TBD | |
Managing TrustLevels
DMI Plugins
- NCMP is checking every DMI Plugin for health at interface 1 every 30 seconds using the DMI Trust Map
- IF a DMI Plugin goes down, that DMI Plugin's trust level is updated to NONE in the DMI Trust Map
- IF a DMI Plugin comes back up, NCMP requests that DMI Plugin to do a reregistration via interface 2
- That DMI reregisters itself using interface 3
- NCMP analyses the registration and compares the CMHandles it knows about to the CMHandles which have been reregistered. Any CMHandles which did exist and are not reregistered are added to the Untrustworthy CMHandles Map. Existing CMHandles are updated and new ones are created.
- After the reregistration is complete, the DMI Trust Map is updated to set the trust level for that DMI to COMPLETE.
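The DMI Plugin flow above can be read as a small state machine driven by health check results. The sketch below illustrates that reading under several assumptions. The class `DmiTrustTracker`, its method names and the injected `request_reregistration` callback are all hypothetical. A DMI Plugin seen healthy for the first time is assumed to start at COMPLETE. Reregistration is requested only once per recovery, even though the health check repeats every 30 seconds.

```python
NONE = "NONE"
COMPLETE = "COMPLETE"


class DmiTrustTracker:
    """Hypothetical holder of the DMI Trust Map, updated from health check results."""

    def __init__(self, request_reregistration):
        self.dmi_trust_map = {}
        self._request_reregistration = request_reregistration  # stands in for interface 2
        self._awaiting_reregistration = set()

    def on_health_check(self, dmi_plugin, is_up):
        previous = self.dmi_trust_map.get(dmi_plugin)
        if not is_up:
            # DMI Plugin is down: mark it untrusted and forget any pending request.
            self.dmi_trust_map[dmi_plugin] = NONE
            self._awaiting_reregistration.discard(dmi_plugin)
        elif previous == NONE and dmi_plugin not in self._awaiting_reregistration:
            # Back up after being down: ask for reregistration once, stay NONE until done.
            self._awaiting_reregistration.add(dmi_plugin)
            self._request_reregistration(dmi_plugin)
        elif previous is None:
            # Assumption: a plugin first seen healthy starts as COMPLETE.
            self.dmi_trust_map[dmi_plugin] = COMPLETE

    def on_reregistration_complete(self, dmi_plugin):
        # Called once the reregistration (interface 3) has been processed.
        self._awaiting_reregistration.discard(dmi_plugin)
        self.dmi_trust_map[dmi_plugin] = COMPLETE


requests = []
tracker = DmiTrustTracker(requests.append)
tracker.on_health_check("my-dmi-plugin", True)   # first seen, healthy
tracker.on_health_check("my-dmi-plugin", False)  # goes down
tracker.on_health_check("my-dmi-plugin", True)   # back up, reregistration requested
tracker.on_health_check("my-dmi-plugin", True)   # still waiting, no repeat request
tracker.on_reregistration_complete("my-dmi-plugin")
```

Injecting the reregistration call as a callback keeps the sketch free of any HTTP client, since the endpoint for interface 2 is still TBD.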
CMHandles Heartbeat (HB)
- It is the responsibility of the DMI Plugins to update NCMP about the heartbeats of CMHandles
- Through interface 4, DMI Plugins will publish a Kafka event whenever the trust level of a CMHandle changes.
- NCMP receives this event and updates the Untrustworthy CMHandles Map accordingly
- NCMP needs to be able to handle a throughput of 30,000 state changes per minute
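As an illustration of how NCMP might apply such an event, the sketch below models the event as a single JSON document with the `id`, `type` and `data.trustlevel` fields from interface 4. Everything else is an assumption. The function name `apply_trust_level_event`, the `cm_handle_to_dmi` lookup and the map layout (DMI Plugin name to a set of CM handle ids) are hypothetical, and no Kafka client is involved since the topic is still TBD.

```python
import json

TRUST_LEVEL_EVENT_TYPE = "org.onap.cm.events.trustlevel-notification"


def apply_trust_level_event(raw_event, cm_handle_to_dmi, untrustworthy_cm_handles):
    """Update the Untrustworthy CMHandles Map from one event; return True if applied."""
    event = json.loads(raw_event)
    if event.get("type") != TRUST_LEVEL_EVENT_TYPE:
        return False  # not a trust level notification
    cm_handle_id = event["id"]
    trust_level = event["data"]["trustlevel"]
    dmi_plugin = cm_handle_to_dmi.get(cm_handle_id)
    if dmi_plugin is None:
        return False  # assumption: events for unknown CM handles are ignored
    handles = untrustworthy_cm_handles.setdefault(dmi_plugin, set())
    if trust_level == "NONE":
        handles.add(cm_handle_id)
    else:
        # Only NONE and COMPLETE are in scope, so anything else is treated as trusted.
        handles.discard(cm_handle_id)
    return True


cm_handle_to_dmi = {"my-cm-handle": "my-dmi-plugin"}
untrustworthy_cm_handles = {}
event = json.dumps({
    "id": "my-cm-handle",
    "type": TRUST_LEVEL_EVENT_TYPE,
    "data": {"trustlevel": "NONE"},
})
apply_trust_level_event(event, cm_handle_to_dmi, untrustworthy_cm_handles)
print(untrustworthy_cm_handles)  # {'my-dmi-plugin': {'my-cm-handle'}}
```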
Reading Trust Level
- Body of request to be discussed. Will the request provide a DMI or a list of CMHandles?
- Interface 5
- NCMP will first check DMI Trust Map for the CMHandle
- If the DMI which is managing the CMHandle is marked as untrustworthy, we return NONE without checking the Untrustworthy CMHandles Map
- If that DMI is trustworthy, we check the Untrustworthy CMHandles Map. If the CMHandle is in the Map, we return NONE.
- Logically: IF (DMITrustMap.getDMIPlugin.getTrustLevel == NONE) RETURN NONE
- ELSE IF (UntrustworthyCMHandlesMap.getDMIPlugin.contains(CMHandle)) RETURN NONE
- ELSE RETURN COMPLETE
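The lookup logic above can be written as a small runnable function. This is a sketch only. The function name `get_trust_level` and the map shapes (DMI Plugin name to trust level, and DMI Plugin name to a set of CM handle ids) are assumptions, and the request body for interface 5 is still to be discussed.

```python
def get_trust_level(cm_handle_id, dmi_plugin, dmi_trust_map, untrustworthy_cm_handles):
    """Return the trust level of a CM handle using the two maps (hypothetical helper)."""
    # An untrusted DMI Plugin makes all of its CM handles untrusted.
    if dmi_trust_map.get(dmi_plugin) == "NONE":
        return "NONE"
    # Otherwise check whether this CM handle was individually reported as untrusted.
    if cm_handle_id in untrustworthy_cm_handles.get(dmi_plugin, set()):
        return "NONE"
    return "COMPLETE"


dmi_trust_map = {"dmi-a": "COMPLETE", "dmi-b": "NONE"}
untrustworthy_cm_handles = {"dmi-a": {"ch-2"}}

print(get_trust_level("ch-1", "dmi-a", dmi_trust_map, untrustworthy_cm_handles))  # COMPLETE
print(get_trust_level("ch-2", "dmi-a", dmi_trust_map, untrustworthy_cm_handles))  # NONE
print(get_trust_level("ch-3", "dmi-b", dmi_trust_map, untrustworthy_cm_handles))  # NONE
```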