Remote K8s cluster deployment of DCAE MS

Overview

In its early releases, ONAP DCAE deployed all data collection and analysis components into a single Kubernetes cluster at a single central site. Since some operators will use ONAP to deploy network functions at multiple geographically distinct sites, and since it will often make sense to deploy DCAE data collection and analysis components close to the network functions they monitor, it will be necessary for DCAE to be able to deploy components to multiple Kubernetes clusters at multiple sites.

Several things are needed to support deployment of DCAE components to remote sites.

  1. DCAE deploys components using blueprints that are processed by the Cloudify Manager orchestration tools.  The blueprints must provide a way to specify the target site for a deployment.  (Typically the site will be set at the time of deployment using an input to the blueprint, rather than being hardcoded into the blueprint.)

  2. The DCAE Kubernetes plugin that Cloudify Manager invokes must be able to extract the target site from a blueprint and obtain information about the Kubernetes cluster at the target site (such as the address of the Kubernetes API server and the credentials needed to access the API server).

  3. Because we are not deploying separate, independent instances of the DCAE platform into each site, DCAE components running in remote locations will need a way to access services running at the central site.  These include the config binding service, which provides components with their configurations, and, in some scenarios, the Consul server and the DMaaP DR and MR servers.

Dublin Scope

In the Dublin release, we are providing an interim solution for remote deployment that requires no development outside of the DCAE project. Ideally, there would be an ONAP-wide approach to support multiple sites, including a mechanism for bringing up new clusters and adding information about them to an ONAP-wide data store. We do not have this in Dublin; instead, DCAE keeps its own store of data about remote Kubernetes clusters.

In Dublin, we delivered the following changes into DCAE:

  • We modified the Cloudify node type definitions for Kubernetes components to include a new node property, called location_id, that specifies the name of the site where the component should be deployed.

  • We modified the DCAE Kubernetes plugin for Cloudify to read a component's location_id from the blueprint, use it to find the target Kubernetes cluster, and deploy the component into that cluster.

  • We adopted the Kubernetes "kubeconfig" file format for storing information about the Kubernetes clusters available as deployment targets.   During the initial deployment of DCAE using Helm, we create a Kubernetes ConfigMap to hold the cluster information and automatically populate it with the data for the central site.   In the Dublin release, the ConfigMap must be edited manually to add clusters.  (As noted above, we believe there should be an ONAP-wide store for this data, and we hope that when we have such a store, the process of adding data for a cluster can be automated.)

  • We allow components deployed into remote sites to access central site services through proxies (using nginx as the server).  We created a Helm chart to deploy and configure the proxy.
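The flow described in the list above (blueprint input, then location_id, then cluster lookup) can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code. It assumes that each kubeconfig context is named after a location_id, which this document does not state. The node type name, the site names, the server addresses, and both function names are hypothetical. The blueprint is shown as a Python dict for brevity; real blueprints are YAML files.

```python
# Illustrative sketch only; this is not the DCAE Kubernetes plugin's code.
# All names, sites, and addresses below are hypothetical.

# A blueprint fragment as a Python dict. The site is supplied through an
# input at deployment time rather than being hardcoded.
BLUEPRINT = {
    "inputs": {"location_id": {"default": "central"}},
    "node_templates": {
        "collector": {
            "type": "example.nodes.K8sComponent",  # placeholder type name
            "properties": {"location_id": {"get_input": "location_id"}},
        }
    },
}

# Cluster data laid out like a kubeconfig file (clusters, users, contexts).
# Assumption: one context per site, named after the site's location_id.
KUBECONFIG = {
    "apiVersion": "v1",
    "kind": "Config",
    "clusters": [
        {"name": "central", "cluster": {"server": "https://central.example.org:6443"}},
        {"name": "edge-1", "cluster": {"server": "https://edge1.example.org:6443"}},
    ],
    "users": [
        {"name": "central-admin", "user": {}},
        {"name": "edge-1-admin", "user": {}},
    ],
    "contexts": [
        {"name": "central", "context": {"cluster": "central", "user": "central-admin"}},
        {"name": "edge-1", "context": {"cluster": "edge-1", "user": "edge-1-admin"}},
    ],
}


def target_site(blueprint, node_name, deployment_inputs):
    """Resolve a node's location_id, honoring a get_input reference."""
    prop = blueprint["node_templates"][node_name]["properties"].get("location_id", "")
    if isinstance(prop, dict) and "get_input" in prop:
        input_name = prop["get_input"]
        if input_name in deployment_inputs:
            return deployment_inputs[input_name]
        # Fall back to the default declared in the blueprint's inputs section.
        return blueprint["inputs"][input_name].get("default", "")
    return prop


def resolve_cluster(kubeconfig, location_id):
    """Return (API server address, user name) for the context named location_id."""
    contexts = {c["name"]: c["context"] for c in kubeconfig.get("contexts", [])}
    if location_id not in contexts:
        raise KeyError(f"no cluster registered for location_id {location_id!r}")
    context = contexts[location_id]
    clusters = {c["name"]: c["cluster"] for c in kubeconfig.get("clusters", [])}
    return clusters[context["cluster"]]["server"], context["user"]
```

For example, calling target_site(BLUEPRINT, "collector", {"location_id": "edge-1"}) and passing the result to resolve_cluster would select the hypothetical edge-1 entry. A location_id with no matching context raises an error in this sketch, so the component is never deployed to an unintended cluster.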

There is more information about these changes in this presentation.

The remaining sections of this document describe how to add information to the cluster ConfigMap and how to use the Helm chart to deploy the proxy into remote sites.

Changes for Frankfurt Release (R6)

The proxy server for remote sites relies on having access from the remote site to the config-binding-service server at the central site. Prior to R6, we accomplished this by configuring a NodePort service on the central site exposing the config-binding-service http port (10000) and https port (10443). In R6, by default, we configure a ClusterIP service for config-binding-service. This prevents the config-binding-service ports from being reached from outside the central site Kubernetes cluster.

In addition, R6 changed how components obtain certificates for TLS. In prior releases, components that needed a certificate (a server certificate or just a CA certificate to use to validate servers) got the certificate using an init container (org.onap.dcaegen2.deployments.tls-init-container, version 1.0.3) that has the certificates "baked in" to the container image. In R6, the init container (org.onap.dcaegen2.deployments.tls-init-container, version 2.1.0) executes code that pulls a certificate from AAF. This will not work from a remote site because the necessary AAF services are not exposed there. We expect that work will be done for R7 to remedy this.

In the meantime, to use a remote site, it will be necessary to deploy DCAE at the central site with these changes:

  1. Override dcaegen2.dcae-config-binding-service.service.type. Set it to "NodePort", overriding the current setting of "ClusterIP".

  2. Override global.tlsImage. Set it to "onap/org.onap.dcaegen2.deployments.tls-init-container:1.0.3". This will use the container with "baked in" certificates.

  3. Make sure all blueprints import "https://nexus.onap.org/service/local/repositories/raw/content/org.onap.dcaegen2.platform.plugins/R6/k8splugin/1.7.2/k8splugin_types.yaml"; that is, they need to use version 1.7.2 of the k8s plugin. (The blueprints loaded into inventory at deployment time currently meet this requirement.)
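The three changes above can be captured in a small helper script. The sketch below is only one possible approach: it expands the two dotted override keys into the nested structure that a Helm override file would use, and checks whether a blueprint (already parsed into a dict) imports the required type file. The function names nest and imports_required_types are hypothetical. How the resulting structure is passed to Helm depends on how ONAP is deployed at the central site and is not covered here.

```python
# Illustrative sketch only; helper names are hypothetical.
import json

# Type file that every blueprint must import (k8s plugin version 1.7.2).
REQUIRED_TYPES_URL = (
    "https://nexus.onap.org/service/local/repositories/raw/content/"
    "org.onap.dcaegen2.platform.plugins/R6/k8splugin/1.7.2/k8splugin_types.yaml"
)

# The two overrides from the list above, keyed by dotted path.
OVERRIDES = {
    "dcaegen2.dcae-config-binding-service.service.type": "NodePort",
    "global.tlsImage": "onap/org.onap.dcaegen2.deployments.tls-init-container:1.0.3",
}


def nest(flat):
    """Expand dotted keys into nested dicts, as in a Helm override file."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def imports_required_types(blueprint):
    """Return True if the parsed blueprint imports the 1.7.2 type file."""
    return REQUIRED_TYPES_URL in blueprint.get("imports", [])


def render_overrides():
    """Serialize the overrides as JSON text."""
    return json.dumps(nest(OVERRIDES), indent=2)
```

Because JSON is a subset of YAML, the text returned by render_overrides() should be usable as the content of an override file, though that is worth verifying against the Helm version in use.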

We expect significant changes to multi-site support in R7.

Note that as of this update (2020-03-09), there has been no testing of multi-site support in R6.

Additional References

Multisite init-container : https://git.onap.org/dcaegen2/deployments/tree/multisite-init-container/README.md

DCAE remote site setup charts : https://git.onap.org/dcaegen2/deployments/tree/dcae-remote-site