Overview
DataLake is a software component of ONAP that systematically persists the events in DMaaP into supported Big Data storage systems. It has an Admin UI, where a system administrator configures which Topics to monitor and which data storage each Topic's data is written to. The Admin UI is also used to manage the settings of the storage and the associated data analytics tools. The second part is the Feeder, which does the data transfer work and is horizontally scalable. In the next release, R7, we will add a third component, the Data Exposure Service (DES), which will expose the data in the data storage via a REST API for other ONAP components and external systems to consume. Each data exposure requires only simple configuration.
Architecture Diagram
Data Exposure Service will be available in R7.
Artifacts
Blueprint (deployment artifact) :
Input file (deployment input) :
Docker image : nexus3.onap.org:10001/onap/<>
Deployment Prerequisite/dependencies
In R6, the following storage systems are supported:
MongoDB
Couchbase
Elasticsearch and Kibana
HDFS
To use DataLake, you need to have at least one of these systems ready. Once DataLake is deployed, you can configure Topic and storage in the DataLake Admin UI.
Deployment Steps
Build helm repository
Preconfiguration of tiller and helm
DL-handler consists of two pods: the Feeder and the Admin UI. It can be deployed using the Cloudify Helm plug-in. Because Helm must be able to reach the exposed tiller IP address and port, some additional configuration steps are needed. The detailed pre-configuration steps can be found on the Cloudify Helm Plugin wiki page.
Transfer blueprint component inputs file in DCAE bootstrap POD under / directory
Next, place the Cloudify input file for DataLake into the bootstrap pod. The input file can be found in the ONAP git repository. Once you have cloned the repository, copy the input file to the DCAE bootstrap pod from the command line.
kubectl cp <source directory>/components/datalake-handler/dpo/blueprint/k8s-datalake-helm-input.yaml <DCAE bootstrap pod>:/blueprints -n onap
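The pod name in the command above can be looked up from the cluster instead of being typed by hand. The following is a minimal sketch, not part of the official procedure. It assumes the bootstrap pod's name contains the string dcae-bootstrap (verify this with kubectl get pods -n onap in your deployment), and the helper names find_pod and copy_input_file as well as the SRC_DIR variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the "dcae-bootstrap" name pattern and the helper names are assumptions.

# Print the name of the first pod in a namespace whose name matches a pattern.
find_pod() {
  local namespace="$1" pattern="$2"
  kubectl get pods -n "$namespace" -o name | sed 's|^pod/||' | grep -- "$pattern" | head -n 1
}

# Copy the DataLake input file into /blueprints on the bootstrap pod.
copy_input_file() {
  local src_dir="$1" pod
  pod="$(find_pod onap dcae-bootstrap)"
  if [ -z "$pod" ]; then
    echo "no bootstrap pod found in namespace onap" >&2
    return 1
  fi
  kubectl cp "$src_dir/components/datalake-handler/dpo/blueprint/k8s-datalake-helm-input.yaml" \
    "$pod:/blueprints/k8s-datalake-helm-input.yaml" -n onap
}

# Usage (uncomment after setting SRC_DIR to your clone of the repository):
# copy_input_file "$SRC_DIR"
```

Writing the full destination file name avoids any ambiguity about whether /blueprints is treated as a directory or as the target file.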
Log in to the DCAE bootstrap POD's main container
The following command lets you log in to the DCAE bootstrap pod.
kubectl exec -it <DCAE bootstrap pod> -n onap -- /bin/bash
Validate blueprint
cfy blueprints validate /blueprints/k8s-dl-handler.yaml
Upload the blueprint to cloudify manager.
cfy blueprints upload -b datalake /blueprints/k8s-helm.yaml
Verify that the plugin versions in the target Cloudify instance match the blueprint imports
If the installed version of a plugin is different, update the blueprint import to match.
cfy plugins list
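To compare against the output of cfy plugins list, it can help to print the plugin versions the blueprint asks for. The sketch below assumes the blueprint uses the unquoted plugin:<name>?version=<version> import form; imports given as URLs to type files are not matched and would need a different pattern. The function name list_plugin_imports is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: matches unquoted "- plugin:NAME?version=VERSION" import lines.

# Print "NAME VERSION" for every plugin import found in a blueprint file.
list_plugin_imports() {
  local blueprint="$1"
  # In a basic regular expression, "?" is a literal question mark.
  sed -n 's/^[[:space:]]*-[[:space:]]*plugin:\([^?]*\)?version=\([^[:space:]]*\).*$/\1 \2/p' "$blueprint"
}

# Usage inside the bootstrap pod:
# list_plugin_imports /blueprints/k8s-helm.yaml
# cfy plugins list
```

Each printed name and version pair can then be checked by eye against the plugin table that cfy plugins list prints.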
Customization of Blueprint Input File
Before deployment, the input file should be edited to point to your tiller service and helm repository. The input file should be placed in /blueprints.
tiller-server-ip: <YOUR_CLUSTER_IP>
tiller-server-port: <TILLER_EXPOSED_PORT>
namespace: onap
chart-repo-url: <YOUR_HELM_REPO>
stable-repo-url: <YOUR_STABLE_HELM_REPO>
chart-version: 1.0.0
component-name: dcae-datalake
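If you prefer to generate the input file rather than edit it by hand, something like the following sketch could be used. The YAML keys and the fixed values (onap, 1.0.0, dcae-datalake) come from the input file shown above; the function name write_inputs and the variable names TILLER_IP, TILLER_PORT, CHART_REPO_URL and STABLE_REPO_URL are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: variable and function names are assumptions, not part of DataLake.

# Write the input file to the given path, refusing to run if a value is missing.
write_inputs() {
  local out="$1" name value
  for name in TILLER_IP TILLER_PORT CHART_REPO_URL STABLE_REPO_URL; do
    eval "value=\${$name:-}"   # portable indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
  cat > "$out" <<EOF
tiller-server-ip: ${TILLER_IP}
tiller-server-port: ${TILLER_PORT}
namespace: onap
chart-repo-url: ${CHART_REPO_URL}
stable-repo-url: ${STABLE_REPO_URL}
chart-version: 1.0.0
component-name: dcae-datalake
EOF
}

# Usage (values are placeholders for your own environment):
# TILLER_IP=<YOUR_CLUSTER_IP> TILLER_PORT=<TILLER_EXPOSED_PORT> \
# CHART_REPO_URL=<YOUR_HELM_REPO> STABLE_REPO_URL=<YOUR_STABLE_HELM_REPO> \
# write_inputs /blueprints/k8s-datalake-helm-input.yaml
```

Failing early on a missing value keeps an unfilled placeholder from reaching the deployment step.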
Deploy Service
cfy install -b datalake -d datalake-deployment -i /blueprints/k8s-datalake-helm-input.yaml /blueprints/k8s-helm.yaml
To Un-deploy
Uninstall running component and delete deployment
cfy uninstall datalake-deployment
Delete blueprint
cfy blueprints delete datalake
Initial Validation
After deployment, verify that the dl-handler pod and the MongoDB pod are running correctly
root@k8s-rancher:~# kubectl get pods -n onap | egrep "dl-handler"
Then check the logs to confirm that it can connect to DMaaP and is polling for events.
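The log check can be scripted along the following lines. This is a sketch under assumptions: it takes the first pod whose name contains dl-handler, reads all of its containers, and assumes that relevant log lines mention the string "dmaap" in some letter case. The actual log wording is not documented here, so adjust the filter to what your logs show. The function name dl_handler_dmaap_logs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the "dmaap" filter string is an assumption about the log wording.

# Print recent log lines from the first dl-handler pod that mention DMaaP.
dl_handler_dmaap_logs() {
  local pod
  pod="$(kubectl get pods -n onap -o name | grep -- 'dl-handler' | head -n 1)"
  [ -n "$pod" ] || { echo "no dl-handler pod found" >&2; return 1; }
  # --all-containers avoids having to name a container in a multi-container pod.
  kubectl logs -n onap "$pod" --all-containers=true --tail=200 | grep -i -- 'dmaap'
}

# Usage:
# dl_handler_dmaap_logs
```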
Functional tests
The following default configuration is loaded into dl-handler (set in the blueprint configuration)
<Add below steps to configure DL-Handler to subscribe and feed into external DL with step-by-step procedure>
Dynamic Configuration Update
Because the dl-handler service periodically polls Consul KV through the Config Binding Service APIs, the runtime configuration of the dl-handler service can be updated dynamically without redeploying or restarting the service. Configuration updates can be triggered from Policy (or CLAMP) or made directly in Consul.
Locate the service name by opening a shell in the dl-handler service pod and reading the value of the HOSTNAME environment variable
root@k8s-rancher:~# kubectl exec -it -n onap dep-s78f36f2daf0843518f2e25184769eb8b-dcae-dl-handler-servithzx2 /bin/bash
Defaulting container name to s78f36f2daf0843518f2e25184769eb8b-dcae-dl-handler-service.
Use 'kubectl describe pod/dep-s78f36f2daf0843518f2e25184769eb8b-dcae-dl-handler-servithzx2 -n onap' to see all of the containers in this pod.
misshtbt@s78f36f2daf0843518f2e25184769eb8b-dcae-dl-handler-service:~/bin$ env | grep HOSTNAME
HOSTNAME=s78f36f2daf0843518f2e25184769eb8b-dcae-dl-handler-service
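The same value can be read without an interactive shell. The sketch below runs env inside the pod's default container, as in the session above, and extracts the HOSTNAME entry. The function name get_service_name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: reads HOSTNAME from the pod's default container.

# Print the HOSTNAME value of the given pod in the onap namespace.
get_service_name() {
  local pod="$1"
  kubectl exec -n onap "$pod" -- env | sed -n 's/^HOSTNAME=//p'
}

# Usage:
# get_service_name <dl-handler pod name>
```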
Change the configuration for the service in the KV store through the UI and verify that the updates are picked up
http://<k8snodeip>:30270/ui/#/dc1/kv/
Consul Snapshot <>
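As an alternative to the UI, the configuration can in principle be read and written through Consul's HTTP KV API (/v1/kv/<key>). The sketch below rests on two assumptions that should be checked in your environment: that port 30270 exposes the Consul HTTP API as well as the UI, and that the configuration is stored under a key equal to the service name found above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: port 30270 and the key layout are assumptions taken from the UI URL above.

# Build the KV URL for a service's configuration key.
consul_kv_url() {
  local node_ip="$1" service_name="$2"
  printf 'http://%s:30270/v1/kv/%s' "$node_ip" "$service_name"
}

# Print the stored configuration; ?raw returns the value without the JSON envelope.
get_config() {
  curl -sf "$(consul_kv_url "$1" "$2")?raw"
}

# Replace the stored configuration with the contents of a local file.
put_config() {
  curl -sf -X PUT --data-binary "@$3" "$(consul_kv_url "$1" "$2")"
}

# Usage:
# get_config <k8snodeip> <servicename> > current-config.json
# put_config <k8snodeip> <servicename> new-config.json
```

Saving the current value before writing a new one gives a simple way to roll back if the updated configuration is not picked up as expected.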