DES
Overview
DataLake is an ONAP software component that systematically persists events from DMaaP into supported Big Data storage systems. It has an Admin UI, where a system administrator configures which topics are monitored and in which data store their data is saved. The Admin UI is also used to manage the storage settings and the associated data analytics tools. The second part is the Feeder, which does the data transfer work and is horizontally scalable. R7 adds a third component, the Data Extraction Service (DES), which exposes the data in the data stores through a REST API for other ONAP components and external systems to consume. Each data exposure requires only simple configuration.
Architecture Diagram
DES Architecture
Data Extraction Service exposes the data in Big Data DBs to the outside via a REST API.
Each data exposure is defined by a set of configurations, which include the URL, the DB from which the data is retrieved, the SQL query template, etc.
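To illustrate how a query SQL template could be filled in at request time, here is a minimal Python sketch. It assumes placeholders use the `${name}` syntax seen in the functional-test sample and that each value lands inside a single-quoted SQL literal, so embedded single quotes are doubled. How DES itself performs the substitution is not described here, and `render_sql_template` is a hypothetical helper.

```python
from string import Template
from typing import Mapping


def render_sql_template(sql_template: str, params: Mapping[str, str]) -> str:
    """Fill ${placeholders} in a query template with request parameters.

    Hypothetical helper: single quotes in each value are doubled so that a
    value cannot terminate the surrounding SQL string literal.
    """
    escaped = {key: value.replace("'", "''") for key, value in params.items()}
    # substitute() raises KeyError if the template uses a parameter not supplied
    return Template(sql_template).substitute(escaped)


template = (
    "select event.commonEventHeader.sourceName as name "
    "from datalake where event.commonEventHeader.sourceName = '${name}'"
)
print(render_sql_template(template, {"name": "node-1"}))
```

Python's `string.Template` happens to use the same `${name}` placeholder syntax, which keeps the sketch short; a production service would prefer bound query parameters over string substitution.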
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.
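Clients talk to Presto over its HTTP client protocol: a query is POSTed to `/v1/statement`, and result pages are fetched by following the `nextUri` field until it is absent. The sketch below uses only the Python standard library. The user name and the catalog/schema values (`mongodb`, `datalake`) are assumptions, not settings taken from the DES Presto image; port 9000 matches the `kubectl expose` step later in this page.

```python
import json
import urllib.request
from typing import Any, Iterator, List


def build_statement_request(host: str, port: int, sql: str, user: str = "des",
                            catalog: str = "mongodb",
                            schema: str = "datalake") -> urllib.request.Request:
    """Build the POST that submits `sql` to Presto's /v1/statement endpoint."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/statement",
        data=sql.encode("utf-8"),
        method="POST",
        headers={
            "X-Presto-User": user,        # the protocol requires a user name
            "X-Presto-Catalog": catalog,  # assumed catalog name for MongoDB
            "X-Presto-Schema": schema,    # assumed schema (MongoDB database)
        },
    )


def iter_rows(host: str, port: int, sql: str, user: str = "des",
              catalog: str = "mongodb",
              schema: str = "datalake") -> Iterator[List[Any]]:
    """Submit the query, then follow nextUri until Presto has no more pages."""
    request = build_statement_request(host, port, sql, user, catalog, schema)
    with urllib.request.urlopen(request) as resp:
        page = json.load(resp)
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "Presto query failed"))
        yield from page.get("data", [])  # each row is a list of column values
        next_uri = page.get("nextUri")
        if not next_uri:
            return
        follow = urllib.request.Request(next_uri, headers={"X-Presto-User": user})
        with urllib.request.urlopen(follow) as resp:
            page = json.load(resp)
```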
Deployment Prerequisites/Dependencies
DataLake can persist messages from DMaaP to several external databases, such as Elasticsearch, Couchbase, MongoDB and relational databases. Once DataLake is successfully deployed, you can configure the external databases through the Admin UI.
Make sure a mariadb-galera cluster has been deployed in ONAP.
If a database named "datalake" already exists in MariaDB, drop it.
Deployment Steps
0. If mariadb-galera is already installed, skip this step. Otherwise, install MariaDB as follows:
helm install local/mariadb-galera --namespace onap --name dev-mariadb-galera --set global.pullPolicy=IfNotPresent --set global.masterPassword=onap
1. Install a MongoDB service for testing:
docker run -itd --restart=always --name dl-mongo -p 27017:27017 mongo
2. Build the Presto image and push it to an existing repository. The Presto package version used here is v0.0.2 (presto-v0.0.2.tar.gz):
docker build -t presto:v0.0.2 .
docker tag presto:v0.0.2 registry.baidubce.com/onap/presto:v0.0.2
docker push registry.baidubce.com/onap/presto:v0.0.2
Note: Replace the repository path with your own repository.
3. Install the Presto service:
kubectl -n onap run dl-presto --image=registry.baidubce.com/onap/presto:v0.0.2 --env="MongoDB_IP=192.168.235.11" --env="MongoDB_PORT=27017"
kubectl -n onap expose deployment dl-presto --port=9000 --target-port=9000 --type=NodePort
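Note: Replace the MongoDB_IP and MongoDB_PORT values with your own MongoDB address and port.
Note: With kubectl 1.18 and later, kubectl run creates a bare Pod instead of a Deployment, so the expose command above fails. In that case, use: kubectl -n onap expose pod dl-presto --port=9000 --target-port=9000 --type=NodePort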
4. Check the Presto service:
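One way to confirm that Presto is up is to call its `/v1/info` endpoint, which reports the server version and a `starting` flag. This minimal Python sketch assumes you reach Presto through the NodePort created by the `kubectl expose` step (`kubectl -n onap get svc dl-presto` shows the assigned port); `presto_info` and `is_ready` are hypothetical helper names.

```python
import json
import urllib.request


def presto_info(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch Presto's /v1/info document (node version, starting flag, uptime)."""
    with urllib.request.urlopen(f"{base_url}/v1/info", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(info: dict) -> bool:
    """Presto reports starting=true until the server has finished booting."""
    return info.get("starting") is False
```

For example, `is_ready(presto_info("http://<node-ip>:<node-port>"))` returns `True` once the server has finished starting.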
5. Like other DCAE services, DES can be deployed through the DCAE Cloudify Manager. The following commands upload the DES blueprint, create a deployment from it, and run the install workflow:
cfy blueprints upload -b des /blueprints/k8s-datalake-des.yaml
cfy deployments create -b des des
cfy executions start -d des install
Functional tests
1. Create a database named "datalake" in MongoDB.
2. Insert some sample data into the datalake database.
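Sample data only needs the fields that the query template reads. The Python sketch below writes a few minimal, hypothetical VES-style documents as JSON lines, the default input format of `mongoimport`, so they could be loaded with, for example, `mongoimport --db datalake --collection datalake --file sample.json`. The collection name `datalake` is an assumption based on the `from datalake` clause in the sample template, and all field values are made up.

```python
import json
from typing import Any, Dict, List


def sample_event(source_name: str, entity_dn: str) -> Dict[str, Any]:
    """Build a minimal, hypothetical VES-style document containing only the
    fields referenced by the sample SQL template."""
    return {
        "event": {
            "commonEventHeader": {"sourceName": source_name},
            "perf3gppFields": {
                "measDataCollection": {"measuredEntityDn": entity_dn}
            },
        }
    }


def write_json_lines(path: str, docs: List[Dict[str, Any]]) -> None:
    """Write one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")


docs = [sample_event(f"node-{i}", f"ManagedElement=node-{i}") for i in range(3)]
write_json_lines("sample.json", docs)
```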
3. Create the SQL query template you want to run and insert it into the MariaDB table data_exposure, as in the sample below. For SQL query design over nested fields, refer to the Presto structural types documentation: https://prestosql.io/docs/current/language/types.html#structural
insert into data_exposure(id, note,sql_template,db_id) values ('test','test_des', ' select event.commonEventHeader.sourceName as name, event.perf3gppFields.measDataCollection.measuredEntityDn as entity from datalake where event.commonEventHeader.sourceName = ''${name}'' ',3);
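The doubled single quotes around `${name}` in the sample are ordinary SQL string escaping: inside a single-quoted literal, `''` stands for one `'`, so the stored template ends with `= '${name}'`. The Python sketch below generates such a statement from a raw template. `sql_literal` and `data_exposure_insert` are hypothetical helpers, the column names come from the sample, and building SQL by concatenation like this is only reasonable for trusted, hand-written templates.

```python
def sql_literal(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes.

    Backslashes are not escaped, so this assumes the text contains none
    (MariaDB treats backslash as an escape character by default).
    """
    return "'" + text.replace("'", "''") + "'"


def data_exposure_insert(exposure_id: str, note: str, sql_template: str, db_id: int) -> str:
    """Build an INSERT for the data_exposure table using the sample's columns."""
    return (
        "insert into data_exposure(id, note, sql_template, db_id) values ("
        f"{sql_literal(exposure_id)}, {sql_literal(note)}, "
        f"{sql_literal(sql_template)}, {int(db_id)});"
    )


template = ("select event.commonEventHeader.sourceName as name from datalake "
            "where event.commonEventHeader.sourceName = '${name}'")
print(data_exposure_insert("test", "test_des", template, 3))
```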
4. Query the result via Postman.
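The Postman request can also be scripted. This Python sketch is hedged: the DES base URL, the path `/datalake/v1/exposure/<id>`, and a JSON body carrying the template parameters are assumptions rather than a documented DES contract. Here `test` is the id of the data_exposure row inserted above and `name` matches its `${name}` placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_des_request(base_url: str, exposure_id: str,
                      params: Dict[str, str]) -> urllib.request.Request:
    """Build a POST for one DES data exposure.

    Assumption: the exposure is addressed by its data_exposure id under a
    hypothetical /datalake/v1/exposure/<id> path, and the template
    parameters are sent as a JSON object.
    """
    path_id = urllib.parse.quote(exposure_id, safe="")
    return urllib.request.Request(
        url=f"{base_url}/datalake/v1/exposure/{path_id}",
        data=json.dumps(params).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def query_des(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `query_des(build_des_request("http://<des-host>:<port>", "test", {"name": "node-1"}))` would send the same kind of request you would build by hand in Postman.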