This page uses two VMs in the China Mobile lab as an example to demonstrate how to set up a development environment.
We have two VMs:
datalake01: 172.30.1.74
datalake02: 172.30.1.75
- Setup host names on both VMs and your local PC
On both VMs and your local PC, run sudo vi /etc/hosts and add these lines (on Windows, the file is C:\Windows\System32\drivers\etc\hosts):
172.30.1.74 message-router-zookeeper message-router-kafka dl_couchbase dl_mariadb dl_mongodb
172.30.1.75 dl_es dl_druid dl_superset
- Install JDK 8 and Docker on both VMs and local
sudo apt install openjdk-8-jdk-headless
Docker install document: https://docs.docker.com/install/linux/docker-ce/ubuntu/
I installed Docker on a Linux VM running on my local Windows PC.
Install Docker Compose: https://docs.docker.com/compose/install/
- Setup ONAP development environment
(Ref Setting Up Your Development Environment)
On your local PC:
cd ~/.m2 (on Windows, it is C:\Users\your_name\.m2)
mv settings.xml settings.xml-old
wget https://raw.githubusercontent.com/onap/oparent/master/settings.xml
- Check out source code
Check out the DataLake source code from https://gerrit.onap.org/r/#/admin/projects/dcaegen2/services to C:\git\onap\dcaegen2\services2 or ~/git/onap/dcaegen2/services2. Currently the DataLake Feeder is hosted in the ONAP repo as a DCAE component handler.
- Setup MariaDB
(Ref https://mariadb.com/kb/en/library/installing-and-using-mariadb-via-docker/)
On datalake01,
sudo docker run -p 3306:3306 --name mariadb -e MYSQL_ROOT_PASSWORD=mypass -d mariadb/server:10.3
Connect to the database as root with the password above, then run:
GRANT ALL PRIVILEGES ON *.* TO dl@"%" IDENTIFIED BY 'dl1234' WITH GRANT OPTION;
Then run the script C:\git\onap\dcaegen2\services2\components\datalake-handler\feeder\src\assembly\scripts\init_db.sql
- Setup Kafka
(Ref https://kafka.apache.org/quickstart)
This and the following 2 steps describe setting up and using your own Kafka for development and testing. For using ONAP DMaaP, see step "Use DMaaP as data source".
On datalake01:
mkdir ~/kafka
cd ~/kafka
wget http://archive.apache.org/dist/kafka/2.0.0/kafka_2.11-2.0.0.tgz
tar -xzf kafka_2.11-2.0.0.tgz
cd ~/kafka/kafka_2.11-2.0.0
sudo vi config/server.properties
Add this line:
listeners=PLAINTEXT://172.30.1.74:9092
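If you prefer not to edit the file by hand, the same change can be scripted. The following is a minimal sketch, not part of the Kafka distribution: set_listeners is a hypothetical helper name, and it assumes the stock server.properties, where the listeners entry is only present as a commented-out line. It is safe to run more than once.

```shell
#!/bin/sh
# Sketch: set (or replace) the "listeners" entry in a Kafka server.properties file.
# set_listeners is a hypothetical helper name, not something shipped with Kafka.
set_listeners() {
    props_file=$1   # path to server.properties
    address=$2      # host:port, e.g. 172.30.1.74:9092
    new_line="listeners=PLAINTEXT://$address"
    if grep -q '^listeners=' "$props_file"; then
        # An active entry already exists: rewrite it via a temporary file.
        sed "s|^listeners=.*|$new_line|" "$props_file" > "$props_file.tmp" &&
            mv "$props_file.tmp" "$props_file"
    else
        # No active entry (only a commented-out one, or none): append the line.
        printf '%s\n' "$new_line" >> "$props_file"
    fi
}
```

Example call from ~/kafka/kafka_2.11-2.0.0: set_listeners config/server.properties 172.30.1.74:9092. The file must be writable by the current user.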
To start Zookeeper and Kafka:
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > zk.log &
nohup bin/kafka-server-start.sh config/server.properties > kf.log &
The commands to stop them are (stop Kafka first, then Zookeeper):
bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh
- Create test Kafka topics
On datalake01:
cd ~/kafka/kafka_2.11-2.0.0
./bin/kafka-topics.sh --create --zookeeper message-router-zookeeper:2181 --replication-factor 1 --partitions 1 --topic AAI-EVENT
./bin/kafka-topics.sh --create --zookeeper message-router-zookeeper:2181 --replication-factor 1 --partitions 1 --topic unauthenticated.DCAE_CL_OUTPUT
./bin/kafka-topics.sh --create --zookeeper message-router-zookeeper:2181 --replication-factor 1 --partitions 1 --topic unauthenticated.SEC_FAULT_OUTPUT
./bin/kafka-topics.sh --create --zookeeper message-router-zookeeper:2181 --replication-factor 1 --partitions 1 --topic msgrtr.apinode.metrics.dmaap
In case you want to reset the topics, here are the commands to delete them:
./bin/kafka-topics.sh --zookeeper message-router-zookeeper:2181 --delete --topic AAI-EVENT
./bin/kafka-topics.sh --zookeeper message-router-zookeeper:2181 --delete --topic unauthenticated.DCAE_CL_OUTPUT
./bin/kafka-topics.sh --zookeeper message-router-zookeeper:2181 --delete --topic unauthenticated.SEC_FAULT_OUTPUT
./bin/kafka-topics.sh --zookeeper message-router-zookeeper:2181 --delete --topic msgrtr.apinode.metrics.dmaap
- Load test data to Kafka
The test data files were checked out with the source code in the previous step "Check out source code".
On datalake01:
cd ~/kafka/kafka_2.11-2.0.0
./bin/kafka-console-producer.sh --broker-list message-router-kafka:9092 --topic AAI-EVENT < ~/git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/AAI-EVENT-100.json
./bin/kafka-console-producer.sh --broker-list message-router-kafka:9092 --topic unauthenticated.DCAE_CL_OUTPUT < ~/git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/DCAE_CL_OUTPUT-100.json
./bin/kafka-console-producer.sh --broker-list message-router-kafka:9092 --topic unauthenticated.SEC_FAULT_OUTPUT < ~/git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/SEC_FAULT_OUTPUT-100.json
./bin/kafka-console-producer.sh --broker-list message-router-kafka:9092 --topic msgrtr.apinode.metrics.dmaap < ~/git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/msgrtr.apinode.metrics.dmaap-100.json
To check whether the data was loaded successfully, read it back:
bin/kafka-console-consumer.sh --bootstrap-server message-router-kafka:9092 --topic AAI-EVENT --from-beginning
bin/kafka-console-consumer.sh --bootstrap-server message-router-kafka:9092 --topic unauthenticated.DCAE_CL_OUTPUT --from-beginning
bin/kafka-console-consumer.sh --bootstrap-server message-router-kafka:9092 --topic unauthenticated.SEC_FAULT_OUTPUT --from-beginning
bin/kafka-console-consumer.sh --bootstrap-server message-router-kafka:9092 --topic msgrtr.apinode.metrics.dmaap --from-beginning
- Setup MongoDB
On datalake01,
sudo docker run -d -p 27017:27017 --name mongodb mongo
- Setup Couchbase
On datalake01:
  - Start docker
sudo docker pull couchbase/server-sandbox:6.0.0
sudo docker run -d --name couchbase -p 8091-8094:8091-8094 -p 11210:11210 couchbase/server-sandbox:6.0.0
  - Create user and bucket
Open http://dl_couchbase:8091/ and log in as "Administrator/password".
Create bucket "datalake", with a memory quota of 100MB.
Create user dl/dl1234, with the "Application Access" and "Views Admin" roles on bucket "datalake".
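The same bucket and user can probably be created from the command line instead of the web UI. The sketch below is an assumption-laden alternative, not the method used on this page: it assumes couchbase-cli is on the PATH inside the "couchbase" container started above, and that bucket_full_access and views_admin are the CLI names of the "Application Access" and "Views Admin" roles. CB_CLI, cb_cli, create_datalake_bucket and create_dl_user are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: CLI equivalent of the UI steps above (assumptions noted in the text).
# Set CB_CLI=echo for a dry run that only prints the arguments.
cb_cli() {
    # Word splitting of the default command is intentional.
    ${CB_CLI:-sudo docker exec couchbase couchbase-cli} "$@"
}

# Create bucket "datalake" with a 100MB memory quota.
create_datalake_bucket() {
    cb_cli bucket-create -c localhost:8091 -u Administrator -p password \
        --bucket datalake --bucket-type couchbase --bucket-ramsize 100
}

# Create local user dl/dl1234 with both roles scoped to bucket "datalake".
create_dl_user() {
    cb_cli user-manage -c localhost:8091 -u Administrator -p password \
        --set --rbac-username dl --rbac-password dl1234 --auth-domain local \
        --roles 'bucket_full_access[datalake],views_admin[datalake]'
}
```

Run create_datalake_bucket first and create_dl_user second, since the roles refer to the bucket. Afterwards, check the result in the web UI.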
- Setup ElasticSearch & Kibana
(Ref https://docs.swiftybeaver.com/article/33-install-elasticsearch-kibana-via-docker)
On datalake02:
sudo docker pull docker.elastic.co/elasticsearch/elasticsearch:6.6.1
sudo docker run -d --rm -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elastic docker.elastic.co/elasticsearch/elasticsearch:6.6.1
sudo docker pull docker.elastic.co/kibana/kibana:6.6.1
sudo docker run -d --rm --link elastic:dl_es -e "ELASTICSEARCH_URL=http://dl_es:9200" -p 5601:5601 --name kibana docker.elastic.co/kibana/kibana:6.6.1
- Create test Indices in ElasticSearch
The indices are created automatically by the Feeder.
To access Kibana, open http://dl_es:5601/ .
In case you want to reset the indices, here are the commands to delete them:
curl -X DELETE "dl_es:9200/aai-event?pretty"
curl -X DELETE "dl_es:9200/unauthenticated.dcae_cl_output?pretty"
curl -X DELETE "dl_es:9200/unauthenticated.sec_fault_output?pretty"
curl -X DELETE "dl_es:9200/msgrtr.apinode.metrics.dmaap?pretty"
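The index names in the delete commands above are the Kafka topic names in lower case (Elasticsearch requires lowercase index names). As a small sketch of that mapping, the helper below derives the index name from a topic and prints one delete command per topic. topic_to_index and print_delete_commands are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# The four test topics used on this page.
TOPICS="AAI-EVENT unauthenticated.DCAE_CL_OUTPUT unauthenticated.SEC_FAULT_OUTPUT msgrtr.apinode.metrics.dmaap"

# Hypothetical helper: lowercase a topic name to get the index name
# that appears in the delete commands above.
topic_to_index() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Print (do not run) one delete command per topic.
print_delete_commands() {
    for topic in $TOPICS; do
        printf 'curl -X DELETE "dl_es:9200/%s?pretty"\n' "$(topic_to_index "$topic")"
    done
}
```

Calling print_delete_commands only prints the commands. Pipe its output to sh to actually delete the indices.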
- Setup Druid
(Ref http://druid.io/docs/latest/tutorials/index.html)
On datalake02 (this has to be on datalake02 because: 1. Druid uses port 8091, which is also used by Couchbase; 2. Druid uses its own Zookeeper, and we already installed one on datalake01):
mkdir ~/druid
cd ~/druid
wget https://www-us.apache.org/dist/incubator/druid/0.14.0-incubating/apache-druid-0.14.0-incubating-bin.tar.gz
tar -xzf apache-druid-0.14.0-incubating-bin.tar.gz
cd ~/druid/apache-druid-0.14.0-incubating
vi ~/druid/apache-druid-0.14.0-incubating/quickstart/tutorial/conf/druid/middleManager/runtime.properties, update:
druid.worker.capacity=30
vi ~/druid/apache-druid-0.14.0-incubating/quickstart/tutorial/conf/druid/middleManager/jvm.config, update:
-Xmx640m
curl https://archive.apache.org/dist/zookeeper/zookeeper-3.4.11/zookeeper-3.4.11.tar.gz -o zookeeper-3.4.11.tar.gz
tar -xzf zookeeper-3.4.11.tar.gz
mv zookeeper-3.4.11 zk
- Run Druid
cd ~/druid/apache-druid-0.14.0-incubating
nohup perl bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf > log.txt &
- Submit Druid Kafka indexing service supervisors
(Ref http://druid.io/docs/latest/tutorials/tutorial-kafka.html)
We use the Druid Kafka indexing service to load data from Kafka. For each topic, we need to submit a supervisor spec to Druid:
cd ~/
curl -XPOST -H'Content-Type: application/json' -d @git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/AAI-EVENT-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
curl -XPOST -H'Content-Type: application/json' -d @git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/DCAE_CL_OUTPUT-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
curl -XPOST -H'Content-Type: application/json' -d @git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/SEC_FAULT_OUTPUT-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
curl -XPOST -H'Content-Type: application/json' -d @git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid/msgrtr.apinode.metrics.dmaap-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
Druid tasks: http://dl_druid:8090
Windows version:
curl -XPOST -H"Content-Type: application/json" -d @C:\git\onap\dcaegen2\services2\components\datalake-handler\feeder\src\main\resources\druid\AAI-EVENT-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
curl -XPOST -H"Content-Type: application/json" -d @C:\git\onap\dcaegen2\services2\components\datalake-handler\feeder\src\main\resources\druid\DCAE_CL_OUTPUT-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
curl -XPOST -H"Content-Type: application/json" -d @C:\git\onap\dcaegen2\services2\components\datalake-handler\feeder\src\main\resources\druid\SEC_FAULT_OUTPUT-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
curl -XPOST -H"Content-Type: application/json" -d @C:\git\onap\dcaegen2\services2\components\datalake-handler\feeder\src\main\resources\druid\msgrtr.apinode.metrics.dmaap-kafka-supervisor.json http://dl_druid:8090/druid/indexer/v1/supervisor
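The Linux commands above differ only in the spec file name, which is the topic name with any "unauthenticated." prefix removed, followed by "-kafka-supervisor.json". As a sketch, they can be folded into a loop. spec_file, submit_supervisors and the SPEC_DIR variable are hypothetical names introduced here, and SPEC_DIR assumes the checkout location used on this page.

```shell
#!/bin/sh
# The four test topics used on this page.
TOPICS="AAI-EVENT unauthenticated.DCAE_CL_OUTPUT unauthenticated.SEC_FAULT_OUTPUT msgrtr.apinode.metrics.dmaap"
# Directory holding the supervisor specs (assumes the checkout path from this page).
SPEC_DIR=${SPEC_DIR:-$HOME/git/onap/dcaegen2/services2/components/datalake-handler/feeder/src/main/resources/druid}

# Hypothetical helper: map a topic to its spec file name by dropping the
# "unauthenticated." prefix, matching the file names in the commands above.
spec_file() {
    printf '%s-kafka-supervisor.json\n' "${1#unauthenticated.}"
}

# Submit one supervisor spec per topic; stop at the first request that
# curl cannot complete (HTTP error responses are not detected here).
submit_supervisors() {
    for topic in $TOPICS; do
        curl -XPOST -H 'Content-Type: application/json' \
            -d "@$SPEC_DIR/$(spec_file "$topic")" \
            http://dl_druid:8090/druid/indexer/v1/supervisor || return 1
    done
}
```

Call submit_supervisors once Druid is running. The response of each request is printed by curl, so a rejected spec still has to be spotted by reading the output.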
Druid datasource: http://dl_druid:8081/#/datasources
- Setup Superset
(Ref https://superset.incubator.apache.org/installation.html#start-with-docker)
On datalake02,
mkdir ~/superset
cd ~/superset
git clone https://github.com/apache/incubator-superset/
cd ~/superset/incubator-superset/contrib/docker
vi docker-compose.yml, add the external host dl_druid to service 'superset':
extra_hosts:
- "dl_druid:172.30.1.75"
sudo docker-compose run --rm superset ./docker-init.sh (You will be asked to provide a new username and password.)
- Run Superset
cd ~/superset/incubator-superset/contrib/docker
sudo docker-compose up -d
Setup Druid as data source
Open http://dl_superset:8088/ , using the login created in step 'Setup Superset', go to Sources → Druid Clusters → Add a new record (the '+' sign), and set:
Verbose Name=dl_druid
Broker Host=dl_druid
Cluster=dl_druid
- Setup HDFS
If you already have a Hadoop cluster, set 'dlhdfs' to its NameNode IP in /etc/hosts. Otherwise, install a Cloudera QuickStart VM on datalake01, in Docker or another VM format, and update /etc/hosts with the VM's IP.
Download the image from http://www.cloudera.com/content/support/en/downloads/quickstart_vms.html.
For Docker (Ref https://www.cloudera.com/documentation/enterprise/5-13-x/topics/quickstart_docker_container.html):
gunzip cloudera-quickstart-vm-5.13.0-0-beta-docker.tar.gz
tar -xvf cloudera-quickstart-vm-5.13.0-0-beta-docker.tar
cd cloudera-quickstart-vm-5.13.0-0-beta-docker
sudo docker import cloudera-quickstart-vm-5.13.0-0-beta-docker.tar
sudo docker images
sudo docker run --name=hadoop --hostname=quickstart.cloudera --privileged=true -t -i -p 7180:7180 -p 8020:8020 -p 50075:50075 -p 50010:50010 8e5e161945f7_replace_with_yours /usr/bin/docker-quickstart
/home/cloudera/cloudera-manager --express
On the VM, create HDFS folder '/datalake', where the data is stored, and assign it to user 'dl':
sudo -u hdfs hadoop fs -mkdir /datalake
sudo -u hdfs hadoop fs -chown dl /datalake
- Run DataLake Feeder in IDE
The Feeder is a Spring Boot application. The entry point is org.onap.datalake.feeder.Application. Run the project in Eclipse as "Spring Boot App". Once started, the app reads the topic list from Zookeeper, starts pulling data from these Kafka topics, and inserts the data into MongoDB, Couchbase and Elasticsearch. The data loaded to Kafka in step 'Load test data to Kafka' should appear in all the databases, and you should be able to use UI tools to view it.
The REST APIs provided by the controllers are documented on the Swagger page: http://localhost:1680/swagger-ui.html
- Create Docker image for deployment
To create the Docker image in your local development environment, Docker must be installed locally.
cd ~/git/onap/dcaegen2/services2/components/datalake-handler/feeder
mvn clean package
sudo docker build -t moguobiao/datalake-feeder -f src/assembly/Dockerfile . (Replace 'moguobiao' with your Docker Hub user name)
Push the Docker image to Docker Hub:
sudo docker login -u moguobiao -p password
sudo docker push moguobiao/datalake-feeder
- Deploy Docker image
On datalake01,
sudo docker pull moguobiao/datalake-feeder
sudo docker run --rm -p 1680:1680 --name dl_feeder --add-host=message-router-kafka:172.30.1.74 --add-host=message-router-zookeeper:172.30.1.74 --add-host=dl_couchbase:172.30.1.74 --add-host=dl_mariadb:172.30.1.74 --add-host=dl_mongodb:172.30.1.74 --add-host=dlhdfs:172.30.1.74 --add-host=dl_es:172.30.1.75 moguobiao/datalake-feeder
- Use DMaaP as data source
Add datalake01 to Kubernetes cluster
ONAP at China Mobile Lab is deployed as a Kubernetes cluster. For DL Feeder to connect to DMaaP’s Kafka and Zookeeper, we need to add VM datalake01 to the cluster. This is done by installing Rancher containers on the VM.
Find Zookeeper and Kafka hosts
kubectl -n onap get pod -o wide | grep dmaap-message-router
In our instance, it returns
dev-dmaap-message-router-58cb7f9644-v5qvq 1/1 Running 0 53d 10.42.97.241 mr01-node3 <none>
dev-dmaap-message-router-kafka-6685877dc4-xkvrk 1/1 Running 0 53d 10.42.243.183 mr01-node2 <none>
dev-dmaap-message-router-zookeeper-bc76c44f4-6sfbx 1/1 Running 0 53d 10.42.13.227 mr01-node1 <none>
So we update /etc/hosts on datalake01 with
10.42.13.227 message-router-zookeeper
10.42.243.183 message-router-kafka
Run Feeder
We cannot run the Docker container as in step “Deploy Docker image”, because even though VM datalake01 is within the Kubernetes cluster, the Feeder container is not. One solution is to deploy the image into the Kubernetes cluster, as described in step “Deploy Docker image to Kubernetes cluster”. A simpler way to run the Feeder for development and testing is:
Copy the jar file C:\git\onap\dcaegen2\services2\components\datalake-handler\feeder\target\feeder-1.0.0-SNAPSHOT.jar to datalake01. This jar file was created by the Maven command in step “Create Docker image for deployment”.
Then run
nohup java -jar feeder-1.0.0-SNAPSHOT.jar > feeder.log &
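Pod IPs change whenever the DMaaP pods are rescheduled, so the /etc/hosts entries from "Find Zookeeper and Kafka hosts" may need refreshing before the Feeder is started. The sketch below derives them from the kubectl output instead of copying by hand. pods_to_hosts is a hypothetical helper, and it assumes the column layout shown in the sample output on this page, with the pod IP in the sixth column.

```shell
#!/bin/sh
# Hypothetical helper: read "kubectl get pod -o wide" output on stdin and print
# /etc/hosts lines for the DMaaP Zookeeper and Kafka pods.
# Assumes the pod IP is the 6th whitespace-separated column, as in the sample above.
pods_to_hosts() {
    awk '
        $1 ~ /message-router-zookeeper/ { print $6, "message-router-zookeeper"; next }
        $1 ~ /message-router-kafka/     { print $6, "message-router-kafka" }
    '
}
```

Usage: kubectl -n onap get pod -o wide | pods_to_hosts. Review the printed lines before putting them into /etc/hosts on datalake01. The plain message-router pod is skipped on purpose, because only Zookeeper and Kafka are addressed by host name here.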
- Deploy Docker image to Kubernetes cluster
TODO - ...