Cloud Native Deployment
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - Single VM
ONAP deployed by OOM (Rancher or RKE managed Kubernetes) on | | | | |
---|---|---|---|---|
VMs | Amazon AWS | Microsoft Azure | Google Cloud Platform | OpenStack |
Managed | Amazon EKS | AKS | | |
Sponsor | Amazon (384G/m - 201801 to 201808) - thank you Michael O'Brien (201705-201905) Amdocs - 201903+ | Microsoft (201801+) Amdocs | | Intel/Windriver (2017-) |
This is a private page under daily continuous modification to keep it relevant as a live reference (don't edit it unless something is really wrong): https://twitter.com/_mikeobrien / https://www.linkedin.com/in/michaelobrien-developer/
For general support consult the official documentation at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html and https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_cloud_setup_guide.html and raise DOC JIRAs for any modifications required to them.
This page details deployment of ONAP on any environment that supports Kubernetes based containers.
Chat: http://onap-integration.eastus.cloudapp.azure.com:3000/group/onap-integration
Use separate namespaces to avoid the 1MB ConfigMap limit - or just helm install/delete everything (no helm upgrade) - see the example below.
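For reference, a minimal install/delete cycle (as opposed to an in-place helm upgrade) looks like the following sketch - it assumes the local/onap chart has already been built and served (make all / helm serve, as shown in the clustered install section below):
# install everything as a single release instead of upgrading an existing one
sudo helm install local/onap -n onap --namespace onap -f dev.yaml
# to start over, delete the release and its namespace rather than upgrading in place
sudo helm delete --purge onap
kubectl delete namespace onap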
https://kubernetes.slack.com/messages/C09NXKJKA/?
https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf
Deployment Profile
28 ONAP charts (helm releases); 196 pods including vvp, without the filebeat sidecars, as of 20181130 - this pod count is with all ReplicaSets and DaemonSets set to 1, and rises to 241 pod instances in the clustered case.
Docker images currently total approximately 75G as of 20181230.
After a docker_prepull.sh:
/dev/sda1      389255816   77322824   311916608   20%   /
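To check how much local disk the prepulled images are consuming, standard docker commands can be used (docker system df requires docker 1.13+; the values it prints will obviously differ per host):
# summarize docker disk usage (images, containers, volumes)
sudo docker system df
# or list image sizes individually
sudo docker images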
Type | VMs | Total RAM / vCores / HD | VM Flavor | K8S/Rancher Idle RAM | Deployed | Deployed ONAP RAM | Pods | Containers | Max vCores | Idle vCores | HD/VM | HD NFS only | IOPS | Date | Cost | Branch | Notes (deployment post 75 min) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Full Cluster (14 + 1) - recommended | 15 | 224G 112 vC 100G/VM | 16G, 8 vCores C5.2xLarge | 187Gb | 102Gb | 28 | 248 total 241 onap 217 up 0 error 24 config | 18 | 6+G master | 8.1G | 20181106 | $1.20 US/hour using the spot market | C | ||||
Single VM (possible - not recommended) | 1 | 432G 64 vC 180G | 256G+ 32+ vCores | Rancher: 13G Kubernetes: 8G Top: 10G | 165Gb (after 24h) | 141Gb | 28 | 240 total 196 if RS and DS are set to 1 | 55 | 22 | 131G (including 75G dockers) | n/a | Max: 550/sec Idle: 220/sec | 20181105 20180101 | C | Tested on 432G/64vCore azure VM - R 1.6.22 K8S 1.11 updated 20190101 | |
Developer 1-n pods | 1 | 16G | 16/32G 4-16 vCores | 14Gb | 10Gb | 3+ | 120+G | n/a | C | AAI+robot only |
Security
The VM security group should be open with no restrictive CIDR rules - but lock down ports 10249-10255 with RBAC.
If you get an issue connecting to your rancher server - "dial tcp 127.0.0.1:8880: getsockopt: connection refused" - it is usually security related; for example, this line is the first to fail:
https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh#n117
Check the server first - if helm version hangs on "Server:" the ports have an issue: run with all TCP/UDP ports open (0.0.0.0/0 and ::/0), then lock down the API on ports 10249-10255 via GitHub OAuth security from the rancher console to keep out crypto miners.
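A quick way to tell whether the problem is security-group related is to check that tiller responds and that the rancher port is reachable; this is only a sketch and the hostname below is illustrative:
# if the Server: line hangs or times out, the required ports are blocked
sudo helm version
# confirm the rancher server port is reachable from the host (example hostname)
nc -zv cd.onap.info 8880
# the kubelet/read-only ports 10249-10255 should NOT be reachable from outside the cluster
nc -zv cd.onap.info 10250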
Example 15 node (1 master + 14 nodes) OOM Deployment
Rancher 1.6.25, Kubernetes 1.11.5, Docker 17.03, Helm 2.9.1
empty
With ONAP deployed
Throughput and Volumetrics
Cloudwatch CPU Average
Specific to logging - we have a problem on any VM that contains AAI: the logstash container is being saturated there - see the 30+ percent VM - LOG-376
NFS Throughput for /dockerdata-nfs
Cloudwatch Network In Max
Cost
Using the spot market on AWS, we ran a bill of $10 for 8 hours of 15 C5.2xlarge VMs (includes EBS but not DNS or EFS/NFS).
Details: 20181106:1800 EDT master
ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | wc -l 248 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | wc -l 241 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 217 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '0/|1/2' | wc -l 24 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces -o wide | grep onap | grep -E '0/|1/2' onap onap-aaf-aaf-sms-preload-lvqx9 0/1 Completed 0 4h 10.42.75.71 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-distcenter-ql5f8 0/1 Completed 0 4h 10.42.75.223 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-testca-7rzcd 0/1 Completed 0 4h 10.42.18.37 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-aai-aai-graphadmin-create-db-schema-26pfs 0/1 Completed 0 4h 10.42.14.14 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aai-aai-traversal-update-query-data-qlk7w 0/1 Completed 0 4h 10.42.88.122 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-gmmvj 0/1 Completed 0 4h 10.42.111.99 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-n6fw4 0/1 Error 0 4h 10.42.21.12 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-nc8ww 0/1 Error 0 4h 10.42.109.156 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-xcxds 0/1 Error 0 4h 10.42.152.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-node-6496d8f55b-jfvrm 0/1 Init:0/1 28 4h 10.42.95.32 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-prov-86f79c47f9-tldsp 0/1 CrashLoopBackOff 59 4h 10.42.76.248 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-oof-music-cassandra-job-config-7mb5f 0/1 Completed 0 4h 10.42.38.249 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-oof-oof-has-healthcheck-rpst7 0/1 Completed 0 4h 10.42.241.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-oof-oof-has-onboard-5bd2l 0/1 Completed 0 4h 10.42.205.75 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-qshzn 0/2 Completed 0 4h 10.42.112.46 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-rk4m2 0/2 Init:Error 0 4h 10.42.57.79 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-2vw2q 0/1 Completed 0 4h 10.42.87.181 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-k57lh 0/1 Init:Error 0 4h 10.42.148.79 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-sdc-sdc-cs-config-cassandra-vgnz2 0/1 Completed 0 4h 10.42.111.187 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-sdc-sdc-es-config-elasticsearch-lkb9m 0/1 Completed 0 4h 10.42.20.202 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-onboarding-be-cassandra-init-7zv5j 0/1 Completed 0 4h 10.42.218.1 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-sdc-sdc-wfd-be-workflow-init-q8t7z 0/1 Completed 0 4h 10.42.255.91 ip-172-31-41-30.us-east-2.compute.internal <none> onap onap-vid-vid-galera-config-4f274 0/1 Completed 0 4h 10.42.80.200 ip-172-31-33-223.us-east-2.compute.internal <none> onap onap-vnfsdk-vnfsdk-init-postgres-lf659 0/1 Completed 0 4h 10.42.238.204 ip-172-31-38-194.us-east-2.compute.internal <none> ubuntu@ip-172-31-40-250:~$ kubectl get 
nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-33-223.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.222.148.116 18.222.148.116 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-111.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 3.16.37.170 3.16.37.170 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-207.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.225.32.201 18.225.32.201 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-36-163.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.58.189.251 13.58.189.251 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-24.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.224.180.26 18.224.180.26 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-59.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.191.248.14 18.191.248.14 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-194.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.217.45.91 18.217.45.91 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-95.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 52.15.39.21 52.15.39.21 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-39-138.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.224.199.40 18.224.199.40 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-110.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.223.151.180 18.223.151.180 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-229.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.218.252.13 18.218.252.13 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-30.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 3.16.113.3 3.16.113.3 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-42-33.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.59.2.86 13.59.2.86 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-45-152.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.219.56.50 18.219.56.50 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ubuntu@ip-172-31-40-250:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-33-223.us-east-2.compute.internal 852m 10% 13923Mi 90% ip-172-31-34-111.us-east-2.compute.internal 1160m 14% 11643Mi 75% ip-172-31-34-207.us-east-2.compute.internal 1101m 13% 7981Mi 51% ip-172-31-36-163.us-east-2.compute.internal 656m 8% 13377Mi 87% ip-172-31-37-24.us-east-2.compute.internal 401m 5% 8543Mi 55% ip-172-31-37-59.us-east-2.compute.internal 711m 8% 10873Mi 70% ip-172-31-38-194.us-east-2.compute.internal 1136m 14% 8195Mi 53% ip-172-31-38-95.us-east-2.compute.internal 1195m 14% 9127Mi 59% ip-172-31-39-138.us-east-2.compute.internal 296m 3% 10870Mi 70% ip-172-31-41-110.us-east-2.compute.internal 2586m 32% 10950Mi 71% ip-172-31-41-229.us-east-2.compute.internal 159m 1% 9138Mi 59% ip-172-31-41-30.us-east-2.compute.internal 180m 2% 9862Mi 64% ip-172-31-42-33.us-east-2.compute.internal 1573m 19% 6352Mi 41% ip-172-31-45-152.us-east-2.compute.internal 1579m 19% 10633Mi 69%
Quickstart
Undercloud Install - Rancher/Kubernetes/Helm/Docker
Ubuntu 16.04 Host VM Configuration
key | value |
---|---|
Redhat 7.6 Host VM Configuration
see https://gerrit.onap.org/r/#/c/77850/
key | value |
---|---|
firewalld off | systemctl disable firewalld |
git, make, python | yum install git; yum groupinstall 'Development Tools' |
IPv4 forwarding | add to /etc/sysctl.conf net.ipv4.ip_forward = 1 |
Networking enabled | sudo vi /etc/sysconfig/network-scripts/ifcfg-ens33 and set ONBOOT=yes |
General Host VM Configuration
Follow https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh
Run the following script on a clean Ubuntu 16.04 or Redhat RHEL 7.x (7.6) VM anywhere - it will provision and register your kubernetes system as a collocated master/host.
Ideally you install a clustered set of hosts separate from the master VM - you can do this by deleting the host from the cluster after the install below, and then running docker, NFS and the rancher agent container on each host (a per-host sketch follows below).
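For a host that is separate from the master, the per-host step amounts to installing docker, mounting the NFS share and registering the rancher agent; a sketch using the scripts and agent command shown later on this page (the registration URL/token placeholders come from the rancher console):
# on each additional host
sudo curl https://releases.rancher.com/install-docker/17.03.sh | sh
# mount the shared /dockerdata-nfs from the NFS master
sudo ./slave_nfs_node.sh <master nfs ip>
# register against the rancher master using the host's private IP
sudo docker run -e CATTLE_AGENT_IP="<private ip of this host>" --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.11 http://<master>:8880/v1/scripts/<registration token>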
vm.max_map_count - raise the default 64k limit to 256k
The cd.sh script will fix your VM for this limitation, first found in LOG-334. If you don't run the cd.sh script, run the following command manually on each VM so that any elasticsearch container comes up properly - this is a base OS issue.
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n49
# fix virtual memory for onap-log:elasticsearch under Rancher 1.6.11 - OOM-431
sudo sysctl -w vm.max_map_count=262144
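To make the setting survive a reboot (rather than only applying it with sysctl -w), it can also be persisted; a minimal sketch:
# persist the elasticsearch vm.max_map_count requirement across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p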
Scripted RKE Kubernetes Cluster install
Scripted undercloud(Helm/Kubernetes/Docker) and ONAP install - Single VM
Prerequisites
Create a single VM - 256G+
See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS
Create a 0.0.0.0/0 and ::/0 open security group
Use GitHub OAuth to authenticate your cluster just after installing it.
Last test 20190305 using 3.0.1-ONAP
ONAP Development#Change max-pods from default 110 pod limit
# 0 - verify the security group has all protocols (TCP/UDP) for 0.0.0.0/0 and ::/0
# to be safe, edit/make sure dns resolution is setup to the host
ubuntu@ld:~$ sudo cat /etc/hosts
127.0.0.1 cd.onap.info
# 1 - configure combined master/host VM - 26 min
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh .
sudo ./oom_rancher_setup.sh -b master -s <your domain/ip> -e onap
# to deploy more than 110 pods per vm - before the environment (1a7) is created from the kubernetes template (1pt2) - at the waiting 3 min mark - edit it via
# https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Changemax-podsfromdefault110podlimit --max-pods=900
# https://lists.onap.org/g/onap-discuss/topic/oom_110_kubernetes_pod/25213556?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,25213556
# in "additional kubelet flags" add --max-pods=500
# on a 244G R4.8xlarge vm - 26 min later the k8s cluster is up
NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
kube-system   heapster-6cfb49f776-5pq45               1/1       Running   0          10m
kube-system   kube-dns-75c8cb4ccb-7dlsh               3/3       Running   0          10m
kube-system   kubernetes-dashboard-6f4c8b9cd5-v625c   1/1       Running   0          10m
kube-system   monitoring-grafana-76f5b489d5-zhrjc     1/1       Running   0          10m
kube-system   monitoring-influxdb-6fc88bd58d-9494h    1/1       Running   0          10m
kube-system   tiller-deploy-8b6c5d4fb-52zmt           1/1       Running   0          2m
# 3 - secure the master via github oauth - immediately, to lock out crypto miners
# http://cd.onap.info:8880
# check the master cluster
ubuntu@ip-172-31-14-89:~$ kubectl top nodes
NAME                                         CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
ip-172-31-8-245.us-east-2.compute.internal   179m         2%        2494Mi          4%
ubuntu@ip-172-31-14-89:~$ kubectl get nodes -o wide
NAME                                         STATUS    ROLES     AGE       VERSION            EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
ip-172-31-8-245.us-east-2.compute.internal   Ready     <none>    13d       v1.10.3-rancher1   172.17.0.1    Ubuntu 16.04.1 LTS   4.4.0-1049-aws   docker://17.3.2
# 7 - after the cluster is up - run the cd.sh script to get onap up - customize your values.yaml
# the 2nd time you run the script - a clean install - it will clone a new oom repo
# get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters
sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml
sudo cp dev.yaml dev0.yaml
sudo vi dev0.yaml
sudo cp dev0.yaml dev1.yaml
sudo cp logging-analytics/deploy/cd.sh .
# this does a prepull (-p), clones 3.0.0-ONAP, managed install -f true
sudo ./cd.sh -b 3.0.0-ONAP -e onap -p true -n nexus3.onap.org:10001 -f true -s 300 -c true -d true -w false -r false
# check around 55 min (on a 256G single node - with 32 vCores)
# pods/failed/up @ min and ram: 161/13/153 @ 50m, 107g @ 55 min
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l
152
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2'
onap   dep-deployment-handler-5789b89d4b-s6fzw                 1/2   Running                 0    8m
onap   dep-service-change-handler-76dcd99f84-fchxd             0/1   ContainerCreating       0    3m
onap   onap-aai-champ-68ff644d85-rv7tr                         0/1   Running                 0    53m
onap   onap-aai-gizmo-856f86d664-q5pvg                         1/2   CrashLoopBackOff        9    53m
onap   onap-oof-85864d6586-zcsz5                               0/1   ImagePullBackOff        0    53m
onap   onap-pomba-kibana-d76b6dd4c-sfbl6                       0/1   Init:CrashLoopBackOff   7    53m
onap   onap-pomba-networkdiscovery-85d76975b7-mfk92            1/2   CrashLoopBackOff        9    53m
onap   onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9   1/2   CrashLoopBackOff        9    53m
onap   onap-vid-84c88db589-8cpgr                               1/2   CrashLoopBackOff        7    52m
# Note: DCAE has 2 sets of orchestration after the initial k8s orchestration - another at 57 min
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2'
onap   dep-dcae-prh-6b5c6ff445-pr547                           0/2   ContainerCreating       0    2m
onap   dep-dcae-tca-analytics-7dbd46d5b5-bgrn9                 0/2   ContainerCreating       0    1m
onap   dep-dcae-ves-collector-59d4ff58f7-94rpq                 0/2   ContainerCreating       0    1m
onap   onap-aai-champ-68ff644d85-rv7tr                         0/1   Running                 0    57m
onap   onap-aai-gizmo-856f86d664-q5pvg                         1/2   CrashLoopBackOff        10   57m
onap   onap-oof-85864d6586-zcsz5                               0/1   ImagePullBackOff        0    57m
onap   onap-pomba-kibana-d76b6dd4c-sfbl6                       0/1   Init:CrashLoopBackOff   8    57m
onap   onap-pomba-networkdiscovery-85d76975b7-mfk92            1/2   CrashLoopBackOff        11   57m
onap   onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9   1/2   Error                   10   57m
onap   onap-vid-84c88db589-8cpgr                               1/2   CrashLoopBackOff        9    57m
# at 1 hour
ubuntu@ip-172-31-20-218:~$ free
              total        used        free      shared  buff/cache   available
Mem:      251754696   111586672    45000724      193628    95167300   137158588
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | wc -l
164
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l
155
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l
8
ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2'
onap   dep-dcae-ves-collector-59d4ff58f7-94rpq                 1/2   Running                 0    4m
onap   onap-aai-champ-68ff644d85-rv7tr                         0/1   Running                 0    59m
onap   onap-aai-gizmo-856f86d664-q5pvg                         1/2   CrashLoopBackOff        10   59m
onap   onap-oof-85864d6586-zcsz5                               0/1   ImagePullBackOff        0    59m
onap   onap-pomba-kibana-d76b6dd4c-sfbl6                       0/1   Init:CrashLoopBackOff   8    59m
onap   onap-pomba-networkdiscovery-85d76975b7-mfk92            1/2   CrashLoopBackOff        11   59m
onap   onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9   1/2   CrashLoopBackOff        10   59m
onap   onap-vid-84c88db589-8cpgr                               1/2   CrashLoopBackOff        9    59m
ubuntu@ip-172-31-20-218:~$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
udev           125869392         0 125869392   0% /dev
tmpfs           25175472     54680  25120792   1% /run
/dev/xvda1     121914320  91698036  30199900  76% /
tmpfs          125877348     30312 125847036   1% /dev/shm
tmpfs               5120         0      5120   0% /run/lock
tmpfs          125877348         0 125877348   0% /sys/fs/cgroup
tmpfs           25175472         0  25175472   0% /run/user/1000
# todo: verify the release is there after a helm install - as the configMap size issue is breaking the release for now
Prerequisites
Create a single VM - 256G+
20181015
ubuntu@a-onap-dmz-nodelete:~$ ./oom_deployment.sh -b master -s att.onap.cloud -e onap -r a_ONAP_CD_master -t _arm_deploy_onap_cd.json -p _arm_deploy_onap_cd_z_parameters.json
# register the IP to DNS with route53 for att.onap.info - using this for the ONAP academic summit on the 22nd
13.68.113.104 = att.onap.cloud
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - clustered
Prerequisites
Add an NFS (EFS on AWS) share
Create a 1 + N cluster
See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS
Create a 0.0.0.0/0 and ::/0 open security group
Use GitHub OAuth to authenticate your cluster just after installing it.
Last tested on ld.onap.info 20181029
# 0 - verify the security group has all protocols (TCP/UDP) for 0.0.0.0/0 and ::/0
# 1 - configure master - 15 min
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap
# on a 64G R4.2xlarge vm - 23 min later the k8s cluster is up
kubectl get pods --all-namespaces
kube-system   heapster-76b8cd7b5-g7p6n               1/1       Running   0          8m
kube-system   kube-dns-5d7b4487c9-jjgvg              3/3       Running   0          8m
kube-system   kubernetes-dashboard-f9577fffd-qldrw   1/1       Running   0          8m
kube-system   monitoring-grafana-997796fcf-g6tr7     1/1       Running   0          8m
kube-system   monitoring-influxdb-56fdcd96b-x2kvd    1/1       Running   0          8m
kube-system   tiller-deploy-54bcc55dd5-756gn         1/1       Running   0          2m
# 2 - secure the master via github oauth - immediately, to lock out crypto miners
# http://ld.onap.info:8880
# 3 - delete the master from the hosts in rancher
# http://ld.onap.info:8880
# 4 - create NFS share on master
# https://us-east-2.console.aws.amazon.com/efs/home?region=us-east-2#/filesystems/fs-92xxxxx
# add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher)
sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true
# 5 - create NFS share and register each node - do this for all nodes
sudo git clone https://gerrit.onap.org/r/logging-analytics
# add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher)
sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true
# it takes about 1 min to run the script and 1 minute for the etcd and healthcheck containers to go green on each host
# check the master cluster
kubectl top nodes
NAME                                          CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
ip-172-31-19-9.us-east-2.compute.internal     9036m        56%       53266Mi         43%
ip-172-31-21-129.us-east-2.compute.internal   6840m        42%       47654Mi         38%
ip-172-31-18-85.us-east-2.compute.internal    6334m        39%       49545Mi         40%
ip-172-31-26-114.us-east-2.compute.internal   3605m        22%       25816Mi         21%
# fix helm on the master after adding nodes - only if the server helm version is less than the client helm version (rancher 1.6.18 does not have this issue)
ubuntu@ip-172-31-14-89:~$ sudo helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}
ubuntu@ip-172-31-14-89:~$ sudo helm init --upgrade
$HELM_HOME has been configured at /home/ubuntu/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
ubuntu@ip-172-31-14-89:~$ sudo helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
# 7a - manual: follow the helm plugin page
# https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins
sudo git clone https://gerrit.onap.org/r/oom
sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm
cd oom/kubernetes
sudo helm serve &
sudo make all
sudo make onap
sudo helm deploy onap local/onap --namespace onap
fetching local/onap
release "onap" deployed
release "onap-aaf" deployed
release "onap-aai" deployed
release "onap-appc" deployed
release "onap-clamp" deployed
release "onap-cli" deployed
release "onap-consul" deployed
release "onap-contrib" deployed
release "onap-dcaegen2" deployed
release "onap-dmaap" deployed
release "onap-esr" deployed
release "onap-log" deployed
release "onap-msb" deployed
release "onap-multicloud" deployed
release "onap-nbi" deployed
release "onap-oof" deployed
release "onap-policy" deployed
release "onap-pomba" deployed
release "onap-portal" deployed
release "onap-robot" deployed
release "onap-sdc" deployed
release "onap-sdnc" deployed
release "onap-sniro-emulator" deployed
release "onap-so" deployed
release "onap-uui" deployed
release "onap-vfc" deployed
release "onap-vid" deployed
release "onap-vnfsdk" deployed
# 7b - automated: after the cluster is up - run the cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script
# clean install - will clone a new oom repo
# get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters
sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml
sudo cp logging-analytics/deploy/cd.sh .
sudo ./cd.sh -b master -e onap -c true -d true -w true
# rerun install - no delete of the oom repo
sudo ./cd.sh -b master -e onap -c false -d true -w true
Deployment Integrity based on Pod Dependencies
20181213 running 3.0.0-ONAP
Links
LOG-899
LOG-898
OOM-1547
OOM-1543
Patches
Windriver openstack heat template 1+13 vms
https://gerrit.onap.org/r/#/c/74781/
docker prepull script – run before cd.sh - https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
https://gerrit.onap.org/r/#/c/74780/
Not merged with the heat template until the following nexus3 slowdown is addressed
https://jira.onap.org/browse/TSC-79
Base Platform First
Bring up dmaap and aaf first, then the rest of the pods in the following order.
Every 2.0s: helm list Fri Dec 14 15:19:49 2018 NAME REVISION UPDATED STATUS CHART NAMESPACE onap 2 Fri Dec 14 15:10:56 2018 DEPLOYED onap-3.0.0 onap onap-aaf 1 Fri Dec 14 15:10:57 2018 DEPLOYED aaf-3.0.0 onap onap-dmaap 2 Fri Dec 14 15:11:00 2018 DEPLOYED dmaap-3.0.0 onap onap onap-aaf-aaf-cm-5c65c9dc55-snhlj 1/1 Running 0 10m onap onap-aaf-aaf-cs-7dff4b9c44-85zg2 1/1 Running 0 10m onap onap-aaf-aaf-fs-ff6779b94-gz682 1/1 Running 0 10m onap onap-aaf-aaf-gui-76cfcc8b74-wn8b8 1/1 Running 0 10m onap onap-aaf-aaf-hello-5d45dd698c-xhc2v 1/1 Running 0 10m onap onap-aaf-aaf-locate-8587d8f4-l4k7v 1/1 Running 0 10m onap onap-aaf-aaf-oauth-d759586f6-bmz2l 1/1 Running 0 10m onap onap-aaf-aaf-service-546f66b756-cjppd 1/1 Running 0 10m onap onap-aaf-aaf-sms-7497c9bfcc-j892g 1/1 Running 0 10m onap onap-aaf-aaf-sms-preload-vhbbd 0/1 Completed 0 10m onap onap-aaf-aaf-sms-quorumclient-0 1/1 Running 0 10m onap onap-aaf-aaf-sms-quorumclient-1 1/1 Running 0 8m onap onap-aaf-aaf-sms-quorumclient-2 1/1 Running 0 6m onap onap-aaf-aaf-sms-vault-0 2/2 Running 1 10m onap onap-aaf-aaf-sshsm-distcenter-27ql7 0/1 Completed 0 10m onap onap-aaf-aaf-sshsm-testca-mw95p 0/1 Completed 0 10m onap onap-dmaap-dbc-pg-0 1/1 Running 0 17m onap onap-dmaap-dbc-pg-1 1/1 Running 0 15m onap onap-dmaap-dbc-pgpool-c5f8498-fn9cn 1/1 Running 0 17m onap onap-dmaap-dbc-pgpool-c5f8498-t9s27 1/1 Running 0 17m onap onap-dmaap-dmaap-bus-controller-59c96d6b8f-9xsxg 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-db-557c66dc9d-gvb9f 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-node-6496d8f55b-ffgfr 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-prov-86f79c47f9-zb8p7 1/1 Running 0 17m onap onap-dmaap-message-router-5fb78875f4-lvsg6 1/1 Running 0 17m onap onap-dmaap-message-router-kafka-7964db7c49-n8prg 1/1 Running 0 17m onap onap-dmaap-message-router-zookeeper-5cdfb67f4c-5w4vw 1/1 Running 0 17m onap-msb 2 Fri Dec 14 15:31:12 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 4m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 4m onap onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 4m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 4m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 4m onap-aai 1 Fri Dec 14 15:30:59 2018 DEPLOYED aai-3.0.0 onap onap onap-aai-aai-54b7bf7779-bfbmg 1/1 Running 0 2m onap onap-aai-aai-babel-6bbbcf5d5c-sp676 2/2 Running 0 13m onap onap-aai-aai-cassandra-0 1/1 Running 0 13m onap onap-aai-aai-cassandra-1 1/1 Running 0 12m onap onap-aai-aai-cassandra-2 1/1 Running 0 9m onap onap-aai-aai-champ-54f7986b6b-wql2b 2/2 Running 0 13m onap onap-aai-aai-data-router-f5f75c9bd-l6ww7 2/2 Running 0 13m onap onap-aai-aai-elasticsearch-c9bf9dbf6-fnj8r 1/1 Running 0 13m onap onap-aai-aai-gizmo-5f8bf54f6f-chg85 2/2 Running 0 13m onap onap-aai-aai-graphadmin-9b956d4c-k9fhk 2/2 Running 0 13m onap onap-aai-aai-graphadmin-create-db-schema-s2nnw 0/1 Completed 0 13m onap onap-aai-aai-modelloader-644b46df55-vt4gk 2/2 Running 0 13m onap onap-aai-aai-resources-745b6b4f5b-rj7lm 2/2 Running 0 13m onap onap-aai-aai-search-data-559b8dbc7f-l6cqq 2/2 Running 0 13m onap onap-aai-aai-sparky-be-75658695f5-z2xv4 2/2 Running 0 13m onap onap-aai-aai-spike-6778948986-7h7br 2/2 Running 0 13m onap onap-aai-aai-traversal-58b97f689f-jlblx 2/2 Running 0 13m onap onap-aai-aai-traversal-update-query-data-7sqt5 0/1 Completed 0 13m onap-msb 5 Fri Dec 14 15:51:42 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 18m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 18m onap 
onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 18m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 18m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 18m onap-esr 3 Fri Dec 14 15:51:40 2018 DEPLOYED esr-3.0.0 onap onap onap-esr-esr-gui-6c5ccd59d6-6brcx 1/1 Running 0 2m onap onap-esr-esr-server-5f967d4767-ctwp6 2/2 Running 0 2m onap-robot 2 Fri Dec 14 15:51:48 2018 DEPLOYED robot-3.0.0 onap onap onap-robot-robot-ddd948476-n9szh 1/1 Running 0 11m onap-multicloud 1 Fri Dec 14 15:51:43 2018 DEPLOYED multicloud-3.0.0 onap
Tiller requires wait states between deployments
There is a patch going into 3.0.1 that delays deployments by 3+ seconds so that tiller is not overloaded.
sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm
sudo vi ~/.helm/plugins/deploy/deploy.sh
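The change itself is small - a delay after each sub-release is submitted so tiller is not flooded. A minimal sketch of the idea only; the actual loop structure and variable names in deploy.sh may differ from this illustration:
# inside the per-subchart loop of ~/.helm/plugins/deploy/deploy.sh (names are illustrative)
for subchart in $subcharts; do
  helm upgrade -i "${RELEASE}-${subchart}" "${CHART_DIR}/charts/${subchart}" $FLAGS
  sleep 3   # give tiller time to register each release before submitting the next
done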
Use public-cloud.yaml override
Note: your HD/SSD, RAM and CPU configuration will drastically affect deployment. For example, if you are CPU starved, the idle load of ONAP will delay pods as more come in; network bandwidth to pull docker containers is also significant, and PV creation is sensitive to filesystem throughput/lag.
Some of the internal pod timings are optimized for certain Azure deployments.
https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/public-cloud.yaml
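Applying the override is just an extra -f argument on the deploy; for example (same form as the deployment commands used later on this page, with the path shown relative to oom/kubernetes):
sudo helm deploy onap local/onap --namespace onap -f dev.yaml \
  -f onap/resources/environments/public-cloud.yaml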
Optimizing Docker Image Pulls
Verify whether the integration docker-manifest.csv or the oom repo values.yaml files are the source of truth for image versions (ideally no override is required) - a rough comparison sketch follows below.
TSC-86
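One rough way to compare the two sources is to grep the image references out of the oom charts and diff them against the integration manifest; this is only a sketch - the values.yaml layout varies by chart, so treat the output as approximate:
# list image references from the oom charts (layout varies per chart)
grep -rhE '^\s*image:\s' --include=values.yaml oom/kubernetes | awk '{print $2}' | sort -u > oom_images.txt
# normalize the integration manifest (repo,tag -> repo:tag)
sed 's/,/:/' docker-manifest.csv | sort -u > manifest_images.txt
diff oom_images.txt manifest_images.txt | head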
Nexus Proxy
Alain Soleil pointed out the proxy page (it was using a commercial nexus3) - ONAP OOM Beijing - Hosting docker images locally - I had about 4 JIRAs on this and forgot about them.
20190121:
Answered John Lotoski for EKS and his other post on nexus3 proxy failures - it looks like an issue with a double proxy in front of dockerhub, or an issue specific to the dockerhub/registry:2 container - https://lists.onap.org/g/onap-discuss/topic/registry_issue_few_images/29285134?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29285134
Running
LOG-355
nexus3.onap.info:5000 - my private AWS nexus3 proxy of nexus3.onap.org:10001
nexus3.onap.cloud:5000 - azure public proxy - filled with casablanca (will retire after Jan 2)
nexus4.onap.cloud:5000 - azure public proxy - filled with master - and later casablanca
nexus3windriver.onap.cloud:5000 - windriver/openstack lab inside the firewall to use only for the lab - access to public is throttled
Nexus3 proxy setup - host
# from a clean ubuntu 16.04 VM
# install docker
sudo curl https://releases.rancher.com/install-docker/17.03.sh | sh
sudo usermod -aG docker ubuntu
# install nexus
mkdir -p certs
openssl req -newkey rsa:4096 -nodes -sha256 -keyout certs/domain.key -x509 -days 365 -out certs/domain.crt
Common Name (e.g. server FQDN or YOUR name) []:nexus3.onap.info
sudo nano /etc/hosts
sudo docker run -d --restart=unless-stopped --name registry -v `pwd`/certs:/certs -e REGISTRY_HTTP_ADDR=0.0.0.0:5000 -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key -e REGISTRY_PROXY_REMOTEURL=https://nexus3.onap.org:10001 -p 5000:5000 registry:2
sudo docker ps
CONTAINER ID   IMAGE        COMMAND                  CREATED         STATUS         PORTS                    NAMES
7f9b0e97eb7f   registry:2   "/entrypoint.sh /e..."   8 seconds ago   Up 7 seconds   0.0.0.0:5000->5000/tcp   registry
# test it
sudo docker login -u docker -p docker nexus3.onap.info:5000
Login Succeeded
# get images from https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=casablanca
# use for example the first line onap/aaf/aaf_agent,2.1.8
# or the prepull script in https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8
2.1.8: Pulling from onap/aaf/aaf_agent
18d680d61657: Pulling fs layer
819d6de9e493: Downloading [======================================>            ] 770.7 kB/1.012 MB
# list
sudo docker images
REPOSITORY   TAG   IMAGE ID       CREATED        SIZE
registry     2     2e2f252f3c88   3 months ago   33.3 MB
# prepull to cache images on the server - in this case the casablanca branch
sudo wget https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
sudo chmod 777 docker_prepull.sh
# prep - same as the client vms - the cert
sudo mkdir /etc/docker/certs.d
sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000
sudo cp certs/domain.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt
sudo systemctl restart docker
sudo docker login -u docker -p docker nexus3.onap.cloud:5000
# prepull
sudo nohup ./docker_prepull.sh -b casablanca -s nexus3.onap.cloud:5000 &
Nexus3 proxy usage per cluster node
The cert is attached to TSC-79.
# on each host # Cert is on TSC-79 sudo wget https://jira.onap.org/secure/attachment/13127/domain_nexus3_onap_cloud.crt # or if you already have it scp domain_nexus3_onap_cloud.crt ubuntu@ld3.onap.cloud:~/ # to avoid sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Error response from daemon: Get https://nexus3.onap.cloud:5000/v1/users/: x509: certificate signed by unknown authority # cp cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp domain_nexus3_onap_cloud.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Login Succeeded # testing # vm with the image existing - 2 sec ubuntu@ip-172-31-33-46:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # vm with layers existing except for last 5 - 5 sec ubuntu@a-cd-master:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Already exists .. 20 49e90af50c7d: Already exists .... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # clean AWS VM (clean install of docker) - no pulls yet - 45 sec for everything ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 0addb6fece63: Pulling fs layer 78e58219b215: Pulling fs layer eb6959a66df2: Pulling fs layer 321bd3fd2d0e: Pull complete ... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 ubuntu@ip-172-31-14-34:~$ sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE nexus3.onap.cloud:5000/onap/aaf/aaf_agent 2.1.8 090b326a7f11 5 weeks ago 1.14 GB # going to test a same size image directly from the LF - with minimal common layers nexus3.onap.org:10001/onap/testsuite 1.3.2 c4b58baa95e8 3 weeks ago 1.13 GB # 5 min in we are still at 3% - numbers below are a min old ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.org:10001/onap/testsuite:1.3.2 1.3.2: Pulling from onap/testsuite 32802c0cfa4d: Downloading [=============> ] 8.416 MB/32.1 MB da1315cffa03: Download complete fa83472a3562: Download complete f85999a86bef: Download complete 3eca7452fe93: Downloading [=======================> ] 8.517 MB/17.79 MB 9f002f13a564: Downloading [=========================================> ] 8.528 MB/10.24 MB 02682cf43e5c: Waiting .... 754645df4601: Waiting # in 5 min we get 3% 35/1130Mb - which comes out to 162 min for 1.13G for .org as opposed to 45 sec for .info - which is a 200X slowdown - some of this is due to the fact my nexus3.onap.info is on the same VPC as my test VM - testing on openlab # openlab - 2 min 40 sec which is 3.6 times slower - expected than in AWS - (25 min pulls vs 90min in openlab) - this makes nexus.onap.org 60 times slower in openlab than a proxy running from AWS (2 vCore/16G/ssd VM) ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pull complete ... 
acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 #pulling smaller from nexus3.onap.info 2 min 20 - for 36Mb = 0.23Mb/sec - extrapolated to 1.13Gb for above is 5022 sec or 83 min - half the rough calculation above ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.org:10001/onap/aaf/sms:3.0.1 3.0.1: Pulling from onap/aaf/sms c67f3896b22c: Pull complete ... 76eeb922b789: Pull complete Digest: sha256:d5b64947edb93848acacaa9820234aa29e58217db9f878886b7bafae00fdb436 Status: Downloaded newer image for nexus3.onap.org:10001/onap/aaf/sms:3.0.1 # conclusion - nexus3.onap.org is experiencing a routing issue from their DC outbound causing a 80-100x slowdown over a proxy nexus3 - since 20181217 - as local jenkins.onap.org builds complete faster # workaround is to use a nexus3 proxy above
and add the following to values.yaml:
global:
  #repository: nexus3.onap.org:10001
  repository: nexus3.onap.cloud:5000
  repositoryCred:
    user: docker
    password: docker
The Windriver lab also has a network issue (for example, a pull from nexus3.onap.cloud:5000 (Azure) into an AWS EC2 instance takes 45 sec for 1.1G, while the same pull into an openlab VM takes 10+ min) - therefore you need a local nexus3 proxy if you are inside the OpenStack lab - I have registered nexus3windriver.onap.cloud:5000 to a nexus3 proxy in my logging tenant - cert above.
Docker Prepull
https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
via https://gerrit.onap.org/r/#/c/74780/ (LOG-905)
git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics
cd logging-analytics
git pull ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics refs/changes/80/74780/1
ubuntu@onap-oom-obrien-rancher-e0:~$ sudo nohup ./docker_prepull.sh &
[1] 14488
ubuntu@onap-oom-obrien-rancher-e0:~$ nohup: ignoring input and appending output to 'nohup.out'
POD redeployment/undeploy/deploy
If you need to redeploy a pod due to a job timeout, a failure, or to pick up a config/code change, delete the corresponding /dockerdata-nfs subdirectory (for example onap-aai) so that a db restart does not run into existing data issues.
sudo chmod -R 777 /dockerdata-nfs
sudo rm -rf /dockerdata-nfs/onap-aai
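After clearing the NFS data, the component can be cycled by removing and redeploying just that chart; a sketch using aai as the example, assuming the component was deployed as its own release (onap-aai) by the deploy plugin as shown in the clustered install output above:
# remove the component release and its data, then redeploy only that chart
sudo helm delete --purge onap-aai
sudo rm -rf /dockerdata-nfs/onap-aai
sudo helm deploy onap local/onap --namespace onap -f dev.yaml --set aai.enabled=true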
Casablanca Deployment Examples
Deploy to 13+1 cluster
Deploy as one with deploy.sh delays and public-cloud.yaml - single 500G server AWS
sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml
where dev.yaml is the same as in resources but with all components turned on and IfNotPresent instead of Always
Deploy in sequence with validation on previous pod before proceeding - single 500G server AWS
We are not using the public-cloud.yaml override here, in order to verify just the timing between deploys - in this case each pod waits for the previous one to complete so resources are not in contention.
see update to
https://git.onap.org/logging-analytics/tree/deploy/cd.sh
https://gerrit.onap.org/r/#/c/75422
DEPLOY_ORDER_POD_NAME_ARRAY=('robot consul aaf dmaap dcaegen2 msb aai esr multicloud oof so sdc sdnc vid policy portal log vfc uui vnfsdk appc clamp cli pomba vvp contrib sniro-emulator')
# don't count completed pods
DEPLOY_NUMBER_PODS_DESIRED_ARRAY=(1 4 13 11 13 5 15 2 6 17 10 12 11 2 8 6 3 18 2 5 5 5 1 11 11 3 1)
# account for pods that have varying deploy times or replicaset sizes
# don't count the 0/1 completed pods - and skip most of the ReplicaSet instances except 1
# dcae bootstrap is problematic
DEPLOY_NUMBER_PODS_PARTIAL_ARRAY=(1 2 11 9 13 5 11 2 6 16 10 12 11 2 8 6 3 18 2 5 5 5 1 9 11 3 1)
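cd.sh uses these arrays to bring the charts up one at a time and to wait for the previous chart's pods before continuing. A simplified sketch of that sequencing idea only - the real script also handles retries and the desired/partial pod counts above, and this assumes a dev0.yaml that starts with all charts disabled:
# simplified sequencing sketch - enable one chart at a time and wait before moving on
ENABLE_FLAGS=""
for chart in robot consul aaf dmaap msb aai; do   # subset of the full order shown above
  ENABLE_FLAGS="${ENABLE_FLAGS} --set ${chart}.enabled=true"
  sudo helm deploy onap local/onap --namespace onap -f dev0.yaml ${ENABLE_FLAGS}
  # wait until at least one pod of this chart reports 1/1 or 2/2 before deploying the next
  until [ "$(kubectl get pods -n onap | grep "onap-${chart}" | grep -cE '1/1|2/2')" -ge 1 ]; do
    sleep 30
  done
done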
Deployment in sequence to Windriver Lab
Note: the Windriver OpenStack lab requires that host registration occurs against the private network 10.0.0.0/16, not the 10.12.0.0/16 public network - this is fine in Azure/AWS but not in OpenStack.
The docs will be adjusted - OOM-1550
This is bad - public IP based cluster
This is good - private IP based cluster
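The difference comes down to the CATTLE_AGENT_IP passed to the rancher agent when registering each host - it must be the 10.0.0.0/16 private address, not the 10.12.0.0/16 floating IP. For example (the registration URL and token are placeholders taken from the rancher console; the full command appears in the cluster setup listing below):
# good - register the host with its private (10.0.0.0/16) address
sudo docker run -e CATTLE_AGENT_IP="10.0.0.7" --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.11 http://10.0.16.1:8880/v1/scripts/<registration token>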
Openstack/Windriver HEAT template for 13+1 kubernetes cluster
https://jira.onap.org/secure/attachment/13010/logging_openstack_13_16g.yaml
LOG-324
see
https://gerrit.onap.org/r/74781
obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack stack create -t logging_openstack_13_16g.yaml -e logging_openstack_oom.env OOM20181216-13 +---------------------+-----------------------------------------+ | Field | Value | +---------------------+-----------------------------------------+ | id | ed6aa689-2e2a-4e75-8868-9db29607c3ba | | stack_name | OOM20181216-13 | | description | Heat template to install OOM components | | creation_time | 2018-12-16T19:42:27Z | | updated_time | 2018-12-16T19:42:27Z | | stack_status | CREATE_IN_PROGRESS | | stack_status_reason | Stack CREATE started | +---------------------+-----------------------------------------+ obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack server list +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | ID | Name | Status | Networks | Image Name | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | 7695cf14-513e-4fea-8b00-6c2a25df85d3 | onap-oom-obrien-rancher-e13 | ACTIVE | oam_onap_RNa3=10.0.0.23, 10.12.7.14 | ubuntu-16-04-cloud-amd64 | | 1b70f179-007c-4975-8e4a-314a57754684 | onap-oom-obrien-rancher-e7 | ACTIVE | oam_onap_RNa3=10.0.0.10, 10.12.7.36 | ubuntu-16-04-cloud-amd64 | | 17c77bd5-0a0a-45ec-a9c7-98022d0f62fe | onap-oom-obrien-rancher-e2 | ACTIVE | oam_onap_RNa3=10.0.0.9, 10.12.6.180 | ubuntu-16-04-cloud-amd64 | | f85e075f-e981-4bf8-af3f-e439b7b72ad2 | onap-oom-obrien-rancher-e9 | ACTIVE | oam_onap_RNa3=10.0.0.6, 10.12.5.136 | ubuntu-16-04-cloud-amd64 | | 58c404d0-8bae-4889-ab0f-6c74461c6b90 | onap-oom-obrien-rancher-e6 | ACTIVE | oam_onap_RNa3=10.0.0.19, 10.12.5.68 | ubuntu-16-04-cloud-amd64 | | b91ff9b4-01fe-4c34-ad66-6ffccc9572c1 | onap-oom-obrien-rancher-e4 | ACTIVE | oam_onap_RNa3=10.0.0.11, 10.12.7.35 | ubuntu-16-04-cloud-amd64 | | d9be8b3d-2ef2-4a00-9752-b935d6dd2dba | onap-oom-obrien-rancher-e0 | ACTIVE | oam_onap_RNa3=10.0.16.1, 10.12.7.13 | ubuntu-16-04-cloud-amd64 | | da0b1be6-ec2b-43e6-bb3f-1f0626dcc88b | onap-oom-obrien-rancher-e1 | ACTIVE | oam_onap_RNa3=10.0.0.16, 10.12.5.10 | ubuntu-16-04-cloud-amd64 | | 0ffec4d0-bd6f-40f9-ab2e-f71aa5b9fbda | onap-oom-obrien-rancher-e5 | ACTIVE | oam_onap_RNa3=10.0.0.7, 10.12.6.248 | ubuntu-16-04-cloud-amd64 | | 125620e0-2aa6-47cf-b422-d4cbb66a7876 | onap-oom-obrien-rancher-e8 | ACTIVE | oam_onap_RNa3=10.0.0.8, 10.12.6.249 | ubuntu-16-04-cloud-amd64 | | 1efe102a-d310-48d2-9190-c442eaec3f80 | onap-oom-obrien-rancher-e12 | ACTIVE | oam_onap_RNa3=10.0.0.5, 10.12.5.167 | ubuntu-16-04-cloud-amd64 | | 7c248d1d-193a-415f-868b-a94939a6e393 | onap-oom-obrien-rancher-e3 | ACTIVE | oam_onap_RNa3=10.0.0.3, 10.12.5.173 | ubuntu-16-04-cloud-amd64 | | 98dc0aa1-e42d-459c-8dde-1a9378aa644d | onap-oom-obrien-rancher-e11 | ACTIVE | oam_onap_RNa3=10.0.0.12, 10.12.6.179 | ubuntu-16-04-cloud-amd64 | | 6799037c-31b5-42bd-aebf-1ce7aa583673 | onap-oom-obrien-rancher-e10 | ACTIVE | oam_onap_RNa3=10.0.0.13, 10.12.6.167 | ubuntu-16-04-cloud-amd64 | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+
# 13+1 vms on openlab available as of 20181216 - running 2 separate clusters
# 13+1 all 16g VMs
# 4+1 all 32g VMs
# master undercloud
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh .
sudo ./oom_rancher_setup.sh -b master -s 10.12.7.13 -e onap
# master nfs
sudo wget https://jira.onap.org/secure/attachment/12887/master_nfs_node.sh
sudo chmod 777 master_nfs_node.sh
sudo ./master_nfs_node.sh 10.12.5.10 10.12.6.180 10.12.5.173 10.12.7.35 10.12.6.248 10.12.5.68 10.12.7.36 10.12.6.249 10.12.5.136 10.12.6.167 10.12.6.179 10.12.5.167 10.12.7.14
#sudo ./master_nfs_node.sh 10.12.5.162 10.12.5.198 10.12.5.102 10.12.5.4
# slaves nfs
sudo wget https://jira.onap.org/secure/attachment/12888/slave_nfs_node.sh
sudo chmod 777 slave_nfs_node.sh
sudo ./slave_nfs_node.sh 10.12.7.13
#sudo ./slave_nfs_node.sh 10.12.6.125
# test it
ubuntu@onap-oom-obrien-rancher-e4:~$ sudo ls /dockerdata-nfs/
test.sh
# remove client from master node
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes
NAME                         STATUS    ROLES     AGE       VERSION
onap-oom-obrien-rancher-e0   Ready     <none>    5m        v1.11.5-rancher1
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   heapster-7b48b696fc-2z47t              1/1       Running   0          5m
kube-system   kube-dns-6655f78c68-gn2ds              3/3       Running   0          5m
kube-system   kubernetes-dashboard-6f54f7c4b-sfvjc   1/1       Running   0          5m
kube-system   monitoring-grafana-7877679464-872zv    1/1       Running   0          5m
kube-system   monitoring-influxdb-64664c6cf5-rs5ms   1/1       Running   0          5m
kube-system   tiller-deploy-6f4745cbcf-zmsrm         1/1       Running   0          5m
# after master removal from hosts - expected no nodes
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes
error: the server doesn't have a resource type "nodes"
# slaves rancher client - 1st node
# register on the private network not the public IP
# notice the CATTLE_AGENT
sudo docker run -e CATTLE_AGENT_IP="10.0.0.7" --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.11 http://10.0.16.1:8880/v1/scripts/5A5E4F6388A4C0A0F104:1514678400000:9zpsWeGOsKVmWtOtoixAUWjPJs
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes
NAME                         STATUS    ROLES     AGE       VERSION
onap-oom-obrien-rancher-e1   Ready     <none>    0s        v1.11.5-rancher1
# add the other nodes
# the 4 node 32g = 128g cluster
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes
NAME                         STATUS    ROLES     AGE       VERSION
onap-oom-obrien-rancher-e1   Ready     <none>    1h        v1.11.5-rancher1
onap-oom-obrien-rancher-e2   Ready     <none>    4m        v1.11.5-rancher1
onap-oom-obrien-rancher-e3   Ready     <none>    5m        v1.11.5-rancher1
onap-oom-obrien-rancher-e4   Ready     <none>    3m        v1.11.5-rancher1
# the 13 node 16g = 208g cluster
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl top nodes
NAME                          CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
onap-oom-obrien-rancher-e1    208m         2%        2693Mi          16%
onap-oom-obrien-rancher-e10   38m          0%        1083Mi          6%
onap-oom-obrien-rancher-e11   36m          0%        1104Mi          6%
onap-oom-obrien-rancher-e12   57m          0%        1070Mi          6%
onap-oom-obrien-rancher-e13   116m         1%        1017Mi          6%
onap-oom-obrien-rancher-e2    73m          0%        1361Mi          8%
onap-oom-obrien-rancher-e3    62m          0%        1099Mi          6%
onap-oom-obrien-rancher-e4    74m          0%        1370Mi          8%
onap-oom-obrien-rancher-e5    37m          0%        1104Mi          6%
onap-oom-obrien-rancher-e6    55m          0%        1125Mi          7%
onap-oom-obrien-rancher-e7    42m          0%        1102Mi          6%
onap-oom-obrien-rancher-e8    53m          0%        1090Mi          6%
onap-oom-obrien-rancher-e9    52m          0%        1072Mi          6%
Installing ONAP via cd.sh
The cluster hosting kubernetes is up with 13+1 nodes and 2 network interfaces (the private 10.0.0.0/16 subnet and the 10.12.0.0/16 public subnet)
Verify kubernetes hosts are ready
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes
NAME                          STATUS    ROLES     AGE       VERSION
onap-oom-obrien-rancher-e1    Ready     <none>    2h        v1.11.5-rancher1
onap-oom-obrien-rancher-e10   Ready     <none>    25m       v1.11.5-rancher1
onap-oom-obrien-rancher-e11   Ready     <none>    20m       v1.11.5-rancher1
onap-oom-obrien-rancher-e12   Ready     <none>    5m        v1.11.5-rancher1
onap-oom-obrien-rancher-e13   Ready     <none>    1m        v1.11.5-rancher1
onap-oom-obrien-rancher-e2    Ready     <none>    2h        v1.11.5-rancher1
onap-oom-obrien-rancher-e3    Ready     <none>    1h        v1.11.5-rancher1
onap-oom-obrien-rancher-e4    Ready     <none>    1h        v1.11.5-rancher1
onap-oom-obrien-rancher-e5    Ready     <none>    1h        v1.11.5-rancher1
onap-oom-obrien-rancher-e6    Ready     <none>    46m       v1.11.5-rancher1
onap-oom-obrien-rancher-e7    Ready     <none>    40m       v1.11.5-rancher1
onap-oom-obrien-rancher-e8    Ready     <none>    37m       v1.11.5-rancher1
onap-oom-obrien-rancher-e9    Ready     <none>    26m       v1.11.5-rancher1
Openstack parameter overrides
# manually check out 3.0.0-ONAP (the script is written for branches like casablanca)
sudo git clone -b 3.0.0-ONAP http://gerrit.onap.org/r/oom
sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm
# fix tiller bug
sudo nano ~/.helm/plugins/deploy/deploy.sh
# modify dev.yaml with the logging-rc file openstack parameters - appc, sdnc and so
sudo cp logging-analytics/deploy/cd.sh .
sudo cp oom/kubernetes/onap/resources/environments/dev.yaml .
sudo nano dev.yaml
ubuntu@onap-oom-obrien-rancher-0:~/oom/kubernetes/so/resources/config/mso$ echo -n "Whq..jCLj" | openssl aes-128-ecb -e -K `cat encryption.key` -nosalt | xxd -c 256 -p
bdaee....c60d3e09
# so server configuration
config:
  openStackUserName: "michael_o_brien"
  openStackRegion: "RegionOne"
  openStackKeyStoneUrl: "http://10.12.25.2:5000"
  openStackServiceTenantName: "service"
  openStackEncryptedPasswordHere: "bdaee....c60d3e09"
Deploy all or a subset of ONAP
# copy dev.yaml to dev0.yaml
# bring up all of onap in sequence or adjust the list for a subset specific to the vFW - assumes you already cloned oom
sudo nohup ./cd.sh -b 3.0.0-ONAP -e onap -p false -n nexus3.onap.org:10001 -f true -s 900 -c false -d true -w false -r false &
#sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml
The load is distributed across the cluster even for individual pods like dmaap
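This can be checked directly - the -o wide output shows which node each pod landed on, with dmaap as the example:
kubectl get pods --all-namespaces -o wide | grep dmaap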
Verify the ONAP installation
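The usual verification is the robot health check from the oom repo (the same ete-k8s.sh script used for the vFWCL steps later on this page), plus a scan for pods stuck in a failed state:
# run the robot health check against the onap namespace
oom/kubernetes/robot/ete-k8s.sh onap health
# confirm no pods are stuck in a failed or partially-up state
kubectl get pods --all-namespaces | grep -E '0/|1/2|CrashLoopBackOff|Error'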
vFW vFirewall Workarounds
From Alexis Chiarello, currently being verified 20190125 - these are for the heat environment, not the kubernetes one - following Casablanca Stability Testing Instructions.
20181213 - thank you Alexis and Beejal Shah. Something else I forgot to mention: I did change the heat templates to adapt for our Ubuntu images in our env (to enable additional NICs, eth2/eth3) and also disable the gateway by default on the 2 additional subnets created. See attached for the modified files. Cheers, Alexis.
sudo chmod 777 master_nfs_node.sh
I reran the vFWCL use case in my re-installed Casablanca lab and here is what I had to manually do post-install (the following are for heat, not kubernetes):
- fix the Robot "robot-eteshare-configmap" config map and adjust values that did not match my env (onap_private_subnet_id, sec_group, dcae_collector_ip, Ubuntu image names, etc...)
- make sure to push the policies from pap (PRELOAD_POLICIES=true, then run config/push-policies.sh from the /tmp/policy-install folder)
That's it - in my case, with the above, the vFWCL closed loop works just fine and I am able to see APP-C processing the modifyConfig event and changing the number of streams using netconf to the packet generator. Cheers, Alexis.
Full Entrypoint Install
Two choices: run the single oom_deployment.sh via your ARM/CloudFormation/Heat template wrapper as a one-click install, or use the 2-step procedure above.
entrypoint aws/azure/openstack | Ubuntu 16 rancher install | oom deployment CD script | |
---|---|---|---|
Remove a Deployment
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n57
see also OOM-1463
This is required for a couple of pods that leave leftover resources, and for the secondary cloudify out-of-band orchestration in DCAEGEN2.
OOM-1089
DCAEGEN2-1067
DCAEGEN2-1068
sudo helm undeploy $ENVIRON --purge
kubectl delete namespace onap
sudo helm delete --purge onap
kubectl delete pv --all
kubectl delete pvc --all
kubectl delete secrets --all
kubectl delete clusterrolebinding --all
sudo rm -rf /dockerdata-nfs/onap-<pod>
# or for a single pod
kubectl delete pod $ENVIRON-aaf-sms-vault-0 -n $ENVIRON --grace-period=0 --force
Using ONAP
Accessing the portal
Access the ONAP portal via the 8989 LoadBalancer that Mandeep Khinda merged in for OOM-633, documented at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_user_guide.html#accessing-the-onap-portal-using-oom-and-a-kubernetes-cluster
ubuntu@a-onap-devopscd:~$ kubectl -n onap get services | grep "portal-app"
portal-app   LoadBalancer   10.43.145.94   13.68.113.105   8989:30215/TCP,8006:30213/TCP,8010:30214/TCP,8443:30225/TCP   20h
In the case of connecting to openlab through the VPN from your mac, you would need the 2nd number - which will be something like 10.0.0.12 - but use the public IP corresponding to this private network IP, which only for this case is the e1 instance with 10.12.7.7 as the externally routable IP.
Add the following entries to your client's /etc/hosts, prefixed with the IP above.
# in this case I am using the public 13... ip (elastic or generated public ip) - AWS in this example
13.68.113.105 portal.api.simpledemo.onap.org
13.68.113.105 vid.api.simpledemo.onap.org
13.68.113.105 sdc.api.fe.simpledemo.onap.org
13.68.113.105 portal-sdk.simpledemo.onap.org
13.68.113.105 policy.api.simpledemo.onap.org
13.68.113.105 aai.api.sparky.simpledemo.onap.org
13.68.113.105 cli.api.simpledemo.onap.org
13.68.113.105 msb.api.discovery.simpledemo.onap.org
launch
http://portal.api.simpledemo.onap.org:8989/ONAPPORTAL/login.htm
login with demo user
Accessing MariaDB portal container
kubectl -n onap exec -it dev-portal-portal-db-b8db58679-q9pjq -- mysql -D mysql -h localhost -e 'select * from user'
see
PORTAL-399 and PORTAL-498
Running the vFirewall
Casablanca Stability Testing Instructions
# verifying on ld.onap.cloud 20190126
oom/kubernetes/robot/demo-k8s.sh onap init
Initialize Customer And Models   | FAIL |
ConnectionError: HTTPConnectionPool(host='1.2.3.4', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efd0f8a4ad0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
# push sample vFWCL policies
PAP_POD=$(kubectl --namespace onap get pods | grep policy-pap | sed 's/ .*//')
kubectl exec -it $PAP_POD -n onap -c pap -- bash -c 'export PRELOAD_POLICIES=true; /tmp/policy-install/config/push-policies.sh'
# ete instantiateDemoVFWC
/root/oom/kubernetes/robot/ete-k8s.sh onap instantiateDemoVFWCL
# restart drools
kubectl delete pod dev-policy-drools-0 -n onap
# wait for policy to kick in
sleep 20m
# demo vfwclosedloop
/root/oom/kubernetes/robot/demo-k8s.sh onap vfwclosedloop $PNG_IP
# check the sink on 667
Deployment Profile
For a view of the system see Log Streaming Compliance and API
Minimum Single VM Deployment
A single 122g R4.4xlarge VM in progress
see also LOG-630
helm install will bring up everything without the configmap failure - but the release is busted - pods come up though
ubuntu@ip-172-31-27-63:~$ sudo helm install local/onap -n onap --namespace onap -f onap/resources/environments/disable-allcharts.yaml --set aai.enabled=true --set dmaap.enabled=true --set log.enabled=true --set policy.enabled=true --set portal.enabled=true --set robot.enabled=true --set sdc.enabled=true --set sdnc.enabled=true --set so.enabled=true --set vid.enabled=true
deployment | containers | |
---|---|---|
minimum (no vfwCL) | ||
medium (vfwCL) | ||
full |
Container Issues
20180901
amdocs@ubuntu:~/_dev/oom/kubernetes$ kubectl get pods --all-namespaces | grep 0/1
onap   onap-aai-champ-68ff644d85-mpkb9                         0/1   Running                 0     1d
onap   onap-pomba-kibana-d76b6dd4c-j4q9m                       0/1   Init:CrashLoopBackOff   472   1d
amdocs@ubuntu:~/_dev/oom/kubernetes$ kubectl get pods --all-namespaces | grep 1/2
onap   onap-aai-gizmo-856f86d664-mf587                         1/2   CrashLoopBackOff        568   1d
onap   onap-pomba-networkdiscovery-85d76975b7-w9sjl            1/2   CrashLoopBackOff        573   1d
onap   onap-pomba-networkdiscoveryctxbuilder-c89786dfc-rtdqc   1/2   CrashLoopBackOff        569   1d
onap   onap-vid-84c88db589-vbfht                               1/2   CrashLoopBackOff        616   1d
# with clamp and pomba enabled (ran clamp first)
amdocs@ubuntu:~/_dev/oom/kubernetes$ sudo helm upgrade -i onap local/onap --namespace onap -f dev.yaml
Error: UPGRADE FAILED: failed to create resource: Service "pomba-kibana" is invalid: spec.ports[0].nodePort: Invalid value: 30234: provided port is already allocated
Full ONAP Cluster
see the AWS cluster install below
Requirements
Hardware Requirements
VMs | RAM | HD | vCores | Ports | Network |
---|---|---|---|---|---|
1 | 55-70G at startup | 40G per host min (30G for dockers) 100G after a week 5G min per NFS 4GBPS peak | (need to reduce 152 pods to 110) 8 min 60 peak at startup recommended 16-64 vCores | see list on PortProfile Recommend 0.0.0.0/0 (all open) inside VPC Block 10249-10255 outside secure 8888 with oauth | 170 MB/sec peak 1200 |
3+ | 85G Recommend min 3 x 64G class VMs Try for 4 | master: 40G hosts: 80G (30G of dockers) NFS: 5G | 24 to 64 | ||
This is a snapshot of the CD system running on Amazon AWS at http://jenkins.onap.info/job/oom-cd-master/. It is a 1 + 4 node cluster composed of four 64G/8vCore R4.2xLarge VMs.
Amazon AWS
Account Provider: (2) Robin of Amazon and Michael O'Brien of Amdocs
Amazon has donated an allocation of 512G of VM space (a large 4 x 122G/16vCore cluster and a secondary 9 x 16G cluster) in order to run CD systems since Dec 2017 - a cost savings of at least $500/month - thank you very much Amazon for supporting ONAP. See example max/med allocations for IT/Finance in ONAP Deployment Specification for Finance and Operations#AmazonAWS
Amazon AWS is currently hosting our RI for ONAP Continuous Deployment - this is a joint Proof Of Concept between Amazon and ONAP.
Auto Continuous Deployment via Jenkins and Kibana
AWS CLI Installation
Install the AWS CLI on the bastion VM
https://docs.aws.amazon.com/cli/latest/userguide/cli-install-macos.html
OSX
obrien:obrienlabs amdocs$ pip --version
pip 9.0.1 from /Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg (python 2.7)
obrien:obrienlabs amdocs$ curl -O https://bootstrap.pypa.io/get-pip.py
obrien:obrienlabs amdocs$ python3 get-pip.py --user
Requirement already up-to-date: pip in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
obrien:obrienlabs amdocs$ pip3 install awscli --upgrade --user
Successfully installed awscli-1.14.41 botocore-1.8.45 pyasn1-0.4.2 s3transfer-0.1.13
Ubuntu
obrien:obrienlabs amdocs$ ssh ubuntu@<your domain/ip>
$ sudo apt install python-pip
$ pip install awscli --upgrade --user
$ aws --version
aws-cli/1.14.41 Python/2.7.12 Linux/4.4.0-1041-aws botocore/1.8.45
Windows Powershell
Configure Access Keys for your Account
$ aws configure
AWS Access Key ID [None]: AK....Q
AWS Secret Access Key [None]: Dl....l
Default region name [None]: us-east-1
Default output format [None]: json
$ aws ec2 describe-regions --output table
|| ec2.ca-central-1.amazonaws.com | ca-central-1 ||
....
Option 0: Deploy OOM Kubernetes to a spot VM
Peak Performance Metrics
We hit a peak of 44 cores during startup, with an external network peak of 1.2 Gbps (throttled nexus servers at ONAP), a peak SSD write rate of 4 Gbps and 55G RAM on a 64 vCore/256G VM on AWS Spot.
Kubernetes Installation via CLI
Allocate an EIP static public IP (one-time)
https://docs.aws.amazon.com/cli/latest/reference/ec2/allocate-address.html
$ aws ec2 allocate-address
{
  "PublicIp": "35.172..",
  "Domain": "vpc",
  "AllocationId": "eipalloc-2f743..."
}
Create a Route53 Record Set - Type A (one-time)
$ cat route53-a-record-change-set.json
{"Comment": "comment","Changes": [
  { "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "amazon.onap.cloud",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [ { "Value": "35.172.36.." } ]}}]}
$ aws route53 change-resource-record-sets --hosted-zone-id Z...7 --change-batch file://route53-a-record-change-set.json
{ "ChangeInfo": {
    "Status": "PENDING",
    "Comment": "comment",
    "SubmittedAt": "2018-02-17T15:02:46.512Z",
    "Id": "/change/C2QUNYTDVF453x" }}
$ dig amazon.onap.cloud
; <<>> DiG 9.9.7-P3 <<>> amazon.onap.cloud
amazon.onap.cloud.  300     IN  A   35.172.36..
onap.cloud.         172800  IN  NS  ns-1392.awsdns-46.org.
Request a spot EC2 Instance
# request the usually cheapest $0.13 spot 64G EBS instance at AWS
aws ec2 request-spot-instances --spot-price "0.25" --instance-count 1 --type "one-time" --launch-specification file://aws_ec2_spot_cli.json
# don't pass in the following - it will be generated for the EBS volume
"SnapshotId": "snap-0cfc17b071e696816"

# launch specification json
{
  "ImageId": "ami-c0ddd64ba",
  "InstanceType": "r4.2xlarge",
  "KeyName": "obrien_systems_aws_2015",
  "BlockDeviceMappings": [
    { "DeviceName": "/dev/sda1",
      "Ebs": { "DeleteOnTermination": true, "VolumeType": "gp2", "VolumeSize": 120 } }
  ],
  "SecurityGroupIds": [ "sg-322c4nnn42" ]
}

# results
{
  "SpotInstanceRequests": [{
    "Status": {
      "Message": "Your Spot request has been submitted for review, and is pending evaluation.",
      "Code": "pending-evaluation",
Get EC2 instanceId after creation
aws ec2 describe-spot-instance-requests --spot-instance-request-id sir-1tyr5etg
"InstanceId": "i-02a653592cb748e2x",
Associate EIP with EC2 Instance
This can be done separately as long as it happens in the first 30 seconds of initialization, before rancher starts on the instance.
$ aws ec2 associate-address --instance-id i-02a653592cb748e2x --allocation-id eipalloc-375c1d0x
{ "AssociationId": "eipassoc-a4b5a29x" }
Reboot EC2 Instance to apply DNS change to Rancher in AMI
$aws ec2 reboot-instances --instance-ids i-02a653592cb748e2x
Clustered Deployment
look at https://github.com/kubernetes-incubator/external-storage
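As a sketch of the external-storage approach (the chart name, the old stable helm repo URL, and the release name are assumptions - not part of the scripted install on this page), an NFS client provisioner can be pointed at the EFS/NFS share like this:

# hypothetical example - helm 2 syntax, chart availability in the stable repo is an assumption
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install stable/nfs-client-provisioner --name nfs-provisioner \
  --set nfs.server=fs-43b2763a.efs.us-east-2.amazonaws.com \
  --set nfs.path=/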
EC2 Cluster Creation
EFS share for shared NFS
"From the NFS wizard"
Setting up your EC2 instance
- Using the Amazon EC2 console, associate your EC2 instance with a VPC security group that enables access to your mount target. For example, if you assigned the "default" security group to your mount target, you should assign the "default" security group to your EC2 instance.
- Open an SSH client and connect to your EC2 instance.
- If you're not using the EFS mount helper, install the NFS client on your EC2 instance. On an Ubuntu instance:
sudo apt-get install nfs-common
Mounting your file system
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
- Create a new directory on your EC2 instance, such as "efs".
- sudo mkdir efs
- Mount your file system. If you require encryption of data in transit, use the EFS mount helper and the TLS mount option.
- Using the EFS mount helper:
sudo mount -t efs fs-43b2763a:/ efs
- Using the EFS mount helper and encryption of data in transit:
sudo mount -t efs -o tls fs-43b2763a:/ efs
- Using the NFS client:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-43b2763a.efs.us-east-2.amazonaws.com:/ efs
If you are unable to connect, see our troubleshooting documentation.
https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html
EFS/NFS Provisioning Script for AWS
https://git.onap.org/logging-analytics/tree/deploy/aws/oom_cluster_host_install.sh
ubuntu@ip-172-31-19-239:~$ sudo git clone https://gerrit.onap.org/r/logging-analytics
Cloning into 'logging-analytics'...
ubuntu@ip-172-31-19-239:~$ sudo cp logging-analytics/deploy/aws/oom_cluster_host_install.sh .
ubuntu@ip-172-31-19-239:~$ sudo ./oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-0000001b -r us-west-1 -t 5EA8A:15000:MWcEyoKw -c true -v
# fix helm after adding nodes to the master
ubuntu@ip-172-31-31-219:~$ sudo helm init --upgrade
$HELM_HOME has been configured at /home/ubuntu/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
ubuntu@ip-172-31-31-219:~$ sudo helm repo add local http://127.0.0.1:8879
"local" has been added to your repositories
ubuntu@ip-172-31-31-219:~$ sudo helm repo list
NAME    URL
stable  https://kubernetes-charts.storage.googleapis.com
local   http://127.0.0.1:8879
4 Node Kubernetes Cluster on AWS
Notice that we are vCore bound. Ideally we need 64 vCores for a minimal production system.
Client Install
# setup the master
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap
# manually delete the host that was installed on the master - in the rancher gui for now
# run without a client on the master
sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true
ls /dockerdata-nfs/
onap  test.sh
# run the script from git on each cluster node
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true
# check a node
ls /dockerdata-nfs/
onap  test.sh
sudo docker ps
CONTAINER ID  IMAGE                            COMMAND                 CREATED         STATUS                 PORTS  NAMES
6e4a57e19c39  rancher/healthcheck:v0.3.3       "/.r/r /rancher-en..."  1 second ago    Up Less than a second         r-healthcheck-healthcheck-5-f0a8f5e8
f9bffc6d9b3e  rancher/network-manager:v0.7.19  "/rancher-entrypoi..."  1 second ago    Up 1 second                   r-network-services-network-manager-5-103f6104
460f31281e98  rancher/net:holder               "/.r/r /rancher-en..."  4 seconds ago   Up 4 seconds                  r-ipsec-ipsec-5-2e22f370
3e30b0cf91bb  rancher/agent:v1.2.9             "/run.sh run"           17 seconds ago  Up 16 seconds                 rancher-agent
# On the master - fix helm after adding nodes to the master
sudo helm init --upgrade
$HELM_HOME has been configured at /home/ubuntu/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
sudo helm repo add local http://127.0.0.1:8879
# check the cluster on the master
kubectl top nodes
NAME                                         CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
ip-172-31-16-85.us-west-1.compute.internal   129m        3%    1805Mi         5%
ip-172-31-25-15.us-west-1.compute.internal   43m         1%    1065Mi         3%
ip-172-31-28-145.us-west-1.compute.internal  40m         1%    1049Mi         3%
ip-172-31-21-240.us-west-1.compute.internal  30m         0%    965Mi          3%
# important: secure your rancher cluster by adding an oauth github account - to keep out crypto miners
http://cluster.onap.info:8880/admin/access/github
# now back to master to install onap
# get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters
sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml
sudo cp logging-analytics/deploy/cd.sh .
sudo ./cd.sh -b master -e onap -c true -d true -w false -r false
136 pending > 0 at the 1th 15 sec interval
ubuntu@ip-172-31-28-152:~$ kubectl get pods -n onap | grep -E '1/1|2/2' | wc -l
20
120 pending > 0 at the 39th 15 sec interval
ubuntu@ip-172-31-28-152:~$ kubectl get pods -n onap | grep -E '1/1|2/2' | wc -l
47
99 pending > 0 at the 93th 15 sec interval
# after an hour most of the 136 containers should be up
kubectl get pods --all-namespaces | grep -E '0/|1/2'
onap  onap-aaf-cs-59954bd86f-vdvhx     0/1  CrashLoopBackOff  7   37m
onap  onap-aaf-oauth-57474c586c-f9tzc  0/1  Init:1/2          2   37m
onap  onap-aai-champ-7d55cbb956-j5zvn  0/1  Running           0   37m
onap  onap-drools-0                    0/1  Init:0/1          0   1h
onap  onap-nexus-54ddfc9497-h74m2      0/1  CrashLoopBackOff  17  1h
onap  onap-sdc-be-777759bcb9-ng7zw     1/2  Running           0   1h
onap  onap-sdc-es-66ffbcd8fd-v8j7g     0/1  Running           0   1h
onap  onap-sdc-fe-75fb4965bd-bfb4l     0/2  Init:1/2          6   1h
# cpu bound - a small cluster has 4x4 cores - try to run with 4x16 cores
ubuntu@ip-172-31-28-152:~$ kubectl top nodes
NAME                                         CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
ip-172-31-28-145.us-west-1.compute.internal  3699m       92%   26034Mi        85%
ip-172-31-21-240.us-west-1.compute.internal  3741m       93%   3872Mi         12%
ip-172-31-16-85.us-west-1.compute.internal   3997m       99%   23160Mi        75%
ip-172-31-25-15.us-west-1.compute.internal   3998m       99%   27076Mi        88%
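A minimal readiness-watch sketch (assuming the onap namespace and a bash shell - it just repeats the same grep patterns cd.sh reports above, so treat the helper itself as hypothetical):

#!/bin/bash
# hypothetical helper - report ready vs pending pod counts every 15s until nothing is pending
while true; do
  PENDING=$(kubectl get pods -n onap | grep -E '0/|1/2' | wc -l)
  READY=$(kubectl get pods -n onap | grep -E '1/1|2/2' | wc -l)
  echo "$(date +%H:%M:%S) ready=$READY pending=$PENDING"
  [ "$PENDING" -eq 0 ] && break
  sleep 15
done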
13 Node Kubernetes Cluster on AWS
Node: R4.large (2 cores, 16g)
Notice that we are vCore bound. Ideally we need 64 vCores for a minimal production system - this runs with 12 x 4 vCores = 48.
30 min after the helm install starts, the DCAE containers come up at 55.
ssh ubuntu@ld.onap.info
# setup the master
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap
# manually delete the host that was installed on the master - in the rancher gui for now
# get the token for use with the EFS/NFS share
ubuntu@ip-172-31-8-245:~$ cat ~/.kube/config | grep token
    token: "QmFzaWMgTVVORk4wRkdNalF3UXpNNE9E.........RtNWxlbXBCU0hGTE1reEJVamxWTjJ0Tk5sWlVjZz09"
# run without a client on the master
ubuntu@ip-172-31-8-245:~$ sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s ld.onap.info -e fs-....eb -r us-east-2 -t QmFzaWMgTVVORk4wRkdNalF3UX..........aU1dGSllUVkozU0RSTmRtNWxlbXBCU0hGTE1reEJVamxWTjJ0Tk5sWlVjZz09 -c true -v true
ls /dockerdata-nfs/
onap  test.sh
# run the script from git on each cluster node
sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true
ubuntu@ip-172-31-8-245:~$ kubectl top nodes
NAME                                         CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
ip-172-31-14-254.us-east-2.compute.internal  45m         1%    1160Mi         7%
ip-172-31-3-195.us-east-2.compute.internal   29m         0%    1023Mi         6%
ip-172-31-2-105.us-east-2.compute.internal   31m         0%    1004Mi         6%
ip-172-31-0-159.us-east-2.compute.internal   30m         0%    1018Mi         6%
ip-172-31-12-122.us-east-2.compute.internal  34m         0%    1002Mi         6%
ip-172-31-0-197.us-east-2.compute.internal   30m         0%    1015Mi         6%
ip-172-31-2-244.us-east-2.compute.internal   123m        3%    2032Mi         13%
ip-172-31-11-30.us-east-2.compute.internal   38m         0%    1142Mi         7%
ip-172-31-9-203.us-east-2.compute.internal   33m         0%    998Mi          6%
ip-172-31-1-101.us-east-2.compute.internal   32m         0%    996Mi          6%
ip-172-31-9-128.us-east-2.compute.internal   31m         0%    1037Mi         6%
ip-172-31-3-141.us-east-2.compute.internal   30m         0%    1011Mi         6%
# now back to master to install onap
# get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters
sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml
sudo cp logging-analytics/deploy/cd.sh .
sudo ./cd.sh -b master -e onap -c true -d true -w false -r false
# after an hour most of the 136 containers should be up
kubectl get pods --all-namespaces | grep -E '0/|1/2'
Amazon EKS Cluster for ONAP Deployment
- LOG-554
- LOG-939
follow
https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html
https://aws.amazon.com/getting-started/projects/deploy-kubernetes-app-amazon-eks/
follow the VPC CNI plugin - https://aws.amazon.com/blogs/opensource/vpc-cni-plugin-v1-1-available/
and the 20190121 discussion with John Lotoski on AWS EFS/NFS and Rancher 2.2: https://lists.onap.org/g/onap-discuss/topic/aws_efs_nfs_and_rancher_2_2/29382184?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29382184
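As a sketch of standing up a small EKS cluster from the bastion (the eksctl tool, cluster name, region and node sizing are assumptions - the linked getting-started guide and the CloudFormation path below remain the reference):

# hypothetical example - create a 4-node EKS cluster and confirm the nodes register
eksctl create cluster --name onap-eks --region us-east-1 \
  --node-type r4.2xlarge --nodes 4
kubectl get nodes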
Network Diagram
Standard ELB and public/private VPC
Create EKS cluster
Provision access to EKS cluster
Kubernetes Installation via CloudFormation
ONAP Installation
SSH and upload OOM
oom_rancher_install.sh is in OOM-715 under https://gerrit.onap.org/r/#/c/32019/
Run OOM
see OOM-710
cd.sh is in OOM-716 under https://gerrit.onap.org/r/#/c/32653/
Scenario: installing Rancher on clean Ubuntu 16.04 64g VM (single collocated server/host) and the master branch of onap via OOM deployment (2 scripts)
1 hour video of automated installation on an AWS EC2 spot instance
Run Healthcheck
Run Automated Robot parts of vFirewall VNF
Report Results
Stop Spot Instance
$ aws ec2 terminate-instances --instance-ids i-0040425ac8c0d8f6x
{
  "TerminatingInstances": [{
    "InstanceId": "i-0040425ac8c0d8f63",
    "CurrentState": { "Code": 32, "Name": "shutting-down" },
    "PreviousState": { "Code": 16, "Name": "running" }
  }]
}
Verify Instance stopped
Video on Installing and Running the ONAP Demos#ONAPDeploymentVideos
We can run ONAP on an AWS EC2 instance for $0.17/hour as opposed to Rackspace at $1.12/hour for a 64G Ubuntu host VM.
I have created an AMI on Amazon AWS under the following ID that has a reference 20170825 tag of ONAP 1.0 running on top of Rancher
ami-b8f3f3c3 : onap-oom-k8s-10
EIP 34.233.240.214 maps to http://dev.onap.info:8880/env/1a7/infra/hosts
A D2.2xlarge with 61G ram on the spot market https://console.aws.amazon.com/ec2sp/v1/spot/launch-wizard?region=us-east-1 at $0.16/hour for all of ONAP
It may take up to 3-8 min for kubernetes pods to initialize as long as you preload the docker images - see OOM-328
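A minimal pre-pull sketch (the manifest file name, its one-image-per-line format, and the nexus proxy host are placeholders for whatever image list your branch uses - adapt before running):

#!/bin/bash
# hypothetical pre-pull loop - pull every image listed in a local manifest so pod startup
# does not wait on the remote nexus3 proxy
NEXUS=nexus3.onap.org:10001
while read -r IMAGE; do
  sudo docker pull "$NEXUS/$IMAGE" &
done < docker-image-list.txt
wait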
Workaround for the disk space error - even though we are running with a 1.9 TB NVMe SSD
https://github.com/kubernetes/kubernetes/issues/48703
Use a flavor that uses EBS like M4.4xLarge, which is OK - except for AAI right now
Expected Monthly Billing
r4.2xlarge is the smallest and most cost-effective 64G-minimum instance to use for a full ONAP deployment - it requires EBS stores. This assumes 1 instance up at all times and a couple of ad-hoc instances up a couple of hours for testing/experimentation.
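To sanity-check current spot pricing before committing to a flavor, something like the following can be used (region and instance type are just the examples used on this page):

# check recent spot prices for r4.2xlarge Linux instances in us-east-1
aws ec2 describe-spot-price-history \
  --instance-types r4.2xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --region us-east-1 --output table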
Option 1: Migrating Heat to CloudFormation
Resource Correspondence
ID | Type | Parent | AWS | Openstack |
---|---|---|---|---|
Using the CloudFormationDesigner
https://console.aws.amazon.com/cloudformation/designer/home?region=us-east-1#
Decoupling and Abstracting Southbound Orchestration via Plugins
Part of getting another infrastructure provider like AWS to work with ONAP will be in identifying and decoupling southbound logic from any particular cloud provider using an extensible plugin architecture on the SBI interface.
see Multi VIM/Cloud (5/11/17), VID project (5/17/17), Service Orchestrator (5/14/17), ONAP Operations Manager (5/10/17), ONAP Operations Manager / ONAP on Containers
Design Issues
DI 1: Refactor nested orchestration in DCAE
Replace the DCAE Controller
DI 2: Elastic IP allocation
DI 3: Investigate Cloudify plugin for AWS
Cloudify is Tosca based - https://github.com/cloudify-cosmo/cloudify-aws-plugin
DI 4: 20180803 Investigate ISTIO service mesh
https://istio.io/docs/setup/kubernetes/quick-start/
- LOG-592
Links
Waiting for the EC2 C5 instance types under the C620 chipset to arrive at AWS so we can experiment under EC2 Spot - http://technewshunter.com/cpus/intel-launches-xeon-w-cpus-for-workstations-skylake-sp-ecc-for-lga2066-41771/ https://aws.amazon.com/about-aws/whats-new/2016/11/coming-soon-amazon-ec2-c5-instances-the-next-generation-of-compute-optimized-instances/
http://docs.aws.amazon.com/cli/latest/userguide/cli-install-macos.html
use
curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" unzip awscli-bundle.zip sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws aws --version aws-cli/1.11.170 Python/2.7.13 Darwin/16.7.0 botocore/1.7.28
EC2 VMs
AWS Clustered Deployment
AWS EC2 Cluster Creation
AWS EFS share for shared NFS
You need an NFS share between the VM's in your Kubernetes cluster - an Elastic File System share will wrap NFS
"From the NFS wizard"
Setting up your EC2 instance
- Using the Amazon EC2 console, associate your EC2 instance with a VPC security group that enables access to your mount target. For example, if you assigned the "default" security group to your mount target, you should assign the "default" security group to your EC2 instance. Learn more
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
If you're not using the EFS mount helper, install the NFS client on your EC2 instance:- On an Ubuntu instance:
sudo apt-get install nfs-common
- On an Ubuntu instance:
Mounting your file system
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
- Create a new directory on your EC2 instance, such as "efs".
- sudo mkdir efs
- Mount your file system. If you require encryption of data in transit, use the EFS mount helper and the TLS mount option. Mounting considerations
- Using the EFS mount helper:
sudo mount -t efs fs-43b2763a:/ efs - Using the EFS mount helper and encryption of data in transit:
sudo mount -t efs -o tls fs-43b2763a:/ efs - Using the NFS client:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-43b2763a.efs.us-east-2.amazonaws.com:/ efs
- Using the EFS mount helper:
If you are unable to connect, see our troubleshooting documentation.
https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html
Automated
Manual
ubuntu@ip-10-0-0-66:~$ sudo apt-get install nfs-common
ubuntu@ip-10-0-0-66:~$ cd /
ubuntu@ip-10-0-0-66:~$ sudo mkdir /dockerdata-nfs
root@ip-10-0-0-19:/# sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-43b2763a.efs.us-east-2.amazonaws.com:/ /dockerdata-nfs
# write something on one vm - and verify it shows on another
ubuntu@ip-10-0-0-8:~$ ls /dockerdata-nfs/
test.sh
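To make the mount survive reboots, an /etc/fstab entry along these lines can be added (the filesystem id and region are the example values from above; the exact mount options are an assumption mirroring the manual mount):

# append an NFS entry for the EFS share so /dockerdata-nfs remounts on boot
echo "fs-43b2763a.efs.us-east-2.amazonaws.com:/ /dockerdata-nfs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0" | sudo tee -a /etc/fstab
sudo mount -a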
Microsoft Azure
Subscription Sponsor: (1) Microsoft
VMs
Deliverables are deployment scripts, arm/cli templates for various deployment scenarios (single, multiple, federated servers)
In review: OOM-711
Quickstart
Single collocated VM
Automation is currently only written for a single VM that hosts both the rancher server and the deployed onap pods. Use the ARM template below to deploy your VM and provision it (adjust your config parameters).
Two choices: run the single oom_deployment.sh ARM wrapper, or use it to bring up an empty VM and run oom_entrypoint.sh manually. Once the VM comes up, the oom_entrypoint.sh script runs - it downloads the oom_rancher_setup.sh script to set up docker, rancher, kubernetes and helm, and then runs the cd.sh script to bring up onap based on your values.yaml config by running helm install on it.
# login to az cli, wget the deployment script, arm template and parameters file
# edit the parameters file (dns, ssh key ...) and run the arm template
wget https://git.onap.org/logging-analytics/plain/deploy/azure/oom_deployment.sh
wget https://git.onap.org/logging-analytics/plain/deploy/azure/_arm_deploy_onap_cd.json
wget https://git.onap.org/logging-analytics/plain/deploy/azure/_arm_deploy_onap_cd_z_parameters.json
# either run the entrypoint which creates a resource template and runs the stack - or do those two commands manually
./oom_deployment.sh -b master -s azure.onap.cloud -e onap -r a_auto-youruserid_20180421 -t arm_deploy_onap_cd.json -p arm_deploy_onap_cd_z_parameters.json
# wait for the VM to finish in about 75 min or watch progress by ssh'ing into the vm and doing
root@ons-auto-201803181110z: sudo tail -f /var/lib/waagent/custom-script/download/0/stdout
# if you wish to run the oom_entrypoint script yourself - edit/break the cloud init section at the end of the arm template and do it yourself below
# download and edit values.yaml with your onap preferences and openstack tenant config
wget https://jira.onap.org/secure/attachment/11414/values.yaml
# download and run the bootstrap and onap install script, the -s server name can be an IP, FQDN or hostname
wget https://git.onap.org/logging-analytics/plain/deploy/rancher/oom_entrypoint.sh
chmod 777 oom_entrypoint.sh
sudo ./oom_entrypoint.sh -b master -s devops.onap.info -e onap
# wait 15 min for rancher to finish, then 30-90 min for onap to come up
# 20181015 - delete the deployment, recreate the onap environment in rancher with the template
# adjusted for more than the default 110 container limit - by adding --max-pods=500 - then redo the helm install
OOM-714 - see https://jira.onap.org/secure/attachment/11455/oom_openstack.yaml and https://jira.onap.org/secure/attachment/11454/oom_openstack_oom.env
LOG-320 - see https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_entrypoint.sh
customize your template (true/false for any components, docker overrides etc...)
https://jira.onap.org/secure/attachment/11414/values.yaml
Run oom_entrypoint.sh after you have verified values.yaml - it runs both scripts below for you, and a single-node kubernetes setup running whatever you configured in values.yaml will be up in 50-90 min. If you want to just configure your VM without bringing up ONAP, comment out the cd.sh line and run that separately.
LOG-325 - see wget https://git.onap.org/logging-analytics/plain/deploy/rancher/oom_rancher_setup.sh
LOG-326 - see wget https://git.onap.org/logging-analytics/plain/deploy/cd.sh
Verify your system is up by doing a kubectl get pods --all-namespaces and checking the 8880 port to bring up the rancher or kubernetes gui.
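A quick verification sketch (the grep pattern is just one convenient way to spot pods that are not fully ready, and the curl is only a reachability check on the rancher port):

# list all pods and flag any that are not fully ready
kubectl get pods --all-namespaces
kubectl get pods --all-namespaces | grep -E '0/|1/2'
# confirm the rancher/kubernetes gui answers on 8880
curl -I http://<your domain/ip>:8880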
Login to Azure CLI
https://portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Resources%2Fresources
Download Azure ONAP ARM template
see
- OOM-711
Edit Azure ARM template environment parameters
Create Resource Group
az group create --name onap_eastus --location eastus
Run ARM template
az group deployment create --resource-group onap_eastus --template-file oom_azure_arm_deploy.json --parameters @oom_azure_arm_deploy_parameters.json
Wait for Rancher/Kubernetes install
The oom_entrypoint.sh script will be run as a cloud-init script on the VM - see
- LOG-320
which runs
- LOG-325
Wait for OOM ONAP install
see
- LOG-326
Verify ONAP installation
kubectl get pods --all-namespaces
# raise/lower onap components from the installed directory if using the oneclick arm template
# amsterdam only
root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# ./createAll.bash -n onap
Azure CLI Installation
Requirements
Azure subscription
OSX
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest
Install homebrew first (reinstall if you are on the latest OSX 10.13.2 https://github.com/Homebrew/install because of 3718)
Will install Python 3.6
$ brew update
$ brew install azure-cli
https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli?view=azure-cli-latest
$ az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E..D to authenticate.
[{
  "cloudName": "AzureCloud",
  "id": "f4...b",
  "isDefault": true,
  "name": "Pay-As-You-Go",
  "state": "Enabled",
  "tenantId": "bcb.....f",
  "user": { "name": "michael@....org", "type": "user" }
}]
Bastion/Jumphost VM in Azure
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-apt?view=azure-cli-latest
# in root
AZ_REPO=$(lsb_release -cs)
echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893
apt-get install apt-transport-https
apt-get update && sudo apt-get install azure-cli
az login
# verify
root@ons-dmz:~# ps -ef | grep az
root      1427     1  0 Mar17 ?  00:00:00 /usr/lib/linux-tools/4.13.0-1011-azure/hv_vss_daemon -n
Windows Powershell
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?view=azure-cli-latest
ARM Template
Follow https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-create-first-template
Create a Storage Account
$ az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E...Z to authenticate.
$ az group create --name examplegroup --location "South Central US"
{
  "id": "/subscriptions/f4b...e8b/resourceGroups/examplegroup",
  "location": "southcentralus",
  "managedBy": null,
  "name": "examplegroup",
  "properties": { "provisioningState": "Succeeded" },
  "tags": null
}
obrien:obrienlabs amdocs$ vi azuredeploy_storageaccount.json
obrien:obrienlabs amdocs$ az group deployment create --resource-group examplegroup --template-file azuredeploy_storageaccount.json
{
  "id": "/subscriptions/f4...e8b/resourceGroups/examplegroup/providers/Microsoft.Resources/deployments/azuredeploy_storageaccount",
  "name": "azuredeploy_storageaccount",
  "properties": {
    "additionalProperties": {
      "duration": "PT32.9822642S",
      "outputResources": [{
        "id": "/subscriptions/f4..e8b/resourceGroups/examplegroup/providers/Microsoft.Storage/storageAccounts/storagekj6....kk2w",
        "resourceGroup": "examplegroup"
      }],
      "templateHash": "11440483235727994285"
    },
    "correlationId": "41a0f79..90c291",
    "debugSetting": null,
    "dependencies": [],
    "mode": "Incremental",
    "outputs": {},
    "parameters": {},
    "parametersLink": null,
    "providers": [{
      "id": null,
      "namespace": "Microsoft.Storage",
      "registrationState": null,
      "resourceTypes": [{
        "aliases": null,
        "apiVersions": null,
        "locations": [ "southcentralus" ],
        "properties": null,
        "resourceType": "storageAccounts"
      }]
    }],
    "provisioningState": "Succeeded",
    "template": null,
    "templateLink": null,
    "timestamp": "2018-02-17T16:15:11.562170+00:00"
  },
  "resourceGroup": "examplegroup"
}
Pick a region
List the available regions with az account list-locations - northcentralus, for example.
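For a readable listing, the standard table output format can be used:

# list all regions for the current subscription as a table
az account list-locations --output table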
Create a resource group
# create a resource group if not already there az group create --name obrien_jenkins_b_westus2 --location westus2
Create a VM
We need a 128G VM with at least 8vCores (peak is 60) and a 100+GB drive. The sizes are detailed on https://docs.microsoft.com/en-ca/azure/virtual-machines/windows/sizes-memory - we will use the Standard_D32s_v3 type
We need an "all open 0.0.0.0/0" security group and a reassociated data drive as boot drive - see the arm template in LOG-321
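As a sketch of the equivalent direct CLI path (the ARM template below is the recommended route - the VM name, resource group, image URN and disk size here are placeholders taken from the values on this page):

# hypothetical direct CLI alternative to the ARM template
az vm create \
  --resource-group obrien_jenkins_b_westus2 \
  --name onap-oom-vm \
  --image Canonical:UbuntuServer:16.04.0-LTS:latest \
  --size Standard_D32s_v3 \
  --admin-username ubuntu \
  --generate-ssh-keys \
  --os-disk-size-gb 255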
Get the ARM template
see the open review in OOM-711
"ubuntuOSVersion": "16.04.0-LTS" "imagePublisher": "Canonical", "imageOffer": "UbuntuServer", "vmSize": "Standard_E8s_v3" "osDisk": {"createOption": "FromImage"},"dataDisks": [{"diskSizeGB": 511,"lun": 0, "createOption": "Empty" }]
Follow
https://github.com/Azure/azure-quickstart-templates/tree/master/101-acs-kubernetes
https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-template-deploy
https://github.com/Azure/azure-quickstart-templates/tree/master/101-vm-simple-linux
It needs a security group https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-create-nsg-arm-template
{ "apiVersion": "2017-03-01", "type": "Microsoft.Network/networkSecurityGroups", "name": "[variables('networkSecurityGroupName')]", "location": "[resourceGroup().location]", "tags": { "displayName": "NSG - Front End" }, "properties": { "securityRules": [ { "name": "in-rule", "properties": { "description": "All in", "protocol": "Tcp", "sourcePortRange": "*", "destinationPortRange": "*", "sourceAddressPrefix": "Internet", "destinationAddressPrefix": "*", "access": "Allow", "priority": 100, "direction": "Inbound" } }, { "name": "out-rule", "properties": { "description": "All out", "protocol": "Tcp", "sourcePortRange": "*", "destinationPortRange": "*", "sourceAddressPrefix": "Internet", "destinationAddressPrefix": "*", "access": "Allow", "priority": 101, "direction": "Outbound" } } ] } } , { "apiVersion": "2017-04-01", "type": "Microsoft.Network/virtualNetworks", "name": "[variables('virtualNetworkName')]", "location": "[resourceGroup().location]", "dependson": [ "[concat('Microsoft.Network/networkSecurityGroups/', variables('networkSecurityGroupName'))]" ], "properties": { "addressSpace": { "addressPrefixes": [ "[variables('addressPrefix')]" ] }, "subnets": [ { "name": "[variables('subnetName')]", "properties": { "addressPrefix": "[variables('subnetPrefix')]", "networkSecurityGroup": { "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('networkSecurityGroupName'))]" } } } ] } },
# validate first (use "validate" instead of "create")
az group deployment create --resource-group obrien_jenkins_b_westus2 --template-file oom_azure_arm_deploy.json --parameters @oom_azure_arm_cd_amsterdam_deploy_parameters.json
SSH into your VM and run the Kubernetes and OOM installation scripts
Use the entrypoint script in OOM-710
# clone the oom repo to get the install directory
sudo git clone https://gerrit.onap.org/r/logging-analytics
# run the Rancher RI installation (to install kubernetes)
sudo logging-analytics/deploy/rancher/oom_rancher_install.sh -b master -s 192.168.240.32 -e onap
# run the oom deployment script
# get a copy of onap-parameters.yaml and place it in this folder
logging-analytics/deploy/cd.sh -b master -s 192.168.240.32 -e onap
oom_rancher_install.sh is in OOM-715 under https://gerrit.onap.org/r/#/c/32019/
cd.sh is in OOM-716 under https://gerrit.onap.org/r/#/c/32653/
Delete the VM and resource group
# delete the vm and resources
az group deployment delete --resource-group ONAPAMDOCS --name oom_azure_arm_deploy
# the above deletion will not delete the actual resources - only a delete of the group or of each individual resource works
# optionally delete the resource group
az group delete --name ONAPAMDOCS -y
Azure devops
create static IP
az network public-ip create --name onap-argon --resource-group a_ONAP_argon_prod_donotdelete --location eastus --allocation-method Static
ONAP on Azure Container Service
AKS Installation
Follow https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-deploy-cluster
Register for AKS preview via az cli
obrienbiometrics:obrienlabs michaelobrien$ az provider register -n Microsoft.ContainerService
Registering is still on-going. You can monitor using 'az provider show -n Microsoft.ContainerService'
Create an AKS resource group
Raise your AKS vCPU quota - optional
http://aka.ms/corequotaincrease
https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest
Deployment failed. Correlation ID: 4b4707a7-2244-4557-855e-11bcced556de. Provisioning of resource(s) for container service onapAKSCluster in resource group onapAKS failed. Message: Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 10, Additional requested: 1. Please read more about quota increase at http://aka.ms/corequotaincrease.. Details:
Create AKS cluster
obrienbiometrics:obrienlabs michaelobrien$ az aks create --resource-group onapAKS --name onapAKSCluster --node-count 1 --generate-ssh-keys
 - Running ..
"fqdn": "onapaksclu-onapaks-f4....3.hcp.eastus.azmk8s.io",
AKS cluster VM granularity
The cluster will start with a 3.5G VM before scaling
Resources for your AKS cluster
Bring up AAI only for now
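A minimal sketch of bringing up only AAI (mirroring the disable-allcharts pattern used in the AWS section earlier; the release and namespace names are just the ones used throughout this page, and robot is added only for healthcheck):

# bring up a single component (AAI) plus robot
sudo helm install local/onap -n onap --namespace onap \
  -f onap/resources/environments/disable-allcharts.yaml \
  --set aai.enabled=true --set robot.enabled=true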
Design Issues
Resource Group
A resource group makes it easier to package and remove everything for a deployment - essentially making the deployment stateless
Network Security Group
Global or local to the resource group?
Follow CSEC guidelines https://www.cse-cst.gc.ca/en/system/files/pdf_documents/itsg-22-eng.pdf
Static public IP
Register a CNAME for an existing domain and use the same IP address every time the deployment comes up
Entrypoint cloud init script
How to attach the cloud init script to provision the VM
ARM template chaining
passing derived variables into the next arm template - for example when bringing up an entire federated set in one or more DCs
see script attached to
Troubleshooting
DNS propagation and caching
It takes about 2 min for DNS entries to propagate out from A record DNS changes. For example the following IP/DNS association took 2 min to appear in dig.
obrienbiometrics:onap_oom_711_azure michaelobrien$ dig azure.onap.info
; <<>> DiG 9.9.7-P3 <<>> azure.onap.info
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10599
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;azure.onap.info.      IN  A
;; ANSWER SECTION:
azure.onap.info.   251 IN  A  52.224.233.230
;; Query time: 68 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Feb 20 10:26:59 EST 2018
;; MSG SIZE  rcvd: 60

obrienbiometrics:onap_oom_711_azure michaelobrien$ dig azure.onap.info
; <<>> DiG 9.9.7-P3 <<>> azure.onap.info
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30447
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;azure.onap.info.      IN  A
;; ANSWER SECTION:
azure.onap.info.   299 IN  A  13.92.225.167
;; Query time: 84 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Feb 20 10:27:04 EST 2018
Corporate Firewall Access
Inside the corporate firewall - avoid it:
PS C:\> az login
Please ensure you have network connection. Error detail: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): Max retries exceeded with url: /common/oauth2/devicecode?api-version=1.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x04D18730>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))

At home or on a cell hotspot:
PS C:\> az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E...2W to authenticate.
[{
  "cloudName": "AzureCloud",
  "id": "4...da1",
  "isDefault": true,
  "name": "Microsoft Azure Internal Consumption",
  "state": "Enabled",
  "tenantId": "72f98....47",
  "user": { "name": "fran...ocs.com", "type": "user" }
}]

On a corporate account (need a permissions bump to be able to create a resource group prior to running an arm template - https://wiki.onap.org/display/DW/ONAP+on+Kubernetes+on+Microsoft+Azure#ONAPonKubernetesonMicrosoftAzure-ARMTemplate):
PS C:\> az group create --name onapKubernetes --location eastus
The client 'fra...s.com' with object id '08f98c7e-...ed' does not have authorization to perform action 'Microsoft.Resources/subscriptions/resourcegroups/write' over scope '/subscriptions/42e...87da1/resourcegroups/onapKubernetes'.

Try a personal account = OK:
PS C:\> az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code EE...ULR to authenticate.
Terminate batch job (Y/N)? y
# hangs when logging in for the first time on a new pc
PS C:\> az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E.PBKS to authenticate.
[{
  "cloudName": "AzureCloud",
  "id": "f4b...b",
  "isDefault": true,
  "name": "Pay-As-You-Go",
  "state": "Enabled",
  "tenantId": "bcb...f4f",
  "user": { "name": "michael@obrien...org", "type": "user" }
}]
PS C:\> az group create --name onapKubernetes2 --location eastus
{
  "id": "/subscriptions/f4b....b/resourceGroups/onapKubernetes2",
  "location": "eastus",
  "managedBy": null,
  "name": "onapKubernetes2",
  "properties": { "provisioningState": "Succeeded" },
  "tags": null
}
Design Issues
20180228: Deployment delete does not delete resources without a resourceGroup delete
I find that deleting a deployment removes the deployment record but not the actual resources. The workaround is to delete the resource group - but in some constrained subscriptions the cli user may not have the ability to create a resource group, and hence cannot delete it.
see
https://github.com/Azure/azure-sdk-for-java/issues/1167
deleting the resources manually for now - is a workaround if you cannot create/delete resource groups
# delete the vm and resources
az group deployment delete --resource-group ONAPAMDOCS --name oom_azure_arm_deploy
# the above deletion will not delete the actual resources - only a delete of the group or of each individual resource works
# optionally delete the resource group
az group delete --name ONAPAMDOCS -y
However modifying the template to add resources works well. For example adding a reference to a network security group
20180228: Resize the OS disk
ONAP requires at least 75g - the issue is that in most VM templates on Azure the OS disk is 30g - we need to either switch to the data disk or resize the OS disk.
# add diskSizeGB to the template
"osDisk": { "diskSizeGB": 255, "createOption": "FromImage" },

ubuntu@oom-auto-deploy:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev            65989400       0  65989400   0% /dev
tmpfs           13201856    8848  13193008   1% /run
/dev/sda1      259142960 1339056 257787520   1% /
tmpfs           66009280       0  66009280   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs           66009280       0  66009280   0% /sys/fs/cgroup
none                  64       0        64   0% /etc/network/interfaces.dynamic.d
/dev/sdb1      264091588   60508 250592980   1% /mnt
tmpfs           13201856       0  13201856   0% /run/user/1000
ubuntu@oom-auto-deploy:~$ free
              total        used        free      shared  buff/cache   available
Mem:      132018560      392336   131242164        8876      384060   131012328
20180301: Add oom_entrypoint.sh bootstrap script to install rancher and onap
in review under OOM-715
https://jira.onap.org/secure/attachment/11206/oom_entrypoint.sh
If using amsterdam - swap out the onap-parameters.yaml (the curl is hardcoded to a master branch version)
20180303: cloudstorage access on OSX via Azure Storage Manager
use this method instead of installing az cli directly - for certain corporate oauth configurations
https://azure.microsoft.com/en-us/features/storage-explorer/
Install AZM using the name and access key of a storage account created manually or by enabling the az cli on the browser
20180318: add oom_entrypoint.sh to cloud-init on the arm template
See https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/virtualmachines/extensions - it looks like Azure has a similar setup to AWS ebextensions
Targeting
type | string | No | Specifies the type of the extension; an example is "CustomScriptExtension". |
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/extensions-customscript
deprecated
{
  "apiVersion": "2015-06-15",
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "[concat(parameters('vmName'),'/onap')]",
  "location": "[resourceGroup().location]",
  "dependsOn": [ "[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]" ],
  "properties": {
    "publisher": "Microsoft.Azure.Extensions",
    "type": "CustomScript",
    "typeHandlerVersion": "1.9",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "fileUris": [ "https://jira.onap.org/secure/attachment/11263/oom_entrypoint.sh" ],
      "commandToExecute": "[concat('./' , parameters('scriptName'), ' -b master -s dns/pub/pri-ip -e onap' )]"
    }
  }
}

use
{
  "apiVersion": "2017-12-01",
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "[concat(parameters('vmName'),'/onap')]",
  "location": "[resourceGroup().location]",
  "dependsOn": [ "[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]" ],
  "properties": {
    "publisher": "Microsoft.Azure.Extensions",
    "type": "CustomScript",
    "typeHandlerVersion": "2.0",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "fileUris": [ "https://jira.onap.org/secure/attachment/11281/oom_entrypoint.sh" ],
      "commandToExecute": "[concat('./' , parameters('scriptName'), ' -b master ', ' -s ', 'ons-auto-201803181110z', ' -e onap' )]"
    }
  }
}
ubuntu@ons-dmz:~$ ./oom_deployment.sh
Deployment template validation failed: 'The template resource 'entrypoint' for type 'Microsoft.Compute/virtualMachines/extensions' at line '1' and column '6182' has incorrect segment lengths. A nested resource type must have identical number of segments as its resource name. A root resource type must have segment length one greater than its resource name. Please see https://aka.ms/arm-template/#resources for usage details.'.
ubuntu@ons-dmz:~$ ./oom_deployment.sh
Deployment failed. Correlation ID: 532b9a9b-e0e8-4184-9e46-6c2e7c15e7c7. {
"error": {
"code": "ParentResourceNotFound",
"message": "Can not perform requested operation on nested resource. Parent resource '[concat(parameters('vmName'),'' not found."
}
}
fixed 20180318:1600
Install runs - but I need visibility - checking /var/lib/waagent/custom-script/download/0/
progress
./oom_deployment.sh
# 7 min to delete the old deployment
ubuntu@ons-dmz:~$ az vm extension list -g a_ONAP_auto_201803181110z --vm-name ons-auto-201803181110z
..
"provisioningState": "Creating",
"settings": {
    "commandToExecute": "./oom_entrypoint.sh -b master -s ons-auto-201803181110zons-auto-201803181110z.eastus.cloudapp.azure.com -e onap",
    "fileUris": [ "https://jira.onap.org/secure/attachment/11263/oom_entrypoint.sh"
ubuntu@ons-auto-201803181110z:~$ sudo su -
root@ons-auto-201803181110z:~# docker ps
CONTAINER ID  IMAGE                   COMMAND                 CREATED        STATUS        PORTS                             NAMES
83458596d7a6  rancher/server:v1.6.14  "/usr/bin/entry /u..." 3 minutes ago  Up 3 minutes  3306/tcp, 0.0.0.0:8880->8080/tcp  rancher_server
root@ons-auto-201803181110z:~# tail -f /var/log/azure/custom-script/handler.log
time=2018-03-18T22:51:59Z version=v2.0.6/git@1008306-clean operation=enable seq=0 file=0 event="download complete" output=/var/lib/waagent/custom-script/download/0
time=2018-03-18T22:51:59Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="executing command" output=/var/lib/waagent/custom-script/download/0
time=2018-03-18T22:51:59Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="executing public commandToExecute" output=/var/lib/waagent/custom-script/download/0
root@ons-auto-201803181110z:~# docker ps
CONTAINER ID  IMAGE                   COMMAND                 CREATED         STATUS         PORTS                             NAMES
539733f24c01  rancher/agent:v1.2.9    "/run.sh run"           13 seconds ago  Up 13 seconds                                    rancher-agent
83458596d7a6  rancher/server:v1.6.14  "/usr/bin/entry /u..." 5 minutes ago   Up 5 minutes   3306/tcp, 0.0.0.0:8880->8080/tcp  rancher_server
root@ons-auto-201803181110z:~# ls -la /var/lib/waagent/custom-script/download/0/
total 31616
-rw-r--r-- 1 root   root   16325186 Aug 31  2017 helm-v2.6.1-linux-amd64.tar.gz
-rw-r--r-- 1 root   root          4 Mar 18 22:55 kube_env_id.json
drwxrwxr-x 2 ubuntu ubuntu     4096 Mar 18 22:53 linux-amd64
-r-x------ 1 root   root       2822 Mar 18 22:51 oom_entrypoint.sh
-rwxrwxrwx 1 root   root       7288 Mar 18 22:52 oom_rancher_setup.sh
-rwxr-xr-x 1 root   root   12213376 Mar 18 22:53 rancher
-rw-r--r-- 1 root   root    3736787 Dec 20 19:41 rancher-linux-amd64-v0.6.7.tar.gz
drwxr-xr-x 2 root   root       4096 Dec 20 19:39 rancher-v0.6.7
testing via http://jenkins.onap.cloud/job/oom_azure_deployment/
Need the ip address and not the domain name - via linked template
or
https://docs.microsoft.com/en-us/azure/templates/microsoft.network/publicipaddresses
https://github.com/Azure/azure-quickstart-templates/issues/583
Arm templates cannot specify a static ip - without a private subnet
Substitute reference(variables('publicIPAddressName')).ipAddress for reference(variables('nicName')).ipConfigurations[0].properties.privateIPAddress
Using the hostname instead of the private/public ip works (verify /etc/hosts though)
obrienbiometrics:oom michaelobrien$ ssh ubuntu@13.99.207.60
ubuntu@ons-auto-201803181110z:~$ sudo su -
root@ons-auto-201803181110z:/var/lib/waagent/custom-script/download/0# cat stdout
INFO: Running Agent Registration Process, CATTLE_URL=http://ons-auto-201803181110z:8880/v1
INFO: Attempting to connect to: http://ons-auto-201803181110z:8880/v1
INFO: http://ons-auto-201803181110z:8880/v1 is accessible
INFO: Inspecting host capabilities
INFO: Boot2Docker: false
INFO: Host writable: true
INFO: Token: xxxxxxxx
INFO: Running registration
INFO: Printing Environment
INFO: ENV: CATTLE_ACCESS_KEY=9B0FA1695A3E3CFD07DB
INFO: ENV: CATTLE_HOME=/var/lib/cattle
INFO: ENV: CATTLE_REGISTRATION_ACCESS_KEY=registrationToken
INFO: ENV: CATTLE_REGISTRATION_SECRET_KEY=xxxxxxx
INFO: ENV: CATTLE_SECRET_KEY=xxxxxxx
INFO: ENV: CATTLE_URL=http://ons-auto-201803181110z:8880/v1
INFO: ENV: DETECTED_CATTLE_AGENT_IP=172.17.0.1
INFO: ENV: RANCHER_AGENT_IMAGE=rancher/agent:v1.2.9
INFO: Launched Rancher Agent: b44bd62fd21c961f32f642f7c3b24438fc4129eabbd1f91e1cf58b0ed30b5876
waiting 7 min for host registration to finish
1 more min
KUBECTL_TOKEN base64 encoded: QmFzaWMgUWpBNE5EWkdRlRNN.....Ukc1d2MwWTJRZz09
run the following if you installed a higher kubectl version than the server
helm init --upgrade
Verify all pods are up on the kubernetes system - this will return localhost:8080 until a host is added
kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY  STATUS   RESTARTS  AGE
kube-system   heapster-76b8cd7b5-v5jrd               1/1    Running  0         5m
kube-system   kube-dns-5d7b4487c9-9bwk5              3/3    Running  0         5m
kube-system   kubernetes-dashboard-f9577fffd-cpwv7   1/1    Running  0         5m
kube-system   monitoring-grafana-997796fcf-s4sjm     1/1    Running  0         5m
kube-system   monitoring-influxdb-56fdcd96b-2mn6r    1/1    Running  0         5m
kube-system   tiller-deploy-cc96d4f6b-fll4t          1/1    Running  0         5m
20180318: Create VM image without destroying running VM
In AWS we can select the "no reboot" option and create an image from a running VM as-is with no effect on the running system.
Having issues with the Azure image creator - it is asking for the ubuntu password even though I only use key-based access
20180319: New Relic Monitoring
20180319: document devops flow
aka: travellers guide
20180319: Document Virtual Network Topology
20180429: Helm repo n/a after reboot - rerun helm serve
If you run into issues doing a make all - your helm server is not running
# rerun helm serve
helm serve &
helm repo add local http://127.0.0.1:8879
20180516: Clustered NFS share via Azure Files
Need a cloud native NFS wrapper like EFS(AWS) - looking at Azure files
Training
(Links below from Microsoft - thank you)
General Azure Documentation
Azure Site http://azure.microsoft.com
Azure Documentation Site https://docs.microsoft.com/en-us/azure/
Azure Training Courses https://azure.microsoft.com/en-us/training/free-online-courses/
Azure Portal http://portal.azure.com
Developer Documentation
Azure AD Authentication Libraries https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-authentication-libraries
Java Overview on Azure https://azure.microsoft.com/en-us/develop/java/
Java Docs for Azure https://docs.microsoft.com/en-us/java/azure/
Java SDK on GitHub https://github.com/Azure/azure-sdk-for-java
Python Overview on Azure https://azure.microsoft.com/en-us/develop/python/
Python Docs for Azure https://docs.microsoft.com/en-us/python/azure/
Python SDK on GitHub https://github.com/Azure/azure-sdk-for-python
REST Api and CLI Documentation
REST API Documentation https://docs.microsoft.com/en-us/rest/api/
CLI Documentation https://docs.microsoft.com/en-us/cli/azure/index
Other Documentation
Using Automation for VM shutdown & startup https://docs.microsoft.com/en-us/azure/automation/automation-solution-vm-management
Azure Resource Manager (ARM) QuickStart Templates https://github.com/Azure/azure-quickstart-templates
Known Forks
The code in this github repo has 2 month old copies of cd.sh and oom_rancher_install.sh
https://github.com/taranki/onap-azure
Use the official ONAP code in
https://gerrit.onap.org/r/logging-analytics
The original seed source from 2017 below is deprecated - use onap links above
https://github.com/obrienlabs/onap-root
Links
https://azure.microsoft.com/en-us/services/container-service/
https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/virtualmachines
https://kubernetes.io/docs/concepts/containers/images/#using-azure-container-registry-acr
https://azure.microsoft.com/en-us/features/storage-explorer/
https://docs.microsoft.com/en-ca/azure/virtual-machines/linux/capture-image
AKS
Google GCE
Account Provider: Michael O'Brien of Amdocs
2022: Contact me at fmichaelobrien@google.com for any technical discussions about deploying ONAP to or with parts of Google Cloud and I will see how we can help. I have experience with ONAP from Amsterdam until Casablanca (getting up to speed on Jakarta).
OOM Installation on a GCE VM
The purpose of this page is to detail getting ONAP on Kubernetes (OOM) setup on a GCE VM.
I recommend using ONAP on Kubernetes on the Amazon EC2 Spot API - as it runs around $0.12-0.25/hr at 75% off instead of the $0.60 below (33% off for reserved instances). This page is here so we can support GCE and also work with the kubernetes open source project in the space it was originally designed in at Google.
Login to your google account and start creating a 128g Ubuntu 16.04 VM
Install google command line tools
Status        | Name                                                 | ID                       | Size
Not Installed | App Engine Go Extensions                             | app-engine-go            | 97.7 MiB
Not Installed | Cloud Bigtable Command Line Tool                     | cbt                      | 4.0 MiB
Not Installed | Cloud Bigtable Emulator                              | bigtable                 | 3.5 MiB
Not Installed | Cloud Datalab Command Line Tool                      | datalab                  | < 1 MiB
Not Installed | Cloud Datastore Emulator                             | cloud-datastore-emulator | 17.7 MiB
Not Installed | Cloud Datastore Emulator (Legacy)                    | gcd-emulator             | 38.1 MiB
Not Installed | Cloud Pub/Sub Emulator                               | pubsub-emulator          | 33.2 MiB
Not Installed | Emulator Reverse Proxy                               | emulator-reverse-proxy   | 14.5 MiB
Not Installed | Google Container Local Builder                       | container-builder-local  | 3.7 MiB
Not Installed | Google Container Registry's Docker credential helper | docker-credential-gcr    | 2.2 MiB
Not Installed | gcloud Alpha Commands                                | alpha                    | < 1 MiB
Not Installed | gcloud Beta Commands                                 | beta                     | < 1 MiB
Not Installed | gcloud app Java Extensions                           | app-engine-java          | 116.0 MiB
Not Installed | gcloud app PHP Extensions                            | app-engine-php           | 21.9 MiB
Not Installed | gcloud app Python Extensions                         | app-engine-python        | 6.2 MiB
Not Installed | kubectl                                              | kubectl                  | 15.9 MiB
Installed     | BigQuery Command Line Tool                           | bq                       | < 1 MiB
Installed     | Cloud SDK Core Libraries                             | core                     | 5.9 MiB
Installed     | Cloud Storage Command Line Tool                      | gsutil                   | 3.3 MiB

==> Source [/Users/michaelobrien/gce/google-cloud-sdk/completion.bash.inc] in your profile to enable shell command completion for gcloud.
==> Source [/Users/michaelobrien/gce/google-cloud-sdk/path.bash.inc] in your profile to add the Google Cloud SDK command line tools to your $PATH.
gcloud init
obrienbiometrics:google-cloud-sdk michaelobrien$ source ~/.bash_profile
obrienbiometrics:google-cloud-sdk michaelobrien$ gcloud components update
All components are up to date.
Connect to your VM by getting a dynamic SSH key
obrienbiometrics:google-cloud-sdk michaelobrien$ gcloud compute ssh instance-1
WARNING: The public SSH key file for gcloud does not exist.
WARNING: The private SSH key file for gcloud does not exist.
WARNING: You do not have an SSH key for gcloud.
WARNING: SSH keygen will be executed to generate a key.
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/michaelobrien/.ssh/google_compute_engine.
Your public key has been saved in /Users/michaelobrien/.ssh/google_compute_engine.pub.
The key fingerprint is:
SHA256:kvS8ZIE1egbY+bEpY1RGN45ruICBo1WH8fLWqO435+Y michaelobrien@obrienbiometrics.local
The key's randomart image is:
+---[RSA 2048]----+ ... +----[SHA256]-----+ (randomart omitted)
Updating project ssh metadata.../Updated [https://www.googleapis.com/compute/v1/projects/onap-184300].
Updating project ssh metadata...done.
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.2865548946042680113' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-37-generic x86_64)
 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
Get cloud support with Ubuntu Advantage Cloud Guest: http://www.ubuntu.com/business/services/cloud
0 packages can be updated.
0 updates are security updates.
michaelobrien@instance-1:~$
Open up firewall rules or the entire VM
We need at least port 8880 for rancher
obrienbiometrics:20171027_log_doc michaelobrien$ gcloud compute firewall-rules create open8880 --allow tcp:8880 --source-tags=instance-1 --source-ranges=0.0.0.0/0 --description="8880"
Creating firewall...|Created [https://www.googleapis.com/compute/v1/projects/onap-184300/global/firewalls/open8880].
Creating firewall...done.
NAME      NETWORK  DIRECTION  PRIORITY  ALLOW     DENY
open8880  default  INGRESS    1000      tcp:8880
Better to edit the existing internal firewall rule to the CIDR 0.0.0.0/0
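A sketch of doing that from the CLI (the rule name default-allow-internal is the usual default-network rule, but treat it as an assumption and confirm with the list command first; opening 0.0.0.0/0 is for lab use only):

# list existing rules, then widen the internal rule to all sources
gcloud compute firewall-rules list
gcloud compute firewall-rules update default-allow-internal --source-ranges=0.0.0.0/0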
Continue with ONAP on Kubernetes
ONAP on Kubernetes#QuickstartInstallation
Kubernetes
Kubernetes API
follow https://kubernetes.io/docs/reference/kubectl/jsonpath/
Take the ~/.kube/config server and token and retrofit a rest call like the curl below
curl -k -H "Authorization: Bearer $TOKEN" -H 'Accept: application/json' $K8S-server-and-6443-port/api/v1/pods | jq -r .items[0].metadata.name heapster-7b48b696fc-67qv6
Kubernetes v11 Curl examples
for validating raw kubernetes api calls (take the .kube/config server and token and create a curl call with optional json parsing) - like below
ubuntu@ip-172-31-30-96:~$ curl -k -H "Authorization: Bearer QmFzaWMgUVV........YW5SdGFrNHhNdz09" -H 'Accept: application/json' https://o...fo:8880/r/projects/1a7/kubernetes:6443/api/v1/pods | jq -r .items[0].spec.containers[0]
{
  "name": "heapster",
  "image": "docker.io/rancher/heapster-amd64:v1.5.2",
  "command": [
    "/heapster",
    "--source=kubernetes:https://$KUBERNETES_SERVICE_HOST:443?inClusterConfig=true&useServiceAccount=true",
    "--sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086?retention=0s",
    "--v=2"
  ],
  "resources": {},
  "volumeMounts": [
    { "name": "io-rancher-system-token-wf6d4", "readOnly": true, "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount" }
  ],
  "terminationMessagePath": "/dev/termination-log",
  "terminationMessagePolicy": "File",
  "imagePullPolicy": "IfNotPresent"
}
Kubernetes Best Practices
Local nexus proxy
In progress - needs a values.yaml global override (see the Nexus Proxy section further down for the override).
ubuntu@a-onap-devopscd:~$ docker run -d -p 5000:5000 --restart=unless-stopped --name registry -e REGISTRY_PROXY_REMOTEURL=https://nexus3.onap.org:10001 registry:2 Unable to find image 'registry:2' locally 2: Pulling from library/registry Status: Downloaded newer image for registry:2 bd216e444f133b30681dab8b144a212d84e1c231cc12353586b7010b3ae9d24b ubuntu@a-onap-devopscd:~$ sudo docker ps | grep registry bd216e444f13 registry:2 "/entrypoint.sh /e..." 2 minutes ago Up About a minute 0.0.0.0:5000->5000/tcp registry
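To sanity check that the proxy is actually caching, the standard registry v2 catalog API can be queried (a sketch - image names only show up after the first pull through the proxy; the aaf_agent tag is taken from the examples further down this page):
# what has the local proxy cached so far
curl -s http://localhost:5000/v2/_catalog
# prime it with one ONAP image, then re-check
sudo docker pull localhost:5000/onap/aaf/aaf_agent:2.1.8
curl -s http://localhost:5000/v2/_catalog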
Verify your Kubernetes cluster is functioning properly - Tiller is up
Check the dashboard
http://dev.onap.info:8880/r/projects/1a7/kubernetes-dashboard:9090/#!/pod?namespace=_all
Check kubectl
Check that the tiller container is in state Running - not just that the tiller-deploy pod exists
ubuntu@a-onap-devops:~$ kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-9lqt2 1/1 Running 0 20d kube-system kube-dns-75c8cb4ccb-tw992 3/3 Running 0 20d kube-system kubernetes-dashboard-6f4c8b9cd5-rcbp2 1/1 Running 0 20d kube-system monitoring-grafana-76f5b489d5-r99rh 1/1 Running 0 20d kube-system monitoring-influxdb-6fc88bd58d-h875w 1/1 Running 0 20d kube-system tiller-deploy-645bd55c5d-bmxs7 1/1 Running 0 20d onap logdemonode-logdemonode-5c8bffb468-phbzd 2/2 Running 0 20d onap onap-log-elasticsearch-7557486bc4-72vpw 1/1 Running 0 20d onap onap-log-kibana-fc88b6b79-d88r7 1/1 Running 0 20d onap onap-log-logstash-9jlf2 1/1 Running 0 20d onap onap-portal-app-8486dc7ff8-tssd2 2/2 Running 0 5d onap onap-portal-cassandra-8588fbd698-dksq5 1/1 Running 0 5d onap onap-portal-db-7d6b95cd94-66474 1/1 Running 0 5d onap onap-portal-sdk-77cd558c98-6rsvq 2/2 Running 0 5d onap onap-portal-widget-6469f4bc56-hms24 1/1 Running 0 5d onap onap-portal-zookeeper-5d8c598c4c-hck2d 1/1 Running 0 5d onap onap-robot-6f99cb989f-kpwdr 1/1 Running 0 20d ubuntu@a-onap-devops:~$ kubectl describe pod tiller-deploy-645bd55c5d-bmxs7 -n kube-system Name: tiller-deploy-645bd55c5d-bmxs7 Namespace: kube-system Node: a-onap-devops/172.17.0.1 Start Time: Mon, 30 Jul 2018 22:20:09 +0000 Labels: app=helm name=tiller pod-template-hash=2016811718 Annotations: <none> Status: Running IP: 10.42.0.5 Controlled By: ReplicaSet/tiller-deploy-645bd55c5d Containers: tiller: Container ID: docker://a26420061a01a5791401c2519974c3190bf9f53fce5a9157abe7890f1f08146a Image: gcr.io/kubernetes-helm/tiller:v2.8.2 Image ID: docker-pullable://gcr.io/kubernetes-helm/tiller@sha256:9b373c71ea2dfdb7d42a6c6dada769cf93be682df7cfabb717748bdaef27d10a Port: 44134/TCP Command: /tiller --v=2 State: Running Started: Mon, 30 Jul 2018 22:20:14 +0000 Ready: True
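A quicker check than describing the pod (assumes helm is already installed on the VM): if tiller is reachable, helm version prints both a Client and a Server line - a hang on "Server:" means tiller is not reachable.
# both client and server versions should print
sudo helm version
# select the tiller pod by its labels instead of the generated name
kubectl get pods -n kube-system -l app=helm,name=tiller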
LOGs
Helm Deploy plugin logs
These are needed to triage helm deploys that do not show up in helm list - i.e. there were errors before the deployment was marked as failed.
Also use --verbose.
ubuntu@a-ld0:~$ sudo ls ~/.helm/plugins/deploy/cache/onap/logs/onap- onap-aaf.log onap-cli.log onap-dmaap.log onap-multicloud.log onap-portal.log onap-sniro-emulator.log onap-vid.log onap-aai.log onap-consul.log onap-esr.log onap-oof.log onap-robot.log onap-so.log onap-vnfsdk.log onap-appc.log onap-contrib.log onap-log.log onap-policy.log onap-sdc.log onap-uui.log onap-vvp.log onap-clamp.log onap-dcaegen2.log onap-msb.log onap-pomba.log onap-sdnc.log onap-vfc.log
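Example usage of those logs (onap-so.log is just one of the files listed above):
# follow a single component while a deploy is running
sudo tail -f ~/.helm/plugins/deploy/cache/onap/logs/onap-so.log
# scan all of them for errors after a failed deploy
sudo grep -i error ~/.helm/plugins/deploy/cache/onap/logs/*.log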
Monitoring
Grafana Dashboards
There is a built-in Grafana dashboard (thanks Mandeep Khinda and James MacNider) that, once enabled, shows more detail about the cluster you are running - you need to expose the NodePort and target the VM the pod is on.
The CD system's dashboard is running at http://master3.onap.info:32628/dashboard/db/cluster?orgId=1&from=now-12h&to=now
# expose the nodeport kubectl expose -n kube-system deployment monitoring-grafana --type=LoadBalancer --name monitoring-grafana-client service "monitoring-grafana-client" exposed # get the nodeport pod is running on kubectl get services --all-namespaces -o wide | grep graf kube-system monitoring-grafana ClusterIP 10.43.44.197 <none> 80/TCP 7d k8s-app=grafana kube-system monitoring-grafana-client LoadBalancer 10.43.251.214 18.222.4.161 3000:32628/TCP 15s k8s-app=grafana,task=monitoring # get the cluster vm DNS name ubuntu@ip-10-0-0-169:~$ kubectl get pods --all-namespaces -o wide | grep graf kube-system monitoring-grafana-997796fcf-7kkl4 1/1 Running 0 5d 10.42.84.138 ip-10-0-0-80.us-east-2.compute.internal
See also
- MSB-209
Kubernetes DevOps
ONAP Development#KubernetesDevOps
Additional Tools
https://github.com/jonmosco/kube-ps1
https://github.com/ahmetb/kubectx
https://medium.com/@thisiskj/quickly-change-clusters-and-namespaces-in-kubernetes-6a5adca05615
https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
brew install kube-ps1
brew install kubectx
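Typical usage once installed (kubens ships with the kubectx package; the context name below is an example and the kube-ps1 source path is the brew formula default on an Intel mac - adjust for your setup):
# list and switch kubectl contexts
kubectx
kubectx onap-cluster
# make onap the default namespace so -n onap is no longer needed
kubens onap
# show cluster/namespace in the shell prompt
source /usr/local/opt/kube-ps1/share/kube-ps1.sh
PS1='$(kube_ps1)'$PS1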
Openstack
Windriver Intel Lab
See OOM-714
Windriver/Openstack Lab Network Topology
Openlab VNC and CLI
The following is missing some sections and is a bit out of date (v2 is deprecated in favor of v3 - use the v3 rc file below).
Get an openlab account - Integration / Developer Lab Access | |
Install openVPN - Using Lab POD-ONAP-01 Environment For OSX both Viscosity and TunnelBlick work fine | |
Login to Openstack | |
Install openstack command line tools | Tutorial: Configuring and Starting Up the Base ONAP Stack#InstallPythonvirtualenvTools(optional,butrecommended) |
get your v3 rc file | |
verify your openstack cli access (or just use the jumpbox) | obrienbiometrics:aws michaelobrien$ source logging-openrc.sh obrienbiometrics:aws michaelobrien$ openstack server list +--------------------------------------+---------+--------+-------------------------------+------------+ | ID | Name | Status | Networks | Image Name | +--------------------------------------+---------+--------+-------------------------------+------------+ | 1ed28213-62dd-4ef6-bdde-6307e0b42c8c | jenkins | ACTIVE | admin-private-mgmt=10.10.2.34 | | +--------------------------------------+---------+--------+-------------------------------+------------+ |
get some elastic IPs | You may need to release unused IPs from other tenants - as we have 4 pools of 50
fill in your stack env parameters | to fill in your config (mso) settings in values.yaml follow https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html section "To generate openStackEncryptedPasswordHere" example ubuntu@ip-172-31-54-73:~/_dev/log-137-57171/oom/kubernetes/so/resources/config/mso$ cat encryption.key aa3871669d893c7fb8abbcda31b88b4f ubuntu@ip-172-31-54-73:~/_dev/log-137-57171/oom/kubernetes/so/resources/config/mso$ echo -n "55" | openssl aes-128-ecb -e -K aa3871669d893c7fb8abbcda31b88b4f -nosalt | xxd -c 256 -p a355b08d52c73762ad9915d98736b23b |
Run the HEAT stack to create the kubernetes undercloud VMs | [michaelobrien@obrienbiometrics onap_log-324_heat(keystone_michael_o_brien)]$ openstack stack list +--------------------------------------+--------------------------+-----------------+----------------------+----------------------+ | ID | Stack Name | Stack Status | Creation Time | Updated Time | +--------------------------------------+--------------------------+-----------------+----------------------+----------------------+ | d6371a95-dc3d-4103-978e-bab1f378573a | OOM-obrien-20181223-13-0 | CREATE_COMPLETE | 2018-12-23T14:55:10Z | 2018-12-23T14:55:10Z | | 7f821906-2216-4a6e-8ef0-d46a97adf3fc | obrien-nexus3 | CREATE_COMPLETE | 2018-12-20T02:41:38Z | 2018-12-20T02:41:38Z | | 9c4d3ebb-b7c9-4428-9e44-7ef5fba08940 | OOM20181216 | CREATE_COMPLETE | 2018-12-16T18:28:21Z | 2018-12-16T18:28:21Z | | 52379aea-d0a9-48db-a13e-35ca00876768 | dcae | DELETE_FAILED | 2018-03-04T22:02:12Z | 2018-12-16T05:05:19Z | +--------------------------------------+--------------------------+-----------------+----------------------+----------------------+ [michaelobrien@obrienbiometrics onap_log-324_heat(keystone_michael_o_brien)]$ openstack stack create -t logging_openstack_13_16g.yaml -e logging_openstack_oom.env OOM-obrien-20181223-13-0 +---------------------+-----------------------------------------+ | Field | Value | +---------------------+-----------------------------------------+ | id | d6371a95-dc3d-4103-978e-bab1f378573a | | stack_name | OOM-obrien-20181223-13-0 | | description | Heat template to install OOM components | | creation_time | 2018-12-23T14:55:10Z | | updated_time | 2018-12-23T14:55:10Z | | stack_status | CREATE_IN_PROGRESS | | stack_status_reason | Stack CREATE started | +---------------------+-----------------------------------------+ |
ssh in | see clusters in Logging DevOps Infrastructure obrienbiometrics:onap_log-324_heat michaelobrien$ ssh ubuntu@10.12.6.151 ubuntu@onap-oom-obrien-rancher:~$ docker version Client: Version: 17.03.2-ce API version: 1.27 |
install Kubernetes stack (rancher, k8s, helm) | - LOG-325Getting issue details... STATUS sudo git clone https://gerrit.onap.org/r/logging-analytics cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . # 20190105 - master, casablanca and 3.0.0-ONAP are all at the same Rancher 1.6.25, Kubernetes 1.11.5, Helm 2.9.1 and docker 17.03 levels # ignore the docker warning - as the cloud init script in the heat template already installed docker and prepulled images sudo nohup ./oom_rancher_setup.sh -b master -s 10.0.16.1 -n onap & # wait 90 min kubectl get pods --all-namespaces kubectl get pods --all-namespaces | grep 0/ |
create the NFS share | Scripts from above 20181207 https://jira.onap.org/secure/attachment/12887/master_nfs_node.sh https://jira.onap.org/secure/attachment/12888/slave_nfs_node.sh #master ubuntu@onap-oom-obrien-rancher-0:~$ sudo ./master_nfs_node.sh 10.12.5.99 10.12.5.86 10.12.5.136 10.12.6.179 10.12.5.102 10.12.5.4 ubuntu@onap-oom-obrien-rancher-0:~$ sudo ls /dockerdata-nfs/ test.sh #slaves ubuntu@onap-oom-obrien-rancher-1:~$ sudo ./slave_nfs_node.sh 10.12.5.68 ubuntu@onap-oom-obrien-rancher-1:~$ sudo ls /dockerdata-nfs/ test.sh |
deploy onap | # note this will saturate your 64g vm unless you run a cluster or turn off parts of onap sudo vi oom/kubernetes/onap/values.yaml # rerun cd.sh # or # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . sudo ./cd.sh -b master -e onap -c true -d true -w false -r false |
ONAP Usage
Accessing an external Node Port
Elasticsearch port example
# get pod names and the actual VM that any pod is on ubuntu@ip-10-0-0-169:~$ kubectl get pods --all-namespaces -o wide | grep log- onap onap-log-elasticsearch-756cfb559b-wk8c6 1/1 Running 0 2h 10.42.207.254 ip-10-0-0-227.us-east-2.compute.internal onap onap-log-kibana-6bb55fc66b-kxtg6 0/1 Running 16 1h 10.42.54.76 ip-10-0-0-111.us-east-2.compute.internal onap onap-log-logstash-689ccb995c-7zmcq 1/1 Running 0 2h 10.42.166.241 ip-10-0-0-111.us-east-2.compute.internal onap onap-vfc-catalog-5fbdfc7b6c-xc84b 2/2 Running 0 2h 10.42.206.141 ip-10-0-0-227.us-east-2.compute.internal # get nodeport ubuntu@ip-10-0-0-169:~$ kubectl get services --all-namespaces -o wide | grep log- onap log-es NodePort 10.43.82.53 <none> 9200:30254/TCP 2h app=log-elasticsearch,release=onap onap log-es-tcp ClusterIP 10.43.90.198 <none> 9300/TCP 2h app=log-elasticsearch,release=onap onap log-kibana NodePort 10.43.167.146 <none> 5601:30253/TCP 2h app=log-kibana,release=onap onap log-ls NodePort 10.43.250.182 <none> 5044:30255/TCP 2h app=log-logstash,release=onap onap log-ls-http ClusterIP 10.43.81.173 <none> 9600/TCP 2h app=log-logstash,release=onap # check nodeport outside container ubuntu@ip-10-0-0-169:~$ curl ip-10-0-0-111.us-east-2.compute.internal:30254 { "name" : "-pEf9q9", "cluster_name" : "onap-log", "cluster_uuid" : "ferqW-rdR_-Ys9EkWw82rw", "version" : { "number" : "5.5.0", "build_hash" : "260387d", "build_date" : "2017-06-30T23:16:05.735Z", "build_snapshot" : false, "lucene_version" : "6.6.0" }, "tagline" : "You Know, for Search" } # check inside docker container - for reference ubuntu@ip-10-0-0-169:~$ kubectl exec -it -n onap onap-log-elasticsearch-756cfb559b-wk8c6 bash [elasticsearch@onap-log-elasticsearch-756cfb559b-wk8c6 ~]$ curl http://127.0.0.1:9200 { "name" : "-pEf9q9",
ONAP Deployment Specification
Resiliency
Longest lived deployment so far
NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-479mx 1/1 Running 7 59d kube-system kube-dns-75c8cb4ccb-sqxbr 3/3 Running 45 59d kube-system kubernetes-dashboard-6f4c8b9cd5-w5xr2 1/1 Running 8 59d kube-system monitoring-grafana-76f5b489d5-sj9tl 1/1 Running 6 59d kube-system monitoring-influxdb-6fc88bd58d-22vg2 1/1 Running 6 59d kube-system tiller-deploy-8b6c5d4fb-4rbb4 1/1 Running 7 19d
Performance
Cluster Performance
ONAP runs best on a large cluster. As of 20180508 there are 152 pods (above the default 110 pod limit per VM). ONAP is also vCPU bound - therefore try to run with a minimum of 24 vCores, ideally 32 to 64.
Even though most replicaSets are set at 3, try to have at least 4 nodes so we can survive a node failure and still be able to run all the pods - see the sketch below for checking the pod spread. The memory profile is around 85G right now.
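A quick way to check how the pods spread across the nodes and how close each VM is to the pod limit (a sketch - with --all-namespaces and -o wide the node name is column 8, as in the node listings further down):
# count onap pods per node
kubectl get pods --all-namespaces -o wide | grep onap | awk '{print $8}' | sort | uniq -c | sort -rn
# per-node requests/limits summary
kubectl describe nodes | grep -A 5 "Allocated resources"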
Security Profile
In order to deploy, ONAP requires certain ports (defined in a security group) to be open by CIDR to several static domain names. At runtime the list is reduced.
Ideally these are all inside a private network.
It looks like we will need a standard public/private network locked down behind a combined ACL/SG for an AWS VPC, or an NSG for Azure, where we only expose what we need outside the private network.
Still working on a list of ports, but we should not need any of these exposed if we use a bastion/jumpbox + NAT combo inside the network.
Known Security Vulnerabilities
https://medium.com/handy-tech/analysis-of-a-kubernetes-hack-backdooring-through-kubelet-823be5c3d67c
https://github.com/kubernetes/kubernetes/pull/59666 fixed in Kubernetes 1.10
ONAP Port Profile
On deployment, ONAP will require the following incoming and outgoing ports. Note: within ONAP, REST calls between components are handled inside the Kubernetes namespace by the DNS server running as part of K8S.
port | protocol | incoming/outgoing | application | source | destination | Notes |
---|---|---|---|---|---|---|
22 | ssh | ssh | developer vm | host | ||
443 | tiller | client | host | |||
8880 | http | rancher | client | host | ||
9090 | http | kubernetes | host | |||
10001 | https | nexus3 | nexus3.onap.org | |||
10003 | https | nexus3 | nexus3.onap.org | |||
https | nexus | nexus.onap.org | ||||
https ssh | git | git.onap.org | ||||
30200-30399 | http/https | REST api | developer vm | host | ||
32628 | http | grafana | dashboard for the kubernetes cluster - must be enabled | |||
5005 | tcp | java debug port | developer vm | host | ||
Lockdown ports | ||||||
8080 | outgoing | |||||
10250-10255 | in/out | Lock these down via VPC or a source CIDR that equals only the server/client IP list https://medium.com/handy-tech/analysis-of-a-kubernetes-hack-backdooring-through-kubelet-823be5c3d67c |
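A hedged example of the 10250-10255 lockdown row above on AWS (the security group id and admin/cluster CIDR are placeholders - substitute your own):
# allow the kubelet/metrics range only from the cluster or admin CIDR, never 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 10250-10255 --cidr 10.0.0.0/16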
Azure Security Group
AWS VPC + Security Group
OOM Deployment Specification - 20180507 Beijing/master
The generated host registration docker call is the same as the one generated by the wiki - minus server IP (currently single node cluster) | |
Cluster Stability
- OOM-1520Getting issue details... STATUS
Long Duration Clusters
Single Node Deployments
A 31 day Azure deployment eventually hits the 80% FS saturation barrier - fix: LOG-853
onap onap-vnfsdk-vnfsdk-postgres-0 1/1 Running 0 30d onap onap-vnfsdk-vnfsdk-postgres-1 1/1 Running 0 30d ubuntu@a-osn-cd:~$ df Filesystem 1K-blocks Used Available Use% Mounted on udev 222891708 0 222891708 0% /dev tmpfs 44580468 4295720 40284748 10% /run /dev/sda1 129029904 125279332 3734188 98% /
TODO
https://docs.microsoft.com/en-us/windows/wsl/about
Links
https://kubernetes.io/docs/user-guide/kubectl-cheatsheet/
ONAP on Kubernetes#QuickstartInstallation
https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/
Scriptedundercloud(Helm/Kubernetes/Docker)andONAPinstall-SingleVM
ONAP on deployed by or RKE managed by on | ||||
VMs | Microsoft Azure | Google Compute | OpenStack | |
Managed | Amazon EKS | AKS | ||
Sponsor | Amazon (384G/m - 201801 to 201808) - thank you Michael O'Brien (201705-201905) Amdocs - 201903+ | Microsoft (201801+) Amdocs | michael 201905+ | Intel/Windriver (2017-) |
This is a private page under daily continuous modification to keep it relevant as a live reference (don't edit it unless something is really wrong) https://twitter.com/_mikeobrien | https://www.linkedin.com/in/michaelobrien-developer/ For general support consult the official documentation at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html and https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_cloud_setup_guide.html and raise DOC JIRA's for any modifications required to them. |
---|
This page details deployment of ONAP on any environment that supports Kubernetes based containers.
Chat: http://onap-integration.eastus.cloudapp.azure.com:3000/group/onap-integration
Separate namespaces - to avoid the 1MB configmap limit - or just helm install/delete everything (no helm upgrade)
https://kubernetes.slack.com/messages/C09NXKJKA/?
https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf
Deployment Profile
28 pods, 196 pods including vvp without the filebeat sidecars - 20181130 - this number is when all replicaSets and DaemonSets are set to 1 - which is 241 instances in the clustered case
Docker images currently size up to 75G as of 20181230
After a docker_prepull.sh |
---|
/dev/sda1 389255816 77322824 311916608 20% / |
Type | VMs | Total RAM vCores HD | VM Flavor | K8S/Rancher Idle RAM | Deployed | Deployed ONAP RAM | Pods | Containers | Max vCores | Idle vCores | HD/VM | HD NFS only | IOPS | Date | Cost | branch | Notes deployment post 75min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Full Cluster (14 + 1) - recommended | 15 | 224G 112 vC 100G/VM | 16G, 8 vCores C5.2xLarge | 187Gb | 102Gb | 28 | 248 total 241 onap 217 up 0 error 24 config | 18 | 6+G master | 8.1G | 20181106 | $1.20 US/hour using the spot market | C | ||||
Single VM (possible - not recommended) | 1 | 432G 64 vC 180G | 256G+ 32+ vCores | Rancher: 13G Kubernetes: 8G Top: 10G | 165Gb (after 24h) | 141Gb | 28 | 240 total 196 if RS and DS are set to 1 | 55 | 22 | 131G (including 75G dockers) | n/a | Max: 550/sec Idle: 220/sec | 20181105 20180101 | C | Tested on 432G/64vCore azure VM - R 1.6.22 K8S 1.11 updated 20190101 | |
Developer 1-n pods | 1 | 16G | 16/32G 4-16 vCores | 14Gb | 10Gb | 3+ | 120+G | n/a | C | AAI+robot only |
Security
The VM should be open with no CIDR rules - but lock down 10249-10255 with RBAC
If you get an issue connecting to your rancher server "dial tcp 127.0.0.1:8880: getsockopt: connection refused" - this is usually security related - this line is the first to fail for example
https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh#n117
check the server first - either of these - but if the helm version hangs on "server" - the ports have an issue - run with all tcp/udp ports open 0.0.0.0/0 and ::/0 - and lock down the API on 10249-10255 via oauth github security from the rancher console to keep out crypto miners.
Example 15 node (1 master + 14 nodes) OOM Deployment
Rancher 1.6.25, Kubernetes 1.11.5, Docker 17.03, Helm 2.9.1
empty
With ONAP deployed
Throughput and Volumetrics
Cloudwatch CPU Average
Specific to logging - we have a problem on any VM that contains AAI - the logstash container is being saturated there - see the 30+ percent VM - - LOG-376Getting issue details... STATUS
NFS Throughput for /dockerdata-nfs
Cloudwatch Network In Max
Cost
Using the spot market on AWS - we ran a bill of $10 for 8 hours of 15 VM's of C5.2xLarge - (includes EBS but not DNS, EFS/NFS)
Details: 20181106:1800 EDT master
ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | wc -l 248 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | wc -l 241 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 217 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '0/|1/2' | wc -l 24 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces -o wide | grep onap | grep -E '0/|1/2' onap onap-aaf-aaf-sms-preload-lvqx9 0/1 Completed 0 4h 10.42.75.71 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-distcenter-ql5f8 0/1 Completed 0 4h 10.42.75.223 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-testca-7rzcd 0/1 Completed 0 4h 10.42.18.37 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-aai-aai-graphadmin-create-db-schema-26pfs 0/1 Completed 0 4h 10.42.14.14 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aai-aai-traversal-update-query-data-qlk7w 0/1 Completed 0 4h 10.42.88.122 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-gmmvj 0/1 Completed 0 4h 10.42.111.99 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-n6fw4 0/1 Error 0 4h 10.42.21.12 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-nc8ww 0/1 Error 0 4h 10.42.109.156 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-xcxds 0/1 Error 0 4h 10.42.152.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-node-6496d8f55b-jfvrm 0/1 Init:0/1 28 4h 10.42.95.32 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-prov-86f79c47f9-tldsp 0/1 CrashLoopBackOff 59 4h 10.42.76.248 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-oof-music-cassandra-job-config-7mb5f 0/1 Completed 0 4h 10.42.38.249 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-oof-oof-has-healthcheck-rpst7 0/1 Completed 0 4h 10.42.241.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-oof-oof-has-onboard-5bd2l 0/1 Completed 0 4h 10.42.205.75 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-qshzn 0/2 Completed 0 4h 10.42.112.46 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-rk4m2 0/2 Init:Error 0 4h 10.42.57.79 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-2vw2q 0/1 Completed 0 4h 10.42.87.181 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-k57lh 0/1 Init:Error 0 4h 10.42.148.79 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-sdc-sdc-cs-config-cassandra-vgnz2 0/1 Completed 0 4h 10.42.111.187 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-sdc-sdc-es-config-elasticsearch-lkb9m 0/1 Completed 0 4h 10.42.20.202 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-onboarding-be-cassandra-init-7zv5j 0/1 Completed 0 4h 10.42.218.1 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-sdc-sdc-wfd-be-workflow-init-q8t7z 0/1 Completed 0 4h 10.42.255.91 ip-172-31-41-30.us-east-2.compute.internal <none> onap onap-vid-vid-galera-config-4f274 0/1 Completed 0 4h 10.42.80.200 ip-172-31-33-223.us-east-2.compute.internal <none> onap onap-vnfsdk-vnfsdk-init-postgres-lf659 0/1 Completed 0 4h 10.42.238.204 ip-172-31-38-194.us-east-2.compute.internal <none> ubuntu@ip-172-31-40-250:~$ kubectl get 
nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-33-223.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.222.148.116 18.222.148.116 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-111.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 3.16.37.170 3.16.37.170 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-207.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.225.32.201 18.225.32.201 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-36-163.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.58.189.251 13.58.189.251 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-24.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.224.180.26 18.224.180.26 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-59.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.191.248.14 18.191.248.14 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-194.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.217.45.91 18.217.45.91 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-95.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 52.15.39.21 52.15.39.21 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-39-138.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.224.199.40 18.224.199.40 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-110.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.223.151.180 18.223.151.180 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-229.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.218.252.13 18.218.252.13 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-30.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 3.16.113.3 3.16.113.3 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-42-33.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.59.2.86 13.59.2.86 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-45-152.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.219.56.50 18.219.56.50 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ubuntu@ip-172-31-40-250:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-33-223.us-east-2.compute.internal 852m 10% 13923Mi 90% ip-172-31-34-111.us-east-2.compute.internal 1160m 14% 11643Mi 75% ip-172-31-34-207.us-east-2.compute.internal 1101m 13% 7981Mi 51% ip-172-31-36-163.us-east-2.compute.internal 656m 8% 13377Mi 87% ip-172-31-37-24.us-east-2.compute.internal 401m 5% 8543Mi 55% ip-172-31-37-59.us-east-2.compute.internal 711m 8% 10873Mi 70% ip-172-31-38-194.us-east-2.compute.internal 1136m 14% 8195Mi 53% ip-172-31-38-95.us-east-2.compute.internal 1195m 14% 9127Mi 59% ip-172-31-39-138.us-east-2.compute.internal 296m 3% 10870Mi 70% ip-172-31-41-110.us-east-2.compute.internal 2586m 32% 10950Mi 71% ip-172-31-41-229.us-east-2.compute.internal 159m 1% 9138Mi 59% ip-172-31-41-30.us-east-2.compute.internal 180m 2% 9862Mi 64% ip-172-31-42-33.us-east-2.compute.internal 1573m 19% 6352Mi 41% ip-172-31-45-152.us-east-2.compute.internal 1579m 19% 10633Mi 69%
Quickstart
Undercloud Install - Rancher/Kubernetes/Helm/Docker
Ubuntu 16.04 Host VM Configuration
key | value |
---|---|
Redhat 7.6 Host VM Configuration
see https://gerrit.onap.org/r/#/c/77850/
key | value |
---|---|
firewalld off | systemctl disable firewalld |
git, make, python | yum install git; yum groupinstall 'Development Tools' |
IPv4 forwarding | add net.ipv4.ip_forward = 1 to /etc/sysctl.conf |
Networking enabled | sudo vi /etc/sysconfig/network-scripts/ifcfg-ens33 and set ONBOOT=yes |
General Host VM Configuration
Follow https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh
Run the following script on a clean Ubuntu 16.04 or Redhat RHEL 7.x (7.6) VM anywhere - it will provision and register your Kubernetes system as a co-located master/host.
Ideally you install a clustered set of hosts away from the master VM - you can do this by deleting the host from the cluster after it is installed below, and then running docker, the NFS share and the rancher agent container on each host.
vm.max_map_count - raise the ~64k default to the 256k (262144) limit
The cd.sh script will fix your VM for this limitation, first found in LOG-334. If you don't run the cd.sh script, run the following command manually on each VM so that any elasticsearch container comes up properly - this is a base OS issue.
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n49
# fix virtual memory for onap-log:elasticsearch under Rancher 1.6.11 - OOM-431
sudo sysctl -w vm.max_map_count=262144
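To make the setting survive a reboot (standard sysctl persistence - not part of the cd.sh script itself):
# persist the elasticsearch mmap limit across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p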
Scripted RKE Kubernetes Cluster install
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - Single VM
Prerequisites
Create a single VM - 256G+
See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS
Create a 0.0.0.0/0 ::/0 open security group
Use github to OAuth authenticate your cluster just after installing it.
Last test 20190305 using 3.0.1-ONAP
ONAP Development#Changemax-podsfromdefault110podlimit
# 0 - verify the security group has all protocols (TCP/UCP) for 0.0.0.0/0 and ::/0 # to be save edit/make sure dns resolution is setup to the host ubuntu@ld:~$ sudo cat /etc/hosts 127.0.0.1 cd.onap.info # 1 - configure combined master/host VM - 26 min sudo git clone https://gerrit.onap.org/r/logging-analytics sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . sudo ./oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # to deploy more than 110 pods per vm before the environment (1a7) is created from the kubernetes template (1pt2) - at the waiting 3 min mark - edit it via https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Changemax-podsfromdefault110podlimit --max-pods=900 https://lists.onap.org/g/onap-discuss/topic/oom_110_kubernetes_pod/25213556?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,25213556 in "additional kubelet flags" --max-pods=500 # on a 244G R4.8xlarge vm - 26 min later k8s cluster is up NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-5pq45 1/1 Running 0 10m kube-system kube-dns-75c8cb4ccb-7dlsh 3/3 Running 0 10m kube-system kubernetes-dashboard-6f4c8b9cd5-v625c 1/1 Running 0 10m kube-system monitoring-grafana-76f5b489d5-zhrjc 1/1 Running 0 10m kube-system monitoring-influxdb-6fc88bd58d-9494h 1/1 Running 0 10m kube-system tiller-deploy-8b6c5d4fb-52zmt 1/1 Running 0 2m # 3 - secure via github oauth the master - immediately to lock out crypto miners http://cd.onap.info:8880 # check the master cluster ubuntu@ip-172-31-14-89:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-8-245.us-east-2.compute.internal 179m 2% 2494Mi 4% ubuntu@ip-172-31-14-89:~$ kubectl get nodes -o wide NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-8-245.us-east-2.compute.internal Ready <none> 13d v1.10.3-rancher1 172.17.0.1 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 # 7 - after cluster is up - run cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script - a clean install - will clone new oom repo # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp dev.yaml dev0.yaml sudo vi dev0.yaml sudo cp dev0.yaml dev1.yaml sudo cp logging-analytics/deploy/cd.sh . 
# this does a prepull (-p), clones 3.0.0-ONAP, managed install -f true sudo ./cd.sh -b 3.0.0-ONAP -e onap -p true -n nexus3.onap.org:10001 -f true -s 300 -c true -d true -w false -r false # check around 55 min (on a 256G single node - with 32 vCores) pods/failed/up @ min and ram 161/13/153 @ 50m 107g @55 min ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 152 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-deployment-handler-5789b89d4b-s6fzw 1/2 Running 0 8m onap dep-service-change-handler-76dcd99f84-fchxd 0/1 ContainerCreating 0 3m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 53m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 9 53m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 53m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 7 53m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 9 53m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 CrashLoopBackOff 9 53m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 7 52m Note: DCAE has 2 sets of orchestration after the initial k8s orchestration - another at 57 min ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-dcae-prh-6b5c6ff445-pr547 0/2 ContainerCreating 0 2m onap dep-dcae-tca-analytics-7dbd46d5b5-bgrn9 0/2 ContainerCreating 0 1m onap dep-dcae-ves-collector-59d4ff58f7-94rpq 0/2 ContainerCreating 0 1m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 57m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 10 57m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 57m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 8 57m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 11 57m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 Error 10 57m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 9 57m at 1 hour ubuntu@ip-172-31-20-218:~$ free total used free shared buff/cache available Mem: 251754696 111586672 45000724 193628 95167300 137158588 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | wc -l 164 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 155 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l 8 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-dcae-ves-collector-59d4ff58f7-94rpq 1/2 Running 0 4m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 59m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 10 59m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 59m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 8 59m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 11 59m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 CrashLoopBackOff 10 59m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 9 59m ubuntu@ip-172-31-20-218:~$ df Filesystem 1K-blocks Used Available Use% Mounted on udev 125869392 0 125869392 0% /dev tmpfs 25175472 54680 25120792 1% /run /dev/xvda1 121914320 91698036 30199900 76% / tmpfs 125877348 30312 125847036 1% /dev/shm tmpfs 5120 0 5120 0% /run/lock tmpfs 125877348 0 125877348 0% /sys/fs/cgroup tmpfs 25175472 0 25175472 0% /run/user/1000 todo: verify the release is there after a helm install - as the configMap size issue is breaking the release for now
Prerequisites
Create a single VM - 256G+
20181015
ubuntu@a-onap-dmz-nodelete:~$ ./oom_deployment.sh -b master -s att.onap.cloud -e onap -r a_ONAP_CD_master -t _arm_deploy_onap_cd.json -p _arm_deploy_onap_cd_z_parameters.json # register the IP to DNS with route53 for att.onap.info - using this for the ONAP academic summit on the 22nd 13.68.113.104 = att.onap.cloud
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - clustered
Prerequisites
Add an NFS (EFS on AWS) share
Create a 1 + N cluster
See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS
Create a 0.0.0.0/0 ::/0 open security group
Use github to OAuth authenticate your cluster just after installing it.
Last tested on ld.onap.info 20181029
# 0 - verify the security group has all protocols (TCP/UCP) for 0.0.0.0/0 and ::/0 # 1 - configure master - 15 min sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # on a 64G R4.2xlarge vm - 23 min later k8s cluster is up kubectl get pods --all-namespaces kube-system heapster-76b8cd7b5-g7p6n 1/1 Running 0 8m kube-system kube-dns-5d7b4487c9-jjgvg 3/3 Running 0 8m kube-system kubernetes-dashboard-f9577fffd-qldrw 1/1 Running 0 8m kube-system monitoring-grafana-997796fcf-g6tr7 1/1 Running 0 8m kube-system monitoring-influxdb-56fdcd96b-x2kvd 1/1 Running 0 8m kube-system tiller-deploy-54bcc55dd5-756gn 1/1 Running 0 2m # 2 - secure via github oauth the master - immediately to lock out crypto miners http://ld.onap.info:8880 # 3 - delete the master from the hosts in rancher http://ld.onap.info:8880 # 4 - create NFS share on master https://us-east-2.console.aws.amazon.com/efs/home?region=us-east-2#/filesystems/fs-92xxxxx # add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher) sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # 5 - create NFS share and register each node - do this for all nodes sudo git clone https://gerrit.onap.org/r/logging-analytics # add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher) sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # it takes about 1 min to run the script and 1 minute for the etcd and healthcheck containers to go green on each host # check the master cluster kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-19-9.us-east-2.compute.internal 9036m 56% 53266Mi 43% ip-172-31-21-129.us-east-2.compute.internal 6840m 42% 47654Mi 38% ip-172-31-18-85.us-east-2.compute.internal 6334m 39% 49545Mi 40% ip-172-31-26-114.us-east-2.compute.internal 3605m 22% 25816Mi 21% # fix helm on the master after adding nodes to the master - only if the server helm version is less than the client helm version (rancher 1.6.18 does not have this issue) ubuntu@ip-172-31-14-89:~$ sudo helm version Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"} ubuntu@ip-172-31-14-89:~$ sudo helm init --upgrade $HELM_HOME has been configured at /home/ubuntu/.helm. Tiller (the Helm server-side component) has been upgraded to the current version. 
ubuntu@ip-172-31-14-89:~$ sudo helm version Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} # 7a - manual: follow the helm plugin page # https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins sudo git clone https://gerrit.onap.org/r/oom sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm cd oom/kubernetes sudo helm serve & sudo make all sudo make onap sudo helm deploy onap local/onap --namespace onap fetching local/onap release "onap" deployed release "onap-aaf" deployed release "onap-aai" deployed release "onap-appc" deployed release "onap-clamp" deployed release "onap-cli" deployed release "onap-consul" deployed release "onap-contrib" deployed release "onap-dcaegen2" deployed release "onap-dmaap" deployed release "onap-esr" deployed release "onap-log" deployed release "onap-msb" deployed release "onap-multicloud" deployed release "onap-nbi" deployed release "onap-oof" deployed release "onap-policy" deployed release "onap-pomba" deployed release "onap-portal" deployed release "onap-robot" deployed release "onap-sdc" deployed release "onap-sdnc" deployed release "onap-sniro-emulator" deployed release "onap-so" deployed release "onap-uui" deployed release "onap-vfc" deployed release "onap-vid" deployed release "onap-vnfsdk" deployed # 7b - automated: after cluster is up - run cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script # clean install - will clone new oom repo # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . sudo ./cd.sh -b master -e onap -c true -d true -w true # rerun install - no delete of oom repo sudo ./cd.sh -b master -e onap -c false -d true -w true
Deployment Integrity based on Pod Dependencies
20181213 running 3.0.0-ONAP
Links
- LOG-899
- LOG-898
- OOM-1547
- OOM-1543
Patches
Windriver openstack heat template 1+13 vms
https://gerrit.onap.org/r/#/c/74781/
docker prepull script – run before cd.sh - https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
https://gerrit.onap.org/r/#/c/74780/
Not merged with the heat template until the following nexus3 slowdown is addressed
https://jira.onap.org/browse/TSC-79
Base Platform First
Bring up dmaap and aaf first, then the rest of the pods in the following order - a sketch of driving this order with the deploy plugin follows the listing below.
Every 2.0s: helm list Fri Dec 14 15:19:49 2018 NAME REVISION UPDATED STATUS CHART NAMESPACE onap 2 Fri Dec 14 15:10:56 2018 DEPLOYED onap-3.0.0 onap onap-aaf 1 Fri Dec 14 15:10:57 2018 DEPLOYED aaf-3.0.0 onap onap-dmaap 2 Fri Dec 14 15:11:00 2018 DEPLOYED dmaap-3.0.0 onap onap onap-aaf-aaf-cm-5c65c9dc55-snhlj 1/1 Running 0 10m onap onap-aaf-aaf-cs-7dff4b9c44-85zg2 1/1 Running 0 10m onap onap-aaf-aaf-fs-ff6779b94-gz682 1/1 Running 0 10m onap onap-aaf-aaf-gui-76cfcc8b74-wn8b8 1/1 Running 0 10m onap onap-aaf-aaf-hello-5d45dd698c-xhc2v 1/1 Running 0 10m onap onap-aaf-aaf-locate-8587d8f4-l4k7v 1/1 Running 0 10m onap onap-aaf-aaf-oauth-d759586f6-bmz2l 1/1 Running 0 10m onap onap-aaf-aaf-service-546f66b756-cjppd 1/1 Running 0 10m onap onap-aaf-aaf-sms-7497c9bfcc-j892g 1/1 Running 0 10m onap onap-aaf-aaf-sms-preload-vhbbd 0/1 Completed 0 10m onap onap-aaf-aaf-sms-quorumclient-0 1/1 Running 0 10m onap onap-aaf-aaf-sms-quorumclient-1 1/1 Running 0 8m onap onap-aaf-aaf-sms-quorumclient-2 1/1 Running 0 6m onap onap-aaf-aaf-sms-vault-0 2/2 Running 1 10m onap onap-aaf-aaf-sshsm-distcenter-27ql7 0/1 Completed 0 10m onap onap-aaf-aaf-sshsm-testca-mw95p 0/1 Completed 0 10m onap onap-dmaap-dbc-pg-0 1/1 Running 0 17m onap onap-dmaap-dbc-pg-1 1/1 Running 0 15m onap onap-dmaap-dbc-pgpool-c5f8498-fn9cn 1/1 Running 0 17m onap onap-dmaap-dbc-pgpool-c5f8498-t9s27 1/1 Running 0 17m onap onap-dmaap-dmaap-bus-controller-59c96d6b8f-9xsxg 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-db-557c66dc9d-gvb9f 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-node-6496d8f55b-ffgfr 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-prov-86f79c47f9-zb8p7 1/1 Running 0 17m onap onap-dmaap-message-router-5fb78875f4-lvsg6 1/1 Running 0 17m onap onap-dmaap-message-router-kafka-7964db7c49-n8prg 1/1 Running 0 17m onap onap-dmaap-message-router-zookeeper-5cdfb67f4c-5w4vw 1/1 Running 0 17m onap-msb 2 Fri Dec 14 15:31:12 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 4m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 4m onap onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 4m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 4m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 4m onap-aai 1 Fri Dec 14 15:30:59 2018 DEPLOYED aai-3.0.0 onap onap onap-aai-aai-54b7bf7779-bfbmg 1/1 Running 0 2m onap onap-aai-aai-babel-6bbbcf5d5c-sp676 2/2 Running 0 13m onap onap-aai-aai-cassandra-0 1/1 Running 0 13m onap onap-aai-aai-cassandra-1 1/1 Running 0 12m onap onap-aai-aai-cassandra-2 1/1 Running 0 9m onap onap-aai-aai-champ-54f7986b6b-wql2b 2/2 Running 0 13m onap onap-aai-aai-data-router-f5f75c9bd-l6ww7 2/2 Running 0 13m onap onap-aai-aai-elasticsearch-c9bf9dbf6-fnj8r 1/1 Running 0 13m onap onap-aai-aai-gizmo-5f8bf54f6f-chg85 2/2 Running 0 13m onap onap-aai-aai-graphadmin-9b956d4c-k9fhk 2/2 Running 0 13m onap onap-aai-aai-graphadmin-create-db-schema-s2nnw 0/1 Completed 0 13m onap onap-aai-aai-modelloader-644b46df55-vt4gk 2/2 Running 0 13m onap onap-aai-aai-resources-745b6b4f5b-rj7lm 2/2 Running 0 13m onap onap-aai-aai-search-data-559b8dbc7f-l6cqq 2/2 Running 0 13m onap onap-aai-aai-sparky-be-75658695f5-z2xv4 2/2 Running 0 13m onap onap-aai-aai-spike-6778948986-7h7br 2/2 Running 0 13m onap onap-aai-aai-traversal-58b97f689f-jlblx 2/2 Running 0 13m onap onap-aai-aai-traversal-update-query-data-7sqt5 0/1 Completed 0 13m onap-msb 5 Fri Dec 14 15:51:42 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 18m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 18m onap 
onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 18m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 18m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 18m onap-esr 3 Fri Dec 14 15:51:40 2018 DEPLOYED esr-3.0.0 onap onap onap-esr-esr-gui-6c5ccd59d6-6brcx 1/1 Running 0 2m onap onap-esr-esr-server-5f967d4767-ctwp6 2/2 Running 0 2m onap-robot 2 Fri Dec 14 15:51:48 2018 DEPLOYED robot-3.0.0 onap onap onap-robot-robot-ddd948476-n9szh 1/1 Running 0 11m onap-multicloud 1 Fri Dec 14 15:51:43 2018 DEPLOYED multicloud-3.0.0 onap
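One way to drive that order with the OOM deploy plugin is to enable components in waves via --set overrides (a sketch only - the <component>.enabled flags follow the convention in oom/kubernetes/onap/values.yaml, and dev.yaml is the override file used elsewhere on this page):
# base platform first
sudo helm deploy onap local/onap --namespace onap -f dev.yaml --set aaf.enabled=true --set dmaap.enabled=true
# next wave once the base pods are 1/1 Running
sudo helm deploy onap local/onap --namespace onap -f dev.yaml --set msb.enabled=true --set aai.enabled=true --set esr.enabled=true --set robot.enabled=true --set multicloud.enabled=true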
Tiller requires wait states between deployments
There is a patch going into 3.0.1 that delays deployments by 3+ seconds so tiller is not overloaded.
sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm
sudo vi ~/.helm/plugins/deploy/deploy.sh
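The change itself is small (a sketch only - the actual 3.0.1 patch may differ): inside the deploy.sh loop that installs each subchart, pause between helm calls so tiller is not hit with ~30 releases at once.
# in ~/.helm/plugins/deploy/deploy.sh, after each per-subchart helm upgrade/install call
sleep 3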
Use public-cloud.yaml override
Note: your HD/SSD, RAM and CPU configuration will drastically affect deployment. For example, if you are CPU starved, the idle load of ONAP will delay pods as more come in; network bandwidth to pull docker containers is also significant, and PV creation is sensitive to FS throughput/lag.
Some of the internal pod timings are optimized for a particular Azure deployment.
https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/public-cloud.yaml
Optimizing Docker Image Pulls
Verify whether the integration docker csv manifest or the oom repo values.yaml is the source of truth (no override required?)
- TSC-86
Nexus Proxy
Alain Soleil pointed out the proxy page (was using commercial nexus3) - ONAP OOM Beijing - Hosting docker images locally - I had about 4 JIRAs on this and forgot about them.
20190121:
Answered John Lotoski for EKS and his other post on nexus3 proxy failures - looks like an issue with a double proxy between dockerhub - or an issue specific to the dockerhub/registry:2 container - https://lists.onap.org/g/onap-discuss/topic/registry_issue_few_images/29285134?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29285134 |
Running
- LOG-355
nexus3.onap.info:5000 - my private AWS nexus3 proxy of nexus3.onap.org:10001
nexus3.onap.cloud:5000 - azure public proxy - filled with casablanca (will retire after Jan 2)
nexus4.onap.cloud:5000 - azure public proxy - filled with master - and later casablanca
nexus3windriver.onap.cloud:5000 - windriver/openstack lab inside the firewall to use only for the lab - access to public is throttled
Nexus3 proxy setup - host
# from a clean ubuntu 16.04 VM # install docker sudo curl https://releases.rancher.com/install-docker/17.03.sh | sh sudo usermod -aG docker ubuntu # install nexus mkdir -p certs openssl req -newkey rsa:4096 -nodes -sha256 -keyout certs/domain.key -x509 -days 365 -out certs/domain.crt Common Name (e.g. server FQDN or YOUR name) []:nexus3.onap.info sudo nano /etc/hosts sudo docker run -d --restart=unless-stopped --name registry -v `pwd`/certs:/certs -e REGISTRY_HTTP_ADDR=0.0.0.0:5000 -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key -e REGISTRY_PROXY_REMOTEURL=https://nexus3.onap.org:10001 -p 5000:5000 registry:2 sudo docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7f9b0e97eb7f registry:2 "/entrypoint.sh /e..." 8 seconds ago Up 7 seconds 0.0.0.0:5000->5000/tcp registry # test it sudo docker login -u docker -p docker nexus3.onap.info:5000 Login Succeeded # get images from https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=casablanca # use for example the first line onap/aaf/aaf_agent,2.1.8 # or the prepull script in https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 819d6de9e493: Downloading [======================================> ] 770.7 kB/1.012 MB # list sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE registry 2 2e2f252f3c88 3 months ago 33.3 MB # prepull to cache images on the server - in this case casablanca branch sudo wget https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh sudo chmod 777 docker_prepull.sh # prep - same as client vms - the cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp certs/domain.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 # prepull sudo nohup ./docker_prepull.sh -b casablanca -s nexus3.onap.cloud:5000 &
Nexus3 proxy usage per cluster node
The cert is on TSC-79
# on each host # Cert is on TSC-79 sudo wget https://jira.onap.org/secure/attachment/13127/domain_nexus3_onap_cloud.crt # or if you already have it scp domain_nexus3_onap_cloud.crt ubuntu@ld3.onap.cloud:~/ # to avoid sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Error response from daemon: Get https://nexus3.onap.cloud:5000/v1/users/: x509: certificate signed by unknown authority # cp cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp domain_nexus3_onap_cloud.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Login Succeeded # testing # vm with the image existing - 2 sec ubuntu@ip-172-31-33-46:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # vm with layers existing except for last 5 - 5 sec ubuntu@a-cd-master:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Already exists .. 20 49e90af50c7d: Already exists .... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # clean AWS VM (clean install of docker) - no pulls yet - 45 sec for everything ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 0addb6fece63: Pulling fs layer 78e58219b215: Pulling fs layer eb6959a66df2: Pulling fs layer 321bd3fd2d0e: Pull complete ... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 ubuntu@ip-172-31-14-34:~$ sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE nexus3.onap.cloud:5000/onap/aaf/aaf_agent 2.1.8 090b326a7f11 5 weeks ago 1.14 GB # going to test a same size image directly from the LF - with minimal common layers nexus3.onap.org:10001/onap/testsuite 1.3.2 c4b58baa95e8 3 weeks ago 1.13 GB # 5 min in we are still at 3% - numbers below are a min old ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.org:10001/onap/testsuite:1.3.2 1.3.2: Pulling from onap/testsuite 32802c0cfa4d: Downloading [=============> ] 8.416 MB/32.1 MB da1315cffa03: Download complete fa83472a3562: Download complete f85999a86bef: Download complete 3eca7452fe93: Downloading [=======================> ] 8.517 MB/17.79 MB 9f002f13a564: Downloading [=========================================> ] 8.528 MB/10.24 MB 02682cf43e5c: Waiting .... 754645df4601: Waiting # in 5 min we get 3% 35/1130Mb - which comes out to 162 min for 1.13G for .org as opposed to 45 sec for .info - which is a 200X slowdown - some of this is due to the fact my nexus3.onap.info is on the same VPC as my test VM - testing on openlab # openlab - 2 min 40 sec which is 3.6 times slower - expected than in AWS - (25 min pulls vs 90min in openlab) - this makes nexus.onap.org 60 times slower in openlab than a proxy running from AWS (2 vCore/16G/ssd VM) ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pull complete ... 
acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 #pulling smaller from nexus3.onap.info 2 min 20 - for 36Mb = 0.23Mb/sec - extrapolated to 1.13Gb for above is 5022 sec or 83 min - half the rough calculation above ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.org:10001/onap/aaf/sms:3.0.1 3.0.1: Pulling from onap/aaf/sms c67f3896b22c: Pull complete ... 76eeb922b789: Pull complete Digest: sha256:d5b64947edb93848acacaa9820234aa29e58217db9f878886b7bafae00fdb436 Status: Downloaded newer image for nexus3.onap.org:10001/onap/aaf/sms:3.0.1 # conclusion - nexus3.onap.org is experiencing a routing issue from their DC outbound causing a 80-100x slowdown over a proxy nexus3 - since 20181217 - as local jenkins.onap.org builds complete faster # workaround is to use a nexus3 proxy above
and add the following to values.yaml:
global:
  #repository: nexus3.onap.org:10001
  repository: nexus3.onap.cloud:5000
  repositoryCred:
    user: docker
    password: docker
The Windriver lab also has a network issue (for example, a pull from nexus3.onap.cloud:5000 (azure) into an AWS EC2 instance takes 45 sec for 1.1G, while the same pull in an openlab VM is on the order of 10+ min) - therefore you need a local nexus3 proxy if you are inside the openstack lab. I have registered nexus3windriver.onap.cloud:5000 to a nexus3 proxy in my logging tenant - cert above.
Docker Prepull
https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
Usage is via
https://gerrit.onap.org/r/#/c/74780/
- LOG-905
git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics cd logging-analytics git pull ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics refs/changes/80/74780/1 ubuntu@onap-oom-obrien-rancher-e0:~$ sudo nohup ./docker_prepull.sh & [1] 14488 ubuntu@onap-oom-obrien-rancher-e0:~$ nohup: ignoring input and appending output to 'nohup.out'
POD redeployment/undeploy/deploy
If you need to redeploy a pod due to a job timeout, a failure, or to pick up a config/code change, delete the corresponding /dockerdata-nfs subdirectory (the *-aai one for example) so that a DB restart does not run into existing data issues.
sudo chmod -R 777 /dockerdata-nfs
sudo rm -rf /dockerdata-nfs/onap-aai
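After clearing the shared data, the component can be cycled with the OOM plugins (a sketch - the undeploy plugin is documented on the OOM Helm (un)Deploy plugins page referenced earlier, and releases follow the onap-<component> naming convention):
# remove just the aai release, then redeploy it
sudo helm undeploy onap-aai --purge
sudo helm deploy onap-aai local/onap --namespace onap -f dev.yaml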
Casablanca Deployment Examples
Deploy to 13+1 cluster
Deploy as one with deploy.sh delays and public-cloud.yaml - single 500G server AWS
sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml
where dev.yaml is the same as the one in resources but with all components turned on and IfNotPresent instead of Always
Deploy in sequence with validation on previous pod before proceeding - single 500G server AWS
We are not using the public-cloud.yaml override here - in this case we are verifying just the timing between deploys - each pod waits for the previous one to complete so resources are not in contention.
See the update to
https://git.onap.org/logging-analytics/tree/deploy/cd.sh
https://gerrit.onap.org/r/#/c/75422
DEPLOY_ORDER_POD_NAME_ARRAY=('robot consul aaf dmaap dcaegen2 msb aai esr multicloud oof so sdc sdnc vid policy portal log vfc uui vnfsdk appc clamp cli pomba vvp contrib sniro-emulator')
# don't count completed pods
DEPLOY_NUMBER_PODS_DESIRED_ARRAY=(1 4 13 11 13 5 15 2 6 17 10 12 11 2 8 6 3 18 2 5 5 5 1 11 11 3 1)
# account for pods that have varying deploy times or replicaset sizes
# don't count the 0/1 completed pods - and skip most of the ReplicaSet instances except 1
# dcae bootstrap is problematic
DEPLOY_NUMBER_PODS_PARTIAL_ARRAY=(1 2 11 9 13 5 11 2 6 16 10 12 11 2 8 6 3 18 2 5 5 5 1 9 11 3 1)
Deployment in sequence to Windriver Lab
Note: the Windriver Openstack lab requires that host registration occur against the private network 10.0.0.0/16, not the 10.12.0.0/16 public network - registering against the public IP is fine in Azure/AWS but not in this openstack lab.
The docs will be adjusted - OOM-1550
This is bad - public IP based cluster
This is good - private IP based cluster
Openstack/Windriver HEAT template for 13+1 kubernetes cluster
https://jira.onap.org/secure/attachment/13010/logging_openstack_13_16g.yaml
LOG-324 - see https://gerrit.onap.org/r/74781
obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack stack create -t logging_openstack_13_16g.yaml -e logging_openstack_oom.env OOM20181216-13 +---------------------+-----------------------------------------+ | Field | Value | +---------------------+-----------------------------------------+ | id | ed6aa689-2e2a-4e75-8868-9db29607c3ba | | stack_name | OOM20181216-13 | | description | Heat template to install OOM components | | creation_time | 2018-12-16T19:42:27Z | | updated_time | 2018-12-16T19:42:27Z | | stack_status | CREATE_IN_PROGRESS | | stack_status_reason | Stack CREATE started | +---------------------+-----------------------------------------+ obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack server list +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | ID | Name | Status | Networks | Image Name | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | 7695cf14-513e-4fea-8b00-6c2a25df85d3 | onap-oom-obrien-rancher-e13 | ACTIVE | oam_onap_RNa3=10.0.0.23, 10.12.7.14 | ubuntu-16-04-cloud-amd64 | | 1b70f179-007c-4975-8e4a-314a57754684 | onap-oom-obrien-rancher-e7 | ACTIVE | oam_onap_RNa3=10.0.0.10, 10.12.7.36 | ubuntu-16-04-cloud-amd64 | | 17c77bd5-0a0a-45ec-a9c7-98022d0f62fe | onap-oom-obrien-rancher-e2 | ACTIVE | oam_onap_RNa3=10.0.0.9, 10.12.6.180 | ubuntu-16-04-cloud-amd64 | | f85e075f-e981-4bf8-af3f-e439b7b72ad2 | onap-oom-obrien-rancher-e9 | ACTIVE | oam_onap_RNa3=10.0.0.6, 10.12.5.136 | ubuntu-16-04-cloud-amd64 | | 58c404d0-8bae-4889-ab0f-6c74461c6b90 | onap-oom-obrien-rancher-e6 | ACTIVE | oam_onap_RNa3=10.0.0.19, 10.12.5.68 | ubuntu-16-04-cloud-amd64 | | b91ff9b4-01fe-4c34-ad66-6ffccc9572c1 | onap-oom-obrien-rancher-e4 | ACTIVE | oam_onap_RNa3=10.0.0.11, 10.12.7.35 | ubuntu-16-04-cloud-amd64 | | d9be8b3d-2ef2-4a00-9752-b935d6dd2dba | onap-oom-obrien-rancher-e0 | ACTIVE | oam_onap_RNa3=10.0.16.1, 10.12.7.13 | ubuntu-16-04-cloud-amd64 | | da0b1be6-ec2b-43e6-bb3f-1f0626dcc88b | onap-oom-obrien-rancher-e1 | ACTIVE | oam_onap_RNa3=10.0.0.16, 10.12.5.10 | ubuntu-16-04-cloud-amd64 | | 0ffec4d0-bd6f-40f9-ab2e-f71aa5b9fbda | onap-oom-obrien-rancher-e5 | ACTIVE | oam_onap_RNa3=10.0.0.7, 10.12.6.248 | ubuntu-16-04-cloud-amd64 | | 125620e0-2aa6-47cf-b422-d4cbb66a7876 | onap-oom-obrien-rancher-e8 | ACTIVE | oam_onap_RNa3=10.0.0.8, 10.12.6.249 | ubuntu-16-04-cloud-amd64 | | 1efe102a-d310-48d2-9190-c442eaec3f80 | onap-oom-obrien-rancher-e12 | ACTIVE | oam_onap_RNa3=10.0.0.5, 10.12.5.167 | ubuntu-16-04-cloud-amd64 | | 7c248d1d-193a-415f-868b-a94939a6e393 | onap-oom-obrien-rancher-e3 | ACTIVE | oam_onap_RNa3=10.0.0.3, 10.12.5.173 | ubuntu-16-04-cloud-amd64 | | 98dc0aa1-e42d-459c-8dde-1a9378aa644d | onap-oom-obrien-rancher-e11 | ACTIVE | oam_onap_RNa3=10.0.0.12, 10.12.6.179 | ubuntu-16-04-cloud-amd64 | | 6799037c-31b5-42bd-aebf-1ce7aa583673 | onap-oom-obrien-rancher-e10 | ACTIVE | oam_onap_RNa3=10.0.0.13, 10.12.6.167 | ubuntu-16-04-cloud-amd64 | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+
# 13+1 vms on openlab available as of 20181216 - running 2 separate clusters # 13+1 all 16g VMs # 4+1 all 32g VMs # master undercloud sudo git clone https://gerrit.onap.org/r/logging-analytics sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . sudo ./oom_rancher_setup.sh -b master -s 10.12.7.13 -e onap # master nfs sudo wget https://jira.onap.org/secure/attachment/12887/master_nfs_node.sh sudo chmod 777 master_nfs_node.sh sudo ./master_nfs_node.sh 10.12.5.10 10.12.6.180 10.12.5.173 10.12.7.35 10.12.6.248 10.12.5.68 10.12.7.36 10.12.6.249 10.12.5.136 10.12.6.167 10.12.6.179 10.12.5.167 10.12.7.14 #sudo ./master_nfs_node.sh 10.12.5.162 10.12.5.198 10.12.5.102 10.12.5.4 # slaves nfs sudo wget https://jira.onap.org/secure/attachment/12888/slave_nfs_node.sh sudo chmod 777 slave_nfs_node.sh sudo ./slave_nfs_node.sh 10.12.7.13 #sudo ./slave_nfs_node.sh 10.12.6.125 # test it ubuntu@onap-oom-obrien-rancher-e4:~$ sudo ls /dockerdata-nfs/ test.sh # remove client from master node ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e0 Ready <none> 5m v1.11.5-rancher1 ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-7b48b696fc-2z47t 1/1 Running 0 5m kube-system kube-dns-6655f78c68-gn2ds 3/3 Running 0 5m kube-system kubernetes-dashboard-6f54f7c4b-sfvjc 1/1 Running 0 5m kube-system monitoring-grafana-7877679464-872zv 1/1 Running 0 5m kube-system monitoring-influxdb-64664c6cf5-rs5ms 1/1 Running 0 5m kube-system tiller-deploy-6f4745cbcf-zmsrm 1/1 Running 0 5m # after master removal from hosts - expected no nodes ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes error: the server doesn't have a resource type "nodes" # slaves rancher client - 1st node # register on the private network not the public IP # notice the CATTLE_AGENT sudo docker run -e CATTLE_AGENT_IP="10.0.0.7" --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.11 http://10.0.16.1:8880/v1/scripts/5A5E4F6388A4C0A0F104:1514678400000:9zpsWeGOsKVmWtOtoixAUWjPJs ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 0s v1.11.5-rancher1 # add the other nodes # the 4 node 32g = 128g cluster ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e2 Ready <none> 4m v1.11.5-rancher1 onap-oom-obrien-rancher-e3 Ready <none> 5m v1.11.5-rancher1 onap-oom-obrien-rancher-e4 Ready <none> 3m v1.11.5-rancher1 # the 13 node 16g = 208g cluster ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% onap-oom-obrien-rancher-e1 208m 2% 2693Mi 16% onap-oom-obrien-rancher-e10 38m 0% 1083Mi 6% onap-oom-obrien-rancher-e11 36m 0% 1104Mi 6% onap-oom-obrien-rancher-e12 57m 0% 1070Mi 6% onap-oom-obrien-rancher-e13 116m 1% 1017Mi 6% onap-oom-obrien-rancher-e2 73m 0% 1361Mi 8% onap-oom-obrien-rancher-e3 62m 0% 1099Mi 6% onap-oom-obrien-rancher-e4 74m 0% 1370Mi 8% onap-oom-obrien-rancher-e5 37m 0% 1104Mi 6% onap-oom-obrien-rancher-e6 55m 0% 1125Mi 7% onap-oom-obrien-rancher-e7 42m 0% 1102Mi 6% onap-oom-obrien-rancher-e8 53m 0% 1090Mi 6% onap-oom-obrien-rancher-e9 52m 0% 1072Mi 6%
Installing ONAP via cd.sh
The cluster hosting kubernetes is up with 13+1 nodes and 2 network interfaces (the private 10.0.0.0/16 subnet and the 10.12.0.0/16 public subnet)
Verify kubernetes hosts are ready
ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 2h v1.11.5-rancher1 onap-oom-obrien-rancher-e10 Ready <none> 25m v1.11.5-rancher1 onap-oom-obrien-rancher-e11 Ready <none> 20m v1.11.5-rancher1 onap-oom-obrien-rancher-e12 Ready <none> 5m v1.11.5-rancher1 onap-oom-obrien-rancher-e13 Ready <none> 1m v1.11.5-rancher1 onap-oom-obrien-rancher-e2 Ready <none> 2h v1.11.5-rancher1 onap-oom-obrien-rancher-e3 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e4 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e5 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e6 Ready <none> 46m v1.11.5-rancher1 onap-oom-obrien-rancher-e7 Ready <none> 40m v1.11.5-rancher1 onap-oom-obrien-rancher-e8 Ready <none> 37m v1.11.5-rancher1 onap-oom-obrien-rancher-e9 Ready <none> 26m v1.11.5-rancher1
Openstack parameter overrides
# manually check out 3.0.0-ONAP (script is written for branches like casablanca) sudo git clone -b 3.0.0-ONAP http://gerrit.onap.org/r/oom sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm # fix tiller bug sudo nano ~/.helm/plugins/deploy/deploy.sh # modify dev.yaml with logging-rc file openstack parameters - appc, sdnc and sudo cp logging-analytics/deploy/cd.sh . sudo cp oom/kubernetes/onap/resources/environments/dev.yaml . sudo nano dev.yaml ubuntu@onap-oom-obrien-rancher-0:~/oom/kubernetes/so/resources/config/mso$ echo -n "Whq..jCLj" | openssl aes-128-ecb -e -K `cat encryption.key` -nosalt | xxd -c 256 -p bdaee....c60d3e09 # so server configuration config: openStackUserName: "michael_o_brien" openStackRegion: "RegionOne" openStackKeyStoneUrl: "http://10.12.25.2:5000" openStackServiceTenantName: "service" openStackEncryptedPasswordHere: "bdaee....c60d3e09"
Deploy all or a subset of ONAP
# copy dev.yaml to dev0.yaml
# bring up all of onap in sequence or adjust the list for a subset specific to the vFW - assumes you already cloned oom
sudo nohup ./cd.sh -b 3.0.0-ONAP -e onap -p false -n nexus3.onap.org:10001 -f true -s 900 -c false -d true -w false -r false &
#sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml
The load is distributed across the cluster even for individual pods like dmaap
Verify the ONAP installation
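A quick sanity check, as a sketch (paths assume the oom clone used above and the onap namespace):
# any pods still not ready?
kubectl get pods -n onap | grep -E '0/|1/2|CrashLoopBackOff'
# run the robot healthcheck suite
oom/kubernetes/robot/ete-k8s.sh onap health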
vFW vFirewall Workarounds
From Alexis Chiarello, currently being verified 20190125 - these are for the heat environment, not the kubernetes one - following the Casablanca Stability Testing Instructions.
20181213 - thank you Alexis and Beejal Shah
Something else I forgot to mention: I did change the heat templates to adapt for our Ubuntu images in our env (to enable additional NICs, eth2/eth3) and also to disable the gateway by default on the 2 additional subnets created. See attached for the modified files. Cheers, Alexis.
sudo chmod 777 master_nfs_node.sh
I reran the vFWCL use case in my re-installed Casablanca lab and here is what I had to manually do post-install:
- fix the Robot "robot-eteshare-configmap" config map and adjust values that did not match my env (onap_private_subnet_id, sec_group, dcae_collector_ip, Ubuntu image names, etc...)
- make sure to push the policies from pap (PRELOAD_POLICIES=true then run config/push-policies.sh from the /tmp/policy-install folder)
(the following are for heat not kubernetes)
That's it - in my case, with the above, the vFWCL closed loop works just fine and I am able to see APP-C processing the modifyConfig event and changing the number of streams via netconf on the packet generator. Cheers, Alexis.
Full Entrypoint Install
There are two choices: run the single oom_deployment.sh via your ARM, CloudFormation, or Heat template wrapper as a one-click install, or use the 2-step procedure above (a sketch of the one-click invocation follows the table below).
entrypoint aws/azure/openstack | Ubuntu 16 rancher install | oom deployment CD script | |
---|---|---|---|
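As a sketch of the one-click path referenced above (flags mirror the Azure example later on this page; the resource group, domain, template and parameter file names are placeholders):
wget https://git.onap.org/logging-analytics/plain/deploy/azure/oom_deployment.sh
chmod 777 oom_deployment.sh
./oom_deployment.sh -b master -s onap.example.cloud -e onap -r a_auto_yourid_20190101 -t _arm_deploy_onap_cd.json -p _arm_deploy_onap_cd_z_parameters.json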
Remove a Deployment
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n57
see also OOM-1463
required for a couple of pods that leave left-over resources behind, and for the secondary Cloudify out-of-band orchestration in DCAEGEN2
see OOM-1089, DCAEGEN2-1067 and DCAEGEN2-1068
sudo helm undeploy $ENVIRON --purge
kubectl delete namespace onap
sudo helm delete --purge onap
kubectl delete pv --all
kubectl delete pvc --all
kubectl delete secrets --all
kubectl delete clusterrolebinding --all
sudo rm -rf /dockerdata-nfs/onap-<pod>
# or for a single pod
kubectl delete pod $ENVIRON-aaf-sms-vault-0 -n $ENVIRON --grace-period=0 --force
Using ONAP
Accessing the portal
Access the ONAP portal via the 8989 LoadBalancer that Mandeep Khinda merged in for OOM-633, documented at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_user_guide.html#accessing-the-onap-portal-using-oom-and-a-kubernetes-cluster
ubuntu@a-onap-devopscd:~$ kubectl -n onap get services | grep "portal-app"
portal-app LoadBalancer 10.43.145.94 13.68.113.105 8989:30215/TCP,8006:30213/TCP,8010:30214/TCP,8443:30225/TCP 20h
In the case of connecting to openlab through the VPN from your mac, you would need the 2nd number - which will be something like 10.0.0.12 - but use the public IP corresponding to this private network IP - which in this particular case is the e1 instance with 10.12.7.7 as the externally routable IP
add the following entries, prefixed with the IP above, to your client's /etc/hosts
in this case I am using the public 13... IP (elastic or generated public IP) - AWS in this example
13.68.113.105 portal.api.simpledemo.onap.org
13.68.113.105 vid.api.simpledemo.onap.org
13.68.113.105 sdc.api.fe.simpledemo.onap.org
13.68.113.105 portal-sdk.simpledemo.onap.org
13.68.113.105 policy.api.simpledemo.onap.org
13.68.113.105 aai.api.sparky.simpledemo.onap.org
13.68.113.105 cli.api.simpledemo.onap.org
13.68.113.105 msb.api.discovery.simpledemo.onap.org
launch
http://portal.api.simpledemo.onap.org:8989/ONAPPORTAL/login.htm
login with demo user
Accessing MariaDB portal container
kubectl -n onap exec -it dev-portal-portal-db-b8db58679-q9pjq -- mysql -D mysql -h localhost -e 'select * from user'
see PORTAL-399 and PORTAL-498
Running the vFirewall
Casablanca Stability Testing Instructions
# verifying on ld.onap.cloud 20190126
oom/kubernetes/robot/demo-k8s.sh onap init
Initialize Customer And Models | FAIL |
ConnectionError: HTTPConnectionPool(host='1.2.3.4', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efd0f8a4ad0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
# push sample vFWCL policies
PAP_POD=$(kubectl --namespace onap get pods | grep policy-pap | sed 's/ .*//')
kubectl exec -it $PAP_POD -n onap -c pap -- bash -c 'export PRELOAD_POLICIES=true; /tmp/policy-install/config/push-policies.sh'
# ete instantiateDemoVFWCL
/root/oom/kubernetes/robot/ete-k8s.sh onap instantiateDemoVFWCL
# restart drools
kubectl delete pod dev-policy-drools-0 -n onap
# wait for policy to kick in
sleep 20m
# demo vfwclosedloop
/root/oom/kubernetes/robot/demo-k8s.sh onap vfwclosedloop $PNG_IP
# check the sink on 667
Deployment Profile
For a view of the system see Log Streaming Compliance and API
Minimum Single VM Deployment
A single 122g R4.4xlarge VM in progress
see also LOG-630
helm install will bring up everything without the configmap failure - but the release is busted - the pods come up though
ubuntu@ip-172-31-27-63:~$ sudo helm install local/onap -n onap --namespace onap -f onap/resources/environments/disable-allcharts.yaml --set aai.enabled=true --set dmaap.enabled=true --set log.enabled=true --set policy.enabled=true --set portal.enabled=true --set robot.enabled=true --set sdc.enabled=true --set sdnc.enabled=true --set so.enabled=true --set vid.enabled=true
deployment | containers | |
---|---|---|
minimum (no vfwCL) | ||
medium (vfwCL) | ||
full |
Container Issues
20180901
amdocs@ubuntu:~/_dev/oom/kubernetes$ kubectl get pods --all-namespaces | grep 0/1
onap onap-aai-champ-68ff644d85-mpkb9 0/1 Running 0 1d
onap onap-pomba-kibana-d76b6dd4c-j4q9m 0/1 Init:CrashLoopBackOff 472 1d
amdocs@ubuntu:~/_dev/oom/kubernetes$ kubectl get pods --all-namespaces | grep 1/2
onap onap-aai-gizmo-856f86d664-mf587 1/2 CrashLoopBackOff 568 1d
onap onap-pomba-networkdiscovery-85d76975b7-w9sjl 1/2 CrashLoopBackOff 573 1d
onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-rtdqc 1/2 CrashLoopBackOff 569 1d
onap onap-vid-84c88db589-vbfht 1/2 CrashLoopBackOff 616 1d
with clamp and pomba enabled (ran clamp first)
amdocs@ubuntu:~/_dev/oom/kubernetes$ sudo helm upgrade -i onap local/onap --namespace onap -f dev.yaml
Error: UPGRADE FAILED: failed to create resource: Service "pomba-kibana" is invalid: spec.ports[0].nodePort: Invalid value: 30234: provided port is already allocated
Full ONAP Cluster
see the AWS cluster install below
Requirements
Hardware Requirements
VMs | RAM | HD | vCores | Ports | Network |
---|---|---|---|---|---|
1 | 55-70G at startup | 40G per host min (30G for dockers) 100G after a week 5G min per NFS 4GBPS peak | (need to reduce 152 pods to 110) 8 min 60 peak at startup recommended 16-64 vCores | see list on PortProfile Recommend 0.0.0.0/0 (all open) inside VPC Block 10249-10255 outside secure 8888 with oauth | 170 MB/sec peak 1200 |
3+ | 85G Recommend min 3 x 64G class VMs Try for 4 | master: 40G hosts: 80G (30G of dockers) NFS: 5G | 24 to 64 | ||
This is a snapshot of the CD system running on Amazon AWS at http://jenkins.onap.info/job/oom-cd-master/ - it is a 1 + 4 node cluster composed of four 64G/8vCore R4.2xLarge VMs
Amazon AWS
Account Provider: (2) Robin of Amazon and Michael O'Brien of Amdocs
Amazon has donated an allocation sufficient for 512G of VM space (a large 4 x 122G/16vCore cluster and a secondary 9 x 16G cluster) in order to run CD systems since Dec 2017 - at a cost savings of at least $500/month - thank you very much Amazon for supporting ONAP. See example max/med allocations for IT/Finance in ONAP Deployment Specification for Finance and Operations#AmazonAWS
Amazon AWS is currently hosting our RI for ONAP Continuous Deployment - this is a joint Proof Of Concept between Amazon and ONAP.
Auto Continuous Deployment via Jenkins and Kibana
AWS CLI Installation
Install the AWS CLI on the bastion VM
https://docs.aws.amazon.com/cli/latest/userguide/cli-install-macos.html
OSX
obrien:obrienlabs amdocs$ pip --version
pip 9.0.1 from /Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg (python 2.7)
obrien:obrienlabs amdocs$ curl -O https://bootstrap.pypa.io/get-pip.py
obrien:obrienlabs amdocs$ python3 get-pip.py --user
Requirement already up-to-date: pip in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
obrien:obrienlabs amdocs$ pip3 install awscli --upgrade --user
Successfully installed awscli-1.14.41 botocore-1.8.45 pyasn1-0.4.2 s3transfer-0.1.13
Ubuntu
obrien:obrienlabs amdocs$ ssh ubuntu@<your domain/ip>
$ sudo apt install python-pip
$ pip install awscli --upgrade --user
$ aws --version
aws-cli/1.14.41 Python/2.7.12 Linux/4.4.0-1041-aws botocore/1.8.45
Windows Powershell
Configure Access Keys for your Account
$aws configure AWS Access Key ID [None]: AK....Q AWS Secret Access Key [None]: Dl....l Default region name [None]: us-east-1 Default output format [None]: json $aws ec2 describe-regions --output table || ec2.ca-central-1.amazonaws.com | ca-central-1 || ....
Option 0: Deploy OOM Kubernetes to a spot VM
Peak Performance Metrics
We hit a peak of 44 cores during startup, with an external network peak of 1.2Gbps (throttled nexus servers at ONAP), a peak SSD write rate of 4Gbps and 55G of RAM on a 64 vCore/256G VM on AWS Spot.
Kubernetes Installation via CLI
Allocate an EIP static public IP (one-time)
https://docs.aws.amazon.com/cli/latest/reference/ec2/allocate-address.html
$aws ec2 allocate-address { "PublicIp": "35.172..", "Domain": "vpc", "AllocationId": "eipalloc-2f743..."}
Create a Route53 Record Set - Type A (one-time)
$ cat route53-a-record-change-set.json {"Comment": "comment","Changes": [ { "Action": "CREATE", "ResourceRecordSet": { "Name": "amazon.onap.cloud", "Type": "A", "TTL": 300, "ResourceRecords": [ { "Value": "35.172.36.." }]}}]} $ aws route53 change-resource-record-sets --hosted-zone-id Z...7 --change-batch file://route53-a-record-change-set.json { "ChangeInfo": { "Status": "PENDING", "Comment": "comment", "SubmittedAt": "2018-02-17T15:02:46.512Z", "Id": "/change/C2QUNYTDVF453x" }} $ dig amazon.onap.cloud ; <<>> DiG 9.9.7-P3 <<>> amazon.onap.cloud amazon.onap.cloud. 300 IN A 35.172.36.. onap.cloud. 172800 IN NS ns-1392.awsdns-46.org.
Request a spot EC2 Instance
# request the usually cheapest $0.13 spot 64G EBS instance at AWS aws ec2 request-spot-instances --spot-price "0.25" --instance-count 1 --type "one-time" --launch-specification file://aws_ec2_spot_cli.json # don't pass in the the following - it will be generated for the EBS volume "SnapshotId": "snap-0cfc17b071e696816" launch specification json { "ImageId": "ami-c0ddd64ba", "InstanceType": "r4.2xlarge", "KeyName": "obrien_systems_aws_201", "BlockDeviceMappings": [ {"DeviceName": "/dev/sda1", "Ebs": { "DeleteOnTermination": true, "VolumeType": "gp2", "VolumeSize": 120 }}], "SecurityGroupIds": [ "s2" ]} # results { "SpotInstanceRequests": [{ "Status": { "Message": "Your Spot request has been submitted for review, and is pending evaluation.", "Code": "pending-evaluation",
Get EC2 instanceId after creation
aws ec2 describe-spot-instance-requests --spot-instance-request-id sir-1tyr5etg "InstanceId": "i-02a653592cb748e2x",
Associate EIP with EC2 Instance
This can be done separately, as long as it happens in the first 30 sec of initialization and before rancher starts on the instance.
$aws ec2 associate-address --instance-id i-02a653592cb748e2x --allocation-id eipalloc-375c1d0x { "AssociationId": "eipassoc-a4b5a29x"}
Reboot EC2 Instance to apply DNS change to Rancher in AMI
$aws ec2 reboot-instances --instance-ids i-02a653592cb748e2x
Clustered Deployment
look at https://github.com/kubernetes-incubator/external-storage
EC2 Cluster Creation
EFS share for shared NFS
"From the NFS wizard"
Setting up your EC2 instance
- Using the Amazon EC2 console, associate your EC2 instance with a VPC security group that enables access to your mount target. For example, if you assigned the "default" security group to your mount target, you should assign the "default" security group to your EC2 instance. Learn more
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
If you're not using the EFS mount helper, install the NFS client on your EC2 instance:- On an Ubuntu instance:
sudo apt-get install nfs-common
- On an Ubuntu instance:
Mounting your file system
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
- Create a new directory on your EC2 instance, such as "efs".
- sudo mkdir efs
- Mount your file system. If you require encryption of data in transit, use the EFS mount helper and the TLS mount option. Mounting considerations
- Using the EFS mount helper:
sudo mount -t efs fs-43b2763a:/ efs - Using the EFS mount helper and encryption of data in transit:
sudo mount -t efs -o tls fs-43b2763a:/ efs - Using the NFS client:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-43b2763a.efs.us-east-2.amazonaws.com:/ efs
- Using the EFS mount helper:
If you are unable to connect, see our troubleshooting documentation.
https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html
EFS/NFS Provisioning Script for AWS
https://git.onap.org/logging-analytics/tree/deploy/aws/oom_cluster_host_install.sh
ubuntu@ip-172-31-19-239:~$ sudo git clone https://gerrit.onap.org/r/logging-analytics Cloning into 'logging-analytics'... ubuntu@ip-172-31-19-239:~$ sudo cp logging-analytics/deploy/aws/oom_cluster_host_install.sh . ubuntu@ip-172-31-19-239:~$ sudo ./oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-0000001b -r us-west-1 -t 5EA8A:15000:MWcEyoKw -c true -v # fix helm after adding nodes to the master ubuntu@ip-172-31-31-219:~$ sudo helm init --upgrade $HELM_HOME has been configured at /home/ubuntu/.helm. Tiller (the Helm server-side component) has been upgraded to the current version. ubuntu@ip-172-31-31-219:~$ sudo helm repo add local http://127.0.0.1:8879 "local" has been added to your repositories ubuntu@ip-172-31-31-219:~$ sudo helm repo list NAME URL stable https://kubernetes-charts.storage.googleapis.com local http://127.0.0.1:8879
4 Node Kubernetes Cluster on AWS
Notice that we are vCore bound. Ideally we need 64 vCores for a minimal production system.
Client Install
# setup the master sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # manually delete the host that was installed on the master - in the rancher gui for now # run without a client on the master sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true ls /dockerdata-nfs/ onap test.sh # run the script from git on each cluster nodes sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # check a node ls /dockerdata-nfs/ onap test.sh sudo docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 6e4a57e19c39 rancher/healthcheck:v0.3.3 "/.r/r /rancher-en..." 1 second ago Up Less than a second r-healthcheck-healthcheck-5-f0a8f5e8 f9bffc6d9b3e rancher/network-manager:v0.7.19 "/rancher-entrypoi..." 1 second ago Up 1 second r-network-services-network-manager-5-103f6104 460f31281e98 rancher/net:holder "/.r/r /rancher-en..." 4 seconds ago Up 4 seconds r-ipsec-ipsec-5-2e22f370 3e30b0cf91bb rancher/agent:v1.2.9 "/run.sh run" 17 seconds ago Up 16 seconds rancher-agent # On the master - fix helm after adding nodes to the master sudo helm init --upgrade $HELM_HOME has been configured at /home/ubuntu/.helm. Tiller (the Helm server-side component) has been upgraded to the current version. sudo helm repo add local http://127.0.0.1:8879 # check the cluster on the master kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-16-85.us-west-1.compute.internal 129m 3% 1805Mi 5% ip-172-31-25-15.us-west-1.compute.internal 43m 1% 1065Mi 3% ip-172-31-28-145.us-west-1.compute.internal 40m 1% 1049Mi 3% ip-172-31-21-240.us-west-1.compute.internal 30m 0% 965Mi 3% # important: secure your rancher cluster by adding an oauth github account - to keep out crypto miners http://cluster.onap.info:8880/admin/access/github # now back to master to install onap # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . 
sudo ./cd.sh -b master -e onap -c true -d true -w false -r false 136 pending > 0 at the 1th 15 sec interval ubuntu@ip-172-31-28-152:~$ kubectl get pods -n onap | grep -E '1/1|2/2' | wc -l 20 120 pending > 0 at the 39th 15 sec interval ubuntu@ip-172-31-28-152:~$ kubectl get pods -n onap | grep -E '1/1|2/2' | wc -l 47 99 pending > 0 at the 93th 15 sec interval after an hour most of the 136 containers should be up kubectl get pods --all-namespaces | grep -E '0/|1/2' onap onap-aaf-cs-59954bd86f-vdvhx 0/1 CrashLoopBackOff 7 37m onap onap-aaf-oauth-57474c586c-f9tzc 0/1 Init:1/2 2 37m onap onap-aai-champ-7d55cbb956-j5zvn 0/1 Running 0 37m onap onap-drools-0 0/1 Init:0/1 0 1h onap onap-nexus-54ddfc9497-h74m2 0/1 CrashLoopBackOff 17 1h onap onap-sdc-be-777759bcb9-ng7zw 1/2 Running 0 1h onap onap-sdc-es-66ffbcd8fd-v8j7g 0/1 Running 0 1h onap onap-sdc-fe-75fb4965bd-bfb4l 0/2 Init:1/2 6 1h # cpu bound - a small cluster has 4x4 cores - try to run with 4x16 cores ubuntu@ip-172-31-28-152:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-28-145.us-west-1.compute.internal 3699m 92% 26034Mi 85% ip-172-31-21-240.us-west-1.compute.internal 3741m 93% 3872Mi 12% ip-172-31-16-85.us-west-1.compute.internal 3997m 99% 23160Mi 75% ip-172-31-25-15.us-west-1.compute.internal 3998m 99% 27076Mi 88%
13 Node Kubernetes Cluster on AWS
Node: R4.large (2 cores, 16g)
Notice that we are vCore bound. Ideally we need 64 vCores for a minimal production system - this runs with 12 x 4 vCores = 48.
30 min after the helm install starts - the DCAE containers come in at 55
ssh ubuntu@ld.onap.info # setup the master sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # manually delete the host that was installed on the master - in the rancher gui for now # get the token for use with the EFS/NFS share ubuntu@ip-172-31-8-245:~$ cat ~/.kube/config | grep token token: "QmFzaWMgTVVORk4wRkdNalF3UXpNNE9E.........RtNWxlbXBCU0hGTE1reEJVamxWTjJ0Tk5sWlVjZz09" # run without a client on the master ubuntu@ip-172-31-8-245:~$ sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s ld.onap.info -e fs-....eb -r us-east-2 -t QmFzaWMgTVVORk4wRkdNalF3UX..........aU1dGSllUVkozU0RSTmRtNWxlbXBCU0hGTE1reEJVamxWTjJ0Tk5sWlVjZz09 -c true -v true ls /dockerdata-nfs/ onap test.sh # run the script from git on each cluster node sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true ubuntu@ip-172-31-8-245:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-14-254.us-east-2.compute.internal 45m 1% 1160Mi 7% ip-172-31-3-195.us-east-2.compute.internal 29m 0% 1023Mi 6% ip-172-31-2-105.us-east-2.compute.internal 31m 0% 1004Mi 6% ip-172-31-0-159.us-east-2.compute.internal 30m 0% 1018Mi 6% ip-172-31-12-122.us-east-2.compute.internal 34m 0% 1002Mi 6% ip-172-31-0-197.us-east-2.compute.internal 30m 0% 1015Mi 6% ip-172-31-2-244.us-east-2.compute.internal 123m 3% 2032Mi 13% ip-172-31-11-30.us-east-2.compute.internal 38m 0% 1142Mi 7% ip-172-31-9-203.us-east-2.compute.internal 33m 0% 998Mi 6% ip-172-31-1-101.us-east-2.compute.internal 32m 0% 996Mi 6% ip-172-31-9-128.us-east-2.compute.internal 31m 0% 1037Mi 6% ip-172-31-3-141.us-east-2.compute.internal 30m 0% 1011Mi 6% # now back to master to install onap # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . sudo ./cd.sh -b master -e onap -c true -d true -w false -r false after an hour most of the 136 containers should be up kubectl get pods --all-namespaces | grep -E '0/|1/2'
Amazon EKS Cluster for ONAP Deployment
see LOG-554 and LOG-939
follow
https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html
https://aws.amazon.com/getting-started/projects/deploy-kubernetes-app-amazon-eks/
follow the VPC CNI plugin - https://aws.amazon.com/blogs/opensource/vpc-cni-plugin-v1-1-available/
and 20190121 work with John Lotoskion https://lists.onap.org/g/onap-discuss/topic/aws_efs_nfs_and_rancher_2_2/29382184?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29382184
Network Diagram
Standard ELB and public/private VPC
Create EKS cluster
Provision access to EKS cluster
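A minimal sketch covering the two steps above, using the eksctl helper rather than the raw console/CloudFormation flow in the linked guide (cluster name, region, node count and type are placeholders):
# create the EKS control plane and a worker node group
eksctl create cluster --name onap-eks --region us-east-1 --nodes 14 --node-type r4.2xlarge
# point kubectl at the new cluster
aws eks update-kubeconfig --name onap-eks --region us-east-1
kubectl get nodes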
Kubernetes Installation via CloudFormation
ONAP Installation
SSH and upload OOM
oom_rancher_install.sh is in OOM-715 under https://gerrit.onap.org/r/#/c/32019/
Run OOM
see OOM-710
cd.sh is in OOM-716 under https://gerrit.onap.org/r/#/c/32653/
Scenario: installing Rancher on clean Ubuntu 16.04 64g VM (single collocated server/host) and the master branch of onap via OOM deployment (2 scripts)
1 hour video of automated installation on an AWS EC2 spot instance
Run Healthcheck
Run Automated Robot parts of vFirewall VNF
Report Results
Stop Spot Instance
$ aws ec2 terminate-instances --instance-ids i-0040425ac8c0d8f6x { "TerminatingInstances": [ { "InstanceId": "i-0040425ac8c0d8f63", "CurrentState": { "Code": 32, "Name": "shutting-down" }, "PreviousState": { "Code": 16, "Name": "running" } } ]}
Verify Instance stopped
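A sketch of the check, reusing the masked instance id from the terminate call above:
aws ec2 describe-instances --instance-ids i-0040425ac8c0d8f6x \
  --query 'Reservations[].Instances[].State.Name' --output text
# expect shutting-down and then terminated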
Video on Installing and Running the ONAP Demos#ONAPDeploymentVideos
We can run ONAP on an AWS EC2 instance for $0.17/hour, as opposed to Rackspace at $1.12/hour, for a 64G Ubuntu host VM.
I have created an AMI on Amazon AWS under the following ID that has a reference 20170825 tag of ONAP 1.0 running on top of Rancher
ami-b8f3f3c3 : onap-oom-k8s-10
EIP 34.233.240.214 maps to http://dev.onap.info:8880/env/1a7/infra/hosts
A D2.2xlarge with 61G ram on the spot market https://console.aws.amazon.com/ec2sp/v1/spot/launch-wizard?region=us-east-1 at $0.16/hour for all of ONAP
It may take up to 3-8 min for kubernetes pods to initialize as long as you preload the docker images - see OOM-328.
Workaround for the disk space error - even though we are running with a 1.9 TB NVMe SSD
https://github.com/kubernetes/kubernetes/issues/48703
Use a flavor that uses EBS like M4.4xLarge which is OK - except for AAI right now
Expected Monthly Billing
r4.2xlarge is the smallest and most cost effective 64g min instance to use for full ONAP deployment - it requires EBS stores. This is assuming 1 instance up at all times and a couple ad-hoc instances up a couple hours for testing/experimentation.
Option 1: Migrating Heat to CloudFormation
Resource Correspondence
ID | Type | Parent | AWS | Openstack |
---|---|---|---|---|
Using the CloudFormationDesigner
https://console.aws.amazon.com/cloudformation/designer/home?region=us-east-1#
Decoupling and Abstracting Southbound Orchestration via Plugins
Part of getting another infrastructure provider like AWS to work with ONAP will involve identifying and decoupling southbound logic from any particular cloud provider using an extensible plugin architecture on the SBI interface.
see Multi VIM/Cloud (5/11/17), VID project (5/17/17), Service Orchestrator (5/14/17), ONAP Operations Manager (5/10/17), ONAP Operations Manager / ONAP on Containers
Design Issues
DI 1: Refactor nested orchestration in DCAE
Replace the DCAE Controller
DI 2: Elastic IP allocation
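As a sketch, the same allocate/associate calls used in Option 0 above apply here (ids are placeholders):
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id i-02a653592cb748e2x --allocation-id eipalloc-375c1d0x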
DI 3: Investigate Cloudify plugin for AWS
Cloudify is Tosca based - https://github.com/cloudify-cosmo/cloudify-aws-plugin
DI 4: 20180803 Investigate ISTIO service mesh
https://istio.io/docs/setup/kubernetes/quick-start/
see LOG-592
Links
Waiting for the EC2 C5 instance types under the C620 chipset to arrive at AWS so we can experiment under EC2 Spot - http://technewshunter.com/cpus/intel-launches-xeon-w-cpus-for-workstations-skylake-sp-ecc-for-lga2066-41771/ https://aws.amazon.com/about-aws/whats-new/2016/11/coming-soon-amazon-ec2-c5-instances-the-next-generation-of-compute-optimized-instances/
http://docs.aws.amazon.com/cli/latest/userguide/cli-install-macos.html
use
curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip" unzip awscli-bundle.zip sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws aws --version aws-cli/1.11.170 Python/2.7.13 Darwin/16.7.0 botocore/1.7.28
EC2 VMs
AWS Clustered Deployment
AWS EC2 Cluster Creation
AWS EFS share for shared NFS
You need an NFS share between the VM's in your Kubernetes cluster - an Elastic File System share will wrap NFS
"From the NFS wizard"
Setting up your EC2 instance
- Using the Amazon EC2 console, associate your EC2 instance with a VPC security group that enables access to your mount target. For example, if you assigned the "default" security group to your mount target, you should assign the "default" security group to your EC2 instance. Learn more
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
If you're not using the EFS mount helper, install the NFS client on your EC2 instance:- On an Ubuntu instance:
sudo apt-get install nfs-common
- On an Ubuntu instance:
Mounting your file system
- Open an SSH client and connect to your EC2 instance. (Find out how to connect)
- Create a new directory on your EC2 instance, such as "efs".
- sudo mkdir efs
- Mount your file system. If you require encryption of data in transit, use the EFS mount helper and the TLS mount option. Mounting considerations
- Using the EFS mount helper:
sudo mount -t efs fs-43b2763a:/ efs - Using the EFS mount helper and encryption of data in transit:
sudo mount -t efs -o tls fs-43b2763a:/ efs - Using the NFS client:
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-43b2763a.efs.us-east-2.amazonaws.com:/ efs
- Using the EFS mount helper:
If you are unable to connect, see our troubleshooting documentation.
https://docs.aws.amazon.com/efs/latest/ug/mounting-fs.html
Automated
Manual
ubuntu@ip-10-0-0-66:~$ sudo apt-get install nfs-common
ubuntu@ip-10-0-0-66:~$ cd /
ubuntu@ip-10-0-0-66:~$ sudo mkdir /dockerdata-nfs
root@ip-10-0-0-19:/# sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-43b2763a.efs.us-east-2.amazonaws.com:/ /dockerdata-nfs
# write something on one vm - and verify it shows on another
ubuntu@ip-10-0-0-8:~$ ls /dockerdata-nfs/
test.sh
Microsoft Azure
Subscription Sponsor: (1) Microsoft
VMs
Deliverables are deployment scripts, arm/cli templates for various deployment scenarios (single, multiple, federated servers)
In review - OOM-711
Quickstart
Single collocated VM
Automation is currently only written for a single VM that hosts both the rancher server and the deployed onap pods. Use the ARM template below to deploy your VM and provision it (adjust your config parameters).
There are two choices: run the single oom_deployment.sh ARM wrapper, or use it to bring up an empty VM and run oom_entrypoint.sh manually. Once the VM comes up, the oom_entrypoint.sh script will run - it downloads the oom_rancher_setup.sh script to set up docker, rancher, kubernetes and helm, and then runs the cd.sh script to bring up onap based on your values.yaml config by running helm install on it.
# login to az cli, wget the deployment script, arm template and parameters file - edit the parameters file (dns, ssh key ...) and run the arm template wget https://git.onap.org/logging-analytics/plain/deploy/azure/oom_deployment.sh wget https://git.onap.org/logging-analytics/plain/deploy/azure/_arm_deploy_onap_cd.json wget https://git.onap.org/logging-analytics/plain/deploy/azure/_arm_deploy_onap_cd_z_parameters.json # either run the entrypoint which creates a resource template and runs the stack - or do those two commands manually ./oom_deployment.sh -b master -s azure.onap.cloud -e onap -r a_auto-youruserid_20180421 -t arm_deploy_onap_cd.json -p arm_deploy_onap_cd_z_parameters.json # wait for the VM to finish in about 75 min or watch progress by ssh'ing into the vm and doing root@ons-auto-201803181110z: sudo tail -f /var/lib/waagent/custom-script/download/0/stdout # if you wish to run the oom_entrypoint script yourself - edit/break the cloud init section at the end of the arm template and do it yourself below # download and edit values.yaml with your onap preferences and openstack tenant config wget https://jira.onap.org/secure/attachment/11414/values.yaml # download and run the bootstrap and onap install script, the -s server name can be an IP, FQDN or hostname wget https://git.onap.org/logging-analytics/plain/deploy/rancher/oom_entrypoint.sh chmod 777 oom_entrypoint.sh sudo ./oom_entrypoint.sh -b master -s devops.onap.info -e onap # wait 15 min for rancher to finish, then 30-90 min for onap to come up #20181015 - delete the deployment, recreate the onap environment in rancher with the template adjusted for more than the default 110 container limit - by adding --max-pods=500 # then redo the helm install
OOM-714 - see https://jira.onap.org/secure/attachment/11455/oom_openstack.yaml and https://jira.onap.org/secure/attachment/11454/oom_openstack_oom.env
LOG-320 - see https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_entrypoint.sh
customize your template (true/false for any components, docker overrides etc...)
https://jira.onap.org/secure/attachment/11414/values.yaml
Run oom_entrypoint.sh after you verified values.yaml - it will run both scripts below for you - a single node kubernetes setup running what you configured in values.yaml will be up in 50-90 min. If you want to just configure your vm without bringing up ONAP - comment out the cd.sh line and run that separately.
LOG-325 - see https://git.onap.org/logging-analytics/plain/deploy/rancher/oom_rancher_setup.sh
LOG-326 - see https://git.onap.org/logging-analytics/plain/deploy/cd.sh
Verify your system is up by doing a kubectl get pods --all-namespaces and checking the 8880 port to bring up the rancher or kubernetes gui.
Login to Azure CLI
https://portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Resources%2Fresources
Download Azure ONAP ARM template
see OOM-711
Edit Azure ARM template environment parameters
Create Resource Group
az group create --name onap_eastus --location eastus
Run ARM template
az group deployment create --resource-group onap_eastus --template-file oom_azure_arm_deploy.json --parameters @oom_azure_arm_deploy_parameters.json
Wait for Rancher/Kubernetes install
The oom_entrypoint.sh script will be run as a cloud-init script on the VM - see LOG-320 - which runs the rancher/kubernetes setup in LOG-325.
Wait for OOM ONAP install
see LOG-326 (cd.sh)
Verify ONAP installation
kubectl get pods --all-namespaces
# raise/lower onap components from the installed directory if using the oneclick arm template
# amsterdam only
root@ons-auto-master-201803191429z:/var/lib/waagent/custom-script/download/0/oom/kubernetes/oneclick# ./createAll.bash -n onap
Azure CLI Installation
Requirements
Azure subscription
OSX
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest
Install homebrew first (reinstall if you are on the latest OSX 10.13.2 https://github.com/Homebrew/install because of 3718)
Will install Python 3.6
$ brew update
$ brew install azure-cli
https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli?view=azure-cli-latest
$ az login To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E..D to authenticate. [ { "cloudName": "AzureCloud", "id": "f4...b", "isDefault": true, "name": "Pay-As-You-Go", "state": "Enabled", "tenantId": "bcb.....f", "user": { "name": "michael@....org", "type": "user" }}]
Bastion/Jumphost VM in Azure
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-apt?view=azure-cli-latest
# in root
AZ_REPO=$(lsb_release -cs)
echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
apt-key adv --keyserver packages.microsoft.com --recv-keys 52E16F86FEE04B979B07E28DB02C46DF417A0893
apt-get install apt-transport-https
apt-get update && sudo apt-get install azure-cli
az login
# verify
root@ons-dmz:~# ps -ef | grep az
root 1427 1 0 Mar17 ? 00:00:00 /usr/lib/linux-tools/4.13.0-1011-azure/hv_vss_daemon -n
Windows Powershell
https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?view=azure-cli-latest
ARM Template
Follow https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-create-first-template
Create a Storage Account
$ az login To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E...Z to authenticate. $ az group create --name examplegroup --location "South Central US" { "id": "/subscriptions/f4b...e8b/resourceGroups/examplegroup", "location": "southcentralus", "managedBy": null, "name": "examplegroup", "properties": { "provisioningState": "Succeeded" }, "tags": null } obrien:obrienlabs amdocs$ vi azuredeploy_storageaccount.json obrien:obrienlabs amdocs$ az group deployment create --resource-group examplegroup --template-file azuredeploy_storageaccount.json { "id": "/subscriptions/f4...e8b/resourceGroups/examplegroup/providers/Microsoft.Resources/deployments/azuredeploy_storageaccount", "name": "azuredeploy_storageaccount", "properties": { "additionalProperties": { "duration": "PT32.9822642S", "outputResources": [ { "id": "/subscriptions/f4..e8b/resourceGroups/examplegroup/providers/Microsoft.Storage/storageAccounts/storagekj6....kk2w", "resourceGroup": "examplegroup" }], "templateHash": "11440483235727994285"}, "correlationId": "41a0f79..90c291", "debugSetting": null, "dependencies": [], "mode": "Incremental", "outputs": {}, "parameters": {}, "parametersLink": null, "providers": [ { "id": null, "namespace": "Microsoft.Storage", "registrationState": null, "resourceTypes": [ { "aliases": null, "apiVersions": null, "locations": [ "southcentralus" ], "properties": null, "resourceType": "storageAccounts" }]}], "provisioningState": "Succeeded", "template": null, "templateLink": null, "timestamp": "2018-02-17T16:15:11.562170+00:00" }, "resourceGroup": "examplegroup"}
Pick a region
az account list-locations
# northcentralus for example
Create a resource group
# create a resource group if not already there
az group create --name obrien_jenkins_b_westus2 --location westus2
Create a VM
We need a 128G VM with at least 8vCores (peak is 60) and a 100+GB drive. The sizes are detailed on https://docs.microsoft.com/en-ca/azure/virtual-machines/windows/sizes-memory - we will use the Standard_D32s_v3 type
We need an "all open 0.0.0.0/0" security group and a reassociated data drive as boot drive - see the arm template in LOG-321
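A sketch of an equivalent CLI-only VM create (resource group and VM name are placeholders; the all-open NSG and data-disk reassociation from the LOG-321 ARM template still need to be applied separately):
az vm create --resource-group obrien_jenkins_b_westus2 --name onap-oom-single \
  --image Canonical:UbuntuServer:16.04.0-LTS:latest --size Standard_D32s_v3 \
  --admin-username ubuntu --generate-ssh-keys --data-disk-sizes-gb 511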
Get the ARM template
see the open review in OOM-711
"ubuntuOSVersion": "16.04.0-LTS" "imagePublisher": "Canonical", "imageOffer": "UbuntuServer", "vmSize": "Standard_E8s_v3" "osDisk": {"createOption": "FromImage"},"dataDisks": [{"diskSizeGB": 511,"lun": 0, "createOption": "Empty" }]
Follow
https://github.com/Azure/azure-quickstart-templates/tree/master/101-acs-kubernetes
https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-template-deploy
https://github.com/Azure/azure-quickstart-templates/tree/master/101-vm-simple-linux
It needs a security group https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-create-nsg-arm-template
{ "apiVersion": "2017-03-01", "type": "Microsoft.Network/networkSecurityGroups", "name": "[variables('networkSecurityGroupName')]", "location": "[resourceGroup().location]", "tags": { "displayName": "NSG - Front End" }, "properties": { "securityRules": [ { "name": "in-rule", "properties": { "description": "All in", "protocol": "Tcp", "sourcePortRange": "*", "destinationPortRange": "*", "sourceAddressPrefix": "Internet", "destinationAddressPrefix": "*", "access": "Allow", "priority": 100, "direction": "Inbound" } }, { "name": "out-rule", "properties": { "description": "All out", "protocol": "Tcp", "sourcePortRange": "*", "destinationPortRange": "*", "sourceAddressPrefix": "Internet", "destinationAddressPrefix": "*", "access": "Allow", "priority": 101, "direction": "Outbound" } } ] } } , { "apiVersion": "2017-04-01", "type": "Microsoft.Network/virtualNetworks", "name": "[variables('virtualNetworkName')]", "location": "[resourceGroup().location]", "dependson": [ "[concat('Microsoft.Network/networkSecurityGroups/', variables('networkSecurityGroupName'))]" ], "properties": { "addressSpace": { "addressPrefixes": [ "[variables('addressPrefix')]" ] }, "subnets": [ { "name": "[variables('subnetName')]", "properties": { "addressPrefix": "[variables('subnetPrefix')]", "networkSecurityGroup": { "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('networkSecurityGroupName'))]" } } } ] } },
# validate first (validate instead of create)
az group deployment create --resource-group obrien_jenkins_b_westus2 --template-file oom_azure_arm_deploy.json --parameters @oom_azure_arm_cd_amsterdam_deploy_parameters.json
SSH into your VM and run the Kubernetes and OOM installation scripts
Use the entrypoint script in OOM-710
# clone the oom repo to get the install directory
sudo git clone https://gerrit.onap.org/r/logging-analytics
# run the Rancher RI installation (to install kubernetes)
sudo logging-analytics/deploy/rancher/oom_rancher_install.sh -b master -s 192.168.240.32 -e onap
# run the oom deployment script
# get a copy of onap-parameters.yaml and place in this folder
logging-analytics/deploy/cd.sh -b master -s 192.168.240.32 -e onap
oom_rancher_install.sh is in OOM-715 under https://gerrit.onap.org/r/#/c/32019/
cd.sh is in OOM-716 under https://gerrit.onap.org/r/#/c/32653/
Delete the VM and resource group
# delete the vm and resources
az group deployment delete --resource-group ONAPAMDOCS --name oom_azure_arm_deploy
# the above deletion will not delete the actual resources - only a delete of the group or each individual resource works
# optionally delete the resource group
az group delete --name ONAPAMDOCS -y
Azure devops
create static IP
az network public-ip create --name onap-argon --resource-group a_ONAP_argon_prod_donotdelete --location eastus --allocation-method Static
ONAP on Azure Container Service
AKS Installation
Follow https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-deploy-cluster
Register for AKS preview via az cli
obrienbiometrics:obrienlabs michaelobrien$ az provider register -n Microsoft.ContainerService
Registering is still on-going. You can monitor using 'az provider show -n Microsoft.ContainerService'
Create an AKS resource group
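For example (the resource group name matches the one used in the create/scale commands below):
az group create --name onapAKS --location eastus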
Raise your AKS vCPU quota - optional
http://aka.ms/corequotaincrease
https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest
Deployment failed. Correlation ID: 4b4707a7-2244-4557-855e-11bcced556de. Provisioning of resource(s) for container service onapAKSCluster in resource group onapAKS failed. Message: Operation results in exceeding quota limits of Core. Maximum allowed: 10, Current in use: 10, Additional requested: 1. Please read more about quota increase at http://aka.ms/corequotaincrease.. Details:
Create AKS cluster
obrienbiometrics:obrienlabs michaelobrien$ az aks create --resource-group onapAKS --name onapAKSCluster --node-count 1 --generate-ssh-keys
- Running ..
"fqdn": "onapaksclu-onapaks-f4....3.hcp.eastus.azmk8s.io",
AKS cluster VM granularity
The cluster will start with a 3.5G VM before scaling
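To size the worker nodes for ONAP up front, or to grow the pool later, something like the following should work (VM size and counts are examples):
az aks create --resource-group onapAKS --name onapAKSCluster --node-count 4 --node-vm-size Standard_E8s_v3 --generate-ssh-keys
az aks scale --resource-group onapAKS --name onapAKSCluster --node-count 4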
Resources for your AKS cluster
Bring up AAI only for now
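A sketch, reusing the disable-allcharts override shown in the Minimum Single VM Deployment section (robot is included only for healthchecks):
# fetch credentials for kubectl/helm, then install only the aai (and robot) charts
az aks get-credentials --resource-group onapAKS --name onapAKSCluster
sudo helm install local/onap -n onap --namespace onap -f onap/resources/environments/disable-allcharts.yaml --set aai.enabled=true --set robot.enabled=true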
Design Issues
Resource Group
A resource group makes it easier to package and remove everything for a deployment - essentially making the deployment stateless
Network Security Group
Global or local to the resource group?
Follow CSEC guidelines https://www.cse-cst.gc.ca/en/system/files/pdf_documents/itsg-22-eng.pdf
Static public IP
Register a CNAME for an existing domain and use the same IP address every time the deployment comes up
Entrypoint cloud init script
How to attach the cloud init script to provision the VM
ARM template chaining
passing derived variables into the next arm template - for example when bringing up an entire federated set in one or more DCs (a CLI sketch follows below)
see script attached to
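A sketch of one way to chain templates from the CLI - capture an output of the first deployment and feed it into the next (group, deployment, and output names are placeholders):
# read an output (e.g. a subnet id) from the first deployment
SUBNET_ID=$(az group deployment show -g onap_eastus -n base_network --query properties.outputs.subnetId.value -o tsv)
# pass it into the next template as a parameter
az group deployment create -g onap_eastus --template-file oom_azure_arm_deploy.json --parameters @oom_azure_arm_deploy_parameters.json --parameters subnetId=$SUBNET_ID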
Troubleshooting
DNS propagation and caching
It takes about 2 min for DNS entries to propagate out from A record DNS changes. For example the following IP/DNS association took 2 min to appear in dig.
obrienbiometrics:onap_oom_711_azure michaelobrien$ dig azure.onap.info ; <<>> DiG 9.9.7-P3 <<>> azure.onap.info ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10599 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;azure.onap.info. IN A ;; ANSWER SECTION: azure.onap.info. 251 IN A 52.224.233.230 ;; Query time: 68 msec ;; SERVER: 8.8.8.8#53(8.8.8.8) ;; WHEN: Tue Feb 20 10:26:59 EST 2018 ;; MSG SIZE rcvd: 60 obrienbiometrics:onap_oom_711_azure michaelobrien$ dig azure.onap.info ; <<>> DiG 9.9.7-P3 <<>> azure.onap.info ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30447 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;azure.onap.info. IN A ;; ANSWER SECTION: azure.onap.info. 299 IN A 13.92.225.167 ;; Query time: 84 msec ;; SERVER: 8.8.8.8#53(8.8.8.8) ;; WHEN: Tue Feb 20 10:27:04 EST 2018
Corporate Firewall Access
Inside the corporate firewall - avoid it PS C:\> az login Please ensure you have network connection. Error detail: HTTPSConnectionPool(host='login.microsoftonline.com', port=443) : Max retries exceeded with url: /common/oauth2/devicecode?api-version=1.0 (Caused by NewConnectionError('<urllib3.conne ction.VerifiedHTTPSConnection object at 0x04D18730>: Failed to establish a new connection: [Errno 11001] getaddrinfo fai led',)) at home or cell hotspot PS C:\> az login To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E...2W to authenticate. [ { "cloudName": "AzureCloud", "id": "4...da1", "isDefault": true, "name": "Microsoft Azure Internal Consumption", "state": "Enabled", "tenantId": "72f98....47", "user": { "name": "fran...ocs.com", "type": "user" }] On corporate account (need permissions bump to be able to create a resource group prior to running an arm template https://wiki.onap.org/display/DW/ONAP+on+Kubernetes+on+Microsoft+Azure#ONAPonKubernetesonMicrosoftAzure-ARMTemplate PS C:\> az group create --name onapKubernetes --location eastus The client 'fra...s.com' with object id '08f98c7e-...ed' does not have authorization to per form action 'Microsoft.Resources/subscriptions/resourcegroups/write' over scope '/subscriptions/42e...8 7da1/resourcegroups/onapKubernetes'. try my personal = OK PS C:\> az login To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code EE...ULR to authenticate. Terminate batch job (Y/N)? y # hangs when first time login in a new pc PS C:\> az login To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code E.PBKS to authenticate. [ { "cloudName": "AzureCloud", "id": "f4b...b", "isDefault": true "name": "Pay-As-You-Go", "state": "Enabled", "tenantId": "bcb...f4f", "user": "name": "michael@obrien...org", "type": "user" } }] PS C:\> az group create --name onapKubernetes2 --location eastus { "id": "/subscriptions/f4b....b/resourceGroups/onapKubernetes2", "location": "eastus", "managedBy": null, "name": "onapKubernetes2", "properties": { "provisioningState": "Succeeded" }, "tags": null}
Design Issues
20180228: Deployment delete does not delete resources without a resourceGroup delete
I find that deleting a deployment removes the deployment record but not the actual resources. The workaround is to delete the resource group - but in some constrained subscriptions the cli user may not have the ability to create a resource group - and hence cannot delete it.
see
https://github.com/Azure/azure-sdk-for-java/issues/1167
Deleting the resources manually is a workaround for now if you cannot create/delete resource groups.
# delete the vm and resources az group deployment delete --resource-group ONAPAMDOCS --name oom_azure_arm_deploy
# the above deletion will not delete the actual resources - only a delete of the group or each individual resource works
# optionally delete the resource group az group delete --name ONAPAMDOCS -y
However modifying the template to add resources works well. For example adding a reference to a network security group
20180228: Resize the OS disk
ONAP requires at least 75g - the issue is that in most VM templates on Azure the OS disk is 30g - we need to either switch to the data disk or resize the OS disk.
# add diskSizeGB to the template "osDisk": { "diskSizeGB": 255, "createOption": "FromImage" }, ubuntu@oom-auto-deploy:~$ df Filesystem 1K-blocks Used Available Use% Mounted on udev 65989400 0 65989400 0% /dev tmpfs 13201856 8848 13193008 1% /run /dev/sda1 259142960 1339056 257787520 1% / tmpfs 66009280 0 66009280 0% /dev/shm tmpfs 5120 0 5120 0% /run/lock tmpfs 66009280 0 66009280 0% /sys/fs/cgroup none 64 0 64 0% /etc/network/interfaces.dynamic.d /dev/sdb1 264091588 60508 250592980 1% /mnt tmpfs 13201856 0 13201856 0% /run/user/1000 ubuntu@oom-auto-deploy:~$ free total used free shared buff/cache available Mem: 132018560 392336 131242164 8876 384060 131012328
20180301: Add oom_entrypoint.sh bootstrap script to install rancher and onap
in review under OOM-715
https://jira.onap.org/secure/attachment/11206/oom_entrypoint.sh
If using amsterdam - swap out the onap-parameters.yaml (the curl is hardcoded to a master branch version)
20180303: cloudstorage access on OSX via Azure Storage Manager
use this method instead of installing az cli directly - for certain corporate oauth configurations
https://azure.microsoft.com/en-us/features/storage-explorer/
Install AZM using the name and access key of a storage account created manually or by enabling the az cli on the browser
20180318: add oom_entrypoint.sh to cloud-init on the arm template
See https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/virtualmachines/extensions - it looks like Azure has a similar setup to AWS ebextensions
Targetting
type | string | No | Specifies the type of the extension; an example is "CustomScriptExtension". |
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/extensions-customscript
Deprecated:
{
  "apiVersion": "2015-06-15",
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "[concat(parameters('vmName'),'/onap')]",
  "location": "[resourceGroup().location]",
  "dependsOn": ["[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]"],
  "properties": {
    "publisher": "Microsoft.Azure.Extensions",
    "type": "CustomScript",
    "typeHandlerVersion": "1.9",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "fileUris": [ "https://jira.onap.org/secure/attachment/11263/oom_entrypoint.sh" ],
      "commandToExecute": "[concat('./' , parameters('scriptName'), ' -b master -s dns/pub/pri-ip -e onap' )]"
    }
  }
}

Use:
{
  "apiVersion": "2017-12-01",
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "[concat(parameters('vmName'),'/onap')]",
  "location": "[resourceGroup().location]",
  "dependsOn": ["[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]"],
  "properties": {
    "publisher": "Microsoft.Azure.Extensions",
    "type": "CustomScript",
    "typeHandlerVersion": "2.0",
    "autoUpgradeMinorVersion": true,
    "settings": {
      "fileUris": [ "https://jira.onap.org/secure/attachment/11281/oom_entrypoint.sh" ],
      "commandToExecute": "[concat('./' , parameters('scriptName'), ' -b master ', ' -s ', 'ons-auto-201803181110z', ' -e onap' )]"
    }
  }
}
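The same CustomScript extension can also be attached from the az cli after the VM is up, which is handy when iterating on the entrypoint script without redeploying the whole template. A sketch, reusing the resource group and VM names from the run further below:

az vm extension set \
  --resource-group a_ONAP_auto_201803181110z \
  --vm-name ons-auto-201803181110z \
  --publisher Microsoft.Azure.Extensions \
  --name CustomScript \
  --version 2.0 \
  --settings '{"fileUris":["https://jira.onap.org/secure/attachment/11281/oom_entrypoint.sh"],"commandToExecute":"./oom_entrypoint.sh -b master -s ons-auto-201803181110z -e onap"}'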
ubuntu@ons-dmz:~$ ./oom_deployment.sh
Deployment template validation failed: 'The template resource 'entrypoint' for type 'Microsoft.Compute/virtualMachines/extensions' at line '1' and column '6182' has incorrect segment lengths. A nested resource type must have identical number of segments as its resource name. A root resource type must have segment length one greater than its resource name. Please see https://aka.ms/arm-template/#resources for usage details.'.
ubuntu@ons-dmz:~$ ./oom_deployment.sh
Deployment failed. Correlation ID: 532b9a9b-e0e8-4184-9e46-6c2e7c15e7c7. {
"error": {
"code": "ParentResourceNotFound",
"message": "Can not perform requested operation on nested resource. Parent resource '[concat(parameters('vmName'),'' not found."
}
}
fixed 20180318:1600
The install runs - but we need visibility - check /var/lib/waagent/custom-script/download/0/
progress
./oom_deployment.sh
# 7 min to delete old deployment
ubuntu@ons-dmz:~$ az vm extension list -g a_ONAP_auto_201803181110z --vm-name ons-auto-201803181110z
..
"provisioningState": "Creating",
"settings": {
  "commandToExecute": "./oom_entrypoint.sh -b master -s ons-auto-201803181110zons-auto-201803181110z.eastus.cloudapp.azure.com -e onap",
  "fileUris": [ "https://jira.onap.org/secure/attachment/11263/oom_entrypoint.sh"

ubuntu@ons-auto-201803181110z:~$ sudo su -
root@ons-auto-201803181110z:~# docker ps
CONTAINER ID  IMAGE                   COMMAND                 CREATED        STATUS        PORTS                             NAMES
83458596d7a6  rancher/server:v1.6.14  "/usr/bin/entry /u..." 3 minutes ago  Up 3 minutes  3306/tcp, 0.0.0.0:8880->8080/tcp  rancher_server
root@ons-auto-201803181110z:~# tail -f /var/log/azure/custom-script/handler.log
time=2018-03-18T22:51:59Z version=v2.0.6/git@1008306-clean operation=enable seq=0 file=0 event="download complete" output=/var/lib/waagent/custom-script/download/0
time=2018-03-18T22:51:59Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="executing command" output=/var/lib/waagent/custom-script/download/0
time=2018-03-18T22:51:59Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="executing public commandToExecute" output=/var/lib/waagent/custom-script/download/0
root@ons-auto-201803181110z:~# docker ps
CONTAINER ID  IMAGE                   COMMAND                 CREATED         STATUS         PORTS                             NAMES
539733f24c01  rancher/agent:v1.2.9    "/run.sh run"           13 seconds ago  Up 13 seconds                                    rancher-agent
83458596d7a6  rancher/server:v1.6.14  "/usr/bin/entry /u..." 5 minutes ago   Up 5 minutes   3306/tcp, 0.0.0.0:8880->8080/tcp  rancher_server
root@ons-auto-201803181110z:~# ls -la /var/lib/waagent/custom-script/download/0/
total 31616
-rw-r--r-- 1 root   root   16325186 Aug 31  2017 helm-v2.6.1-linux-amd64.tar.gz
-rw-r--r-- 1 root   root          4 Mar 18 22:55 kube_env_id.json
drwxrwxr-x 2 ubuntu ubuntu     4096 Mar 18 22:53 linux-amd64
-r-x------ 1 root   root       2822 Mar 18 22:51 oom_entrypoint.sh
-rwxrwxrwx 1 root   root       7288 Mar 18 22:52 oom_rancher_setup.sh
-rwxr-xr-x 1 root   root   12213376 Mar 18 22:53 rancher
-rw-r--r-- 1 root   root    3736787 Dec 20 19:41 rancher-linux-amd64-v0.6.7.tar.gz
drwxr-xr-x 2 root   root       4096 Dec 20 19:39 rancher-v0.6.7
testing via http://jenkins.onap.cloud/job/oom_azure_deployment/
We need the IP address, not the domain name - via a linked template
or
https://docs.microsoft.com/en-us/azure/templates/microsoft.network/publicipaddresses
https://github.com/Azure/azure-quickstart-templates/issues/583
ARM templates cannot specify a static IP without a private subnet
reference(variables('publicIPAddressName')).ipAddress
for
reference(variables('nicName')).ipConfigurations[0].properties.privateIPAddress
Using the hostname instead of the private/public ip works (verify /etc/hosts though)
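If the template reference gets awkward, the allocated public IP can simply be queried from the CLI after deployment. A sketch - the public IP resource name below is a placeholder for whatever publicIPAddressName resolves to in your template:

az network public-ip show --resource-group a_ONAP_auto_201803181110z --name publicIPAddressName --query ipAddress -o tsv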
obrienbiometrics:oom michaelobrien$ ssh ubuntu@13.99.207.60 ubuntu@ons-auto-201803181110z:~$ sudo su - root@ons-auto-201803181110z:/var/lib/waagent/custom-script/download/0# cat stdout INFO: Running Agent Registration Process, CATTLE_URL=http://ons-auto-201803181110z:8880/v1 INFO: Attempting to connect to: http://ons-auto-201803181110z:8880/v1 INFO: http://ons-auto-201803181110z:8880/v1 is accessible INFO: Inspecting host capabilities INFO: Boot2Docker: false INFO: Host writable: true INFO: Token: xxxxxxxx INFO: Running registration INFO: Printing Environment INFO: ENV: CATTLE_ACCESS_KEY=9B0FA1695A3E3CFD07DB INFO: ENV: CATTLE_HOME=/var/lib/cattle INFO: ENV: CATTLE_REGISTRATION_ACCESS_KEY=registrationToken INFO: ENV: CATTLE_REGISTRATION_SECRET_KEY=xxxxxxx INFO: ENV: CATTLE_SECRET_KEY=xxxxxxx INFO: ENV: CATTLE_URL=http://ons-auto-201803181110z:8880/v1 INFO: ENV: DETECTED_CATTLE_AGENT_IP=172.17.0.1 INFO: ENV: RANCHER_AGENT_IMAGE=rancher/agent:v1.2.9 INFO: Launched Rancher Agent: b44bd62fd21c961f32f642f7c3b24438fc4129eabbd1f91e1cf58b0ed30b5876 waiting 7 min for host registration to finish 1 more min KUBECTL_TOKEN base64 encoded: QmFzaWMgUWpBNE5EWkdRlRNN.....Ukc1d2MwWTJRZz09 run the following if you installed a higher kubectl version than the server helm init --upgrade Verify all pods up on the kubernetes system - will return localhost:8080 until a host is added kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-76b8cd7b5-v5jrd 1/1 Running 0 5m kube-system kube-dns-5d7b4487c9-9bwk5 3/3 Running 0 5m kube-system kubernetes-dashboard-f9577fffd-cpwv7 1/1 Running 0 5m kube-system monitoring-grafana-997796fcf-s4sjm 1/1 Running 0 5m kube-system monitoring-influxdb-56fdcd96b-2mn6r 1/1 Running 0 5m kube-system tiller-deploy-cc96d4f6b-fll4t 1/1 Running 0 5m
20180318: Create VM image without destroying running VM
In AWS we can select the "no reboot" option and create an image from a running VM as-is with no effect on the running system.
Having issues with the Azure image creator - it asks for the ubuntu password even though only key-based access is used.
20180319: New Relic Monitoring
20180319: document devops flow
aka: travellers guide
20180319: Document Virtual Network Topology
20180429: Helm repo n/a after reboot - rerun helm serve
If you run into issues doing a make all, your local helm repo server is likely not running.
# rerun the local chart repo server and re-add the local repo
helm serve &
helm repo add local http://127.0.0.1:8879
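A quick way to make this check repeatable after any reboot: only restart the repo server if it stops answering. A minimal sketch:

# restart the local chart repo only when it is down, then confirm the repo list
curl -s http://127.0.0.1:8879/index.yaml > /dev/null || (nohup helm serve > /dev/null 2>&1 &)
helm repo list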
20180516: Clustered NFS share via Azure Files
Need a cloud native NFS equivalent to EFS (AWS) - looking at Azure Files.
Training
(Links below from Microsoft - thank you)
General Azure Documentation
Azure Site http://azure.microsoft.com
Azure Documentation Site https://docs.microsoft.com/en-us/azure/
Azure Training Courses https://azure.microsoft.com/en-us/training/free-online-courses/
Azure Portal http://portal.azure.com
Developer Documentation
Azure AD Authentication Libraries https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-authentication-libraries
Java Overview on Azure https://azure.microsoft.com/en-us/develop/java/
Java Docs for Azure https://docs.microsoft.com/en-us/java/azure/
Java SDK on GitHub https://github.com/Azure/azure-sdk-for-java
Python Overview on Azure https://azure.microsoft.com/en-us/develop/python/
Python Docs for Azure https://docs.microsoft.com/en-us/python/azure/
Python SDK on GitHub https://github.com/Azure/azure-sdk-for-python
REST Api and CLI Documentation
REST API Documentation https://docs.microsoft.com/en-us/rest/api/
CLI Documentation https://docs.microsoft.com/en-us/cli/azure/index
Other Documentation
Using Automation for VM shutdown & startup https://docs.microsoft.com/en-us/azure/automation/automation-solution-vm-management
Azure Resource Manager (ARM) QuickStart Templates https://github.com/Azure/azure-quickstart-templates
Known Forks
The code in this github repo has 2 month old copies of cd.sh and oom_rancher_install.sh
https://github.com/taranki/onap-azure
Use the official ONAP code in
https://gerrit.onap.org/r/logging-analytics
The original seed source from 2017 below is deprecated - use onap links above
https://github.com/obrienlabs/onap-root
Links
https://azure.microsoft.com/en-us/services/container-service/
https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/virtualmachines
https://kubernetes.io/docs/concepts/containers/images/#using-azure-container-registry-acr
https://azure.microsoft.com/en-us/features/storage-explorer/
https://docs.microsoft.com/en-ca/azure/virtual-machines/linux/capture-image
AKS
Google GCE
Account Provider: Michael O'Brien of Amdocs
OOM Installation on a GCE VM
The purpose of this page is to detail getting ONAP on Kubernetes (OOM) setup on a GCE VM.
I recommend using ONAP on Kubernetes on Amazon EC2 with the Spot API - it runs around $0.12-0.25/hr at 75% off, instead of the $0.60 below (33% off for reserved instances). This page is here so we can support GCE and also work with the Kubernetes open source project in the space it was originally designed at Google.
Log in to your Google account and create a 128G Ubuntu 16.04 VM
Install google command line tools
Components (gcloud components list - the box-drawing table was garbled in the capture):
Status | Name | ID | Size
Not Installed | App Engine Go Extensions | app-engine-go | 97.7 MiB
Not Installed | Cloud Bigtable Command Line Tool | cbt | 4.0 MiB
Not Installed | Cloud Bigtable Emulator | bigtable | 3.5 MiB
Not Installed | Cloud Datalab Command Line Tool | datalab | < 1 MiB
Not Installed | Cloud Datastore Emulator | cloud-datastore-emulator | 17.7 MiB
Not Installed | Cloud Datastore Emulator (Legacy) | gcd-emulator | 38.1 MiB
Not Installed | Cloud Pub/Sub Emulator | pubsub-emulator | 33.2 MiB
Not Installed | Emulator Reverse Proxy | emulator-reverse-proxy | 14.5 MiB
Not Installed | Google Container Local Builder | container-builder-local | 3.7 MiB
Not Installed | Google Container Registry's Docker credential helper | docker-credential-gcr | 2.2 MiB
Not Installed | gcloud Alpha Commands | alpha | < 1 MiB
Not Installed | gcloud Beta Commands | beta | < 1 MiB
Not Installed | gcloud app Java Extensions | app-engine-java | 116.0 MiB
Not Installed | gcloud app PHP Extensions | app-engine-php | 21.9 MiB
Not Installed | gcloud app Python Extensions | app-engine-python | 6.2 MiB
Not Installed | kubectl | kubectl | 15.9 MiB
Installed | BigQuery Command Line Tool | bq | < 1 MiB
Installed | Cloud SDK Core Libraries | core | 5.9 MiB
Installed | Cloud Storage Command Line Tool | gsutil | 3.3 MiB

==> Source [/Users/michaelobrien/gce/google-cloud-sdk/completion.bash.inc] in your profile to enable shell command completion for gcloud.
==> Source [/Users/michaelobrien/gce/google-cloud-sdk/path.bash.inc] in your profile to add the Google Cloud SDK command line tools to your $PATH.
gcloud init
obrienbiometrics:google-cloud-sdk michaelobrien$ source ~/.bash_profile
obrienbiometrics:google-cloud-sdk michaelobrien$ gcloud components update
All components are up to date.
Connect to your VM by getting a dynamic SSH key
obrienbiometrics:google-cloud-sdk michaelobrien$ gcloud compute ssh instance-1 WARNING: The public SSH key file for gcloud does not exist. WARNING: The private SSH key file for gcloud does not exist. WARNING: You do not have an SSH key for gcloud. WARNING: SSH keygen will be executed to generate a key. Generating public/private rsa key pair. Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /Users/michaelobrien/.ssh/google_compute_engine. Your public key has been saved in /Users/michaelobrien/.ssh/google_compute_engine.pub. The key fingerprint is: SHA256:kvS8ZIE1egbY+bEpY1RGN45ruICBo1WH8fLWqO435+Y michaelobrien@obrienbiometrics.local The key's randomart image is: +---[RSA 2048]----+ | o=o+* o | | . .oo+*.= . | |o o ..=.=+. | |.o o ++X+o | |. . ..BoS | | + * . | | . . . | | . o o | | .o. *E | +----[SHA256]-----+ Updating project ssh metadata.../Updated [https://www.googleapis.com/compute/v1/projects/onap-184300]. Updating project ssh metadata...done. Waiting for SSH key to propagate. Warning: Permanently added 'compute.2865548946042680113' (ECDSA) to the list of known hosts. Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-37-generic x86_64) * Documentation: https://help.ubuntu.com * Management: https://landscape.canonical.com * Support: https://ubuntu.com/advantage Get cloud support with Ubuntu Advantage Cloud Guest: http://www.ubuntu.com/business/services/cloud 0 packages can be updated. 0 updates are security updates. michaelobrien@instance-1:~$
Open up firewall rules or the entire VM
We need at least port 8880 for rancher
obrienbiometrics:20171027_log_doc michaelobrien$ gcloud compute firewall-rules create open8880 --allow tcp:8880 --source-tags=instance-1 --source-ranges=0.0.0.0/0 --description="8880"
Creating firewall...|Created [https://www.googleapis.com/compute/v1/projects/onap-184300/global/firewalls/open8880].
Creating firewall...done.
NAME      NETWORK  DIRECTION  PRIORITY  ALLOW     DENY
open8880  default  INGRESS    1000      tcp:8880
Alternatively, edit the existing internal firewall rule to use the CIDR 0.0.0.0/0 - or add a separate rule, as in the example below.
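If you prefer to keep the default rules intact, a separate rule covering the Kubernetes NodePort range also works. A sketch - the rule name is arbitrary and 0.0.0.0/0 should be narrowed to your own CIDR:

gcloud compute firewall-rules create onap-nodeports --allow tcp:30000-32767 --source-ranges=0.0.0.0/0 --description="k8s NodePort range"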
Continue with ONAP on Kubernetes
ONAP on Kubernetes#QuickstartInstallation
Kubernetes
Kubernetes API
follow https://kubernetes.io/docs/reference/kubectl/jsonpath/
Take the ~/.kube/config server and token and retrofit a rest call like the curl below
curl -k -H "Authorization: Bearer $TOKEN" -H 'Accept: application/json' $K8S-server-and-6443-port/api/v1/pods | jq -r .items[0].metadata.name
heapster-7b48b696fc-67qv6
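The server and token do not have to be copied by hand - they can be pulled straight out of the kubeconfig. A sketch, assuming the Rancher-generated config keeps the token in the user entry:

# extract the API server URL and bearer token from ~/.kube/config, then reuse them in the curl above
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
TOKEN=$(kubectl config view --minify -o jsonpath='{.users[0].user.token}')
curl -k -H "Authorization: Bearer $TOKEN" -H 'Accept: application/json' "$SERVER/api/v1/pods" | jq -r .items[0].metadata.name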
Kubernetes v11 Curl examples
for validating raw kubernetes api calls (take the .kube/config server and token and create a curl call with optional json parsing) - like below
ubuntu@ip-172-31-30-96:~$ curl -k -H "Authorization: Bearer QmFzaWMgUVV........YW5SdGFrNHhNdz09" -H 'Accept: application/json' https://o...fo:8880/r/projects/1a7/kubernetes:6443/api/v1/pods | jq -r .items[0].spec.containers[0]
{
  "name": "heapster",
  "image": "docker.io/rancher/heapster-amd64:v1.5.2",
  "command": [
    "/heapster",
    "--source=kubernetes:https://$KUBERNETES_SERVICE_HOST:443?inClusterConfig=true&useServiceAccount=true",
    "--sink=influxdb:http://monitoring-influxdb.kube-system.svc.cluster.local:8086?retention=0s",
    "--v=2"
  ],
  "resources": {},
  "volumeMounts": [
    {
      "name": "io-rancher-system-token-wf6d4",
      "readOnly": true,
      "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
    }
  ],
  "terminationMessagePath": "/dev/termination-log",
  "terminationMessagePolicy": "File",
  "imagePullPolicy": "IfNotPresent"
}
Kubernetes Best Practices
Local nexus proxy
In progress - needs a values.yaml global override.
ubuntu@a-onap-devopscd:~$ docker run -d -p 5000:5000 --restart=unless-stopped --name registry -e REGISTRY_PROXY_REMOTEURL=https://nexus3.onap.org:10001 registry:2
Unable to find image 'registry:2' locally
2: Pulling from library/registry
Status: Downloaded newer image for registry:2
bd216e444f133b30681dab8b144a212d84e1c231cc12353586b7010b3ae9d24b
ubuntu@a-onap-devopscd:~$ sudo docker ps | grep registry
bd216e444f13  registry:2  "/entrypoint.sh /e..."  2 minutes ago  Up About a minute  0.0.0.0:5000->5000/tcp  registry
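To confirm the pull-through cache is actually proxying, hit the registry v2 API on the same host. A sketch, assuming the registry container above is listening on port 5000:

# an empty {} or a catalog listing means the proxy is up; repositories appear after the first proxied pull
curl -s http://localhost:5000/v2/
curl -s http://localhost:5000/v2/_catalog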
Verify your Kubernetes cluster is functioning properly - Tiller is up
Check the dashboard
http://dev.onap.info:8880/r/projects/1a7/kubernetes-dashboard:9090/#!/pod?namespace=_all
check kubectl
Check that the tiller container is in state Running - not just that the tiller-deploy pod exists
ubuntu@a-onap-devops:~$ kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-9lqt2 1/1 Running 0 20d kube-system kube-dns-75c8cb4ccb-tw992 3/3 Running 0 20d kube-system kubernetes-dashboard-6f4c8b9cd5-rcbp2 1/1 Running 0 20d kube-system monitoring-grafana-76f5b489d5-r99rh 1/1 Running 0 20d kube-system monitoring-influxdb-6fc88bd58d-h875w 1/1 Running 0 20d kube-system tiller-deploy-645bd55c5d-bmxs7 1/1 Running 0 20d onap logdemonode-logdemonode-5c8bffb468-phbzd 2/2 Running 0 20d onap onap-log-elasticsearch-7557486bc4-72vpw 1/1 Running 0 20d onap onap-log-kibana-fc88b6b79-d88r7 1/1 Running 0 20d onap onap-log-logstash-9jlf2 1/1 Running 0 20d onap onap-portal-app-8486dc7ff8-tssd2 2/2 Running 0 5d onap onap-portal-cassandra-8588fbd698-dksq5 1/1 Running 0 5d onap onap-portal-db-7d6b95cd94-66474 1/1 Running 0 5d onap onap-portal-sdk-77cd558c98-6rsvq 2/2 Running 0 5d onap onap-portal-widget-6469f4bc56-hms24 1/1 Running 0 5d onap onap-portal-zookeeper-5d8c598c4c-hck2d 1/1 Running 0 5d onap onap-robot-6f99cb989f-kpwdr 1/1 Running 0 20d ubuntu@a-onap-devops:~$ kubectl describe pod tiller-deploy-645bd55c5d-bmxs7 -n kube-system Name: tiller-deploy-645bd55c5d-bmxs7 Namespace: kube-system Node: a-onap-devops/172.17.0.1 Start Time: Mon, 30 Jul 2018 22:20:09 +0000 Labels: app=helm name=tiller pod-template-hash=2016811718 Annotations: <none> Status: Running IP: 10.42.0.5 Controlled By: ReplicaSet/tiller-deploy-645bd55c5d Containers: tiller: Container ID: docker://a26420061a01a5791401c2519974c3190bf9f53fce5a9157abe7890f1f08146a Image: gcr.io/kubernetes-helm/tiller:v2.8.2 Image ID: docker-pullable://gcr.io/kubernetes-helm/tiller@sha256:9b373c71ea2dfdb7d42a6c6dada769cf93be682df7cfabb717748bdaef27d10a Port: 44134/TCP Command: /tiller --v=2 State: Running Started: Mon, 30 Jul 2018 22:20:14 +0000 Ready: True
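A scripted version of the same check is useful in CD jobs before attempting a helm deploy. A minimal sketch:

# block until the tiller deployment is rolled out, then make sure client and server versions both answer
kubectl -n kube-system rollout status deploy/tiller-deploy
helm version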
LOGs
Helm Deploy plugin logs
You need these to triage helm deploys that do not show up in helm list - i.e. cases where errors occurred before the deployment was marked as failed.
also use --verbose
ubuntu@a-ld0:~$ sudo ls ~/.helm/plugins/deploy/cache/onap/logs/onap- onap-aaf.log onap-cli.log onap-dmaap.log onap-multicloud.log onap-portal.log onap-sniro-emulator.log onap-vid.log onap-aai.log onap-consul.log onap-esr.log onap-oof.log onap-robot.log onap-so.log onap-vnfsdk.log onap-appc.log onap-contrib.log onap-log.log onap-policy.log onap-sdc.log onap-uui.log onap-vvp.log onap-clamp.log onap-dcaegen2.log onap-msb.log onap-pomba.log onap-sdnc.log onap-vfc.log
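A quick way to triage these without opening each file is to grep the whole log directory for failures. A sketch:

# surface the first errors across every per-chart deploy log
sudo grep -iE "error|failed" ~/.helm/plugins/deploy/cache/onap/logs/*.log | head -40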
Monitoring
Grafana Dashboards
There is a built-in Grafana dashboard (thanks Mandeep Khinda and James MacNider) that, once enabled, shows more detail about the cluster you are running - you need to expose the NodePort and target the VM the pod is on.
The CD system one is running below http://master3.onap.info:32628/dashboard/db/cluster?orgId=1&from=now-12h&to=now
# expose the nodeport
kubectl expose -n kube-system deployment monitoring-grafana --type=LoadBalancer --name monitoring-grafana-client
service "monitoring-grafana-client" exposed
# get the nodeport the service is exposed on
kubectl get services --all-namespaces -o wide | grep graf
kube-system  monitoring-grafana         ClusterIP     10.43.44.197   <none>        80/TCP          7d   k8s-app=grafana
kube-system  monitoring-grafana-client  LoadBalancer  10.43.251.214  18.222.4.161  3000:32628/TCP  15s  k8s-app=grafana,task=monitoring
# get the cluster vm DNS name
ubuntu@ip-10-0-0-169:~$ kubectl get pods --all-namespaces -o wide | grep graf
kube-system  monitoring-grafana-997796fcf-7kkl4  1/1  Running  0  5d  10.42.84.138  ip-10-0-0-80.us-east-2.compute.internal
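The NodePort and a node address can also be pulled with jsonpath instead of grepping the wide output. A sketch - Rancher nodes may only publish an InternalIP, so adjust the address type accordingly:

# nodePort of the exposed grafana client service
kubectl get svc -n kube-system monitoring-grafana-client -o jsonpath='{.spec.ports[0].nodePort}'
# address of the first node (switch ExternalIP to InternalIP for private clusters)
kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}'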
see also
MSB-209
Kubernetes DevOps
ONAP Development#KubernetesDevOps
Additional Tools
https://github.com/jonmosco/kube-ps1
https://github.com/ahmetb/kubectx
https://medium.com/@thisiskj/quickly-change-clusters-and-namespaces-in-kubernetes-6a5adca05615
https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
brew install kube-ps1
brew install kubectx
Openstack
Windriver Intel Lab
See OOM-714
Windriver/Openstack Lab Network Topology
Openlab VNC and CLI
The following is missing some sections and is a bit out of date (the v2 API is deprecated in favor of v3)
Get an openlab account - Integration / Developer Lab Access | |
Install openVPN - Using Lab POD-ONAP-01 Environment For OSX both Viscosity and TunnelBlick work fine | |
Login to Openstack | |
Install openstack command line tools | Tutorial: Configuring and Starting Up the Base ONAP Stack#InstallPythonvirtualenvTools(optional,butrecommended) |
get your v3 rc file | |
verify your openstack cli access (or just use the jumpbox) | obrienbiometrics:aws michaelobrien$ source logging-openrc.sh obrienbiometrics:aws michaelobrien$ openstack server list +--------------------------------------+---------+--------+-------------------------------+------------+ | ID | Name | Status | Networks | Image Name | +--------------------------------------+---------+--------+-------------------------------+------------+ | 1ed28213-62dd-4ef6-bdde-6307e0b42c8c | jenkins | ACTIVE | admin-private-mgmt=10.10.2.34 | | +--------------------------------------+---------+--------+-------------------------------+------------+ |
get some elastic IP's | You may need to release unused IPs from other tenants - as we have 4 pools of 50 |
fill in your stack env parameters | to fill in your config (mso) settings in values.yaml follow https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_quickstart_guide.html section "To generate openStackEncryptedPasswordHere" example ubuntu@ip-172-31-54-73:~/_dev/log-137-57171/oom/kubernetes/so/resources/config/mso$ cat encryption.key aa3871669d893c7fb8abbcda31b88b4f ubuntu@ip-172-31-54-73:~/_dev/log-137-57171/oom/kubernetes/so/resources/config/mso$ echo -n "55" | openssl aes-128-ecb -e -K aa3871669d893c7fb8abbcda31b88b4f -nosalt | xxd -c 256 -p a355b08d52c73762ad9915d98736b23b |
Run the HEAT stack to create the kubernetes undercloud VMs | [michaelobrien@obrienbiometrics onap_log-324_heat(keystone_michael_o_brien)]$ openstack stack list +--------------------------------------+--------------------------+-----------------+----------------------+----------------------+ | ID | Stack Name | Stack Status | Creation Time | Updated Time | +--------------------------------------+--------------------------+-----------------+----------------------+----------------------+ | d6371a95-dc3d-4103-978e-bab1f378573a | OOM-obrien-20181223-13-0 | CREATE_COMPLETE | 2018-12-23T14:55:10Z | 2018-12-23T14:55:10Z | | 7f821906-2216-4a6e-8ef0-d46a97adf3fc | obrien-nexus3 | CREATE_COMPLETE | 2018-12-20T02:41:38Z | 2018-12-20T02:41:38Z | | 9c4d3ebb-b7c9-4428-9e44-7ef5fba08940 | OOM20181216 | CREATE_COMPLETE | 2018-12-16T18:28:21Z | 2018-12-16T18:28:21Z | | 52379aea-d0a9-48db-a13e-35ca00876768 | dcae | DELETE_FAILED | 2018-03-04T22:02:12Z | 2018-12-16T05:05:19Z | +--------------------------------------+--------------------------+-----------------+----------------------+----------------------+ [michaelobrien@obrienbiometrics onap_log-324_heat(keystone_michael_o_brien)]$ openstack stack create -t logging_openstack_13_16g.yaml -e logging_openstack_oom.env OOM-obrien-20181223-13-0 +---------------------+-----------------------------------------+ | Field | Value | +---------------------+-----------------------------------------+ | id | d6371a95-dc3d-4103-978e-bab1f378573a | | stack_name | OOM-obrien-20181223-13-0 | | description | Heat template to install OOM components | | creation_time | 2018-12-23T14:55:10Z | | updated_time | 2018-12-23T14:55:10Z | | stack_status | CREATE_IN_PROGRESS | | stack_status_reason | Stack CREATE started | +---------------------+-----------------------------------------+ |
ssh in | see clusters in Logging DevOps Infrastructure obrienbiometrics:onap_log-324_heat michaelobrien$ ssh ubuntu@10.12.6.151 ubuntu@onap-oom-obrien-rancher:~$ docker version Client: Version: 17.03.2-ce API version: 1.27 |
install Kubernetes stack (rancher, k8s, helm) | - LOG-325Getting issue details... STATUS sudo git clone https://gerrit.onap.org/r/logging-analytics cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . # 20190105 - master, casablanca and 3.0.0-ONAP are all at the same Rancher 1.6.25, Kubernetes 1.11.5, Helm 2.9.1 and docker 17.03 levels # ignore the docker warning - as the cloud init script in the heat template already installed docker and prepulled images sudo nohup ./oom_rancher_setup.sh -b master -s 10.0.16.1 -n onap & # wait 90 min kubectl get pods --all-namespaces kubectl get pods --all-namespaces | grep 0/ |
create the NFS share | Scripts from above 20181207 https://jira.onap.org/secure/attachment/12887/master_nfs_node.sh https://jira.onap.org/secure/attachment/12888/slave_nfs_node.sh #master ubuntu@onap-oom-obrien-rancher-0:~$ sudo ./master_nfs_node.sh 10.12.5.99 10.12.5.86 10.12.5.136 10.12.6.179 10.12.5.102 10.12.5.4 ubuntu@onap-oom-obrien-rancher-0:~$ sudo ls /dockerdata-nfs/ test.sh #slaves ubuntu@onap-oom-obrien-rancher-1:~$ sudo ./slave_nfs_node.sh 10.12.5.68 ubuntu@onap-oom-obrien-rancher-1:~$ sudo ls /dockerdata-nfs/ test.sh |
deploy onap | # note this will saturate your 64g vm unless you run a cluster or turn off parts of onap sudo vi oom/kubernetes/onap/values.yaml # rerun cd.sh # or # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . sudo ./cd.sh -b master -e onap -c true -d true -w false -r false |
ONAP Usage
Accessing an external Node Port
Elasticsearch port example
# get pod names and the actual VM that any pod is on ubuntu@ip-10-0-0-169:~$ kubectl get pods --all-namespaces -o wide | grep log- onap onap-log-elasticsearch-756cfb559b-wk8c6 1/1 Running 0 2h 10.42.207.254 ip-10-0-0-227.us-east-2.compute.internal onap onap-log-kibana-6bb55fc66b-kxtg6 0/1 Running 16 1h 10.42.54.76 ip-10-0-0-111.us-east-2.compute.internal onap onap-log-logstash-689ccb995c-7zmcq 1/1 Running 0 2h 10.42.166.241 ip-10-0-0-111.us-east-2.compute.internal onap onap-vfc-catalog-5fbdfc7b6c-xc84b 2/2 Running 0 2h 10.42.206.141 ip-10-0-0-227.us-east-2.compute.internal # get nodeport ubuntu@ip-10-0-0-169:~$ kubectl get services --all-namespaces -o wide | grep log- onap log-es NodePort 10.43.82.53 <none> 9200:30254/TCP 2h app=log-elasticsearch,release=onap onap log-es-tcp ClusterIP 10.43.90.198 <none> 9300/TCP 2h app=log-elasticsearch,release=onap onap log-kibana NodePort 10.43.167.146 <none> 5601:30253/TCP 2h app=log-kibana,release=onap onap log-ls NodePort 10.43.250.182 <none> 5044:30255/TCP 2h app=log-logstash,release=onap onap log-ls-http ClusterIP 10.43.81.173 <none> 9600/TCP 2h app=log-logstash,release=onap # check nodeport outside container ubuntu@ip-10-0-0-169:~$ curl ip-10-0-0-111.us-east-2.compute.internal:30254 { "name" : "-pEf9q9", "cluster_name" : "onap-log", "cluster_uuid" : "ferqW-rdR_-Ys9EkWw82rw", "version" : { "number" : "5.5.0", "build_hash" : "260387d", "build_date" : "2017-06-30T23:16:05.735Z", "build_snapshot" : false, "lucene_version" : "6.6.0" }, "tagline" : "You Know, for Search" } # check inside docker container - for reference ubuntu@ip-10-0-0-169:~$ kubectl exec -it -n onap onap-log-elasticsearch-756cfb559b-wk8c6 bash [elasticsearch@onap-log-elasticsearch-756cfb559b-wk8c6 ~]$ curl http://127.0.0.1:9200 { "name" : "-pEf9q9",
ONAP Deployment Specification
Resiliency
Longest lived deployment so far
NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-479mx 1/1 Running 7 59d kube-system kube-dns-75c8cb4ccb-sqxbr 3/3 Running 45 59d kube-system kubernetes-dashboard-6f4c8b9cd5-w5xr2 1/1 Running 8 59d kube-system monitoring-grafana-76f5b489d5-sj9tl 1/1 Running 6 59d kube-system monitoring-influxdb-6fc88bd58d-22vg2 1/1 Running 6 59d kube-system tiller-deploy-8b6c5d4fb-4rbb4 1/1 Running 7 19d
Performance
Cluster Performance
ONAP runs best on a large cluster. As of 20180508 there are 152 pods (above the 110 limit per VM). ONAP is also vCPU bound - therefore try to run with a minimum of 24 vCores, ideally 32 to 64.
Even though most replicaSets are set to 3, try to have at least 4 nodes so we can survive a node failure and still run all the pods. The memory profile is around 85G right now.
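To see how the pods actually spread across the nodes (and whether any VM is approaching the 110-pod limit), count pods per node. A sketch - the NODE column position may shift on newer kubectl releases:

kubectl get pods --all-namespaces -o wide --no-headers | awk '{print $8}' | sort | uniq -c | sort -rn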
Security Profile
To deploy, ONAP requires certain ports, defined in a security group, to be open by CIDR to several static domain names. At runtime the list is reduced.
Ideally these are all inside a private network.
It looks like we will need a standard public/private network locked down behind a combined ACL/SG for an AWS VPC, or an NSG for Azure, where we only expose what we need outside the private network.
Still working on a list of ports, but we should not need any of these exposed if we use a bastion/jumpbox + NAT combo inside the network - see the ssh example below.
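For day-to-day access through such a bastion, plain OpenSSH is enough. A sketch - the hostnames and private address are placeholders:

# hop through the jumpbox to a private cluster node in one command (OpenSSH 7.3+)
ssh -J ubuntu@bastion.example.onap.cloud ubuntu@10.0.0.111
# or tunnel the rancher console out without exposing 8880 publicly
ssh -L 8880:10.0.0.111:8880 ubuntu@bastion.example.onap.cloud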
Known Security Vulnerabilities
https://medium.com/handy-tech/analysis-of-a-kubernetes-hack-backdooring-through-kubelet-823be5c3d67c
https://github.com/kubernetes/kubernetes/pull/59666 fixed in Kubernetes 1.10
ONAP Port Profile
On deployment, ONAP requires the following incoming and outgoing ports. Note: REST calls between ONAP components are handled inside the Kubernetes namespace by the DNS server running as part of K8S.
port | protocol | incoming/outgoing | application | source | destination | Notes |
---|---|---|---|---|---|---|
22 | ssh | | ssh | developer vm | host | |
443 | | | tiller | client | host | |
8880 | http | | rancher | client | host | |
9090 | http | | kubernetes | | host | |
10001 | https | | nexus3 | | nexus3.onap.org | |
10003 | https | | nexus3 | | nexus3.onap.org | |
| https | | nexus | | nexus.onap.org | |
| https ssh | | git | | git.onap.org | |
30200-30399 | http/https | | REST api | developer vm | host | |
32628 | http | | grafana | | | dashboard for the kubernetes cluster - must be enabled |
5005 | tcp | | java debug port | developer vm | host | |
Lockdown ports | | | | | | |
8080 | | outgoing | | | | |
10250-10255 | | in/out | | | | Lock these down via VPC or a source CIDR that equals only the server/client IP list (see the iptables sketch after this table) https://medium.com/handy-tech/analysis-of-a-kubernetes-hack-backdooring-through-kubelet-823be5c3d67c |
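For the kubelet ports above, cloud-level SG/NSG rules are the right place to filter, but a host-level fallback looks like the sketch below - assuming the cluster nodes share 10.0.0.0/16; adjust the CIDR to your VPC/VNet:

# accept kubelet/read-only ports only from inside the cluster network, drop everything else
iptables -A INPUT -p tcp --dport 10250:10255 -s 10.0.0.0/16 -j ACCEPT
iptables -A INPUT -p tcp --dport 10250:10255 -j DROP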
Azure Security Group
AWS VPC + Security Group
OOM Deployment Specification - 20180507 Beijing/master
The generated host registration docker call is the same as the one generated by the wiki - minus server IP (currently single node cluster)
Cluster Stability
OOM-1520
Long Duration Clusters
Single Node Deployments
A 31 day Azure deployment eventually hits the 80% FS saturation barrier - fix: LOG-853
onap          onap-vnfsdk-vnfsdk-postgres-0    1/1  Running  0  30d
onap          onap-vnfsdk-vnfsdk-postgres-1    1/1  Running  0  30d
ubuntu@a-osn-cd:~$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
udev           222891708         0 222891708   0% /dev
tmpfs           44580468   4295720  40284748  10% /run
/dev/sda1      129029904 125279332   3734188  98% /
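Before the filesystem reaches 100%, reclaiming unused docker layers buys time. A sketch (these prune commands exist on docker 17.03); note that pruning forces re-pulls of any image not currently backing a running container:

docker system df                 # see how much space images, containers and volumes use
docker image prune -a -f         # remove images not referenced by a running container
docker volume prune -f           # remove dangling volumes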
TODO
https://docs.microsoft.com/en-us/windows/wsl/about
Links
https://kubernetes.io/docs/user-guide/kubectl-cheatsheet/
ONAP on Kubernetes#QuickstartInstallation
https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/