Cloud Native Deployment
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - Single VM
ONAP on Kubernetes, deployed by OOM or RKE, managed by Rancher, on | Amazon AWS | Microsoft Azure | Google Cloud Platform | OpenStack |
---|---|---|---|---|
VMs | yes | yes | yes | yes |
Managed Kubernetes | Amazon EKS | AKS | | |
Sponsor | Amazon (384G/m - 201801 to 201808) - thank you; Michael O'Brien (201705-201905); Amdocs (201903+) | Microsoft (201801+), Amdocs | | Intel/Windriver (2017-) |
Note: This is a private page under daily continuous modification to keep it relevant as a live reference (don't edit it unless something is really wrong). https://twitter.com/_mikeobrien | https://www.linkedin.com/in/michaelobrien-developer/
For general support consult the official documentation at http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html and https://onap.readthedocs.io/en/beijing/submodules/oom.git/docs/oom_cloud_setup_guide.html and raise DOC JIRAs for any modifications required to them.
This page details deployment of ONAP on any environment that supports Kubernetes-based containers.
Chat: http://onap-integration.eastus.cloudapp.azure.com:3000/group/onap-integration
Use separate namespaces to avoid the 1MB configmap limit, or just helm install/delete everything rather than helm upgrade (see the sketch after the links below).
https://kubernetes.slack.com/messages/C09NXKJKA/?
https://d1.awsstatic.com/whitepapers/architecture/AWS_Well-Architected_Framework.pdf
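Since helm upgrade is avoided here, a full delete and reinstall is the workaround. A minimal sketch using standard Helm 2 commands plus the OOM deploy plugin and the release/namespace names used elsewhere on this page (your dev.yaml path may differ):

# tear down the whole release instead of upgrading it (avoids the per-release configmap limit)
sudo helm delete --purge onap
# optionally clear the shared NFS data before reinstalling
sudo rm -rf /dockerdata-nfs/onap-*
# reinstall from the local chart repo (see the Quickstart section below for building local/onap)
sudo helm deploy onap local/onap --namespace onap -f dev.yaml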
Deployment Profile
28 helm releases (ONAP projects) and 196 pods including vvp, without the filebeat sidecars - 20181130 - this count is with all replicaSets and daemonSets set to 1; in the clustered case this rises to 241 pod instances.
Docker images currently total up to 75G as of 20181230.
After running docker_prepull.sh the root filesystem usage looks like:
/dev/sda1 389255816 77322824 311916608 20% /
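A typical prepull run (flags as used in the Nexus proxy section later on this page; the branch and registry here are examples):

sudo wget https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
sudo chmod 777 docker_prepull.sh
# pull all images from the integration docker manifest for the chosen branch before running cd.sh
sudo nohup ./docker_prepull.sh -b casablanca -s nexus3.onap.org:10001 &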
Type | VMs | Total RAM / vCores / HD | VM Flavor | K8S/Rancher Idle RAM | Deployed RAM | Deployed ONAP RAM | Pods | Containers | Max vCores | Idle vCores | HD/VM | HD (NFS only) | IOPS | Date | Cost | Branch | Notes (deployment post 75 min) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Full Cluster (14 + 1) - recommended | 15 | 224G 112 vC 100G/VM | 16G, 8 vCores C5.2xLarge | 187Gb | 102Gb | 28 | 248 total 241 onap 217 up 0 error 24 config | 18 | 6+G master | 8.1G | 20181106 | $1.20 US/hour using the spot market | C | ||||
Single VM (possible - not recommended) | 1 | 432G 64 vC 180G | 256G+ 32+ vCores | Rancher: 13G Kubernetes: 8G Top: 10G | 165Gb (after 24h) | 141Gb | 28 | 240 total 196 if RS and DS are set to 1 | 55 | 22 | 131G (including 75G dockers) | n/a | Max: 550/sec Idle: 220/sec | 20181105 20180101 | C | Tested on 432G/64vCore azure VM - R 1.6.22 K8S 1.11 updated 20190101 | |
Developer 1-n pods | 1 | 16G | 16/32G 4-16 vCores | 14Gb | 10Gb | 3+ | 120+G | n/a | C | AAI+robot only |
Security
The VM security group should be open with no restrictive CIDR rules - but lock down ports 10249-10255 with RBAC.
If you get an issue connecting to your Rancher server ("dial tcp 127.0.0.1:8880: getsockopt: connection refused"), it is usually security related - for example, this line is the first to fail:
https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh#n117
Check the server first with either of these - if helm version hangs on the "Server" line, the ports have an issue. Run with all TCP/UDP ports open (0.0.0.0/0 and ::/0), and lock down the API on ports 10249-10255 via GitHub OAuth security from the Rancher console to keep out crypto miners.
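A quick way to tell whether it is the server or the ports (a sketch, assuming Rancher answers on 8880 and exposes its /ping health endpoint):

# a reachable server returns "pong"; a hang or refusal points at the security group
curl -sS http://<your-rancher-host>:8880/ping
# if the Client: line prints but the Server: line hangs, inter-node ports are blocked
helm version
# spot-check one of the kubelet/metrics ports that should later be locked down
nc -zv <any-cluster-node> 10250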
Example 15 node (1 master + 14 nodes) OOM Deployment
Rancher 1.6.25, Kubernetes 1.11.5, Docker 17.03, Helm 2.9.1
Idle cluster (empty, before ONAP is deployed)
With ONAP deployed
Throughput and Volumetrics
Cloudwatch CPU Average
Specific to logging: we have a problem on any VM that contains AAI - the logstash container is saturated there - see the VM at 30+ percent CPU - LOG-376
NFS Throughput for /dockerdata-nfs
Cloudwatch Network In Max
Cost
Using the spot market on AWS, we ran up a bill of $10 for 8 hours of 15 C5.2xLarge VMs (includes EBS but not DNS or EFS/NFS).
Details: 20181106:1800 EDT master
ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | wc -l 248 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | wc -l 241 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 217 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces | grep onap | grep -E '0/|1/2' | wc -l 24 ubuntu@ip-172-31-40-250:~$ kubectl get pods --all-namespaces -o wide | grep onap | grep -E '0/|1/2' onap onap-aaf-aaf-sms-preload-lvqx9 0/1 Completed 0 4h 10.42.75.71 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-distcenter-ql5f8 0/1 Completed 0 4h 10.42.75.223 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-aaf-aaf-sshsm-testca-7rzcd 0/1 Completed 0 4h 10.42.18.37 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-aai-aai-graphadmin-create-db-schema-26pfs 0/1 Completed 0 4h 10.42.14.14 ip-172-31-37-59.us-east-2.compute.internal <none> onap onap-aai-aai-traversal-update-query-data-qlk7w 0/1 Completed 0 4h 10.42.88.122 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-gmmvj 0/1 Completed 0 4h 10.42.111.99 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-n6fw4 0/1 Error 0 4h 10.42.21.12 ip-172-31-36-163.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-nc8ww 0/1 Error 0 4h 10.42.109.156 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-contrib-netbox-app-provisioning-xcxds 0/1 Error 0 4h 10.42.152.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-node-6496d8f55b-jfvrm 0/1 Init:0/1 28 4h 10.42.95.32 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-dmaap-dmaap-dr-prov-86f79c47f9-tldsp 0/1 CrashLoopBackOff 59 4h 10.42.76.248 ip-172-31-34-207.us-east-2.compute.internal <none> onap onap-oof-music-cassandra-job-config-7mb5f 0/1 Completed 0 4h 10.42.38.249 ip-172-31-41-110.us-east-2.compute.internal <none> onap onap-oof-oof-has-healthcheck-rpst7 0/1 Completed 0 4h 10.42.241.223 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-oof-oof-has-onboard-5bd2l 0/1 Completed 0 4h 10.42.205.75 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-qshzn 0/2 Completed 0 4h 10.42.112.46 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-portal-portal-db-config-rk4m2 0/2 Init:Error 0 4h 10.42.57.79 ip-172-31-38-194.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-2vw2q 0/1 Completed 0 4h 10.42.87.181 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-be-config-backend-k57lh 0/1 Init:Error 0 4h 10.42.148.79 ip-172-31-45-152.us-east-2.compute.internal <none> onap onap-sdc-sdc-cs-config-cassandra-vgnz2 0/1 Completed 0 4h 10.42.111.187 ip-172-31-34-111.us-east-2.compute.internal <none> onap onap-sdc-sdc-es-config-elasticsearch-lkb9m 0/1 Completed 0 4h 10.42.20.202 ip-172-31-39-138.us-east-2.compute.internal <none> onap onap-sdc-sdc-onboarding-be-cassandra-init-7zv5j 0/1 Completed 0 4h 10.42.218.1 ip-172-31-41-229.us-east-2.compute.internal <none> onap onap-sdc-sdc-wfd-be-workflow-init-q8t7z 0/1 Completed 0 4h 10.42.255.91 ip-172-31-41-30.us-east-2.compute.internal <none> onap onap-vid-vid-galera-config-4f274 0/1 Completed 0 4h 10.42.80.200 ip-172-31-33-223.us-east-2.compute.internal <none> onap onap-vnfsdk-vnfsdk-init-postgres-lf659 0/1 Completed 0 4h 10.42.238.204 ip-172-31-38-194.us-east-2.compute.internal <none> ubuntu@ip-172-31-40-250:~$ kubectl get 
nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-33-223.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.222.148.116 18.222.148.116 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-111.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 3.16.37.170 3.16.37.170 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-34-207.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.225.32.201 18.225.32.201 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-36-163.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.58.189.251 13.58.189.251 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-24.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.224.180.26 18.224.180.26 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-37-59.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.191.248.14 18.191.248.14 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-194.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.217.45.91 18.217.45.91 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-38-95.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 52.15.39.21 52.15.39.21 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-39-138.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.224.199.40 18.224.199.40 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-110.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.223.151.180 18.223.151.180 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-229.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 18.218.252.13 18.218.252.13 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-41-30.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 3.16.113.3 3.16.113.3 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-42-33.us-east-2.compute.internal Ready <none> 5h v1.11.2-rancher1 13.59.2.86 13.59.2.86 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ip-172-31-45-152.us-east-2.compute.internal Ready <none> 4h v1.11.2-rancher1 18.219.56.50 18.219.56.50 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 ubuntu@ip-172-31-40-250:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-33-223.us-east-2.compute.internal 852m 10% 13923Mi 90% ip-172-31-34-111.us-east-2.compute.internal 1160m 14% 11643Mi 75% ip-172-31-34-207.us-east-2.compute.internal 1101m 13% 7981Mi 51% ip-172-31-36-163.us-east-2.compute.internal 656m 8% 13377Mi 87% ip-172-31-37-24.us-east-2.compute.internal 401m 5% 8543Mi 55% ip-172-31-37-59.us-east-2.compute.internal 711m 8% 10873Mi 70% ip-172-31-38-194.us-east-2.compute.internal 1136m 14% 8195Mi 53% ip-172-31-38-95.us-east-2.compute.internal 1195m 14% 9127Mi 59% ip-172-31-39-138.us-east-2.compute.internal 296m 3% 10870Mi 70% ip-172-31-41-110.us-east-2.compute.internal 2586m 32% 10950Mi 71% ip-172-31-41-229.us-east-2.compute.internal 159m 1% 9138Mi 59% ip-172-31-41-30.us-east-2.compute.internal 180m 2% 9862Mi 64% ip-172-31-42-33.us-east-2.compute.internal 1573m 19% 6352Mi 41% ip-172-31-45-152.us-east-2.compute.internal 1579m 19% 10633Mi 69%
Quickstart
Undercloud Install - Rancher/Kubernetes/Helm/Docker
Ubuntu 16.04 Host VM Configuration
key | value |
---|---|
Redhat 7.6 Host VM Configuration
see https://gerrit.onap.org/r/#/c/77850/
key | value |
---|---|
firewalld off | systemctl disable firewalld |
git, make, python | yum install git; yum groupinstall 'Development Tools' |
IPv4 forwarding | add to /etc/sysctl.conf net.ipv4.ip_forward = 1 |
Networking enabled | sudo vi /etc/sysconfig/network-scripts/ifcfg-ens33 and set ONBOOT=yes |
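The same preparation as a script - a sketch only; the interface name ens33 is the example from the table and will differ per VM:

sudo systemctl disable --now firewalld
sudo yum install -y git
sudo yum groupinstall -y 'Development Tools'
# enable IPv4 forwarding and reload sysctl
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# make sure the interface comes up on boot (adjust ifcfg-ens33 to your interface)
sudo sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-ens33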
General Host VM Configuration
Follow https://git.onap.org/logging-analytics/tree/deploy/rancher/oom_rancher_setup.sh
Run the following script on a clean Ubuntu 16.04 or Red Hat RHEL 7.x (7.6) VM anywhere - it will provision and register your Kubernetes system as a co-located master/host.
Ideally you install a clustered set of hosts separate from the master VM - you can do this by deleting the host from the cluster after it is installed below, and then running docker, NFS and the Rancher agent container on each host.
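The invocation, as used in the sections below:

sudo git clone https://gerrit.onap.org/r/logging-analytics
sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh .
# installs docker, rancher, kubernetes, kubectl and helm, and registers this VM as a combined master/host
sudo ./oom_rancher_setup.sh -b master -s <your domain/ip> -e onap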
vm.max_map_count - raise it from the ~64K default to the 256K (262144) limit
The cd.sh script will fix your VM for this limitation, first found in LOG-334. If you don't run the cd.sh script, run the following command manually on each VM so that any elasticsearch container comes up properly - this is a base OS issue.
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n49
# fix virtual memory for onap-log:elasticsearch under Rancher 1.6.11 - OOM-431
sudo sysctl -w vm.max_map_count=262144
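To persist the setting across reboots (a common sysctl approach, not part of cd.sh):

echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p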
Scripted RKE Kubernetes Cluster install
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - Single VM
Prerequisites
Create a single VM - 256G+
See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS
Create a security group open to 0.0.0.0/0 and ::/0
Use GitHub OAuth to authenticate your cluster immediately after installing it.
Last tested 20190305 using 3.0.1-ONAP
ONAP Development#Change max-pods from default 110 pod limit
# 0 - verify the security group has all protocols (TCP/UCP) for 0.0.0.0/0 and ::/0 # to be save edit/make sure dns resolution is setup to the host ubuntu@ld:~$ sudo cat /etc/hosts 127.0.0.1 cd.onap.info # 1 - configure combined master/host VM - 26 min sudo git clone https://gerrit.onap.org/r/logging-analytics sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . sudo ./oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # to deploy more than 110 pods per vm before the environment (1a7) is created from the kubernetes template (1pt2) - at the waiting 3 min mark - edit it via https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Changemax-podsfromdefault110podlimit --max-pods=900 https://lists.onap.org/g/onap-discuss/topic/oom_110_kubernetes_pod/25213556?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,25213556 in "additional kubelet flags" --max-pods=500 # on a 244G R4.8xlarge vm - 26 min later k8s cluster is up NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-6cfb49f776-5pq45 1/1 Running 0 10m kube-system kube-dns-75c8cb4ccb-7dlsh 3/3 Running 0 10m kube-system kubernetes-dashboard-6f4c8b9cd5-v625c 1/1 Running 0 10m kube-system monitoring-grafana-76f5b489d5-zhrjc 1/1 Running 0 10m kube-system monitoring-influxdb-6fc88bd58d-9494h 1/1 Running 0 10m kube-system tiller-deploy-8b6c5d4fb-52zmt 1/1 Running 0 2m # 3 - secure via github oauth the master - immediately to lock out crypto miners http://cd.onap.info:8880 # check the master cluster ubuntu@ip-172-31-14-89:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-8-245.us-east-2.compute.internal 179m 2% 2494Mi 4% ubuntu@ip-172-31-14-89:~$ kubectl get nodes -o wide NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-172-31-8-245.us-east-2.compute.internal Ready <none> 13d v1.10.3-rancher1 172.17.0.1 Ubuntu 16.04.1 LTS 4.4.0-1049-aws docker://17.3.2 # 7 - after cluster is up - run cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script - a clean install - will clone new oom repo # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp dev.yaml dev0.yaml sudo vi dev0.yaml sudo cp dev0.yaml dev1.yaml sudo cp logging-analytics/deploy/cd.sh . 
# this does a prepull (-p), clones 3.0.0-ONAP, managed install -f true sudo ./cd.sh -b 3.0.0-ONAP -e onap -p true -n nexus3.onap.org:10001 -f true -s 300 -c true -d true -w false -r false # check around 55 min (on a 256G single node - with 32 vCores) pods/failed/up @ min and ram 161/13/153 @ 50m 107g @55 min ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 152 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-deployment-handler-5789b89d4b-s6fzw 1/2 Running 0 8m onap dep-service-change-handler-76dcd99f84-fchxd 0/1 ContainerCreating 0 3m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 53m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 9 53m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 53m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 7 53m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 9 53m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 CrashLoopBackOff 9 53m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 7 52m Note: DCAE has 2 sets of orchestration after the initial k8s orchestration - another at 57 min ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-dcae-prh-6b5c6ff445-pr547 0/2 ContainerCreating 0 2m onap dep-dcae-tca-analytics-7dbd46d5b5-bgrn9 0/2 ContainerCreating 0 1m onap dep-dcae-ves-collector-59d4ff58f7-94rpq 0/2 ContainerCreating 0 1m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 57m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 10 57m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 57m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 8 57m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 11 57m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 Error 10 57m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 9 57m at 1 hour ubuntu@ip-172-31-20-218:~$ free total used free shared buff/cache available Mem: 251754696 111586672 45000724 193628 95167300 137158588 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | wc -l 164 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l 155 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l 8 ubuntu@ip-172-31-20-218:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' onap dep-dcae-ves-collector-59d4ff58f7-94rpq 1/2 Running 0 4m onap onap-aai-champ-68ff644d85-rv7tr 0/1 Running 0 59m onap onap-aai-gizmo-856f86d664-q5pvg 1/2 CrashLoopBackOff 10 59m onap onap-oof-85864d6586-zcsz5 0/1 ImagePullBackOff 0 59m onap onap-pomba-kibana-d76b6dd4c-sfbl6 0/1 Init:CrashLoopBackOff 8 59m onap onap-pomba-networkdiscovery-85d76975b7-mfk92 1/2 CrashLoopBackOff 11 59m onap onap-pomba-networkdiscoveryctxbuilder-c89786dfc-qnlx9 1/2 CrashLoopBackOff 10 59m onap onap-vid-84c88db589-8cpgr 1/2 CrashLoopBackOff 9 59m ubuntu@ip-172-31-20-218:~$ df Filesystem 1K-blocks Used Available Use% Mounted on udev 125869392 0 125869392 0% /dev tmpfs 25175472 54680 25120792 1% /run /dev/xvda1 121914320 91698036 30199900 76% / tmpfs 125877348 30312 125847036 1% /dev/shm tmpfs 5120 0 5120 0% /run/lock tmpfs 125877348 0 125877348 0% /sys/fs/cgroup tmpfs 25175472 0 25175472 0% /run/user/1000 todo: verify the release is there after a helm install - as the configMap size issue is breaking the release for now
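A quick way to check the state of the install afterwards (the same filters used throughout this page):

# helm releases that made it into tiller (watch for missing ones due to the configMap size issue)
sudo helm list
# pods that are fully up vs still failing
kubectl get pods --all-namespaces | grep onap | grep -E '1/1|2/2' | wc -l
kubectl get pods --all-namespaces | grep onap | grep -E '0/|1/2'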
Prerequisites
Create a single VM - 256G+
20181015
ubuntu@a-onap-dmz-nodelete:~$ ./oom_deployment.sh -b master -s att.onap.cloud -e onap -r a_ONAP_CD_master -t _arm_deploy_onap_cd.json -p _arm_deploy_onap_cd_z_parameters.json
# register the IP to DNS with route53 for att.onap.info - using this for the ONAP academic summit on the 22nd
# 13.68.113.104 = att.onap.cloud
Scripted undercloud (Helm/Kubernetes/Docker) and ONAP install - clustered
Prerequisites
Add an NFS (EFS on AWS) share
Create a 1 + N cluster
See recommended cluster configurations on ONAP Deployment Specification for Finance and Operations#AmazonAWS
Create a security group open to 0.0.0.0/0 and ::/0
Use GitHub OAuth to authenticate your cluster immediately after installing it.
Last tested on ld.onap.info 20181029
# 0 - verify the security group has all protocols (TCP/UCP) for 0.0.0.0/0 and ::/0 # 1 - configure master - 15 min sudo git clone https://gerrit.onap.org/r/logging-analytics sudo logging-analytics/deploy/rancher/oom_rancher_setup.sh -b master -s <your domain/ip> -e onap # on a 64G R4.2xlarge vm - 23 min later k8s cluster is up kubectl get pods --all-namespaces kube-system heapster-76b8cd7b5-g7p6n 1/1 Running 0 8m kube-system kube-dns-5d7b4487c9-jjgvg 3/3 Running 0 8m kube-system kubernetes-dashboard-f9577fffd-qldrw 1/1 Running 0 8m kube-system monitoring-grafana-997796fcf-g6tr7 1/1 Running 0 8m kube-system monitoring-influxdb-56fdcd96b-x2kvd 1/1 Running 0 8m kube-system tiller-deploy-54bcc55dd5-756gn 1/1 Running 0 2m # 2 - secure via github oauth the master - immediately to lock out crypto miners http://ld.onap.info:8880 # 3 - delete the master from the hosts in rancher http://ld.onap.info:8880 # 4 - create NFS share on master https://us-east-2.console.aws.amazon.com/efs/home?region=us-east-2#/filesystems/fs-92xxxxx # add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher) sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n false -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # 5 - create NFS share and register each node - do this for all nodes sudo git clone https://gerrit.onap.org/r/logging-analytics # add -h 1.2.10 (if upgrading from 1.6.14 to 1.6.18 of rancher) sudo logging-analytics/deploy/aws/oom_cluster_host_install.sh -n true -s <your domain/ip> -e fs-nnnnnn1b -r us-west-1 -t 371AEDC88zYAZdBXPM -c true -v true # it takes about 1 min to run the script and 1 minute for the etcd and healthcheck containers to go green on each host # check the master cluster kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-172-31-19-9.us-east-2.compute.internal 9036m 56% 53266Mi 43% ip-172-31-21-129.us-east-2.compute.internal 6840m 42% 47654Mi 38% ip-172-31-18-85.us-east-2.compute.internal 6334m 39% 49545Mi 40% ip-172-31-26-114.us-east-2.compute.internal 3605m 22% 25816Mi 21% # fix helm on the master after adding nodes to the master - only if the server helm version is less than the client helm version (rancher 1.6.18 does not have this issue) ubuntu@ip-172-31-14-89:~$ sudo helm version Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"} ubuntu@ip-172-31-14-89:~$ sudo helm init --upgrade $HELM_HOME has been configured at /home/ubuntu/.helm. Tiller (the Helm server-side component) has been upgraded to the current version. 
ubuntu@ip-172-31-14-89:~$ sudo helm version Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"} # 7a - manual: follow the helm plugin page # https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins sudo git clone https://gerrit.onap.org/r/oom sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm cd oom/kubernetes sudo helm serve & sudo make all sudo make onap sudo helm deploy onap local/onap --namespace onap fetching local/onap release "onap" deployed release "onap-aaf" deployed release "onap-aai" deployed release "onap-appc" deployed release "onap-clamp" deployed release "onap-cli" deployed release "onap-consul" deployed release "onap-contrib" deployed release "onap-dcaegen2" deployed release "onap-dmaap" deployed release "onap-esr" deployed release "onap-log" deployed release "onap-msb" deployed release "onap-multicloud" deployed release "onap-nbi" deployed release "onap-oof" deployed release "onap-policy" deployed release "onap-pomba" deployed release "onap-portal" deployed release "onap-robot" deployed release "onap-sdc" deployed release "onap-sdnc" deployed release "onap-sniro-emulator" deployed release "onap-so" deployed release "onap-uui" deployed release "onap-vfc" deployed release "onap-vid" deployed release "onap-vnfsdk" deployed # 7b - automated: after cluster is up - run cd.sh script to get onap up - customize your values.yaml - the 2nd time you run the script # clean install - will clone new oom repo # get the dev.yaml and set any pods you want up to true as well as fill out the openstack parameters sudo wget https://git.onap.org/oom/plain/kubernetes/onap/resources/environments/dev.yaml sudo cp logging-analytics/deploy/cd.sh . sudo ./cd.sh -b master -e onap -c true -d true -w true # rerun install - no delete of oom repo sudo ./cd.sh -b master -e onap -c false -d true -w true
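After the clustered deploy, the same health checks apply; a short sketch:

kubectl get nodes -o wide
kubectl top nodes
# anything still initializing or crashing
kubectl get pods --all-namespaces -o wide | grep -E '0/|1/2'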
Deployment Integrity based on Pod Dependencies
20181213 running 3.0.0-ONAP
Links
- LOG-899
- LOG-898
- OOM-1547
- OOM-1543
Patches
Windriver OpenStack HEAT template for 1+13 VMs
https://gerrit.onap.org/r/#/c/74781/
docker prepull script – run before cd.sh - https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
https://gerrit.onap.org/r/#/c/74780/
Not merged with the heat template until the following nexus3 slowdown is addressed
https://jira.onap.org/browse/TSC-79
Base Platform First
Bring up dmaap and aaf first, then the rest of the pods in the following order.
Every 2.0s: helm list Fri Dec 14 15:19:49 2018 NAME REVISION UPDATED STATUS CHART NAMESPACE onap 2 Fri Dec 14 15:10:56 2018 DEPLOYED onap-3.0.0 onap onap-aaf 1 Fri Dec 14 15:10:57 2018 DEPLOYED aaf-3.0.0 onap onap-dmaap 2 Fri Dec 14 15:11:00 2018 DEPLOYED dmaap-3.0.0 onap onap onap-aaf-aaf-cm-5c65c9dc55-snhlj 1/1 Running 0 10m onap onap-aaf-aaf-cs-7dff4b9c44-85zg2 1/1 Running 0 10m onap onap-aaf-aaf-fs-ff6779b94-gz682 1/1 Running 0 10m onap onap-aaf-aaf-gui-76cfcc8b74-wn8b8 1/1 Running 0 10m onap onap-aaf-aaf-hello-5d45dd698c-xhc2v 1/1 Running 0 10m onap onap-aaf-aaf-locate-8587d8f4-l4k7v 1/1 Running 0 10m onap onap-aaf-aaf-oauth-d759586f6-bmz2l 1/1 Running 0 10m onap onap-aaf-aaf-service-546f66b756-cjppd 1/1 Running 0 10m onap onap-aaf-aaf-sms-7497c9bfcc-j892g 1/1 Running 0 10m onap onap-aaf-aaf-sms-preload-vhbbd 0/1 Completed 0 10m onap onap-aaf-aaf-sms-quorumclient-0 1/1 Running 0 10m onap onap-aaf-aaf-sms-quorumclient-1 1/1 Running 0 8m onap onap-aaf-aaf-sms-quorumclient-2 1/1 Running 0 6m onap onap-aaf-aaf-sms-vault-0 2/2 Running 1 10m onap onap-aaf-aaf-sshsm-distcenter-27ql7 0/1 Completed 0 10m onap onap-aaf-aaf-sshsm-testca-mw95p 0/1 Completed 0 10m onap onap-dmaap-dbc-pg-0 1/1 Running 0 17m onap onap-dmaap-dbc-pg-1 1/1 Running 0 15m onap onap-dmaap-dbc-pgpool-c5f8498-fn9cn 1/1 Running 0 17m onap onap-dmaap-dbc-pgpool-c5f8498-t9s27 1/1 Running 0 17m onap onap-dmaap-dmaap-bus-controller-59c96d6b8f-9xsxg 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-db-557c66dc9d-gvb9f 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-node-6496d8f55b-ffgfr 1/1 Running 0 17m onap onap-dmaap-dmaap-dr-prov-86f79c47f9-zb8p7 1/1 Running 0 17m onap onap-dmaap-message-router-5fb78875f4-lvsg6 1/1 Running 0 17m onap onap-dmaap-message-router-kafka-7964db7c49-n8prg 1/1 Running 0 17m onap onap-dmaap-message-router-zookeeper-5cdfb67f4c-5w4vw 1/1 Running 0 17m onap-msb 2 Fri Dec 14 15:31:12 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 4m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 4m onap onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 4m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 4m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 4m onap-aai 1 Fri Dec 14 15:30:59 2018 DEPLOYED aai-3.0.0 onap onap onap-aai-aai-54b7bf7779-bfbmg 1/1 Running 0 2m onap onap-aai-aai-babel-6bbbcf5d5c-sp676 2/2 Running 0 13m onap onap-aai-aai-cassandra-0 1/1 Running 0 13m onap onap-aai-aai-cassandra-1 1/1 Running 0 12m onap onap-aai-aai-cassandra-2 1/1 Running 0 9m onap onap-aai-aai-champ-54f7986b6b-wql2b 2/2 Running 0 13m onap onap-aai-aai-data-router-f5f75c9bd-l6ww7 2/2 Running 0 13m onap onap-aai-aai-elasticsearch-c9bf9dbf6-fnj8r 1/1 Running 0 13m onap onap-aai-aai-gizmo-5f8bf54f6f-chg85 2/2 Running 0 13m onap onap-aai-aai-graphadmin-9b956d4c-k9fhk 2/2 Running 0 13m onap onap-aai-aai-graphadmin-create-db-schema-s2nnw 0/1 Completed 0 13m onap onap-aai-aai-modelloader-644b46df55-vt4gk 2/2 Running 0 13m onap onap-aai-aai-resources-745b6b4f5b-rj7lm 2/2 Running 0 13m onap onap-aai-aai-search-data-559b8dbc7f-l6cqq 2/2 Running 0 13m onap onap-aai-aai-sparky-be-75658695f5-z2xv4 2/2 Running 0 13m onap onap-aai-aai-spike-6778948986-7h7br 2/2 Running 0 13m onap onap-aai-aai-traversal-58b97f689f-jlblx 2/2 Running 0 13m onap onap-aai-aai-traversal-update-query-data-7sqt5 0/1 Completed 0 13m onap-msb 5 Fri Dec 14 15:51:42 2018 DEPLOYED msb-3.0.0 onap onap onap-msb-kube2msb-5c79ddd89f-dqhm6 1/1 Running 0 18m onap onap-msb-msb-consul-6949bd46f4-jk6jw 1/1 Running 0 18m onap 
onap-msb-msb-discovery-86c7b945f9-bc4zq 2/2 Running 0 18m onap onap-msb-msb-eag-5f86f89c4f-fgc76 2/2 Running 0 18m onap onap-msb-msb-iag-56cdd4c87b-jsfr8 2/2 Running 0 18m onap-esr 3 Fri Dec 14 15:51:40 2018 DEPLOYED esr-3.0.0 onap onap onap-esr-esr-gui-6c5ccd59d6-6brcx 1/1 Running 0 2m onap onap-esr-esr-server-5f967d4767-ctwp6 2/2 Running 0 2m onap-robot 2 Fri Dec 14 15:51:48 2018 DEPLOYED robot-3.0.0 onap onap onap-robot-robot-ddd948476-n9szh 1/1 Running 0 11m onap-multicloud 1 Fri Dec 14 15:51:43 2018 DEPLOYED multicloud-3.0.0 onap
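A minimal sketch of scripting this base-platform-first order with the OOM deploy plugin; the per-component --set flags are an assumption, while the plugin invocation matches the one used elsewhere on this page:

# bring up the base platform first, then verify before enabling the next components (msb, aai, esr, ...)
sudo helm deploy onap local/onap --namespace onap -f dev.yaml --set dmaap.enabled=true --set aaf.enabled=true
# nothing should be left starting or crashing before moving on
kubectl get pods -n onap | grep -E 'onap-(dmaap|aaf)' | grep -vE '1/1|2/2|Completed'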
Tiller requires wait states between deployments
There is a patch going into 3.0.1 that delays deployments by 3+ seconds so that tiller is not overloaded.
sudo cp -R ~/oom/kubernetes/helm/plugins/ ~/.helm
sudo vi ~/.helm/plugins/deploy/deploy.sh
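A hypothetical sketch of the kind of delay the patch adds; the loop and variable names below are placeholders, not the actual deploy.sh contents:

# inside the plugin's per-subchart loop, pause between releases so tiller is not flooded
for subchart in $SUBCHART_RELEASES; do   # placeholder variable name
  deploy_subchart "$subchart"            # placeholder for the plugin's existing deploy call
  sleep 3                                # the 3+ second wait state referenced above
done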
Use public-cloud.yaml override
Note: your HD/SSD, RAM and CPU configuration will drastically affect deployment. For example, if you are CPU starved, the idle load of ONAP will delay pods as more come in; network bandwidth to pull docker images is also significant, and PV creation is sensitive to filesystem throughput/latency.
Some of the internal pod timings are optimized for certain Azure deployments.
https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/public-cloud.yaml
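Applying the override on top of your own environment file (the same invocation used in the Casablanca examples below):

sudo helm deploy onap local/onap --namespace onap \
  -f ../../dev.yaml \
  -f onap/resources/environments/public-cloud.yaml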
Optimizing Docker Image Pulls
Verify whether the integration docker manifest CSV or the OOM repo values.yaml is the source of truth for image versions (is an override even required?).
- TSC-86
Nexus Proxy
Alain Soleil pointed out the proxy page (which was using a commercial Nexus3) - ONAP OOM Beijing - Hosting docker images locally - I had about 4 JIRAs on this and had forgotten about them.
20190121:
Answered John Lotoski for EKS and his other post on nexus3 proxy failures - it looks like an issue with a double proxy in front of dockerhub, or an issue specific to the dockerhub registry:2 container - https://lists.onap.org/g/onap-discuss/topic/registry_issue_few_images/29285134?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,29285134
Running
- LOG-355
nexus3.onap.info:5000 - my private AWS nexus3 proxy of nexus3.onap.org:10001
nexus3.onap.cloud:5000 - azure public proxy - filled with casablanca (will retire after Jan 2)
nexus4.onap.cloud:5000 - azure public proxy - filled with master - and later casablanca
nexus3windriver.onap.cloud:5000 - windriver/openstack lab inside the firewall to use only for the lab - access to public is throttled
Nexus3 proxy setup - host
# from a clean ubuntu 16.04 VM # install docker sudo curl https://releases.rancher.com/install-docker/17.03.sh | sh sudo usermod -aG docker ubuntu # install nexus mkdir -p certs openssl req -newkey rsa:4096 -nodes -sha256 -keyout certs/domain.key -x509 -days 365 -out certs/domain.crt Common Name (e.g. server FQDN or YOUR name) []:nexus3.onap.info sudo nano /etc/hosts sudo docker run -d --restart=unless-stopped --name registry -v `pwd`/certs:/certs -e REGISTRY_HTTP_ADDR=0.0.0.0:5000 -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key -e REGISTRY_PROXY_REMOTEURL=https://nexus3.onap.org:10001 -p 5000:5000 registry:2 sudo docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7f9b0e97eb7f registry:2 "/entrypoint.sh /e..." 8 seconds ago Up 7 seconds 0.0.0.0:5000->5000/tcp registry # test it sudo docker login -u docker -p docker nexus3.onap.info:5000 Login Succeeded # get images from https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=casablanca # use for example the first line onap/aaf/aaf_agent,2.1.8 # or the prepull script in https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 819d6de9e493: Downloading [======================================> ] 770.7 kB/1.012 MB # list sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE registry 2 2e2f252f3c88 3 months ago 33.3 MB # prepull to cache images on the server - in this case casablanca branch sudo wget https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh sudo chmod 777 docker_prepull.sh # prep - same as client vms - the cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp certs/domain.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 # prepull sudo nohup ./docker_prepull.sh -b casablanca -s nexus3.onap.cloud:5000 &
Nexus3 proxy usage per cluster node
The cert is attached to TSC-79.
# on each host # Cert is on TSC-79 sudo wget https://jira.onap.org/secure/attachment/13127/domain_nexus3_onap_cloud.crt # or if you already have it scp domain_nexus3_onap_cloud.crt ubuntu@ld3.onap.cloud:~/ # to avoid sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Error response from daemon: Get https://nexus3.onap.cloud:5000/v1/users/: x509: certificate signed by unknown authority # cp cert sudo mkdir /etc/docker/certs.d sudo mkdir /etc/docker/certs.d/nexus3.onap.cloud:5000 sudo cp domain_nexus3_onap_cloud.crt /etc/docker/certs.d/nexus3.onap.cloud:5000/ca.crt sudo systemctl restart docker sudo docker login -u docker -p docker nexus3.onap.cloud:5000 Login Succeeded # testing # vm with the image existing - 2 sec ubuntu@ip-172-31-33-46:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # vm with layers existing except for last 5 - 5 sec ubuntu@a-cd-master:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Already exists .. 20 49e90af50c7d: Already exists .... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 # clean AWS VM (clean install of docker) - no pulls yet - 45 sec for everything ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pulling fs layer 0addb6fece63: Pulling fs layer 78e58219b215: Pulling fs layer eb6959a66df2: Pulling fs layer 321bd3fd2d0e: Pull complete ... acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.cloud:5000/onap/aaf/aaf_agent:2.1.8 ubuntu@ip-172-31-14-34:~$ sudo docker images REPOSITORY TAG IMAGE ID CREATED SIZE nexus3.onap.cloud:5000/onap/aaf/aaf_agent 2.1.8 090b326a7f11 5 weeks ago 1.14 GB # going to test a same size image directly from the LF - with minimal common layers nexus3.onap.org:10001/onap/testsuite 1.3.2 c4b58baa95e8 3 weeks ago 1.13 GB # 5 min in we are still at 3% - numbers below are a min old ubuntu@ip-172-31-14-34:~$ sudo docker pull nexus3.onap.org:10001/onap/testsuite:1.3.2 1.3.2: Pulling from onap/testsuite 32802c0cfa4d: Downloading [=============> ] 8.416 MB/32.1 MB da1315cffa03: Download complete fa83472a3562: Download complete f85999a86bef: Download complete 3eca7452fe93: Downloading [=======================> ] 8.517 MB/17.79 MB 9f002f13a564: Downloading [=========================================> ] 8.528 MB/10.24 MB 02682cf43e5c: Waiting .... 754645df4601: Waiting # in 5 min we get 3% 35/1130Mb - which comes out to 162 min for 1.13G for .org as opposed to 45 sec for .info - which is a 200X slowdown - some of this is due to the fact my nexus3.onap.info is on the same VPC as my test VM - testing on openlab # openlab - 2 min 40 sec which is 3.6 times slower - expected than in AWS - (25 min pulls vs 90min in openlab) - this makes nexus.onap.org 60 times slower in openlab than a proxy running from AWS (2 vCore/16G/ssd VM) ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 2.1.8: Pulling from onap/aaf/aaf_agent 18d680d61657: Pull complete ... 
acb05d09ff6e: Pull complete Digest: sha256:71781f3cfa51066abb1a4a35267af37beec01b6bb75817fdfae056582839290c Status: Downloaded newer image for nexus3.onap.info:5000/onap/aaf/aaf_agent:2.1.8 #pulling smaller from nexus3.onap.info 2 min 20 - for 36Mb = 0.23Mb/sec - extrapolated to 1.13Gb for above is 5022 sec or 83 min - half the rough calculation above ubuntu@onap-oom-obrien-rancher-e4:~$ sudo docker pull nexus3.onap.org:10001/onap/aaf/sms:3.0.1 3.0.1: Pulling from onap/aaf/sms c67f3896b22c: Pull complete ... 76eeb922b789: Pull complete Digest: sha256:d5b64947edb93848acacaa9820234aa29e58217db9f878886b7bafae00fdb436 Status: Downloaded newer image for nexus3.onap.org:10001/onap/aaf/sms:3.0.1 # conclusion - nexus3.onap.org is experiencing a routing issue from their DC outbound causing a 80-100x slowdown over a proxy nexus3 - since 20181217 - as local jenkins.onap.org builds complete faster # workaround is to use a nexus3 proxy above
and add the following to values.yaml:
global:
  #repository: nexus3.onap.org:10001
  repository: nexus3.onap.cloud:5000
  repositoryCred:
    user: docker
    password: docker
The Windriver lab also has a network issue (for example, a pull from nexus3.onap.cloud:5000 (Azure) into an AWS EC2 instance takes 45 sec for 1.1G, while the same pull in an openlab VM is on the order of 10+ min) - therefore you need a local nexus3 proxy if you are inside the OpenStack lab. I have registered nexus3windriver.onap.cloud:5000 to a nexus3 proxy in my logging tenant - cert above.
Docker Prepull
https://git.onap.org/logging-analytics/plain/deploy/docker_prepull.sh
Use the script via https://gerrit.onap.org/r/#/c/74780/ - LOG-905
git clone ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics
cd logging-analytics
git pull ssh://michaelobrien@gerrit.onap.org:29418/logging-analytics refs/changes/80/74780/1
ubuntu@onap-oom-obrien-rancher-e0:~$ sudo nohup ./docker_prepull.sh &
[1] 14488
ubuntu@onap-oom-obrien-rancher-e0:~$ nohup: ignoring input and appending output to 'nohup.out'
POD redeployment/undeploy/deploy
If you need to redeploy a pod due to a job timeout, a failure, or to pick up a config/code change, delete the corresponding /dockerdata-nfs subdirectory (for example /dockerdata-nfs/onap-aai) so that a DB restart does not run into existing data issues.
sudo chmod -R 777 /dockerdata-nfs
sudo rm -rf /dockerdata-nfs/onap-aai
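A full redeploy cycle for a single component (aai as the example); the undeploy flags are assumptions - check the OOM helm (un)deploy plugin page referenced earlier:

# remove the failed release, clear its persisted data, then redeploy just that component
sudo helm undeploy onap-aai --purge
sudo chmod -R 777 /dockerdata-nfs
sudo rm -rf /dockerdata-nfs/onap-aai
sudo helm deploy onap local/onap --namespace onap -f ../../dev.yaml --set aai.enabled=true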
Casablanca Deployment Examples
Deploy to 13+1 cluster
Deploy as one with deploy.sh delays and public-cloud.yaml - single 500G server on AWS
sudo helm deploy onap local/onap --namespace $ENVIRON -f ../../dev.yaml -f onap/resources/environments/public-cloud.yaml
where dev.yaml is the same as the one in resources but with all components turned on and an imagePullPolicy of IfNotPresent instead of Always.
Deploy in sequence with validation on previous pod before proceeding - single 500G server AWS
We are not using the public-cloud.yaml override here, in order to verify just the timing between deploys - each pod waits for the previous one to complete so resources are not in contention.
See the update to https://git.onap.org/logging-analytics/tree/deploy/cd.sh in https://gerrit.onap.org/r/#/c/75422
DEPLOY_ORDER_POD_NAME_ARRAY=('robot consul aaf dmaap dcaegen2 msb aai esr multicloud oof so sdc sdnc vid policy portal log vfc uui vnfsdk appc clamp cli pomba vvp contrib sniro-emulator')
# don't count completed pods
DEPLOY_NUMBER_PODS_DESIRED_ARRAY=(1 4 13 11 13 5 15 2 6 17 10 12 11 2 8 6 3 18 2 5 5 5 1 11 11 3 1)
# account for pods that have varying deploy times or replicaSet sizes
# don't count the 0/1 completed pods - and skip most of the replicaSet instances except 1
# dcae bootstrap is problematic
DEPLOY_NUMBER_PODS_PARTIAL_ARRAY=(1 2 11 9 13 5 11 2 6 16 10 12 11 2 8 6 3 18 2 5 5 5 1 9 11 3 1)
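A minimal sketch (not the actual cd.sh logic) of how these arrays can gate each deploy - wait until the partial pod count for a component is reached before starting the next one; the per-component --set flag is an assumption:

read -r -a PODS <<< "${DEPLOY_ORDER_POD_NAME_ARRAY[0]}"
for i in "${!PODS[@]}"; do
  component="${PODS[$i]}"
  target="${DEPLOY_NUMBER_PODS_PARTIAL_ARRAY[$i]}"
  sudo helm deploy onap local/onap --namespace onap -f dev.yaml --set "${component}.enabled=true"
  # block until enough pods of this component report ready before deploying the next one
  until [ "$(kubectl get pods -n onap | grep "onap-${component}" | grep -cE '1/1|2/2')" -ge "$target" ]; do
    sleep 30
  done
done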
Deployment in sequence to Windriver Lab
Note: the Windriver OpenStack lab requires that host registration occurs against the private network (10.0.0.0/16), not the 10.12.0.0/16 public network - registering against the public IP is fine in Azure/AWS but not in OpenStack.
The docs will be adjusted - OOM-1550
This is bad - public IP based cluster
This is good - private IP based cluster
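The difference is the CATTLE_AGENT_IP passed to the Rancher agent - register each host against its 10.0.0.0/16 address. The registration URL and token below are the examples from the cluster output further down, not reusable values:

# on each worker, force registration on the private interface
sudo docker run -e CATTLE_AGENT_IP="10.0.0.7" --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.11 http://10.0.16.1:8880/v1/scripts/<registration-token>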
Openstack/Windriver HEAT template for 13+1 kubernetes cluster
https://jira.onap.org/secure/attachment/13010/logging_openstack_13_16g.yaml
- LOG-324
See https://gerrit.onap.org/r/74781
obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack stack create -t logging_openstack_13_16g.yaml -e logging_openstack_oom.env OOM20181216-13 +---------------------+-----------------------------------------+ | Field | Value | +---------------------+-----------------------------------------+ | id | ed6aa689-2e2a-4e75-8868-9db29607c3ba | | stack_name | OOM20181216-13 | | description | Heat template to install OOM components | | creation_time | 2018-12-16T19:42:27Z | | updated_time | 2018-12-16T19:42:27Z | | stack_status | CREATE_IN_PROGRESS | | stack_status_reason | Stack CREATE started | +---------------------+-----------------------------------------+ obrienbiometrics:onap_oom-714_heat michaelobrien$ openstack server list +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | ID | Name | Status | Networks | Image Name | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+ | 7695cf14-513e-4fea-8b00-6c2a25df85d3 | onap-oom-obrien-rancher-e13 | ACTIVE | oam_onap_RNa3=10.0.0.23, 10.12.7.14 | ubuntu-16-04-cloud-amd64 | | 1b70f179-007c-4975-8e4a-314a57754684 | onap-oom-obrien-rancher-e7 | ACTIVE | oam_onap_RNa3=10.0.0.10, 10.12.7.36 | ubuntu-16-04-cloud-amd64 | | 17c77bd5-0a0a-45ec-a9c7-98022d0f62fe | onap-oom-obrien-rancher-e2 | ACTIVE | oam_onap_RNa3=10.0.0.9, 10.12.6.180 | ubuntu-16-04-cloud-amd64 | | f85e075f-e981-4bf8-af3f-e439b7b72ad2 | onap-oom-obrien-rancher-e9 | ACTIVE | oam_onap_RNa3=10.0.0.6, 10.12.5.136 | ubuntu-16-04-cloud-amd64 | | 58c404d0-8bae-4889-ab0f-6c74461c6b90 | onap-oom-obrien-rancher-e6 | ACTIVE | oam_onap_RNa3=10.0.0.19, 10.12.5.68 | ubuntu-16-04-cloud-amd64 | | b91ff9b4-01fe-4c34-ad66-6ffccc9572c1 | onap-oom-obrien-rancher-e4 | ACTIVE | oam_onap_RNa3=10.0.0.11, 10.12.7.35 | ubuntu-16-04-cloud-amd64 | | d9be8b3d-2ef2-4a00-9752-b935d6dd2dba | onap-oom-obrien-rancher-e0 | ACTIVE | oam_onap_RNa3=10.0.16.1, 10.12.7.13 | ubuntu-16-04-cloud-amd64 | | da0b1be6-ec2b-43e6-bb3f-1f0626dcc88b | onap-oom-obrien-rancher-e1 | ACTIVE | oam_onap_RNa3=10.0.0.16, 10.12.5.10 | ubuntu-16-04-cloud-amd64 | | 0ffec4d0-bd6f-40f9-ab2e-f71aa5b9fbda | onap-oom-obrien-rancher-e5 | ACTIVE | oam_onap_RNa3=10.0.0.7, 10.12.6.248 | ubuntu-16-04-cloud-amd64 | | 125620e0-2aa6-47cf-b422-d4cbb66a7876 | onap-oom-obrien-rancher-e8 | ACTIVE | oam_onap_RNa3=10.0.0.8, 10.12.6.249 | ubuntu-16-04-cloud-amd64 | | 1efe102a-d310-48d2-9190-c442eaec3f80 | onap-oom-obrien-rancher-e12 | ACTIVE | oam_onap_RNa3=10.0.0.5, 10.12.5.167 | ubuntu-16-04-cloud-amd64 | | 7c248d1d-193a-415f-868b-a94939a6e393 | onap-oom-obrien-rancher-e3 | ACTIVE | oam_onap_RNa3=10.0.0.3, 10.12.5.173 | ubuntu-16-04-cloud-amd64 | | 98dc0aa1-e42d-459c-8dde-1a9378aa644d | onap-oom-obrien-rancher-e11 | ACTIVE | oam_onap_RNa3=10.0.0.12, 10.12.6.179 | ubuntu-16-04-cloud-amd64 | | 6799037c-31b5-42bd-aebf-1ce7aa583673 | onap-oom-obrien-rancher-e10 | ACTIVE | oam_onap_RNa3=10.0.0.13, 10.12.6.167 | ubuntu-16-04-cloud-amd64 | +--------------------------------------+-----------------------------+--------+--------------------------------------+--------------------------+
# 13+1 vms on openlab available as of 20181216 - running 2 separate clusters # 13+1 all 16g VMs # 4+1 all 32g VMs # master undercloud sudo git clone https://gerrit.onap.org/r/logging-analytics sudo cp logging-analytics/deploy/rancher/oom_rancher_setup.sh . sudo ./oom_rancher_setup.sh -b master -s 10.12.7.13 -e onap # master nfs sudo wget https://jira.onap.org/secure/attachment/12887/master_nfs_node.sh sudo chmod 777 master_nfs_node.sh sudo ./master_nfs_node.sh 10.12.5.10 10.12.6.180 10.12.5.173 10.12.7.35 10.12.6.248 10.12.5.68 10.12.7.36 10.12.6.249 10.12.5.136 10.12.6.167 10.12.6.179 10.12.5.167 10.12.7.14 #sudo ./master_nfs_node.sh 10.12.5.162 10.12.5.198 10.12.5.102 10.12.5.4 # slaves nfs sudo wget https://jira.onap.org/secure/attachment/12888/slave_nfs_node.sh sudo chmod 777 slave_nfs_node.sh sudo ./slave_nfs_node.sh 10.12.7.13 #sudo ./slave_nfs_node.sh 10.12.6.125 # test it ubuntu@onap-oom-obrien-rancher-e4:~$ sudo ls /dockerdata-nfs/ test.sh # remove client from master node ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e0 Ready <none> 5m v1.11.5-rancher1 ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system heapster-7b48b696fc-2z47t 1/1 Running 0 5m kube-system kube-dns-6655f78c68-gn2ds 3/3 Running 0 5m kube-system kubernetes-dashboard-6f54f7c4b-sfvjc 1/1 Running 0 5m kube-system monitoring-grafana-7877679464-872zv 1/1 Running 0 5m kube-system monitoring-influxdb-64664c6cf5-rs5ms 1/1 Running 0 5m kube-system tiller-deploy-6f4745cbcf-zmsrm 1/1 Running 0 5m # after master removal from hosts - expected no nodes ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes error: the server doesn't have a resource type "nodes" # slaves rancher client - 1st node # register on the private network not the public IP # notice the CATTLE_AGENT sudo docker run -e CATTLE_AGENT_IP="10.0.0.7" --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.2.11 http://10.0.16.1:8880/v1/scripts/5A5E4F6388A4C0A0F104:1514678400000:9zpsWeGOsKVmWtOtoixAUWjPJs ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 0s v1.11.5-rancher1 # add the other nodes # the 4 node 32g = 128g cluster ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl get nodes NAME STATUS ROLES AGE VERSION onap-oom-obrien-rancher-e1 Ready <none> 1h v1.11.5-rancher1 onap-oom-obrien-rancher-e2 Ready <none> 4m v1.11.5-rancher1 onap-oom-obrien-rancher-e3 Ready <none> 5m v1.11.5-rancher1 onap-oom-obrien-rancher-e4 Ready <none> 3m v1.11.5-rancher1 # the 13 node 16g = 208g cluster ubuntu@onap-oom-obrien-rancher-e0:~$ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% onap-oom-obrien-rancher-e1 208m 2% 2693Mi 16% onap-oom-obrien-rancher-e10 38m 0% 1083Mi 6% onap-oom-obrien-rancher-e11 36m 0% 1104Mi 6% onap-oom-obrien-rancher-e12 57m 0% 1070Mi 6% onap-oom-obrien-rancher-e13 116m 1% 1017Mi 6% onap-oom-obrien-rancher-e2 73m 0% 1361Mi 8% onap-oom-obrien-rancher-e3 62m 0% 1099Mi 6% onap-oom-obrien-rancher-e4 74m 0% 1370Mi 8% onap-oom-obrien-rancher-e5 37m 0% 1104Mi 6% onap-oom-obrien-rancher-e6 55m 0% 1125Mi 7% onap-oom-obrien-rancher-e7 42m 0% 1102Mi 6% onap-oom-obrien-rancher-e8 53m 0% 1090Mi 6% onap-oom-obrien-rancher-e9 52m 0% 1072Mi 6%
Installing ONAP via cd.sh
The cluster hosting Kubernetes is up with 13+1 nodes and two network interfaces (the private 10.0.0.0/16 subnet and the public 10.12.0.0/16 subnet).
Verify kubernetes hosts are ready
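A short verification before starting the install, followed by the same cd.sh invocation used in the single-VM section (the branch and flags here are examples):

# all 13+1 hosts should be Ready and reporting metrics
kubectl get nodes
kubectl top nodes
# then kick off the managed install
sudo cp logging-analytics/deploy/cd.sh .
sudo ./cd.sh -b casablanca -e onap -p true -n nexus3.onap.org:10001 -f true -s 300 -c true -d true -w false -r false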