Tooling & best practices for ONAP supervision
The ONAP administration guide is currently limited, so it would make sense to plan a documentation update in the future.
This page collects the tooling and best practices for ONAP supervision, from an admin perspective, in a production or pre-production context in each lab.
We came up with the following questions. If you have other suggestions or experience to share, feedback is welcome.
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
China Mobile Lab (@LUKAI @Yan Yang)
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
We actively use Zabbix for monitoring.
What is your feedback on the tools?
We use Zabbix to monitor the CPU, memory, and disk usage of the k8s nodes. When a monitored threshold is exceeded, it triggers an alarm and sends a message, by email or other means, to the users who maintain the environment, which helps them solve the problem.
How do you manage Infra versus ONAP versus VNF supervision?
Not considered yet.
Do you use lighter tools to check the endpoints for the ONAP end users?
Not considered yet.
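As an illustration of the threshold-based alerting described in the China Mobile feedback above, here is a minimal sketch in Python. It is not Zabbix configuration and does not use any Zabbix API; the names `Threshold`, `check_node`, and `DEFAULT_THRESHOLDS`, as well as the threshold values, are assumptions made up for this example. It only shows the logic of comparing node usage against limits and producing the messages that would then be sent by email or another channel.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Threshold:
    """A hypothetical alert rule: fire when `metric` goes above `limit` percent."""
    metric: str
    limit: float


# Illustrative values only; real limits depend on the environment.
DEFAULT_THRESHOLDS = [
    Threshold("cpu", 90.0),
    Threshold("memory", 85.0),
    Threshold("disk", 80.0),
]


def check_node(node: str, usage: dict[str, float],
               thresholds: list[Threshold]) -> list[str]:
    """Return one alert message per threshold that the node's usage exceeds."""
    alerts = []
    for rule in thresholds:
        value = usage.get(rule.metric)
        # Metrics that were not collected are skipped rather than alerted on.
        if value is not None and value > rule.limit:
            alerts.append(
                f"{node}: {rule.metric} at {value:.1f}% "
                f"exceeds threshold {rule.limit:.1f}%"
            )
    return alerts


if __name__ == "__main__":
    sample = {"cpu": 95.0, "memory": 40.0, "disk": 80.0}
    for message in check_node("k8s-node-1", sample, DEFAULT_THRESHOLDS):
        print(message)
```

In this sketch a value equal to the limit does not raise an alert; whether the comparison should be strict is a choice each lab would make in its own trigger definitions.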
Orange OpenLab (@Morgan Richomme @Eric Debeau)
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
ONAP Integration/Windriver Lab (@Stephen Gooch)
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
China Telecom Lab (Yi Yang (yangyi.bri@chinatelecom.cn), Luman Wang (wanglm.bri@chinatelecom.cn))
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
TLAB (@Rich Bennett, John Murray (JM2932@att.com))
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
WINLAB (@Tracy Van Brakle, Ivan Seskar (seskar@winlab.rutgers.edu))
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
Auto Lab (Paul Vaduva (paul.vaduva@enea.com))
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
ONAP Lab for OVP (@Sriram Rupanagunta)
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
What is your feedback on the tools?
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
Bell Canada production ONAP (@Jean-Francois Patenaude)
Which tools are used: Nagios, Prometheus, Zabbix, Consul, proprietary third-party tooling, ...?
We actively use:
Zabbix for monitoring
Jira Service Desk for ticketing
Slack for live notifications
OpsGenie for alerting (Page)
We also use:
Elastic (ELK) for pod log aggregation and some dashboards
Prometheus/Grafana for some dashboards
What is your feedback on the tools?
Zabbix has a steep learning curve, but once you know how it works, it is very powerful. Triggers are easy to configure, and they can be made more advanced if you need. It's even possible to build nice dynamic dashboards. On ours, every green checkmark reflects live monitoring: for instance, if an ONAP component runs into a problem, its green checkmark is replaced by a red X mark.
This kind of dashboard lets you quickly see where the problems are located. We configured Zabbix to collect only the metrics it will alert on, so there is nearly a 1:1 relationship between what we gather and what we alert on.
Prometheus/Grafana is useful for gathering lots of metrics and building dashboards. We are not currently alerting on these metrics; when there is a metric we like, we send it to Zabbix for alerting instead.
How do you manage Infra versus ONAP versus VNF supervision?
Do you use lighter tools to check the endpoints for the ONAP end users?
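The Bell Canada feedback above mentions sending selected Prometheus metrics to Zabbix for alerting. The page does not say how this is done, so the following Python sketch is only one possible approach under stated assumptions: it takes the JSON body of a Prometheus instant query (the `/api/v1/query` response format) and turns it into lines in the `<hostname> <key> <value>` format that `zabbix_sender` reads from an input file. The function name `prometheus_to_zabbix_lines`, the item key, and the choice of the `instance` label as the Zabbix host name are hypothetical.

```python
import json


def prometheus_to_zabbix_lines(response_json: str, zabbix_key: str,
                               host_label: str = "instance") -> list:
    """Convert a Prometheus instant-query response into zabbix_sender input lines.

    Assumes the label named by `host_label` holds a value that matches a
    host name known to Zabbix and contains no spaces (an assumption for
    this sketch, not something Prometheus or Zabbix guarantees).
    """
    payload = json.loads(response_json)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector result")

    lines = []
    for sample in data["result"]:
        host = sample["metric"].get(host_label)
        if host is None:
            # Without a host name the sample cannot be mapped to a Zabbix host.
            continue
        # Each sample value is a [timestamp, "value-as-string"] pair.
        _timestamp, value = sample["value"]
        lines.append(f"{host} {zabbix_key} {value}")
    return lines


if __name__ == "__main__":
    example = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"instance": "node-1"}, "value": [1700000000, "0.93"]},
            ],
        },
    })
    for line in prometheus_to_zabbix_lines(example, "onap.disk.used"):
        print(line)
```

The resulting lines could be written to a file and passed to `zabbix_sender` with its input-file option, provided a matching trapper item exists on the Zabbix side; keeping the conversion as a pure function makes it easy to forward only the few metrics that should actually raise alerts.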