Thanks to all who joined us and submitted questions to the Nokia Vitrage & CloudBand demo Q&A session. During the DemoFriday, we learned all about the need for root cause analysis (RCA) within the Telco cloud, what project Vitrage is about, and CloudBand’s network function virtualization (NFV) orchestration platform. After the live demonstration from Nokia, there was a thoroughly engaging Q&A session. Unfortunately, SDxCentral ran out of time before all of the audience’s questions could be answered live. Read the full Q&A below:
Is the solution scalable to large NFV infrastructures?
Is there any overlap with other OpenStack components like Ceilometer and Monasca?
Nokia: We don’t have any components that overlap with Ceilometer or Monasca. Since one of the most interesting parts of alarm monitoring is deduced alarms and RCA, I believe that over time Monasca and AODH might want to develop these capabilities too. We hope that they will use our solution.
Is Vitrage currently ready to be installed and integrated into an existing OpenStack setup?
Nokia: Not yet. It’s still in development. We are working toward making Vitrage installable by the Mitaka release.
How difficult is it to add a plugin for another information source, such as Zabbix, Ganglia?
Nokia: Not too difficult, but it still requires programming. Since most of OpenStack is written in Python, the plugin code would need to be in Python as well.
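To make the idea concrete, here is a minimal sketch of what such a plugin might look like. The class and method names below are purely illustrative assumptions, not Vitrage’s actual plugin API: the sketch only shows the general shape of polling a monitoring tool (such as Zabbix) and normalizing its alerts into events.

```python
# Illustrative sketch of a data-source plugin for a tool like Zabbix.
# The interface here (ZabbixSynchronizer, get_all) is hypothetical,
# not Vitrage's real plugin contract.

class ZabbixSynchronizer:
    """Polls a monitoring tool and normalizes its alerts into events."""

    def __init__(self, client):
        # 'client' is any object exposing the monitoring tool's API,
        # e.g. a thin wrapper around Zabbix's HTTP API.
        self.client = client

    def get_all(self):
        """Fetch active triggers and translate them into generic events."""
        events = []
        for trigger in self.client.get_active_triggers():
            events.append({
                'type': 'zabbix.alarm',
                'resource_id': trigger['host'],
                'severity': trigger['priority'],
                'description': trigger['description'],
            })
        return events
```

The translation step is the essential part: whatever the source tool reports, the plugin’s job is to map it into events keyed by the affected resource so they can be linked into the entity graph.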
Who is going to write the root cause analysis templates?
Nokia: The initial set of templates comes out of the box and is currently being written by us here in CloudBand. Anyone can add templates for additional use cases and contribute, we’ve made it fairly simple.
What is the min/max event propagation time through Vitrage?
Nokia: In CloudBand, once we see the alarm/problem, we immediately propagate it: raising additional alarms on relevant resources, changing resource states, and notifying external managers. The time it takes to see the original alarm/error varies, though. Some are seen immediately, while others might take up to a minute. In Vitrage, most of the alarms/errors will be acknowledged much faster, and following these, AODH and other external managers will be notified immediately. Alarms from Nagios, though, such as switch states and NetApp state, might take up to half a minute to notice. The Nagios sampling interval is configurable.
Is project Vitrage made available to the OpenStack community, or is it CloudBand’s own asset?
Nokia: It’s open source and will be made available to the OpenStack community. It’s not CloudBand’s proprietary asset. Everybody can pitch in and contribute to make it better and more complete.
Does Vitrage support TOSCA templates?
Nokia: Project Vitrage does not support TOSCA templates. There is also no plan to support TOSCA templates in the near future.
I see that we can see VM performance degradation. What about the network? If, for example, compute is fine but a network segment is suffering performance issues or congestion, can we see it and estimate the impact on the connected VMs (assuming, of course, we have a template defined)?
Nokia: Yes, as you mentioned, CloudBand does support network fault monitoring via templates. For example, when there are problems with a switch, we raise alarms in real time on all of the VMs defined on the hosts connected to that switch. The same capabilities will become available in Vitrage over time.
How would you incorporate more complex relationships into graph nodes, such as redundancy configs/states (e.g. fault tree like gates)?
Nokia: Complex relationships can be supported by a combination of enriching the Entity Graph on the one hand and writing complex conditions in the Vitrage templates on the other. The former can be done by enhancing the plugins and using simpler templates to “bootstrap” support for complex relationships. The main support comes from the latter, however: conditions in Vitrage templates will support AND, OR, and NOT operators, and thus should cover a wide range of conditions, including fault-tree-like gates.
To what extent is the topology of the virtual system automatically discovered by Vitrage?
Nokia: The virtual system topology is discovered via synchronizers, which receive updates from different sources (e.g., Nova, Cinder, Neutron, Nagios) and link resources to one another. As we add more synchronizers, Vitrage’s automated discovery will grow.
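The mechanism described above can be sketched as follows. This is a minimal illustration, assuming a simplified update format; the class and field names are hypothetical, not Vitrage’s internal entity-graph API:

```python
# Minimal sketch of an entity graph assembled from synchronizer
# updates. Each update names a resource and, optionally, the resource
# it is attached to, so topology emerges as updates arrive.
# (Names here are illustrative, not Vitrage's actual data model.)

class EntityGraph:
    def __init__(self):
        self.entities = {}   # resource id -> properties
        self.edges = set()   # (source_id, target_id) pairs

    def update(self, event):
        """Apply one update from a source such as Nova or Neutron."""
        self.entities[event['id']] = {'type': event['type']}
        if 'attached_to' in event:
            self.edges.add((event['id'], event['attached_to']))

    def neighbors(self, entity_id):
        """Return entities directly linked to the given one."""
        return ({t for s, t in self.edges if s == entity_id} |
                {s for s, t in self.edges if t == entity_id})

# Topology builds up as updates arrive from different sources:
graph = EntityGraph()
graph.update({'id': 'host-1', 'type': 'nova.host'})
graph.update({'id': 'vm-1', 'type': 'nova.instance', 'attached_to': 'host-1'})
graph.update({'id': 'vm-2', 'type': 'nova.instance', 'attached_to': 'host-1'})
```

Once resources are linked this way, finding everything affected by a faulty host is a neighborhood query on the graph, which is what makes the deduced alarms and RCA possible.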
The VM “hard drives” are being replicated? Shouldn’t they be redistributed within the CEPH storage cloud?
Nokia: That is correct. To clarify what you saw in the demo: when the physical host goes down, this affects other physical hosts because the remaining CEPH nodes rebuild and re-replicate data across the smaller set of nodes. The impact on other VMs is simply that, since the VMs rely on services from those physical hosts, the load that CEPH produces on them can impact VM performance.