OOF - MultiCloud interaction in R2

Homing Policies and corresponding information required from MultiCloud for Homing in R2:

Latency (air distance)
- Location/Coordinates of customer
  - MultiCloud may not have an active role in providing this information, which would come from SO as a part of the customer order
  - [Ethan]: MultiCloud won’t be populating that information.
- Location of existing service instances (or the cloud-region)
  - [Ethan]: The relationship information between VNF and cloud-region was populated by SO if I remember correctly.
  - In AAI, each cloud-region (GET /cloud-infrastructure/cloud-regions/cloud-region/{cloud-owner}/{cloud-region-id}) record points to a complex (GET /cloud-infrastructure/complexes/complex/{physical-location-id}) and the complex relates to a lat and long. Is it correct that a VIM needs to register with MultiVIM? Does that registration have enough information for MultiVIM to create the region and complex data in A&AI?
  - Example: service instance → cloud-region → complex → (lat,long)
  - [Ethan]: Currently we use the A&AI/ESR portal to register a new VIM. Some basic information, such as cloud-owner, cloud-region and credentials, can be entered in this portal and stored in the A&AI database, but not the complex information. Once we nail down the information that OOF might need, we can discuss extending it with the ESR team. Exactly which fields do you need in the complex schema?
  - [Shankar] The minimal information required in the complex schema for homing in R2 would be the geographical coordinates (latitude, longitude). But I’m guessing it will be more than that (physical-location-id, complex-name, data-center-code, url, etc.), based on the R1 API of AAI (v11, I think).
  - [Ethan] Who will create a complex object? Can we pre-load the complex information into A&AI and provide a list of locations in the ESR portal for the customer to select from? Multiple OpenStack instances could share the same location (us-west, etc.).
  - [Shankar] This is a good question to ask, and I’m not sure who creates this complex information in AAI. That said, I do know from the ECOMP parlance that multiple OpenStack instances (cloud-regions) could share the same complex.
  - [Matti] Complex is the physical location (e.g., data center) where the cloud-regions (= OpenStack instance = set of compute nodes managed by one instance of OpenStack control plane) reside. Is the model in MultiVIM that each cloud-region registers with MultiVIM individually even if there may be multiple cloud-regions in the same data center?
  - [Bin] The relationship between a service instance (or, more specifically, a generic VNF of that service instance) and the complex can be inferred from the relationship chain: generic-vnf → vserver → cloud region → complex. Note that the cloud region is the parent node of the vserver, so the cloud region can easily be found from the vserver.
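The chain service instance → cloud-region → complex → (lat, long) is what makes an air-distance latency policy computable. The following is a minimal sketch, assuming the chain has already been resolved from A&AI into plain dictionaries. The region ids, complex ids, coordinates and function names are all hypothetical, and the haversine formula is used here as one common way to approximate air distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def air_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical, already-resolved A&AI data: cloud-region -> complex -> (lat, long)
REGION_TO_COMPLEX = {"region-1": "complex-a", "region-2": "complex-b"}
COMPLEX_COORDS = {"complex-a": (0.0, 0.0), "complex-b": (0.0, 90.0)}


def rank_regions_by_distance(customer_lat, customer_lon):
    """Return (cloud-region-id, distance in km) pairs, nearest first."""
    ranked = []
    for region, complex_id in REGION_TO_COMPLEX.items():
        lat, lon = COMPLEX_COORDS[complex_id]
        distance = air_distance_km(customer_lat, customer_lon, lat, lon)
        ranked.append((region, distance))
    return sorted(ranked, key=lambda pair: pair[1])
```

Because several cloud-regions may share one complex, such regions would get identical distances in this sketch and would have to be separated by other policies.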

Cloud Affinity
- Location of cloud-regions (same as above)
- [Ethan] If it’s the same as above, this does not fall into the scope of MultiCloud.
- [Bin] The location information comes along with the complex, which is populated by a robot script in Release 1. ESR might be able to help, since the complex logically belongs to external system information.

HPA capabilities, indirectly, through AAI
- CPU Pinning, NUMA as key value pairs
- Example: cloud-region-id: {CPU Pinning: True, NUMA: False, Large Pages: True}
- required for evaluating HPA constraints
- Is this provided by MultiCloud during VIM registration?
- [Ethan] Yes. When a new VIM is registered through the ESR portal, ESR calls MultiCloud to update VIM information such as networks or images in A&AI. We could populate HPA during registration in the R2 release. What kind of HPA features might be needed in R2?
- [Shankar] The exact HPA attributes relevant in R2 are still being determined. That said, it is most likely to be a list of attributes associated with a cloud-instance in AAI. Does that sound like a reasonable way to capture capabilities?
- [Ethan] For HPA, a list [hpa1, hpa2] does not sound good to me since it will make filtering harder. I would prefer a map {hpa1: true, hpa2: false}.
- [Shankar] A map works just fine from an OOF perspective. However, I wonder when an HPA:false check would need to be performed.
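To illustrate why a map makes filtering straightforward, here is a small sketch, assuming the capability maps have already been read from A&AI into a dictionary keyed by cloud-region-id. The region ids, the sample values and the function name `regions_supporting` are hypothetical. Treating a missing key as False is a design assumption of this sketch, not something the discussion has settled.

```python
# Hypothetical capability maps, keyed by cloud-region-id, as they might be
# read from A&AI after VIM registration.
REGION_CAPABILITIES = {
    "region-1": {"CPU Pinning": True, "NUMA": False, "Large Pages": True},
    "region-2": {"CPU Pinning": True, "NUMA": True, "Large Pages": True},
    "region-3": {"CPU Pinning": False, "NUMA": True},
}


def regions_supporting(required, capabilities=REGION_CAPABILITIES):
    """Return the ids of regions where every required capability is True.

    A capability that is missing from a region's map is treated as False.
    """
    return sorted(
        region
        for region, caps in capabilities.items()
        if all(caps.get(name, False) for name in required)
    )
```

For example, `regions_supporting(["CPU Pinning", "NUMA"])` keeps only the regions whose map has both keys set to True.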

Min-guarantee capability, indirectly through AAI
- Boolean (Yes/No) for a given cloud instance.
- This can be treated similar to other capabilities (as a key value pair)
- Example: cloud-region: {min-guarantee: True}
- [Ethan] Does software configuration attribute exist in A&AI schema now?
- [Shankar] This is very similar to the HPA attributes except that these are more general attributes (as compared to hardware specific attributes).

Available Capacity, directly, through MultiCloud
- CPU, Mem, Network, Disk, etc.
- required for evaluating instantaneous/available capacity check constraints
- provided by MultiCloud through a Query or DMaaP
- Example: available_capacity(cloud_region_id) -> {x: vCPU, y: memory, z: storage}
- If a public cloud does not want to reveal exact numbers, they could return ranges of capacities.
- Which of these metrics (cpu, mem, disk, network) are relevant to vCPE?
- [Ethan] MultiCloud could provide an API for OOF to query available resource information. As I see it, there are two ways. In the first, OOF checks every VIM’s available capacity from MultiCloud, and MultiCloud returns {vCPU, Memory, Storage}. In the second, OOF sends a VNF resource request of {vCPU: xx, Memory: yy, Storage: zz} to MultiCloud, and MultiCloud returns a list of VIMs that are capable of satisfying the request.
- [Shankar] I’d probably go with the former, where OOF queries the available capacity for a given set of VIM IDs, e.g., getCapacities([cpu,mem],[vim-id-1, vim-id-5, vim-id-7]). To me this appears to be a much more scalable approach, since OOF would already have eliminated many VIMs based on other constraints that make them an infeasible choice for a given problem. Please do let me know what you think.
- [ethanlynnl] About the get_capacity API, the input format could be {"vcpu": int, "mem": int, "vims": [vim1, vim2, vim3]}, and the returned data would then be [vim1, vim3]. What do you think?
- [Shankar] We should probably call it check_vim_capacity(). Just to make the semantics clear, check_vim_capacity({"vcpu": int, "mem": int, "vims": [vim1, vim2, vim3]}) would mean that the VIMs returned have met *both* constraints on vcpu and mem.
- [Bin] Please make the VIM specification in the list extensible. The input for the check_capacity API might then look like: {"vcpu": int, "mem": int, "vims": [{"cloud-owner":"vim1 owner", "cloud-region-id":"region id 1"}, {"cloud-owner":"vim2 owner", "cloud-region-id":"region id 2"}, {"cloud-owner":"vim3 owner", "cloud-region-id":"region id 3"}]}. With this format, OOF could later add more constraints to the VIMs specified in the list. For example, for a cloud region with multiple availability zones, if OOF wants to check whether the required capacity is available in a specific AZ, the VIM list could be [{"cloud-owner":"vim1 owner", "cloud-region-id":"region id 1", "availability-zone":"az1"}, {"cloud-owner":"vim2 owner", "cloud-region-id":"region id 2", "availability-zone":"az2"}, {"cloud-owner":"vim3 owner", "cloud-region-id":"region id 3", "availability-zone":"az3"}].
- Bin Hu: A hybrid approach seems more ideal, i.e. a definitive "Yes/No" answer plus a list of available capacity information. The following is an example (and only an example):

Step 0: This is a precondition. OOF is doing global optimization with many constraints and has worked out a shortlist of 3 VIMs { cloud-region-id1, cloud-region-id2, cloud-region-id3 }. Now OOF needs to check those VIMs' capacity.

Step 1: OOF calls MultiVIM's API to check those 3 VIMs' capacity information.

Step 2: MultiVIM checks those 3 VIMs' capacity information (internal implementation of MultiVIM)

Step 3: MultiVIM returns 2 pieces of information about each VIM to OOF:

- One piece of information is { Yes | No }, which indicates whether this VIM has sufficient capacity ("Yes") or not ("No"). If there is no knowledge (either direct or indirect) of "Yes" or "No", empty data will be returned.
- The other piece of information is the actual capacity data of the VIM. If the actual capacity data is not available, empty data will be returned.

Step 4: OOF gets the capacity information of those 3 VIMs and continues its work.

Request:

check_vim_capacity({"vcpu": int, "mem": int, "vims": [cloud-region-id1, cloud-region-id2, cloud-region-id3]})

Response:

{
  "cloud-region-1": {
    "has_capacity": "Yes",
    "capacities": {
      "vcpu": 20,
      "mem": 100
    }
  },
  "cloud-region-2": {
    "has_capacity": "Yes",
    "capacities": {}
  },
  "cloud-region-3": {
    "has_capacity": "No",
    "capacities": {
      "vcpu": 0,
      "mem": 10
    }
  }
}

Here, cloud-region-1 has capacity and is willing to give precise capacity information, cloud-region-2 has capacity, but is unwilling to provide exact numbers, while cloud-region-3 doesn’t have capacity but is willing to provide available capacity information. Therefore, OOF will now be able to recommend placement for both cloud-region-1 and cloud-region-2, and will be able to cache cloud-region-1 and cloud-region-3 capacities for making better placement decisions.
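The hybrid behaviour can be sketched on the MultiCloud side roughly as follows. This is only an illustration under stated assumptions: the in-memory tables `AVAILABLE` and `OPAQUE_ANSWERS`, the helper `feasible_vims`, and the use of True/False/None in place of Yes/No/empty are all hypothetical, and a real implementation would query each VIM rather than a dictionary.

```python
# Hypothetical view of available capacity per cloud region.
# None models a VIM that will not (or cannot) report exact numbers.
AVAILABLE = {
    "cloud-region-1": {"vcpu": 20, "mem": 100},
    "cloud-region-2": None,
    "cloud-region-3": {"vcpu": 0, "mem": 10},
}
# Hypothetical yes/no answers obtained by other means for VIMs that hide numbers.
OPAQUE_ANSWERS = {"cloud-region-2": True}


def check_vim_capacity(request, available=AVAILABLE, opaque=OPAQUE_ANSWERS):
    """Return {vim: {"has_capacity": True/False/None, "capacities": dict}}.

    has_capacity is None when nothing is known about the VIM; capacities is
    an empty dict when exact numbers are unavailable.
    """
    # Everything in the request except the VIM list is a resource demand.
    demand = {k: v for k, v in request.items() if k != "vims"}
    response = {}
    for vim in request["vims"]:
        caps = available.get(vim)
        if caps is None:
            response[vim] = {"has_capacity": opaque.get(vim), "capacities": {}}
        else:
            # All demanded resources must fit, matching the "both" semantics.
            fits = all(caps.get(res, 0) >= amount for res, amount in demand.items())
            response[vim] = {"has_capacity": fits, "capacities": dict(caps)}
    return response


def feasible_vims(response):
    """OOF side: keep only the VIMs that gave a definite Yes."""
    return [vim for vim, info in response.items() if info["has_capacity"] is True]
```

Applying `feasible_vims` to the response recovers the simpler "list of VIMs" answer proposed earlier in the discussion, so the hybrid format does not lose anything relative to it.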

Aggregate capacity, indirectly, through AAI
- CPU, Mem, Network, Disk, etc
- Does this fold naturally under the aggregate utilization? When do we actually need aggregate static capacity for decision making?

R3 and beyond:

Host-level affinity/anti-affinity capability, indirectly through AAI

Open Questions:

Is it possible to get capacities and capabilities for logical groups of resources (like host aggregates)?
- Example: available_capacity(cloud_region, host_aggregate)?
- Would there be a logical group for every combination of capabilities?
  - grows into a large # of combinations very quickly!
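To put a number on that growth: if one logical group were needed per non-empty combination of n independent capabilities, there would be 2^n - 1 groups. A short sketch that enumerates them (the function names are hypothetical):

```python
from itertools import combinations


def capability_groups(capabilities):
    """Yield every non-empty combination of the given capability names."""
    for size in range(1, len(capabilities) + 1):
        yield from combinations(capabilities, size)


def group_count(n):
    """Number of non-empty combinations of n capabilities: 2**n - 1."""
    return 2 ** n - 1
```

Three capabilities already give 7 groups, and 20 capabilities give 1,048,575.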

Evaluating constraint combinations:
- Example: consider the following two constraints
  - Is there a sufficient number of CPUs in the (cloud) cluster? (Yes)
  - Does the (cloud) cluster support NUMA? (Yes)
- But is there a sufficient number of CPUs that support NUMA? Maybe not!
- Example Query: available_capacity(cloud_region, {HPA1, HPA2}) and the output should give the available capacity where the two HPA attributes are satisfied.
- [Ethan] Since HPA information will be populated into A&AI, OOF can query A&AI directly and do the match. Do you still need this API in MultiCloud, with MultiCloud returning a list of available VIMs for you?
- [Shankar] I think we will still need support for such holistic capacity checks from MultiCloud. As shown in the example above, the two constraints can independently be satisfied, but may not be satisfiable when looked at in combination. Since the VIM is the only place that has this holistic view, I’m guessing MultiCloud is the right entity capable of making these associations. Please let me know if this is still confusing, and I’ll try to elaborate more on this.
- [Ethan] About available_capacity(cloud_region, {HPA1, HPA2}), I’m still confused. I was thinking it works like this: OOF gets all VIMs from A&AI and selects those that support HPA1 and HPA2, e.g. vim1 and vim3, then calls the MultiCloud get_capacity API, get_capacity({cpu: 5, mem: 64, vims: [vim1, vim3]}), and MultiCloud returns the list of VIMs that satisfy the request, [vim1].
- [Shankar] The flow that you have mentioned is absolutely correct. However, consider a scenario where vim1 has 10 vCPUs, 5 of which support HPA and 5 of which don’t. Assume that the 5 vCPUs that support HPA are already allocated and not available. OOF then evaluates the following constraints. Constraint (i): do vim1 and vim3 support {HPA1, HPA2}? The answer is yes. Constraint (ii): do vim1 and vim3 have 5 vCPUs? The answer is vim1, which has 5 vCPUs. However, as you can see from the example, this doesn’t guarantee that the 5 vCPUs that vim1 has available support HPA. OOF knows whether constraints (i) and (ii) are independently met, but cannot know whether both (i) and (ii) can be met simultaneously. This association is something that only MultiCloud can make, given its holistic view into the VIM.
- [Matti] I was thinking how this would work in OpenStack. It seems you could do it like this. Assume each HPA attribute matches with a metadata attribute associated with a host aggregate.
  - List all the host aggregates (nova aggregate-list)
  - Determine which ones match all the HPA metadata attributes (nova aggregate-details)
  - Get the list of hosts in these matching host aggregates
  - Get the remaining capacity on these hosts and see if it is larger than the specified amount
- Bin Hu: Assume we have [cpupin, numa, largepages] as the HPA capabilities for vCPU resource, "Holistic Check" of a vCPU with those HPA means:
  cpupin AND numa AND largepages AND capacity(vcpu: 5)
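The gap between independent checks and a holistic check can be reproduced with a small model of Shankar's vim1 scenario. This is a sketch under simplifying assumptions: the `Host` class, the host names and the three functions are hypothetical, capacity is tracked per host, and free vCPUs are summed across all matching hosts, which ignores whether a single VM would actually fit on one host.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_vcpus: int
    # HPA attributes this host supports
    hpa: frozenset = field(default_factory=frozenset)


# vim1 has 10 vCPUs in total; the 5 on the HPA-capable host are all allocated.
VIM1 = [
    Host("hpa-host", free_vcpus=0, hpa=frozenset({"HPA1", "HPA2"})),
    Host("plain-host", free_vcpus=5),
]


def supports(hosts, required):
    """Constraint (i): some host in the VIM supports all required attributes."""
    return any(required <= h.hpa for h in hosts)


def has_vcpus(hosts, vcpus):
    """Constraint (ii): the VIM as a whole has enough free vCPUs."""
    return sum(h.free_vcpus for h in hosts) >= vcpus


def holistic_check(hosts, required, vcpus):
    """Count free vCPUs only on hosts that support every required attribute."""
    matching = [h for h in hosts if required <= h.hpa]
    return sum(h.free_vcpus for h in matching) >= vcpus
```

With `required = {"HPA1", "HPA2"}` and a demand of 5 vCPUs, `supports` and `has_vcpus` both return True for `VIM1`, while `holistic_check` returns False. The last function follows the same shape as the host-aggregate procedure outlined above: select the matching hosts first, then measure the remaining capacity on those hosts only.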

Reservations (R3 + ):

Based on prior discussions with SO, OOF and MultiCloud, reservations may be a desirable feature.
Actual orchestration of resources during the SO service instantiation workflows may take long enough that the homing recommendations provided by OOF become infeasible in the meantime. It is highly desirable to reduce this window of vulnerability.
Infeasibility may happen even if a "capacity check" is performed by OOF.
A possible solution is soft reservations, which OOF makes with MultiCloud. MultiCloud’s response to a soft reservation request can be binary (Success/Failure). OOF would be responsible for rolling back resources when only a part of a reservation succeeds. Soft reservations have a timeout, which bounds the time for which resources are locked by MultiCloud.
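A rough sketch of how these pieces could fit together is given here, assuming a purely in-memory stand-in for MultiCloud. The class `SoftReservations`, its methods and the helper `reserve_all` are hypothetical names for illustration only; no such MultiCloud API exists in the discussion so far. Success is modelled as a reservation id and failure as None.

```python
import time


class SoftReservations:
    """In-memory stand-in for a MultiCloud-side soft reservation service."""

    def __init__(self, free_vcpus, ttl_seconds, clock=time.monotonic):
        self.free = dict(free_vcpus)  # vim -> free vCPUs
        self.ttl = ttl_seconds
        self.clock = clock
        self.held = {}                # reservation id -> (vim, vcpus, expiry)
        self.next_id = 0

    def _expire(self):
        """Release every reservation whose timeout has passed."""
        now = self.clock()
        for rid, (_, _, expiry) in list(self.held.items()):
            if expiry <= now:
                self.release(rid)

    def reserve(self, vim, vcpus):
        """Return a reservation id on success, None on failure."""
        self._expire()
        if self.free.get(vim, 0) < vcpus:
            return None
        self.free[vim] -= vcpus
        self.next_id += 1
        self.held[self.next_id] = (vim, vcpus, self.clock() + self.ttl)
        return self.next_id

    def release(self, rid):
        """Give the resources back; a no-op if the reservation already expired."""
        entry = self.held.pop(rid, None)
        if entry is not None:
            vim, vcpus, _ = entry
            self.free[vim] += vcpus


def reserve_all(service, demands):
    """OOF side: reserve every (vim, vcpus) demand, or roll back and return None."""
    granted = []
    for vim, vcpus in demands:
        rid = service.reserve(vim, vcpus)
        if rid is None:
            for done in granted:
                service.release(done)
            return None
        granted.append(rid)
    return granted
```

The timeout keeps a crashed or slow workflow from locking resources indefinitely, while the rollback in `reserve_all` covers the partial-success case.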

Open Questions:

Trade-off: cost of implementing soft reservations in MultiCloud vs cost of failed orchestration attempts at SO
Do all VIMs support reservations?
Start with OpenStack reservations