OOF - MultiCloud interaction in R2



Homing Policies and corresponding information required from MultiCloud for Homing in R2:

  • Latency (air distance)

    • Location/Coordinates of customer 

      • MultiCloud may not have an active role in providing this information, which would come from SO as a part of the customer order

      • [Ethan]: MultiCloud won’t be populating that information.

    • Location of existing service instances (or the cloud-region)

      • [Ethan]: The relationship information between VNF and cloud-region was populated by SO if I remember correctly.

      • In AAI, each cloud-region (GET /cloud-infrastructure/cloud-regions/cloud-region/{cloud-owner}/{cloud-region-id}) record points to a complex (GET /cloud-infrastructure/complexes/complex/{physical-location-id}) and the complex relates to a lat and long. Is it correct that a VIM needs to register with MultiVIM? Does that registration have enough information for MultiVIM to create the region and complex data in A&AI?

      • Example: service instance → cloud-region → complex  → (lat,long)

      • [Ethan]: Currently we use the A&AI/ESR portal to register a new VIM. Some basic information such as cloud-owner, cloud-region and credentials can be entered through this portal and stored in the A&AI database, but not the complex information. Once we nail down the information that OOF might need, we can discuss extending the portal with the ESR team. Which fields exactly do you need in the complex schema?

      • [Shankar] The minimal information required in the complex schema for homing in R2 would be the geographical coordinates (latitude, longitude). But I’m guessing it will be more than that: physical-location-id, complex-name, data-center-code, url, etc., based on the R1 API of AAI (v11, I think).

      • [Ethan] Who will create a complex object? Can we pre-load the complex information into A&AI and provide a list of locations in the ESR portal for the customer to select from? Multiple OpenStack instances could share the same location (us-west, etc.).

      • [Shankar] This is a good question to ask, and I’m not sure who creates this complex information in AAI. That said, I do know from ECOMP parlance that multiple OpenStack instances (cloud-regions) could share the same complex.

      • [Matti] Complex is the physical location (e.g., data center) where the cloud-regions (= OpenStack instance = set of compute nodes managed by one instance of OpenStack control plane) reside. Is the model in MultiVIM that each cloud-region registers with MultiVIM individually even if there may be multiple cloud-regions in the same data center? 

      • [Bin] The relationship between a service instance (or, more specifically, a generic VNF of that service instance) and the complex can be inferred from the relationship chain: generic-vnf → vserver → cloud-region → complex. Please note that the cloud-region is the parent node of the vserver, so you can easily find the cloud-region from the vserver.


    • For R2

      • @ethanlynnl Based on the discussion at the last meeting, location information such as latitude and longitude already exists in the A&AI complex schema. Anyone who wants to find the location of a VNF can query A&AI and follow the relationships to the cloud-region and then to the latitude/longitude in the complex. The only problem is that the ESR portal needs to allow the cloud admin to enter this information.
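As a rough illustration of the lookup chain discussed above (generic-vnf → vserver → cloud-region → complex → latitude/longitude) and of latency approximated by air distance, the following Python sketch uses in-memory dictionaries in place of A&AI queries. The dictionary layout, the identifiers and the function names are assumptions made for illustration; they are not the A&AI API or schema.

```python
import math

# Hypothetical in-memory stand-ins for A&AI records (not the real A&AI schema).
VNF_TO_VSERVER = {"vnf-1": "vserver-1"}
VSERVER_TO_REGION = {"vserver-1": ("owner-a", "region-1")}
REGION_TO_COMPLEX = {("owner-a", "region-1"): "complex-1"}
COMPLEX_COORDS = {"complex-1": (40.0, -74.0)}  # (latitude, longitude)


def vnf_location(vnf_id):
    """Follow generic-vnf -> vserver -> cloud-region -> complex -> (lat, long)."""
    vserver = VNF_TO_VSERVER[vnf_id]
    region = VSERVER_TO_REGION[vserver]
    complex_id = REGION_TO_COMPLEX[region]
    return COMPLEX_COORDS[complex_id]


def air_distance_km(a, b):
    """Great-circle (haversine) distance in km between two (lat, long) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Customer coordinates would come from the SO order, as noted above.
customer = (41.0, -74.0)
distance = air_distance_km(customer, vnf_location("vnf-1"))
```

The mean Earth radius of 6371 km is the usual approximation; air distance is only a proxy for network latency.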


  • Cloud Affinity

    • Location of cloud-regions (same as above)

      • [Ethan] If it’s the same as above, this does not fall into the scope of MultiCloud.

      • [Bin] The location information comes with the complex, which is populated by a robot script in Release 1. ESR might be able to help, since the complex logically belongs to external system information.

    • For R2

      • @ethanlynnl Same as above.

  • HPA capabilities, indirectly, through AAI

    • CPU Pinning, NUMA as key value pairs 

    • Example: cloud-region-id: {CPU Pinning: True, NUMA: False, Large Pages: True}

    • required for evaluating HPA constraints

    • Is this provided by MultiCloud during VIM registration?

      • [Ethan] Yes. When a new VIM is registered through the ESR portal, ESR calls MultiCloud to update VIM information such as networks or images in A&AI; we could populate HPA capabilities during registration in the R2 release. Which HPA features might be needed in R2?

      • [Shankar] The exact HPA attributes relevant in R2 are still being determined. That said, it is most likely to be a list of attributes that are associated with a cloud-instance in AAI. Does that sound like a reasonable way to capture capabilities?

      • [Ethan] About HPA, a list [hpa1, hpa2] does not sound good to me since it would make filtering harder; I would prefer a map {hpa1: true, hpa2: false}.

      • [Shankar] A map works just fine from an OOF perspective. However, I wonder when an HPA:false check would need to be performed.

    • For R2

      • @ethanlynnl Based on discussion of last meeting, MultiCloud will put HPA capabilities into A&AI and the final schema of HPA is under discussion. MultiCloud will not provide API to query the HPA capabilities for R2.
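To make the map-based representation preferred above concrete, here is a minimal Python sketch of how OOF might filter cloud regions by required HPA capabilities. The capability key names, the region IDs and the function name are hypothetical, since the final HPA schema is still under discussion.

```python
# Hypothetical capability maps keyed by cloud-region-id (key names are illustrative).
HPA_BY_REGION = {
    "region-1": {"cpu-pinning": True, "numa": False, "large-pages": True},
    "region-2": {"cpu-pinning": True, "numa": True, "large-pages": True},
    "region-3": {"cpu-pinning": False},
}


def regions_supporting(required, hpa_by_region):
    """Return the region IDs whose map has every required capability set to True.

    A capability missing from a region's map is treated as unsupported.
    """
    return sorted(
        region
        for region, caps in hpa_by_region.items()
        if all(caps.get(name, False) for name in required)
    )


candidates = regions_supporting(["cpu-pinning", "numa"], HPA_BY_REGION)
```

With a map, an absent key and an explicit false value can be handled by the same lookup, which is one reason filtering is simpler than with a plain list.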



  • Min-guarantee capability, indirectly through AAI 

    • Boolean (Yes/No) for a given cloud instance.  

    • This can be treated similar to other capabilities (as a key value pair)

    • Example: cloud-region: {min-guarantee: True} 

      • [Ethan] Does a software configuration attribute exist in the A&AI schema now?

      • [Shankar] This is very similar to the HPA attributes  except that these are more general attributes (as compared to hardware specific attributes).

    • For R2:

      • Still need to discuss further.



  • Available Capacity, directly, through MultiCloud

    • CPU, Mem, Network, Disk, etc


    • required for evaluating instantaneous/available capacity check constraints

    • provided by MultiCloud through a Query or DMaaP 

    • Example: available_capacity(cloud_region_id) -> {x: vCPU, y: memory, z: storage} 

    • If a public cloud does not want to reveal exact numbers, they could return ranges of capacities.

    • Which of these metrics (cpu, mem, disk, network) are relevant to vCPE?

      • [Ethan] MultiCloud could provide an API for OOF to query available resource information. In my mind there are two ways. In the first, OOF checks every VIM’s available capacity through MultiCloud, and MultiCloud returns {vCPU, Memory, Storage}. In the second, OOF sends the VNF resource request {vCPU: xx, Memory: yy, Storage: zz} to MultiCloud, and MultiCloud returns a list of VIMs that are capable of satisfying the request.

      • [Shankar] I’d probably go with the former, where OOF queries for the available capacity for a given set of VIM IDs, e.g., getCapacities([cpu,mem],[vim-id-1, vim-id-5, vim-id-7]). To me this appears to be a much more scalable approach, since OOF would already have eliminated many VIMs based on other constraints that make them an infeasible choice for a given problem. Please do let me know what you think.

      • [@ethanlynnl] About the get_capacity API, the input format could be {"vcpu": int, "mem": int, "vims": [vim1, vim2, vim3]} and the returned data would then be [vim1, vim3]. What do you think?

      • [Shankar] We should probably call it check_vim_capacity(). Just to make the semantics clear, check_vim_capacity({"vcpu": int, "mem": int, "vims": [vim1, vim2, vim3]}) would mean that the VIMs returned have met *both* constraints on vcpu and mem.

      • [Bin] Please make the specification of each VIM in the list extensible. The input for the check_capacity API might then look like: {"vcpu": int, "mem": int, "vims": [{"cloud-owner":"vim1 owner", "cloud-region-id":"region id 1"}, {"cloud-owner":"vim2 owner", "cloud-region-id":"region id 2"}, {"cloud-owner":"vim3 owner", "cloud-region-id":"region id 3"}]}. With this format, OOF could later add more constraints to the VIMs in the list. For example, for a cloud region with multiple availability zones (AZs), if OOF wants to check whether the required capacity is available in a specific AZ, the VIM list could be [{"cloud-owner":"vim1 owner", "cloud-region-id":"region id 1", "availability-zone":"az1"}, {"cloud-owner":"vim2 owner", "cloud-region-id":"region id 2", "availability-zone":"az2"}, {"cloud-owner":"vim3 owner", "cloud-region-id":"region id 3", "availability-zone":"az3"}].

      • @Bin Hu: A hybrid approach seems more ideal, i.e. a definitive "Yes/No" answer plus a list of available capacity information. The following is an example (and only an example):

    • For R2:


      • @ethanlynnl MultiCloud will provide an API, check_vim_capacity, for OOF. The input looks like the following, and it means that ALL the requirements need to be satisfied.

        • {
          "vCPU": int, // number of cores
          "Memory": float, // size of memory, GB
          "Storage": int, // GB
          "VIMs": array of vim_id // VIM IDs that OOF wishes to check
          }

      • In R2 the combination of HPA and available resources query is not mandatory to support.

      • The output of this API still needs more discussion. It could be an array of the VIM IDs that satisfy the requirement, or an array of the satisfying VIM objects with their current capacities.

      • @Bin Hu "get_vim_capacity" API is pending further discussion because we ran out of time in the meeting on Jan 31. The proposal at the meeting is to have the following API:

        • Signature: object get_vim_capacity (object vims);

        • Input parameter: “vims” is a json object that lists the set of cloud-ids; the format is consistent with the "VIMs" in "check_vim_capacity" API.

        • Output: “object” is the data model that will be worked out during R2.

        • This API signature is agnostic of any infrastructure data model and ensures the extensibility and backward compatibility, no matter how data model evolves beyond R2.

      • 4:30pm Wednesday 02/07/2018: @Ramki Krishnan, @Sastry Isukapalli, @Shankaranarayanan Puzhavakath Narayanan and @Bin Hu discussed the MVP to support OOF in MultiVIM. It was concluded that the following 3 components are the MVP for R2:

        • check_vim_capacity

        • capacity information

          • capacity information will be used to support flexible placement policy

          • the minimum resource information includes CPU and RAM, and the metrics are average utilization and peak utilization.

          • Aligning with OpenStack terminology for simplicity, the key aggregates of interest (including HPA) for OOF would be cloud region, project and host aggregate

        • The method to send capacity information to OOF

          • DMaaP is recommended by Architecture Committee and highly preferred for R2

Additional Discussion:

One form of check_vim_capacity API provided by @ethanlynnl



  • Input of check_vim_capacity will be

{
"vCPU": int, // number of cores
"Memory": float, // size of memory, GB
"Storage": int, // GB
"VIMs": array of vim_id // VIM IDs that OOF wishes to check
}



  • Output of check_vim_capacity will be

{
"VIMs": array of vim_id // subset of the requested VIM IDs that satisfy this resource requirement
}
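The input and output formats above could be exercised with a sketch like the following. It assumes MultiCloud holds a snapshot of available capacity per VIM; the snapshot values, the VIM IDs and the `available` parameter are hypothetical, and only the request and response keys come from the proposal.

```python
# Hypothetical snapshot of available capacity per VIM ID; in practice MultiCloud
# would obtain these figures from each VIM.
AVAILABLE = {
    "vim1": {"vCPU": 16, "Memory": 64.0, "Storage": 500},
    "vim2": {"vCPU": 2, "Memory": 128.0, "Storage": 500},
    "vim3": {"vCPU": 32, "Memory": 256.0, "Storage": 50},
}


def check_vim_capacity(request, available=AVAILABLE):
    """Return {"VIMs": [...]} with the requested VIMs that meet ALL requirements."""
    resources = ("vCPU", "Memory", "Storage")
    satisfied = [
        vim_id
        for vim_id in request["VIMs"]
        if vim_id in available
        and all(available[vim_id][r] >= request[r] for r in resources)
    ]
    return {"VIMs": satisfied}


response = check_vim_capacity(
    {"vCPU": 4, "Memory": 32.0, "Storage": 100, "VIMs": ["vim1", "vim2", "vim3"]}
)
```

Here vim2 fails on vCPU and vim3 fails on storage, so only vim1 is returned; a VIM ID unknown to the snapshot is simply left out of the result.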



Another response format of check_vim_capacity API by @Bin Hu

Step 0: This is pre-condition. OOF is doing global optimization with many constraints, and worked out a shortlist of 3 VIMs { cloud-region-id1, cloud-region-id2, cloud-region-id3 }. Now OOF needs to check those VIMs' capacity.

Step 1: OOF calls MultiVIM's API to check those 3 VIMs' capacity information.

Step 2: MultiVIM checks those 3 VIMs' capacity information (internal implementation of MultiVIM)

Step 3: MultiVIM returns 2 pieces of information for each VIM to OOF:
1.    One piece of information is { Yes | No }, which indicates whether this VIM has sufficient capacity ("Yes") or not ("No").
      a.    If there is no knowledge (either direct or indirect) of "Yes" or "No", empty data will be returned.
2.    The other piece of information is the actual capacity data of the VIM.
      a.    If the actual capacity data is not available, empty data will be returned.

Step 4: OOF gets the capacity information of those 3 VIMs, and continues its work.



Request: 

check_vim_capacity({“vcpu”: int, “mem”: int, “vims”: [cloud-region-id1, cloud-region-id2, cloud-region-id3]}) 

Response: 

{
    "cloud-region-1": {
        "has_capacity": "Yes",
        "capacities": {
            "vcpu": 20,
            "mem": 100
        }
    },
    "cloud-region-2": {
        "has_capacity": "Yes",
        "capacities": {}
    },
    "cloud-region-3": {
        "has_capacity": "No",
        "capacities": {
            "vcpu": 0,
            "mem": 10
        }
    }
}

Here, cloud-region-1 has capacity and is willing to give precise capacity information, cloud-region-2 has capacity, but is unwilling to provide exact numbers, while cloud-region-3 doesn’t have capacity but is willing to provide available capacity information. Therefore, OOF will now be able to recommend placement for both cloud-region-1 and cloud-region-2, and will be able to cache cloud-region-1 and cloud-region-3 capacities for making better placement decisions.
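The hybrid response can be sketched in Python as follows. This is an illustration under stated assumptions: the `known_capacity` figures for cloud-region-2 and the requested amounts are invented, `discloses_numbers` is a hypothetical way to model a VIM that is unwilling to reveal exact numbers, and `None` stands in for the "empty data" verdict.

```python
def build_response(request, known_capacity, discloses_numbers):
    """Build the per-VIM answer: a Yes/No verdict plus optional capacity figures."""
    response = {}
    for vim_id in request["vims"]:
        capacity = known_capacity.get(vim_id)
        if capacity is None:
            # No knowledge of this VIM: both pieces of information stay empty.
            response[vim_id] = {"has_capacity": None, "capacities": {}}
            continue
        enough = all(capacity[k] >= request[k] for k in ("vcpu", "mem"))
        response[vim_id] = {
            "has_capacity": "Yes" if enough else "No",
            "capacities": dict(capacity) if vim_id in discloses_numbers else {},
        }
    return response


known = {
    "cloud-region-1": {"vcpu": 20, "mem": 100},
    "cloud-region-2": {"vcpu": 64, "mem": 512},  # known internally, not disclosed
    "cloud-region-3": {"vcpu": 0, "mem": 10},
}
request = {
    "vcpu": 5,
    "mem": 64,
    "vims": ["cloud-region-1", "cloud-region-2", "cloud-region-3"],
}
response = build_response(
    request, known, discloses_numbers={"cloud-region-1", "cloud-region-3"}
)

# OOF side: regions it can recommend, and capacity figures it can cache.
placeable = [v for v, r in response.items() if r["has_capacity"] == "Yes"]
cache = {v: r["capacities"] for v, r in response.items() if r["capacities"]}
```

The verdict and the figures are kept independent, so a VIM can answer "Yes" without revealing numbers, or report numbers while answering "No".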



  • Aggregate capacity, indirectly, through AAI

    • CPU, Mem, Network, Disk, etc

    • Does this fold naturally under the aggregate utilization - when do we actually need aggregate static capacity for decision making ?



R3 and beyond:

  • Host-level affinity/anti-affinity capability, indirectly through AAI



Open Questions:

  • Is it possible to get capacities and capabilities for logical groups of resources (like host aggregates)?

    • Example: available_capacity(cloud_region, host_aggregate)?

    • Would there be a logical group for every combination of capabilities?

      • grows into a large # of combinations very quickly! 

  • Evaluating constraint combinations: 

    • Example: consider the following two constraints

      • Is there a sufficient number of CPUs in the (cloud) cluster? - (Yes)

      • Does the (cloud) cluster support NUMA ? - (Yes)

    • But is there a sufficient number of CPUs that support NUMA? - Maybe not!

    • Example query: available_capacity(cloud_region, {HPA1, HPA2}), where the output should give the available capacity for which both HPA attributes are satisfied.

    • [Ethan] Since HPA information will be populated into A&AI, OOF can query A&AI directly and do the match. Do you still need this API in MultiCloud, with MultiCloud returning a list of available VIMs for you?

    • [Shankar] I think we will still need support for such holistic capacity checks from MultiCloud. As shown in the example above, the two constraints can be satisfied independently, but may not be satisfiable in combination. Since the VIM is the only place that has this holistic view, I’m guessing MultiCloud is the right entity to make these associations. Please let me know if this is still confusing, and I’ll try to elaborate on it.

    • [Ethan] About available_capacity(cloud_region, {HPA1, HPA2}), I’m still confused. I was thinking it would work like this: OOF gets all VIMs from A&AI and selects those that support HPA1 and HPA2, e.g. vim1 and vim3, then calls the MultiCloud get_capacity API as get_capacity({cpu: 5, mem: 64, vims:[vim1, vim3]}), and MultiCloud returns the list of VIMs that satisfy it, [vim1].

    • [Shankar] The flow that you have mentioned is absolutely correct. However, consider a scenario where vim1 has 10 vCPUs, 5 of which support HPA and 5 of which don’t. Assume that the 5 vCPUs that support HPA are already allocated and not available. OOF then evaluates the following constraints. Constraint (i): do vim1 and vim3 support {HPA1, HPA2}? The answer is yes. Constraint (ii): do vim1 and vim3 have 5 vCPUs available? The answer is vim1, which has 5 vCPUs. However, as the example shows, this doesn’t guarantee that the 5 vCPUs available on vim1 support HPA. OOF knows whether constraints (i) and (ii) are independently met, but cannot know whether both can be met simultaneously. This association is something that only MultiCloud can make, given its holistic view into the VIM.

    • [Matti] I was thinking how this would work in OpenStack. It seems you could do it like this. Assume each HPA attribute matches with a metadata attribute associated with a host aggregate.

      • List all the host aggregates (nova aggregate-list)

      • Determine which ones match all the HPA attribute metadata attributes (nova aggregate-details)

      • Get the list of hosts in these matching host aggregates

      • Get the remaining capacity on these hosts and see if it is larger than the specified amount

    • @Bin Hu: Assume we have [cpupin, numa, largepages] as the HPA capabilities for vCPU resource, "Holistic Check" of a vCPU with those HPA means:

      cpupin AND numa AND largepages AND capacity(vcpu: 5)
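The difference between independent checks and the holistic check can be shown with a small Python sketch modelled on the vim1 scenario and the host-aggregate steps above. The inventory data and function names are hypothetical, and summing free vCPUs across hosts is a simplification, since a single VM must fit on one host.

```python
# Hypothetical inventory for vim1: host aggregates with metadata and member hosts.
AGGREGATES = [
    {"metadata": {"cpupin": True, "numa": True}, "hosts": ["host-a"]},
    {"metadata": {}, "hosts": ["host-b"]},
]
# Free vCPUs per host: the 5 HPA-capable vCPUs on host-a are already allocated.
FREE_VCPUS = {"host-a": 0, "host-b": 5}


def supports(aggregates, hpa):
    """Constraint (i): does any aggregate advertise all requested HPA attributes?"""
    return any(all(a["metadata"].get(h) for h in hpa) for a in aggregates)


def free_vcpus(aggregates, free, hpa=()):
    """Free vCPUs on hosts in aggregates that match every HPA attribute in hpa."""
    hosts = {
        host
        for a in aggregates
        if all(a["metadata"].get(h) for h in hpa)
        for host in a["hosts"]
    }  # a set, so a host in several aggregates is counted once
    return sum(free[h] for h in hosts)


hpa = ["cpupin", "numa"]
# Constraints (i) and (ii) evaluated separately, as OOF would do on its own.
independent_ok = supports(AGGREGATES, hpa) and free_vcpus(AGGREGATES, FREE_VCPUS) >= 5
# Holistic check: capacity counted only on hosts that also satisfy the HPA attributes.
holistic_ok = free_vcpus(AGGREGATES, FREE_VCPUS, hpa) >= 5
```

In this data set the independent checks both pass while the holistic check fails, which is the gap described in the discussion.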





Reservations (R3+):

  • Based on prior discussions with SO, OOF and MultiCloud, reservations may be a desirable feature.

  • Actual orchestration of resources may take long enough during the SO service instantiation workflows that homing recommendations provided by OOF become infeasible in the meantime. It is highly desirable to reduce this window of vulnerability.

  • Infeasibility may happen even if a "capacity check" is performed by OOF 

  • A possible solution is soft reservations, made by OOF to MultiCloud. The response to a soft reservation request can be binary (Success/Failure) from MultiCloud. OOF would be responsible for rolling back resources when only part of a reservation succeeds. Soft reservations have a timeout, which bounds the time for which resources are locked by MultiCloud.
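A minimal sketch of how soft reservations with a timeout and OOF-side rollback might behave is given below. It is a toy in-memory model: the class, method names and the vCPU-only resource model are assumptions for illustration, not a MultiCloud or OpenStack API.

```python
import time


class SoftReservations:
    """Toy in-memory model of MultiCloud-side soft reservations with a timeout."""

    def __init__(self, free_vcpus, timeout_s, clock=time.monotonic):
        self.free = dict(free_vcpus)  # vim_id -> free vCPUs
        self.timeout_s = timeout_s
        self.clock = clock
        self.held = {}                # reservation id -> (vim_id, vcpus, expiry)
        self.next_id = 1

    def _expire(self):
        """Release every reservation whose timeout has passed."""
        now = self.clock()
        for rid, (_, _, expiry) in list(self.held.items()):
            if expiry <= now:
                self.release(rid)

    def reserve(self, vim_id, vcpus):
        """Return a reservation id on success, or None on failure (binary answer)."""
        self._expire()
        if self.free.get(vim_id, 0) < vcpus:
            return None
        self.free[vim_id] -= vcpus
        rid, self.next_id = self.next_id, self.next_id + 1
        self.held[rid] = (vim_id, vcpus, self.clock() + self.timeout_s)
        return rid

    def release(self, rid):
        """Give resources back; a no-op if the reservation already expired."""
        entry = self.held.pop(rid, None)
        if entry is not None:
            vim_id, vcpus, _ = entry
            self.free[vim_id] += vcpus


def reserve_all(manager, demands):
    """OOF side: reserve every (vim_id, vcpus) demand or roll back the partial set."""
    granted = []
    for vim_id, vcpus in demands:
        rid = manager.reserve(vim_id, vcpus)
        if rid is None:
            for done in granted:
                manager.release(done)
            return False
        granted.append(rid)
    return True
```

The injectable `clock` is there only so that expiry can be exercised without waiting; the timeout bounds how long a failed or abandoned orchestration attempt can keep resources locked.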

Open Questions:

  • Trade-off: cost of implementing soft reservations in MultiCloud vs cost of failed orchestration attempts at SO 

  • Do all VIMs support reservations ?

  • Start with OpenStack reservations