...
- Machine Learning and Deep Learning inference/training offloads
- Crypto and compression offloads
- Protocol (such as IPsec and PDCP) offloads
Though fixed function accelerator accelerators do provide better performance, multiple accelerators are required in a given compute node if there are workloads of different types.
Being programmable, acceleration function functions can be dynamically changed in FPGA/GPU GPUs based on the need and hence FPGA FPGAs and GPUs are considered to support various kinds of workloadsbe generic accelerators.
Openstack and Kubernetes orchestrators are adding support for FPGA FPGAs and GPUGPUs.
ONAP being a service orchestrator (end-to-end), makes placement decisions on placing the for VNFs/workloads and hence the awareness of FPGA/GPU is required in ONAP like any other HPA feature.
Since Openstack is in advanced stages (in Rocky release) of providing FPGA support (via Cyborg project), initial support in ONAP would support Openstack based cloud regions. K8S support will be added later.
FGPAs can be used in following ways:
...
From ONAP perspective, there is not much difference between pre programmed and orchestrator programmed. These two are commonly known as AFaaS (Acceleration Function as a Service). In case of workload programmed, it is called FPGAaaS (FPGA as a Service).
Since workload programmed FPGA is yet to be matured in Industry, AFaaS is considered initially. Rest of design note assumes the support for AFaaS.
Note on Openstack FPGA support
...
One good thing that is happening in Rocky release is that all resources, whether it is CPU, disk, memory, PCIe SRIOV VF or even FPGA/GPU, are represented in uniform way. And hence the request for resources via flavor is also becomes uniform.There are no longer unique way of representing flavor attributes. Please see this design note here : https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html
In summary: Any resource request in flavor always starts with property resource<N>:<resource name>:<count>. Any associated trait for the resource or traits by themselves can be represented as trait<N>:<trait name>=required.
In case of FPGA and GPU, there is additional generic property is getting added, whose syntax is function<N>:<Function type>=required. Note that this keyword is not interpreted by NOVA. It is only interpreted by Cyborg.
One example:
- Say that Openstack instance supports ARRIA10 FPGAs and functions such as IPSEC_ACCELERATION, PUBLIC_CRYPTO_ACCELERATION, TENSORFLOW_ACCELERATION, COMPRESSION_ACCELERATION.
- Say that compute nodes have 3 FPGA cards in each of them.
- Say that Openstack operator thinks that there would be
- Some workload types that require IPSEC_ACCLERATION and COMPRESSION_ACCELERATION together.
- Some workload types that require TENSORFLOW_ACCELERATION (2 of them) and COMPRESSION_ACCELERATION together
- Some workload types that require IPSEC_ACCELERATION, PUBLIC_CRYPTO_ACCLERATION and COMPRESSION_ACCELERATION together
Openstack operator creates following flavors
Flavor 1:
- resource1:CUSTOM_ACCELERATOR_FPGA=1
- trait1:FPGA_INTEL_ARRIA10=required
- function1:IPSEC_ACCLERATION_AF=required
- resource2:CUSTOM_ACCELERATOR_FPGA=1
- trait2:FPGA_INTEL_ARRIA10=required
- function2:COMPRESSON_ACCLERATION_AF=required
Flavor 2:
- resource1:CUSTOM_ACCELERATOR_FPGA=2
- trait1:FPGA_INTEL_ARRIA10=required
- function1:TENSORFLOW_ACCLERATION_AF=required
- resource2:CUSTOM_ACCELERATOR_FPGA=1
- trait2:FPGA_INTEL_ARRIA10=required
- function2:COMPRESSON_ACCLERATION_AF=required
Flavor 3:
- resource1:CUSTOM_ACCELERATOR_FPGA=1
- trait1:FPGA_INTEL_ARRIA10=required
- function1:IPSEC_ACCLERATION_AF=required
- resource2:CUSTOM_ACCELERATOR_FPGA=1
- trait2:FPGA_INTEL_ARRIA10=required
- function2:PUBLIC_CRYPTO_ACCLERATION_AF=required
- resource3:CUSTOM_ACCELERATOR_FPGA=1
- trait3:FPGA_INTEL_ARRIA10=required
- function3:COMPRESSON_ACCLERATION_AF=required
Some points to note are:
- Grouping of various properties together is achieved by <N> in resource, trait and function keywords.
- Even though FPGA accelerators typically are PCIe cards, PCIe vendor ID/device ID and VFs are hidden. They don't need to be specified.
- Note that in above examples, only the FPGA accelerators are taken as an example. In reality, flavors will have other resources too.
ONAP design considerations
- Leverage HPA functionality added in Multi-Cloud, OOF and A&AI.
- Add new HPA feature called "FPGA-AFaaS-Acceleration"
- Have following HPA feature attributes:
- fpga_device. As per above examples, this attributes takes value "FPGA_INTEL_ARRIA10".
- fpga_AF_function: As per above examples, this attributes takes values such as IPSEC_ACCELERATION_AF, TENSORFLOW_ACCELERATION_AF, PUBLIC_CRYPTO_ACCELERATION_AF or COMPRESSION_ACCELERATION_AF
- fpga_AF_count: This holds the number of AFs supported by the site for each workload
Changes in R4:
Component | Changes |
---|---|
OOF |
|
A&AI |
|
Multi-Cloud |
|
Discovery by Multi-Cloud:
<Take above example and show how those FPGA feature is represented in A&AI>
HPA policies
<Show few examples on how HPA policies would look like>
TOSCA compute requirements
<Show few examples on how TOSCA compute requirements would look like>