The RoadMap to CISCO ACI – Whys and wherefores


Why CISCO ACI?

This is still an open question for many network engineers, managers, and even senior decision-makers who deal with CISCO products in their infrastructure: why go for CISCO ACI, and what are its advantages? I'm going to address these questions in the rest of this article.

I believe what you really need to know is the roadmap to CISCO ACI. What drives us toward ACI is the evolution of the Data Center fabric technologies we've dealt with, the challenges and problems each of them had, and the fact that some of them are now completely obsolete. Let's see what the most important of these challenges are:

 

Part One: What are the most common challenges we face in Data Center fabric design?

  • (1) Large Broadcast domains should be avoided.

This is definitely something we want to avoid. A single broadcast domain is a single failure domain, meaning any failure that occurs within a broadcast domain can affect every service running in it. Furthermore, in a large broadcast domain, traffic is often propagated to hosts that don't need to listen to it; network bandwidth and processing capacity are consumed for almost nothing, and the network stays needlessly busy. From a security point of view, large broadcast domains are also more vulnerable to traffic sniffing and make man-in-the-middle attacks easier.

  • (2) Simple Cabling.

Traditional Data Center Network (DCN) technologies rely on full-mesh cabling to meet the desired level of redundancy. Of course, this makes the cabling more complicated as the network scale grows.

As explained later in this article, a large-scale Data Center fabric is often split into several Points of Delivery (PODs) that communicate via the L3 core. This helps simplify cabling and also breaks up the large L2 broadcast domains.

  • (3) End-to-end VM mobility and distributed workloads among all server racks within the Data Center.

In contrast to the previous challenges, for a variety of reasons and requirements such as distributed workloads and VM mobility, we have to provide Layer 2 connectivity between all server racks within the data center fabric. In modern infrastructures, we shouldn't restrict applications and services that require L2 adjacency in such a way that administrators have to place their hardware only on certain racks. But more L2 connectivity equals larger L2 broadcast domains! As you can see, these requirements conflict with each other in traditional technologies, especially if you have a Multi-POD infrastructure.

Well, it seems we have to find a way to establish Layer 2 connectivity without relying on a Layer 2 network! This is the key idea behind the newer technologies: a Layer 2 connection as the overlay, running over an L3 network as the underlay.

  • (4) Top-of-rack (ToR) switches

This is one of the most common trade-offs in traditional data center networks. Fewer switches at the top of the racks means much more complicated cabling; on the other hand, more switches at the top of the racks means more configuration workload and more devices to monitor and maintain. Either way, the situation gets worse!

  • (5) Loop-Prevention mechanisms: No Spanning-Tree protocol anymore!

Anyone dealing with business-critical services such as banking transactions has probably experienced that even the fastest variants of the Spanning-Tree protocol can lead to service disruption during topology changes, simply because they don't converge as fast as expected. The loop-prevention mechanism (the blocking state) used by STP is also not acceptable on new Data Center switches with 10GE, 40GE, 100GE, and 400GE line cards, where blocking a link means wasting enormous bandwidth. This protocol is nearly obsolete and no longer has a place in modern infrastructures. So, what are the alternatives?

  • (6) Scalable Fabric across multiple sites. How are DCI requirements supposed to be covered?

Data Center Interconnect (DCI) building blocks, path optimization techniques, high availability, and the considerations around them are major parts of data center design, and they are also covered in detail in the CISCO CCDE course. The challenges in this area depend entirely on the technology used in the data center fabric. Does the technology itself have solutions to address them, or are network administrators responsible for all of it, having to run several third-party protocols for each piece of the requirements? This is also extremely important when you are designing an underlay infrastructure for a private cloud. In other words, some traditional DCN technologies are simply not suitable for Multi-Site infrastructure. The Data Center fabric technology we consider for Multi-Site infrastructures needs to support easy extensibility and workload mobility.

  • (7) A Unified Data Center fabric for both physical and virtual networks

In traditional data center design, the physical network and the virtual network are completely separated from each other. Of course, what we call a 'virtual network' doesn't only consist of virtual machines; we may have cloud-native infrastructure as well. In that case, we face several different types of networks: physical networking, virtual machine networking, and container networking. There is essentially no integration or visibility from one to the others. Each network has its own fabric, its own administrators, and its own security considerations that need to be addressed separately. For instance, the security team needs one set of solutions for physical networking and another set for container networking. And who is actually responsible for enforcing network and security policies within the cloud-native infrastructure?

Some of these challenges are addressed by establishing a new culture known as DevOps and, following that, a new team structure named Site Reliability Engineering (SRE). These new attitudes were formed to accelerate the development process, eliminate the gap between traditional IT and software developers, help teams collaborate better, and, as a result, deliver reliable services faster. But changing the team structure is not enough; we definitely need to make the infrastructure more agile in the same way.

Therefore, the ideal form of a Data Center fabric is a unified network platform in which the type of endpoint or host doesn't matter. Whether our endpoints are containers, virtual machines, or bare-metal servers, we have one common fabric: all endpoints have full reachability to each other over the same fabric, and that fabric gives us the desired level of visibility, even more than before.

  • (8) Security concerns.

Traffic forwarding in traditional DCN technologies is implicitly allowed by default. This means that once a host connects to the network and administrators perform the initial VLAN and IP configuration for it, it is able to communicate with others unless security policies have already been enforced. In large-scale networks, this can create security holes and unauthorized access due to misconfiguration. This is no longer the default behavior in CISCO ACI, even if no firewall or security appliance is deployed: any traffic forwarding between EPGs is denied by default unless a Contract is deployed between them that explicitly allows the desired traffic. A rough sketch of this whitelist model follows below.
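To make the whitelist behavior concrete, here is a minimal Python sketch of the idea. The EPG names, ports, and the contracts table are invented for illustration; this is not how APIC actually stores or evaluates policy, just a model of "deny unless a contract allows it".

```python
# Conceptual sketch of ACI's whitelist (contract) model. Illustration only;
# EPG names, ports, and the contracts table below are made up.
from typing import NamedTuple

class Flow(NamedTuple):
    src_epg: str
    dst_epg: str
    dst_port: int

# Hypothetical contracts: (consumer EPG, provider EPG) -> allowed TCP ports
contracts = {
    ("Web-EPG", "App-EPG"): {8080},
    ("App-EPG", "DB-EPG"): {3306},
}

def forward(flow: Flow) -> bool:
    """Return True only if a contract explicitly allows this flow."""
    allowed_ports = contracts.get((flow.src_epg, flow.dst_epg), set())
    return flow.dst_port in allowed_ports

print(forward(Flow("Web-EPG", "App-EPG", 8080)))  # True  - a contract exists
print(forward(Flow("Web-EPG", "DB-EPG", 3306)))   # False - denied by default
```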

  • (9) Complex Deployment of L4-7 service appliances (Firewall, WAF, IPS, LB, and so on)

As you know, each L4-L7 service appliance, such as a firewall or load balancer, has different deployment models that can be challenging to insert into a traditional network. Let's clarify with a vital question: what should act as the servers' gateway, an L3 firewall or the switching fabric? Some experts tend to choose the firewall, because otherwise they inevitably have to deal with VLAN stitching (which dramatically increases the number of VLANs), VRF sandwiching, MP-BGP route leaking, and other troublesome techniques that exist in traditional networks.

On the other hand, this is not a good choice for large-scale enterprises. Beyond having different types of traffic, you usually have multi-tenancy over a shared physical infrastructure, with different operating environments such as 'Production', 'Test', and 'Development', each with its own security benchmarks. Consequently, it's very common that you don't want all traffic to pass through one specific firewall, or you may have multiple firewalls with different purposes; it is more efficient for the switching fabric to control traffic forwarding within the network. More than that, firewall resources should only be spent on security-driven operations, not network operations. In the same way, if the network and security teams are separate, which is often the case in large-scale enterprises, each team must handle its own duties, with a clear separation of responsibilities and decision-making between them.

Coming back to the original question, the major problem that needs to be solved is the set of troublesome techniques mentioned above that exist in traditional DCN technologies.

  • (10) Easy Implementation, Configuration, and Troubleshooting

Last but not least, easy implementation, easy configuration, and especially troubleshooting in the minimum time required are influential factors in choosing a solution for a data center fabric. As the fabric scales out, the network becomes more complex, so concerns about troubleshooting and accidental misconfiguration intensify. This is where automation and codifying the fabric come into play, and we arrive at something called a 'Programmable Fabric' infrastructure.

In simple words, what do we do if we have Data Center fabric technologies that meet most of the challenges mentioned above but are not easy to configure and troubleshoot? Well, we can go toward Automation, Infrastructure as Code (IaC), or Software-Defined Networking (SDN). Since we are talking about the Data Center, we end up with the term Software-Defined Data Center, or SD-DC.

We've examined some of the most important challenges that come to mind. If you have more items to add, especially from the security perspective, leave a comment; it would be my pleasure :).

Now let's keep the discussion going by looking at the evolution of Data Center fabric technologies and investigating the challenges that each technology has.

Part Two: Data Center ‘Fabric’ Journey

First Generation: Typical Collapsed Core design; STP-based Architecture.

This is the first generation of Data Center Network (DCN) design, the STP-based network architecture. Almost all of the challenges I mentioned earlier can be seen in this kind of infrastructure! Link redundancy relies completely on Spanning-Tree, so some interfaces stay in the blocking state. Since there is no high-availability solution for the L3 core switches, the only option is to use FHRP protocols such as HSRP to protect the default gateway of a subnet by letting the two L3 switches back each other up for that address.

First Generation: Typical 3-tier design; STP-based Architecture

As the scale of the fabric grows, there are also more concerns about cabling and the L2 broadcast domains! So we change the topology to what is called the '3-tier architecture', including the Access, Distribution, and Core layers. As shown in the picture, the default gateways are still on the L3 core switches, so we have a Layer 2 distribution layer along with large L2 broadcast domains that still need to be solved. One noteworthy point: at first glance, the 3-tier architecture we have in Data Center networks looks similar to the 3-tier architecture of Campus LAN networks. But don't forget we have far more concerns and considerations in the DCN than in Campus networks. How are service appliances such as firewalls and load balancers going to be deployed in the DCN? Furthermore, we need end-to-end workload mobility and VM mobility within the DC fabric.

First Generation: Typical 3-tier Multi-POD design; STP-based Architecture

To break up the large broadcast domains, it's possible to put the default gateways on the distribution layer, as shown in the picture above; however, this splits the fabric into multiple Points of Delivery (PODs). In this case, the L2 firewall, WAF, and load balancer have to be connected to each distribution cluster separately, which adds cabling and significantly increases the configuration workload for these service appliances as well. More importantly, it may also break the requirement for seamless end-to-end VM mobility and distributed workloads across different PODs. All the architectures discussed so far are based on Spanning-Tree and rely on FHRP protocols to build an L3 gateway cluster.

Second Generation: mLAG-based Architecture

In this generation, high-availability technologies such as VSS, switch stacking, and Virtual Stack started to be used across different switch series. As a result, we are no longer required to configure HSRP or VRRP. Furthermore, these HA technologies give us Multi-Chassis Link Aggregation (mLAG), which significantly overcomes the Spanning-Tree shortcomings, decreases the number of devices that need to be configured and maintained, and provides end-to-end link redundancy based on LACP as well. This architecture can be implemented as either a collapsed-core or a 3-tier design, just like the first generation. Apart from STP and the Top-of-Rack (ToR) issue, all the other challenges still exist in this generation.

Third Generation: Introduction to CISCO NX-OS and the Nexus switch family.

Beginning with Data Center 3.0, CISCO introduced many innovations in both hardware and technology. CISCO Nexus devices were introduced as the new Data Center switch family, running a new Linux-based operating system named NX-OS. The most significant advantage of this OS is that it brings SAN and LAN together: a CISCO Nexus 5K switch can simultaneously be a native FC SAN switch and a native classical Ethernet switch. It can also be configured for Fibre Channel over Ethernet (FCoE), a new operational mode CISCO introduced in this generation. FCoE is a storage protocol that enables Fibre Channel (FC) communications to run directly over Ethernet. This is the idea that CISCO branded 'Unified Fabric'.

Various series of the CISCO Nexus switch family have been marketed so far, of which the 2K, 5K, 7K, and 9K series are the most important. CISCO tried to solve the Top-of-Rack (ToR) challenge by introducing the Nexus 2K backed by FEX technology, which has been fairly successful, but with the advent of newer technologies like CISCO ACI it is virtually no longer needed. To find out why, I've given a detailed answer in another article about ACI; I suggest you read it if you are interested.

CISCO Fabric Extender technology, as its name implies, was not only focused on ToR; it was also the beginning of the integration of physical and virtual infrastructure. CISCO Adapter FEX for UCS servers, along with VM-FEX and the Nexus 1000v, were also introduced for this purpose. Unfortunately, this effort largely failed, since the main hypervisor vendors in the market such as VMware and Microsoft (Hyper-V) no longer support VM-FEX in their recent versions.

Another change in this generation was the replacement of VSS in Nexus devices: the CISCO Nexus 5K, 7K, and 9K series do not support VSS. Instead, CISCO introduced vPC (Virtual Port-Channel) in this switch family. The big difference between vPC and VSS is that vPC is not really a high-availability protocol, but VSS is. With vPC we still have two separate devices, each with its own control plane, just as before; vPC is a technology that enables Multi-Chassis Link Aggregation (mLAG) across two separate Nexus switches.

With this explanation, you may ask: why is vPC the replacement for VSS in CISCO Nexus devices? There are basically two answers. (1) Nexus switches can be used in a unified fabric architecture, which means there may be both SAN and LAN connectivity on the switch at the same time, so both SAN and LAN design basics need to be considered; in SAN design, switches are kept separate from each other and the FSPF protocol handles failures instead. (2) To use VXLAN over the Leaf-and-Spine infrastructure supported by the CISCO Nexus family. The CLOS fabric structure, which we know better as the Leaf-and-Spine model, is one of the most widely used architectures in Data Center fabrics; it is the physical infrastructure for technologies such as VXLAN and, of course, CISCO ACI as well. In this architecture, the Leaf switches take the role of VXLAN Tunnel Endpoint (VTEP) in the VXLAN fabric, and all servers and hosts connect directly to them. From the Leaf-and-Spine point of view, the Leaf switches must be separate and independent from each other, but we also need link aggregation down to the servers and hosts. This is exactly where vPC is useful. That's why we say VSS is used in Campus network environments, while vPC is used in Data Center infrastructures. But this has little to do with the third generation of DCN architecture I'm discussing; leave it for later.

Data Center 3.0 brought some improvements over the previous generations, such as FEX as a Top-of-Rack solution, a simpler command line with less configuration, multi-tenancy support on the Nexus 7K thanks to Virtual Device Contexts (VDC) and on ASR 9K routers, and programmable hardware with more powerful resources. But most of the fundamental challenges considered in this article remain unresolved up to this generation. CISCO vPC has one important limitation in this particular generation of Data Center fabric: it works only between two switches. If you have more pairs of Nexus switches, you have to configure additional, independent vPC domains; following that, the fabric is again split into multiple PODs. The scenario is identical to the second generation I discussed earlier.

As a result, technologies such as vPC or VSS alone can't be considered a complete solution for the Data Center switching fabric. Accordingly, the next generation is the cloud-based fabric, instead of stitching together pairs of vPC or VSS switches.

Another noteworthy point about vPC: since the control planes are still separate, we again need FHRP protocols to keep the default gateways on a pair of vPC switches.

Fourth Generation: Cloud-Based fabric architecture.

As I mentioned earlier, protocols such as vPC and VSS can only run between two switches, so when scaling out the DC fabric, we end up with more than one vPC pair and vPC domain. Ultimately, this splits the fabric into multiple PODs with L3 routing between them. On the flip side, we need solutions that provide end-to-end Layer 2 workload mobility, but at the same time we also want to eliminate the large L2 broadcast domains. We reach a contradiction! So what? These requirements drive us to a new generation of DCN technologies known as cloud-based architectures. In such infrastructures, we have the terms 'underlay' and 'overlay' networks. What happens behind the scenes is that we have a number of switches in the Data Center fabric connected to each other; this is the underlay network. The servers and hosts connected to this network want to communicate with each other, and that server-to-server traffic, known as the overlay, is encapsulated within the underlay network instead of being routed or bridged in the usual way. Depending on the type of underlay network, there are different types of encapsulation, as described below.

MAC-in-MAC

The most common protocols and technologies that use this type of encapsulation are TRILL, CISCO FabricPath, and SPB (Shortest Path Bridging). In these protocols, the underlay network is neither a typical Layer 3 network nor classical Ethernet; they have their own framing structure instead. In plain English, the original Ethernet frame is encapsulated within a new, special frame and sent into the fabric. All of these protocols leverage IS-IS to perform Layer 2 routing that doesn't rely on IP for carrying frames: IS-IS distributes link-state information and calculates the shortest paths through the network to form the underlay. This calculation supports Layer 2 multipathing, so all links are available and used, unlike STP where some interfaces are always blocked. Let's drill down into CISCO FabricPath and see which challenges it overcame.

  • CISCO FabricPath

CISCO FabricPath offers a flexible and scalable design when applied to the Data Center fabric. A typical FabricPath network uses a Leaf-and-Spine architecture. There is no Spanning-Tree running and no longer a large Layer 2 broadcast domain challenge. It retains easy configuration, and of course end-to-end Layer 2 VM mobility and distributed workloads are also provided. Finally, the Leaf-and-Spine architecture simplifies the cabling in the Data Center network.

In contrast, FabricPath has some serious drawbacks that have caused it to become almost deprecated. First off: as I mentioned before, FabricPath has its own data plane and frame format, so it does not ride within Ethernet frames and does not ride above IP. This forces us to connect all FabricPath nodes directly together; as a result, FabricPath is not scalable or extendable across multiple sites unless the transport media is dark fiber or DWDM.

Another problem is the mechanism for handling BUM (broadcast, unknown unicast, and multicast) traffic in CISCO FabricPath. BUM traffic on a FabricPath network is not simply flooded; instead, it follows what's called a Multi-Destination Tree (MDT), which works very much like a traditional multicast tree. FabricPath automatically builds two separate logical trees for handling multi-destination traffic.

When BUM packets enter the fabric, they have to traverse the root switch to reach the whole network, so the placement of the root switch becomes key in a Data Center Interconnect (DCI) scenario. In such a scenario the root can only be at one site, which means the other site(s) have to traverse the DCI link for all BUM traffic!

Finally, CISCO FabricPath has no control plane for the overlay network. End-host information in the overlay is learned through the flood-and-learn mechanism, which is not efficient. These challenges have left CISCO FabricPath almost deprecated.

MAC-in-IP

Since we see 'IP', the underlay is obviously a typical Layer 3 routed network, and the server-to-server traffic is encapsulated within it. From this moment on, we can relax: this type of overlay transport can easily scale across multiple sites, and unlike FabricPath, we are not restricted to dark fiber or DWDM! So the first problem is solved right away.

The primary protocols that use this type of encapsulation are VXLAN, STT, NVGRE, and GENEVE. Both VXLAN and GENEVE use UDP-based encapsulation, while STT uses TCP-based encapsulation. GENEVE is designed not only to support the capabilities and flexibility of the other protocols but to be an upgraded version of them, thanks to a few changes: it uses variable-length option fields, so extra headers and metadata can be carried. That's one of the reasons GENEVE is used in the VMware NSX SDN platform. In contrast, VXLAN and NVGRE carry fixed 24-bit identifiers and STT a fixed 64-bit identifier, with no extensible options. If you're interested in more information about GENEVE, follow the link below.

Since VXLAN is still widely supported and adopted by CISCO, VMware, Red Hat, Citrix, and other leaders, we'll focus on it and look at the challenges this technology addresses.

  • VXLAN Fabric

VXLAN is essentially nothing more than a Layer 2 tunneling protocol over a Layer 3 IP/UDP underlay network. It follows the CLOS model for the physical infrastructure, known as the Leaf-and-Spine architecture. In other words, VXLAN as a Data Center fabric technology relies on the cloud-based fabric architecture. As a result, seamless end-to-end workload distribution is available while the broadcast domains are confined to each server rack. Furthermore, since the underlay is just a typical routed network, the VXLAN fabric is easily scalable across multiple sites. Amazing, isn't it?

Another significant thing about VXLAN is that it extends the number of network segments from the 4096 of the VLAN space to about 16 million. As I previously mentioned, VXLAN uses a fixed 24-bit identifier, called the VNI or VNID, instead of the VLAN ID. This is an impressive evolution for multi-tenant infrastructure. The identifier is only used when traffic leaves a leaf toward the spine and then reaches another leaf: VXLAN encapsulation is performed by the leaf switch on the source host's side, and decapsulation by the leaf switch on the destination host's side. So the leaf switches are responsible for encapsulating and decapsulating the VXLAN headers; this is the tunneling process I mentioned a while ago, and the role is named VXLAN Tunnel Endpoint (VTEP). The encapsulation is shown in the picture below: the original Ethernet frame is encapsulated in a new UDP datagram, within a new IP packet, within a new Ethernet frame corresponding to the underlay network.
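To make the header stacking and the 12-bit vs. 24-bit comparison concrete, here is a minimal Python sketch that builds a VXLAN-encapsulated frame with Scapy. All MAC/IP addresses, ports, and the VNI are made-up illustration values, not anything tied to a real fabric.

```python
# A minimal sketch of VXLAN (MAC-in-IP/UDP) encapsulation using Scapy.
# All MAC/IP addresses and the VNI below are made-up illustration values.
from scapy.layers.l2 import Ether
from scapy.layers.inet import IP, UDP
from scapy.layers.vxlan import VXLAN

# VLAN IDs are 12 bits, VXLAN VNIs are 24 bits:
print(2 ** 12)   # 4096 possible VLANs
print(2 ** 24)   # 16,777,216 possible VXLAN segments

# Original server-to-server frame (the overlay traffic).
inner = Ether(src="00:00:00:aa:aa:aa", dst="00:00:00:bb:bb:bb") / \
        IP(src="10.1.1.10", dst="10.1.1.20")

# Outer headers added by the source leaf (VTEP):
# new Ethernet + IP (VTEP loopbacks) + UDP (port 4789) + VXLAN header with the VNI.
outer = Ether() / \
        IP(src="192.168.0.1", dst="192.168.0.2") / \
        UDP(sport=49152, dport=4789) / \
        VXLAN(vni=10010) / \
        inner

outer.show()  # display the full MAC-in-IP/UDP encapsulation stack
```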

Unified Fabric for both Physical and Virtual Networks

One of the main goals of VXLAN-based SDN technologies is to extend the VXLAN fabric into the virtualization layer, remove the dependence on traditional VLANs there as well, and ultimately unify the physical and virtual fabrics.

VXLAN Data Plane and Control Plane

The VXLAN encapsulation and decapsulation process briefly explained earlier is the data plane of the overlay network. As I mentioned before, CISCO FabricPath relies on the typical flood-and-learn mechanism to learn end-host information, which means it has no control plane for the overlay network. But what about VXLAN?

VXLAN traditionally uses the same flood-and-learn method when transporting data over the underlay network. This method relies on IP multicast for BUM traffic, so the whole underlay needs to support multicast and administrators have to configure it. There is an alternative to IP multicast for handling multi-destination traffic in a VXLAN environment: ingress replication (IR), also called head-end replication. With ingress replication, every VTEP must be aware of the other VTEPs that have membership in a given VNI. The source VTEP generates n copies of every multi-destination frame, each destined to one of the other VTEPs that have membership in the corresponding VNI. This places an additional burden on the VTEPs, but it has the benefit of simplicity, since there is no need to run multicast in the IP underlay.
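To illustrate head-end replication, here is a tiny Python sketch of the idea. The VTEP addresses and VNI membership are invented values, and a real VTEP performs this replication in hardware, not in a script.

```python
# Conceptual sketch of ingress (head-end) replication. Illustration only;
# VNI membership below is made up, and real VTEPs replicate in hardware.

# Which remote VTEP loopback addresses participate in each VNI.
vni_membership = {
    10010: ["192.168.0.2", "192.168.0.3", "192.168.0.4"],
    10020: ["192.168.0.3"],
}

def ingress_replicate(vni: int, bum_frame: bytes, local_vtep: str) -> list[tuple[str, bytes]]:
    """Return one unicast copy of the BUM frame per remote VTEP in the VNI."""
    copies = []
    for remote_vtep in vni_membership.get(vni, []):
        if remote_vtep != local_vtep:               # never send a copy back to ourselves
            copies.append((remote_vtep, bum_frame))  # in reality: VXLAN-encapsulated toward remote_vtep
    return copies

# An ARP request entering VNI 10010 is replicated as unicast to each remote VTEP.
print(ingress_replicate(10010, b"ARP-request", local_vtep="192.168.0.1"))
```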

On the other hand, the best practice is to use MP-BGP with the EVPN address family as the control plane for VXLAN. In simple terms, the EVPN address family allows host MAC, IP, network, VRF, and VTEP information to be carried over MP-BGP. This way, as soon as a VTEP learns about a host behind it, BGP EVPN distributes this information to all other BGP EVPN-speaking VTEPs in the network. The MP-BGP EVPN control plane greatly reduces the need for flooding, but it doesn't remove it completely: some kinds of overlay traffic, including ARP, DHCP, and traffic toward 'silent hosts' (clients that haven't sent or received any packets yet), may still incur flooding.
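As a rough illustration of what this control plane advertises, the snippet below models the key fields of an EVPN Type-2 (MAC/IP advertisement) route as a plain Python dictionary. The values are invented, and a real implementation encodes this as BGP NLRI with extended communities, not as a dict.

```python
# Rough model of the information a BGP EVPN Type-2 (MAC/IP) route carries.
# Values are invented for illustration; real routes are encoded as BGP NLRI.
evpn_type2_route = {
    "route_type": 2,                   # MAC/IP advertisement
    "mac": "00:00:00:aa:aa:aa",        # host MAC learned by the local VTEP
    "ip": "10.1.1.10",                 # host IP (optional in a Type-2 route)
    "l2vni": 10010,                    # VXLAN segment (bridge domain) of the host
    "l3vni": 50001,                    # VRF / tenant routing context
    "next_hop_vtep": "192.168.0.1",    # loopback of the advertising VTEP
    "route_target": "65000:10010",     # controls which VTEPs import the route
}

# Every EVPN-speaking VTEP that imports this route knows, without flooding,
# that this MAC/IP pair lives behind VTEP 192.168.0.1 in VNI 10010.
print(evpn_type2_route["mac"], "->", evpn_type2_route["next_hop_vtep"])
```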

Beyond all that, VXLAN has solid solutions for scaling the fabric out across Multi-Site infrastructures and providing seamless Layer 2 and Layer 3 extensions, which was one of the most important shortcomings of the previous technologies. Here is a practical white paper provided by CISCO with more information about that.

Part Two: Conclusion

There is a lot more to say about VXLAN, but an in-depth explanation of the technology is out of the scope of this article. Let's conclude this section by answering the question: which of the challenges mentioned in part one does VXLAN address? As noted earlier, VXLAN MP-BGP EVPN on a CLOS fabric is a cloud-based fabric technology with a structured cabling model that removes the need for the full-mesh connectivity we had before. It eliminates the large broadcast domains and the STP protocol, and provides seamless end-to-end workload mobility across all server racks by relying on Layer 2 tunneling over an L3 underlay network. More than that, it greatly expands multi-tenancy by replacing the traditional 12-bit VLAN ID with the 24-bit VXLAN identifier (VNI). Further, VXLAN can be efficiently scaled out as a Multi-Site fabric, whether you have more than one data center or a single infrastructure so large that implementing one Leaf-and-Spine fabric is not the best practice. Although the configuration wouldn't be easy and straightforward, there would be no technology-related restriction on either workload mobility or VM mobility in a Multi-Site VXLAN fabric.

In contrast, the complexity of configuration and maintenance is the major problem with this technology. Moreover, as discussed earlier, with the advent of the DevOps culture we need a more agile infrastructure, in which service changes or the launch of a new service can be done quickly, accurately, and almost automatically. These concerns and requirements lead us to a door that opens onto another generation, the 5th generation of Data Center networks. Concepts such as SD-DC (Software-Defined Data Center), NFV (Network Functions Virtualization), and Infrastructure as Code (IaC), or codifying the infrastructure, are the new terms introduced in this generation. These technologies are an important part of implementing DevOps practices and CI/CD. If the target technology is SDN, then CISCO ACI is one of the significant options alongside open-source solutions such as OpenDaylight.

Part Three: The benefits of CISCO ACI; how does it address these challenges?

Application Centric Infrastructure (ACI) is CISCO's market-leading, ready-to-use SD-DC solution, and it addresses all ten challenges mentioned in the first part of this article.

Easy Implementation, Configuration, and Troubleshooting

ACI has a lot to learn and a lot to consider, but it is ultimately easy to configure and easy to troubleshoot, as long as you have an appropriate plan and know how to implement it. The technology relies on Nexus 9Ks in a Leaf-and-Spine architecture and automatically builds the IP underlay (using the IS-IS routing protocol) as well as the VXLAN MP-BGP EVPN overlay during the fabric discovery process. Further, CISCO ACI constructs follow an object-oriented model, so you can create an object once (an Interface Policy Group, for instance) and reuse it as many times as you like. Configuration is done with just a few clicks, but you can still enjoy the benefits of automation by using Ansible or the REST API when working with ACI. Finally, ACI event logs and fault reports accurately pinpoint misconfigurations and the details of failures that occur for any reason.
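As a taste of that programmability, here is a minimal Python sketch that authenticates to an APIC and pushes a tenant through the REST API. The controller address, credentials, and tenant name are placeholders, and a production script would verify certificates and handle errors properly.

```python
# Minimal sketch: create a tenant on CISCO APIC via its REST API.
# The APIC address, credentials, and tenant name are placeholders.
import requests

APIC = "https://apic.example.local"          # hypothetical APIC address
session = requests.Session()
session.verify = False                       # lab-only; verify certificates in production

# Authenticate and obtain a session cookie.
login_payload = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}
resp = session.post(f"{APIC}/api/aaaLogin.json", json=login_payload)
resp.raise_for_status()

# Create (or update) a tenant object under the policy universe (uni).
tenant_payload = {"fvTenant": {"attributes": {"name": "Demo-Tenant"}}}
resp = session.post(f"{APIC}/api/mo/uni.json", json=tenant_payload)
resp.raise_for_status()
print("Tenant pushed:", resp.status_code)
```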

Complex Deployment of L4-7 service appliances

This is one of the major advantages of CISCO ACI over traditional networks. ACI introduced the application-centric approach as a new perspective on network design and, following that, uses Service Graph templates along with Policy-Based Redirect (PBR) to fundamentally change the way service appliances are deployed in the network. As I explained previously, we no longer face the limitations and challenges of VLAN stitching and VRF sandwiching, and there are new features that make policy enforcement more efficient. I have another article that discusses this topic in depth; take a look at it if you're interested.

Security concerns.

I discussed the default traffic-forwarding behavior of CISCO ACI in the first part of this article. I'll just add that CISCO ACI significantly increases visibility and the ability to enforce security policy on the Data Center Network (DCN), even with a cloud-native infrastructure, without an extra cost burden.

A Unified Data Center fabric for both physical and virtual networks

In my opinion, this is the top advantage of CISCO ACI. Unlike some other SDN products that rely only on virtual networks and have no idea about the physical infrastructure, ACI covers both the physical and virtual fabric and truly delivers the concept we expect from SDN. (Remember that we have another term, NFV, for something different.) The ACI fabric can easily be extended to either a VM-based or a container-based virtual network, thanks to the seamless integration it provides with a wide range of market-leading hypervisors, cloud orchestrators, and container orchestration systems such as VMware vCenter, Kubernetes, OpenShift, and OpenStack. Ultimately, regardless of having various types of hosts consisting of bare-metal servers, virtual machines, and containers (Pods), they are all connected to one common fabric that provides end-to-end communication, strong security visibility, and high operational speed.

Scalable Fabric across multiple sites

CISCO ACI is an extremely scalable SD-DC solution that you can use for small, medium, or large-scale environments, starting with only 5 or 7 rack units and scaling out as the infrastructure grows. It has a set of solutions that covers almost every scenario: more than one data center, a single large-scale data center, a combination of on-premises and cloud infrastructure, or even services deployed entirely on multiple cloud platforms (such as AWS and Azure). Where you have more than one site, it doesn't matter whether the other DC is a remote site, whether very-low-latency dark fiber exists, or how far apart the sites are. The pictures below show the ACI fabric and policy-domain evolution across different versions.

Loop-Prevention mechanisms

There wouldn't be a loop problem within the CISCO ACI fabric itself; however, external devices such as an L2 switch connected to ACI leaf switches can cause a loop. That's basically because ACI doesn't participate in Spanning-Tree: no Spanning-Tree process runs on any ACI switch in the fabric. Therefore, ACI doesn't generate its own BPDUs but instead forwards STP BPDUs between ports where the same VLAN is allowed. CISCO's best practice for this situation is to use MCP, the MisCabling Protocol. To learn more about how it works, you can read this white paper.

Since CISCO ACI is built on top of a VXLAN fabric, the other challenges have practically been addressed already.

