Tom Herbert, CTO, July 15, 2024
Last time we examined horizontal parallelism; this week we'll take a look at its counterpart: vertical parallelism. Where horizontal parallelism works by processing packets in parallel, vertical parallelism is about processing different parts of a single packet in parallel. The figure below illustrates vertical parallelism.
Example of vertical parallelism for processing packets. There are five threads that can process different protocol layers (the colored blocks) of a packet in parallel
Consider the processing for a simple TCP/IP packet. A TCP/IP packet is composed of two protocol headers, say an IPv4 header followed by a TCP header. Per the protocol layering requirements, the IPv4 header must be processed before the TCP header. However, if we take some liberty, technically we just need to produce the correct effects of in-order processing. Internally, we could process the TCP and IP headers in any crazy order as long as we maintain the outward appearance of their being processed in order. This is the ticket for vertical parallelism: we process different protocol layers in parallel and make it seem like they were processed in order. But what would it even mean to process a network layer header and a transport layer header in parallel? To answer that question, we need to consider the actual processing.
Processing an IPv4 header is mostly about validating the fields in the header. We need to check that the version number is 4, check that the header length doesn't exceed the bounds of the packet, and maybe check that the TTL and Flags are valid. Computationally, the most expensive processing is probably verifying that the Header Checksum is correct. TCP processing is a little more involved: we need to perform similar verifications as for IPv4, including validating that the TCP checksum is correct, but we also need to perform a lookup on the TCP five-tuple to find the Protocol Control Block (PCB) for the connection, and then process the TCP segment per the TCP state machine.
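To make the IPv4 side concrete, here's a minimal sketch in Python of the stateless header checks described above, including the Internet checksum (ones-complement sum of 16-bit words). The function names are illustrative, not part of any SiPanda API:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Ones-complement sum of 16-bit words, folded and complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def validate_ipv4_header(hdr: bytes, packet_len: int) -> bool:
    """Stateless IPv4 checks: version, header length bounds, TTL, checksum."""
    if len(hdr) < 20:
        return False
    version = hdr[0] >> 4
    ihl = (hdr[0] & 0x0F) * 4          # header length in bytes
    if version != 4 or ihl < 20 or ihl > packet_len:
        return False
    if hdr[8] == 0:                    # TTL expired
        return False
    # A header with a correct checksum field sums to zero overall.
    return internet_checksum(hdr[:ihl]) == 0
```

Note that none of these checks touch any connection state, which is exactly why this work is a candidate for running in parallel with the TCP-side validations.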
The key is to determine which processing in IPv4 and TCP can be parallelized. It turns out that's pretty simple to do: any processing that is stateless or doesn't have ordering constraints is parallelizable. So the processing for IPv4 that validates the header can run in parallel with the TCP validations, including the TCP checksum. The part that cannot be parallelized is the TCP state machine processing and updating of the TCP PCB; in other words, the IPv4 processing must complete and have successfully validated the IPv4 header before TCP processing can affect the connection state. This model of parallelism scales to any number of protocol headers, and even covers sub-protocols like TLVs, where the outward effect is that the individual TLVs have been processed in order.
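The split between parallel validation and serialized commit can be sketched with ordinary Python threads. This is only an illustration of the model, not the actual mechanism (which needs far lower overhead); `validate_ipv4`, `validate_tcp`, and `update_tcp_state` are hypothetical stand-ins for the per-layer work:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the per-layer work described in the text.
def validate_ipv4(hdr: bytes) -> bool:      # stateless: version, IHL, TTL, checksum
    return len(hdr) >= 20 and hdr[0] >> 4 == 4

def validate_tcp(seg: bytes) -> bool:       # stateless: bounds, TCP checksum
    return len(seg) >= 20

def update_tcp_state(pcb: dict, seg: bytes) -> None:
    """Stateful TCP state machine step; must run strictly after validation."""
    pcb["segments"] = pcb.get("segments", 0) + 1

def process_packet(ip_hdr: bytes, tcp_seg: bytes, pcb: dict) -> bool:
    # The stateless validations of both layers run in parallel...
    with ThreadPoolExecutor(max_workers=2) as pool:
        ip_ok = pool.submit(validate_ipv4, ip_hdr)
        tcp_ok = pool.submit(validate_tcp, tcp_seg)
        # ...but the stateful commit waits until both validations have
        # completed successfully, preserving the in-order appearance.
        if ip_ok.result() and tcp_ok.result():
            update_tcp_state(pcb, tcp_seg)
            return True
    return False
```

The important property is that the PCB is never touched unless IPv4 validation succeeded, so from the outside the packet appears to have been processed strictly layer by layer.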
Extending vertical parallelism
Applying this model we are able to “find” a whole bunch of fine-grained parallelism. But there’s a catch: we need a mechanism to synchronize processing that cannot be parallelized. For instance, in the TCP/IPv4 example, we need a synchronization mechanism to instruct TCP processing to wait for IP processing to complete before committing changes to the PCB. We have a mechanism tailored to the unique characteristics of serial data processing called dependencies. That will be the topic for the next blog.
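As a rough illustration (not the actual dependency mechanism), a dependency can be thought of as a one-shot token: an earlier stage resolves it with its result, and a later stage waits on it only at the point where it must commit state. Everything here is a hypothetical sketch using stock Python threading primitives:

```python
import threading

class Dependency:
    """One-shot token: an earlier stage resolves it, later stages wait on it.
    (Illustrative only; a real datapath needs much cheaper primitives.)"""
    def __init__(self):
        self._done = threading.Event()
        self._ok = False

    def resolve(self, ok: bool) -> None:
        self._ok = ok
        self._done.set()

    def wait(self) -> bool:
        self._done.wait()
        return self._ok

def ipv4_stage(hdr: bytes, dep: Dependency) -> None:
    # Validate the IPv4 header, then resolve the dependency with the result.
    dep.resolve(len(hdr) >= 20 and hdr[0] >> 4 == 4)

def tcp_stage(seg: bytes, dep: Dependency, pcb: dict) -> None:
    ok = len(seg) >= 20          # stateless TCP checks run without waiting
    if ok and dep.wait():        # wait for IPv4 only before touching the PCB
        pcb["segments"] = pcb.get("segments", 0) + 1
```

Note that the TCP stage does its stateless work before blocking; the dependency gates only the stateful commit, which is where the ordering constraint actually lives.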
Vertical parallelism is a form of fine-grained parallelism. Processing of a protocol layer can be counted in tens of cycles, so we really need a low overhead infrastructure to create parallel threads and handle the synchronization mechanism. In other words, to achieve the benefits we need to architect the system around the solution in a Domain Specific Architecture (we’ll talk about the details of such a solution later). We can also combine horizontal and vertical parallelism in hybrid parallelism (I suppose we could call that diagonal parallelism to be complete ;-) ). This gives us the best of both worlds: horizontal parallelism benefits packet processing throughput, and vertical parallelism optimizes per-packet latency.
Example of hybrid parallelism. Eight packets are being processed. There are two threads for processing in vertical parallelism, and two packets can be processed in horizontal parallelism.
SiPanda
SiPanda was created to rethink the network datapath and bring both flexibility and wire-speed performance at scale to networking infrastructure. The SiPanda architecture enables data center infrastructure operators and application architects to build solutions for cloud service providers to edge compute (5G) that don’t require the compromises inherent in today’s network solutions. For more information, please visit www.sipanda.io. If you want to find out more about PANDA, you can email us at panda@sipanda.io. IP described here is covered by patent USPTO 12,026,546 and other patents pending.