Saturday, December 26, 2009

Network Packet Processing Applications - Example code using RCUs.

Data plane portion of embedded network devices typically does the packet processing of various data paths - Firewall, NAT,  IPsec, routing and bridging.  Close observation of these datapaths follow similar packet processing path.

Data path consists specific packet processing and interfaces with control plane. Structure of the datapath typically consists of:

Packet processing:
  • Packet-In (Get hold of packet).
  • Parse and Extract fields.
  • Check for matching context in 'Context' table.
  • Context would give information to act on the packet.
  • Modify the packet as per the datapath functionality and based on the information in the matching context.
  • Send the packet out.
At various stages, appropriate statistics counters are incremented. If there is no matching context for a given packet, packet is handed over to Control plane.  Control plane, based on its database, creates the context in the datapath.

Datapath-Control Plane interface:

  • Acting on Create/Delete/Modify/Query Context commands from CP
  • Responding to commands after acting on them.
  • Yet times, data paths create unsolicited events to CP
    • Exception packets - Packets that don't have matching context.
    • Local packets - Packets meant for local applications or packets matching context that are specifically marked to send the packets to CP.
    • Logging/Audit events.
In almost all cases, the context database is searched for every packet and context database is updated on behalf of control plane for exceptions.  That is,  search operations are lot higher than the update operations on these databases.  In Multicore environment, this context database is accessed from multiple cores.  To ensure that integrity of the context databases,  mutex locks are normally used.  For databases where the search operations are more frequent than update operations,  read-write locks are used.  As discussed in the previous post on RCUs,  read-write locks are expensive in SMP systems such as Linux.  RCUs are best suited for these databases as rcu_read_lock and rcu_read_unlock are very light and in some CPU architectures, these functions are empty functions. 

RCUs also provide good functionality and supports safe-reference to these contexts when external modules require the reference to the contexts.   As explained in the post, http://netsecinfo.blogspot.com/2009/11/rcus-eliminate-reference-counts.html,  external modules need not store the context reference as pointer. Please note that I have used the term 'record' in that post instead of term 'context'.  You can treat them to be one another same.

With both of these advantages,  I thought it is better to showcase these two techniques using C code (sort of).

Hash table is one of the common data structure used in datapaths.  I will take hash table implementation and showcase how RCUs and safe reference techniques can be applied on hash table.   This hash table implementation is made as generic enough so that multiple datapaths can use these for their own context databases.   By making it generic library, each datapath developer don't have to worry about RCUs and Safe reference techniques and can achieve this using generic library.

My intention of C code is only to showcase the techniques. Code given below is done in a day or so and the code is not tested. There is no guarantee that this code works. 

MTHT Library:

/***

  * DLL Node requires two pointers - Next and Prev. 
  * This structure defines these two pointers.
  * This hash table implementation supports the case
  * where there are two flows for a context and hence
  * there would be two DLL nodes.

***/


typedef struct HashNode_s {
    struct HashNode_s *next;
    struct HashNode_s *prev;
    unsigned int sr_array_index;
}
HashNode_t;

/** HashCommonInfo_t memory is expected to be part of datapath
    application context
    table pointer is required to trace back to table as part of
    RCU callback function.

**/
struct MT_Hash_Table_s;

typedef struct MTHashCommonInfo_s {
     struct rcu_head rcuh;
     HashNode_t  flow1;
     HashNode_t  flow2;
     struct MT_Hash_Table_s *table;
}MTHashCommonInfo_t;

/** Hash Table contains multiple Hash Buckets.
    This structure defines the hash bucket.  In this case
    hash bucket contains the pointer to the first node in the bucket
**/

typedef struct MT_Hash_Bucket_s {
     HashNode_t  head;
}MT_Hash_Bucket_t;

typedef struct MT_Safe_Reference_Node_s {
    unsigned int cookie_val;
    MTHashCommonInfo_t *common_node;
    int array_index;
    struct MT_Safe_Reference_Node_s *next;
}MT_Safe_Reference_Node_t;

/** Hash Table :  Main structure pointing to the hash table instance.
      It consists of buckets (Bucket list and number of buckets)
      lock variable, which is used when the hash table is updated.
      Cookie value and free safe reference list are mainly for safe reference
      implementation.
**/

typedef struct MT_Hash_Table_s {
    int number_of_buckets;
    MT_Hash_Bucket_t *buckets;
    struct futex lock;
    unsigned int  cookie_val;
    unsigned int safe_reference_nodes;
    struct MT_Safe_Reference_Node_s *sr_array;
    struct MT_Safe_Reference_Node_s *free_index_list;
    unsigned int (*match_func)();
    unsigned int (*rcu_app_callback)(); 
}MT_Hash_Table_t;


/** Creates the hash table and returns the pointer to the hash table
**/

MT_Hash_Table_t* MTHT_create_hash_Table( int number_of_buckets,
            int max_number_of_common_nodes, 
            unsigned int (*match)(),  
            unsigned int (*app_rcu_clbk)() )
{

        MT_Hash_Table_t *table;

        /** Allocate hash table **/
        table = calloc(1, sizeof(MT_Hash_Table_t));

        if (table == NULL )
            return (NULL);


        /** Allocate buckets **/

        table->buckets = calloc(number_of_buckets,  sizeof(MT_Hash_Bucket_t));

        if (table->buckets == NULL )
        {
              free(table);
              return(NULL);
        }

    /** Allocate space for safe reference nodes - common information nodes **/

        if ( MTHT_init_sr_array(table, max_number_of_common_nodes) == 0)
        {
              free(table->buckets);
              free(table);
              return (NULL);
        }


         /** Initialize the hash table **/
         /** Each bucket is arranged in circular linked list fashion
             so that the deletion of a hash node in future does not require
             to know the head pointer
         **/
        {
            int ii;
            for ( ii = 0; ii < number_of_buckets; ii++ )
            {
                table->buckets[ii].head.next = table->buckets[ii].head.prev
                     = &table->buckets[ii].head;
            }
        }

        table->number_of_buckets = number_of_buckets;
        futex_init(&table->lock); /** Initialize the lock **/
        table->cookie_val =  1; /*Start the safe reference cookie value at 1*/
        table->match_func = match;
        table->rcu_app_callback = app_rcu_clbk;
        return(table);
}


/**
  MTHT_remove_hash_table assumes that all nodes are freed already.
  This function frees up all the memory allocated during 'create' function.
**/

void MTHT_remove_hash_Table( MT_Hash_Table_t *table)
{
    free(table->sr_array);
    free(table->buckets);
    free(table);
}

unsigned int MTHT_get_free_sr_index(MT_Hash_Table_t *table);

/** This adds the node into hash table. 
    Since ther are two flows, there would bbe two keys passed to it **/

unsigned int MTHT_Add_Node( MT_Hash_Table_t *table,
  MTHashCommonInfo_t *flowsInfo, unsigned int hash_key1,
  unsigned int hash_key2)

{
   unsigned int free_index;

   futex_lock_take(&table->lock);
  

   /** Get free array index **/

   if ((free_index = MTHT_get_free_sr_index(table)) == 0 )
      return 0;
    

   table->sr_array[free_index].cookie_val = table->cookie_val++;
   table->sr_array[free_index].common_node = flowsInfo;
   table->sr_array[free_index].next = NULL;

   /** Adding the flow should be the last step **/
   /** Add flow1 **/
   /** Add flow1 to circular double linked list in the beginning,
       right after head
   **/

   {
      HashNode_t *head;
      HashNode_t *flow;
      head =  &table->buckets[hash_key1 % table->number_of_buckets].head;
      flow = &flowsInfo->flow1;

      flow->sr_array_index = free_index;
     /** DLL Manipulations, but RCU purposes, we need to use
         rcu_assign_pointer to take care of CPUs with weak ordering
      flow->next = head->next;
      head->next = flow;
      flow->prev = head;
      head->next->prev = flow;
     **/

     /** Even though it may not be requierd in some CPUs, this is good
      ** practice. Also, it tells the pointers which are important
      **/
      rcu_assign_pointer(flow->next, head->next);
      rcu_assign_pointer(head->next, flow);
      rcu_assign_pointer(flow->prev,  head);
      rcu_assign_pointer(head->next->prev,flow);


   }

   /** Add flow2 **/
   {
      HashNode_t *head;
      HashNode_t *flow;
      head =  &table->buckets[hash_key2 % table->number_of_buckets].head;
      flow = &flowsInfo->flow2;
      flow->sr_array_index = free_index;

      /** Following statements are converted to RCU
      flow->next = head->next;
      head->next = flow;
      flow->prev = head;
      head->next->prev = flow;
      **/

     /** Even though it may not be requierd in some CPUs, this is good
      ** practice. Also, it tells the pointers which are important
      **/
      rcu_assign_pointer(flow->next, head->next);
      rcu_assign_pointer(head->next, flow);
      rcu_assign_pointer(flow->prev,  head);
      rcu_assign_pointer(head->next->prev,flow);

   }


   futex_lock_release(&table->lock);

   return(1);
}

/**
 * This function removes the node from the hash table
 * hash keys are not required as buckets maintain the double linked list.
**/

void MTHT_RCU_free_fn(struct rcu_head *);

unsigned int MTHT_Remove_Node(MT_Hash_Table_t *table,
    MTHashCommonInfo_t *flowsInfo)
{
 
    unsigned int array_index;

    futex_lock_take(&table->lock);

    array_index = flowsInfo->flow1.sr_array_index;

    /** Invalidate the sr_array node, but don't put in the
        free list yet. Freeing to the linked list do be as part
        of RCU callback.  Otherwise, there is a possibility of
        same index being used by some other context
        Do this before removing the flows.
    **/

    table->sr_array[array_index].cookie_val = 0;
    table->sr_array[array_index].common_node = NULL;


    /** Remove flow1, but don't initialize the pointer to NULL as
     ** other thread will continue to travese the linked list
    **/

    {
       HashNode_t *flow;

       flow = &flowsInfo->flow1;
       /** Make this DLL steps RCU friendly
       flow->next->prev = flow->prev;
       flow->prev->next = flow->next;
       **/

       rcu_assign_pointer(flow->next->prev, flow->prev);
       rcu_assign_pointer(flow->prev->next, flow->next);
    }

    /** Remove flow2 **/
    {
       HashNode_t *flow;

       flow = &flowsInfo->flow2;
       /** Make this DLL steps RCU friendly
       flow->next->prev = flow->prev;
       flow->prev->next = flow->next;
       **/

       rcu_assign_pointer(flow->next->prev, flow->prev);
       rcu_assign_pointer(flow->prev->next, flow->next);
    }


    /** Set up the callback for RCU infrastructure to indicate us when
     ** all threads complete their current scheduled cycle
     **/

    call_rcu(&flowsInfo->rcuh,   MTHT_RCU_free_fn );
    futex_lock_release(&table->lock);
}

/** Seach function.
 ** Up to 4 search keys can be passed
 **/

HashNode_t *MTHT_Search_Node(MT_Hash_Table_t *table, unsigned int hash_key,
               void *search_key1, void *search_key2, void *search_key3,
               void *search_key4, unsigned int (*search)(),
               unsigned int *sf_index, unsigned int *cookie_val  )
{

   HashNode_t *temp, *head;
   unsigned int array_index;
   unsigned int cookie;
   MTHashCommonInfo_t *common;
   rcu_read_lock();

   {

       /** Start from head  **/
       head = &table->buckets[hash_key % table->number_of_buckets].head;
       temp = rcu_dereference_pointer(head->next);
      
       while(temp != head)
       {
           if (search(temp, search_key1, search_key2, search_key3, search_key4)
                   == 1 ) /** Match found **/
           {
             
              array_index = temp->sr_array_index;
              cookie = table->sr_array[array_index].cookie_val;
              common =  table->sr_array[array_index].common_node;
              if((cookie != 0) &&  (common != NULL))
              {
                 /** while traversing the list, deletion process might have
                     started, if you find any of these variable being reset
                     assume that this node is not matched. Since these
                     variable are not reset, break
                 ***/
                 break;  
              }
           }

           temp = rcu_dereference_pointer(temp->next);
       }

       if (temp == head )
       {
          /** No Match found **/
          temp = NULL;
       }
       else
       {
        *sf_index = array_index;
        *cookie_val = cookie;
        
       }

   }

   rcu_read_unlock();

   return(temp);
   
}



void MTHT_free_sr_node(MT_Hash_Table_t *table, unsigned int array_index);

void MTHT_RCU_free_fn(struct rcu_head *ptr)
{
     MTHashCommonInfo_t *flowsInfo;

     flowsInfo = container_of(ptr,  MTHashCommonInfo_t, rcuh);

     futex_lock_take(&flowsInfo->table->lock);
   
     /** Free the array index into free sr list **/
     MTHT_free_sr_node(flowsInfo->table, flowsInfo->flow1.sr_array_index); 

     futex_lock_release(&flowsInfo->table->lock);

     /** Now call applicaton specific callback with common Info pointer
     **/

     flowsInfo->table->rcu_app_callback(flowsInfo);
}

/** Initialize Safe Reference Array (also called Indirection Array)
**/

int MTHT_init_sr_array(MT_Hash_Table_t *table,
              unsigned int max_number_of_common_nodes)
{
    unsigned int temp_index = 1;
    struct MT_Safe_Reference_Node_s *temp;


    table->sr_array =  calloc(max_number_of_common_nodes,
                            sizeof (MT_Safe_Reference_Node_t));

     if(table->sr_array == NULL )
         return 0;

     /** Create the free list of array indices **/

     {
         temp = &table->sr_array[0];
         temp_index = 1;
         table->free_index_list = temp;


         while(temp_index <= max_number_of_common_nodes)
         {
            temp= &table->sr_array[temp_index];
            temp->next =  table->free_index_list;
            table->free_index_list = temp;
            temp_index++;
         }

     }
     return(1);

}

unsigned int MTHT_get_free_sr_index(MT_Hash_Table_t *table )
{
  unsigned int free_index;

  if(table->free_index_list == NULL )
  { 
      return 0;
  }

  free_index =  table->free_index_list->array_index;
  table->free_index_list = table->free_index_list->next;

  return(free_index);
}

void MTHT_free_sr_node(MT_Hash_Table_t *table, unsigned int array_index)
{
  MT_Safe_Reference_Node_t  *node;
 
  node = &table->sr_array[array_index];

  node->next = table->free_index_list->next;
  table->free_index_list = node;
  
}


I hope it helps in understanding how 'RCU's and safe reference techniques can be used using hash table.


No comments: