Sunday, November 8, 2009

RCUs - Eliminate reference counts & increase maintainability

Reference counts in application specific blocks (records) are used for many purposes in Multicore environment. Main reason for reference count variable in records is to ensure that memory block is freed only after every module or core stop using that record. External modules normally increment reference count of the record before storing the pointer in their storage. Also cores using the record also increment the reference count, do processing and decrement once the pointer is no longer required in its processing. This will ensure that any core deleting the record will not free the record to heap prematurely and record gets freed only when its reference count becomes zero. Book keeping of record with reference count has many problems associated with it.

Let us take some dummy packet processing module 'dmod' running in a Multicore SMP operating system. It maintains its records in a hash list - drecList. Its record is called 'drec'. Let us also assume that there is one external module to this module called 'emod'. 'emod' has its own records of type 'erec' maintained in erecList. For discussion sake, let us assume that every time 'drec' is created, it calls a function in 'emod' which inturn stores 'drec' instance in its 'erec' instance. There are two paths for the packets. 'dmod' gets hold of packets from the link layer and also it gets packets from the 'emod'. 'emod' calls the 'dmod' packet processing function with the paacket and 'drec' pointer that was stored in its record.

Traditionally with spinlocks, it would be implemented as follows:

dmod:

'drec' structure contains two variable delete_flag, 'refCount' information. If both delete_flag is true and refCount is 0, then the record gets freed.

CreateDrecInstance()
{
  • Allocate memory for drec.
  • write_lock_bh(drecListLock);
  • Add the drec to the drecList
  • Set the RefCount to 1 and Delete_flag to 0;
  • IncDrecInstance(drec) because it is going to be referred beyond write_lock_bh
  • write_unlock_bh(drecListLock);
  • CreateERecInstance( drec );
  • return 'drec' instance to the caller.
}


DeleteDrecInstance( drec )
{
  • write_lock_bh(drecListLock);
  • Remove from linked list
  • DecRefCount(drec) as it was set to 1 as part of adding to the linked list as part of 'CreateDrecInstance' function.
  • write_unlock_bh(drecListLock);
  • DeleteDrec(drec);
}


SearchDrec(parameters)
{
  • read_lock_bh(drecListLock);
  • Search with parameters for matching drec;
  • if found, IncRefCountIfNotDeleted(drec) - since it is used by caller, reference count needs to be incremented here. If the record is already set for deletion, it is treated as record unmatched.
  • read_unlock_bh(drecListLock);
  • return matching 'drec' instance pointer.
}

IncRefCount(drec)
{
  • spin_lock_bh(drecRefCntLock);
  • increment drec->refCount
  • spin_unlock_bh(drecRefCntLock);
}


IncRefCountIfNotDeleted(drec)
{
  • spin_lock_bh(drecRefCntLock);
  • if ( drec->delete_flag is true) drec = null;
  • increment drec->refCount;
  • spin_unlock_bh(drecRefCntLock);
  • return drec;
}

DecRefCount(drec)
{
  • freeflag = false;
  • spin_lock_bh(drecRefCntLock);
  • decrement drec->refCount;
  • if ( drec->delete is true and drec->refCount is 0 ) freeflag = true;
  • spin_unlock_bh(drecRefCntLock);
  • if ( freeflag is true) Free drec memory to heap.

}

DeleteDrec(drec)
{
  • freeflag = false;
  • Call EmodDereference(drec); Call external module to remove reference to this. Note that this may not remove reference immediately in this function call. This function might generate an event and come out. That event eventually in some other thread context might get processed and reference to drec would be removed.
  • spin_lock_bh(drecRefCntLock);
  • set drec->delete to true;
  • if (drec->delete is true and drec->refCount is 0 ) freeflag = true;
  • spin_unlock_bh(drecRefCntLock);
  • if ( freeflag is true) Free drec memory to heap;
}


DmodPacketProcess(pkt)
{
  • Get search parameters from 'pkt'.
  • SearchForDrec(parameters);
  • if ( drec is null) CreateDrecInstance(parameters);
  • DoPktProcessing(drec, pkt);
  • DecRefCount(drec);
}

It is expected that 'emod' function while storing the 'drec' instance pointer does with its own spin_lock and read by incremeting 'drec' instance pointer.

There are two spin locks are taken on per packet basis - One list lock and another to increment reference count. As you might have known, locks are expensive even if there is no contention. Number of instructions that get executed as part of lock/unlock functions are significant.

RCUs alone will work fine and can be used to eliminate list locks and also to eliminate the need for reference counts as long as no external modules store the records by reference. In earlier example, there is 'emod' which stores the drec pointer. Since external module may not remove the reference to drec in the current processing cycle, reference counts are required. Using arrays with magic numbers for external module purposes will eliminate the need for reference counts.

Reference count mechanisms for one major reason of many bugs in Multicore environment. Elimination of need for having reference count would be welcome by many software developers. It is not a good practice to let external module store the reference (pointer) to the module records. Rest of this article describes one technique.

RCUs with Array level indirection would eliminate the reference counts in records. Let me take 'dmod' to explain this concept. In addition to maintaining the hash list it also needs to have one array of size 'number of records' that are possible in this module. Each array element has 'drec pointer', 'magic number'. External modules are not expected to store the pointer to drec at any time. They are expected to store index to the array and the magic number. External modules are expected to check the validity of array element by matching the magic number stored with the magic number in the array element. If both match, then it can assume that 'drec' pointer is valid in the array and can use the pointer for rest of processing.

struct RCU_indirection {
unsigned int magic; -- Value 0 means it is unused, any value indicates rec is valid.
void *rec;
} ;

dmod:

struct RCU_indirection drecArray[MAX_DRECS];

'drec' need not contain 'refCount' and 'delete' flag. But one variable 'index' is required.

Have global magic count : drecMagic = 1;

CreateDrecInstance()
{
  • Allocate memory for drec.
  • write_lock_bh(drecListLock);
  • Add the drec to the drecList
  • Add the drec and drecMagic in a free array element.
  • Increment drecMagic
  • Put the index into array element in 'drec'
  • write_unlock_bh(drecListLock);
  • CreateERecInstance( index, magic );
  • return 'drec' instance to the caller.
}


DeleteDrecInstance( drec )
{
  • write_lock_bh(drecListLock);
  • Remove from linked list
  • Set the magic to zero in the array element pointed by index in drec.
  • Set the drec to NULL in the array element.
  • wmb();
  • write_unlock_bh(drecListLock)
  • call_rcu ( ) : Callback function associated with it will free the drec.
}


SearchDrec(parameters)
{
  • Search with parameters for matching drec
  • return matching 'drec' instance pointer.
}

DmodPacketProcess(pkt)
{
  • rcu_read_lock_bh();
  • Get search parameters from 'pkt'.
  • SearchForDrec(parameters);
  • if ( drec is null) CreateDrecInstance(parameters);
  • DoPktProcessing(drec, pkt);
  • rcu_read_unlock_bh();
}

When 'emod' calls 'dmod' packet processing function, it should do following:

{
  • rcu_read_lock_bh()
  • if ( drecArray[erec->drecIndex].magic == erec->magic) drec=drecArrayp[erec->drecIndex].drec
  • DoPktProcessing(drec, pkt);
  • rcu_read_unlock_bh();
}

As you see it is simple and eliminate two locks on per packet basis. Even if there are more external modules, this scheme works. If record pointers are required to be stored in any other data structures within the module, then too there is no need for any reference counts as long as those data structures also manipulated within the same spinlock ( 'listlock' in example above).

No comments: