Wednesday, April 30, 2008

Firewall session inactivity configuration - TR-069 Support

Firewalls maintain a session entry for each 5-tuple connection. Session entries are created upon the first packet of the connection. TCP sessions are removed when a TCP RST packet is observed or when ACKs for TCP FINs are observed in both directions. Sessions are also removed when there is no traffic for some period of time; this period is called the inactivity timeout period. Non-TCP sessions, such as UDP and ICMP sessions, are removed only due to inactivity, as they have no connection boundaries.

There are multiple application protocols (services) running on top of TCP and UDP, such as FTP, Telnet, SSH, HTTP, LDAP, RADIUS, SMTP, POP3 and IMAP. Some application protocols are interactive and many are non-interactive. Telnet, SSH and FTP are interactive; HTTP, HTTPS and many others are non-interactive. In non-interactive applications, once the connection is made to the server there is no user input until the connection is terminated; all user input is taken before the connection is made. In interactive applications, user input is taken after the connection is established. The inactivity timeout for non-interactive protocols can be on the order of tens of seconds. Since interactive applications wait for user input, a short inactivity timeout may remove the session if the user provides no input for a long duration, so interactive application protocols require a longer inactivity timeout. If a long inactivity timeout is configured for non-interactive protocols, there is a danger of keeping stale sessions around for a long time, which may lead to session entry exhaustion. So there is a need to provide different inactivity timeout values for different protocols.

Please refer to the document on maximizing firewall availability. One of the techniques suggested there is 'INIT flow timer optimization': keep a separate inactivity timeout during the connection establishment phase (the TCP 3-way handshake). This value can be much smaller than the inactivity timeout needed after the connection is established.

Keeping both of the above points in mind, the session inactivity configuration includes the following (a sketch of how these values might be applied appears after the list):
  • TCP pre-connection inactivity timeout: Inactivity timeout during the connection establishment phase.
  • UDP pre-connection inactivity timeout: UDP has no connection establishment phase. For this discussion, the UDP connection establishment phase is considered complete once at least one packet has been seen in each direction of the connection (client to server and server to client).
  • TCP inactivity timeout: Inactivity timeout value after the TCP connection is established.
  • UDP inactivity timeout: Inactivity timeout value after the UDP session is established.
  • Generic IP inactivity timeout: Inactivity timeout value for non-TCP and non-UDP sessions.
  • TCP FIN timeout: Inactivity timeout after TCP FINs are observed in both directions.
  • Application protocol specific timeout records. Each record contains:
    • Application protocol information: Protocol and port.
    • Inactivity timeout value: This value is used after the connection is established. If there is no matching application protocol specific record, then the TCP, UDP or generic IP inactivity timeout value is used.
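
As an illustration, here is a minimal sketch in C of how a firewall might pick the inactivity timeout for a session from the values above. The structure fields and the app_timeout_lookup() helper are assumptions made for this sketch, not taken from any particular product.

    #include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

    struct timeout_cfg {
        unsigned int tcp_preconn, udp_preconn;   /* pre-connection timeouts */
        unsigned int tcp, udp, ip, tcp_fin;      /* post-establishment timeouts */
    };

    struct session {
        unsigned char  proto;              /* IP protocol of the session */
        unsigned short dport;              /* server-side port */
        int established;                   /* TCP: 3-way handshake completed */
        int fin_seen_both_dirs;            /* TCP FIN seen in both directions */
        int seen_both_dirs;                /* UDP: packet seen in both directions */
    };

    /* Hypothetical lookup of an application-protocol-specific record;
     * returns 0 when no record matches. */
    unsigned int app_timeout_lookup(unsigned char proto, unsigned short port);

    unsigned int pick_inactivity_timeout(const struct timeout_cfg *cfg,
                                         const struct session *s)
    {
        unsigned int app = app_timeout_lookup(s->proto, s->dport);

        if (s->proto == IPPROTO_TCP) {
            if (s->fin_seen_both_dirs)
                return cfg->tcp_fin;        /* TCP FIN timeout */
            if (!s->established)
                return cfg->tcp_preconn;    /* still in 3-way handshake */
            return app ? app : cfg->tcp;
        }
        if (s->proto == IPPROTO_UDP) {
            if (!s->seen_both_dirs)
                return cfg->udp_preconn;    /* no reply seen yet */
            return app ? app : cfg->udp;
        }
        return cfg->ip;                     /* generic IP timeout */
    }
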
TR-069 based configuration:
  • internetGatewayDevice.security.VirtualInstance.{i}.firewall.serviceInactivityTimeout P
    • tcpPreConnTimeOut: RW, Unsigned Int, Default : 10 seconds - Value in seconds.
    • udpPreConnTimeOut: RW, Unsigned int, Default: 10 seconds - Value in seconds.
    • tcpTimeOut: RW, Unsigned Int, Default: 60 seconds - Value in seconds.
    • tcpFinTimeOut: RW, Unsigned Int, Default: 10 seconds - Value in seconds.
    • udpTimeOut: RW, Unsigned Int, Default: 60 seconds - Value in seconds.
    • IPTimeOut: RW, Unsigned Int, Default: 60 seconds - Value in seconds.
    • internetGatewayDevice.security.VirtualInstance.{i}.firewall.serviceInactivityTimeout.applicationTimeout.{i} PC
      • name: String(32), RW, Name of the record. Once the record is created, this can't be changed.
      • Description: String(128), RW, Optional - Description of the record.
      • protocol: String(8), RW, Mandatory - Takes values "tcp", "udp"
      • port: String(8), RW, Mandatory - Takes port value
      • inactivityTimeout: Unsigned Int, RW, Mandatory - Inactivity timeout in seconds.

Tuesday, April 29, 2008

Software Timers in Network infrastructure devices

Most networking functions in network infrastructure devices require timers on a per-session basis. Some examples:
  • Firewall and server load balancing (SLB) devices require an inactivity timer for each session they create. Once the session is established, activity on the session is observed; if there are no packets for a certain time (the inactivity timeout), the session is removed. In addition, these functions also remove sessions upon observing TCP RST or TCP FIN packets. Non-TCP sessions are removed only upon inactivity.
  • IPsec VPN, ARP and QoS functions start a timer for each session/flow. These timers are typically called 'life expiry' timers and almost always expire. Once they expire, the sessions are removed or revalidated. In case of QoS, most of these timers are periodic timers that are restarted immediately after expiry.
  • Protocol state machine timers: These timers are started on a per-session basis, sometimes more than one timer per session, during session establishment as part of the signalling protocol. They are started when a message is expected from the peer and stopped upon receiving that message. So, in normal cases, these timers are always stopped before they expire.
Operating systems like Linux provide a timer module. At a high level, the typical API functions provided by a timer module are (a usage sketch follows the list):
  • add_timer: Starts the timer with the given timeout value. This function also takes the application's callback function pointer and a word-sized callback argument; the function pointer is called with the callback argument when the timeout occurs.
  • del_timer: Stops a timer that was started earlier. Many applications expect that once the timer is stopped, the application callback is never called.
  • mod_timer: Restarts the timer with a new timeout value.
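
To make this concrete, here is a minimal sketch using the classic Linux kernel timer_list API of that era, showing how a per-session inactivity timer might be armed and re-armed from the last-packet timestamp. The session structure layout and the session_destroy() helper are assumptions made for this sketch.

    #include <linux/timer.h>
    #include <linux/jiffies.h>

    struct session {
        struct timer_list inactivity_timer;
        unsigned long last_pkt;        /* jiffies when the last packet was seen */
        unsigned long timeout;         /* inactivity timeout, in jiffies */
    };

    static void session_destroy(struct session *s);   /* hypothetical cleanup */

    static void session_inactivity_expire(unsigned long data)
    {
        struct session *s = (struct session *)data;

        /* If traffic was seen since the timer was armed, push the timer out
         * instead of removing the session. */
        if (time_before(jiffies, s->last_pkt + s->timeout))
            mod_timer(&s->inactivity_timer, s->last_pkt + s->timeout);
        else
            session_destroy(s);
    }

    static void session_arm_timer(struct session *s, unsigned int secs)
    {
        s->timeout = (unsigned long)secs * HZ;
        setup_timer(&s->inactivity_timer, session_inactivity_expire,
                    (unsigned long)s);
        mod_timer(&s->inactivity_timer, jiffies + s->timeout);
    }
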

Some other characteristics and requirements of networking functions with respect to timers are:
  • Any device targeted at enterprises, data centers and service providers will have sessions numbering in the millions, and hence millions of timers are required. I have observed:
    • 2M sessions in firewall and SLB devices.
    • 50K IPsec VPN tunnels.
    • 500K QoS sessions (one session per subscriber and type of traffic).
  • Session establishment rate is very important in addition to throughput. Some firewall and server load balancing devices support connection rates in the range of 250K to 500K connections/sec, and IPsec VPN functions in the high-end market segment require SA establishment rates around 20K/sec. Hence it is very important that very few CPU cycles are used in adding and removing timers.
  • Firing the timer (timer expiry) happens most of the time. Let us look at some scenarios.
    • As discussed above, timers in some networking functions such as IPsec VPN, ARP and QoS always expire.
    • In case of firewall and SLB, it may appear that expiry does not happen most of the time. For non-TCP sessions, timer expiry always happens. Even for TCP, timer expiry is not uncommon: many TCP connections are non-interactive, so the inactivity timeout is normally up to 60 seconds. In lab environments, connection rate and hybrid (connection rate with realistic payload) measurements always result in the timer being stopped by TCP RSTs or FINs; that is, each connection ends before the inactivity timeout expires. In real workloads, connections can stay beyond the inactivity timeout. In those cases timers expire and are restarted if there was activity; activity is normally determined from the timestamp of the last processed packet.
  • Jitter is one important item these applications should keep to a minimum. That is, a core should not do too much work while traversing the timers to check for expiration, nor fire too many timers in one loop. If these issues are not taken care of, incoming packets will see large latency and hence jitter. The timer subsystem is expected to divide the work evenly, giving uniform latency and low jitter.
  • High-end network devices normally have a large amount of memory, but they also support millions of sessions, and timers are just one part of the system. The timer mechanism should not consume too much memory. If the system supports Y million sessions and each timer takes X bytes, the memory consumed should not exceed X*Y by much. It is acceptable to expect some minimal amount of memory beyond this for housekeeping, but the total should not be a multiple of the X*Y product.
    • Some implementations, upon deleting a timer, keep some state information until the timer would have expired. I guess these implementations do this to improve performance, for example to avoid locks between add and delete. Though this state information is smaller than the timer block, it can still lead to a very large memory overhead. Say the state information is 8 bytes, the system supports 250K connections/sec with a 2M session capacity, and the inactivity timeout is 60 seconds. Then in 60 seconds, 250K*60 timers are added and deleted, and keeping 8 bytes until expiry requires an additional 250K*60*8 = 120 Mbytes of memory, on top of the memory required for the active timers of 2M sessions. That is unacceptable; think of the case where the inactivity timeout is more than 60 seconds.
Based on the above, all timer operations (addition, deletion and expiry) need to perform well with low jitter, without blowing up the memory requirement. On multicore processors, it is also a requirement to avoid locks as much as possible to get linear performance with the number of cores.

The newer Linux timer implementation, based on a 'cascaded timer wheel', satisfies many of these requirements. It does not take care of two items: minimizing jitter and avoiding locks.

Linux implements multiple timer wheels with different granularity. The lowest-level wheel has jiffy granularity, and the granularity of each next-level wheel is the previous level's granularity multiplied by the number of buckets in the previous wheel. For example, if the first-level wheel has 256 buckets, then each bucket of the second-level wheel has a granularity of 256 jiffies. When a timer is started, it is placed in the appropriate wheel. Each timer wheel has a running index. Every time the jiffy interrupt occurs, the timers in the bucket indexed by the running index of the lowest-level wheel are expired and the index moves to the next bucket. When the index wraps around to the beginning of the wheel, the timers in the current bucket of the next wheel are moved down into this wheel and the running index of the next wheel is incremented. While moving timers from the next level to the current level, they are distributed into buckets based on their timeout values. Timers are kept in doubly linked lists, so adding and removing timers is very fast.
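
To illustrate the idea, here is a simplified two-level sketch of bucket placement (this is not the actual Linux code; the macros and names are made up for the sketch):

    /* Simplified two-level cascaded wheel, 256 buckets per level.
     * Level 0 has 1-jiffy granularity; level 1 has 256-jiffy granularity. */
    #define WHEEL_BITS 8
    #define WHEEL_SIZE (1 << WHEEL_BITS)
    #define WHEEL_MASK (WHEEL_SIZE - 1)

    /* Decide where a timer expiring at 'expires' goes, given current 'now'
     * (both in jiffies). */
    static void wheel_bucket(unsigned long expires, unsigned long now,
                             int *level, unsigned int *index)
    {
        unsigned long delta = expires - now;

        if (delta < WHEEL_SIZE) {
            *level = 0;
            *index = expires & WHEEL_MASK;                  /* fine-grained bucket */
        } else {
            *level = 1;
            *index = (expires >> WHEEL_BITS) & WHEEL_MASK;  /* coarse bucket */
        }
        /* When level 0 wraps around, the level-1 bucket at its running index
         * is emptied and its timers are redistributed into level 0. */
    }
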

It works fine except for the case where it needs to move timers from the next wheel into the current wheel and distribute them. This case may not occur if the timers are stopped (deleted) first, but in some workloads timers are not stopped before they expire. Note that the move may involve a large number of timers, which adds latency to the packets being processed; if it takes long enough, it may even result in packet drops. I do have some unproven ideas and will get back once I crystallize them.

The Linux implementation also uses locks to protect the bucket lists, as different cores may add and remove a given timer. Locks can be avoided with some simple changes: maintain timer wheels on a per-core basis, and if a timer is being stopped by another core, indicate this to the original core. This ensures that only the core that started a timer deletes it from its bucket, and that the same core fires the application callback upon expiry. To pass these deleted-timer indications to the original core without locks, ring-based arrays (one array per core) can be used.
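
A minimal sketch of such a per-core indication ring follows (single producer, single consumer, i.e. one ring per pair of cores). The names, sizes and memory-barrier placement are illustrative assumptions, not taken from any existing implementation; smp_wmb()/smp_rmb() are the kernel's barrier primitives.

    #define DEL_RING_SIZE 4096                 /* power of two */

    struct my_timer;                           /* per-core wheel timer block */
    static void local_wheel_remove(struct my_timer *t);  /* hypothetical unlink */

    struct del_ring {
        struct my_timer *slots[DEL_RING_SIZE];
        volatile unsigned int head;            /* written only by requesting core */
        volatile unsigned int tail;            /* written only by owning core */
    };

    /* Called by a core that wants to stop a timer owned by another core. */
    static int del_request(struct del_ring *r, struct my_timer *t)
    {
        unsigned int head = r->head;

        if (head - r->tail == DEL_RING_SIZE)
            return -1;                         /* ring full; caller must retry */
        r->slots[head & (DEL_RING_SIZE - 1)] = t;
        smp_wmb();                             /* publish slot before head */
        r->head = head + 1;
        return 0;
    }

    /* Called by the owning core, e.g. once per jiffy, before expiry processing. */
    static void del_drain(struct del_ring *r)
    {
        while (r->tail != r->head) {           /* sees head published by producer */
            struct my_timer *t;

            smp_rmb();                         /* read slot only after seeing head */
            t = r->slots[r->tail & (DEL_RING_SIZE - 1)];
            local_wheel_remove(t);             /* unlink from the local bucket */
            r->tail = r->tail + 1;
        }
    }
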

    Tuesday, April 15, 2008

    Network security computing with Session parallelization - Development tips

    Until 2005, the performance of network security was a function of processor power. Processors just kept getting better and faster, and network security performance improved with them. In the recent past, processors are not getting faster; instead, more processors (cores) are being added to the chip. I guess physics has limited how much faster processors can get. Manufacturers are concentrating on increasing the gate count in their chips rather than increasing the clock rate, and chip vendors are adding more processing cores to one die. These chips are called 'Multi Core Processors' (MCPs).

    Before any application can take advantage of MCPs, the operating system must support multiple cores. Linux and other operating systems have supported MCPs for the last few years; Linux calls this feature 'SMP' (Symmetric Multi Processing). In this mode, all cores share the same memory, i.e. code and data segments are the same across all cores. Since any core can access any part of the code and data, it is necessary to protect critical data by serializing access to the code segments that touch it.

    Applications are expected to take advantage of the multiple cores in the system. Now the mantra for applications is parallelization. There is no denying that parallelizing software is not easy; it is time consuming and takes significant investment.

    One serialization technique is to stop cores from executing critical code while another core is executing that code. This is typically done in Linux kernel space using spin locks. Too much serialization reduces performance, and the application's performance does not scale well with the number of cores. At the same time, critical data and data structures (such as binary trees, linked lists, etc.) must be protected by serialization.

    Other Difficulties in parallelization of code are:
    • Parallelization is subject to many errors such as deadlocks and race conditions, and more importantly, it is difficult to debug.
    • Problems are hard to reproduce, so problem identification takes longer development cycles.
    • Maintainability of the code: The code might initially have been written with all the parallelization problems in mind, but as time progresses, different developers work on and maintain the code. They may not be aware of the parallelization issues and make mistakes.
    • Single CPU based test coverage is not sufficient. Updated test suite should be able to find as many race conditions as possible.

    Any parallelization approach should make it simple to develop and maintain the code while remaining efficient. 'Session parallelization' is one approach that makes this task simple for networking applications running in kernel space.

    Network security primarily consists of firewall, IPsec VPN, intrusion prevention, URL filtering and anti-virus/spam/spyware functions. To improve performance, firewall, IPsec VPN and in some cases IPS functions are typically run in Linux kernel space. Each security function has a notion of a session: for firewall and IPS, sessions are 5-tuple flows, and for IPsec VPN, the session is an 'SA bundle'. Network security functions maintain state information within the sessions. The state sometimes changes on a per-packet basis, and the processing of a new packet depends on the current state. There are two ways to ensure packet synchronization with respect to this state. One way is to serialize the code paths that update and check the state information. If there are multiple places in the packet path where state is checked and modified, then there are multiple serializations, and the performance impact grows with their number.

    Another way is 'session parallelization'. In session parallelization, packet synchronization happens at the session level. At any time, only one core owns a session. If any other core receives a packet, the packet is queued on the session for later processing by the owning core. If no core owns the session, the core that received the packet starts processing it after stamping its ownership. Once the core has processed the packet, it checks whether any packets are pending; if so, it processes them, and if not, it disowns the session. With this method, after session identification the rest of the packet processing does not need to take any locks for serialization. It not only improves performance but is also less error prone. Having said that, locking can't be avoided in some cases such as session establishment and inter-session state maintenance.
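
    A simplified sketch of the ownership and backlog idea follows. The small per-session spinlock here protects only the owner flag and the backlog queue, not the per-session state; process_packet() and the field names are assumptions made for this sketch.

        #include <linux/spinlock.h>
        #include <linux/skbuff.h>

        struct fw_session {
            spinlock_t lock;                 /* guards 'owner' and 'backlog' only */
            int owner;                       /* owning cpu id, or -1 when unowned */
            struct sk_buff_head backlog;     /* packets queued by non-owning cores */
            /* ... per-session state, touched only by the owning core ... */
        };

        static void process_packet(struct fw_session *s, struct sk_buff *skb); /* hypothetical */

        static void session_input(struct fw_session *s, struct sk_buff *skb, int cpu)
        {
            spin_lock(&s->lock);
            if (s->owner != -1) {                    /* another core is processing */
                __skb_queue_tail(&s->backlog, skb);
                spin_unlock(&s->lock);
                return;
            }
            s->owner = cpu;                          /* claim ownership */
            spin_unlock(&s->lock);

            do {
                process_packet(s, skb);              /* no locks on session state */

                spin_lock(&s->lock);
                skb = __skb_dequeue(&s->backlog);
                if (!skb)
                    s->owner = -1;                   /* nothing pending: disown */
                spin_unlock(&s->lock);
            } while (skb);
        }
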

    During session establishment, access to the configuration information is required. A configuration update can happen in one core's context while session establishment happens in another core's context. Data integrity must be maintained so that a core does not read wrong configuration information, and data structure integrity must be maintained as well. Think of a case where the configuration engine is removing a configuration record from a linked list while the session establishment code is traversing that list; if care is not taken, the session establishment code might follow an invalid pointer and hit a null pointer exception. Since the session establishment phase requires only read access to the configuration structures, the locks can be taken in read-only mode. In read-only mode, multiple cores can establish different sessions simultaneously, so session establishment performance is not affected. Since configuration updates are infrequent, the use of read-only locks does not degrade performance.
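
    For example, read-mostly configuration can be protected with a reader/writer lock, as in this sketch (policy_list_find() and policy_list_replace() are hypothetical helpers):

        #include <linux/spinlock.h>

        struct flow;
        struct policy;
        struct policy *policy_list_find(const struct flow *f);     /* hypothetical */
        void policy_list_replace(struct policy *newp);              /* hypothetical */

        static DEFINE_RWLOCK(cfg_lock);          /* protects the configuration lists */

        struct policy *lookup_policy_for_session(const struct flow *f)
        {
            struct policy *p;

            read_lock(&cfg_lock);                /* many cores may hold this at once */
            p = policy_list_find(f);
            read_unlock(&cfg_lock);
            return p;
        }

        void update_policy(struct policy *newp)
        {
            write_lock(&cfg_lock);               /* rare: configuration change */
            policy_list_replace(newp);
            write_unlock(&cfg_lock);
        }
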

    Another area where locks must be taken during packet processing is where inter-session state is maintained, such as statistics counter updates and session rate control state. Since many cores work on different sessions simultaneously, inter-session state must be protected with write locks around the code that manipulates it. For good performance, it is very important that the amount of code executed under these locks is as small as possible.

    Though there are cases where locks are used, such as inter-session state and configuration data access, these instances are few; session state updates and accesses are far more common. As long as session parallelization is done, the SMP problem is manageable. Of course, one could argue that a given session's performance is limited by the power of one core. But in a typical environment there are many sessions, so system throughput will be proportional to the number of cores.

    Cloud Computing and security

    Cloud computing has become a popular term in the recent past. Cloud computing providers have a large number of interconnected cloud servers. They provide services to end users: renting a virtual server with the required CPU power, storage, and specialized services such as PHP, Java or Ruby on Rails based servers.

    Since these servers are outside your offices, very good Internet connectivity is required; cheap bandwidth and reliable connectivity favor the cloud computing model. From the cloud computing provider's perspective, the model is becoming possible with very high speed, high density multicore processors and virtualization, with its inherent ability to provide isolation and run multiple services on one physical machine.

    The advantages of cloud computing for users (enterprises) are the same advantages you get with data centers, such as:
    • Reduce system and network infrastructure administration burden.
    • Save on Electricity cost by selecting data center with lower cost of electricity.
    • Save on real estate.
    Cloud computing provides additional advantages such as
    • Handle peak loads by provisioning computing power with a click of a button.
    • Isolation of application servers from physical machines.

    There are some concerns for which the answers are not yet mature.

    • Who is going to take care of the security aspects of user applications? Is it the cloud computing provider, or is it the responsibility of the users?
    • Who monitors the vulnerabilities of the different applications and takes care of patching them?
    • Will any visibility of exploits and attacks be provided to the user?
    • Who takes responsibility for provisioning the security infrastructure? Who takes responsibility for tuning IPS/IDS signatures?
    • Who takes responsibility for compliance requirements such as PCI DSS?
    • Who takes responsibility for auditing systems, applications, etc.?
    • If you have remote users that need access to these services, what kind of on-the-wire security is required, and who provides the VPN connectivity?

    When a cloud computing provider offers specialized services such as email, I feel it is the provider's responsibility to check for vulnerabilities, harden and patch the systems, check for spam and prevent phishing attacks. Do they do that today? What kind of guarantees are provided?

    When cloud computing providers offer generic services such as renting out a virtual server, I have a feeling that the responsibility of securing them may fall on the users' shoulders. Now questions arise such as:

    - Do cloud computing SPs provide a *cloud security* service?
    - Do SPs give users the flexibility to select their own security vendor?
    - Do SPs expect the security appliance to be provisioned as a virtual service? If so, what kind of virtualization technology do SPs provide?
    - Do SPs provide the network visibility for users to link the security service with their application servers?


    It is not possible for cloud computing providers to provide security for applications they don't know. Many security problems are specific to each application, and enterprises typically have their own applications in addition to standard ones. As the questions above show, there is a lot of tuning of security applications, such as adding new signatures to the IPS, that happens over time. So it makes sense for cloud computing providers to give users the flexibility to create their own security environment. Enterprises also typically provide remote secure connectivity for their employees to access critical services. Securing enterprise services involves not only exploit detection, tuning, hardening and patching, but also providing VPN service to employees.

    I have a feeling that, like the way computing services are provided in the cloud, security services will also be provided by cloud computing providers. Cloud security service provisioning involves not only the security application, but also the connectivity between the security service and the application servers. To provide complete security, it may involve provisioning multiple security services such as a VPN service, IPS service, firewall service and web application firewall service, or it could be one UTM service.

    If service providers are going to give end users the flexibility to provision their choice of security application, then SPs would offer the choice of running virtual security appliances.

    At times, a service provider may not want to offer flexibility in the choice of security application and may instead provide security as a specialized service of its own. In that case, SPs may go for mega security appliances supporting multiple instances, with each instance provisioned for one customer.

    Let us see how this market turns out.

    But in both cases, the need for computing power for security services is very high. Multicore processors are going to fill this gap.

    Central Management Systems - Critical Missing features

    Central Management Systems/Network Management Systems are used to configure and monitor multiple network elements from a central location. Some features supported by many CMS solutions are:
    • Ability to configure multiple network elements.
    • Ability to collect log information.
    • Ability to analyze logs and generate periodic reports.
    • Ability to monitor critical events.
    • Ability to issue diagnostic commands to network elements and getting the results.
    • Ability to allow multiple administrators to use CMS solutions - Role based Management.
    • Ability to view the Audits - Who changed what and when.
    Architecturally, CMS solutions contain:
    • UI console: Allows users to configure network elements and also allows users to view the reports/analyze logs.
    • Policy Server with Policy repository: Stores the configuration for each network element.
    • Element Adapter: Converts configuration information into a device-understandable format and sends it via the protocol supported by the devices.
    • Log Collector and Report Generator: Collects logs from network elements and also can generate reports.
    To provide scalability, i.e. to support a large number of network elements, multiple element adapters and log collectors are used, each supporting a fixed number of network elements. Typically one policy server is used, but multiple UI consoles may be in use at any time.

    There are some pieces which CMS vendors tend to ignore but which, in my view, are very important. One of them is the 'configuration session'. Traditionally, each administrative change to the configuration results in a command for the device; that is, if the administrator changes the configuration X times, then X different commands are prepared for the device. If the administrator changes a rule (say a firewall rule) and then un-modifies the change, two commands are generated. These commands are eventually sent to the device in the order of the changes. Many times this is not a problem, but it can be a problem in instances where:
    • The first modification, when applied, changes the state of the device, for example by removing some run-time state. The second command, which undoes the previous change, does not bring back the run-time state that was destroyed.
    • Sometimes the first modification might even stop traffic in the network element.
    Administrators should be given the chance to make configuration changes (additions, modifications or deletions), review them and commit them. Only when the changes are committed should the consolidated changes be sent to the device. Configuration modifications and device commands must be de-linked: when the commit is issued, commands should be generated from the configuration differences. Some examples help in understanding the feature better.
    • The administrator added a rule, changed his mind, and deleted the rule before committing. In this case, no command should be generated for this rule.
    • The administrator deleted a rule, changed his mind, and revoked the configuration change. In this case, no command should be generated for this rule.
    • The administrator changed a few parameters of a rule and then changed them again to newer values. Only the new values should go to the device.
    This feature requires following support from CMS solutions:
    • A configuration session should have a start and an end. The end can be a complete revoke or a commit.
    • At any time, the administrator can check for errors during a configuration session; the CMS solution is expected to provide a 'Validate' operation.
    • Checking for duplicate configuration sessions: at any time, only one configuration session is allowed. Note that multiple users can still view the committed configuration. If a new configuration session is started, the CMS solution should warn the user that a configuration session from 'user' at 'date&time' from 'ipaddress' is already in progress. It can give options such as 'take over the existing configuration session' or 'start from scratch by revoking the previous session'.
    • At the time of 'commit', the CMS solution can take a new version string, prepended with the date & time.
    • At the time of 'commit', the CMS solution should generate the commands to be sent to the device. It should do this by reading the information from the configuration session, not by replaying the sequence of actions the user performed.
    • A new configuration session should be allowed even if commands generated from the previous configuration session have not yet been synchronized with the device.
    • Clearly show in the UI which parts of the configuration belong to the current configuration session.

    In addition, CMS solution should support:
    • Listing down the configuration versions of a particular network element.
    • Facility to migrate the network elements to previous versions.
    • Facility to remove very old configuration versions to preserve the space in database.

    Botnets using fast flux and double flux techniques - IPS devices

    Access to botnet servers, malware servers and servers serving other objectionable content from corporate networks is being thwarted by IP blacklists in IPS/IDS and UTM devices. Cyber criminals have started using single-flux and double-flux DNS techniques to make this kind of blacklisting ineffective. These techniques change the IP address of malicious servers very frequently; in some cases IP addresses are changed every 5 minutes. Please check this link to get more information about fast flux and double flux techniques.

    Criminals take advantage of compromised servers to act as redirection servers. The domain name of the criminal servers resolves to these compromised servers. When innocent users connect to this domain name (through social engineering attacks), the HTTP request lands on a redirection server. Redirection servers get the content from the original malicious server (the Honeynet white paper calls it the mothership server) and serve it to the innocent users. The list of compromised servers returned in a given DNS response is determined by many factors, such as whether the compromised redirector is online, the bandwidth of its Internet link, etc. Since the cyber criminals run the DNS server along with the malicious content server, they control which IP addresses are sent in the DNS response. This technique is called fast flux because the IP addresses of the domain name registered by the criminal change very often.

    From the attack description provided in the Honeynet link, cyber criminals rent botnets for redirection servers. Botnet owners compromise unhardened victim machines for their nefarious activities. It appears that some botnets have thousands of compromised systems; many home users' PCs are infected and become part of botnets.

    Since the IP addresses of the domain name keep changing, the traditional blacklisting technique that uses IP addresses is ineffective, and it also becomes difficult to identify the mothership servers. To thwart this kind of attack, security developers also started creating blacklists of DNS servers. This thwarting technique also depends on IP addresses.

    Now attackers have started using the double flux technique, where the DNS server IP addresses also change very frequently. This requires changing the name server IP addresses at DNS registrars or resellers. As some registrars make this facility available through programming interfaces (web-based interfaces), this is being automated by cyber criminals (I need to verify this statement). Some service providers are lax and don't follow guidelines for checking credentials while registering domain names or while changing name server IP addresses.

    Since two kinds of IP addresses are changed - the name server IP addresses and the DNS resolution IP addresses - this technique is called double flux. It can't be countered by IP address blacklisting.

    Mitigation:

    IP address based blacklisting is somewhat effective against single flux; the double flux technique makes it ineffective. It appears that the domain name stays fixed in both techniques, so mitigation is possible if domain names are checked.

    DNS domain name blacklists are required to thwart this attack. www.malwaredomains.com provides a list of domain names hosting malware content. IDS/IPS devices should have an intelligent DNS application engine to extract domain names from DNS queries and check them against this list.
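
    A minimal sketch of the extraction step follows (user-space style C; no DNS name compression handling is needed for the question section itself; dnbl_contains() is a hypothetical blacklist lookup):

        #include <stdint.h>
        #include <stddef.h>
        #include <string.h>

        int dnbl_contains(const char *name);     /* hypothetical blacklist lookup */

        /* Copy the first query name out of a raw DNS message in dotted form.
         * Returns 0 on success, -1 on malformed input. */
        static int dns_first_qname(const uint8_t *msg, size_t len,
                                   char *out, size_t outlen)
        {
            size_t pos = 12;                     /* fixed DNS header is 12 bytes */
            size_t o = 0;

            while (pos < len && msg[pos] != 0) {
                uint8_t lab = msg[pos++];

                if (lab > 63 || pos + lab > len || o + lab + 2 > outlen)
                    return -1;                   /* compressed, truncated or too long */
                if (o)
                    out[o++] = '.';
                memcpy(out + o, msg + pos, lab);
                o += lab;
                pos += lab;
            }
            if (pos >= len)
                return -1;
            out[o] = '\0';
            return 0;
        }

        static int dns_query_is_blacklisted(const uint8_t *msg, size_t len)
        {
            char name[256];

            if (dns_first_qname(msg, len, name, sizeof(name)) < 0)
                return 0;                        /* can't parse: let other checks decide */
            return dnbl_contains(name);
        }
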


    Some characteristics of DNS replies when these techniques are used:

    - TTL is around 5 minutes to 30 minutes.
    - Multiple IP addresses in Answer section.

    Since this kind of DNS reply is also possible in some normal cases, this information can't be used by IPS devices to stop the DNS traffic. But it provides valuable information for offline analysis and for tracking down the ultimate malware server with the help of service providers.

    IPS/IDS Buyer - Deployment recommendations

    The goal of CSOs and security professionals is to ensure the security of their business networks and resources without any discontinuity in business operations. IPS/IDS devices solve one network security problem: intrusion detection and prevention. Selecting an IPS/IDS device is complex; here I try to address deployment-related considerations when selecting IPS/IDS devices for your network.

    Network Deployment modes:

    Tap mode (IDS mode): In this mode, the IPS device is used mainly to detect intrusions/attacks; it does not block the attack traffic. The IPS/IDS device is typically connected to a hub or to a SPAN port of a managed switch so that it sees the entire traffic in that network. Since it does not forward the traffic, network performance is not impacted.

    Though tap mode is the least intrusive, as it does not get in the way of normal traffic, it may not detect all attacks if some traffic does not reach it, either due to congestion at the SPAN port or due to the processing capacity of the device. Many recent IDS/IPS devices are stateful in nature: if packets are lost or not processed, newer or out-of-state packets either don't get processed or the IDS device may generate false positives. The inline IDS deployment mode takes care of some of these problems.

    Inline IDS mode: In this mode, all packets pass through the device, but the IDS/IPS device does not drop packets or terminate sessions upon attack detection. IDS/IPS devices are installed behind enterprise core routers or perimeter routers. Since the device is inline with the traffic, it observes the entire traffic before sending it out; traffic is not forwarded until it is analyzed.

    Since it analyzes the entire traffic passing through the device, the attack detection rate is limited only by the functional capability of the IPS/IDS device. If traffic arrives faster than the device can process it, the excess traffic gets dropped.

    Though the detection rate is going to be high, there is some impact on traffic:
    • Traffic may get dropped due to processing capability of IDS/IPS device.
    • Packet latency increases.
    • Packet jitter also may be impacted.
    Inline IPS mode: This is similar to Inline IDS mode, except that it can be configured selectively to stop the attack traffic. This mode inherits all advantages and disadvantages of Inline IDS mode.

    As long as only attack traffic gets dropped, this is perfectly fine for security professionals. IPS/IDS technology, though it has come a long way, still has problems such as false positives and false negatives. A significant part of IPS/IDS technology depends on signatures or rules. Signatures are of two types: many are developed by IPS vendors to stop known attacks, and another type detects protocol, data and traffic violations. With the increasing sophistication of attacks, signatures created to detect attacks can at times result in false positives. One of the difficult problems in the IDS/IPS world is detecting client-side attacks without any false positives. Look for IPS/IDS functional capabilities that detect attacks with few or zero false positives.

    Based on your requirements, you should figure out the different deployment modes you need and look for devices supporting those modes.

    If you decide on inline mode, you should look for the following capabilities when deploying an IDS/IPS device.
    • Granularity of blocking action: Look for this selection on protocol category basis and also on per rule (signature) basis.
    • Ensure that Inline IPS/IDS mode work transparently without any changes to the network addressing of existing network.
    • Traffic continuity when session resources get exhausted: Many IPS/IDS devices are stateful devices that maintain session entries for connections. These session entries are removed only upon inactivity or due to TCP RSTs/FINs. A session-exhaustion DDoS attack targeting the IPS/IDS device could stop legitimate traffic, thereby disrupting business operations.
      • Look for traffic throttling functionality so that IPS/IDS devices don't exhaust its resources.
      • Look for session timeout configuration functionality so that you can configure different session timeouts for different applications.
      • Look for session timeout configuration during session establishment (Pre-Connection timeout)
      • Also look for control over the behavior of new traffic when sessions do get exhausted. Fail open and fail close upon resource exhaustion are two options you should check for: the fail open option lets new traffic pass through without inspection, while the fail close option drops the packets.
    • Mode change capability: Look for a provision to change the mode from inline IDS to inline IPS. Each network is different, with different types of traffic, servers, desktops and mobile devices. Security professionals first need to gain confidence in the effectiveness of the IPS/IDS device in their network. Though the ultimate use of an IPS device is to stop attacks immediately, to gain confidence and understand its behavior you may want to deploy it in inline IDS mode first and then change the mode to inline IPS.
    • Control on CPU utilization: In Inline modes, traffic gets dropped due to non-availability of CPU. Look for controls that limit the CPU utilization. In particular, check for following capabilities.
      • Signature selection capability: More CPU power is used when the number of rules is higher. Look for facilities to disable specific signatures, both by deselecting families of signatures and by deselecting individual signatures.
      • Control on the quantum of data to inspect: Some detections are very expensive, such as malware detection. Since these are not protocol-related vulnerabilities, this detection requires data inspection. Typically these signatures have very complex patterns and hence take a significant number of CPU cycles. It appears that many attacks can be detected within the first 16K bytes of a connection. This observation can be used to limit traffic inspection, thereby saving CPU cycles. Look for the capability to let the administrator control the amount of data to be inspected on a per-protocol basis and also across protocols. One word of advice: start by inspecting all data, then analyze and tune this configuration item based on the type of applications and traffic in your network.
    • Latency, Jitter and throughput: Figure out the type of applications for your business in your network and note down your requirements of throughput, latency and jitter tolerance and ensure that IPS/IDS deployment does not disturb these parameters beyond the limits you set.
    • Behavior of the IDS/IPS device on unrecognized traffic: IDS/IPS devices may not have the capability to inspect all kinds of traffic; many limit themselves to inspecting IP traffic. If your network has traffic such as multicast, IPv6, IPX, AppleTalk or proprietary protocols, check the behavior of the IPS/IDS device. At a minimum, you should expect these devices to pass such traffic even though they don't inspect it.
    • Network monitoring: Many enterprises use common SNMP-based monitoring tools, and the IDS/IPS device becomes one of the monitored network elements. If this is important to you, ensure that the IDS/IPS device you select supports an SNMPv3 agent with MIB-II.
    High Availability:
    Availability of the IPS/IDS function is very important in inline modes, since the IDS/IPS device can become a failure point in your network. Hence ensure that the IDS/IPS device supports high availability functionality.

    LAN bypass functionality: This functionality short-circuits the Ethernet ports, basically turning the device into an Ethernet hub, when there is any failure in the software or hardware of the IPS/IDS device. It is typically implemented in hardware; in recent times it can also be implemented using virtualization. Please see this link: http://network-virtualization.blogspot.com/2008/03/lan-bypass-using-xenvmware-kind-of.html. Look for this function if it is good enough for your network.

    Redundant devices: The LAN bypass function ensures that connectivity is not lost, but it bypasses security inspection, which may not be acceptable for some business environments. In those cases, look for IDS/IPS devices supporting redundancy. Two or more devices (typically two are good enough) can be installed in parallel; when one device goes down, another device starts processing packets. Optionally, some IDS/IPS devices even support takeover of existing sessions, so existing connections continue to work after the other device takes over. In addition, some of the items an administrator should look for are:
    • The amount of time it takes for a new device to become active. It should be in the range of single-digit seconds.
    • The amount of time it takes for the Central Management System to recognize the switchover.
    • Ensure that the Central Management System populates all backup devices with signatures even before they become active.

    Disaster Recovery:

    High availability will not help in a major disaster, which may require procuring new devices. Security professionals will have spent significant effort over time tuning their IPS/IDS devices; if this work is lost, it takes significant time to re-tune the devices. That is why the disaster recovery functionality provided by IPS/IDS devices is very important.

    Security professionals should look for a facility in IDS/IPS devices to store the configuration and restore it whenever required.

    Central Administration:

    Large enterprises require more than one IDS/IPS sensor device, placed at different points in the network. Central management reduces the configuration burden and also provides correlation of events and logs. If your network requires many sensors, look for a Central Management System. Typical features to look for:
    • Multiple administrator accounts.
    • Role based management.
    • Multiple UI consoles.
    • Correlation of logs and events.
    • Traffic Reports
    • Attack Reports.
    • Alerts.
    • Audit Reports.
    There are many choices of IDS/IPS devices in the market. Selecting a device depends not only on its functionality and detection accuracy but also on how easy it is for you to deploy and monitor. I have tried to address some common items to consider in your buying decision.

    ALGs - Firewall/NAT Traversal Control and PortMap and TR-069 support

    Firewall/NAT Traversal Control:
    Application Layer Gateway modules (ALGs) in firewall and NAT devices interpret the protocol data, transform IP addresses based on the NAT configuration, and open pinholes in the firewall to allow new connections. For example, the FTP ALG is expected to interpret 'PORT', 'EPRT' and 'PASV reply' messages, modify IP addresses if required, and open the pinholes to allow FTP data connections. Many protocols require this kind of ALG function for firewall and NAT traversal. Some of the protocols requiring ALGs: SIP, H.323, MGCP, some gaming applications, NetBIOS, SUNRPC, MSRPC, L2TP, PPTP, IPsec VPN, etc.

    Newer versions of protocols are designed in such a way that they traverse firewall/NAT devices even if those devices don't support ALGs. For example, SIP has extensions that remove the need for an ALG function in firewalls between a SIP UA and a SIP proxy, and the IPsec VPN working group added NAT-T extensions to IKE and IPsec that remove the need for any ALG function in firewall and NAT devices between IPsec peers. But they introduced new problems: some of these extensions in newer protocol versions don't work well with firewall/NAT devices that already support the corresponding ALG function. Hence firewall/NAT devices must give administrators the ability to control the ALG function for different protocols. The minimum expected control is a boolean, i.e. enable/disable. An ideal configuration would take end point IP addresses into consideration; imagine cases where some end points support the new NAT-T extensions and some don't. But for this discussion I am taking the simpler configuration, i.e. ALG enable/disable for each protocol.


    ALG port map:

    At times, companies install server applications on non-standard ports. Though 5060 is the standardized port for SIP, SIP servers are sometimes run on non-standard ports. In these cases, the ALG functions in the firewall protecting these SIP servers should know about these ports in order to operate. The port map record functionality of firewall/NAT devices lets administrators feed this information. For example, if the SIP server runs on port 5061, the administrator can create a port map record with port 5061 and map it to the SIP ALG function.
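
    A minimal sketch of how such port map records might be consulted when classifying a new connection follows; the table contents and function names are illustrative assumptions.

        #include <stdint.h>
        #include <stddef.h>
        #include <netinet/in.h>              /* IPPROTO_TCP, IPPROTO_UDP */

        struct alg_portmap {
            const char *alg_name;            /* one of the ALG names defined below */
            uint8_t     proto;               /* IPPROTO_TCP or IPPROTO_UDP */
            uint16_t    port;                /* non-standard server port */
        };

        /* Example record: SIP server running on UDP port 5061. */
        static const struct alg_portmap portmap[] = {
            { "udpSip", IPPROTO_UDP, 5061 },
        };

        /* Returns the ALG to attach to a new connection, or NULL to fall back to
         * the built-in well-known-port dispatch. */
        const char *alg_for_flow(uint8_t proto, uint16_t dport)
        {
            size_t i;

            for (i = 0; i < sizeof(portmap) / sizeof(portmap[0]); i++)
                if (portmap[i].proto == proto && portmap[i].port == dport)
                    return portmap[i].alg_name;
            return NULL;
        }
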

    Both of the above functionalities and their configuration require a definition of ALG names. I propose the following names for ALGs.

    "ftp", "tftp", "oracleDbNet", "sunRpc", "msRpc", "udpDns", "tcpDns", "netbios", "udpSip", "tcpSip", "h323", h323GateKeeper", "rtsp", "udpNet2Phone", "tcpNet2Phone", "mgcpCallAgent", "mgcpGW", "msnIM", " microsoftILS", "aolIM", "irc", "pptp", "l2tp", "ikev1", "mszone", "quake", "udpMicrosftGames", "tcpMicrosoftGames'.

    TR-069 representation of above configuration:

    • internetGatewayDevice.security.VirtualInstance.{i}.ALGTraversalControl.{i} P : New entries can't be added by ACS. ACS can only change the 'featureControl' parameter.
      • name : String(32), Read Only - Name of the ALG. It takes one of values mentioned above.
      • featureControl: Boolean, RW - Takes 1 (Enable) or 0 (Disable). Default value is 1.
    • internetGatewayDevice.security.VirtualInstance.{i}.ALGPortMap.{i} PC
      • name: String(32), RW - Name of the port map record. Once the record is created, it can't be changed.
      • description: String(128), RW - Description of the record. Optional parameter.
      • algName: String(32), RW, Mandatory parameter - Name of the ALG function. It must be one of the values mentioned above.
      • mappingProtocol: String(4), RW, Mandatory parameter - Protocol value. Either "tcp" or "udp".
      • mappingPort: String(8), RW, Mandatory parameter - Port number.

    Monday, April 7, 2008

    Penetration testing directory

    I saw this email on the pen-test mailing list. The www.penetrationtests.com site is trying to be a directory for all pen testers. It seems promising: one consolidated place to see all topics and tools related to penetration testing.

    From: listbounce@securityfocus.com [mailto:listbounce@securityfocus.com]
    On Behalf Of Victor DaViking
    Sent: Saturday, April 05, 2008 2:43 PM
    To: pen-test@securityfocus.com
    Subject: Update on the penetration testing directory project

    Hi list,

    Quick update. I wanted to thank everyone who's been
    helping out with the pentest directory project
    (www.penetrationtests.com) during these 5-6 months in
    terms of submitting new links, providing feedback,
    suggesting categories, etc.

    As of today, we already have 230 different
    pentest-related links manually organized in the
    following categories:

    - http://www.penetrationtests.com/Blogs/
    Blogs related to security (Company blogs, Personal
    blogs, Group blogs)


    - http://www.penetrationtests.com/Business/
    Business links (Scoping, SoWs, Invoicing,
    Deliverables)

    - http://www.penetrationtests.com/Companies/
    Security Companies (1-99 employees, 100-500, 501-more,
    unknown size)

    - http://www.penetrationtests.com/Documents/
    Security Documents/papers (Database servers, webapp
    testing, secure programming)

    - http://www.penetrationtests.com/Frameworks/
    Penetration testing Frameworks (for a price, free)

    - http://www.penetrationtests.com/Mailing-Lists/
    Mailing lists (Penetration testing)

    - http://www.penetrationtests.com/Methodology/
    Methodology (Links related to methodology definitions)

    -
    http://www.penetrationtests.com/Security-conferences/
    Security conferences

    - http://www.penetrationtests.com/Security-standards/
    Security standards (PCI, SOx Act)

    - http://www.penetrationtests.com/Tools-Software/
    Tools & Software (hundreds organized in sub
    categories)

    - http://www.penetrationtests.com/Websites/
    Websites (Project sites, Proxys)

    Sunday, April 6, 2008

    IPsec VPN load balancing - technical bit

    Many enterprises have multiple WAN links to share the WAN load and also to provide redundancy. Enterprises are increasingly taking WAN links from different providers for continuous connectivity even when there are failures at one service provider's end. There are many solutions in the market that take advantage of multiple links by bonding these WAN links. Since the WAN links belong to different service providers, load sharing is typically done at the connection level: all packets of a 5-tuple connection go through the same WAN link, but packets belonging to different connections may go via different links.

    An IPsec VPN tunnel between two offices is considered one connection by these solutions. Due to this, even if both offices have multiple links, the bandwidth available for IPsec traffic is limited to the bandwidth of the selected link; actually, it is the minimum of the office 1 link and office 2 link bandwidths. In many situations, the traffic among offices (branch offices to head office) is mainly IPsec traffic.

    IPsec VPN load balancing functionality is expected to solve the above problem and fully utilize the bandwidth provided by the multiple links.

    Let us examine different deployment scenarios:
    • Scenario 1: Branch office having one WAN link and head office having multiple WAN links, and the entire VPN traffic on each side serviced by one VPN router (or UTM box).
    • Scenario 2: Both offices having multiple WAN links and all the VPN traffic on each side serviced by one VPN router (UTM box).
    • Scenario 3: Branch office having multiple low-bandwidth links and head office having one high-bandwidth link.
    • Scenario 4: Both offices having multiple WAN links and as many UTM VPN routers as links.
    IPsec VPN is typically used in tunnel mode. In tunnel mode, IP packets traversing between the sites are encapsulated; as part of encapsulation, a new IP header, called the outer IP header, is added. The outer IP header is mainly used to route packets across the Internet to reach the right VPN router, so its IP addresses must be public IP addresses. These IP addresses are called 'gateway' IP addresses: the local gateway IP address is used as the source IP and the remote gateway IP address as the destination IP of the outer IP header, and together they are called a 'gateway pair'. The WAN link load balancing function uses these IP addresses (mainly the source IP address, as both WAN links have default routes) to route packets onto the appropriate WAN link. Hence the VPN router must create multiple tunnels with different local gateway IP addresses and balance the outbound traffic across these tunnels to use the bandwidth of the local WAN links efficiently. The remote router is expected to do the same for its outbound traffic.

    In all scenarios, the IPsec VPN functionality of the UTM box should accept a configuration of multiple gateway pairs in the appropriate SPD policy records. In theory, there could be M*N pairs, with M being the number of links at site 1 and N the number of links at site 2. In reality, the number of gateway pairs one would configure is somewhere between min(M, N) and M*N. For example, in scenario 3, to take advantage of the multiple WAN links in the branch office, both the head office router and the branch office router can be configured with two gateway pairs; on the head office side the local gateway IP address is the same across the two pairs, and on the branch office side the remote gateway IP address is the same across the pairs.
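
    As an illustration, per-connection selection among the configured gateway pairs of one SPD policy can be as simple as hashing the 5-tuple; this is a sketch, and any deterministic hash that keeps a connection on one tunnel would do.

        #include <stdint.h>

        struct gw_pair {
            uint32_t local_gw;               /* local (public) gateway address */
            uint32_t remote_gw;              /* remote gateway address */
        };

        static uint32_t flow_hash(uint32_t sip, uint32_t dip,
                                  uint16_t sport, uint16_t dport, uint8_t proto)
        {
            uint32_t h = sip ^ dip ^ (((uint32_t)sport << 16) | dport) ^ proto;

            h ^= h >> 16;                    /* mix the halves a little */
            return h;
        }

        /* Pick one of 'n' configured gateway pairs; all packets of one 5-tuple
         * connection map to the same tunnel and hence the same WAN link. */
        const struct gw_pair *select_tunnel(const struct gw_pair *pairs, int n,
                                            uint32_t sip, uint32_t dip,
                                            uint16_t sport, uint16_t dport,
                                            uint8_t proto)
        {
            return &pairs[flow_hash(sip, dip, sport, dport, proto) % (uint32_t)n];
        }
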

    How to load balance traffic among multiple tunnels between two sites:

    Many VPN routers today have stateful security functions such as firewall, IPS, etc., and UTM boxes used in place of VPN routers have many stateful security functions. If the entire VPN traffic is handled by one device, packet-based load balancing works fine. But if the WAN links are handled by two different VPN routers with stateful functionality, packet-based load balancing does not work, and even connection-based load balancing can have problems for applications that use multiple related connections. Hence, in that case I suggest not using automatic load balancing, but instead doing manual load balancing by creating multiple SPD policy records between the two sites, with each SPD policy record having different gateway pairs and different selectors.

    In summary: packet-based load balancing across the different tunnels of one SPD policy record is applicable only if the VPN routers don't have stateful security functions or if one VPN router terminates all the tunnels.

    Access Control list in firewalls: TR-069 support

    Access control lists are the heart of a firewall. ACLs control the traffic among the different zones of an organization. A typical firewall implements multiple ACLs, with each ACL containing multiple access control rules. Each rule is defined with L3 and L4 protocol information and, in some cases, L7 protocol fields. L3 fields are typically source IP, destination IP and protocol; L4 fields are typically TCP ports, UDP ports and ICMP type/code values. Some examples of L7 protocols are HTTP, SMTP and NNTP.

    An access rule typically contains 'selectors' and 'actions'. The first packet of every session is matched against the rules: the 'selector' fields are checked as part of the matching operation, and the 'actions' of the matching rule are applied. If there is no match, the packet is dropped or rejected depending on whether stealth mode is enabled or disabled. Note: if stealth mode is enabled, the packet is silently dropped; if stealth mode is disabled, TCP resets are generated to both end points. Since the matching operation terminates upon the first match, the access control list is organized as an ordered list, with the first entry being the highest priority rule and the last entry the lowest priority rule.
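
    A minimal sketch of the first-match lookup over the ordered rule list follows; the structure fields, wildcard convention and action codes are illustrative only.

        #include <stdint.h>
        #include <stddef.h>

        enum fw_action { FW_ALLOW, FW_DROP, FW_REJECT };

        struct fw_rule {
            /* primary selectors (0 acts as a wildcard in this sketch) */
            uint32_t src_ip, src_mask;
            uint32_t dst_ip, dst_mask;
            uint8_t  proto;
            uint16_t dport;
            enum fw_action action;
        };

        struct pkt_key {
            uint32_t src_ip, dst_ip;
            uint8_t  proto;
            uint16_t dport;
        };

        /* The first packet of a session is matched top-down; the first hit wins.
         * 'stealth' decides what happens when nothing matches. */
        enum fw_action acl_lookup(const struct fw_rule *rules, size_t n,
                                  const struct pkt_key *k, int stealth)
        {
            size_t i;

            for (i = 0; i < n; i++) {
                const struct fw_rule *r = &rules[i];

                if ((k->src_ip & r->src_mask) != (r->src_ip & r->src_mask))
                    continue;
                if ((k->dst_ip & r->dst_mask) != (r->dst_ip & r->dst_mask))
                    continue;
                if (r->proto && r->proto != k->proto)
                    continue;
                if (r->dport && r->dport != k->dport)
                    continue;
                return r->action;            /* first match terminates the search */
            }
            return stealth ? FW_DROP : FW_REJECT;    /* no matching rule */
        }
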

    'Selector' fields are typically categorized into 'primary selectors' and 'secondary selectors'. Both kinds of selectors are checked against the packet values and other values in the matching operation. If primary selectors are changed during the lifetime of sessions, the existing sessions are revalidated; for secondary selectors, this revalidation does not happen. The zone information (from-zone and to-zone) and the 5-tuple values in the rules are primary selectors. A 'time window' or 'time schedule' is typically considered a secondary selector. The 'time window' in a rule is used to allow or deny connections during some period of the week; for example, some connections may not be allowed during the day but allowed at night.

    'Actions' in the rule decide how the connection traverses the firewall. The 'Allow' action lets the connection through the firewall, the 'Drop' action drops the packets, and the 'Reject' action sends a TCP reset (in case of a TCP connection) to the client. Allow, Drop and Reject actions are mutually exclusive and these are the primary actions. Additional sub-actions can also be defined:
    • 'Log' is one sub-action. This indicates whether the connection is to be logged with the logging system. If this option is selected, firewalls typically send 'Connection Start' and 'Connection End' messages to the logging system for allowed connections. In case of 'Drop/Reject' actions, a log is sent to indicate that the connection was not allowed.
    • Packet mangling: TOS (Type of Service) or DSCP parameter - This sub-action takes a new TOS value, and packets are updated with this TOS value if it is configured. This is typically used to increase or reduce the priority of the packet for traffic management purposes, and is specifically useful where VOIP/Video devices behind the firewall don't differentiate between data and real time traffic. Another packet mangling parameter that is supported is setting the MSS value in TCP packets having the 'SYN' flag. If this parameter is set, then all TCP connections falling on this rule are changed to this MSS value (if this MSS value is less than the MSS value being negotiated in the SYN packets). This setting is specifically useful when the WAN uplink bandwidth is less than 256 kbytes/sec. When the MSS value is low, the TCP packets sent by both end points are small in size and hence the packet transmission time is short. Once a packet is submitted to the hardware for transmission, it can't be preempted, that is, any new packet has to wait until this packet is sent out. Due to this, VOIP packets also need to wait, and this might give rise to latency and thereby jitter. To reduce the latency and jitter, it is necessary that the data packets queued to the hardware are small enough. The MSS value helps in ensuring that data packets are small.
    • Rate control: Another sub-action supported in rules is controlling the rate in terms of packets, bytes and connections. In addition, connection limits can also be controlled by the administrator. If the packet or byte rate across all sessions of this rule is exceeded, then the packets are dropped. If the connection rate is exceeded, then the connection establishment does not succeed. Similarly, if the number of existing connections created by this rule exceeds the maximum connections allowed by the rule, then the connection establishment does not succeed.
    • Application protocol command filtering: I am not really a fan of keeping application command filtering as part of each rule. With IPS becoming part of many security devices, this kind of filtering can be achieved through IPS rules/signatures.
    • IPS Signature based Intrusion Detection function control: Many firewall devices are adding an Intrusion Prevention feature. As we all know, intrusion detection is a CPU consuming function. To reduce the load on the CPU, I feel that the firewall function needs to provide flexibility for administrators to disable intrusion detection on a per rule basis.
    • IPS Traffic Anomaly detection & throttling function control: The IPS function provides multiple detection methods, and traffic anomaly detection is one of them. This function also takes significant CPU cycles and memory to maintain traffic state - sometimes on a per connection basis. Due to this, having control at the firewall policy rule level helps in tuning the system for performance as well as for specific deployments.
    • Inactivity timeout: Each session created from this rule inherits this inactivity timeout. If there are no packets within this inactivity timeout period, the session is deleted. The value is in seconds. If it is not configured, i.e. if the value is 0, then the session inactivity timeout period is determined from the other configuration information (application protocol specific timeouts or the per-protocol defaults), as sketched below.
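
    Here is a minimal sketch of how the effective inactivity timeout for an established session might be resolved, assuming the precedence described above: the per-rule value wins if non-zero, then an application protocol specific record, then the per-transport default. The names, dictionary layout and timeout values are illustrative, not the actual TR-069 parameters.

      # Illustrative precedence for resolving a session's inactivity timeout (seconds).
      TRANSPORT_DEFAULTS = {'tcp': 3600, 'udp': 120, 'other': 60}      # assumed default values
      APP_SPECIFIC = {('tcp', 23): 14400, ('tcp', 22): 14400}          # e.g. telnet/ssh are interactive

      def inactivity_timeout(rule_timeout, protocol, dst_port):
          if rule_timeout:                           # per-rule override (0 means "not configured")
              return rule_timeout
          if (protocol, dst_port) in APP_SPECIFIC:   # application protocol specific record
              return APP_SPECIFIC[(protocol, dst_port)]
          return TRANSPORT_DEFAULTS.get(protocol, TRANSPORT_DEFAULTS['other'])

      print(inactivity_timeout(0, 'tcp', 22))    # -> 14400 (interactive protocol)
      print(inactivity_timeout(0, 'udp', 53))    # -> 120   (UDP default)
      print(inactivity_timeout(300, 'tcp', 80))  # -> 300   (rule override)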

    Multiple Access Control Lists:

    Firewalls have come a long way. Initially, a firewall used to implement a single ACL. Now, firewalls provide multiple lists to cater to different requirements.
    • Normal ACL: This is the traditional list. It contains rules for traffic going across zones.
    • Dynamic ACL: This list is populated by other services and applications; that is, it is not created by the administrator. Rules are typically created in this list when some other service is configured. UPnP and MIDCOM kind of applications create dynamic rules, and these dynamic rules go into the Dynamic ACL.
    • User Group specific ACL: Normal ACL and Dynamic ACL rules are applied to the entire traffic by default. At times, Enterprises require user specific access control. That is, some privileged users might need to be given access to some important resources which are prohibited for general users. Similarly, some users might need to be given access only to some particular resources and nothing else. To provide this flexibility, firewalls typically authenticate the user first and then activate user specific rules. Creating a per-user ACL is a big burden for administrators if the organization has more than 10 users. Many times, it is possible to categorize users into a small number of groups, so the administrator only needs to create as many ACLs as there are groups. I am calling these ACLs 'User Group ACLs'.
    Ordering of rule search: The firewall searches the dynamic ACL first, then the user group specific ACL, and finally the generic ACL. If there is no match, the packet gets dropped. A small sketch of this ordering follows.
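
    Here is a minimal sketch of the search ordering across the three lists, using a hypothetical per-rule matcher (any matcher, such as the one in the earlier sketch, could be plugged in); the list names and dictionary layout are illustrative.

      # Illustrative ordering of rule search across the three ACL types.
      def match(acl, pkt):
          """Return the first rule in 'acl' whose selectors cover 'pkt', or None."""
          for rule in acl:
              if all(rule.get(k) in (None, 'any', v) for k, v in pkt.items()):
                  return rule
          return None

      def lookup(dynamic_acl, user_group_acl, generic_acl, pkt):
          for acl in (dynamic_acl, user_group_acl, generic_acl):   # the search order
              rule = match(acl, pkt)
              if rule is not None:
                  return rule['action']
          return 'drop'                                            # no match anywhere

      generic_acl = [{'protocol': 'tcp', 'dst_port': 80, 'action': 'allow'}]
      print(lookup([], [], generic_acl, {'protocol': 'tcp', 'dst_port': 80}))  # -> allow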

    TR-069 and ordered lists: TR-069 does not have any specific RPC methods for the ACS to move the position of rules in devices. Due to this, data models for any ordered lists need their own parameter to represent the priority. My suggestion is to have a 'position' parameter for ordered lists: the lower the position number, the higher the priority. The ACS, in its user interface, need not expose 'position' as a configurable parameter; it can instead provide an intuitive drag-and-drop UI for changing the relative position of records with respect to others. Internally, the ACS can change the 'position' values of the affected records and send them to the devices. Depending on how records are moved, the 'position' value of many records may change, as in the renumbering sketch below.
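
    Here is a minimal sketch of how an ACS might renumber 'position' values after a drag-and-drop reorder. It simply renumbers the whole list with gaps and returns only the records whose value actually changed (those would need to be pushed to the device); the gap size of 10 is an assumption, and a smarter ACS could exploit the gaps so that small moves touch fewer records.

      # Illustrative renumbering of 'Position' values after the administrator reorders rules.
      def renumber(rules_in_new_order, gap=10):
          """Assign increasing 'Position' values (with gaps) and return only the changed records."""
          changed = []
          for index, rule in enumerate(rules_in_new_order):
              new_position = (index + 1) * gap
              if rule['Position'] != new_position:
                  rule['Position'] = new_position
                  changed.append(rule)       # only these need to be sent to the device
          return changed

      rules = [{'RuleID': 3, 'Position': 30},
               {'RuleID': 1, 'Position': 10},   # rule 3 was dragged above rules 1 and 2 in the UI
               {'RuleID': 2, 'Position': 20}]
      print(renumber(rules))   # records whose 'Position' changed and must be pushed to the device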


    With the above background, time window and firewall ACL representation in TR-069 can be done in the following way (a small sketch of evaluating a time window object against the current time follows the parameter listing).

    • internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.timeWindowObject.{i} PC
      • name : String(32), RW - Name of the object. Once this variable is set, this can't be changed.
      • Description: String(128) - RW - Value describing the object.
      • Day1Begin: String, RW - Starting day of the week - Takes values 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday' and 'Saturday'.
      • Day1End: String, RW - Ending day of the week - Takes the same values as Day1Begin.
      • Time1Begin: String, RW - Starting time in hours and minutes. 1:00PM is represented as 13:00, 2:00PM as 14:00 and so on.
      • Time1End: String, RW - Ending time in hours and minutes, in the same format as Time1Begin.
      • Day2Begin: Same as Day1Begin.
      • Day2End: Same as Day1End.
      • Time2Begin: Same as Time1Begin.
      • Time2End: Same as Time1End.
      • Day3Begin: Same as Day1Begin.
      • Day3End: Same as Day1End.
      • Time3Begin: Same as Time1Begin.
      • Time3End: Same as Time1End.
      • Day4Begin: Same as Day1Begin.
      • Day4End: Same as Day1End.
      • Time4Begin: Same as Time1Begin.
      • Time4End: Same as Time1End.
    • internetGatewayDevice.security.VirtualInstance.{i}.firewall P
      • MaximumNumberOfRules: Read Only, Unsigned Int - This determines the maximum rule ID in each ACL; a RuleID can't exceed this number.
      • internetGatewayDevice.security.VirtualInstance.{i}.firewall.generalACLRules.{i} PC
        • RuleID: Unsigned Int, RW - Identification of the rule. Its value can't exceed 'MaximumNumberOfRules'. Once the record is created and the RuleID is set, it can't be changed. It must be unique within the ACL.
        • Description: String(128), RW - Description of this rule.
        • Position: Unsigned Int, RW - This indicates the position of this rule in this list. Note that position values need not be consecutive. The lower the position number, the higher the priority of the rule.
        • Enable : Boolean, RW: 0 or 1 - Indicates whether this rule is enabled or disabled.
        • FromZone: String(32), RW - One of the Zone IDs. It takes value of ZoneName from internetGatewayDevice.securityDomains.VirtualInstance.{i}.Zone.{i} table.
        • ToZone: String(32), RW - One of the Zone IDs. It takes value of ZoneName from internetGatewayDevice.securityDomains.VirtualInstance.{i}.Zone.{i} table.
        • SourceIPType: String(32), RW - It represents the source IP part of the selector. It takes values such as 'immediate' and 'ipobject'. 'Immediate' indicates that IP addresses are given as values, and 'ipobject' indicates that the IP address information points to one of the IPObjects.
        • SourceIPValue: String(64), RW - If the type is 'immediate', then it can be a single IP address in dotted decimal form, a subnet given as a network IP address and prefix length, or a range of IP addresses with '-' between the low and high values. If the type is 'ipobject', then it has one of the ipobject names from the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPValueObject.{i} table or the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPFQDNObject.{i} table. 'any' is a special value indicating all source IP values. Examples: 10.1.5.10 or 10.1.5.0/24 or 10.1.5.1-10.1.5.254
        • DestinationIPType: Same as 'Source IP Type'. This represents destination IP selector information.
        • DestinationIPValue: Same as 'SourceIPValue'.
        • ServiceType : String(64), RW - Represents the Protocol, source port and destination Port part of selectors of the rule. It takes values 'immediate' or 'serviceobject'. In case of 'immediate' type, protocol, source port and destination port values are part of the rule.
        • ServiceObject: String(32), RW - One of the values of Service Records from the same virtual instance. This parameter is valid only if 'ServiceType' has value 'serviceobject'. 'any' is special value.
        • Protocol : String(16), RW - It takes values such as 'udp', 'tcp', 'udptcp', 'icmp', 'esp', 'ah', 'ospf', 'ipinip' and integer value in string format representing the protocol value. This parameter is valid only if 'ServiceType' is 'immediate'.
        • SourcePort: String(16), RW - It takes a single value or range of port values. Examples: 1214 or 1214-1230. This parameter is valid only if 'ServiceType' is 'immediate'.
        • DestinationPort: String(16), RW - It takes a single value or range of port values. Examples: 1214 or 1214-1230. This parameter is valid only if 'ServiceType' is 'immediate'.
        • TimeWindow: String(32), RW - It takes 'name' value from timewindow object table. 'none' indicates no timewindow.
        • Action : String(16), RW - Action to be taken on the connection matching this rule. It takes values 'allow', 'drop', 'reject'.
        • EnableLog: Boolean, RW - If the value is 1, logs are generated upon session creation and session termination. Takes value 1 or 0.
        • EnableTOSMangling: Boolean, RW - If value is 1, then firewall sets the TOS value in the IP header with the value of 'TOS' parameter.
        • TOS: unsigned int, RW - Value can't exceed 255. Applicable only if 'EnableTOSMangling' is set to 1.
        • EnableMSSMangling: Boolean, RW - Takes values 1 or 0. If set to 1, the TCP MSS option is set to the minimum of the value given in the 'MSS' parameter and the value in the TCP packet.
        • MSS: Unsigned int, RW.
        • EnableBandwidthRateControl: Boolean, RW - Takes value 1 or 0. If set to 1, the 'ByteRate' parameter is valid.
        • ByteRate : String(32), RW - It takes form of X/Y - X being number of Kbytes and Y being number of seconds. Example: 10/5 means limit the traffic falling to this policy to 10Kbytes for 5 seconds. This parameter is valid only if 'EnableBandwidthRateControl' is set to 1.
        • EnableConnectionRateControl: Boolean, RW - Takes values 1 or 0.
        • ConnectionRate: String(32), RW - It also takes form of X/Y - X being number of connections and Y being number of seconds. Example: 1000/3600 limits number of connection establishments to 1000 per hour. This parameter is valid only if 'EnableConnectionRateControl' is set to 1.
        • EnableMaxConnectionsControl: Boolean, RW - takes values 1 or 0.
        • MaxConnections: Unsigned Int, RW - Maximum number of connections allowed at any time. Example: 1000 indicates that number of connections falling in this policy rule will not exceed 1000.
        • EnableSigBasedIntrusionDetection: Boolean, RW - Takes values 1 or 0. Value 1 enables intrusion analysis.
        • EnableTrafficAnomalyDetection: Boolean, RW - Takes values 1 or 0. Value 1 enables traffic anomaly detection.
        • inactivityTimeout: Unsigned Int, RW - Default is 0. Value represented in seconds.
      • internetGatewayDevice.security.VirtualInstance.{i}.firewall.dynamicACLRules.{i} P : This is a repetition of the above table, except that all values are read only.
        • RuleID: Unsigned Int, Read Only.
        • Description: String(128), Read Only.
        • Position: Unsigned Int, Read Only.
        • Enable : Boolean, Read Only
        • FromZone: String(32), Read Only.
        • ToZone: String(32), Read Only.
        • SourceIPType: String(32), Read Only
        • SourceIPValue: String(64), Read Only.
        • DestinationIPType: String(32), Read Only.
        • DestinationIPValue: String(64), Read Only.
        • ServiceType : String(64), Read Only
        • ServiceObject: String(32), Read Only.
        • Protocol : String(16), Read Only
        • SourcePort: String(16), Read Only
        • DestinationPort: String(16), Read Only
        • TimeWindow: String(32), Read Only
        • Action : String(16), Read Only
        • EnableLog: Boolean, Read Only
        • EnableTOSMangling: Boolean, Read Only
        • TOS: unsigned int, Read Only
        • EnableMSSMangling: Boolean, Read Only
        • MSS: Unsigned int, Read Only
        • EnableBandwidthRateControl: Boolean, Read Only
        • ByteRate : String(32), Read Only.
        • EnableConnectionRateControl: Boolean, Read Only.
        • ConnectionRate: String(32), Read Only.
        • EnableMaxConnectionsControl: Boolean, Read Only.
        • MaxConnections: Unsigned Int, Read Only.
        • EnableSigBasedIntrusionDetection: Boolean, Read Only.
        • EnableTrafficAnomalyDetection: Boolean, Read Only.
        • inactivityTimeout: Unsigned Int, Read Only.
    • internetGatewayDevice.security.VirtualInstance.{i}.UserGroups.{i} PC
      • Name: String(32), RW - Name of the user group. Once this is set, this can't be changed.
      • Enable : Boolean, RW - Takes value 1 or 0.
      • internetGatewayDevice.security.VirtualInstance.{i}.UserGroups.{i}.ACLRules.{i} PC - The following section is a repetition of generalACLRules, applied to the users of this group.
        • RuleID: Unsigned Int, RW - Identification of the rule. Its value can't exceed 'MaximumNumberOfRules'. Once the record is created and the RuleID is set, it can't be changed. It must be unique within the ACL.
        • Description: String(128), RW - Description about this rule.
        • Position: Unsigned Int, RW - This indicates the position of this rule in this list. Note that position values need not be consecutive.
        • Enable : Boolean, RW: 0 or 1 - Indicates whether this rule is enabled or disabled.
        • FromZone: String(32), RW - One of the Zone IDs. It takes value of ZoneName from internetGatewayDevice.securityDomains.VirtualInstance.{i}.Zone.{i} table.
        • ToZone: String(32), RW - One of the Zone IDs. It takes value of ZoneName from internetGatewayDevice.securityDomains.VirtualInstance.{i}.Zone.{i} table.
        • SourceIPType: String(32), RW - It represents the source IP part of the selector. It takes values such as 'immediate' and 'ipobject'. 'Immediate' indicates that IP addresses are given as values, and 'ipobject' indicates that the IP address information points to one of the IPObjects.
        • SourceIPValue: String(64), RW - If the type is 'immediate', then it can be a single IP address in dotted decimal form, a subnet given as a network IP address and prefix length, or a range of IP addresses with '-' between the low and high values. If the type is 'ipobject', then it has one of the ipobject names from the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPValueObject.{i} table or the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPFQDNObject.{i} table. 'any' is a special value indicating all source IP values. Examples: 10.1.5.10 or 10.1.5.0/24 or 10.1.5.1-10.1.5.254
        • DestinationIPType: Same as 'Source IP Type'. This represents destination IP selector information.
        • DestinationIPValue: Same as 'SourceIPValue'.
        • ServiceType : String(64), RW - Represents the Protocol, source port and destination Port part of selectors of the rule. It takes values 'immediate' or 'serviceobject'. In case of 'immediate' type, protocol, source port and destination port values are part of the rule.
        • ServiceObject: String(32), RW - One of the values of Service Records from the same virtual instance. This parameter is valid only if 'ServiceType' has value 'serviceobject'. 'any' is special value.
        • Protocol : String(16), RW - It takes values such as 'udp', 'tcp', 'udptcp', 'icmp', 'esp', 'ah', 'ospf', 'ipinip' and integer value in string format representing the protocol value. This parameter is valid only if 'ServiceType' is 'immediate'.
        • SourcePort: String(16), RW - It takes a single value or range of port values. Examples: 1214 or 1214-1230. This parameter is valid only if 'ServiceType' is 'immediate'.
        • DestinationPort: String(16), RW - It takes a single value or range of port values. Examples: 1214 or 1214-1230. This parameter is valid only if 'ServiceType' is 'immediate'.
        • TimeWindow: String(32), RW - It takes 'name' value from timewindow object table. 'none' indicates no timewindow.
        • Action : String(16), RW - Action to be taken on the connection matching this rule. It takes values 'allow', 'drop', 'reject'.
        • EnableLog: Boolean, RW - If the value is 1, logs are generated upon session creation and session termination. Takes value 1 or 0.
        • EnableTOSMangling: Boolean, RW - If value is 1, then firewall sets the TOS value in the IP header with the value of 'TOS' parameter.
        • TOS: unsigned int, RW - Value can't exceed 255. Applicable only if 'EnableTOSMangling' is set to 1.
        • EnableMSSMangling: Boolean, RW - Takes values 1 or 0. If set to 1, the TCP MSS option is set to the minimum of the value given in the 'MSS' parameter and the value in the TCP packet.
        • MSS: Unsigned int, RW.
        • EnableBandwidthRateControl: Boolean, RW - Takes value 1 or 0. If set to 1, the 'ByteRate' parameter is valid.
        • ByteRate : String(32), RW - It takes form of X/Y - X being number of Kbytes and Y being number of seconds. Example: 10/5 means limit the traffic falling to this policy to 10Kbytes for 5 seconds. This parameter is valid only if 'EnableBandwidthRateControl' is set to 1.
        • EnableConnectionRateControl: Boolean, RW - Takes values 1 or 0.
        • ConnectionRate: String(32), RW - It also takes form of X/Y - X being number of connections and Y being number of seconds. Example: 1000/3600 limits number of connection establishments to 1000 per hour. This parameter is valid only if 'EnableConnectionRateControl' is set to 1.
        • EnableMaxConnectionsControl: Boolean, RW - takes values 1 or 0.
        • MaxConnections: Unsigned Int, RW - Maximum number of connections allowed at any time. Example: 1000 indicates that number of connections falling in this policy rule will not exceed 1000.
        • EnableSigBasedIntrusionDetection: Boolean, RW - Takes values 1 or 0. Value 1 enables intrusion analysis.
        • EnableTrafficAnomalyDetection: Boolean, RW - Takes values 1 or 0. Value 1 enables traffic anomaly detection.
        • inactivityTimeout: Unsigned Int, RW - Default is 0. Value represented in seconds.
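
    As referenced above, here is a minimal sketch of how a device might evaluate one (DayBegin, DayEnd, TimeBegin, TimeEnd) window of a timeWindowObject against the current time. Windows that wrap around the end of the week and the 'none' value are not handled, and the helper names are illustrative.

      # Illustrative evaluation of one time window (DayBegin/DayEnd, TimeBegin/TimeEnd) of a
      # timeWindowObject against the current local time. Week wrap-around is not handled here.
      from datetime import datetime

      DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

      def in_time_window(day_begin, day_end, time_begin, time_end, now=None):
          now = now or datetime.now()
          day_index = (now.weekday() + 1) % 7        # map Python's Monday=0 onto Sunday=0
          minutes = now.hour * 60 + now.minute
          def to_minutes(hhmm):
              h, m = hhmm.split(':')
              return int(h) * 60 + int(m)
          day_ok = DAYS.index(day_begin) <= day_index <= DAYS.index(day_end)
          time_ok = to_minutes(time_begin) <= minutes <= to_minutes(time_end)
          return day_ok and time_ok

      # Example: allow only Monday through Friday, 19:00 to 23:00 (night-time access).
      print(in_time_window('Monday', 'Friday', '19:00', '23:00',
                           datetime(2008, 4, 30, 20, 15)))   # Wednesday 8:15PM -> True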

    UTM Message logging capabilities

    UTM devices are multi function security devices. These functions include firewall, IPsec VPN, SSLVPN, IPS, Anti Virus and Anti Spam. Some devices even include a web application firewall function. Each security function is different from the others, so one would expect the information in the logs to differ across security functions and even across sub functions within a security function. Logs are generated for many reasons - not only to indicate policy violations, intrusion detections, virus or spam detections, but also to record session information, configuration changes, login failures, system errors, system warnings etc. The type of information in each of these log messages differs. To facilitate analysis of logs by external log analyzers, each family of log messages should have its own format. That is, each family should have its own keywords to represent the values of different parameters. There could be many message families.

    Each message family contains multiple types of logs. For example, there could be multiple types of logs in the intrusion message family. Typically, each signature in IPS corresponds to one type of log; in a signature based IPS, there are as many log types as there are signatures. Each type of log message is typically represented by a message ID.

    UTM devices generate logs during their operation. There can be a large number of incidents which generate the same type of log multiple times. The messaging system component of UTM devices provides multiple controls for the administrator to control logs. Some of the controls are:
    • Message ID level control on enable/disable: If a message ID is disabled, any log incidents of that message ID are not processed, that is, they are not stored or exported to external log collectors.
    • Message ID based Log frequency control: This allows logs, but controls the number of logs generated for processing. Typically it takes two parameters - log threshold count and log threshold time. At most one log is processed for every 'log threshold count' logs or within the 'log threshold time'. That is, if the log threshold count is 100 and the log threshold time is 5 minutes, and the number of logs generated within 5 minutes is more than 200 but less than 300, then it emits three logs - the 1st, the 101st and the 201st.
    • 5 Tuple based Log frequency control: Message ID based log frequency control is good for non-network based logs such as system errors, warnings etc. But for connections, this kind of control is not good enough. If the same intrusion is detected in traffic going to multiple victims, the administrator would like to know about at least one instance of the intrusion going to each victim. With message ID based log frequency control, not all victims will be reported if the intrusions happen within the 'log threshold time'. To avoid this, 5 tuple based log frequency control is required. 'Source IP', 'Destination IP', 'Protocol', 'Source Port' and 'Destination Port' constitute the 5 tuple. Each item in the tuple can be enabled/disabled. The 'log threshold count' and 'log threshold time' described above are applied individually to each combination of the enabled 5-tuple items. For example, if the administrator enables 'source IP' and 'destination IP' in the 5 tuple for a given message ID, then the logs generated for each (source IP, destination IP) combination are throttled as per 'log threshold count' and 'log threshold time'. Using this, log systems don't miss attacks and other events happening on individual victim machines and also don't miss reporting attackers. A small sketch of this throttling follows this list.
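
    Here is a minimal sketch of the combined control described above, assuming count-based and time-based throttling per key, where the key is the message ID plus whichever 5-tuple fields are enabled; the class name, field names and default values are illustrative.

      # Illustrative log throttling: at most one log per 'threshold_count' logs or per
      # 'threshold_time' seconds, tracked per key (message ID plus enabled 5-tuple fields).
      import time

      class LogThrottle:
          def __init__(self, threshold_count=100, threshold_time=300, tuple_fields=()):
              self.count = threshold_count
              self.window = threshold_time
              self.fields = tuple_fields       # e.g. ('sip', 'dip') if those items are enabled
              self.state = {}                  # key -> [logs_seen, window_start_time]

          def should_emit(self, msg_id, conn):
              key = (msg_id,) + tuple(conn.get(f) for f in self.fields)
              now = time.time()
              seen, start = self.state.get(key, (0, now))
              if now - start >= self.window:   # new time window: reset the counter
                  seen, start = 0, now
              seen += 1
              self.state[key] = [seen, start]
              return (seen - 1) % self.count == 0   # emit the 1st, 101st, 201st, ... log

      throttle = LogThrottle(threshold_count=100, threshold_time=300, tuple_fields=('sip', 'dip'))
      conn = {'sip': '10.1.5.10', 'dip': '192.0.2.1'}
      emitted = sum(throttle.should_emit('IPS-1234', conn) for _ in range(250))
      print(emitted)   # -> 3 (the 1st, 101st and 201st occurrences for this source/destination pair)
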
    In multi function security devices like UTMs, there can be many message IDs; it is no surprise if the number of message IDs is of the order of 1000. Controlling individual message IDs would be a nightmare for administrators, so UTM devices need to provide control of logs at a higher level than per message ID. That is where sub-families of message IDs come in handy. A sub-family is nothing but a group of message IDs. This grouping is mainly for log control; it does not define the message format. The controls 'Enable/Disable' and '5 tuple based log frequency control' can be specified on a per sub-family basis, and all message IDs inherit these controls. Of course, administrators are provided control on a per message ID basis too, for cases where some message IDs can't inherit the controls from their sub-families.

    Though many aspects of this article are based on Intoto iGateway product family concepts, these are generic concepts and are valid for any security product. The typical flow of logs is:
    • Logs are generated by applications such as firewall, IPS, AV/AS, WAF etc.
    • The log throttling system discards logs and processes only some of them based on the controls configured by the administrator.
    • Logs are then stored and/or exported.
    • Log analyzers (local or external) analyze the logs and create reports. They also provide 'search' functions based on different criteria.
    Logs are exported using either 'syslog' or 'email'. iGateway UTM also has a facility to store logs locally in a database (Postgres).

    Exported logs are sent in a format that lets log analyzers easily extract values for different fields. WELF is one format that became quite popular. Though the iGateway product family uses WELF syntax, it does not use the keywords specified by WELF, as they are incomplete. Some of the rules it follows are listed here; a small formatting sketch follows the list.
    • Each log forms one line with each line containing multiple fields.
    • Each field is formed as keyword=value. Keywords are defined by corresponding message family. Each keyword and value pair is separated by one or more spaces.
    • Keywords and values should not have any spaces, quotes or '=' characters. If a value needs to contain spaces, then the value string must be enclosed in double quotes.
    • Mandatory keywords across all message families
      • time: Date and time in double quotes.
      • priority: Priority of the message. One of values from 1 to 7.
      • id : Identity of the device sending the logs. It is configured by administrator on each device.
      • mtype: Message family.
      • mid: Message ID
    • Generic keywords: There are some generic keywords which are valid across multiple message families. Though these are not mandatory, these generic keywords must be used wherever they are needed.
      • vsg : Virtual instance.
      • fromzone: Zone in which the connection is originated.
      • tozone: Zone in which the connection is terminated.
      • userid : User name, if this information is known.
      • usergroup: User group.
      • sip : Source IP address in dotted decimal form.
      • dip: Destination IP address in dotted decimal form.
      • protocol: Protocol of the connection as integer.
      • sport: Source port
      • dport: Destination port.
    • Other keywords are specific to each message family.
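
    Here is a minimal sketch of formatting a log record as a single keyword=value line in this style; the quoting rule and the example field values are illustrative.

      # Illustrative keyword=value log line formatting (WELF-like syntax, custom keywords).
      def format_log_line(fields):
          parts = []
          for key, value in fields.items():
              value = str(value)
              if ' ' in value:                 # values containing spaces are double-quoted
                  value = '"%s"' % value
              parts.append('%s=%s' % (key, value))
          return ' '.join(parts)               # one log per line, fields separated by spaces

      log = {
          'time': '2008-04-30 20:15:02',       # contains a space, so it gets double-quoted
          'priority': 4, 'id': 'utm-branch-1', 'mtype': 'firewall', 'mid': 1001,
          'vsg': 1, 'fromzone': 'lan', 'tozone': 'wan',
          'sip': '10.1.5.10', 'dip': '192.0.2.1', 'protocol': 6, 'sport': 33012, 'dport': 80,
      }
      print(format_log_line(log))
      # time="2008-04-30 20:15:02" priority=4 id=utm-branch-1 mtype=firewall mid=1001 ...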

    Saturday, April 5, 2008

    SMT threading in Multicore processors

    There is one good article I found on concepts of Simultaneous Multi-threading. See here.

    As a software engineer, it is easy to get confused between threading implemented in hardware and in software. Though the concepts are similar, there is no one-to-one correspondence. Even single core chips that don't have hardware threads can support software threads; software threads are really an operating system concept. Where I reference 'thread' in the rest of this article, it means a hardware thread.

    Hardware threads take very little die space compared to adding a new core to the die. You are mistaken if you think that each hardware thread is equal to one core. Note that operating systems such as Linux expose a virtual CPU for each hardware thread, but its performance can't be equated with that of a physical CPU; operating systems do this for convenience.

    Hardware threads are part of each core. If a multicore processor with 8 cores supports 2 threads per core, there are 16 threads in total.

    The basic idea behind SMT: cores typically have multiple execution units for different operations, such as arithmetic units, branch units, floating point units, load/store units etc. Each unit runs in multiple stages in a pipelined fashion, and any work given to a unit takes multiple processing cycles. Each stage normally executes in one processing cycle, and the stages execute independently at the same time. Processor utilization is highest if all stages are filled up all the time, but typical workloads don't fill up all stages of any given unit. Many of the stages left unused by one thread can be used to execute the software programs of other threads. The probability of using all execution stages is higher with more threads.

    What are other situations where hardware threads are useful?

    Processor delay during a cache miss: L1 and L2 caches are small compared to DDR. When instructions are executed or data operations are done, the target information may not be in the cache, a condition called a cache miss. A cache miss leads to reading data from DDR, which takes quite a few cycles (around 100 cycles or so) to refill the cache. During this time, if there is only one thread per core, the core literally waits and does not do anything. If there are other threads in the core, they can do other operations which don't require reading from external memory.

    Processor waiting for results from accelerator devices: Many times software programs use accelerator devices in a synchronous fashion. That is, a command is given to the accelerator and the program waits for the result in a continuous loop. During this time, that is, until the result is back from the accelerator, the processor thread is not using any of the core's execution resources. If there are multiple hardware threads, the other threads can use the core resources, thereby utilizing the core to its maximum.
    • One might say that accelerators should not be used in a synchronous fashion; software programs should use them asynchronously and do something else until the results come in. In theory, this sounds good, but in many cases it is not possible to change the software architecture, either because the software is developed in interpreted languages or because developers don't want to change the software.
    • One also might say that software programs, rather than making the processor spin on the result, should yield to the operating system and get control back through an interrupt when the result is ready. Again, this sounds good, but it is not an option in many cases where the hardware is accessed directly from a user space daemon (by mmaping) in Linux-like operating systems, as the interrupt latency, waking up the user process and the user process getting scheduled can add significant latency.
    • Some multicore processors and Intel avoided these problems by introducing a co-processor model with additional instructions. For example, crypto acceleration is implemented as a new set of instructions in the core. Though the situation is better here, multithreading still helps, as the co-processing unit might also have pipeline stages; multithreading helps in utilizing these stages as well.

    What performance improvement is expected with multithreading?

    It depends on the workload. You might even see some performance degradation if the application is single software threaded. In general there is a performance improvement with threading, but mileage may vary. Networking applications such as firewall and IPsec see performance improvements ranging from very minor to 30% when going from single threaded to dual threaded cores, depending on the number of sessions. If the number of sessions/tunnels used to measure the performance is very small, the performance improvement with dual threaded cores is not very high; with more sessions, one can see significant performance improvement. When there are few sessions, most of the session state may be within the cache and DDR accesses may be rare - I guess that could be the reason why performance does not improve with the number of threads. But in real life, DDR accesses and accelerator accesses are there, and performance improvements are seen.

    How many threads are ideal in a core?

    Based on the network applications I mentioned above (firewall, NAT and IPsec), more than two threads per core does not provide enough improvement to justify the cost of adding threads to the core. In the best case, we saw a 30% improvement with two threads and only a 40% improvement with 4 threads when the experimentation was done. I personally expected more than 40%, but I can't explain why it is only 10% more with 4 threads; it could be that some other factors are playing a role.

    What kind of role can the OS play?

    Note that all threads in a core share the L1 and L2 caches. If the threads run different software programs, there could be a lot of cache thrashing. If the same program is run on both threads of a given core, there is a possibility of cache data sharing. Linux SMP keeps this in mind while scheduling tasks onto the cores and hardware threads. As much as possible, OSes tend to assign a similar software thread to the other thread of the same core, as in the sketch below.
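
    Here is a minimal sketch of pinning two software threads of the same program onto the two hardware threads of one core, assuming a Linux system where CPUs 0 and 1 are the sibling hardware threads of core 0 (the actual sibling numbering is exposed in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list and varies by processor); this only illustrates the affinity API, not the performance effect.

      # Illustrative pinning of two worker threads onto sibling hardware threads of one core
      # (Linux only; assumes CPUs 0 and 1 are the two hardware threads of core 0).
      import os
      import threading

      def worker(name, cpus):
          os.sched_setaffinity(0, cpus)     # restrict this thread to the given CPU set
          # ... a packet processing loop would run here, sharing L1/L2 cache with its sibling ...
          print(name, 'running on CPUs', os.sched_getaffinity(0))

      siblings = {0, 1}                     # assumed sibling hardware threads of core 0
      t1 = threading.Thread(target=worker, args=('worker-1', siblings))
      t2 = threading.Thread(target=worker, args=('worker-2', siblings))
      t1.start(); t2.start(); t1.join(); t2.join()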

    What I would like to see from an application's perspective:

    • 2 hardware threads per core.
    • A larger L2 cache to go with multithreading.
    • Coprocessor based accelerators.
    Make no mistake - a hardware thread is not a replacement for a core.

    Thursday, April 3, 2008

    Firewall and NAT using same rule base Vs. different rule bases?

    Many SMB security devices combine firewall and NAT in one rule base. That is, the firewall rule itself can be configured with NAT configuration. I feel this is fine when you have a small number of rules. In an Enterprise environment, having two rule bases provides very good flexibility.

    NAT and firewall are two different functions. The firewall rule base is mainly meant for providing access control for different networks, machines and services. SNAT is mainly meant for providing Internet access to multiple computers with a small number of public IP addresses and also for hiding internal IP addressing from the outside world, while DNAT is meant for providing access from the Internet to servers in the private network.

    The granularity of firewall rules is very fine. At times, firewall rules are created for a single IP address or service. Also, firewall rules can be activated upon user login. If NAT configuration is part of the firewall rules, then the NAT configuration needs to be duplicated many times in the firewall rule base.

    The granularity of NAT rules is coarse - typically at the network level. So, the number of NAT rules is small and independently manageable.

    There is another advantage to having independent rule bases, which has to do with role based management. In many Enterprises, NAT configuration is treated as a network function, not a security function, whereas the firewall is considered a security function. Organizations having different personnel for security and network administration find separate rule bases convenient.

    Of course, there is a small disadvantage to having two rule bases: the connection rate performance is typically lower than with a single rule base. Since the number of rules in the NAT rule base will be small, this is not a big disadvantage.

    If I were the administrator, I would prefer security devices implementing these two functions in two different rule bases.

    Traffic shaping and Real time traffic - Tips

    Don't assume that the traffic shaping provided by many Residential and SMB gateways is enough for clear VOIP quality. It certainly improves the voice quality, but traffic shaping alone is not sufficient in many scenarios. Try downloading or uploading a file from a popular site on the Internet and your voice quality suffers. Why is this? Hopefully this technical tip encourages device vendors to start thinking about providing better traffic management in their next firmware versions.

    Introduction:
    Many existing Residential and SMB gateways are based on SoCs (System on Chip). SoCs combine the processor, Ethernet MACs, in some cases a wireless MAC, memory controller, crypto accelerator and other peripheral buses into one single chip. The processor speed is in the range of 400 to 500 MIPS. Recent generations of SoCs include gigabit MACs and DDR2 memory controllers, and also improve performance using fast path acceleration technologies. Though the main processing speed has been going up, it is not to the extent of your desktop PC processors. Many of these gateways typically drive up to 10Mbps links, and hence the processing power is thought to be good enough to saturate the bandwidth on WAN links - and that is a fair assumption.

    These gateways are typically installed behind DSL/Cable/T1 modems via Ethernet. Though the Ethernet speed is 100Mbps, the actual speed of the WAN link is limited to a few Mbps. The traffic shaping functionality of RG and SMB gateways considers this effective bandwidth while shaping the outbound traffic. Within this bandwidth, these devices prioritize the outbound traffic based on multiple conditions - more often than not, based on the TOS (Type of Service) field of the IP header. Many devices also provide configuration to set the TOS value based on 5 tuple rules. This helps in cases where the VOIP TA or VOIP phone behind the gateway does not set higher TOS values for voice (RTP) traffic, by giving the administrator an option to set a higher TOS value for traffic coming from known phones and adapters. Many devices also support setting TOS values on RTP traffic dynamically. A small sketch of TOS based priority scheduling within the shaped bandwidth is given below.
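
    Here is a minimal sketch of strict-priority scheduling of outbound packets based on the TOS value, assuming two classes (high priority for TOS >= 0xa0, best effort otherwise); the threshold and class names are assumptions, and a real implementation would combine this with the shaper's rate limiting on the WAN link.

      # Illustrative strict-priority scheduling by TOS: voice-marked packets are sent first.
      from collections import deque

      HIGH, BEST_EFFORT = deque(), deque()

      def enqueue(pkt):
          # TOS >= 0xa0 (e.g. a DSCP EF marking applied by the firewall's TOS mangling)
          # goes to the high priority queue; everything else is best effort.
          (HIGH if pkt['tos'] >= 0xa0 else BEST_EFFORT).append(pkt)

      def dequeue():
          """Pick the next packet for the shaped WAN link: high priority queue first."""
          for queue in (HIGH, BEST_EFFORT):
              if queue:
                  return queue.popleft()
          return None

      enqueue({'tos': 0x00, 'data': 'ftp segment'})
      enqueue({'tos': 0xb8, 'data': 'rtp voice frame'})
      print(dequeue()['data'])   # -> 'rtp voice frame' (sent ahead of the earlier ftp segment)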

    Traffic shaping with this priority based scheduling works in many cases. But it may not work in cases:
    • where CPU power is limited: Many routers come with 1Gbps ports on the LAN side. If somebody pumps traffic at a very high rate, the CPU is busy processing these packets even though they eventually get dropped at the WAN link by traffic shaping. Since the CPU is busy, it might not process some real time traffic.
    • where the incoming bandwidth of the WAN link is used up: Traffic shaping helps in shaping and scheduling outbound traffic, but it has no control over the packets coming from the WAN link. If one of the local PCs is downloading a big file or movie, it might affect the VOIP traffic and you may not hear the remote party well.
    That is why it is very important to have traffic policing in addition to traffic shaping. Traffic policing typically throttles the traffic on the ingress side: it can limit the bandwidth usage of incoming traffic and prioritize it. The bandwidth allowed should depend on the CPU power, and the policing should happen as soon as the packet enters the system, to save CPU cycles. On the WAN link too, traffic policing should be done to make the sender slow down. This may not work for non-TCP traffic, but the majority of traffic is TCP, so it should work fine as a system.

    For developers, I suggest going with a simple token bucket algorithm to detect and throttle traffic. Similar to traffic shaping, multiple rules can exist, with each rule having a set of throttling parameters and identifying the traffic by 5 tuple selectors and TOS values; a minimal token bucket sketch follows.
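
    Here is a minimal token bucket sketch that could back such a rule; the rate and burst values are illustrative, and a per-rule instance would be keyed by the rule's 5 tuple selectors and TOS values.

      # Illustrative token bucket policer: admit a packet only if enough tokens (bytes) are available.
      import time

      class TokenBucket:
          def __init__(self, rate_bytes_per_sec, burst_bytes):
              self.rate = rate_bytes_per_sec
              self.burst = burst_bytes
              self.tokens = burst_bytes          # start with a full bucket
              self.last = time.time()

          def allow(self, packet_len):
              now = time.time()
              # Refill tokens for the elapsed time, capped at the burst size.
              self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
              self.last = now
              if packet_len <= self.tokens:
                  self.tokens -= packet_len
                  return True                    # within the policed rate: accept the packet
              return False                       # over the rate: drop (police) the packet

      # Example: police ingress traffic matching one rule to roughly 2 Mbps with an 8 KB burst.
      policer = TokenBucket(rate_bytes_per_sec=250000, burst_bytes=8192)
      print(policer.allow(1500))   # -> True while tokens remain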