The Simplicity and Serenity of DHCP Fault Tolerance
Given the simplicity and value of setting up fault tolerance for your DHCP services, it's surprising how many network admins fail to do so. Drew Bird explores how a simple matter of planning your implementation and putting it into place can save you potential aggravation down the road.
Fault tolerance is a factor that’s considered in the provision of almost every network service. Because downtime costs money, the need to provide fault tolerance for networked environments has gone beyond just making sure that users have uninterrupted access to the network. It is widely accepted that the level of fault tolerance can affect the viability and bottom-line success of an organization.
Nevertheless, while some network services get ample attention when it comes to fault tolerance, others, such as DHCP, often do not. Some would claim this is because DHCP, for reasons we’ll explore shortly, plays a less critical role in the ongoing operation of the network than services like DNS. While this may be true, try taking down a DHCP server and see what happens. The problems may not appear immediately, but ultimately the result will be the same: the network will stop operating.
In many organizations, provision of fault tolerance for DHCP is seen as a minor consideration. After all, if the DHCP server goes down, it only takes about 10 minutes to install the service on another server and redefine the address scopes. Although this is a simple solution, it goes against the grain of today’s highly controlled network environments.
For example, which server do you install the DHCP service on? The server running the corporate accounting system, or perhaps the one servicing the e-commerce website that is the lifeblood of the company? OK, so perhaps that’s a little overdramatic, but the fact remains that in today’s world of micro-managed and controlled networks, you do not install a new application, or a new server for that matter, without a healthy measure of consideration and planning.
The Inner Workings of DHCP
One of the reasons that DHCP is often not as well protected, from a fault-tolerance viewpoint, is that a DHCP failure is generally not an immediate mission-critical concern. The mechanics of DHCP are such that the failure of a DHCP server may not have an impact on the network for hours or even days. This is due to the way in which DHCP leases work.
When a client system obtains an IP address via DHCP, the address is given to the system, or leased, for a given period. At set points during the lease (by default 50% and 87.5% of its duration), the client system will attempt to renew the lease with the DHCP server. If it cannot renew the lease, it will still use 100% of the lease term before ceasing to use the address. With an address lease duration of 3 days (which is quite common), a system that obtained or renewed its lease just before the failure could go a full 3 days before the inability to contact the DHCP server becomes an issue, while a system that was just about to renew would have roughly half that.
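To make the lease timing concrete, here is a minimal Python sketch of the timeline for a single lease. It assumes the default timer fractions from RFC 2131 (a first renewal attempt at 0.5 of the lease and a second at 0.875); individual DHCP servers can override these values, and the function names `lease_timeline` and `time_until_outage` are illustrative only, not part of any DHCP product.

```python
from datetime import datetime, timedelta


def lease_timeline(lease_start, lease_duration, t1_fraction=0.5, t2_fraction=0.875):
    """Return the renewal attempt times and expiry time for one lease.

    The default fractions are the RFC 2131 defaults (an assumption here);
    a real server may hand out different timer values.
    """
    return {
        "renew_t1": lease_start + lease_duration * t1_fraction,
        "rebind_t2": lease_start + lease_duration * t2_fraction,
        "expires": lease_start + lease_duration,
    }


def time_until_outage(server_failed_at, last_renewal, lease_duration):
    """How long a client keeps its address after the DHCP server fails.

    The client keeps using the address until the lease obtained at
    last_renewal runs out, so the answer is whatever remains of that lease.
    """
    remaining = (last_renewal + lease_duration) - server_failed_at
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    three_days = timedelta(days=3)
    for name, when in lease_timeline(start, three_days).items():
        print(f"{name}: {when}")
    # A server failure 36 hours into the lease leaves the client 36 hours.
    print(time_until_outage(start + timedelta(hours=36), start, three_days))
```

The second function shows why the grace period varies from client to client: it depends on how recently each one last renewed, not just on the configured lease duration.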
Problems can arise, though, when DHCP address leases are configured for a particularly short period, such as a few hours. In these cases, a DHCP server failure can create more of an issue, as a few hours may not be enough to recognize the failure and bring another DHCP server online and into service. A simple example of this might be if the failure occurred overnight, and was only realized in the morning when users were unable to log on to the system. You could say that the simple solution to this problem is not to use short DHCP leases, but that’s not always possible.
Another aspect of DHCP leases that justifies the need for fault tolerance is the way that addresses are handled after they are issued. Once an address is leased out to a system, that address cannot then be assigned to another system until it is released, harvested, or the lease expires. In an environment where there are a large number of system changes on the network, this can cause problems.
For example, with a highly mobile workforce that frequently connects to and disconnects from the network, available addresses can get used up quickly. The common solution to this problem is to shorten the DHCP lease duration. As we just discussed, though, this puts you at a higher level of risk in the event of a DHCP server failure.
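The trade-off between lease duration and address consumption can be roughed out with some simple arithmetic. The Python sketch below is a back-of-the-envelope estimate, not a model of any particular DHCP server: it assumes every newly arriving client takes a fresh address, never releases it explicitly, and does not renew, so each address stays tied up for one full lease term. The pool size, arrival rate, and function names are hypothetical values chosen for illustration.

```python
def addresses_held(new_clients_per_hour, lease_hours):
    """Estimate how many addresses are tied up at steady state.

    Assumes each arriving client holds one address for exactly one
    lease term (no early release, no renewal).
    """
    return new_clients_per_hour * lease_hours


def pool_exhausted(pool_size, new_clients_per_hour, lease_hours):
    """True if the estimated demand exceeds the addresses in the scope."""
    return addresses_held(new_clients_per_hour, lease_hours) > pool_size


if __name__ == "__main__":
    pool = 254  # hypothetical scope: one /24 with every host address available
    for lease in (72, 24, 8):
        held = addresses_held(10, lease)
        status = "exhausted" if pool_exhausted(pool, 10, lease) else "ok"
        print(f"{lease}h lease: about {held} addresses held ({status})")
```

Under these assumptions, ten new clients an hour would overrun a 254-address scope with a 3-day lease but fit comfortably with an 8-hour lease, which is exactly the situation where a server failure leaves very little time to react.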