The easiest way to learn a few new four-letter words is to kill the power in an office full of PCs. In today’s office, however, dealing with actual loss of power is only one aspect of a disaster plan. Many other aspects of power loss need to be included in any comprehensive disaster prevention program.
The complexity of protected equipment can create headaches for someone installing a power backup. Robert Bauer, president of Liebert Americas, Columbus, Ohio, sums up that aspect of the problem: “In many cases, data processing systems are comprised of a curious mix of leading-edge and aging technology. The platforms have not necessarily been designed for maximum power availability or power protection.”
To compound the problem, the increasing use of high-speed technologies, along with the attachment of equally sensitive components such as hubs and routers, has actually increased the entire system’s sensitivity. Some people caution against putting all of one’s eggs into one basket. But there is value in having everything in one spot as long as it can be watched with a hawk’s eye. That was the strategy followed in the days of the single-site mainframe, which could be protected inside the “glass house.” Today it is commonplace for computing equipment to be scattered in every corner of a building or campus, yet the electrician is still expected to protect each and every piece.
On one hand, distributed networking gives the user effective tools to increase productivity. On the other, these systems are far less robust than their mainframe predecessors.
“Computer users almost expect crashes,” agreed Ben W. Tartaglia, executive director of the International Disaster Recovery Association, Shrewsbury, Mass. (www.idra.com). Yet he argues that data service is much easier to manage than voice service such as telephones. And, in a perverse sense, a computer outage is less of a disaster than a telephone outage.
“With data, you are dealing with things instead of people,” he said. “When you talk to top executives, voice service still is more in front of their minds. If a computer goes down, it’s a minor problem to them. If their phone goes down, they are very upset.”
Yet the costs for lost time on a data network are substantial. A study done at the University of Wisconsin showed that computer downtime costs U.S. businesses $4 billion a year, primarily through lost revenue. The average company’s computer system goes down nine times per year, for an average of four hours each time.
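The averages above translate into annual downtime with simple arithmetic. The short Python sketch below works it out; the function names are illustrative, and the availability figure assumes a system that is expected to run around the clock, which the study as cited here does not state.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(outages_per_year, hours_per_outage):
    """Total hours lost per year for a given outage count and duration."""
    return outages_per_year * hours_per_outage


def availability(downtime_hours, hours_in_period=HOURS_PER_YEAR):
    """Fraction of the period the system was up (assumes 24x7 operation)."""
    return 1.0 - downtime_hours / hours_in_period


# Figures quoted in the article: nine outages a year, four hours each.
downtime = annual_downtime_hours(9, 4)
print(f"Downtime: {downtime} hours per year")
print(f"Availability: {availability(downtime) * 100:.2f} percent")
```

Nine four-hour outages add up to 36 hours a year, which on a round-the-clock basis is roughly 99.6 percent availability.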
There are answers.
“Design a solution that provides the greatest amount of field hot-swappable options. In the field or in the network room, the more parts that are serviceable and hot swappable without bringing down the system, the better,” said Ron Mann, director of engineering for the rack and power systems group of Compaq Computer Corp., Houston, Texas (www.compaq.com/ups).
He said another key design issue for structuring networks for reliability is “modularity.” Modularity is similar to hot swappability. High-availability systems contain modular products for ease of expansion, service, and reconfiguration.
UPS under siege
Both for data and for voice networks, large uninterruptible power supply (UPS) systems continue to provide critical protection. “Yet they, too, are under siege to process more and more critical information with no allowance for even the slightest system failure,” Bauer said. “Demands with servers, e-commerce/Internet and data/telecommunications equipment have made the ‘no downtime’ operating parameter an achievable goal. The challenge is being able to develop a facility infrastructure that can have online maintenance and repair without impacting the processing operation.”
This demands more than sticking a UPS or generator between the utility and the network, especially for critical banking, surveillance, and security systems. Bauer recommends a “more holistic approach,” one taking into account the protection equipment, site evaluation, and proper maintenance.
“First and foremost, the UPS should be online,” Bauer said. “As opposed to offline or line-interactive UPS, an online system eliminates a wider range of potential power problems such as spikes, surges, and difficult-to-track harmonics common with standby generator options.”
Shutdown capabilities range from routine alerts of imminent power loss to load shedding and staged shutdowns, which turn off nonessential applications to preserve power for mission-critical systems.
“It is now possible to supply event-specific alarm messages that provide instant recognition of a power problem,” Bauer said. These messages can be sent directly to certain areas so an engineer can respond and resolve the problem before any real disruption occurs.
Another handy feature is the ability to put a UPS in manual or automatic bypass mode. This keeps the system up and running while the UPS is being serviced. Without some sort of bypass switch or mode, the system and the equipment would need to be powered down.
No matter how good the UPS, be sure the system is backed up and backed up correctly. Tartaglia talked about a large metropolitan hospital that, several years ago, irrevocably lost its entire pharmacy database, including current patient information, following a crash. The hospital discovered that the backup tapes it had been producing nightly for over two years were of the wrong files. Therefore, no backups had ever been made of the lost database.
The same kind of problem can occur if the wrong systems are on the UPS, or the system is not configured to make best use of the UPS.
“System configuration makes the difference between 99.9999 percent and total assurance of power in virtually every circumstance,” Bauer said. In its simplest form, distributed redundancy involves creating two UPS system buses and redundant power distribution systems. This should eliminate most single points of failure, all the way up to the load equipment’s input terminals.
To protect against fast power system failures, such as circuit breaker trips or power system faults, a fast switching method is required. Bauer recommends static transfer switches (STSs) for fast break-before-make transfers between AC power sources. “It is important that the two AC power sources be designed as independent as practical to eliminate common failures,” he said. “Keep in mind that redundancy needs to be as close to the load as possible to keep power available at the load equipment level.”
For ultra-critical loads, a power system needs to be about 10 times more reliable than the load, and it must be redundant. Look to parallel redundancy or distributed redundancy configurations. Distributed redundancy uses two independent UPS power distribution systems with dual-input load equipment, so that redundant AC power is provided up to and inside the load equipment.
“This way,” Bauer explained, “distributed redundancy not only provides the best assurance of power reliability and availability, but it also paves the way for an easy migration path as more dual-bus loads become deployed.”
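A rough way to see why two independent power paths matter is to compare availability figures. The Python sketch below uses the textbook formula for parallel redundancy; it is not a method attributed to Bauer, the 99.9 percent single-path figure is purely hypothetical, and the result holds only if the two paths truly share no common failure, which is the independence he stresses.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(path_availability, paths=2):
    """Availability of a load fed by independent redundant paths.

    The load loses power only when every path is down at the same time,
    so the combined unavailability is the product of each path's
    unavailability. Assumes the paths fail independently.
    """
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if paths < 1:
        raise ValueError("at least one path is required")
    return 1.0 - (1.0 - path_availability) ** paths


def downtime_minutes_per_year(avail):
    """Expected minutes without power per year at a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


single = 0.999  # hypothetical availability of one UPS bus
dual = parallel_availability(single, paths=2)
print(f"Single bus: {downtime_minutes_per_year(single):.1f} minutes down per year")
print(f"Dual bus:   {downtime_minutes_per_year(dual):.2f} minutes down per year")
```

Under these assumptions, a single bus would be down almost nine hours a year, while a dual bus would be down for about half a minute. Any shared component, such as a common breaker or feeder, erodes that gain.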
Other areas that should be on the checklist include the physical security of both the protected equipment and the UPS system. Lock-and-key security is a minimum level required for each critical component.
Check the environmental security of the system. Heating and cooling are required in many applications. Check for waterproofing. In a disaster at a Cleveland publishing office, extensive water damage occurred when major pipes broke. Likewise, in the New York World Trade Center bombing, much of the damage was the result of water drenching offices and electrical facilities, rather than from direct injury from the actual bomb blast.
Be sure the customer has a preventive maintenance program in place for the UPS and other systems. This includes checking the battery charge; topping off fuel levels in generators; replacing filters; checking for contamination such as dirt and dust; checking for overheating; and other regular maintenance.
“All batteries eventually will fail,” conceded Mann. However, he noted that a contractor can specify systems that give the customer reliable advance notice of battery failure. Microprocessors track the charge and discharge characteristics of the battery. These characteristics are compared to an ideal battery state, indicating, well in advance, when battery replacement is necessary.
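The idea of comparing a battery against its ideal state can be illustrated with a small Python sketch. It is not any vendor's actual algorithm. The names, the sample measurements, the 80 percent replacement threshold, and the straight-line projection are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DischargeTest:
    day: int                # days since the battery was installed
    runtime_minutes: float  # measured runtime under a fixed reference load


def health_ratio(test, ideal_runtime_minutes):
    """Measured runtime as a fraction of the ideal (new battery) runtime."""
    return test.runtime_minutes / ideal_runtime_minutes


def days_until_replacement(tests, ideal_runtime_minutes, threshold=0.8):
    """Project when health will fall to the threshold.

    Uses a straight line through the first and last tests. Returns None
    when there is too little data or no measurable decline.
    """
    if len(tests) < 2:
        return None
    first, last = tests[0], tests[-1]
    elapsed_days = last.day - first.day
    loss = health_ratio(first, ideal_runtime_minutes) - health_ratio(last, ideal_runtime_minutes)
    if elapsed_days <= 0 or loss <= 0:
        return None
    loss_per_day = loss / elapsed_days
    remaining = health_ratio(last, ideal_runtime_minutes) - threshold
    return max(0.0, remaining / loss_per_day)


# Hypothetical history: a 30-minute battery that delivers 27 minutes a year later.
history = [DischargeTest(day=0, runtime_minutes=30.0),
           DischargeTest(day=365, runtime_minutes=27.0)]
print(days_until_replacement(history, ideal_runtime_minutes=30.0))
```

With this made-up history the battery loses about 10 percent of its runtime in a year, so the projection flags replacement roughly a year out. Real batteries rarely degrade in a straight line, so a production system would use a more conservative model.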
One way to get longer battery backup time is by controlling individual load segments. “Shut down non-critical devices first to allow battery backup time where it is needed most,” Mann advised. For example, a system can be configured so the storage peripherals can be shut down first to allow additional backup time for the servers.
Mann said that most power outages last less than 10 minutes. “The time gained can carry the network through the power outage without anyone even noticing that there was a power problem,” he said. This is accomplished with power management software.
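The effect of shedding load segments can be estimated with a simplified model. The Python sketch below assumes the battery delivers a fixed amount of energy regardless of draw, which real batteries do not quite do. The wattages and the battery capacity are hypothetical, and the code does not represent any particular power management product.

```python
from dataclasses import dataclass


@dataclass
class LoadSegment:
    name: str
    watts: float
    critical: bool


def runtime_minutes(battery_wh, segments):
    """Estimated runtime for the given loads (constant-energy model)."""
    total_watts = sum(s.watts for s in segments)
    if total_watts <= 0:
        raise ValueError("no load connected")
    return battery_wh / total_watts * 60


def shed_noncritical(segments):
    """Keep only the segments marked critical."""
    return [s for s in segments if s.critical]


# Hypothetical installation: servers stay up, storage peripherals are shed first.
segments = [
    LoadSegment("servers", watts=600, critical=True),
    LoadSegment("storage peripherals", watts=400, critical=False),
]
battery_wh = 100  # usable battery energy in watt-hours (assumed)

print(f"All loads:     {runtime_minutes(battery_wh, segments):.1f} minutes")
print(f"Critical only: {runtime_minutes(battery_wh, shed_noncritical(segments)):.1f} minutes")
```

In this example, shedding the storage peripherals at the start of an outage stretches the estimated runtime from about 6 minutes to about 10, enough to ride through the short outages Mann describes.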
It is all part of the package any contractor should recommend to prevent potential problems before they become true disasters.
HARLER, a contributing editor to Electrical Contractor, is based in Strongsville, Ohio. He can be reached at (440) 238-4556 or firstname.lastname@example.org.