Disaster Recovery Post 9/11 --
A Customer Service Perspective

by
Charles E. Day CMC, FIMC, President, Charles E. Day & Associates
for
At Your Service Magazine, a publication of the International Customer Service Association
December 2001

Introduction

Disasters, whether natural (such as floods, fires and earthquakes) or other (such as hacker intrusion, internal security breaches, equipment failures and construction disasters), have a dramatic impact on business. The loss experienced by customers, employees, vendors and stakeholders expands beyond the financial realm. For example, it's not too difficult to conjure up news headlines and memories from the aftermath of Hurricane Andrew, devastating forest fires in Washington and Ohio, Midwest floods, the first bombing of New York's World Trade Center, the Northridge, California earthquake and the Oklahoma City bombing, to name only a few. The terrorist attacks of September 11, 2001, however, created a heightened awareness among businesses of the need to plan more fully for the protection of company assets, customer access and business continuity. The scope of the tragedy and the devastation that ensued changed business priorities instantly, and strategic crisis planning became an imperative reality rather than a canned, lip-service response from top management.

Customer service management is only one component of disaster recovery programs. Under day-to-day business conditions, there are essential steps for protecting customer relationships, but in times of crisis, extraordinary measures need to be established, communicated, validated and refined. During disasters, companies of all sizes and types, regardless of industry or organizational structure, need to go beyond what is normal and be equipped to handle "what if" scenarios with minimal disruption of service. 

Disaster recovery programs, also known as business continuity or strategic crisis planning, become even more complex when automated systems and teleservices are involved. They require the implementation of multi-faceted, integrated steps that apply to applications development, systems and network designs and procedures in order to maintain readiness for alternative service to customers should business disruptions occur. Opportunities for developing plans to minimize risks include six key areas of consideration:

  • Systems
  • Networks
  • Data and information
  • Procedures
  • Facilities
  • Web-enablement

Systems

Whether companies employ client/server technologies, Web-based applications or legacy host processors for customer service operations, there are fundamental must-dos that have become even more critical to protecting a business and its relationships. While many of the more traditional disaster recovery or business continuity planning steps still apply, it is increasingly important to re-think existing system operations and include more proactive planning for the changed security risks now imposed upon management.

Stemming from the mainframe systems protection procedures of the past, hot sites, or off-site facilities equipped with customer control centers, work areas and telecommunications systems, are a staple service and the strength of successful outsource companies. These services are provided by several vendors, including IBM Corporation (White Plains, New York), Comdisco, Inc. (Rosemont, Illinois) and SunGard Recovery Services Inc. (Wayne, Pennsylvania).

The concept of hot sites is to prearrange for transport of computer operating systems and transfer of network facilities to a nearby or remote facility operated, ideally, outside the natural disaster area. Employees and staff would be transferred on a temporary basis until service can be re-established at the principal location. 

IBM's Business Recovery Services, in conjunction with Teloquent Communications Corporation, a supplier of call center software solutions, has introduced CallProtec. The combined services provide a full-featured, switchless Automatic Call Distributor (ACD) that gives organizations an end user recovery facility to minimize down time caused by disasters. The CallProtec concept enables organizations to re-route calls to one of IBM's regionally located call management centers, where trained agents can receive telephone calls until full-time agents can arrive on site, or to another alternative site to receive customer calls. The concept is called fault-tolerant because ACD network capability is linked to locations with mainframe functionality so that operations can continue for mission critical applications. Capabilities such as "mirroring" for both mainframe and database systems should be considered for effective ACD operations. IBM's business recovery program is touted as one of the few, if not the only, ISO 9000 certified applications. The hot site arrangement generally requires an initial fee, along with a monthly subscriber's fee. Additional payment is due in the event that activation of hot site facilities is required.

The recent 9/11 disaster caused SunGard (Wayne, PA) to activate several hot site facilities in the New York City area immediately following the WTC attack. Two firms that subscribe to SunGard disaster recovery services telephoned the company for facilities where their employees could be temporarily relocated. Space was afforded at a New Jersey location. SunGard immediately pulled out customer computing system configuration information and telephone PBX/ACD requirements. The first step was to establish an emergency telephone announcement on the companies' main lines to inform customers of the potential delay in getting service. Within hours, facilities were prepared for some 220 telephone users to continue conducting business as close to normal as possible.

Networks

When it comes to network configuration designs, both voice and data, the dominant consideration has been bottom line cost. Decisions are driven by the goal of getting the voice network cost under $0.04 per minute and increasing the bandwidth and committed information rate of the data network less expensively. A more ubiquitous and secured approach may be better able to cope with the increased risks in today's operations, especially given the spontaneity with which outages occur.

Several long distance companies have developed capabilities for re-routing teleservices calls to alternative locations. For example, AT&T's disaster recovery program has the capability to transport specifically designed tractor-trailer rigs, equipped with sophisticated equipment, to the desired location for fallback recovery. AT&T will have several warehouse locations across the country for storing these rigs. While the equipment generally travels by road, it can be shipped by rail or air in extreme emergencies. An exercise to test this capability in Farmington, Massachusetts by AT&T with NYNEX and Teleport Communications Group was completed earlier.

Advanced 800 services from major long distance common carriers allow the long distance company to redirect 800 traffic, or allow users who have DACS (Digital Access and Cross-connect System) to automatically effect changes from one location to another. Local exchange companies offer Centrex service with call forwarding capability, which is another way to reroute calls more quickly than with normal point-to-point telecommunication networks. SONET (Synchronous Optical Network), a high-bandwidth fiber optic telecommunication facility, has been incorporated by several local exchange companies into fiber rings within large metropolitan areas. These rings have the ability to re-route traffic automatically in and around failed points, particularly those distant from the physical facility of a using teleservices organization.
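The ring idea is easier to see in a small model. The following is a minimal sketch in Python, not a description of how any carrier's equipment actually works: it assumes a ring of four hypothetical offices (A through D), tries the clockwise path first, and falls back to the counterclockwise path when a link on the first path has failed. The function names and the node names are my own assumptions for illustration.

```python
def ring_path(nodes, src, dst, step):
    """Walk the ring from src to dst; step=+1 is clockwise, -1 counterclockwise."""
    path = [src]
    i = nodes.index(src)
    while path[-1] != dst:
        i = (i + step) % len(nodes)
        path.append(nodes[i])
    return path


def links(path):
    """Return the set of undirected links a path travels over."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def route(nodes, src, dst, failed_links=()):
    """Return the first direction around the ring that avoids every failed link,
    or None if both directions are blocked."""
    failed = {frozenset(link) for link in failed_links}
    for step in (1, -1):
        path = ring_path(nodes, src, dst, step)
        if not (links(path) & failed):
            return path
    return None


# Hypothetical four-office ring.
ring = ["A", "B", "C", "D"]
normal_route = route(ring, "A", "C")
rerouted = route(ring, "A", "C", failed_links=[("B", "C")])
isolated = route(ring, "A", "C", failed_links=[("B", "C"), ("A", "D")])
```

With no failures the call travels A, B, C. When the B-to-C link is cut, the same call goes the other way around through D. Only when both directions are cut is the destination unreachable, which is the practical argument for a ring over a single point-to-point route.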

Another technique used to protect a portion of facilities that may be disrupted by natural disasters is redundant network facilities, with twice the capacity normally required for business. Some companies have specifically developed interactive voice response applications that handle a limited set of mission critical functions in lieu of live agents fielding calls during a natural disaster. Network firms as well as outsource agencies can provide capabilities for such applications should they not be developed by the client at an available site.

I am reminded of an incident that occurred in Downers Grove, IL in the late 1980s, when an Ameritech (Illinois Bell) central office caught fire and affected services at most area businesses, including a client I was working with as a consultant. The outage and restoration process took weeks of re-routing by the telephone company. However, my client, through either some genius or pure luck, had both AT&T and Sprint long distance services. Connection to one of the two points of presence (POPs) was completely lost, while the other enabled the outbound telephone contact center to continue functioning, albeit at 40% of full capacity. While predictive dialers became inappropriate, the business continued with the entire staff dialing on an automatic or manual basis.

Data and Information

The challenge of preserving customer relationships comes into play when data and information are compromised. For example, when customers' ability to access self-service applications, check order status, retrieve information and document transactions is interrupted, business stops. Simple rules to protect data and information are no longer a low-priority nicety. They are very serious business choices in protecting companies and customers from damaged systems and closed facilities.

One solution that allows for data and information protection is to establish "mirror data". This is a process by which companies replicate information at different locations on an online basis or on different storage media. It is a proven method for maintaining duplicate records in the event of system problems, which tend to occur more frequently than a total disaster. Off-site storage of operating systems and the use of vaults for protecting customer data and operating system programs are a more common choice for disaster recovery protection, at a much lower cost than a hot site operation. The International Disaster Recovery Association in Shrewsbury, MA is an organization set up specifically to assist vendors and users in making business connections for disaster recovery measures.
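For readers who want to see what "mirror data" means in the simplest possible terms, here is a minimal sketch in Python. It assumes two ordinary directories stand in for two separate sites, and the file name and sample record are hypothetical. A production mirroring product works at the database or storage level and is far more involved; the sketch only shows the principle of writing every record twice, checking that the copies agree, and reading from the surviving copy.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mirror_write(record: bytes, primary: Path, mirror: Path) -> bool:
    """Write the same record to both locations and confirm the copies match."""
    for target in (primary, mirror):
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(record)
    return checksum(primary) == checksum(mirror)


def recover(primary: Path, mirror: Path) -> bytes:
    """Read from the primary site, falling back to the mirror if it is gone."""
    source = primary if primary.exists() else mirror
    return source.read_bytes()


# Hypothetical demo: two temporary directories stand in for two sites.
with tempfile.TemporaryDirectory() as site_a, tempfile.TemporaryDirectory() as site_b:
    primary = Path(site_a) / "customers.dat"
    mirror = Path(site_b) / "customers.dat"
    copies_match = mirror_write(b"order 1001: shipped", primary, mirror)
    primary.unlink()  # simulate losing the primary site
    recovered = recover(primary, mirror)
```

The checksum comparison is the part most often skipped in practice. A duplicate that has never been verified is only an assumption that a duplicate exists.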

Procedures

Logical steps for moving data, information and employees must be mapped out ahead of time and with a higher level of awareness for security issues. More genuine intentions for outsourcing must be fostered in building alliances within and outside of the organization before emergencies occur. If a company loses capability or capacity within a department or location to conduct business, management and staff need to know what agreements are in place to rapidly move customer services elsewhere with minimum disruption. 

In the days following the World Trade Center attack, the Red Cross was seeking donations of food, bottled water, clothing and dust masks for rescue workers on the scene at Ground Zero. The response to the Red Cross' call center and hot line, 800-Help Now, was so great that its telecommunication network and call center staff were overwhelmed by the large increase in calling volumes. Procedures for accepting volunteer support services from other telecommunications and call center operations were in place and executed.

A telecommunications vendor of the Red Cross contacted EDS (Plano, TX), and by midnight on September 12 EDS' call center in Mechanicsburg, PA was assisting with overflow calls. EDS also reportedly borrowed agents from the call center of one of its clients, for whom EDS conducts outbound dialing campaigns. By September 13, EDS was also routing calls to its Troy, MI call center and later to two others in Tucson, AZ and Plano, TX. By the way, if you applaud the process used, you should also salute the many employees in these call centers who volunteered their time to answer calls during the peak disaster recovery period.


Facilities

A physical location remains a pivotal point in most firms for normal operations. When companies must temporarily relocate, use telecommuting, occupy hot sites or revert to more interactive voice and recognition applications and Web enablement, how many are really prepared to deliver quality services?

Some straightforward and less costly approaches for developing a disaster fallback program would include the use of multiple common carriers and different points of presence (POPs) in accessing long distance networks without traversing the same common points. Secondly, diverse routes, through separate ducting systems, from the building that houses the teleservices call center to the public network have been found to be another good step for protection. Uninterruptible power sources (UPS) and standalone power generators vary in cost, but they serve the purpose of protecting critical components of both telephone and computer systems during momentary power outages, with generator capability useful for more extended periods. Uninterruptible power sources also provide a more level voltage to critical computing and telephony equipment, which helps protect it from damage.

The use of multiple call centers with the capability for networks that look ahead and search for the next most available agent or the availability of a particular center has great fallback potential, especially when the locations are fairly remote. A concept introduced by a number of telephone PBX and ACD vendors is called Remote Agent. This feature allows homebound employees to field calls by dialing into a system. Also, with newer capabilities for multiple lines, employees can begin to use computing systems along with receiving customer telephone calls at home during a disaster where access to a building or transportation has been disrupted. 
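The look-ahead idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the center names, the priority order and the agent counts are hypothetical, and a real network ACD weighs many more factors (queue times, skills, cost). The sketch simply checks each center in priority order and sends the call to the first one that is both online and has an idle agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallCenter:
    name: str
    online: bool       # False when the site is down or unreachable
    idle_agents: int   # agents free to take a call right now


def route_call(centers: list[CallCenter]) -> Optional[str]:
    """Look ahead across the centers in priority order and pick the first
    one that is online with an idle agent. Returns None if nobody can answer."""
    for center in centers:
        if center.online and center.idle_agents > 0:
            center.idle_agents -= 1  # that agent is now busy
            return center.name
    return None


# Hypothetical scenario: the primary site is down during a disaster.
centers = [
    CallCenter("primary", online=False, idle_agents=12),
    CallCenter("secondary", online=True, idle_agents=1),
    CallCenter("remote-agents", online=True, idle_agents=2),
]
first_call = route_call(centers)
second_call = route_call(centers)
```

In this scenario the first call lands at the secondary center, and once its only idle agent is taken, the next call goes to the remote agents working from home. Customers are answered even though the primary building never picks up.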

A simpler form of disaster recovery is the use of multiple carriers whenever possible, so that the capability to re-route traffic, or at least to field portions of traffic over one long distance company in the event of a problem with another, provides some measure of security. A concept recently used successfully in the airline business, for economies of scale versus installing permanent ACD equipment, was to use long distance network-based ACDs. These have a natural disaster recovery capability for re-routing traffic across offices and reduce the exposure to power and other incidental failures at a teleservices office.

Another technique for assuring ongoing fallback recovery is to establish an ongoing relationship with an outsource agency or service bureau which can field, for example, ongoing overflow calls for customers or sales. This is done with the understanding that increased volume can be re-directed automatically in the event of a problem with systems or facilities at the teleservices call center. 

Web Enablement

Customer self-service offerings using Internet and Web enabled applications have, in many instances, found their way into operations only superficially. This means that there is not the level of support, understanding and commitment it takes both to preserve their use and to extend their application to shore up other impacted customer service functions, as required in emergencies. Vigilance in genuinely planning operations and support on an integrated basis with cyber applications will likely have a more significant impact on surviving business disasters and, to a larger extent, on advancing the appeal of this customer access channel.

The Red Cross example above, on outsourcing and disaster recovery procedures, is also a perfect illustration of effective interactive Web site enablement. EDS was able to swing its staff from several remote call centers with minimal training because the employees at the other call centers were already equipped to log onto the Red Cross' Web pages. Since the Web site was previously set up for interactive use by the public, it was a simple training lesson for the agent volunteers to pick up the procedures and process, read the scripts and enter the required information. Having Web sites like this for your organization can also broaden customer access during normal times. This is both invaluable and speedier in the event of a true disaster, or should the need to get outside assistance occur.

From the beginning, disaster recovery has been a very expensive program to develop because the plan needed to include duplicate offsite facilities and systems, operating system offline/offsite storage and complete network re-routes. Although cost, weighed against risk, is still a factor and must be considered in business planning, it is important to understand more complete and yet practical measures to protect against business loss. For example, Strategic Research Corp. (Santa Barbara, CA) has estimated the cost of corporate downtime to be as high as $6.6 million per hour in unrealized revenue or services. The cost could be expanded to include compensation to employees or affected staff, damage to equipment, and even damage to customer relationships and confidence. Comdisco (Rosemont, IL) completed a study in 2000 and discovered that only 30% of some 200 companies surveyed had published disaster recovery and business continuity plans in place. Amazingly, however, nearly 25% of these firms had actually experienced a disaster of large enough proportion to shut down computer operations.
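A back-of-the-envelope calculation helps put the hourly figure in planning terms. The sketch below, in Python, uses the $6.6 million per hour upper estimate quoted above. The outage length (4 hours) and the yearly likelihood of such an outage (5%) are purely hypothetical inputs of my own, chosen only to show the arithmetic; substitute your own figures.

```python
HOURLY_DOWNTIME_COST = 6_600_000  # USD per hour, the upper estimate quoted above


def downtime_loss(hours_down: float, hourly_cost: float = HOURLY_DOWNTIME_COST) -> float:
    """Loss from a single outage lasting hours_down hours."""
    return hours_down * hourly_cost


def expected_annual_loss(annual_probability: float, hours_down: float,
                         hourly_cost: float = HOURLY_DOWNTIME_COST) -> float:
    """Outage loss weighted by the chance of it happening in a given year."""
    return annual_probability * downtime_loss(hours_down, hourly_cost)


# Hypothetical inputs, chosen only for illustration.
outage_hours = 4
annual_probability = 0.05

single_outage = downtime_loss(outage_hours)
yearly_exposure = expected_annual_loss(annual_probability, outage_hours)
print(f"One {outage_hours}-hour outage: ${single_outage:,.0f}")
print(f"Expected annual exposure: ${yearly_exposure:,.0f}")
```

With these made-up inputs, a single four-hour outage at the quoted rate comes to $26.4 million, and the expected yearly exposure to about $1.3 million. A number of that kind gives management something concrete to weigh against the annual cost of a hot site subscription or a mirrored data arrangement.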

Disaster recovery for teleservices organizations is making more and more sense considering the investment and the potential impact on business and customers. It is probably more important to document the process and consciously discuss measures than it is to implement the most expensive system or approach up front. A migration plan for increasing protection of critical components can be developed over time, starting with the simpler backups such as keeping duplicate records of vital information.

 


 
