Network Redundancy for Industrial Control Systems
Key Takeaway
Network redundancy in industrial control systems ensures continuous SCADA and DCS communication despite cable cuts, switch failures, or device faults. Common redundancy protocols include RSTP for general switching, MRP and DLR for ring topologies, HSR and PRP for zero-switchover hitless redundancy, and ERPS for utility-grade Ethernet rings. Proper redundancy design is critical for safety and production continuity.
Why Network Redundancy Matters in ICS
Industrial control systems manage physical processes where communication failures have immediate and potentially dangerous consequences. A lost network connection to a pipeline pressure sensor, a refinery burner management system, or a water treatment chemical dosing controller can result in equipment damage, environmental release, or safety hazards. Network redundancy provides alternative communication paths so that a single cable cut, switch failure, or device fault does not interrupt process control. Design standards like IEC 62439 define redundancy protocols specifically for industrial Ethernet networks.
Rapid Spanning Tree Protocol (RSTP)
RSTP (IEEE 802.1w) is the most widely deployed redundancy mechanism in industrial Ethernet networks. It operates at Layer 2, detecting loops in switched networks and placing redundant ports in a blocking state. When a link fails, RSTP recalculates the spanning tree and activates the blocked port, typically converging in 1-3 seconds. While RSTP is well-understood and supported by virtually all managed switches, its convergence time is too slow for many industrial applications where even a 1-second communication gap causes alarms or process upsets.
RSTP Limitations in Industrial Networks
- Convergence time: 1-3 seconds typical, though larger networks with many switches may take longer. Process control applications often require sub-200 ms failover
- Topology dependency: Convergence time increases with network size and complexity. Each additional switch adds delay to the recalculation
- Traffic disruption: All traffic on the affected segment stops during reconvergence, including time-critical I/O and safety communications
- Best use: Non-critical monitoring networks, IT/OT convergence layers, and applications tolerant of brief outages
Media Redundancy Protocol (MRP)
MRP (IEC 62439-2) is designed specifically for industrial Ethernet ring topologies. A designated Media Redundancy Manager (MRM) monitors the ring by continuously sending test frames in both directions around it. While the ring is intact, the MRM keeps one of its two ring ports blocked so that the physical ring does not form a Layer 2 loop. When a failure is detected, the MRM unblocks that port, restoring communication through the alternate path. MRP achieves switchover times of 10-200 ms depending on ring size and configuration, meeting the requirements of most PROFINET and industrial automation applications.
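The ring manager behaviour described above can be sketched as a small state machine. This is a minimal illustration, not the IEC 62439-2 state machine: the class name `RingManager`, the 20 ms test interval, and the three-missed-frames threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RingManager:
    """Toy model of a ring manager (hypothetical names and timing values)."""

    test_interval_ms: int = 20            # assumed gap between test frames
    max_missed: int = 3                   # assumed misses before declaring a break
    missed: int = 0
    secondary_port_blocked: bool = True   # intact ring: one port blocks to avoid a loop

    def on_test_cycle(self, test_frame_returned: bool) -> None:
        """Process the outcome of one test-frame cycle."""
        if test_frame_returned:
            # Ring is intact: reset the counter and keep (or restore) the block.
            self.missed = 0
            self.secondary_port_blocked = True
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Ring is broken: forward on both ports to use the alternate path.
                self.secondary_port_blocked = False

    def worst_case_detection_ms(self) -> int:
        """Upper bound on detection time in this simplified model."""
        return self.test_interval_ms * self.max_missed


mrm = RingManager()
for returned in [True, True, False, False, False]:
    mrm.on_test_cycle(returned)

print(mrm.secondary_port_blocked)      # False: the redundant port is now forwarding
print(mrm.worst_case_detection_ms())   # 60
```

In this model the detection time is simply the test interval multiplied by the miss threshold, which is one reason the achievable switchover time depends on how the ring is configured.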
MRP Characteristics
- Topology: Single ring of up to 50 devices. Each device has two ring ports, one toward each neighbor
- Switchover time: Less than 200 ms guaranteed; typically 10-30 ms in practice for rings under 50 devices
- PROFINET integration: MRP is the standard redundancy protocol for PROFINET networks and is natively supported by Siemens, Phoenix Contact, and Hirschmann switches
- Configuration: One device is designated MRM (manager); all others are MRC (clients). Ring ports must be configured on each device
Device Level Ring (DLR)
DLR is the ring redundancy protocol for EtherNet/IP networks, specified by ODVA. Like MRP, DLR creates a logical ring with a supervisor node monitoring ring health. Unlike switch-based ring protocols, DLR operates at the device level: it is embedded in EtherNet/IP devices such as Allen-Bradley I/O modules, drives, and sensors, so the ring needs no external switches. This reduces cost and simplifies wiring in device-level networks. DLR provides switchover times under 3 ms for rings of up to 50 devices.
Parallel Redundancy Protocol (PRP)
PRP (IEC 62439-3) provides zero-switchover, hitless redundancy by connecting every device to two completely independent Ethernet networks. Each device sends duplicate frames on both networks simultaneously. The receiving device accepts the first frame to arrive and discards the duplicate. Because both networks operate continuously, there is no switchover time at all. If one network fails, the other continues without any interruption or reconvergence.
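The accept-first, discard-duplicate behaviour can be illustrated with a short sketch. It assumes each frame is identified by a source name and a sequence number; the class `DuplicateDiscard`, the helper `copies_arriving`, the window size, and the device name `relay-1` are hypothetical and do not reproduce the frame format defined in IEC 62439-3.

```python
from collections import OrderedDict


class DuplicateDiscard:
    """Accept the first copy of each frame and drop any later copy."""

    def __init__(self, window: int = 1024):
        self.window = window          # assumed number of recent frames remembered
        self.seen = OrderedDict()     # (source, sequence number) -> None

    def accept(self, source: str, seq: int) -> bool:
        key = (source, seq)
        if key in self.seen:
            return False              # duplicate from the other network
        self.seen[key] = None
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)   # forget the oldest entry
        return True


def copies_arriving(lan_a_up: bool, lan_b_up: bool) -> list:
    """Return the networks on which a copy of the frame reaches the receiver."""
    return [lan for lan, up in (("A", lan_a_up), ("B", lan_b_up)) if up]


receiver = DuplicateDiscard()
delivered = []
# Network A is down while frame 2 is sent; network B stays up throughout.
for seq, lan_a_up in [(1, True), (2, False), (3, True)]:
    for lan in copies_arriving(lan_a_up, lan_b_up=True):
        if receiver.accept("relay-1", seq):
            delivered.append((seq, lan))

print(delivered)   # [(1, 'A'), (2, 'B'), (3, 'A')]
```

Every frame is delivered exactly once, and frame 2 arrives over network B with no switchover step, which is the property the paragraph above describes.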
PRP Architecture
- Switchover time: Zero (0 ms). Truly hitless redundancy with no frame loss during any single-point failure
- Infrastructure: Requires two completely separate Ethernet networks (switches, cabling) plus dual-port network interfaces on all devices
- Cost: Approximately double the network infrastructure cost due to duplicate switches, cables, and ports
- Use case: Electric utility substation automation (IEC 61850), safety-critical process control, nuclear power systems
High-availability Seamless Redundancy (HSR)
HSR (IEC 62439-3) achieves zero-switchover redundancy similar to PRP but in a ring topology rather than parallel networks. Each HSR device sends duplicate frames in both directions around the ring. The destination device accepts the first copy and discards the duplicate. HSR eliminates the cost of duplicate network infrastructure but requires all devices in the ring to support HSR. It is widely used in IEC 61850 substation automation where protection relays and merging units require guaranteed zero-loss communication.
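A small simulation shows why a single break in the ring does not interrupt delivery. It is a sketch under simplifying assumptions: nodes are numbered 0 to n-1, link i joins node i and node (i + 1) mod n, and the function name `copies_delivered` is hypothetical.

```python
def copies_delivered(ring_size, src, dst, broken_links=frozenset()):
    """Count the copies of one frame that reach dst when src sends both ways.

    Link i joins node i and node (i + 1) % ring_size.
    """
    arrived = 0
    for step in (+1, -1):                  # clockwise, then counter-clockwise
        node = src
        while node != dst:
            # Link that must be crossed to move one hop in this direction.
            link = node if step == +1 else (node - 1) % ring_size
            if link in broken_links:
                break                      # this copy is lost at the break
            node = (node + step) % ring_size
        else:
            arrived += 1                   # loop ended by reaching dst
    return arrived


print(copies_delivered(6, src=0, dst=3))                       # 2
print(copies_delivered(6, src=0, dst=3, broken_links={1}))     # 1
print(copies_delivered(6, src=0, dst=3, broken_links={1, 4}))  # 0
```

With an intact ring the destination receives two copies and discards one. With one broken link it still receives a copy, so nothing is lost. Two simultaneous breaks on opposite sides isolate the destination, which is why HSR protects against single failures only.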
Ethernet Ring Protection Switching (ERPS)
ERPS (ITU-T G.8032) provides carrier-grade ring redundancy with a switchover target of under 50 ms. Originally developed for telecommunications networks, ERPS is increasingly used in utility and pipeline SCADA networks that span large geographic areas. ERPS supports multiple rings, interconnected rings (ladder topology), and VLAN-based traffic management. Unlike MRP, ERPS is not limited to 50-device rings and can scale to large rings over long fiber runs, although switchover time grows with node count and ring circumference, so the 50 ms figure should be verified for very large rings.
Designing Redundant Industrial Networks
Effective redundancy design starts with identifying the recovery time requirement for each network zone. Safety and protection systems demand zero-loss (PRP/HSR). Process control networks typically require sub-200 ms (MRP/DLR). Monitoring and enterprise networks can tolerate 1-3 seconds (RSTP). Separate physical cable paths are essential to prevent a single event (cable tray fire, excavation damage) from disrupting both primary and backup paths. NFM Consulting designs multi-layer redundant networks that match the redundancy protocol to the criticality of each zone while optimizing infrastructure cost.
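The matching of protocol to recovery time requirement can be expressed as a simple lookup. This sketch uses only the switchover figures quoted in this article; the table name `QUOTED_WORST_CASE_MS` and the function `candidate_protocols` are hypothetical, and a real design would also weigh cost, topology, and device support.

```python
# Worst-case switchover figures as quoted in this article, in milliseconds.
QUOTED_WORST_CASE_MS = {
    "PRP": 0,
    "HSR": 0,
    "DLR": 3,
    "ERPS": 50,
    "MRP": 200,
    "RSTP": 3000,
}


def candidate_protocols(tolerable_gap_ms):
    """List protocols whose quoted worst case fits the tolerable outage."""
    return [
        name
        for name, worst_case in QUOTED_WORST_CASE_MS.items()
        if worst_case <= tolerable_gap_ms
    ]


print(candidate_protocols(0))      # ['PRP', 'HSR']
print(candidate_protocols(200))    # ['PRP', 'HSR', 'DLR', 'ERPS', 'MRP']
print(candidate_protocols(3000))   # all six protocols
```

The lookup returns every protocol that is fast enough; choosing the least expensive option from that list for each zone is how infrastructure cost is kept in check.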
Frequently Asked Questions
What is the fastest redundancy protocol for industrial networks?
PRP and HSR provide zero-switchover (0 ms) redundancy by sending duplicate frames on parallel paths simultaneously. This is the fastest possible redundancy. Among protocols that rely on switchover, DLR achieves sub-3 ms, MRP provides 10-200 ms, and RSTP takes 1-3 seconds. The choice depends on cost tolerance and recovery time requirements.
Which redundancy protocol is used for PROFINET networks?
MRP is the standard redundancy protocol for PROFINET ring networks, providing switchover under 200 ms with native support from all major PROFINET device manufacturers. For applications requiring zero-loss redundancy, PROFINET also supports system redundancy (S2) with redundant controllers and PRP for the network layer.
What is the difference between PRP and HSR?
Both PRP and HSR provide zero-switchover hitless redundancy using duplicate frames. PRP uses two separate parallel networks, requiring double the switching infrastructure but no special requirements on intermediate switches. HSR uses a ring topology, saving infrastructure cost but requiring every device in the ring to be HSR-capable. PRP is better for mixed-device environments; HSR suits homogeneous device rings.