Geo SCADA Support SLA Best Practices
Key Takeaway
Effective Geo SCADA support SLAs define response times by severity level, specify escalation paths, include measurable uptime targets, and address both business-hours and after-hours coverage. Best-practice SLAs also separate incident response from planned maintenance and include provisions for ERCOT compliance events.
Why SLAs Matter for SCADA Support
A service level agreement establishes the baseline expectations between your organization and your managed SCADA provider. Without clearly defined SLAs, there is no objective way to measure service quality, hold the provider accountable, or plan your internal operations around expected support response times. For critical infrastructure running Geo SCADA, the consequences of ambiguous support expectations can include extended telemetry outages, compliance violations, and unplanned operational risk.
Severity Classification
The foundation of any SCADA support SLA is a severity classification system that categorizes issues by their operational impact. A well-designed classification typically includes three to four levels:
- Critical (P1) — Complete platform outage, total loss of telemetry, or active ERCOT compliance violation. Response target: 30 minutes or less.
- High (P2) — Partial telemetry loss, communication driver failure affecting multiple sites, or historian not recording. Response target: 2 hours.
- Medium (P3) — Single-site communication issue, non-critical alarm failure, or ViewX display problem. Response target: 4 hours during business hours, next business day after hours.
- Low (P4) — Cosmetic issues, documentation requests, point add requests, or planned configuration changes. Response target: next business day.
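The severity levels above can be expressed as a small lookup, which is one way to keep ticketing rules consistent with the written SLA. This is a minimal sketch: the `Severity` enum and `response_target` function are hypothetical names for illustration, and "next business day" is returned as `None` because resolving it would require a business calendar that is not modeled here.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    P1 = "Critical"
    P2 = "High"
    P3 = "Medium"
    P4 = "Low"


def response_target(severity: Severity, business_hours: bool) -> Optional[timedelta]:
    """Return the response target, or None for "next business day"."""
    if severity is Severity.P1:
        return timedelta(minutes=30)
    if severity is Severity.P2:
        return timedelta(hours=2)
    if severity is Severity.P3 and business_hours:
        return timedelta(hours=4)
    # P3 after hours and all P4 requests: next business day, which needs a
    # business calendar that this sketch does not model.
    return None
```

For example, `response_target(Severity.P3, business_hours=False)` returns `None`, reflecting the next-business-day target for a medium-severity issue raised after hours.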
Response Time vs. Resolution Time
A common SLA mistake is conflating response time with resolution time. Response time measures how quickly the provider acknowledges the issue and begins working. Resolution time measures how long until the issue is fully resolved. SLAs should define both, with the understanding that resolution targets are goals rather than guarantees — some issues require vendor involvement or hardware replacement that extends resolution beyond the provider's control.
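To make the distinction concrete, the sketch below computes both measures from ticket timestamps. The `Ticket` class and its field names are assumptions for illustration, not the schema of any particular ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    opened: datetime
    acknowledged: Optional[datetime] = None  # provider begins working
    resolved: Optional[datetime] = None      # issue fully resolved

    def response_time(self) -> Optional[timedelta]:
        """Time from ticket open to provider acknowledgement."""
        if self.acknowledged is None:
            return None
        return self.acknowledged - self.opened

    def resolution_time(self) -> Optional[timedelta]:
        """Time from ticket open to full resolution."""
        if self.resolved is None:
            return None
        return self.resolved - self.opened


ticket = Ticket(
    opened=datetime(2024, 1, 1, 8, 0),
    acknowledged=datetime(2024, 1, 1, 8, 20),
    resolved=datetime(2024, 1, 1, 14, 0),
)
print(ticket.response_time())    # 0:20:00
print(ticket.resolution_time())  # 6:00:00
```

In this example the provider would meet a 30-minute response target even though resolution took six hours, which is why the two numbers need separate targets.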
Uptime and Availability Targets
SCADA platform uptime targets should account for planned maintenance windows. A 99.5% uptime target that excludes scheduled maintenance allows roughly 44 hours of unplanned downtime per year. For mission-critical deployments, 99.9% (about 8.8 hours per year) is a more appropriate target, though achieving it requires redundant server configurations and proactive monitoring.
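The downtime allowances follow from simple arithmetic on an 8,760-hour year. The sketch below shows the calculation; the function name and the optional `planned_maintenance_hours` parameter are illustrative assumptions, and leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(uptime_percent: float,
                          planned_maintenance_hours: float = 0.0) -> float:
    """Unplanned downtime allowed per year at a given uptime target.

    Planned maintenance is removed from the measured period first,
    since the target excludes scheduled windows.
    """
    in_scope_hours = HOURS_PER_YEAR - planned_maintenance_hours
    return in_scope_hours * (1 - uptime_percent / 100)


print(round(downtime_budget_hours(99.5), 2))  # 43.8
print(round(downtime_budget_hours(99.9), 2))  # 8.76
```

Subtracting planned maintenance from the measured period shrinks the allowance slightly, so it is worth stating in the SLA whether the percentage applies to the full year or only to hours outside maintenance windows.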
ERCOT Compliance Provisions
Organizations operating ERCOT-registered load resources need SLA provisions specific to telemetry compliance. This includes accelerated response for telemetry failures that affect real-time generation reporting, defined procedures for ERCOT communication during outages, and documentation support for compliance inquiries. ERCOT compliance incidents should be classified as Critical (P1) regardless of the technical scope.
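The "P1 regardless of technical scope" rule can be captured as an override applied after normal triage. This is a hedged sketch: `effective_severity` and the `ercot_compliance_incident` flag are hypothetical names, and how an incident gets flagged as ERCOT-related is left to your own procedures.

```python
SEVERITIES = ("P1", "P2", "P3", "P4")


def effective_severity(technical_severity: str,
                       ercot_compliance_incident: bool) -> str:
    """Escalate any ERCOT compliance incident to P1.

    technical_severity is the level assigned by normal triage; the
    override applies even if the technical scope is a single site.
    """
    if technical_severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {technical_severity!r}")
    return "P1" if ercot_compliance_incident else technical_severity


print(effective_severity("P3", ercot_compliance_incident=True))   # P1
print(effective_severity("P3", ercot_compliance_incident=False))  # P3
```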
Reporting and Review
SLAs should include monthly or quarterly reporting that tracks response time compliance, ticket volume by severity, recurring issues, and uptime metrics. Regular SLA reviews (at least annually) ensure the agreement evolves with your operational needs and the provider's service capabilities.
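As an illustration of the reporting metrics, the sketch below summarizes ticket volume and response-time compliance by severity. The input shape (a list of `(severity, response_time)` pairs) and the `sla_report` function are assumptions for this example; a real report would pull the data from your ticketing system.

```python
from collections import Counter
from datetime import timedelta


def sla_report(tickets, targets):
    """Summarize ticket volume and response-time compliance per severity.

    tickets: iterable of (severity, response_time) pairs
    targets: dict mapping severity to its response target (timedelta)
    """
    tickets = list(tickets)
    volume = Counter(severity for severity, _ in tickets)
    met = Counter(
        severity for severity, response in tickets
        if response <= targets[severity]
    )
    return {
        severity: {
            "tickets": count,
            "met_target": met[severity],
            "compliance_pct": round(100 * met[severity] / count, 1),
        }
        for severity, count in sorted(volume.items())
    }


targets = {"P1": timedelta(minutes=30), "P2": timedelta(hours=2)}
tickets = [
    ("P1", timedelta(minutes=20)),
    ("P1", timedelta(minutes=45)),
    ("P2", timedelta(hours=1)),
]
print(sla_report(tickets, targets))
```

Here one of the two P1 tickets missed its 30-minute target, so P1 compliance is 50%, while the single P2 ticket met its target.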
Frequently Asked Questions
What response times should a Geo SCADA support SLA include?
A best-practice Geo SCADA SLA includes tiered response times: 30 minutes for critical outages, 2 hours for high-severity issues, 4 hours during business hours for medium-severity issues, and next business day for low-priority requests.
Should ERCOT telemetry failures be treated as critical incidents?
Yes. ERCOT telemetry failures should be classified as Critical (P1) in any SLA, with provisions for accelerated response, ERCOT communication procedures, and compliance documentation support.
What uptime target should a Geo SCADA SLA specify?
99.5% uptime (excluding planned maintenance) is a reasonable baseline. Mission-critical deployments with redundant servers should target 99.9%, which allows about 8.8 hours of unplanned downtime per year.