A Multi-Agent System Improved from the L2RPN 2023 Challenge-Winning Solution
This article presents two main contributions: (1) A modular multi-agent system design for power grid management, and (2) Application of imitation learning to predict top-k actions from a knowledge base, improving decision-making speed and supporting real-time grid operations.
Modern power grids face unprecedented challenges: increasing renewable energy integration, volatile demand patterns, and the constant risk of equipment failures. Grid operators must make rapid decisions to prevent cascading failures while optimizing for efficiency, stability, and cost.
The most critical challenge in power grid management is responding to unexpected events within seconds to prevent cascading blackouts.
Traditional approaches often fall short in handling the complexity and speed required for modern grid management. Manual decision-making processes cannot keep up with the rapid response times needed to prevent cascading failures, especially in large interconnected networks with hundreds of components.
Critical Challenges in Power Grid Management
N-1 Security: The grid must remain stable even if any single element fails.
Overload Prevention: Power lines have thermal limits that must not be exceeded.
Topology Optimization: Finding the optimal grid configuration among billions of possibilities.
Real-Time Decision Making: Decisions must be made within seconds during critical events.
Renewable Integration: Dealing with variability and uncertainty of renewable energy sources.
These challenges are the driving force behind the L2RPN (Learning to Run a Power Network) competitions, including the 2023 Paris Region AI Challenge for Energy Transition, which evaluates AI-based approaches to grid management under realistic conditions.
System Architecture Overview
The AI-Assistant for Power Grid Operation employs a multi-agent architecture where specialized components work together to handle different aspects of grid management. Each agent is designed to excel at a specific task, creating a comprehensive system capable of addressing the diverse challenges of power grid operation.
The system's architecture follows a hierarchical decision-making approach where the main PowerGridAgent orchestrates the actions of specialized sub-agents. This modular design allows each component to focus on its specific expertise while the main agent handles the integration of their outputs.
The AI-Assistant for Power Grid Operation consists of several specialized components, each handling a specific aspect of grid management. Understanding these components is crucial to appreciating how the system tackles the complex challenges of power grid operation.
PowerGridAgent (Main Controller)
The PowerGridAgent acts as the central orchestrator, coordinating all other components and making final decisions. It evaluates the grid state based on key indicators such as maximum load ratio (rho) and calls the appropriate specialized agents based on current conditions.
Key Responsibilities:
Evaluating grid state using rho values (load ratios)
Managing action spaces for N-1 (contingency) and overload scenarios
Coordinating specialized agents based on grid conditions
Making final decisions on which actions to take
Handling reconnection of disconnected lines
The main agent uses two key thresholds to determine grid state:
rho_danger (typically 0.99): When exceeded, the grid is in an overload state requiring immediate intervention
rho_safe (typically 0.9): When below this value, the grid is considered safe enough to revert to original topology
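To make the threshold logic concrete, here is a minimal Python sketch of how this state classification could look. The constants and function name are illustrative rather than the project's actual code; Grid2Op exposes the line load ratios as `observation.rho`.

```python
# Minimal sketch (not the project's actual code) of the PowerGridAgent's
# threshold-based state classification. Names and structure are illustrative.

RHO_DANGER = 0.99  # above this the grid is treated as overloaded
RHO_SAFE = 0.90    # below this it is safe to revert to the reference topology


def classify_grid_state(observation) -> str:
    """Classify the grid state from the maximum line load ratio (rho)."""
    max_rho = float(observation.rho.max())
    if max_rho > RHO_DANGER:
        return "overload"   # immediate intervention required
    if max_rho < RHO_SAFE:
        return "safe"       # original topology can be recovered
    return "normal"         # keep monitoring, do nothing
```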
AgentTopology
The AgentTopology specializes in finding the optimal grid topology configurations to alleviate overloads. It can search through possible substation configurations to find actions that reduce line loads below critical thresholds.
Changing grid topology (the configuration of how substations connect components) is one of the most effective ways to manage power flows without requiring expensive redispatching of generation. However, the search space is enormous, with billions of possible configurations.
The AgentTopology implements three different strategies for handling zones within the grid:
SingleAgentStrategy: Treats the entire grid as a single zone
MultiAgentIndependentStrategy: Each zone acts independently
MultiAgentDependentStrategy: Zones coordinate actions based on priority
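The sketch below illustrates one way these three strategies could be organized behind a common interface. The function names, the `Zone`/`SearchFn` types, and the priority handling are assumptions made for illustration, not the project's actual API.

```python
# Illustrative sketch of the three zone-coordination strategies.
from typing import Callable, Dict, List, Optional

Zone = List[int]  # a zone is a list of substation ids
# search(substations, context) -> best topology action found, given actions already chosen
SearchFn = Callable[[List[int], List[Dict]], Optional[Dict]]


def single_agent(zones: List[Zone], search: SearchFn) -> List[Optional[Dict]]:
    # SingleAgentStrategy: one search over every substation in the grid.
    all_subs = [sub for zone in zones for sub in zone]
    return [search(all_subs, [])]


def multi_agent_independent(zones: List[Zone], search: SearchFn) -> List[Optional[Dict]]:
    # MultiAgentIndependentStrategy: each zone searches on its own, ignoring the others.
    return [search(zone, []) for zone in zones]


def multi_agent_dependent(zones: List[Zone], priority: List[int], search: SearchFn) -> List[Dict]:
    # MultiAgentDependentStrategy: zones are handled in priority order and each
    # later zone sees the actions already chosen by higher-priority zones.
    chosen: List[Dict] = []
    for idx in priority:
        action = search(zones[idx], list(chosen))
        if action is not None:
            chosen.append(action)
    return chosen
```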
AgentReconnection
When power lines get disconnected due to failures or protective actions, the AgentReconnection is responsible for safely bringing them back online. This component ensures that reconnections don't create new overloads in the process.
It supports two modes of operation:
Area-based reconnection: Considers lines within specific geographical areas
Global reconnection: Evaluates all disconnected lines without area constraints
For each potential line reconnection, the agent simulates the resulting grid state and selects the option that minimizes the maximum line load ratio (rho). This ensures that reconnections improve grid resilience without creating new problems.
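A minimal sketch of that selection loop, using the standard Grid2Op observation API (`observation.simulate`, `set_line_status`). The function name and the placement of the cooldown check are illustrative, not the project's exact implementation.

```python
import numpy as np

# Minimal sketch of simulation-based reconnection selection with Grid2Op.
# The function name and selection loop are illustrative.


def pick_best_reconnection(observation, action_space):
    """Among the disconnected lines that are out of cooldown, reconnect the one
    whose simulated next state has the lowest maximum load ratio (rho)."""
    best_action, best_rho = None, float(observation.rho.max())
    for line_id in np.where(~observation.line_status)[0]:
        if observation.time_before_cooldown_line[line_id] > 0:
            continue  # reconnecting this line now would be illegal
        candidate = action_space({"set_line_status": [(int(line_id), +1)]})
        sim_obs, _, sim_done, _ = observation.simulate(candidate)
        if not sim_done and sim_obs.rho.max() < best_rho:
            best_action, best_rho = candidate, float(sim_obs.rho.max())
    return best_action  # None if no reconnection improves on the current state
```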
AgentRecoverTopo
The AgentRecoverTopo focuses on returning the grid to its original configuration when conditions allow. Operating in non-standard topologies for extended periods can increase maintenance needs and operational complexity.
This agent:
Identifies when grid conditions are safe enough to revert to normal topology
Assesses which substations can be safely reconfigured
Checks for the legality of actions (considering equipment cooldown periods)
Validates that reverting won't cause new overloads
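The following sketch shows how those four checks could be chained using the Grid2Op API. The function name, the fixed `RHO_SAFE` constant, and reverting a single substation at a time are simplifying assumptions for illustration.

```python
import numpy as np

# Illustrative sketch of the safety checks before reverting one substation to
# its reference topology (all elements on bus 1); names are hypothetical.

RHO_SAFE = 0.90


def safe_revert_action(observation, action_space, sub_id):
    """Return a revert action for `sub_id` if it is legal and does not create
    new overloads, otherwise None."""
    # 1. Only revert when the grid is below the safe threshold.
    if observation.rho.max() >= RHO_SAFE:
        return None
    # 2. Respect the substation cooldown (the action would be illegal otherwise).
    if observation.time_before_cooldown_sub[sub_id] > 0:
        return None
    # 3. Build the action that puts every element of the substation back on bus 1.
    n_elements = int(observation.sub_info[sub_id])
    revert = action_space({"set_bus": {"substations_id": [(sub_id, np.ones(n_elements, dtype=int))]}})
    # 4. Validate by simulation: reverting must not push any line above the safe limit.
    sim_obs, _, sim_done, _ = observation.simulate(revert)
    if sim_done or sim_obs.rho.max() >= RHO_SAFE:
        return None
    return revert
```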
DispatcherAgent
When topology changes alone cannot resolve overloads, the DispatcherAgent steps in to optimize power generation, energy storage usage, and renewable energy curtailment through convex optimization techniques.
| Dispatcher Function | Description |
| --- | --- |
| Redispatching | Adjusting conventional generator outputs to alleviate line overloads |
| Storage Control | Managing charge/discharge of energy storage systems to balance grid loads |
| Curtailment | Reducing renewable energy production when necessary to maintain grid stability |
| DC Power Flow Optimization | Using convex optimization (CVXPY) to calculate optimal power flows |
The DispatcherAgent uses a sophisticated mathematical model to minimize a cost function that considers:
Line thermal limits
Generator capabilities and constraints
Energy storage limits and state of charge
Penalty factors for different interventions (with curtailment being most expensive)
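As a rough illustration of the kind of convex program involved, the CVXPY sketch below solves a toy PTDF-based DC dispatch problem. The dimensions, sensitivity matrix, penalty weights, and bounds are made-up placeholders, not the DispatcherAgent's actual model.

```python
import cvxpy as cp
import numpy as np

# Toy DC-power-flow dispatch problem in CVXPY. The PTDF formulation, sizes,
# penalty weights, and limits are illustrative assumptions only.

n_gen, n_sto, n_ren, n_line = 4, 2, 3, 10
rng = np.random.default_rng(0)
ptdf = rng.standard_normal((n_line, n_gen + n_sto + n_ren)) * 0.1  # injection-to-flow sensitivities
base_flow = rng.standard_normal(n_line) * 40                        # flows before corrective actions
thermal_limit = np.full(n_line, 100.0)

redispatch = cp.Variable(n_gen)   # MW change on conventional generators
storage = cp.Variable(n_sto)      # MW charge(-)/discharge(+) of storage units
curtail = cp.Variable(n_ren)      # MW of renewable production removed

injections = cp.hstack([redispatch, storage, -curtail])
new_flow = base_flow + ptdf @ injections

cost = (1.0 * cp.sum(cp.abs(redispatch))    # cheapest lever
        + 2.0 * cp.sum(cp.abs(storage))
        + 10.0 * cp.sum(curtail))           # curtailment penalized most heavily

constraints = [
    cp.abs(new_flow) <= thermal_limit,      # line thermal limits
    cp.abs(redispatch) <= 20.0,             # generator ramp limits
    cp.abs(storage) <= 10.0,                # storage power limits
    curtail >= 0, curtail <= 15.0,          # curtailment bounds
    cp.sum(injections) == 0,                # corrective injections stay balanced
]

cp.Problem(cp.Minimize(cost), constraints).solve()
```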
One of the main contributions of this research is using imitation learning to quickly predict potential actions from an existing knowledge base, significantly reducing the search space and decision-making time.
The system employs two specialized imitation learning models:
GraphTransformerModel for N-1 Scenarios
The GraphTransformerModel is specifically designed to handle N-1 contingency scenarios (where a single component has failed). It uses a graph-based transformer architecture that naturally captures the topology of the power grid.
As implemented in the codebase, the GraphTransformerModel processes the power grid as follows:
Node features (dimension: 8) include voltage, power generation/consumption data
Edge features (dimension: 26) include impedance, thermal limits, and current flow
The model uses a transformer architecture with self-attention mechanisms (with configurable depth=3)
Feature encoding is performed through dedicated linear layers for both nodes and edges
The model processes grid observations through:
The encode_features() method transforms raw node and edge features into high-dimensional representations
These features are passed to the GraphTransformer encoder which applies multiple transformer layers
The resulting node embeddings are pooled using scatter_mean to produce a graph-level representation
Finally, prediction heads estimate the likelihood of different actions being optimal
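The condensed sketch below mirrors that pipeline: linear encoders for the 8-dimensional node and 26-dimensional edge features, attention-based message passing, `scatter_mean` pooling, and a prediction head. Using PyTorch Geometric's `TransformerConv` as a stand-in for the GraphTransformer encoder, along with the hidden size and the number of candidate actions, is an assumption.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import TransformerConv
from torch_scatter import scatter_mean

# Condensed sketch of the described N-1 model; layer choice and sizes are
# assumptions, only the overall pipeline follows the description above.


class GraphTransformerSketch(nn.Module):
    def __init__(self, node_dim=8, edge_dim=26, hidden=128, depth=3, n_actions=500):
        super().__init__()
        self.node_encoder = nn.Linear(node_dim, hidden)
        self.edge_encoder = nn.Linear(edge_dim, hidden)
        self.layers = nn.ModuleList(
            [TransformerConv(hidden, hidden, heads=4, concat=False, edge_dim=hidden)
             for _ in range(depth)]
        )
        self.head = nn.Linear(hidden, n_actions)  # likelihood score per candidate action

    def forward(self, x, edge_index, edge_attr, batch):
        h = self.node_encoder(x)              # encode node features
        e = self.edge_encoder(edge_attr)      # encode edge features
        for layer in self.layers:
            h = torch.relu(layer(h, edge_index, e))
        graph_emb = scatter_mean(h, batch, dim=0)  # pool node embeddings per graph
        return self.head(graph_emb)                # logits over candidate actions
```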
PowerGridModel for Overload Scenarios
For overload situations, the system uses a specialized PowerGridModel that employs a multi-layer GNN architecture with TransformerConv layers. As implemented in the code, this model features:
Three TransformerConv layers with hidden units=256 and num_heads=8
ReLU activation functions and dropout (p=0.5) for regularization
Global mean pooling to aggregate node features into a graph representation
A final fully-connected layer for classification
This architecture has proven particularly effective at identifying actions that can quickly reduce overloads in critical situations, as evidenced by the performance metrics in the evaluation section.
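A sketch matching that description is shown below; the `concat=False` choice (which keeps the hidden size at 256 across the 8 heads) and the number of output classes are assumptions on top of the stated configuration.

```python
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import TransformerConv, global_mean_pool

# Sketch of the overload model as described: three TransformerConv layers
# (hidden=256, 8 heads), ReLU + dropout(0.5), global mean pooling, and a final
# fully-connected classifier. concat=False and the class count are assumptions.


class PowerGridModelSketch(nn.Module):
    def __init__(self, node_dim, n_classes, hidden=256, heads=8, dropout=0.5):
        super().__init__()
        self.conv1 = TransformerConv(node_dim, hidden, heads=heads, concat=False)
        self.conv2 = TransformerConv(hidden, hidden, heads=heads, concat=False)
        self.conv3 = TransformerConv(hidden, hidden, heads=heads, concat=False)
        self.dropout = dropout
        self.fc = nn.Linear(hidden, n_classes)  # one logit per candidate action

    def forward(self, x, edge_index, batch):
        for conv in (self.conv1, self.conv2, self.conv3):
            x = F.relu(conv(x, edge_index))
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = global_mean_pool(x, batch)  # aggregate node features per graph
        return self.fc(x)
```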
Machine Learning Challenges in Power Grids
Applying machine learning to power grids presents unique challenges:
Power grids are highly complex dynamic systems with strict physical constraints
The action space is enormous, with many invalid or unsafe actions
Rare but critical events (blackouts) must be handled correctly despite limited training examples
Decisions must be interpretable for operator trust and regulatory compliance
The imitation learning approach addresses these challenges by:
Learning from expert demonstrations rather than exploring randomly
Predicting only the most promising actions (top-k) to be evaluated further
Combining machine learning predictions with physics-based simulations
Maintaining human oversight and intervention capabilities
Key Innovation: Combining Machine Learning with Physics-Based Models
The system's strength comes from combining the speed of machine learning predictions with the accuracy of physics-based simulations. The ML models quickly narrow down the search space, while the simulation-based evaluation ensures safety and optimality.
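The sketch below captures this hybrid loop: the learned model scores the actions in the knowledge base, only the top-k candidates are verified with Grid2Op's simulator, and the safest one is kept. The function signature, the `obs_to_tensor` helper, and the selection rule are illustrative assumptions.

```python
import torch

# Sketch of the "ML proposes, physics verifies" loop; names are illustrative.


def select_action(model, observation, candidate_actions, obs_to_tensor, topk=20):
    # 1. Fast ML step: score every known action and keep the top-k.
    with torch.no_grad():
        scores = model(obs_to_tensor(observation)).squeeze()
    top_ids = torch.topk(scores, k=min(topk, len(candidate_actions))).indices.tolist()

    # 2. Exact but slower step: simulate only those k candidates.
    best_action, best_rho = None, float("inf")
    for idx in top_ids:
        action = candidate_actions[idx]
        sim_obs, _, sim_done, _ = observation.simulate(action)
        if not sim_done and sim_obs.rho.max() < best_rho:
            best_action, best_rho = action, float(sim_obs.rho.max())
    return best_action, best_rho
```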
Decision Making Process
The AI-Assistant follows a sophisticated decision-making process that adapts to different grid conditions. Understanding this process reveals how the system integrates its various components to maintain grid stability.
```mermaid
sequenceDiagram
    actor Environment
    participant PowerGridAgent
    participant AgentReconnection
    participant Imitation
    participant AgentTopology
    participant DispatcherAgent
    participant AgentRecoverTopo
    Environment->>PowerGridAgent: observation
    alt current_step == 0
        PowerGridAgent->>PowerGridAgent: _reset_dispatcher()
    end
    PowerGridAgent->>PowerGridAgent: _update_prev_por_error()
    PowerGridAgent->>PowerGridAgent: action = action_space()
    PowerGridAgent->>PowerGridAgent: _simulate_initial_action()
    PowerGridAgent->>AgentReconnection: act(observation, action)
    AgentReconnection->>AgentReconnection: recon_line_area() or reco_line()
    AgentReconnection-->>PowerGridAgent: updated action
    PowerGridAgent->>PowerGridAgent: Calculate max_rho
    alt max_rho > rho_danger
        alt imitation is enabled
            PowerGridAgent->>Imitation: _load_topk_actions(observation, topk)
            Imitation->>Imitation: predict_from_obs()
            Imitation-->>PowerGridAgent: action_space_n1, action_space_overload
        end
        PowerGridAgent->>AgentTopology: get_topology_action(observation, action, action_space_n1, action_space_overload)
        AgentTopology->>AgentTopology: recover_reference_topology()
        AgentTopology->>AgentTopology: change_substation_topology()
        AgentTopology-->>PowerGridAgent: topo_action, topo_obs, etc.
        alt topo_action found
            PowerGridAgent->>PowerGridAgent: action += topo_action
        else no topo_action and dispatching enabled
            PowerGridAgent->>DispatcherAgent: _update_storage_power_obs()
            PowerGridAgent->>DispatcherAgent: update_parameters()
            PowerGridAgent->>DispatcherAgent: act(observation, action)
            DispatcherAgent->>DispatcherAgent: compute_optimum_unsafe()
            DispatcherAgent->>DispatcherAgent: to_grid2op()
            DispatcherAgent-->>PowerGridAgent: updated action
        end
    else max_rho < rho_safe
        PowerGridAgent->>AgentRecoverTopo: act(observation)
        AgentRecoverTopo->>AgentRecoverTopo: revert_topo()
        AgentRecoverTopo-->>PowerGridAgent: action to recover topology
    else normal state
        PowerGridAgent->>PowerGridAgent: action += do_nothing
    end
    PowerGridAgent-->>Environment: final action
```
The decision process follows these key steps:
1. Initial Assessment
When the system receives a new observation from the environment:
The dispatcher is reset on the first step
Previous error values for redispatch decisions are updated
An initial "do nothing" action is created and simulated
2. Line Reconnection Check
Before addressing other issues, the system checks for disconnected lines that can be safely reconnected:
The AgentReconnection evaluates all disconnected lines
It simulates reconnecting each line to assess the impact
If safe reconnections are found, they are added to the action
3. Grid State Evaluation
The system evaluates the maximum line load ratio (rho) to determine the grid state:
Overload State (rho > rho_danger): Requires immediate intervention
Safe State (rho < rho_safe): Allows recovery of original topology
Normal State (rho_safe ≤ rho ≤ rho_danger): Maintained with minimal intervention
4. Action Selection Based on Grid State
| Grid State | First Response | Fallback Strategy | Expected Outcome |
| --- | --- | --- | --- |
| Overload | Find optimal topology action | Apply dispatching if no topology solution | Reduce max_rho below danger threshold |
| Safe | Recover original topology | Maintain current state if recovery unsafe | Return to standard operations when possible |
| Normal | Do nothing | Monitor for changes | Maintain stable operation |
5. Overload Handling Process
In overload situations, the system follows a sophisticated approach:
If imitation learning is enabled, the system uses machine learning models to predict the most promising actions.
The AgentTopology evaluates these actions to find the best topology change.
If a suitable topology action is found, it is applied.
If no topology solution is found and dispatching is enabled, the DispatcherAgent calculates optimal redispatching, storage, and curtailment actions.
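Put together, this fallback chain could be glued as in the sketch below. The sub-agent method names follow the sequence diagram, while the agent attributes (`use_imitation`, `default_n1_actions`, and so on) and the simplified return values are assumptions.

```python
# Sketch of the overload-handling fallback chain; only the call names come from
# the sequence diagram, the glue code and attribute names are illustrative.


def handle_overload(agent, observation, action):
    # 1. Optionally narrow the search space with the imitation-learning models.
    if agent.use_imitation:
        action_space_n1, action_space_overload = agent._load_topk_actions(observation, topk=20)
    else:
        action_space_n1, action_space_overload = agent.default_n1_actions, agent.default_overload_actions

    # 2. Try to resolve the overload with a topology change first.
    topo_action, topo_obs = agent.topology_agent.get_topology_action(
        observation, action, action_space_n1, action_space_overload)  # simplified return values
    if topo_action is not None:
        action += topo_action
        return action

    # 3. Fall back to redispatching / storage / curtailment if allowed.
    if agent.use_dispatching:
        return agent.dispatcher_agent.act(observation, action)
    return action
```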
"The strength of our approach lies in its adaptive nature. By combining multiple specialized agents with machine learning, we can rapidly respond to changing grid conditions while maintaining stability and efficiency."
Performance Evaluation on L2RPN2023 Dataset
The AI-Assistant for Power Grid Operation has been evaluated using the dataset from the L2RPN2023 "The Paris Region AI Challenge for Energy Transition." This evaluation provides concrete performance metrics for different configurations of the system.
Multi-Agent Strategy Comparison
For our evaluation, we used the L2RPN 2023 dataset of 208 scenarios, each representing one week of grid operation with one step every 5 minutes of operational time. The imitation learning models were configured to predict the top-20 actions for each situation (N-0 overload and N-1 attacked-line scenarios). The results below compare three coordination strategies:
| Performance Metric | Multi-Agent Independent | Multi-Agent Sequential | Single-Agent |
| --- | --- | --- | --- |
| Overall Score | 57.65 | 61.33 | 60.80 |
| Operational Score | 58.93 | 61.50 | 61.29 |
| NRES Score | 90.14 | 88.56 | 88.65 |
| Assistant Score | 35.07 | 44.61 | 42.94 |
| Evaluation Duration | 14,986 s | 17,219 s | 17,349 s |
| Maximum Decision Time | 6.72 s | 6.99 s | 7.23 s |
| 99th Percentile Decision Time | 1.38 s | 1.76 s | 1.78 s |
| 95th Percentile Decision Time | 0.057 s | 0.055 s | 0.055 s |
| Average Simulations per Step | 58.32 | 69.69 | 68.84 |
| Successful Scenarios | 135 | 150 | 149 |
| Mean Steps Completed | 1,627.62 | 1,685.17 | 1,680.62 |
Strategic Value and Implementation Roadmap
From a product strategy perspective, the AI-Assistant for Power Grid Operation represents not just a technical solution but a transformational approach to grid management that offers significant business value:
Business Impact Assessment
The implementation of this system could deliver value across multiple dimensions:
Operational Efficiency: Reduced need for manual analysis of complex grid scenarios
Risk Mitigation: Lower probability of cascading failures through improved N-1 analysis
Renewable Integration: Enhanced ability to accommodate variable generation sources
Resource Optimization: More efficient use of generation, storage, and transmission assets
Knowledge Preservation: Capture of expert knowledge in the imitation learning models
Implementation Pathway
A phased approach to implementation would maximize value while managing risk:
Phase 1: Shadow Mode Deployment - Deploy the system as an advisory tool that runs alongside existing operations but has no direct control authority, allowing for performance validation in real conditions.
Phase 2: Limited-Scope Integration - Integrate the recommendation engine with existing SCADA/EMS systems for specific use cases (e.g., day-ahead planning).
Phase 3: Expanded Functionality - Extend to additional use cases, including real-time contingency analysis and post-disturbance recovery.
Phase 4: Continuous Learning - Implement mechanisms for the system to learn from operator decisions and outcomes over time.
Success Factors and Organizational Considerations
Technical excellence alone will not ensure successful adoption. Key non-technical factors include:
Change Management: Structured approach to operator training and workflow integration
Cross-Functional Collaboration: Partnership between IT, OT, and operational teams
Regulatory Compliance: Ensuring alignment with grid codes and reliability standards
Metrics and Evaluation: Clear KPIs for measuring system impact and value
Feedback Mechanisms: Processes for continuous improvement based on operational experience
By addressing both the technical and organizational dimensions of implementation, utilities can maximize the value of AI-assisted grid management while managing the risks inherent in adopting new operational technology.
Practical Implementation Considerations
When transitioning from the L2RPN competition environment to real-world power grid operations, several important distinctions must be considered:
| L2RPN Challenge Environment | Real-World Grid Operation Application |
| --- | --- |
| Fully autonomous system operation | Human-in-the-loop decision support system |
| Evaluation based on predefined metrics | Operator selection from recommended actions |
| Complete system optimization | Focus on prediction and simulation |
| Simplified contingency handling | Complex N-1 analysis and day-ahead planning |
In a practical implementation, the system would likely be deployed as a decision support tool rather than a fully autonomous controller. Based on the architecture described in this paper, such a tool could:
Provide action recommendations: Use the Imitation Learning models to identify promising actions based on current grid conditions
Present simulation results: Use Grid2Op to simulate the outcomes of recommended actions
Support operator workflows: Integrate with existing SCADA and EMS systems
Enable contingency analysis: Assist with N-1 security assessments
Facilitate day-ahead planning: Support operators in planning future grid configurations
The primary advantage of this approach would be reducing the cognitive load on operators during complex grid events, while still ensuring human oversight of critical decisions. Integration with existing systems would need to be carefully designed to ensure seamless operation.
Conclusion
This research has successfully developed a modular multi-agent system for power grid management, combined with imitation learning to support rapid decision-making. Performance evaluation on the L2RPN 2023 dataset demonstrates the effectiveness of this approach, particularly the Multi-Agent Sequential coordination strategy.
The key contributions of this work are twofold: (1) A flexible, modular agent architecture that separates concerns between topology management, reconnection, recovery, and dispatching; and (2) An effective application of imitation learning that significantly speeds up the action selection process while maintaining high-quality outcomes.
For practical applications, these techniques can be integrated into operator decision support tools, providing recommended actions with detailed simulation results to assist with both daily operations and contingency management.
Future Research Directions
Despite its impressive capabilities, there are several promising directions for further development:
🧠 Reinforcement Learning Integration: Extending the system with reinforcement learning capabilities to optimize for long-term objectives rather than just immediate response.
🌐 Multi-Area Coordination: Enhancing coordination between neighboring grid areas to optimize power flows across regional boundaries.
🔍 Explainable AI Techniques: Developing better explanations for system decisions to build operator trust and support regulatory compliance.
Deployment Considerations
Implementing the AI-Assistant in real-world environments requires addressing several practical considerations:
Integration with SCADA Systems: Ensuring seamless communication with existing grid monitoring and control infrastructure
Operator Training: Developing training programs for grid operators to effectively work with AI-assisted decision support
Regulatory Compliance: Ensuring the system meets relevant regulatory requirements for grid operations
Fallback Mechanisms: Implementing robust fallback strategies in case of system failures
Closing Remarks
The AI-Assistant for Power Grid Operation represents a significant advancement in applying AI to critical infrastructure management. By combining specialized agents, machine learning, and physics-based simulations, it achieves a balance of speed, adaptability, and reliability that is essential for modern power grid operations.
As power systems continue to evolve with increasing renewable penetration and distributed resources, such intelligent management systems will become indispensable for maintaining grid stability while maximizing efficiency and sustainability.
References
Marot, A., et al. (2021). "Learning to Run a Power Network Challenge." arXiv:2103.03104