Using Neural Networks for Predicting Solutions to Optimal Power Flow

In a previous blog post, we discussed the fundamental concepts of optimal power flow (OPF). In their original form, AC-OPF problems are non-linear and non-convex optimization problems that are in general expensive to solve. Due to the large scale of power grids, even solving their linearized approximation (DC-OPF) is a challenging optimization task.

Meanwhile, the integration of renewable resources has increased the uncertainty of grid conditions, requiring electricity grid operators to solve OPF near real-time in order to maintain an accurate state of the system. In recent years, there has been intense research in machine learning (ML) aiming to shift the computational effort from real-time to offline training, providing an almost instant prediction. This blog post focuses on a specific set of machine learning approaches that apply neural networks (NNs) to predict (directly or indirectly) solutions to OPF.

General form of OPF

OPF problems can be expressed in the following concise form of mathematical programming: \begin{equation} \begin{aligned} & \min \limits_{y}\ f(x, y) \\ & \mathrm{s.\ t.} \ \ c_{i}^{\mathrm{E}}(x, y) = 0 \quad i = 1, \dots, n \\ & \quad \; \; \; \; \; c_{j}^{\mathrm{I}}(x, y) \ge 0 \quad j = 1, \dots, m \\ \end{aligned} \label{opt} \end{equation} where $x$ is the vector of grid parameters, $y$ is the vector of optimization variables, and $f(x, y)$ is the objective (or cost) function to minimize, subject to equality constraints $c_{i}^{\mathrm{E}}(x, y) \in \mathcal{C}^{\mathrm{E}}$ and inequality constraints $c_{j}^{\mathrm{I}}(x, y) \in \mathcal{C}^{\mathrm{I}}$; for convenience we introduced $\mathcal{C}^{\mathrm{E}}$ and $\mathcal{C}^{\mathrm{I}}$ to denote the sets of equality and inequality constraints, with corresponding cardinalities $n = \lvert \mathcal{C}^{\mathrm{E}} \rvert$ and $m = \lvert \mathcal{C}^{\mathrm{I}} \rvert$, respectively. We emphasize that the objective function is optimized solely with respect to the optimization variables; the grid parameters only parameterize the objective and constraint functions. For example, for a simple economic dispatch (ED) problem $x$ includes the active and reactive power components of loads, $y$ is a vector of the voltage magnitudes and active powers of generators, the objective function is the cost of the total real power generation, equality constraints include the power balance and power flow equations, while inequality constraints impose lower and upper bounds on certain quantities.
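
To make this abstract form concrete, below is a minimal Python sketch of a toy ED-style problem using scipy.optimize; the two-generator cost coefficients, the load value, and the generator limits are made-up illustrative numbers, not data from any benchmark case.

```python
import numpy as np
from scipy.optimize import minimize

# Toy ED-style problem: two generators must serve a fixed load at minimum cost.
# Grid parameter x: total active load (hypothetical value).
# Optimization variable y: active power output of each generator.
x_load = 1.5  # p.u., illustrative only

def f(y):
    # Quadratic generation cost with made-up coefficients.
    return 0.10 * y[0] ** 2 + 0.05 * y[1] ** 2 + 0.3 * y[0] + 0.6 * y[1]

# Equality constraint c^E(x, y) = 0: power balance (losses ignored).
power_balance = {"type": "eq", "fun": lambda y: y[0] + y[1] - x_load}

# Inequality constraints c^I(x, y) >= 0: generator limits, expressed as bounds.
bounds = [(0.0, 1.0), (0.0, 1.0)]  # p.u. limits for each generator

y0 = np.array([0.5, 0.5])  # starting value of the optimization variables
res = minimize(f, y0, constraints=[power_balance], bounds=bounds, method="SLSQP")
print(res.x)  # optimal dispatch y*
```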

The most widely used approach for solving the above optimization problem (which can be non-linear, non-convex and even mixed-integer) is the interior-point method [Boyd04][Nocedal06][Wachter06]. The interior-point (or barrier) method is a highly efficient algorithm that searches for the solution in an iterative manner. However, it requires computing the Hessian (second derivatives) of the Lagrangian of the system with respect to the optimization variables at each iteration step. Due to the non-convex nature of the power flow equations appearing in the equality constraints, the method can be prohibitively slow for large-scale systems.

The formulation of eq. $\ref{opt}$ allows us to view OPF as an operator that maps the grid parameters ($x$) to the optimal value of the optimization variables ($y^{*}$). In a more general sense, the objective and constraint functions are also arguments of this operator. In this discussion we assume that the exact solution of the OPF problem is always provided by the interior-point method, so the operator is also implicitly parameterized by the starting value of the optimization variables ($y^{0}$). The actual value of $y^{0}$ can significantly affect the convergence rate of the interior-point method, and so the total computational time; for non-convex formulations, where multiple local minima might exist, even the optimal point can differ. The general form of the OPF operator can be written as: \begin{equation} \Phi: \Omega \to \mathbb{R}^{n_{y}}: \quad \Phi\left( x, y^{0}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = y^{*}, \label{opf-operator} \end{equation} where $\Omega$ is an abstract set within which the values of the operator arguments are allowed to change and $n_{y}$ is the dimension of the optimization variables. We note that a special case of the general form is the DC-OPF operator, whose mathematical properties have been thoroughly investigated in a recent work [Zhou20]. In many recurring problems most of the arguments of the OPF operator are fixed and only (some of) the grid parameters vary. For these cases we introduce a simpler notation, the OPF function: \begin{equation} F_{\Phi}: \mathbb{R}^{n_{x}} \to \mathbb{R}^{n_{y}}: \quad F_{\Phi}(x) = y^{*}, \label{opf-function} \end{equation} where $n_{x}$ and $n_{y}$ are the dimensions of the grid parameter and optimization variable vectors, respectively. We also introduce $\mathcal{F}_{\Phi}$, the set of all feasible points of the OPF. Clearly, $y^{*} \in \mathcal{F}_{\Phi}$, and depending on the grid parameters the problem may be infeasible: $\mathcal{F}_{\Phi} = \emptyset$.
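
In code, the difference between the OPF operator and the OPF function is simply partial application: fixing every argument of $\Phi$ except $x$ yields $F_{\Phi}$. Below is a minimal sketch under the same toy-problem assumptions as above, with scipy's solver standing in for an interior-point solver such as Ipopt.

```python
from functools import partial
import numpy as np
from scipy.optimize import minimize

def opf_operator(x, y0, f, c_eq, c_ineq):
    # Sketch of the OPF operator Phi(x, y0, f, C_E, C_I) = y*.
    # A production implementation would call an interior-point solver (Ipopt);
    # scipy (which picks SLSQP for constrained problems) keeps this runnable.
    cons = [{"type": "eq", "fun": partial(c, x)} for c in c_eq]
    cons += [{"type": "ineq", "fun": partial(c, x)} for c in c_ineq]
    return minimize(lambda y: f(x, y), y0, constraints=cons).x

# Fixing every argument except the grid parameters yields the OPF function
# F_Phi(x) = y* as a partial application of the operator (toy values below).
opf_function = partial(
    opf_operator,
    y0=np.array([0.5, 0.5]),                          # default starting point
    f=lambda x, y: 0.10 * y[0]**2 + 0.05 * y[1]**2,   # toy quadratic cost
    c_eq=[lambda x, y: y[0] + y[1] - x],              # power balance
    c_ineq=[lambda x, y: y, lambda x, y: 1.0 - y],    # 0 <= y <= 1
)
y_star = opf_function(1.5)  # solve for grid parameter x = 1.5
```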

The daily task of electricity grid operators is to provide solutions for the constantly changing grid parameters $x_{t}$. The standard OPF approach would be to compute $F_{\Phi}(x_{t}) = y_{t}^{*}$ using default values of the other arguments of $\Phi$. In practice, however, this is used rather rarely, as we usually possess additional information about the grid that can be used to obtain the solution more efficiently. For instance, it is reasonable to assume that for similar grid parameter vectors the corresponding optimal points are also close to each other. If one of these problems is solved, we can use its optimal point as the starting value for the optimization variables of the other problem, which can then converge significantly faster than from default initial values. This strategy is called warm-start and can be useful for consecutive problems, i.e. $\Phi \left( x_{t}, y_{t-1}^{*}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = y_{t}^{*}$. Another way to reduce the computational time of solving OPF is to reduce the problem size. At the solution, not all constraints are actually binding, and the large number of non-binding inequality constraints can be removed from the mathematical problem without changing the optimal point, i.e. $\Phi \left( x_{t}, y^{0}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{A}_{t} \right) = y_{t}^{*}$, where $\mathcal{A}_{t}$ is the set of all binding inequality constraints of the actual problem. The resulting problem is called the reduced OPF, and the approach is especially useful for DC-OPF problems. The three main approaches are depicted in Figure 1.

Figure 1. Main approaches of solving OPF.

NN based approaches for predicting OPF solutions

The ML methods we discuss here apply either an estimator function ($\hat{F}(x_{t}) = \hat{y}_{t}^{*}$) or an estimator operator ($\hat{\Phi}(x_{t}) = \hat{y}_{t}^{*}$) to provide a prediction of the optimal point of OPF based on the grid parameters.

We can categorize these methods in different ways. For instance, based on the estimator type they use, we can distinguish between end-to-end (aka direct) and hybrid (aka indirect) approaches. End-to-end methods use an NN as an estimator function and map the grid parameters directly to the optimal point of OPF. Hybrid or indirect methods apply an estimator operator that includes two steps: in the first step, an NN maps the grid parameters to some intermediate quantities, which are then used in the second step as inputs to an optimization problem, resulting in the predicted (or even exact) optimal point of the original OPF problem.

We can also group techniques based on the NN predictor type: the NN can be used either as a regressor or as a classifier.

Regression

End-to-end methods

End-to-end methods [Guha19] apply NNs as regressors, mapping the grid parameters (as inputs) directly to the optimal point of OPF (as outputs): \begin{equation} \hat{F}(x_{t}) = \mathrm{NN}_{\theta}^{\mathrm{r}}(x_{t}) = \hat{y}_{t}^{*} \end{equation} where the subscript $\theta$ denotes all parameters (weights, biases, etc.) of the NN and the superscript $\mathrm{r}$ indicates that the NN is used as a regressor. We note that once the prediction $\hat{y}_{t}^{*}$ is computed, other dependent quantities (e.g. power flows) can be easily obtained by solving the power flow problem [Guha19] [Zamzam19], provided the prediction is a feasible point.
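
As an illustration, a minimal end-to-end regressor could look like the following PyTorch sketch; the layer sizes, the training settings, and the random placeholder tensors (which stand in for a training set generated offline by an OPF solver) are assumptions for the example, not settings from the cited papers.

```python
import torch
import torch.nn as nn

# Minimal end-to-end regressor: grid parameters x -> predicted optimal point y*.
n_x, n_y = 40, 20  # dimensions of grid parameters / optimization variables (made up)

model = nn.Sequential(
    nn.Linear(n_x, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_y),
)

# Placeholder training set: pairs (x_t, y_t*) produced offline by an OPF solver.
X = torch.randn(1000, n_x)
Y = torch.randn(1000, n_y)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), Y)  # squared-error loss on the optimal points
    loss.backward()
    opt.step()

y_hat = model(X[:1])  # almost-instant prediction of y* for new grid parameters
```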

As OPF is a constrained optimization problem, the optimal point is not necessarily a smooth function of the grid parameters. Also, the number of congestion regimes—i.e. the number of distinct sets of active (binding) constraints—increases exponentially with grid size. Therefore, in order to obtain sufficiently high coverage and accuracy of the model a substantial amount of training data is required.

Warm-start methods

For real power grids, the available training data is rather limited compared to the system size. As a consequence, it is challenging for end-to-end methods to provide predictions that are close to optimal. Even worse, there is no guarantee that the predicted point is feasible (i.e. satisfies all constraints), and the violation of important constraints could lead to severe security issues for the grid.

Nevertheless, the predicted optimal point can be utilized as a starting point to initialize an interior-point method. Interior-point methods (and indeed most relevant optimization algorithms for OPF) can be started from a specific value of the optimization variables, called a warm-start. The idea of warm-start approaches is to use a hybrid model that first applies an NN to predict a set-point ($\hat{y}_{t}^{0} = \mathrm{NN}_{\theta}^{\mathrm{r}}(x_{t})$), from which a warm-start optimization can be performed [Baker19][Jamei19]: \begin{equation} \hat{\Phi}(x_{t}) = \Phi \left( x_{t}, \hat{y}_{t}^{0}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = \Phi \left( x_{t}, \mathrm{NN}_{\theta}^{\mathrm{r}}(x_{t}), f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = y_{t}^{*} \end{equation} The flowchart of the approach is shown in Figure 2.
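
A minimal sketch of the hybrid warm-start model follows; the `predict_set_point` heuristic is a hypothetical stand-in for the trained NN regressor, and scipy again stands in for an interior-point solver such as Ipopt.

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective and equality constraint, reused from the earlier sketches.
f = lambda x, y: 0.10 * y[0]**2 + 0.05 * y[1]**2
cons = lambda x: [{"type": "eq", "fun": lambda y: y[0] + y[1] - x}]

def predict_set_point(x):
    # Stand-in for a trained NN regressor NN_theta^r(x); an even split of the
    # load is just a placeholder heuristic.
    return np.array([x / 2.0, x / 2.0])

def warm_start_opf(x):
    # Hybrid model: the NN proposes y0, then an exact solver (Ipopt in
    # practice, scipy here) runs the original, unmodified problem from it.
    y0 = predict_set_point(x)
    return minimize(lambda y: f(x, y), y0, constraints=cons(x)).x

y_star = warm_start_opf(1.5)  # exact (locally) optimal point
```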

Figure 2. Flowchart of the warm-start method (yellow panel) in combination with an NN regressor (purple panel, default arguments of OPF operator are omitted for clarity).

It is important to note that warm-start methods provide an exact (locally) optimal point, as it is eventually obtained from an optimization problem identical to the original one. Predicting an accurate set-point can significantly reduce the number of iterations (and so the computational cost) needed to reach the optimal point, compared to the default heuristics of the optimization method.

Although the concept of warm-start interior-point techniques is quite attractive, there are some practical difficulties as well, which we discuss below. First, because only the primal variables are initialized, the dual variables still need to converge; interior-point methods require a minimum number of iterations even if the primals are set to their optimal values. Trying to predict the duals with an NN as well would make the task even more challenging. Second, if the initial values of the primals are far from the optimum (i.e. the predicted set-point is inaccurate), the optimization can converge to a different local minimum. Finally, even if the predicted values are close to the optimal solution, there is no guarantee that the starting point is feasible, and it could lie in a region resulting in substantially longer solve times or even convergence failure.

To demonstrate the first of the above issues, let us consider a warm-start model in combination with a "perfect" regressor, i.e. a hypothetical regressor that provides the exact optimal point: $\overline{\mathrm{NN}}_{\theta}^{\mathrm{r}}(x_{t}) = y_{t}^{*}$. The "perfect" warm-start model can then be written as: \begin{equation} \hat{\Phi}^{\mathrm{p}}(x_{t}) = \Phi \left( x_{t}, \overline{\mathrm{NN}}_{\theta}^{\mathrm{r}}(x_{t}), f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = \Phi \left( x_{t}, y_{t}^{*}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = y_{t}^{*} \end{equation} This is actually an ideal set-point, as the vector of primals is not only feasible but also optimal. However, due to the non-optimal duals, some optimization is still required. This hypothetical warm-start model therefore sets an empirical upper limit on the computational gain of any warm-start model relative to the original optimization problem, where we define the gain as: \begin{equation} \mathrm{gain}(t_{\mathrm{ML}}) = 100\frac{t_{\mathrm{f}} - t_{\mathrm{ML}}}{t_{\mathrm{f}}}, \end{equation} where $t_{\mathrm{f}}$ and $t_{\mathrm{ML}}$ are the computational times of the original full OPF problem and of the specific machine learning based approach, respectively. In Table 1 we present the maximum achievable gain for several synthetic DC- and AC-ED cases using the Ipopt [Wachter06] solver. The averages, along with two-sided 95\% confidence intervals, are based on 1000 samples with varying grid parameters [Robson20]. As can be seen, the gain is higher for DC formulations than for the corresponding AC ones, but in general it is rather moderate for the investigated grids (for a single case it is even negative). Also, for both formulations, the gain seems rather system-specific and shows no correlation with the system size. Nevertheless, warm-start methods might still be useful for some large-scale unit commitment or security constrained OPF problems.

Table 1. Maximum achievable gain (%) of warm-start with primal variables ("perfect" regression) for several synthetic DC- and AC-ED grids, using 1000 samples and the Ipopt solver [Robson20].
| Case | DC-ED gain (%) | AC-ED gain (%) |
| --- | --- | --- |
| 24-ieee-rts | $30.9 \pm 0.7$ | $27.0 \pm 0.6$ |
| 30-ieee | $33.9 \pm 0.5$ | $7.9 \pm 0.8$ |
| 39-epri | $52.7 \pm 0.4$ | $46.0 \pm 0.6$ |
| 57-ieee | $27.1 \pm 0.6$ | $21.4 \pm 0.7$ |
| 73-ieee-rts | $29.7 \pm 0.3$ | $33.5 \pm 0.7$ |
| 118-ieee | $22.4 \pm 0.5$ | $15.8 \pm 0.6$ |
| 162-ieee-dtc | $55.4 \pm 0.4$ | $40.4 \pm 1.0$ |
| 300-ieee | $44.1 \pm 0.4$ | $37.2 \pm 1.4$ |
| 588-sdet | $28.5 \pm 0.5$ | $-18.3 \pm 1.0$ |
| 1354-pegase | $47.6 \pm 0.4$ | $1.6 \pm 1.3$ |
| 2853-sdet | $34.8 \pm 0.3$ | $9.9 \pm 0.5$ |
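
For concreteness, here is a small sketch of how such gain statistics could be computed from paired solve timings; the normal-approximation confidence interval and the synthetic timings are our own assumptions, not the exact procedure of [Robson20].

```python
import numpy as np

def gain_stats(t_full, t_ml):
    # Mean gain (%) and half-width of a two-sided 95% confidence interval,
    # using a normal approximation over paired solve-time samples.
    g = 100.0 * (t_full - t_ml) / t_full
    ci = 1.96 * g.std(ddof=1) / np.sqrt(g.size)
    return g.mean(), ci

# Illustrative timings for 1000 paired runs (not real measurements).
rng = np.random.default_rng(0)
t_full = rng.lognormal(mean=0.0, sigma=0.1, size=1000)
t_ml = 0.7 * t_full * rng.lognormal(mean=0.0, sigma=0.05, size=1000)
mean_gain, ci95 = gain_stats(t_full, t_ml)  # mean gain around 30%, small CI
```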

Classification

Predicting active sets

An alternative hybrid approach using an NN classifier ($\mathrm{NN}_{\theta}^{\mathrm{c}}$) leverages the observation that only a fraction of all constraints is actually binding at the optimum [Ng18][Deka19], so a reduced OPF problem can be formulated by keeping only the binding constraints. Since the reduced problem retains the objective function and all binding constraints of the original one, its solution is identical to that of the original full problem: $\Phi \left( x_{t}, y^{0}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{A}_{t} \right) = \Phi \left( x_{t}, y^{0}, f, \mathcal{C}^{\mathrm{E}}, \mathcal{C}^{\mathrm{I}} \right) = y_{t}^{*}$, where $\mathcal{A}_{t} \subseteq \mathcal{C}^{\mathrm{I}}$ is the active (binding) subset of the inequality constraints (note also that $\mathcal{C}^{\mathrm{E}} \cup \mathcal{A}_{t}$ contains all active constraints defining the specific congestion regime). This suggests a classification formulation in which the grid parameters are used to predict the active set: \begin{equation} \hat{\Phi}(x_{t}) = \Phi \left( x_{t}, y^{0}, f, \mathcal{C}^{\mathrm{E}}, \hat{\mathcal{A}}_{t} \right) = \Phi \left( x_{t}, y^{0}, f, \mathcal{C}^{\mathrm{E}}, \mathrm{NN}_{\theta}^{\mathrm{c}}(x_{t}) \right) = \hat{y}_{t} \end{equation} Technically, the NN can be used in two ways to predict the active set. One approach is to identify all distinct active sets in the training data and train a multiclass classifier that maps the input to the corresponding active set [Deka19]. Since the number of active sets increases exponentially with the system size, for larger grids it might be better to predict the binding status of each non-trivial constraint using a binary multi-label classifier [Robson20].
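
A minimal PyTorch sketch of the multi-label variant, with one logit per candidate constraint, is shown below; the sizes, the random placeholder data, and the 0.5 decision threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Multi-label classifier: grid parameters x -> binding status of each
# non-trivial inequality constraint (1 = predicted binding).
n_x, n_c = 40, 60  # input size / number of candidate constraints (made up)

clf = nn.Sequential(
    nn.Linear(n_x, 128), nn.ReLU(),
    nn.Linear(128, n_c),  # one logit per inequality constraint
)

X = torch.randn(1000, n_x)                 # placeholder grid parameters
A = (torch.rand(1000, n_c) < 0.1).float()  # placeholder binding-status labels

opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()           # binary cross-entropy per label
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(clf(X), A)
    loss.backward()
    opt.step()

# Predicted active set: constraints whose predicted probability exceeds 0.5.
active = torch.sigmoid(clf(X[:1])) > 0.5
```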

Iterative feasibility test

Imperfect prediction of the binding status of constraints (or of the active set) can lead to similar security issues as imperfect regression. This is especially important for false negative predictions, i.e. predicting an actually binding constraint as non-binding, since constraints missing from the reduced model may be violated at its solution. To ensure convergence to an optimal point of the full problem, one can use the iterative feasibility test [Pineda20][Robson20]. The procedure has been widely used by power grid operators and, in combination with a classifier, includes the following steps:

  1. An initial active set of inequality constraints ($\hat{\mathcal{A}}_{t}^{(1)}$) is proposed by the classifier and a solution of the reduced problem is obtained.
  2. In each feasibility iteration, $k = 1, \dots, K$, the optimal point of the actual reduced problem ($\hat{y}_{t}^{(k)}$) is validated against the constraints $\mathcal{C}^{\mathrm{I}}$ of the original full formulation.
  3. At each step $k$, the violated constraints $\mathcal{N}_{t}^{(k)} \subseteq \mathcal{C}^{\mathrm{I}} \setminus \hat{\mathcal{A}}_{t}^{(k)}$ are added to the set of considered inequality constraints to form the active set of the next iteration: $\hat{\mathcal{A}}_{t}^{(k+1)} = \hat{\mathcal{A}}_{t}^{(k)} \cup \mathcal{N}_{t}^{(k)}$.
  4. The procedure repeats until no violations are found (i.e. $\mathcal{N}_{t}^{(k)} = \emptyset$), and the optimal point satisfies all original constraints $\mathcal{C}^{\mathrm{I}}$. At this point, we have found the optimal point to the full problem ($\hat{y}_{t}^{(k)} = y_{t}^{*}$).
The flowchart of the iterative feasibility test in combination with NN is presented in Figure 3.

Figure 3. Flowchart of the iterative feasibility test method (yellow panel) in combination with an NN classifier (purple panel, default arguments of OPF operator are omitted for clarity).

As the reduced OPF is much cheaper to solve than the full problem, this procedure (if it converges in a few iterations) can in theory be very efficient.
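
The loop itself is straightforward; below is a schematic Python sketch in which `solve_reduced` and `violated` are hypothetical placeholders for a reduced-OPF solver call and a constraint check, respectively.

```python
def iterative_feasibility_test(x, initial_active_set, solve_reduced, violated,
                               max_iter=10):
    # Schematic iterative feasibility test.
    #   solve_reduced(x, active) -> optimal point of the reduced OPF
    #   violated(x, y, active)   -> set of inequality constraints outside
    #                               `active` that y violates
    # Both callables are hypothetical placeholders for a solver and a check.
    active = set(initial_active_set)      # initial active set from the NN
    for _ in range(max_iter):
        y = solve_reduced(x, active)      # solve the current reduced problem
        new = violated(x, y, active)      # find newly violated constraints
        if not new:                       # no violations: y is optimal for
            return y, active              # the full problem
        active |= new                     # extend the active set and repeat
    raise RuntimeError("feasibility test did not converge")
```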

Similarly to the warm-start approach, we can compute the empirical maximum achievable gain of the iterative feasibility test by setting $\hat{\mathcal{A}}_{t}^{(1)} = \mathcal{A}_{t}$, i.e. using the actual active set (in which case the feasibility test converges in the first iteration). Again, the same synthetic cases and formulations were investigated using 1000 samples with varying grid parameters [Robson20], and the results are collected in Table 2. In general, the gain of the perfect classifier is slightly higher than that of the warm-start approach with a perfect regressor. For the DC formulation, there is a moderate correlation between the gain and the system size: since all constraints are linear in the DC formulation, the gain is primarily governed by the ratio of the number of active constraints (equality plus binding inequality constraints) to the total number of constraints of the full OPF problem. Unfortunately, this does not hold for the AC formulation, where the computationally most expensive part is the calculation of the first and second derivatives of the non-convex equality constraints, which are always binding. As a result, the AC-ED gains are in general significantly lower.

Table 2. Maximum achievable gain (%) of the iterative feasibility test ("perfect" classification) for several synthetic DC- and AC-ED grids, using 1000 samples and the Ipopt solver [Robson20].
| Case | DC-ED gain (%) | AC-ED gain (%) |
| --- | --- | --- |
| 24-ieee-rts | $29.9 \pm 0.7$ | $25.2 \pm 0.6$ |
| 30-ieee | $28.3 \pm 0.5$ | $32.0 \pm 0.9$ |
| 39-epri | $28.0 \pm 0.4$ | $29.7 \pm 0.6$ |
| 57-ieee | $38.8 \pm 0.3$ | $30.6 \pm 0.7$ |
| 73-ieee-rts | $36.8 \pm 0.3$ | $27.6 \pm 0.5$ |
| 118-ieee | $47.6 \pm 0.4$ | $31.1 \pm 0.4$ |
| 162-ieee-dtc | $47.3 \pm 0.3$ | $21.9 \pm 0.7$ |
| 300-ieee | $45.7 \pm 0.3$ | $17.4 \pm 0.6$ |
| 588-sdet | $57.0 \pm 0.3$ | $12.2 \pm 0.8$ |
| 1354-pegase | $47.0 \pm 0.4$ | $35.1 \pm 0.4$ |
| 2853-sdet | $54.6 \pm 0.2$ | $27.4 \pm 0.3$ |

Technical details of models

In this section, we provide a high-level overview of the most common technical details used in the field.

Systems and samples

As discussed earlier, both the regression and classification approaches require a relatively large number of training samples, which varies between a few thousand and hundreds of thousands depending on the OPF type, the system size, and the varied grid parameters. Therefore, most works use synthetic grids from the Power Grid Library [Babaeinejadsarookolaee19], for which samples can be generated straightforwardly. The size of the investigated systems usually ranges from a few tens to a few thousand buses, and both DC- and AC-OPF problems can be investigated for economic dispatch, security constrained, unit commitment, and even security constrained unit commitment problems. The input grid parameters are primarily the active and reactive power loads, although a much wider selection of varied grid parameters is also possible. The standard technique is to generate feasible samples by varying the grid parameters within a deviation of $3-70\%$ of their default values, using multivariate uniform, normal, and truncated normal distributions.
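
A small sketch of this sampling scheme follows; the default load vector and the distribution parameters are placeholders (the truncated-normal bounds follow the $\mathcal{TN}(0.3, 1.7)$ convention of Table 3, while the standard deviation is our own assumption).

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
p_load_default = np.array([0.9, 1.2, 0.4])  # default active loads (made up)

# Uniform scaling: each load varied within +/-10% of its default value.
scale_u = rng.uniform(0.9, 1.1, size=(1000, p_load_default.size))
samples_uniform = scale_u * p_load_default

# Truncated normal scaling: mean 1.0, sd 0.1, truncated to [0.3, 1.7].
a, b = (0.3 - 1.0) / 0.1, (1.7 - 1.0) / 0.1  # standardized truncation bounds
scale_tn = truncnorm.rvs(a, b, loc=1.0, scale=0.1,
                         size=(1000, p_load_default.size), random_state=0)
samples_truncnorm = scale_tn * p_load_default
# Each sample would then be checked for feasibility before entering the set.
```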

Finally, we note that, given the rapidly increasing attention in the field, it would be beneficial to have standard benchmark data sets in order to compare different models and approaches [Robson20].

Loss functions

For regression-based approaches, the most basic loss function optimized with respect to the NN parameters is the (mean) squared error. In order to reduce possible violations of certain constraints, an additional penalty term can be added to this loss function.
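
A sketch of such a penalized loss in PyTorch is given below; the quadratic penalty form, the weight, and the toy box constraint are illustrative choices rather than the exact formulation of any cited work.

```python
import torch

def penalized_mse(y_pred, y_true, ineq_fn, weight=10.0):
    # MSE plus a constraint-violation penalty term (cvpt). ineq_fn(y) returns
    # the inequality constraint values c^I(y), which should be >= 0 at a
    # feasible point; violations are penalized quadratically.
    mse = torch.mean((y_pred - y_true) ** 2)
    violation = torch.relu(-ineq_fn(y_pred))  # positive where c^I(y) < 0
    return mse + weight * torch.mean(violation ** 2)

# Toy usage: penalize predictions outside the box 0 <= y <= 1.
ineq = lambda y: torch.cat([y, 1.0 - y], dim=-1)
loss = penalized_mse(torch.rand(8, 4), torch.rand(8, 4), ineq)
```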

For classification-based methods, cross-entropy (multiclass classifier) or binary cross-entropy (multi-label classifier) loss functions can be applied, with a possible regularization term.

NN architectures

Up to now, most NN models have applied fully connected neural networks (FCNNs), ranging from shallow to deep architectures. However, there have recently been attempts to take the grid topology into account as well, and convolutional (CNN) and graph (GNN) neural networks have also been used for both regression and classification approaches. GNNs, which can use the graph of the grid explicitly, appear particularly successful compared to FCNN and CNN architectures [Falconer20].
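
To give a flavor of how a GNN can exploit the grid graph, here is a single, deliberately simple graph-convolution layer in plain PyTorch; real architectures (e.g. in [Falconer20]) are more elaborate, and the toy 5-bus topology and feature sizes are made up.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    # Single graph-convolution layer H' = relu(A_hat @ H @ W), where A_hat is
    # the normalized bus adjacency matrix. A minimal sketch only.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        return torch.relu(a_hat @ self.lin(h))

# Toy usage: 5 buses, 3 input features per bus, made-up ring topology.
adj = torch.eye(5)  # self-loops
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]:
    adj[i, j] = adj[j, i] = 1.0
deg = adj.sum(dim=1)
a_hat = adj / torch.sqrt(deg[:, None] * deg[None, :])  # symmetric normalization
h = torch.randn(5, 3)             # per-bus input features (e.g. loads)
out = GraphConv(3, 16)(h, a_hat)  # per-bus embeddings
```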

Table 3. Comparison of works using neural networks for predicting solutions to OPF. From each reference the largest investigated system is shown with the corresponding number of buses ($\lvert \mathcal{N} \rvert$). Dataset size and grid input types are also presented. For sample generation, $\mathcal{U}$ and $\mathcal{TN}$ denote uniform and truncated normal distributions, respectively, and their arguments are the minimum and maximum factors multiplying the default grid parameter value. FCNN, CNN and GNN denote fully connected, convolutional and graph neural networks, respectively. SE, MSE and CE indicate squared error, mean squared error and cross-entropy loss functions, respectively, and cvpt denotes a constraint violation penalty term.
| Ref. | OPF | System | $\lvert \mathcal{N} \rvert$ | Dataset | Input | Sampling | NN | Predictor | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [Guha19] | AC-ED | 118-ieee | 118 | 813k | loads | $\mathcal{U}(0.9, 1.1)$ | FCNN | regressor | MSE + cvpt |
| [Fioretto19] | AC-ED | 300-ieee | 300 | 236k | loads | $\mathcal{U}(0.8, 1.2)$ | FCNN | regressor | SE + cvpt |
| [Zamzam19] | AC-ED | 118-ieee | 118 | 100k | active load, reactive load | $\mathcal{TN}(0.3, 1.7)$, $\mathcal{U}(0.8, 1.0)$ | FCNN | regressor | MSE |
| [Pan19] | DC-SCED | 300-ieee | 300 | 55k | load | $\mathcal{U}(0.9, 1.1)$ | FCNN | regressor | MSE + cvpt |
| [Owerko19] | AC-ED | 118-ieee | 118 | 13k | loads | $\mathcal{U}(0.9, 1.1)$ | FCNN, GNN | regressor | MSE |
| [Deka19] | DC-ED | 24-ieee-rts | 24 | 50k | load | $\mathcal{U}(0.97, 1.03)$ | FCNN | classifier | CE |
| [Chatzos20] | AC-ED | France_Lyon | 3411 | 10k | loads | $\mathcal{U}(0.8, 1.2)$ | FCNN | regressor | SE + cvpt |
| [Pan20] | AC-ED | 30-ieee | 30 | 12k | loads | $\mathcal{U}(0.8, 1.2)$ | FCNN | regressor | SE + cvpt |
| [Venzke20] | DC-ED | 300-ieee | 300 | 100k | load | $\mathcal{U}(0.4, 1.0)$ | FCNN | regressor | MSE |
| [Robson20] | DC-ED | 1354-pegase | 1354 | 10k | load + 3 other params | $\mathcal{U}(0.85, 1.15)$, $\mathcal{U}(0.9, 1.1)$ | FCNN | classifier | CE |
| [Robson20] | AC-ED | 300-ieee | 300 | 1k | loads + 5 other params | $\mathcal{U}(0.85, 1.15)$, $\mathcal{U}(0.9, 1.1)$ | FCNN | classifier | CE |
| [Falconer20] | AC-ED | 300-ieee | 300 | 10k | loads + 5 other params | $\mathcal{U}(0.85, 1.15)$, $\mathcal{U}(0.9, 1.1)$ | FCNN, CNN, GNN | regressor, classifier | MSE, CE |

References

[Boyd04]: S. Boyd and L. Vandenberghe, "Convex Optimization", New York: Cambridge University Press, (2004).

[Nocedal06]: J. Nocedal and S. J. Wright, "Numerical Optimization", New York: Springer, (2006).

[Wachter06]: A. Wächter and L. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming", Math. Program. 106, pp. 25, (2006).

[Zhou20]: F. Zhou, J. Anderson and S. H. Low, "The Optimal Power Flow Operator: Theory and Computation", arXiv:1907.02219, (2020).

[Guha19]: N. Guha, Z. Wang and A. Majumdar, "Machine Learning for AC Optimal Power Flow", in Proceedings of the 36th International Conference on Machine Learning Workshop, Long Beach, CA, USA, (2019).

[Fioretto19]: F. Fioretto, T. Mak and P. V. Hentenryck, “Predicting AC Optimal Power Flows: Combining Deep Learning and Lagrangian Dual Methods”, arXiv:1909.10461, (2019).

[Chatzos20]: M. Chatzos, F. Fioretto, T. W. K. Mak and P. V. Hentenryck, "High-Fidelity Machine Learning Approximations of Large-Scale Optimal Power Flow", arXiv:2006.16356, (2020).

[Baker19]: K. Baker, "Learning Warm-Start Points For Ac Optimal Power Flow", IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Pittsburgh, PA, USA, pp. 1, (2019).

[Zamzam19]: A. Zamzam and K. Baker, "Learning Optimal Solutions for Extremely Fast AC Optimal Power Flow", arXiv:1910.01213, (2019).

[Pan19]: X. Pan, T. Zhao, M. Chen and S Zhang, "DeepOPF: A Deep Neural Network Approach for Security-Constrained DC Optimal Power Flow", arXiv:1910.14448, (2019).

[Pan20]: X. Pan, M. Chen, T. Zhao and S. H. Low, "DeepOPF: A Feasibility-Optimized Deep Neural Network Approach for AC Optimal Power Flow Problems", arXiv:2007.01002, (2020).

[Ng18]: Y. Ng, S. Misra, L. A. Roald and S. Backhaus, "Statistical Learning For DC Optimal Power Flow", arXiv:1801.07809, (2018).

[Deka19]: D. Deka and S. Misra, "Learning for DC-OPF: Classifying active sets using neural nets", arXiv:1902.05607, (2019).

[Jamei19]: M. Jamei, L. Mones, A. Robson, L. White, J. Requeima and C. Ududec, "Meta-Optimization of Optimal Power Flow", in Proceedings of the 36th International Conference on Machine Learning Workshop, Long Beach, CA, USA, (2019).

[Pineda20]: S. Pineda, J. M. Morales and A. Jiménez-Cordero, "Data-Driven Screening of Network Constraints for Unit Commitment", IEEE Transactions on Power Systems, 35, pp. 3695, (2020).

[Robson20]: A. Robson, M. Jamei, C. Ududec and L. Mones, "Learning an Optimally Reduced Formulation of OPF through Meta-optimization", arXiv:1911.06784, (2020).

[Babaeinejadsarookolaee19]: S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, R. Huang, C. Josz, R. Korab, B. Lesieutre, J. Maeght, D. K. Molzahn, T. J. Overbye, P. Panciatici, B. Park, J. Snodgrass and R. Zimmerman, "The Power Grid Library for Benchmarking AC Optimal Power Flow Algorithms", arXiv:1908.02788, (2019).

[Chen20]: L. Chen and J. E. Tate, "Hot-Starting the Ac Power Flow with Convolutional Neural Networks", arXiv:2004.09342, (2020).

[Owerko19]: D. Owerko, F. Gama and A. Ribeiro, "Optimal Power Flow Using Graph Neural Networks", arXiv:1910.09658, (2019).

[Venzke20]: A. Venzke, G. Qu, S. Low and S. Chatzivasileiadis, "Learning Optimal Power Flow: Worst-Case Guarantees for Neural Networks", arXiv:2006.11029, (2020).

[Falconer20]: T. Falconer and L. Mones, "Deep learning architectures for inference of AC-OPF solutions", arXiv:2011.03352, (2020).