CN105391057B

CN105391057B - A kind of GPU threaded design methods that direction of energy Jacobi battle array calculates

Info

Publication number: CN105391057B
Application number: CN201510809809.3A
Authority: CN
Inventors: 贾育培; 周赣; 冯燕钧; 罗雅迪; ***; 林静怀; 张旭
Original assignee: State Grid Corp of China SGCC; Southeast University; China Electric Power Research Institute Co Ltd CEPRI; State Grid Fujian Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Southeast University; China Electric Power Research Institute Co Ltd CEPRI; State Grid Fujian Electric Power Co Ltd
Priority date: 2015-11-20
Filing date: 2015-11-20
Publication date: 2017-11-14
Anticipated expiration: 2035-11-20
Also published as: CN105391057A

Abstract

The invention discloses a kind of GPU threaded design methods that direction of energy Jacobi battle array calculates, it is characterised in that：Methods described includes：Input electric network data, pretreatment node admittance battle array Y；CPU distinguishes calculated sub-matrix nonzero element and node admittance battle array Y nonzero element position mapping tables；CPU distinguishes calculated sub-matrix nonzero element and Jacobian matrix nonzero element position mapping table；GPU interior joint injecting power kernel functions S calculates all node injecting powers；Jacobi submatrix calculates kernel function and distinguishes nonzero element in calculated sub-matrix and be stored in Jacobian matrix in GPU.The present invention is calculated in Jacobian matrix non-zero entry calculating process by submatrix to judge the branched structure of the affiliated submatrix region process of element when avoiding and directly being calculated using single kernel function, lifts execution efficiency；And by centralized calculation there is nonzero element in the submatrix of identical calculations formula to solve the unbalanced problem of threads load, lift parallel efficiency calculation.

Description

GPU thread design method for power flow Jacobian array calculation

Technical Field

The invention belongs to the field of high-performance computing application of power systems, and particularly relates to a GPU thread design method for power flow Jacobian array computing.

Background

Load flow calculation is one of the most widely, basic and important electrical operations in electrical power systems. In the research of the operation mode and the planning scheme of the power system, load flow calculation is needed to compare the feasibility, the reliability and the economy of the operation mode or the planning power supply scheme. Meanwhile, in order to monitor the operation state of the power system in real time, a large amount of rapid load flow calculation is also required. Therefore, when the system plans and designs and arranges the operation mode of the system, offline load flow calculation is adopted; in the real-time monitoring of the running state of the power system, on-line load flow calculation is adopted.

In the actual production process, both off-line power flow calculation and on-line power flow calculation have higher requirements on the calculation speed of the power flow. In the offline power flow related to planning design and operation mode arrangement, due to the complex conditions of equipment landing schemes and the like, the types of simulation operation are various, the power flow calculation amount is large, and the overall simulation duration is influenced by the single power flow calculation time; on-line tidal current calculation performed in the operation of the power system has high sensitivity to calculation time, and a tidal current calculation result needs to be given in real time, for example, in tidal current calculation of an expected accident and the influence of equipment quitting operation on static safety, the system needs to calculate a great amount of tidal current distribution under the expected accident and make an expected operation mode adjustment scheme in real time.

In traditional Newton Raphson method load flow calculation, the calculation time of the Jacobian matrix accounts for 30% of the total calculation time of the load flow, and the calculation speed of the Jacobian matrix influences the overall performance of a program. With the slow increase of the calculation speed of the CPU, the single load flow calculation time at the present stage reaches a bottleneck. At present, the acceleration method for load flow calculation mainly focuses on the coarse-grained acceleration of multiple load flows by using a cluster and a multi-core server, and the research on the internal operation acceleration of a single load flow in actual production is less.

A GPU is a many-core parallel processor, far exceeding the CPU in the number of processing units. The conventional GPU is only responsible for graphics rendering, and most of the processing is handed over to the CPU. The present GPU is already architected as a multi-core, multi-threaded, programmable processor with powerful computing power and very high memory bandwidth. Under a general computing model, the GPU works as a coprocessor of the CPU, and high-performance computing is completed through reasonable task allocation and decomposition.

The Jacobian matrix calculation has parallelism, the calculation of each nonzero element is independent, no dependency relationship exists, and the Jacobian matrix calculation can be naturally processed by parallel calculation and is suitable for GPU acceleration. Therefore, the calculation of the Jacobian matrix can be rapidly completed through reasonable scheduling between the CPU and the GPU, and domestic and foreign scholars have studied the Jacobian acceleration method of the GPU, but have not deeply optimized thread design, have simply studied the calculation thread design from the distribution of calculated amount, have not studied the thread calculation mode and the data index mode deeply, and cannot enable the program to fully exert the advantages of the GPU.

Therefore, it is desired to solve the above problems.

Disclosure of Invention

The purpose of the invention is as follows: aiming at the defects of the prior art, the invention provides the GPU thread design method for calculating the Jacobian of the power flow, which can greatly reduce the calculation time of the Jacobian and can improve the calculation speed of the power flow.

The technical scheme is as follows: the invention provides a GPU thread design method for calculating a power flow Jacobian matrix based on a GPU, which utilizes a mapping relation to distribute calculation resources.

And (3) load flow calculation: the term of electromechanics refers to the calculation of the distribution of active power, reactive power and voltage in a power grid given the topology of the power system network, the parameters of the elements and the parameters of power generation and load.

And (3) parallel computing: compared with serial operation, the method is an algorithm capable of executing a plurality of instructions at one time, and aims to improve the calculation speed and solve large and complex calculation problems by enlarging the problem solving scale.

GPU: graphics processors (English: Graphic processing Unit, abbreviation: GPU).

Admittance matrix: a matrix describing the relationship between the voltage and the injected current at each node of the power network is established based on the equivalent admittance of the system components.

The invention discloses a GPU thread design method for calculating a power flow Jacobian, which comprises the following steps:

(1) inputting power grid data, and preprocessing a node admittance array Y;

(2) the CPU respectively calculates H, N, J, L the mapping relation table M of the non-zero elements of the four sub-matrixes and the non-zero elements of the node admittance array Y_H2Y、M_N2Y、M_J2Y、M_L2YThe specific calculation method comprises the following steps:

(2.1) scanning the node admittance array Y in COO format, selecting the elements corresponding to the calculation of H matrix and adding the elements into the set H_YGenerating a look-up table M_H2Y；

a) Scanning the row number i of each element in the COO format, and when it belongs to the active reactive PQ node or the active voltage PV node, noting that i ═ pqpv [ x [ ]]Wherein, pqpv is the index of PQ and PV nodes, and x is the index number; rescanning column number j, when j belongs to PQ or PV node, noting that j equals pqpv [ y]Selecting the element to add to the set H_Y；

b) Each time an element is added to the set H_YTime, record set H_YThe current number k of elements, and the position h of the element in the COO format of the matrix Y_k(ii) a Record i position x in PQ, PV node_kJ position y in PQ, PV node_k(ii) a Set H_YThe middle element is expressed as k, h_k，x_k，y_k}；

c) Generation of k → h_kMapping table M_H2Y；

(2.2) scanning the node admittance array Y in COO format, selecting the elements corresponding to the N matrix calculation and adding the elements into the set N_YGenerating a mapping table M_N2Y；

a) Scanning the row number i of each element in the COO format, and when it belongs to a PQ or PV node, noting i ═ pqpv [ x]Then, scan column number j, and when j belongs to PQ node, let j equal PQ [ y]Wherein PQ is the index of PQ node, y is the index number, and the element is selected and added into the set N_Y；

b) Each time an element is added to the set N_YTime, record set N_YThe current number k of elements, and the position n of the element in the COO format of the matrix Y_k(ii) a Record i position x in PQ, PV node_kJ position y in PQ node_k(ii) a Set N_YThe middle element is expressed as k, n_k，x_k，y_k}；

c) Generation of k → n_kMapping table M_N2Y；

(2.3) scanning the node admittance array Y in COO format, selecting the elements corresponding to the calculation of J matrix and adding the elements into the set M_YGenerating a mapping table M_J2Y；

a) Scanning the line number i of each element in the COO format, and when it belongs to the PQ node, noting that i ═ PQ [ x]And then scanning the column number j, and when j belongs to PQ or PV node, recording j as pqpv [ y ═]Selecting the element to join the set M_Y；

b) Each time an element is added to the set J_YTime, record set N_YThe current number k of elements, and the position m of the element in the COO format of the matrix Y_k(ii) a Record i position x in PQ node_kJ position y in PQ, PV node_k(ii) a Set J_YThe middle element is expressed as k, j_k，x_k，y_k}；

c) Generation k → j_kMapping table M_J2Y；

(2.4) scanning the node admittance array Y in COO format, selecting the elements corresponding to the calculation of the L matrix and adding the elements into the set L_YGenerating a look-up table M_L2Y；

a) Scanning the line number i of each element in the COO format, and when it belongs to the PQ node, noting that i ═ PQ [ x]Then, scan column number j, and when j belongs to PQ node, let j equal PQ [ y]Selecting the element to join the set L_Y；

b) Each time an element is added to the set L_YTime, record set L_YThe current number k of elements, and the position l of the element in the COO format of the matrix Y_k(ii) a Record i position x in PQ node_kJ position y in PQ node_k(ii) a Set L_YThe middle element is expressed as k, l_k，x_k，y_k}；

c) Generation k → l_kMapping table M_L2Y；

(3) The CPU respectively calculates H, N, J, L the mapping relation table M of the non-zero elements of the four sub-matrixes and the non-zero elements of the Jacobian matrix_H2Ja、M_N2Ja、M_J2Ja、M_L2JaThe specific calculation method comprises the following steps:

(3.1) H obtained according to step 2_Y、N_Y、J_Y、L_YX in (2)_k，y_kResulting in a sparse format of matrix H, N, J, L

(3.2) defining a set Cmp { x, y, NO, zone }, wherein x is a Jacobian matrix row number, y is a Jacobian matrix column number, NO is a position of an element in a H, N, J, L matrix COO format, k in the step 2 is equivalent to, zone is an identifier of an element in a H, N, J, L sub-area, and the number of PQ and PV nodes is npqpv;

(3.3) filling the Cmp with elements in matrix H, N, J, L, wherein the value of zone in the H matrix is set to 1; adding npqpv to the value of element y in the N array, and setting zone to 2; the value of an element x in the J array is added with npqpv, and zone is set to be 3; in the L array, the value of an element x is added with npqpv, the value of y is added with npqpv, and zone is set to be 4;

(3.4) reordering the set Cmp according to the ascending order x and the ascending order y to obtain a Jacobi matrix COO format with ascending rows; adding a variable num into the set Cmp to represent the position of an element in the Jacobian matrix in a standard COO format, and assigning 1,2 and 3 … in sequence;

(3.5) sorting the Cmp { x, y, NO, zone, num } according to ascending order of zone and ascending order of NO, wherein the num sequence in each zone is the lookup table M_H2Ja、M_N2Ja、M_J2Ja、M_L2Ja；

(4) Calculating the injection power of all nodes by using a node injection power kernel function S in the GPU;

(5) the Jacobian submatrix computation kernel H, M, N, L in the GPU computes H, N, J, L non-zero elements in the four submatrices respectively and stores the non-zero elements in the Jacobian submatrices.

In the step (1), the number of nodes of the power grid is n, the node admittance array Y { i, j, G, B } of the power grid is stored in a Coordinate sparse format of a row number i, a column number j, a conductance G, and a susceptance B, a relative offset position of each row of data of the node admittance array in Y is stored in a row offset vector R, and a node voltage phase angle vector is defined as theta, and a voltage amplitude vector is defined as V.

In the step (4), the calculation mode of the node injection power is that the k number thread of the kernel function S calculates the active power injection and the reactive power injection of the k number node, and the calculation formula is as follows:

wherein,

k is a parallel thread number in GPU programming design;

P_kinjecting active power for node k;

Q_kreactive power injection for node k;

k_bstarting index of the k-th row of non-zero elements, k, of the admittance array Y_b＝R[k]；

k_eTermination index for the k-th row of non-zero elements of the admittance array Y, k_e＝R[k+1]-1；

j, G and B are data stored in the first position in the admittance array Y;

V_kthe voltage amplitude of node k;

V_jthe voltage amplitude of the j node;

θ_kvoltage phase angle of node k;

θ_jthe voltage phase angle of node j.

In the step (5), a method for calculating non-zero elements in the H submatrix is as follows:

according to the mapping relation table M of the H sub-matrix non-zero elements and the node admittance array Y obtained in the step (2)_H2YWhen the t number thread of the H kernel function is calculated, the table M is inquired_H2YThe index number k of the Y array is taken out from the t position_H2YTaking out Y [ k ]_H2Y]Corresponding data { i, j, G, B } in the elements are calculated, and the calculation formula of the t-th non-zero element of the submatrix H in the Jacobian matrix is as follows:

wherein,

t is the number of parallel threads in GPU programming design; h_tIs a non-zero element at the t-th position in the COO format of the H submatrix;

i, j, G, B is the k-th order in the admittance array Y_H2YData stored at each location;

Q_iinjecting reactive power for node i;

then according to the mapping relation M of the H sub-matrix nonzero element and the matrix Ja obtained in the step (3)_H2JaQuery M_H2JaPosition t of (1) take out H_tStorage index number k of the calculation result in Ja matrix_H2JaThen adding H_tThe calculation result of (c) is stored in Ja [ k ]_H2Ja]。

Has the advantages that: compared with the prior art, the invention has the beneficial effects that: firstly, the Jacobian matrix calculation is accelerated by adopting a GPU processor, and compared with a CPU, the acceleration ratio is 16 times; secondly, the thread design method optimizes the performance of the GPU program, avoids judging the branch structure of the sub-matrix area process to which the element belongs when using single kernel function to directly calculate through sub-matrix calculation in the Jacobian matrix non-zero element calculation process, and improves the GPU branch execution efficiency; and the problem of unbalanced thread load is solved by intensively calculating non-zero elements in the submatrices with the same calculation formula, and the parallel calculation efficiency is improved.

Drawings

FIG. 1 is a flowchart of a GPU thread design method for power flow Jacobian matrix calculation according to the present invention

Detailed Description

As shown in fig. 1, the present invention provides a GPU thread design method for power flow jacobian matrix calculation based on a GPU, which allocates calculation resources by using a mapping relationship, the method comprising the following steps:

the node number of a power grid is set as n, a node admittance array Y { i, j, G, B } of the power grid is stored in a COO sparse format of line number, column number, conductance and susceptance, an index in the COO format is converted into a CompressedSparseRow (CSR) line compression format, a line offset array vector is set as R, a node voltage phase angle vector is set as theta, and a voltage amplitude vector is set as V;

step 1: generating a look-up table M_H2Y、M_N2Y、M_J2Y、M_L2Y

1) Scanning COO format node admittance array Y, selecting element corresponding to H matrix calculation and adding it into set H_YGenerating a look-up table M_H2Y；

c) Generation of k → h_kMapping table M_H2Y；

2) Scanning COO format node admittance array Y, selecting element corresponding to N matrix calculation and adding into set N_YGenerating a mapping table M_N2Y；

b) Each time an element is added to the set N_YTime, record set N_YThe current number k of elements, and the position n of the element in the COO format of the matrix Y_k(ii) a Record i inPosition x in PQ, PV node_kJ position y in PQ node_k(ii) a Set N_YThe middle element is expressed as k, n_k，x_k，y_k}；

c) Generation of k → n_kMapping table M_N2Y；

3) Scanning COO format node admittance array Y, selecting element corresponding to J matrix calculation and adding into set M_YGenerating a mapping table M_J2Y；

c) Generation k → j_kMapping table M_J2Y；

4) Scanning COO format node admittance array Y, selecting the element corresponding to L matrix calculation and adding to set L_YGenerating a look-up table M_L2Y；

c) Generation k → l_kMapping table M_L2Y；

Step 2: generating a look-up table M_H2Ja、M_N2Ja、M_J2Ja、M_L2Ja

1) According to H obtained in step 1_Y、N_Y、J_Y、L_YX in (2)_k，y_kResulting in a sparse format of matrix H, N, J, L

2) Defining a set Cmp { x, y, NO, zone }, wherein x is a Jacobian matrix row number, y is a Jacobian matrix column number, NO is a position of an element in a H, N, J, L matrix COO format, and is equivalent to k in step 1, zone is an identifier of the element in a H, N, J, L sub-area, and the number of PQ and PV nodes is npqpv;

3) filling the Cmp with elements in the matrix H, N, J, L, wherein the value of zone in the H matrix is set to 1; adding npqpv to the value of element y in the N array, and setting zone to 2; the value of an element x in the J array is added with npqpv, and zone is set to be 3; in the L array, the value of an element x is added with npqpv, the value of y is added with npqpv, and zone is set to be 4;

4) reordering the set Cmp according to the ascending order x and the ascending order y to obtain a Jacobi matrix COO format with ascending rows; adding a variable num into the set Cmp to represent the position of an element in the Jacobian matrix in a standard COO format, and assigning 1,2 and 3 … in sequence;

5) sorting the Cmp { x, y, NO, zone, num } according to ascending order of zone and ascending order of NO, wherein the num sequence in each zone is the lookup table M_H2Ja、M_N2Ja、M_J2Ja、M_L2Ja。

And step 3: computing node injected power

And calculating active reactive power injection of all nodes by the S-kernel function, setting the starting parameter of the kernel function to be < < < n/256+1,256> >, namely starting the number of one-dimensional block (n/256 ] +1, starting 256 threads in each block, and totaling n threads. The required data is transmitted into a GPU in advance, the k number thread of an S kernel function calculates the active power injection and the reactive power injection of the k number node, and the thread calculation formula is as follows:

wherein,

j, G and B are data stored in the ith position of the admittance array Y.

And 4, step 4: calculating the value of non-zero element in Jacobian matrix

The Jacobian matrix calculation is divided into four parts, the sub-matrices are calculated H, M, J, L respectively, and the thread of each sub-function in the calculation process is calculated by directly reading the calculation data through mapping.

Taking the non-zero value calculation of the submatrix H as an example,

H₁the kernel function is responsible for calculating the non-zero elements of the submatrix H and setting the starting parameters thereof to<<<H_num/256+1,256>>>One-dimensional block number [ H ] is started immediately_num/256]+1,256 threads are opened in each block for a total of H_numA plurality of threads, wherein H_numIs the number of non-zero elements of the submatrix H. Pre-transmitting the required data into GPU H₁When the t number thread of the kernel function is calculated, the table M is inquired_H2YThe index number k of the Y array is taken out from the t position_H2YTaking out Y [ k ]_H2Y]Calculating corresponding data { i, j, G, B } in the elements, wherein the calculation content is the tth nonzero element of the submatrix H in the Jacobi matrix, and calculating the publicThe formula is as follows:

wherein,

Q_iinjecting reactive power for node i.

Establishment of H₂Mapping relation M between global thread number t of kernel function and matrix Jacobi_H2Ja，H₂After the t number thread of the kernel function calculates the result, inquiring M_H2JaThe storage index number k of the calculation result in the Jacobi matrix is taken out from the position of the t_H2JaThen, the calculation result is stored in Jacobi [ k ]_H2Ja]。

The other sub-matrices are calculated similarly.

Claims

1. A GPU thread design method for power flow Jacobian matrix calculation is characterized by comprising the following steps: the method comprises the following steps:

(1) inputting power grid data, and preprocessing a node admittance array Y;

(2.1) scanning the node admittance array Y of COO format, selecting the node admittance array Y corresponding to H momentElement joining set H for array computation_YGenerating a look-up table M_H2Y；

c) Generation of k → h_kMapping table M_H2Y；

c) Generation of k → n_kMapping table M_N2Y；

a) The row number i of each element in the COO format is scanned and, when it belongs to a PQ node,let i be pq [ x ]]And then scanning the column number j, and when j belongs to PQ or PV node, recording j as pqpv [ y ═]Selecting the element to join the set M_Y；

c) Generation k → j_kMapping table M_J2Y；

c) Generation k → l_kMapping table M_L2Y；

2. The method of claim 1, wherein the method comprises the steps of: in the step (1), the number of nodes of the power grid is n, the node admittance array Y { i, j, G, B } of the power grid is stored in a Coordinate sparse format of a row number i, a column number j, conductance G and susceptance B, the relative offset position of each row of data of the node admittance array in Y is stored in a row offset vector R, and a node voltage phase angle vector is defined to be theta, and a voltage amplitude vector is defined to be V.

3. The method of claim 1, wherein the method comprises the steps of:

the calculation mode of the node injection power in the step (4) is that the k number thread of the kernel function S calculates the active power injection and the reactive power injection of the k number node, and the calculation formula is as follows:

<mrow> <msub> <mi>P</mi> <mi>k</mi> </msub> <mo>=</mo> <msub> <mi>V</mi> <mi>k</mi> </msub> <munderover> <mo>&Sigma;</mo> <mrow> <mi>l</mi> <mo>=</mo> <msub> <mi>k</mi> <mi>b</mi> </msub> </mrow> <msub> <mi>k</mi> <mi>e</mi> </msub> </munderover> <msub> <mi>V</mi> <mi>j</mi> </msub> <mo>&lsqb;</mo> <mi>G</mi> <mi> </mi> <mi>c</mi> <mi>o</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>&theta;</mi> <mi>k</mi> </msub> <mo>-</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <mi>B</mi> <mi> </mi> <mi>s</mi> <mi>i</mi> <mi>n</mi> <mrow> <mo>(</mo> <msub> <mi>&theta;</mi> <mi>k</mi> </msub> <mo>-</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>&rsqb;</mo> </mrow>

<mrow> <msub> <mi>Q</mi> <mi>k</mi> </msub> <mo>=</mo> <msub> <mi>V</mi> <mi>k</mi> </msub> <munderover> <mo>&Sigma;</mo> <mrow> <mi>l</mi> <mo>=</mo> <msub> <mi>k</mi> <mi>b</mi> </msub> </mrow> <msub> <mi>k</mi> <mi>e</mi> </msub> </munderover> <msub> <mi>V</mi> <mi>j</mi> </msub> <mo>&lsqb;</mo> <mi>G</mi> <mi> </mi> <mi>s</mi> <mi>i</mi> <mi>n</mi> <mrow> <mo>(</mo> <msub> <mi>&theta;</mi> <mi>k</mi> </msub> <mo>-</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mi>B</mi> <mi> </mi> <mi>c</mi> <mi>o</mi> <mi>s</mi> <mrow> <mo>(</mo> <msub> <mi>&theta;</mi> <mi>k</mi> </msub> <mo>-</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>&rsqb;</mo> </mrow>

wherein,

k is the parallel thread number of the S kernel function in GPU programming design;

P_kinjecting active power for node k;

Q_kreactive power injection for node k;

j, G and B are data stored in the first position in the admittance array Y;

V_kthe voltage amplitude of node k;

V_jthe voltage amplitude of the j node;

θ_kvoltage phase angle of node k;

θ_jthe voltage phase angle of node j.

4. The method of claim 1, wherein the method comprises the steps of:

in the step (5), the calculation method of the non-zero elements in the H submatrix is as follows:

according to the mapping relation table M of the H sub-matrix non-zero elements and the node admittance array Y obtained in the step (2)_H2YWhen the t number thread of the H kernel function is calculated, the table M is inquired_H2YThe index number k of the Y array is taken out from the t position_H2YTaking out Y [ k ]_H2Y]Calculating corresponding data { i, j, G, B } in the elements, and calculating the t-th nonzero coefficient of a submatrix H in a Jacobi matrixThe calculation formula of the elements is as follows:

<mrow> <msub> <mi>H</mi> <mi>t</mi> </msub> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <mo>-</mo> <msub> <mi>V</mi> <mi>i</mi> </msub> <msub> <mi>V</mi> <mi>j</mi> </msub> <mrow> <mo>&lsqb;</mo> <mrow> <mi>G</mi> <mi> </mi> <mi>sin</mi> <mrow> <mo>(</mo> <mrow> <msub> <mi>&theta;</mi> <mi>i</mi> </msub> <mo>-</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> </mrow> <mo>)</mo> </mrow> <mo>-</mo> <mi>B</mi> <mi> </mi> <mi>cos</mi> <mrow> <mo>(</mo> <mrow> <msub> <mi>&theta;</mi> <mi>i</mi> </msub> <mo>-</mo> <msub> <mi>&theta;</mi> <mi>j</mi> </msub> </mrow> <mo>)</mo> </mrow> </mrow> <mo>&rsqb;</mo> </mrow> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>i</mi> <mo>&NotEqual;</mo> <mi>j</mi> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <msubsup> <mi>V</mi> <mi>i</mi> <mn>2</mn> </msubsup> <mi>B</mi> <mo>+</mo> <msub> <mi>Q</mi> <mi>i</mi> </msub> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>i</mi> <mo>=</mo> <mi>j</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> </mrow>

wherein,

t is the parallel thread number of the H kernel function in the GPU programming design;

H_tis a non-zero element at the t-th position in the COO format of the H submatrix;

Q_iinjecting reactive power for node i;

V_ithe voltage amplitude of node i;

V_jthe voltage amplitude of the j node;

θ_ithe voltage phase angle of the node I is shown;

θ_jthe voltage phase angle of the j node;