CN118013866A - Medium-and-long-term runoff prediction method based on horizontal and vertical attention - Google Patents
Medium- and long-term runoff prediction method based on horizontal and vertical attention
- Publication number: CN118013866A (application CN202410421506.3A)
- Authority: CN (China)
- Prior art keywords: attention, data, representing, runoff, long
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The embodiment of the application relates to the technical field of machine learning, and discloses a medium- and long-term runoff prediction method based on horizontal and vertical attention, which comprises the following steps: generating an original data set from runoff data and rainfall data collected in a target river basin; preprocessing the original data set and dividing it by time into three mutually time-exclusive parts: a training set, a verification set and a test set; constructing a medium- and long-term runoff prediction model based on horizontal and vertical attention, the model consisting of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit; iteratively training the model to convergence on the training set, the verification set and the test set to obtain a trained model; and inputting the runoff data and rainfall data of the river basin to be predicted into the trained model to obtain a runoff prediction result. The method fuses sequence information and periodic information during modeling, and improves the accuracy of medium- and long-term runoff prediction.
Description
Technical Field
The embodiment of the application relates to the technical field of machine learning, in particular to a medium- and long-term runoff prediction method based on horizontal and vertical attention.
Background
Runoff prediction is an important discipline with wide application in hydrology-related fields. Based on the difference in time resolution, runoff prediction can be classified into short-term runoff prediction (hours, days, etc.) and medium- and long-term runoff prediction (weeks, ten-day periods, months, etc.). Short-term runoff prediction plays a vital role in applications such as flood control, dam construction, sewage treatment and urban drainage management, while medium- and long-term runoff prediction is closely related to reservoir operation, agricultural planning, drought relief, water management, power generation planning, water resource planning and management, research on the influence of climate change, drought disaster early warning, ecosystem protection and the like.
Runoff prediction methods fall mainly into two categories: process-driven methods and data-driven methods. Traditional runoff prediction models are often process-driven and suited to short-term runoff prediction; they need to consider many factors such as topography, soil type and vegetation coverage, and have difficulty accurately describing the highly nonlinear relationships in hydrological processes. Data-driven methods reduce the dependence on these complex physical processes by learning patterns in the input data, thereby simplifying the model structure. Unlike short-term runoff prediction, the data for medium- and long-term runoff prediction span a long time and are difficult to describe with a mechanistic model, and factors such as the uncertainty of future weather and hydrological changes pose additional challenges. Data-driven methods are more flexible and therefore better suited to medium- and long-term runoff prediction.
In recent years, with the development of artificial intelligence, deep learning techniques have been increasingly applied as data-driven models to medium- and long-term runoff prediction, for example runoff prediction methods based on the LSTM (Long Short-Term Memory) network and runoff prediction methods based on the Transformer model.
However, both the LSTM-based and the Transformer-based runoff prediction methods model only along the time-sequence direction and do not consider the periodic direction, so such models have difficulty learning the periodic information of runoff varying with the seasons, and the effect of medium- and long-term runoff prediction is poor.
Disclosure of Invention
The embodiment of the application aims to provide a medium- and long-term runoff prediction method based on horizontal and vertical attention, which fuses sequence information and periodic information during modeling and greatly improves the accuracy of medium- and long-term runoff prediction.
In order to solve the above technical problems, the embodiment of the application provides a medium- and long-term runoff prediction method based on horizontal and vertical attention, which comprises the following steps:
generating an original data set from the collected runoff data and rainfall data in the target river basin;
preprocessing the original data set, and dividing it by time into three mutually time-exclusive parts: a training set, a verification set and a test set; wherein the preprocessing comprises outlier processing, null-value processing, data selection, normalization and data serialization;
constructing a medium- and long-term runoff prediction model based on horizontal and vertical attention; the model consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit, wherein each sample input into the convolution layer is an $(h+1) \times l$ matrix: $h+1$ is the number of rows of the matrix, comprising $h$ rows of historical reference information and a last row holding the horizontal time sequence, and $l$ is the number of columns of the matrix, namely the length of the horizontal time sequence; the convolution layer is used for performing a one-dimensional convolution operation on the historical reference information row by row and splicing the result with the horizontal time sequence; the sequence attention module is used for calculating horizontal attention based on the output of the convolution layer and extracting sequence information; the periodic attention module is used for calculating vertical attention based on the output of the convolution layer and extracting periodic information; the fusion module is used for fusing the output of the sequence attention module with the output of the periodic attention module; and the output selection operation unit is used for cutting the output of the fusion module, removing the known-sequence part and retaining only the predicted-sequence part as the runoff prediction result;
performing iterative training of the medium- and long-term runoff prediction model based on horizontal and vertical attention to convergence on the training set, the verification set and the test set, to obtain a trained medium- and long-term runoff prediction model based on horizontal and vertical attention;
inputting runoff data and rainfall data of the river basin to be predicted into the trained medium- and long-term runoff prediction model based on horizontal and vertical attention, to obtain the runoff prediction result for the river basin to be predicted output by the trained model;
The input of the medium- and long-term runoff prediction model based on horizontal and vertical attention is a rainfall data tensor $X$ and a runoff data tensor $Y$, both of shape $(b, h+1, l)$, where $b$ is the number of samples in a mini-batch; the last row of the runoff data tensor $Y$ is masked, and the data forms of the rainfall data tensor and the runoff data tensor are expressed as follows:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,l} \\ \vdots & \vdots & \ddots & \vdots \\ x_{h+1,1} & x_{h+1,2} & \cdots & x_{h+1,l} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & \cdots & y_{1,l} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ y_{h+1,1} & \cdots & y_{h+1,l_k} & 0 & \cdots \ 0 \end{bmatrix}$$

wherein $X$ denotes the rainfall data tensor, $Y$ denotes the runoff data tensor, $x_{i,j}$ denotes the rainfall data in row $i$ and column $j$ of the rainfall data tensor, $y_{i,j}$ denotes the runoff data in row $i$ and column $j$ of the runoff data tensor, $l_k$ denotes the length of the known-sequence part, $l_p$ denotes the number of prediction steps, i.e. the length of the predicted-sequence part, and the entries $y_{h+1,l_k+1}$ to $y_{h+1,l_k+l_p}$ are all padded with 0, which represents the masking process;
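For illustration, this masking might be sketched as follows in PyTorch; the function name, the $(b, h+1, l)$ tensor layout and the example sizes are assumptions made for the sketch, not the patented implementation:

```python
import torch

def build_model_inputs(rain: torch.Tensor, runoff: torch.Tensor, l_p: int):
    """Illustrative sketch: mask the predicted-sequence portion of the
    last row of the runoff tensor. rain, runoff have shape (b, h + 1, l),
    where the first h rows hold historical reference information and the
    last row is the horizontal time sequence with l = l_k + l_p."""
    runoff = runoff.clone()
    # Zero-fill the last l_p columns of the last row (the values to be
    # predicted), so the true future runoff never enters the model.
    runoff[:, -1, -l_p:] = 0.0
    return rain, runoff

# Example: batch of 4 samples, 10 historical rows + 1 horizontal row,
# 8 columns, of which the final month is to be predicted (l_k = 7, l_p = 1).
rain = torch.randn(4, 11, 8)
runoff = torch.randn(4, 11, 8)
rain, runoff_masked = build_model_inputs(rain, runoff, l_p=1)
```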
The one-dimensional convolution operation performed on the historical reference information row by row is realized through the following formula:

$$\tilde{x}_{i,:} = \mathrm{Conv1d}(x_{i,:}; K), \qquad \tilde{y}_{i,:} = \mathrm{Conv1d}(y_{i,:}; K), \qquad i = 1, \ldots, h, \quad K = \{k_1, \ldots, k_s\}$$

wherein $K$ denotes the set of convolution kernels, $s$ is the size of the set of convolution kernels, i.e. the total number of convolution kernels in the set, $k_j$ denotes the $j$-th convolution kernel in the set, $\tilde{x}_{i,:}$ denotes the result of the convolution layer performing a one-dimensional convolution operation on the rainfall data of row $i$ of the rainfall data tensor, and $\tilde{y}_{i,:}$ denotes the result of the convolution layer performing a one-dimensional convolution operation on the runoff data of row $i$ of the runoff data tensor.
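A minimal sketch of such a row-wise one-dimensional convolution, assuming a single learnable kernel and length-preserving zero padding (the patent allows a set of $s$ kernels), might look as follows:

```python
import torch
import torch.nn as nn

class RowwiseConv(nn.Module):
    """Sketch of the convolution layer under stated assumptions: a 1-D
    convolution (kernel size 3, stride 1, zero padding 1, so the length
    is preserved) is applied to each of the h historical rows, and the
    result is spliced with the untouched horizontal time-sequence row."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=1,
                              kernel_size=kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (b, h + 1, l); convolve the first h rows, keep the last row.
        b, rows, l = x.shape
        hist = x[:, :-1, :].reshape(b * (rows - 1), 1, l)
        hist = self.conv(hist).reshape(b, rows - 1, l)
        return torch.cat([hist, x[:, -1:, :]], dim=1)
```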
The embodiment of the application also provides a medium- and long-term runoff prediction system based on horizontal and vertical attention, which comprises:
a data preparation module, used for collecting runoff data and rainfall data in the target river basin and generating an original data set from the collected data;
a preprocessing module, used for preprocessing the original data set and dividing it by time into three mutually time-exclusive parts: a training set, a verification set and a test set, wherein the preprocessing comprises outlier processing, null-value processing, data selection, normalization and data serialization;
a model construction module, used for constructing a medium- and long-term runoff prediction model based on horizontal and vertical attention; the model consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit, wherein each sample input into the convolution layer is an $(h+1) \times l$ matrix: $h+1$ is the number of rows of the matrix, comprising $h$ rows of historical reference information and a last row holding the horizontal time sequence, and $l$ is the number of columns of the matrix, namely the length of the horizontal time sequence; the convolution layer is used for performing a one-dimensional convolution operation on the historical reference information row by row and splicing the result with the horizontal time sequence; the sequence attention module is used for calculating horizontal attention based on the output of the convolution layer and extracting sequence information; the periodic attention module is used for calculating vertical attention based on the output of the convolution layer and extracting periodic information; the fusion module is used for fusing the output of the sequence attention module with the output of the periodic attention module; and the output selection operation unit is used for cutting the output of the fusion module, removing the known-sequence part and retaining only the predicted-sequence part as the runoff prediction result;
a model training module, used for performing iterative training of the medium- and long-term runoff prediction model based on horizontal and vertical attention to convergence on the training set, the verification set and the test set, to obtain a trained medium- and long-term runoff prediction model based on horizontal and vertical attention;
a model use module, used for inputting the runoff data of the river basin to be predicted and the temporally corresponding rainfall data into the trained medium- and long-term runoff prediction model based on horizontal and vertical attention, to obtain the runoff prediction result for the river basin to be predicted output by the trained model;
The input of the medium- and long-term runoff prediction model based on horizontal and vertical attention is a rainfall data tensor $X$ and a runoff data tensor $Y$, both of shape $(b, h+1, l)$, where $b$ is the number of samples in a mini-batch; the last row of the runoff data tensor $Y$ is masked, and the data forms of the rainfall data tensor and the runoff data tensor are expressed as follows:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,l} \\ \vdots & \vdots & \ddots & \vdots \\ x_{h+1,1} & x_{h+1,2} & \cdots & x_{h+1,l} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & \cdots & y_{1,l} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ y_{h+1,1} & \cdots & y_{h+1,l_k} & 0 & \cdots \ 0 \end{bmatrix}$$

wherein $X$ denotes the rainfall data tensor, $Y$ denotes the runoff data tensor, $x_{i,j}$ denotes the rainfall data in row $i$ and column $j$ of the rainfall data tensor, $y_{i,j}$ denotes the runoff data in row $i$ and column $j$ of the runoff data tensor, $l_k$ denotes the length of the known-sequence part, $l_p$ denotes the number of prediction steps, i.e. the length of the predicted-sequence part, and the entries $y_{h+1,l_k+1}$ to $y_{h+1,l_k+l_p}$ are all padded with 0, which represents the masking process;
The one-dimensional convolution operation performed on the historical reference information row by row is realized through the following formula:

$$\tilde{x}_{i,:} = \mathrm{Conv1d}(x_{i,:}; K), \qquad \tilde{y}_{i,:} = \mathrm{Conv1d}(y_{i,:}; K), \qquad i = 1, \ldots, h, \quad K = \{k_1, \ldots, k_s\}$$

wherein $K$ denotes the set of convolution kernels, $s$ is the size of the set of convolution kernels, i.e. the total number of convolution kernels in the set, $k_j$ denotes the $j$-th convolution kernel in the set, $\tilde{x}_{i,:}$ denotes the result of the convolution layer performing a one-dimensional convolution operation on the rainfall data of row $i$ of the rainfall data tensor, and $\tilde{y}_{i,:}$ denotes the result of the convolution layer performing a one-dimensional convolution operation on the runoff data of row $i$ of the runoff data tensor.
The embodiment of the application also provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the medium- and long-term runoff prediction method based on horizontal and vertical attention described above.
The embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, realizes the medium- and long-term runoff prediction method based on horizontal and vertical attention described above.
The embodiment of the application provides a medium- and long-term runoff prediction method based on horizontal and vertical attention. Aimed at the problem that runoff prediction methods based on the LSTM model or the Transformer model have difficulty learning the periodic information of runoff varying with the seasons, so that the effect of medium- and long-term runoff prediction is poor, a medium- and long-term runoff prediction model based on horizontal and vertical attention is constructed, trained and used for runoff prediction. The horizontal direction refers to calculating attention over the input matrix in the horizontal direction (namely, over each row), mining the information of the input data along the time sequence; the vertical direction refers to calculating attention over the input matrix in the vertical direction (namely, over each column), mining the periodic information of the input data in the same and nearby months of different years. The model convolves the input historical reference information, so that the reference information of the prediction sequence is not limited to the same month but is expanded to nearby months, making the periodicity smoother; when attention is calculated, sequence attention and periodic attention are computed together, i.e. sequence information and periodic information are fused to give the runoff prediction result, greatly improving the accuracy of medium- and long-term runoff prediction. Considering that runoff forecasting serves people's production and daily life and provides a large amount of key data for flood control and disaster reduction and even national defense, a certain degree of confidentiality is needed; on this basis, the application also masks the runoff data tensor $Y$, thereby preventing information leakage.
In some alternative embodiments, the input of the sequence attention module is $X_c$ and $Y_c$, the outputs of the convolution layer, of shape $(b, l, h+1)$, and the input of the periodic attention module is $X_c^\top$ and $Y_c^\top$, the transposes of $X_c$ and $Y_c$, of shape $(b, h+1, l)$; the sequence attention module and the periodic attention module have the same composition, each consisting of an input transformation layer, a position embedding operation unit, an encoder, a decoder and an output transformation operation unit;
the input transformation layer is used for mapping the low-dimensional features of its input to a high-dimensional space; the input transformation layer of the sequence attention module performs the dimension-raising operation on the feature dimension, namely raising $h+1$ to $d$, and the input transformation layer of the periodic attention module performs the dimension-raising operation on the sequence-length dimension, namely raising $l$ to $d$;
the position embedding operation unit is used for encoding the length of its input, raising the dimension of the encoding to $d$ to obtain position information, and embedding the position information into its input;
the encoder is formed by stacking a plurality of encoder layers, each encoder layer comprises a multi-head self-attention sub-layer and a full-connection sub-layer, and residual connection and layer normalization are arranged behind each sub-layer;
The decoder is formed by stacking a plurality of decoder layers, each decoder layer comprises a multi-head self-attention sub-layer, a multi-head cross-attention sub-layer and a full-connection sub-layer, and residual connection and layer normalization are arranged behind each sub-layer;
the output transformation operation unit is used for dimension reduction; the output transformation operation unit of the sequence attention module reduces $d$ to 1 to obtain an output sequence $O_s$ of shape $(b, l, 1)$, while the output transformation operation unit of the periodic attention module first reduces $d$ to $l$ to obtain an intermediate tensor of shape $(b, h+1, l)$, then extracts the last row of the intermediate tensor and exchanges the dimension order to obtain an output sequence $O_p$ of shape $(b, l, 1)$.
In some alternative embodiments, the input transformation layer of the sequence attention module performs the dimension-raising operation on the feature dimension and the input transformation layer of the periodic attention module performs the dimension-raising operation on the sequence-length dimension through the following formulas:

$$X_s = X_c W_1 + b_1, \quad Y_s = Y_c W_1 + b_1, \qquad X_p = X_c^\top W_2 + b_2, \quad Y_p = Y_c^\top W_2 + b_2$$

wherein $W_1$, $b_1$, $W_2$ and $b_2$ all denote learnable parameters, $X_s$ and $Y_s$ both denote outputs of the input transformation layer of the sequence attention module, and $X_p$ and $Y_p$ both denote outputs of the input transformation layer of the periodic attention module;
the position embedding operation unit encodes the length of its input, raises the dimension of the encoding to $d$ to obtain position information, and embeds the position information into its input through the following formulas:

$$\tilde{X}_s = X_s + P_s, \quad \tilde{Y}_s = Y_s + P_s, \qquad \tilde{X}_p = X_p + P_p, \quad \tilde{Y}_p = Y_p + P_p$$

wherein $P_s$ and $P_p$ denote the position information, $\tilde{X}_s$ and $\tilde{Y}_s$ denote the outputs of the position embedding operation unit of the sequence attention module, and $\tilde{X}_p$ and $\tilde{Y}_p$ denote the outputs of the position embedding operation unit of the periodic attention module.
In some alternative embodiments, the attention mechanisms of the multi-head self-attention sub-layer and the multi-head cross-attention sub-layer are expressed by the following formulas:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V), \qquad \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\,W^O$$

wherein $Q$ denotes the query vectors, $K$ denotes the key vectors, $V$ denotes the values corresponding to the key vectors, $d_k$ denotes the dimension of $K$ in the hidden space, $K^\top$ denotes the transpose of $K$, $\mathrm{softmax}$ denotes the softmax operation, $n$ denotes the number of heads, $\mathrm{head}_i$ denotes the output of the $i$-th head, $\mathrm{Concat}$ denotes the splicing operation, and $W_i^Q$, $W_i^K$, $W_i^V$ and $W^O$ all denote learnable parameters; when $Q$, $K$ and $V$ are taken from the same input, $\mathrm{Attention}$ represents self-attention and $\mathrm{MultiHead}$ represents multi-head self-attention, and when they are taken from different inputs, $\mathrm{Attention}$ represents cross-attention and $\mathrm{MultiHead}$ represents multi-head cross-attention.
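As an illustration of the formulas above, a minimal hand-rolled sketch (assuming an equal model dimension $d$ for queries, keys and values) could be:

```python
import math
import torch

def attention(Q, K, V):
    """Scaled dot-product attention as in the formula above."""
    d_k = Q.size(-1)
    scores = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
    return scores @ V

def multi_head(Q, K, V, W_q, W_k, W_v, W_o, n_heads):
    """Multi-head variant: project, split into heads, attend, concat,
    project. Self-attention when Q, K, V are the same tensor;
    cross-attention otherwise."""
    b, L, d = Q.shape
    def split(x, W):  # (b, L, d) -> (b, n_heads, L, d // n_heads)
        return (x @ W).view(b, -1, n_heads, d // n_heads).transpose(1, 2)
    heads = attention(split(Q, W_q), split(K, W_k), split(V, W_v))
    return heads.transpose(1, 2).reshape(b, L, d) @ W_o
```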
In some optional embodiments, performing iterative training of the medium- and long-term runoff prediction model based on horizontal and vertical attention to convergence on the training set, the verification set and the test set, to obtain a trained model, comprises: inputting the training set into the model, performing multiple iterations of training, calculating a smoothed Nash efficiency coefficient loss with the forward propagation algorithm, and updating the model parameters with the back-propagation algorithm based on the smoothed Nash efficiency coefficient loss; in each iteration, using the Nash efficiency coefficient to measure the performance on the verification set of the model trained for the current number of iterations, and, after the preset number of iterations is reached, saving the model that performs best on the verification set as the trained medium- and long-term runoff prediction model based on horizontal and vertical attention; and inputting the test set into the trained model to obtain the runoff prediction result corresponding to each item of test data in the test set, and evaluating those runoff prediction results against the real runoff data. The Nash efficiency coefficient is well suited to verifying the quality of the simulation results of a hydrological model; using the smoothed Nash efficiency coefficient loss as the loss function during iterative training and the Nash efficiency coefficient to measure performance on the verification set further improves the training effect of the model, and hence the accuracy of runoff prediction.
In some alternative embodiments, calculating the smoothed Nash efficiency coefficient loss is accomplished by the following formula:

$$\mathcal{L} = \frac{1}{B}\sum_{j=1}^{B}\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{1}{l_p}\sum_{t=1}^{l_p}\frac{\left(\hat{q}^{\,j}_{i,t} - q^{\,j}_{i,t}\right)^2}{\left(\sigma_j + \epsilon\right)^2}$$

wherein $B$ denotes the total number of batches, $n_j$ denotes the total number of samples in the $j$-th mini-batch, $\hat{q}^{\,j}_{i,t}$ denotes the runoff prediction result of step $t$ of the $i$-th sample in the $j$-th mini-batch, $l_p$ denotes the number of prediction steps, $q^{\,j}_{i,t}$ denotes the real runoff data of step $t$ of the $i$-th sample in the $j$-th mini-batch, $\sigma_j$ denotes the standard deviation of the real runoff data of all samples in the $j$-th mini-batch, $\epsilon$ denotes a preset small positive constant used to prevent the denominator from being 0, and $\mathcal{L}$ denotes the calculated smoothed Nash efficiency coefficient loss value;
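A per-batch sketch of this loss, under the stated tensor-layout assumptions, might be:

```python
import torch

def smooth_nse_loss(pred, target, sigma, eps: float = 0.1):
    """Illustrative sketch of the smoothed Nash efficiency coefficient
    loss for one mini-batch. pred, target: (n, l_p) predicted and true
    runoff for n samples; sigma: standard deviation of the batch's true
    runoff; eps: small constant keeping the denominator away from 0."""
    return torch.mean((pred - target) ** 2 / (sigma + eps) ** 2)
```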
the calculation formula of the Nash efficiency coefficient is as follows:

$$\mathrm{NSE} = 1 - \frac{\sum_{i}\sum_{t}\left(\hat{q}_{i,t} - q_{i,t}\right)^2}{\sum_{i}\sum_{t}\left(q_{i,t} - \bar{q}\right)^2}$$

wherein $\bar{q}$ denotes the mean of the real runoff data of all samples in the verification set and $\mathrm{NSE}$ denotes the calculated Nash efficiency coefficient.
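For reference, the Nash efficiency coefficient over a verification set can be computed as in the following sketch:

```python
import torch

def nash_sutcliffe(pred: torch.Tensor, obs: torch.Tensor) -> float:
    """Nash efficiency coefficient over the whole verification set:
    1 minus the sum of squared errors divided by the total variance of
    the observations; 1 indicates a perfect fit."""
    return (1 - torch.sum((pred - obs) ** 2)
              / torch.sum((obs - obs.mean()) ** 2)).item()
```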
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.
FIG. 1 is a flowchart of a medium- and long-term runoff prediction method based on horizontal and vertical attention provided in one embodiment of the present application;
FIG. 2 is a schematic diagram of the one-dimensional convolution operation provided in one embodiment of the present application;
FIG. 3 is a schematic diagram of the working principle of the convolution layer provided in one embodiment of the present application;
FIG. 4 is a schematic diagram of the specific structure and data flow of the medium- and long-term runoff prediction model based on horizontal and vertical attention provided in one embodiment of the present application;
FIG. 5 is a flowchart of performing iterative training of the medium- and long-term runoff prediction model based on horizontal and vertical attention to convergence on the training set, the verification set and the test set to obtain a trained model, provided in one embodiment of the present application;
FIG. 6 is a schematic diagram of a medium- and long-term runoff prediction system based on horizontal and vertical attention according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. However, it will be understood by those of ordinary skill in the art that numerous specific details are set forth in the various embodiments in order to provide a thorough understanding of the present application; the claimed application may nevertheless be practiced without these specific details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and should not be construed as limiting the specific implementation of the present application, and the embodiments can be combined with and referred to each other where they do not contradict.
An embodiment of the present application provides a medium- and long-term runoff prediction method based on horizontal and vertical attention, applied to an electronic device, where the electronic device may be a terminal or a server; this embodiment and the following embodiments take a server as the example. Implementation details of the method are described specifically below; they are provided only to facilitate understanding and are not necessary for implementing the present embodiment.
The specific flow of the medium- and long-term runoff prediction method based on horizontal and vertical attention of this embodiment may be as shown in fig. 1, and specifically includes:
Step 101, generating an original data set from the collected runoff data and rainfall data in the target river basin.
In a specific implementation, the server first performs data preparation: it collects the runoff data of each station in the target river basin and the temporally corresponding rainfall data, and generates an original data set from the collected data, where the time resolution can be set to week, ten-day period, month, etc. as required.
Step 102, preprocessing the original data set, and dividing it by time into three mutually time-exclusive parts: a training set, a verification set and a test set.
In a specific implementation, after generating the original data set, the server can perform preprocessing comprising outlier processing, null-value processing, data selection, normalization and data serialization, and divide the preprocessed data set by time into a training set, a verification set and a test set. Data selection removes stations with too much missing data; time exclusivity means that the data in the training set, the verification set and the test set belong to different time periods.
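As an illustration, a time-exclusive split might be sketched as follows, assuming the data are indexed by timestamp (the function and variable names are hypothetical):

```python
import pandas as pd

def time_exclusive_split(df: pd.DataFrame, train_end: str, val_end: str):
    """Illustrative sketch of the time-exclusive split: because rows are
    assigned by timestamp, the three parts cover disjoint periods."""
    t1, t2 = pd.Timestamp(train_end), pd.Timestamp(val_end)
    train = df[df.index <= t1]
    val = df[(df.index > t1) & (df.index <= t2)]
    test = df[df.index > t2]
    return train, val, test

# e.g. train up to the end of 2015, validate through 2019, test on the rest:
# train, val, test = time_exclusive_split(monthly_df, "2015-12-31", "2019-12-31")
```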
Step 103, constructing a medium- and long-term runoff prediction model based on horizontal and vertical attention, which consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit.
In a specific implementation, after dividing out the training set, the verification set and the test set, the server can construct the medium- and long-term runoff prediction model based on horizontal and vertical attention, which consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit.
In a specific implementation, each sample input into the convolution layer is an $(h+1) \times l$ matrix, where $h+1$ is the number of rows, comprising $h$ rows of historical reference information and a last row holding the horizontal time sequence, and $l$ is the number of columns and also the length of the horizontal time sequence. Taking January 2024 as the start time, suppose runoff prediction refers to the feature information (comprising runoff data and rainfall data) of the 10 years before 2024, and the runoff of August is predicted using the data of January to July; the time-sequence length is then 8, comprising a known-sequence part of length 7 and a predicted-sequence part of length 1 (i.e. the number of prediction steps is 1), and each sample input into the convolution layer is an $11 \times 8$ matrix, which may be expressed as:

$$S = \begin{bmatrix} c_{2014,1} & c_{2014,2} & \cdots & c_{2014,8} \\ \vdots & \vdots & \ddots & \vdots \\ c_{2023,1} & c_{2023,2} & \cdots & c_{2023,8} \\ c_{2024,1} & c_{2024,2} & \cdots & c_{2024,8} \end{bmatrix}$$

where $c_{y,m}$ denotes the feature information of month $m$ of year $y$; for example, $c_{2024,1}$ denotes the feature information of January 2024, and the runoff component of $c_{2024,8}$ in the last row is the value to be predicted.
In one example, the input of the medium- and long-term runoff prediction model based on horizontal and vertical attention is a rainfall data tensor $X$ and a runoff data tensor $Y$, both of shape $(b, h+1, l)$, where $b$ denotes the number of samples in a mini-batch; the last row of the runoff data tensor $Y$ is masked, and the data forms of $X$ and $Y$ may be expressed as:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,l} \\ \vdots & \vdots & \ddots & \vdots \\ x_{h+1,1} & x_{h+1,2} & \cdots & x_{h+1,l} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & \cdots & y_{1,l} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ y_{h+1,1} & \cdots & y_{h+1,l_k} & 0 & \cdots \ 0 \end{bmatrix}$$

where $x_{i,j}$ denotes the rainfall data in row $i$ and column $j$ of the rainfall data tensor, $y_{i,j}$ denotes the runoff data in row $i$ and column $j$ of the runoff data tensor, $l_k$ denotes the length of the known-sequence part, $l_p$ denotes the number of prediction steps, i.e. the length of the predicted-sequence part, and the entries $y_{h+1,l_k+1}$ to $y_{h+1,l_k+l_p}$ are all padded with 0, i.e. masked.
It can be understood that runoff prediction serves people's production and daily life and provides a large amount of key data for flood control and disaster reduction and even national defense, so a certain degree of confidentiality is needed; on this basis, this embodiment masks the last row of the runoff data tensor $Y$, thereby preventing information leakage.
In a specific implementation, the convolution layer of the medium- and long-term runoff prediction model based on horizontal and vertical attention performs a one-dimensional convolution operation on the historical reference information row by row and splices the result with the horizontal time sequence, so that the reference information of the predicted-sequence part is not limited to the same month but extends to nearby months; the specific extension range is related to the size of the convolution kernel.
In one example, a one-dimensional convolution operation slides the convolution kernel over the input sequence of the convolution layer, multiplies the weights of the kernel by the corresponding portion of the input sequence and adds them up, finally yielding each element of the output sequence; the sliding of the kernel over the input sequence is controlled by the stride. The convolution layer performs the one-dimensional convolution operation on the historical reference information row by row, which can be realized by the following formula:

$$\tilde{x}_{i,:} = \mathrm{Conv1d}(x_{i,:}; K), \qquad \tilde{y}_{i,:} = \mathrm{Conv1d}(y_{i,:}; K), \qquad i = 1, \ldots, h, \quad K = \{k_1, \ldots, k_s\}$$

where $K$ denotes the set of convolution kernels, $s$ is the size of the set, i.e. the total number of convolution kernels in it, $k_j$ denotes the $j$-th convolution kernel, $\tilde{x}_{i,:}$ denotes the result of performing the one-dimensional convolution on the rainfall data of row $i$ of the rainfall data tensor, and $\tilde{y}_{i,:}$ denotes the result of performing the one-dimensional convolution on the runoff data of row $i$ of the runoff data tensor. The outputs of the convolution layer, $X_c$ and $Y_c$, are input to the sequence attention module, and their transposes $X_c^\top$ and $Y_c^\top$ are input to the periodic attention module.
It is noted that the one-dimensional convolution operation reduces the length of the input sequence of the convolution layer; to make the length of the output sequence equal to the length of the input sequence, zero values are added at both ends of the input sequence. Fig. 2 shows the principle of the one-dimensional convolution operation: the convolution kernel size shown in fig. 2 is 3, the stride is 1, and 1 zero value is added to each end of the input sequence.
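The length-preserving effect of this zero padding can be checked with a short PyTorch experiment (kernel size 3, stride 1, padding 1, as in fig. 2):

```python
import torch
import torch.nn as nn

# A kernel of size 3 with stride 1 shortens a length-l sequence to l - 2,
# so one zero is padded at each end to keep the output length equal to l.
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3,
                 stride=1, padding=1)
row = torch.randn(1, 1, 8)      # one historical row of length 8
print(conv(row).shape)          # torch.Size([1, 1, 8]) -- length preserved
```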
In one example, FIG. 3 illustrates the working principle of the convolution layer. As shown in FIGS. 2 and 3, the convolution layer performs the one-dimensional convolution operation row by row on the historical reference information (i.e. the first $h$ rows), then splices the result with the horizontal time sequence (i.e. the $(h+1)$-th row). In this way, when the periodic attention is calculated later, the information of the months near the corresponding past month can also be referred to, which enhances smooth periodicity. Taking an interior element in column $j$ as an example: without the convolution layer, the prediction references only column $j$ of each historical row, whereas with a convolution layer whose kernel size is 3, it can also reference the information in columns $j-1$ and $j+1$; likewise, an element at the edge of the matrix, which without the convolution layer could reference only its own column, can after convolution with a kernel of size 3 also reference its neighboring column.
In a specific implementation, the sequence attention module is responsible for calculating the horizontal attention based on the output of the convolution layer and extracting the sequence information, and the periodic attention module is responsible for calculating the vertical attention based on the output of the convolution layer and extracting the periodic information.
In one example, the input of the sequence attention module is $X_c$ and $Y_c$, of shape $(b, l, h+1)$, and the input of the periodic attention module is $X_c^\top$ and $Y_c^\top$, the transposes of $X_c$ and $Y_c$, of shape $(b, h+1, l)$.
In one example, the sequence attention module and the periodic attention module both essentially calculate attention, so they are identical in composition, each consisting of an input transformation layer, a position embedding operation unit, an encoder, a decoder and an output transformation operation unit. Because the shapes of their input data differ, the information they extract and the transformation operations on their outputs differ.
In one example, the input transformation layer is a fully connected layer that maps the low-dimensional features of its input to a high-dimensional space. The input transformation layer of the sequence attention module performs the dimension-raising operation on the feature dimension, namely raising $h+1$ to $d$, so the shape of its output is $(b, l, d)$; the input transformation layer of the periodic attention module performs the dimension-raising operation on the sequence-length dimension, namely raising $l$ to $d$, so the shape of its output is $(b, h+1, d)$.
In one example, the dimension-raising operation of the input transformation layer of the sequence attention module on the feature dimension and that of the input transformation layer of the periodic attention module on the sequence-length dimension can be achieved by the following formulas:

$$X_s = X_c W_1 + b_1, \quad Y_s = Y_c W_1 + b_1, \qquad X_p = X_c^\top W_2 + b_2, \quad Y_p = Y_c^\top W_2 + b_2$$

where $W_1$, $b_1$, $W_2$ and $b_2$ all denote learnable parameters, $X_s$ and $Y_s$ denote the outputs of the input transformation layer of the sequence attention module, and $X_p$ and $Y_p$ denote the outputs of the input transformation layer of the periodic attention module; when the model parameters are updated, $W_1$, $b_1$, $W_2$ and $b_2$ participate in the update.
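In PyTorch terms, each input transformation layer could be sketched as a fully connected layer acting on the last tensor dimension; the dimension names follow the notation above and are assumptions, not the patent's original symbols:

```python
import torch.nn as nn

h, l, d = 10, 8, 64  # illustrative sizes
seq_up = nn.Linear(h + 1, d)  # sequence module: (b, l, h + 1) -> (b, l, d)
per_up = nn.Linear(l, d)      # periodic module: (b, h + 1, l) -> (b, h + 1, d)
```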
In one example, the position embedding operation unit encodes the length of its input and raises the dimension of the encoding to $d$ to obtain the position information, which it embeds into its input, thereby providing the model with information about the relative positions of the elements in the sequence. It can be appreciated that for the sequence attention module, the position embedding operation unit simply encodes the input length $l$, e.g. into a one-dimensional feature per position, and then raises the feature to dimension $d$, e.g. by filling with 0, to obtain the position information; for the periodic attention module, the unit likewise encodes the input length $h+1$ and raises the encoding to dimension $d$, e.g. by filling with 0, to obtain the position information.
In one example, the position embedding operation unit encodes the length of its input, raises the dimension of the encoding to $d$ to obtain the position information, and embeds the position information into its input, which can be realized by the following formulas:

$$\tilde{X}_s = X_s + P_s, \quad \tilde{Y}_s = Y_s + P_s, \qquad \tilde{X}_p = X_p + P_p, \quad \tilde{Y}_p = Y_p + P_p$$

where $P_s$ and $P_p$ denote the position information, $\tilde{X}_s$ and $\tilde{Y}_s$ denote the outputs of the position embedding operation unit of the sequence attention module, and $\tilde{X}_p$ and $\tilde{Y}_p$ denote the outputs of the position embedding operation unit of the periodic attention module.
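A minimal sketch of the described zero-filled position encoding (a sinusoidal encoding could equally be substituted; the description only requires position information of dimension $d$ added to the module input) might be:

```python
import torch

def position_embedding(length: int, d: int) -> torch.Tensor:
    """Encode each position along the sequence, then raise the encoding
    to d features by zero-filling, as described above."""
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)  # (length, 1)
    pad = torch.zeros(length, d - 1)
    return torch.cat([pos, pad], dim=1)                           # (length, d)

# embedded = x + position_embedding(x.size(1), x.size(2))  # broadcast over batch
```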
In one example, the encoder is stacked from a plurality of encoder layers, each comprising a multi-head self-attention sub-layer and a fully connected sub-layer, each sub-layer being followed by residual connection and layer normalization; the decoder is likewise stacked from a plurality of decoder layers, each comprising a multi-head self-attention sub-layer, a multi-head cross-attention sub-layer and a fully connected sub-layer, each sub-layer again being followed by residual connection and layer normalization. Residual connection alleviates the problems of gradient vanishing and weight-matrix degradation, and layer normalization normalizes the final output dimension. After residual connection and layer normalization, the sequence attention module obtains a sequence output of shape $(b, l, d)$ and the periodic attention module obtains a periodic output of shape $(b, h+1, d)$.
In one example, multi-head self-attention and multi-head cross-attention are variants of the attention mechanism that calculate attention scores for the different positions; positions with high scores indicate high importance and are assigned greater weight. The attention mechanisms of the multi-head self-attention sub-layer and the multi-head cross-attention sub-layer can be expressed by the following formulas:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V), \qquad \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\,W^O$$

where $Q$ denotes the query vectors, $K$ the key vectors and $V$ the values corresponding to the key vectors, $d_k$ denotes the dimension of $K$ in the hidden space, $K^\top$ denotes the transpose of $K$, $\mathrm{softmax}$ denotes the softmax operation, $n$ denotes the number of heads, $\mathrm{head}_i$ denotes the output of the $i$-th head, $\mathrm{Concat}$ denotes the splicing operation, and $W_i^Q$, $W_i^K$, $W_i^V$ and $W^O$ all denote learnable parameters that participate when the model parameters are updated. When $Q$, $K$ and $V$ are taken from the same input, $\mathrm{Attention}$ represents self-attention and $\mathrm{MultiHead}$ represents multi-head self-attention; when they are taken from different inputs, they represent cross-attention and multi-head cross-attention respectively.

For the multi-head self-attention sub-layer of an encoder layer, $Q$, $K$ and $V$ are all taken from the input of the encoder layer; for the multi-head self-attention sub-layer of a decoder layer, they are all taken from the input of the decoder layer, and the output of this sub-layer is denoted $Z$; for the multi-head cross-attention sub-layer of a decoder layer, $Q$ is taken from $Z$ while $K$ and $V$ are taken from the encoder output.
In one example, the output transformation operation unit performs dimension reduction. Because the sequence attention module raised the feature dimension, its output transformation is simple: it only needs to reduce $d$ to 1, obtaining an output sequence $O_s$ of shape $(b, l, 1)$ ($O_s$ is also called the sequence information; since only the single feature of the runoff value needs to be predicted, only one output feature is required). The periodic attention module raised the sequence-length dimension, so its output transformation is relatively complex: it first reduces $d$ to $l$, i.e. recovers the original sequence length, obtaining an intermediate tensor of shape $(b, h+1, l)$, then extracts the last row of the intermediate tensor and exchanges the dimension order to stay consistent with the dimension order of the sequence information, obtaining an output sequence $O_p$ of shape $(b, l, 1)$ ($O_p$ is also called the periodic information). The fusion module fuses $O_s$ and $O_p$ by addition and feeds the result to the output selection operation unit.
In a specific implementation, the output selection operation unit cuts the output of the fusion module, i.e. removes the known-sequence part and keeps only the predicted-sequence part as the runoff prediction result $\hat{Q}$, whose shape is $(b, l_p, 1)$.
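The output transformations, fusion by addition and output selection described above might be sketched together as follows, under the shape conventions used in this description:

```python
import torch
import torch.nn as nn

def fuse_and_select(enc_seq, enc_per, l_p, seq_down, per_down):
    """Illustrative sketch of the output-side operations.

    enc_seq: (b, l, d) sequence-attention output; enc_per: (b, h + 1, d)
    periodic-attention output; seq_down: nn.Linear(d, 1);
    per_down: nn.Linear(d, l)."""
    o_s = seq_down(enc_seq)                          # (b, l, 1) sequence info
    o_p = per_down(enc_per)[:, -1, :].unsqueeze(-1)  # last row -> (b, l, 1)
    fused = o_s + o_p                                # fusion by addition
    return fused[:, -l_p:, :]                        # keep only predicted part
```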
In one example, the specific structure and data flow of the medium- and long-term runoff prediction model based on horizontal and vertical attention may be as shown in fig. 4.
Step 104, performing iterative training of the medium- and long-term runoff prediction model based on horizontal and vertical attention to convergence on the training set, the verification set and the test set, to obtain the trained model.
In a specific implementation, after the server builds the initial medium- and long-term runoff prediction model based on horizontal and vertical attention, it can iteratively train the model to convergence on the training set, the verification set and the test set, thereby obtaining the trained model.
In one example, the server performs this iterative training through the steps shown in fig. 5, which specifically include:
Step 1041, inputting the training set into the medium- and long-term runoff prediction model based on horizontal and vertical attention, performing multiple iterations of training, calculating the smoothed Nash efficiency coefficient loss with the forward propagation algorithm, and updating the model parameters with the back-propagation algorithm based on the smoothed Nash efficiency coefficient loss.
In a specific implementation, after the initial model is constructed, the training set can be input into it; the model is trained over multiple iterations, the smoothed Nash efficiency coefficient loss is calculated with the forward propagation algorithm, and the model parameters are updated with the back-propagation algorithm based on that loss.
In one example, the server calculates the smoothed Nash efficiency coefficient loss by the following formula:

$$\mathcal{L} = \frac{1}{B}\sum_{j=1}^{B}\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{1}{l_p}\sum_{t=1}^{l_p}\frac{\left(\hat{q}^{\,j}_{i,t} - q^{\,j}_{i,t}\right)^2}{\left(\sigma_j + \epsilon\right)^2}$$

where $B$ denotes the total number of batches, $n_j$ the total number of samples in the $j$-th mini-batch, $\hat{q}^{\,j}_{i,t}$ the runoff prediction result of step $t$ of the $i$-th sample in the $j$-th mini-batch, $l_p$ the number of prediction steps, $q^{\,j}_{i,t}$ the corresponding real runoff data, $\sigma_j$ the standard deviation of the real runoff data of all samples in the $j$-th mini-batch, $\epsilon$ a preset small positive constant preventing the denominator from being 0, and $\mathcal{L}$ the calculated smoothed Nash efficiency coefficient loss value.
Step 1042, in each iteration, measuring with the Nash efficiency coefficient the performance on the verification set of the model trained for the current number of iterations, and, after the preset number of iterations is reached, saving the model that performs best on the verification set as the trained medium- and long-term runoff prediction model based on horizontal and vertical attention.
In a specific implementation, in each iteration the server uses the Nash efficiency coefficient to measure the performance on the verification set of the model trained for the current number of iterations; training stops after the preset number of iterations is reached, and the model performing best on the verification set, i.e. the model with the largest Nash efficiency coefficient on the verification set, is saved as the trained medium- and long-term runoff prediction model based on horizontal and vertical attention. The Nash efficiency coefficient is well suited to verifying the quality of the simulation results of a hydrological model; using the smoothed Nash efficiency coefficient loss as the loss function during iterative training and the Nash efficiency coefficient to measure performance on the verification set further improves the training effect of the model and hence the accuracy of runoff prediction.
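Putting the pieces together, a sketch of the training loop with verification-based model selection (reusing the smooth_nse_loss and nash_sutcliffe sketches above; the data-loader format is an assumption) might be:

```python
import torch

def train(model, train_loader, val_loader, epochs, lr=1e-3, eps=0.1):
    """Illustrative training procedure: the smoothed-NSE loss drives the
    parameter updates, and the checkpoint with the best verification-set
    Nash efficiency coefficient is kept."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_nse, best_state = float("-inf"), None
    for epoch in range(epochs):
        model.train()
        for rain, runoff_masked, target in train_loader:
            opt.zero_grad()
            pred = model(rain, runoff_masked).squeeze(-1)  # (n, l_p)
            loss = smooth_nse_loss(pred, target, target.std(), eps)
            loss.backward()                                # back-propagation
            opt.step()
        model.eval()
        with torch.no_grad():
            preds, obs = [], []
            for rain, runoff_masked, target in val_loader:
                preds.append(model(rain, runoff_masked).squeeze(-1))
                obs.append(target)
            nse = nash_sutcliffe(torch.cat(preds), torch.cat(obs))
        if nse > best_nse:  # keep the model that performs best on validation
            best_nse = nse
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model
```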
In one example, the Nash efficiency coefficient is calculated as follows:

$$\mathrm{NSE} = 1 - \frac{\sum_{i}\sum_{t}\left(\hat{q}_{i,t} - q_{i,t}\right)^2}{\sum_{i}\sum_{t}\left(q_{i,t} - \bar{q}\right)^2}$$

where $\bar{q}$ denotes the mean of the real runoff data of all samples in the verification set and $\mathrm{NSE}$ denotes the calculated Nash efficiency coefficient.
Step 1043, inputting the test set into the trained medium- and long-term runoff prediction model based on horizontal and vertical attention to obtain the runoff prediction result corresponding to each item of test data in the test set, and evaluating those results against the real runoff data.
In a specific implementation, after obtaining the trained model, the server can also input the data in the test set into it to obtain the runoff prediction result corresponding to each item of test data, and evaluate those results against the real runoff data to obtain evaluation indicators for further improving the model.
And 105, inputting the runoff data and rainfall data in the river basin to be predicted into a middle-long-term runoff prediction model based on the transverse attention and the longitudinal attention after training, and obtaining a runoff prediction result corresponding to the river basin to be predicted, which is output by the middle-long-term runoff prediction model based on the transverse attention and the longitudinal attention after training.
In a specific implementation, after obtaining the trained middle-long-term runoff prediction model based on the horizontal attention, the server can input the runoff data and rainfall data in the to-be-predicted river basin into the trained middle-long-term runoff prediction model based on the horizontal attention, so as to obtain a runoff prediction result corresponding to the to-be-predicted river basin, which is output by the trained middle-long-term runoff prediction model based on the horizontal attention.
In this embodiment, aimed at the problem that runoff prediction methods based on the LSTM model or the Transformer model have difficulty learning the periodic information of runoff varying with the seasons, so that the effect of medium- and long-term runoff prediction is poor, a medium- and long-term runoff prediction model based on horizontal and vertical attention is constructed, trained and used for runoff prediction. The horizontal direction refers to calculating attention over the input matrix in the horizontal direction (namely, over each row), mining the information of the input data along the time sequence; the vertical direction refers to calculating attention over the input matrix in the vertical direction (namely, over each column), mining the periodic information of the input data in the same and nearby months of different years. The model convolves the input historical reference information, so that the reference information of the prediction sequence is not limited to the same month but is expanded to nearby months, making the periodicity smoother; when attention is calculated, sequence attention and periodic attention are computed together, i.e. sequence information and periodic information are fused to give the runoff prediction result, greatly improving the accuracy of medium- and long-term runoff prediction.
The above division of the method into steps is made for clarity of description; when implemented, steps may be combined into one step, or a step may be split into several, and such variants remain within the protection scope of this patent as long as they embody the same logical relationship. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without altering the core design of the algorithm and flow, likewise falls within the protection scope of this patent.
Another embodiment of the present application provides a medium-and-long-term runoff prediction system based on horizontal and vertical attention. The implementation details described below are provided to aid understanding and are not necessary for implementing this embodiment. A schematic diagram of the system according to this embodiment is shown in fig. 6; it includes:
The data preparation module 201 is configured to collect runoff data and rainfall data in the target river basin and generate an original data set from the collected data.
The preprocessing module 202 is configured to preprocess the original data set and divide it by time into three mutually exclusive parts: a training set, a verification set and a test set. The preprocessing includes outlier processing, null-value processing, data selection, normalization and data serialization.
The model construction module 203 is used to construct the medium-and-long-term runoff prediction model based on horizontal and vertical attention. The model consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit. Each sample input to the convolution layer is an $m \times n$ matrix, where $m$ is the number of rows, comprising $m-1$ leading rows of historical reference information and a last row holding the horizontal time sequence, and $n$ is the number of columns, i.e. the length of the horizontal time sequence. The convolution layer performs a one-dimensional convolution on the historical reference information row by row and splices the result with the horizontal time sequence; the sequence attention module computes horizontal attention on the output of the convolution layer to extract sequence information; the periodic attention module computes vertical attention on the output of the convolution layer to extract periodic information; the fusion module fuses the outputs of the two attention modules; and the output selection operation unit cuts the output of the fusion module, removing the known-sequence part and retaining only the predicted-sequence part as the runoff prediction result. A data-flow sketch follows.
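A minimal PyTorch sketch of this data flow, with plain single-head attention layers standing in for the full sequence and periodic attention modules; all layer sizes and the kernel width are illustrative assumptions:

```python
import torch
import torch.nn as nn

class HVAttentionModel(nn.Module):
    """Minimal data-flow sketch: row-wise conv, horizontal + vertical
    attention, fusion, output selection. Single-head attention layers
    stand in for the full encoder-decoder attention modules."""

    def __init__(self, m: int, n: int, p: int):
        super().__init__()
        self.p = p                                   # number of prediction steps
        self.conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # row-wise 1-D conv
        self.seq_attn = nn.MultiheadAttention(m, num_heads=1, batch_first=True)
        self.per_attn = nn.MultiheadAttention(n, num_heads=1, batch_first=True)
        self.fuse = nn.Linear(2 * n, n)              # fusion of the two branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, m, n = x.shape                            # rows 0..m-2: history; last row: time series
        hist = self.conv(x[:, :-1, :].reshape(B * (m - 1), 1, n)).reshape(B, m - 1, n)
        z = torch.cat([hist, x[:, -1:, :]], dim=1)   # splice conv output with last row

        cols = z.transpose(1, 2)                     # (B, n, m): columns as tokens
        seq, _ = self.seq_attn(cols, cols, cols)     # horizontal (sequence) attention
        per, _ = self.per_attn(z, z, z)              # vertical (periodic) attention
        fused = self.fuse(torch.cat([seq.transpose(1, 2), per], dim=-1))

        return fused[:, -1, -self.p:]                # keep only the predicted-sequence part

# e.g.: HVAttentionModel(m=6, n=24, p=12)(torch.randn(8, 6, 24)) -> shape (8, 12)
```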
The model training module 204 is configured to perform multiple iterative training to convergence on the medium-and-long-term runoff prediction model based on horizontal and vertical attention, based on the training set, the verification set and the test set, to obtain the trained model.
The model use module 205 is configured to input the runoff data of the river basin to be predicted and the rainfall data of the corresponding time into the trained medium-and-long-term runoff prediction model based on horizontal and vertical attention, and obtain the runoff prediction result for that river basin output by the trained model.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units less closely related to solving the technical problem presented by the present application are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Another embodiment of the present application relates to an electronic device, as shown in fig. 7, comprising: at least one processor 301; and a memory 302 communicatively coupled to the at least one processor 301; wherein the memory 302 stores instructions executable by the at least one processor 301, the instructions being executed by the at least one processor 301 to enable the at least one processor 301 to perform the medium-and-long-term runoff prediction method based on horizontal and vertical attention described in the above embodiments.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over the wireless medium via the antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
Another embodiment of the application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps in the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.
Claims (9)
1. A medium-and-long-term runoff prediction method based on horizontal and vertical attention, comprising:
generating an original data set from the collected runoff data and rainfall data in the target river basin;
preprocessing the original data set, and dividing it by time into three mutually exclusive parts: a training set, a verification set and a test set; wherein the preprocessing comprises outlier processing, null-value processing, data selection, normalization and data serialization;
constructing a medium-and-long-term runoff prediction model based on horizontal and vertical attention; the model consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit, each sample input to the convolution layer being an $m \times n$ matrix, where $m$ is the number of rows of the matrix, comprising $m-1$ leading rows of historical reference information and a last row holding the horizontal time sequence, and $n$ is the number of columns of the matrix, i.e. the length of the horizontal time sequence; the convolution layer is used for performing a one-dimensional convolution operation on the historical reference information row by row and splicing the result with the horizontal time sequence, the sequence attention module is used for calculating horizontal attention based on the output of the convolution layer and extracting sequence information, the periodic attention module is used for calculating vertical attention based on the output of the convolution layer and extracting periodic information, the fusion module is used for fusing the output of the sequence attention module and the output of the periodic attention module, and the output selection operation unit is used for cutting the output of the fusion module, removing the known-sequence part and retaining only the predicted-sequence part as the runoff prediction result;
performing multiple iterative training to convergence on the medium-and-long-term runoff prediction model based on horizontal and vertical attention, based on the training set, the verification set and the test set, to obtain a trained medium-and-long-term runoff prediction model based on horizontal and vertical attention;
inputting the runoff data and rainfall data of the river basin to be predicted into the trained medium-and-long-term runoff prediction model based on horizontal and vertical attention, and obtaining the runoff prediction result for the river basin to be predicted output by the trained model;
the inputs of the medium-and-long-term runoff prediction model based on horizontal and vertical attention are a rainfall data tensor and a runoff data tensor, both of shape $(B, m, n)$, where $B$ is the mini-batch size; the last row of the runoff data tensor is subjected to masking processing; the data forms of the rainfall data tensor and the runoff data tensor are expressed as

$X = (x_{i,j})_{m \times n}, \qquad Y = (y_{i,j})_{m \times n}$

where $X$ denotes the rainfall data tensor, $Y$ denotes the runoff data tensor, $x_{i,j}$ denotes the rainfall datum in row $i$, column $j$ of the rainfall data tensor, $y_{i,j}$ denotes the runoff datum in row $i$, column $j$ of the runoff data tensor, $l$ denotes the length of the known sequence portion, and $p$ denotes the number of prediction steps, i.e. the length of the predicted sequence portion; the entries $y_{m,l+1}$ through $y_{m,l+p}$ of the last row are all padded with 0, which realizes the masking processing;
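For illustration only (outside the claim language), a small Python sketch of the masking just described; the helper name and the exact tensor layout are assumptions of this sketch:

```python
import torch

def build_masked_runoff(runoff_rows: torch.Tensor, l: int, p: int) -> torch.Tensor:
    """runoff_rows: (B, m, l + p) tensor of runoff values; the last row of each
    sample is the target sequence whose future part must be hidden."""
    y = runoff_rows.clone()
    y[:, -1, l:l + p] = 0.0   # pad positions l+1 .. l+p of the last row with 0
    return y
```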
the one-dimensional convolution operation performed on the historical reference information row by row is realized by the following formula:

$\tilde{x}_i = \mathrm{Conv1D}(x_i; K), \qquad \tilde{y}_i = \mathrm{Conv1D}(y_i; K), \qquad K = \{k_1, \dots, k_s\}$

where $K$ denotes the set of convolution kernels, $s$ denotes the size of the set of convolution kernels, i.e. the total number of convolution kernels in the set, $k_c$ denotes the $c$-th convolution kernel of the set, $\mathrm{Conv1D}(x_i; K)$ denotes the one-dimensional convolution operation performed by the convolution layer on the $i$-th row of the rainfall data tensor, and $\mathrm{Conv1D}(y_i; K)$ denotes the one-dimensional convolution operation performed by the convolution layer on the $i$-th row of the runoff data tensor.
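A hedged sketch of the row-by-row one-dimensional convolution; how the $s$ kernel responses are aggregated back into one row is not fixed by the text, so the mean used here is an assumption of the sketch:

```python
import torch
import torch.nn as nn

def convolve_history(x: torch.Tensor, conv: nn.Conv1d) -> torch.Tensor:
    """Apply a bank of 1-D kernels row by row to the first m-1 history rows,
    then splice the result with the untouched last-row time sequence.

    x: (B, m, n); conv: e.g. nn.Conv1d(1, s, kernel_size=3, padding='same'),
    holding the s kernels. Kernel responses are averaged so each row keeps
    length n (aggregation choice is an assumption).
    """
    B, m, n = x.shape
    hist = x[:, :-1, :].reshape(B * (m - 1), 1, n)
    hist = conv(hist).mean(dim=1)                  # (B*(m-1), n)
    hist = hist.reshape(B, m - 1, n)
    return torch.cat([hist, x[:, -1:, :]], dim=1)  # (B, m, n)
```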
2. The medium-and-long-term runoff prediction method based on horizontal and vertical attention according to claim 1, wherein the inputs of the sequence attention module are $\tilde{X}$ and $\tilde{Y}$, $\tilde{X}$ and $\tilde{Y}$ being of shape $(B, m, n)$, and the inputs of the periodic attention module are $\tilde{X}^{\top}$ and $\tilde{Y}^{\top}$, where $\tilde{X}^{\top}$ is the transpose of $\tilde{X}$ and $\tilde{Y}^{\top}$ is the transpose of $\tilde{Y}$, $\tilde{X}^{\top}$ and $\tilde{Y}^{\top}$ being of shape $(B, n, m)$; the sequence attention module and the periodic attention module have the same composition, each consisting of an input conversion layer, a position embedding operation unit, an encoder, a decoder and an output conversion operation unit;

the input conversion layer is used for mapping the low-dimensional features of its input to a high-dimensional space; wherein the input conversion layer of the sequence attention module performs the dimension-raising operation on the feature dimension, raising it to $d_{model}$, and the input conversion layer of the periodic attention module performs the dimension-raising operation on the sequence-length dimension, raising it to $d_{model}$;

the position embedding operation unit is used for encoding the length of its own input, raising the dimension to $d_{model}$ to obtain position information, and embedding the position information into its own input;
the encoder is formed by stacking a plurality of encoder layers, each encoder layer comprises a multi-head self-attention sub-layer and a full-connection sub-layer, and residual connection and layer normalization are arranged behind each sub-layer;
The decoder is formed by stacking a plurality of decoder layers, each decoder layer comprises a multi-head self-attention sub-layer, a multi-head cross-attention sub-layer and a full-connection sub-layer, and residual connection and layer normalization are arranged behind each sub-layer;
the output conversion operation unit is used for dimension reduction; wherein the output conversion operation unit of the sequence attention module reduces the $d_{model}$ dimension to 1 to obtain the output sequence $O_s$, and the output conversion operation unit of the periodic attention module first reduces the dimension of its input to obtain an intermediate tensor, then extracts the last row of the intermediate tensor and exchanges the dimension order to obtain the output sequence $O_p$.
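One plausible reading of the input conversion layer, sketched in PyTorch; realising the periodic branch's lift by transposing first and then applying the same kind of linear projection is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class InputConversion(nn.Module):
    """Input conversion layer: lift a low-dimensional input axis to d_model.

    The sequence branch lifts the feature axis directly; the periodic branch
    works on the transposed input, so its lift acts along what was the
    sequence-length axis (an assumption about the garbled original)."""

    def __init__(self, in_dim: int, d_model: int, transpose_first: bool = False):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)   # learnable W, b
        self.transpose_first = transpose_first

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.transpose_first:                 # periodic branch
            x = x.transpose(-2, -1)
        return self.proj(x)                      # last axis -> d_model
```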
3. The medium-and-long-term runoff prediction method based on horizontal and vertical attention according to claim 2, wherein the dimension-raising operation performed by the input conversion layer of the sequence attention module on the feature dimension and by the input conversion layer of the periodic attention module on the sequence-length dimension is realized by the following formulas:

$X_s = \tilde{X}W_1 + b_1, \qquad Y_s = \tilde{Y}W_1 + b_1, \qquad X_p = \tilde{X}^{\top}W_2 + b_2, \qquad Y_p = \tilde{Y}^{\top}W_2 + b_2$

where $W_1$, $b_1$, $W_2$ and $b_2$ all denote learnable parameters, $X_s$ and $Y_s$ denote the outputs of the input conversion layer of the sequence attention module, and $X_p$ and $Y_p$ denote the outputs of the input conversion layer of the periodic attention module;

the position embedding operation unit encodes the length of its own input, raises the dimension to $d_{model}$ to obtain position information, and embeds the position information into its own input, which is realized by the following formulas:

$X_s' = X_s + PE, \qquad Y_s' = Y_s + PE, \qquad X_p' = X_p + PE, \qquad Y_p' = Y_p + PE$

where $PE$ denotes the position information, $X_s'$ and $Y_s'$ denote the outputs of the position embedding operation unit of the sequence attention module, and $X_p'$ and $Y_p'$ denote the outputs of the position embedding operation unit of the periodic attention module.
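The claim does not pin down the encoding; one common choice consistent with "encode the input length and raise the dimension to $d_{model}$" is the fixed sinusoidal table of the original Transformer, sketched below (an even $d_{model}$ is assumed):

```python
import math
import torch

def sinusoidal_position_embedding(length: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal position table of shape (length, d_model)."""
    pos = torch.arange(length).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(pos * div)   # even feature indices
    pe[:, 1::2] = torch.cos(pos * div)   # odd feature indices
    return pe

# embedding the position information into the module's own input:
# x = x + sinusoidal_position_embedding(x.size(1), x.size(2))
```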
4. The medium-and-long-term runoff prediction method based on horizontal and vertical attention according to claim 2, wherein the attention mechanisms of the multi-head self-attention sub-layer and the multi-head cross-attention sub-layer are expressed by the following formulas:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{QK^{\top}}{\sqrt{d_k}}\right)V$

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$

where $Q$ denotes the query vectors, $K$ denotes the key vectors, $V$ denotes the values corresponding to the key vectors, $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ denote the input projections of $Q$, $K$ and $V$ respectively, $d_k$ denotes the dimension of $K$ in the hidden space, $K^{\top}$ denotes the transpose of $K$, and $\mathrm{softmax}$ denotes the softmax operation; when the inputs of $Q$ and $K$ are equal, $\mathrm{Attention}$ represents self-attention, and when they differ it represents cross-attention; $h$ denotes the number of heads, $\mathrm{head}_i$ denotes the output of the $i$-th head, and $\mathrm{Concat}$ denotes the splicing operation; when the inputs are equal, $\mathrm{MultiHead}$ represents multi-head self-attention, and when they differ it represents multi-head cross-attention; $W^{O}$, $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ all denote learnable parameters.
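These formulas correspond directly to the following sketch; supplying projection matrices of compatible shapes is the caller's responsibility:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: self-attention when Q, K, V come from
    the same input, cross-attention when they do not."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ V

def multi_head(Q, K, V, WQ, WK, WV, WO, h):
    """Concat(head_1, ..., head_h) W_O with head_i = attention(Q WQ[i], ...);
    WQ, WK, WV are lists of h projection matrices."""
    heads = [attention(Q @ WQ[i], K @ WK[i], V @ WV[i]) for i in range(h)]
    return torch.cat(heads, dim=-1) @ WO
```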
5. The medium-and-long-term runoff prediction method based on horizontal and vertical attention according to any one of claims 1 to 4, wherein performing multiple iterative training to convergence on the model based on the training set, the verification set and the test set, to obtain the trained medium-and-long-term runoff prediction model based on horizontal and vertical attention, comprises:

inputting the training set into the medium-and-long-term runoff prediction model based on horizontal and vertical attention, performing multiple iterative training on the model, calculating the smoothed Nash efficiency coefficient loss with a forward propagation pass, and updating the model parameters with a back-propagation algorithm based on the smoothed Nash efficiency coefficient loss;

in each iteration, measuring, with the Nash efficiency coefficient, the performance on the verification set of the model trained for the current number of iterations, and, after a preset number of iterations is reached, saving the model that performed best on the verification set as the trained medium-and-long-term runoff prediction model based on horizontal and vertical attention;

inputting the test set into the trained model to obtain the runoff prediction result corresponding to each piece of test data in the test set, and evaluating the runoff prediction result corresponding to each piece of test data against the real runoff data.
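A schematic training loop matching claim 5: optimise with the smoothed NSE loss, track the Nash efficiency coefficient on the verification set each epoch, and keep the best checkpoint. All argument names are placeholders of this sketch:

```python
import torch

def train(model, optimizer, train_loader, val_loader, loss_fn, eval_nse, epochs):
    """Train with the smoothed NSE loss and keep the checkpoint that scores
    the best Nash efficiency coefficient on the verification set."""
    best_nse = float("-inf")
    for _ in range(epochs):                    # preset number of iterations
        model.train()
        for x, y in train_loader:
            loss = loss_fn(model(x), y)        # forward pass -> smoothed NSE loss
            optimizer.zero_grad()
            loss.backward()                    # back-propagation update
            optimizer.step()
        model.eval()
        with torch.no_grad():
            nse = eval_nse(model, val_loader)  # performance on the verification set
        if nse > best_nse:
            best_nse = nse
            torch.save(model.state_dict(), "best_model.pt")
    return best_nse
```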
6. The medium-and-long-term runoff prediction method based on horizontal and vertical attention according to claim 5, wherein the calculation of the smoothed Nash efficiency coefficient loss is realized by the following formula:

$\mathcal{L} = \dfrac{1}{N}\sum_{b=1}^{N}\dfrac{1}{n_b}\sum_{i=1}^{n_b}\sum_{t=1}^{p}\dfrac{\left(\hat{y}_{b,i,t} - y_{b,i,t}\right)^{2}}{\left(\sigma_b + \epsilon\right)^{2}}$

where $N$ denotes the total number of batches, $n_b$ denotes the total number of samples in the $b$-th mini-batch, $\hat{y}_{b,i,t}$ denotes the runoff prediction result of the $t$-th step of the $i$-th sample in the $b$-th mini-batch, $p$ denotes the number of prediction steps, $y_{b,i,t}$ denotes the real runoff data of the $t$-th step of the $i$-th sample in the $b$-th mini-batch, $\sigma_b$ denotes the standard deviation of the real runoff data of all samples in the $b$-th mini-batch, $\epsilon$ denotes a preset minimal constant used to prevent the denominator from being 0, and $\mathcal{L}$ denotes the calculated smoothed Nash efficiency coefficient loss value;
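A per-mini-batch sketch of this loss under the reconstruction above; whether the step dimension is summed or averaged is not recoverable from the garbled original, so the mean used here is an assumption:

```python
import torch

def smoothed_nse_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                      eps: float = 0.1) -> torch.Tensor:
    """Smoothed Nash efficiency coefficient loss for one mini-batch.

    y_pred, y_true: (n_b, p) predictions / real runoff for the n_b samples
    and p prediction steps; sigma_b is the standard deviation of the real
    runoff of all samples in the batch, eps keeps the denominator above 0.
    """
    sigma_b = y_true.std()
    return torch.mean((y_pred - y_true) ** 2 / (sigma_b + eps) ** 2)
```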
the calculation formula of the Nash efficiency coefficient is:

$\mathrm{NSE} = 1 - \dfrac{\sum_{i}\left(\hat{y}_i - y_i\right)^{2}}{\sum_{i}\left(y_i - \bar{y}\right)^{2}}$

where $\hat{y}_i$ and $y_i$ denote the predicted and real runoff of the $i$-th sample, $\bar{y}$ denotes the mean of the real runoff data of all samples in the verification set, and $\mathrm{NSE}$ denotes the calculated Nash efficiency coefficient.
7. A medium-and-long-term runoff prediction system based on horizontal and vertical attention, comprising:
a data preparation module, used for collecting runoff data and rainfall data in the target river basin and generating an original data set from the collected data;

a preprocessing module, used for preprocessing the original data set and dividing it by time into three mutually exclusive parts, a training set, a verification set and a test set, the preprocessing comprising outlier processing, null-value processing, data selection, normalization and data serialization;
a model construction module, used for constructing a medium-and-long-term runoff prediction model based on horizontal and vertical attention; the model consists of a convolution layer, a sequence attention module, a periodic attention module, a fusion module and an output selection operation unit, each sample input to the convolution layer being an $m \times n$ matrix, where $m$ is the number of rows of the matrix, comprising $m-1$ leading rows of historical reference information and a last row holding the horizontal time sequence, and $n$ is the number of columns of the matrix, i.e. the length of the horizontal time sequence; the convolution layer is used for performing a one-dimensional convolution operation on the historical reference information row by row and splicing the result with the horizontal time sequence, the sequence attention module is used for calculating horizontal attention based on the output of the convolution layer and extracting sequence information, the periodic attention module is used for calculating vertical attention based on the output of the convolution layer and extracting periodic information, the fusion module is used for fusing the output of the sequence attention module and the output of the periodic attention module, and the output selection operation unit is used for cutting the output of the fusion module, removing the known-sequence part and retaining only the predicted-sequence part as the runoff prediction result;
a model training module, used for performing multiple iterative training to convergence on the medium-and-long-term runoff prediction model based on horizontal and vertical attention, based on the training set, the verification set and the test set, to obtain a trained medium-and-long-term runoff prediction model based on horizontal and vertical attention;

a model use module, used for inputting the runoff data of the river basin to be predicted and the rainfall data of the corresponding time into the trained medium-and-long-term runoff prediction model based on horizontal and vertical attention, and obtaining the runoff prediction result for the river basin to be predicted output by the trained model;
the inputs of the medium-and-long-term runoff prediction model based on horizontal and vertical attention are a rainfall data tensor and a runoff data tensor, both of shape $(B, m, n)$, where $B$ is the mini-batch size; the last row of the runoff data tensor is subjected to masking processing; the data forms of the rainfall data tensor and the runoff data tensor are expressed as

$X = (x_{i,j})_{m \times n}, \qquad Y = (y_{i,j})_{m \times n}$

where $X$ denotes the rainfall data tensor, $Y$ denotes the runoff data tensor, $x_{i,j}$ denotes the rainfall datum in row $i$, column $j$ of the rainfall data tensor, $y_{i,j}$ denotes the runoff datum in row $i$, column $j$ of the runoff data tensor, $l$ denotes the length of the known sequence portion, and $p$ denotes the number of prediction steps, i.e. the length of the predicted sequence portion; the entries $y_{m,l+1}$ through $y_{m,l+p}$ of the last row are all padded with 0, which realizes the masking processing;
the one-dimensional convolution operation performed on the historical reference information row by row is realized by the following formula:

$\tilde{x}_i = \mathrm{Conv1D}(x_i; K), \qquad \tilde{y}_i = \mathrm{Conv1D}(y_i; K), \qquad K = \{k_1, \dots, k_s\}$

where $K$ denotes the set of convolution kernels, $s$ denotes the size of the set of convolution kernels, i.e. the total number of convolution kernels in the set, $k_c$ denotes the $c$-th convolution kernel of the set, $\mathrm{Conv1D}(x_i; K)$ denotes the one-dimensional convolution operation performed by the convolution layer on the $i$-th row of the rainfall data tensor, and $\mathrm{Conv1D}(y_i; K)$ denotes the one-dimensional convolution operation performed by the convolution layer on the $i$-th row of the runoff data tensor.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the medium-and-long-term runoff prediction method based on horizontal and vertical attention according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the medium-and-long-term runoff prediction method based on horizontal and vertical attention according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410421506.3A CN118013866B (en) | 2024-04-09 | 2024-04-09 | Medium-and-long-term runoff prediction method based on horizontal and vertical attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410421506.3A CN118013866B (en) | 2024-04-09 | 2024-04-09 | Medium-and-long-term runoff prediction method based on horizontal and vertical attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118013866A true CN118013866A (en) | 2024-05-10 |
CN118013866B CN118013866B (en) | 2024-06-25 |
Family
ID=90958242
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410421506.3A Active CN118013866B (en) | 2024-04-09 | 2024-04-09 | Medium-and-long-term runoff prediction method based on horizontal and vertical attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118013866B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161535A (en) * | 2019-12-23 | 2020-05-15 | 山东大学 | Attention mechanism-based graph neural network traffic flow prediction method and system |
CN112949902A (en) * | 2021-01-25 | 2021-06-11 | 西北工业大学 | Runoff prediction method based on LSTM multi-state vector sequence-to-sequence model |
CN113689030A (en) * | 2021-08-03 | 2021-11-23 | 广东工业大学 | Short-term wind power prediction method based on bidirectional attention and secondary optimization |
CN115146700A (en) * | 2022-05-21 | 2022-10-04 | 西北工业大学 | Runoff prediction method based on Transformer sequence-to-sequence model |
US20220397874A1 (en) * | 2021-06-07 | 2022-12-15 | Zhejiang University | Non-Intrusive Load Decomposition Method Based on Informer Model Coding Structure |
CN116050652A (en) * | 2023-02-22 | 2023-05-02 | 重庆邮电大学 | Runoff prediction method based on local attention enhancement model |
CN117371514A (en) * | 2023-09-21 | 2024-01-09 | 北京石油化工学院 | Method, device and storage medium for establishing industrial control system migration learning model |
CN117828308A (en) * | 2024-03-04 | 2024-04-05 | 山东捷瑞数字科技股份有限公司 | Time sequence prediction method based on local segmentation |
Non-Patent Citations (2)
Title |
---|
FENG ZHOU et al.: "Application of a New Hybrid Deep Learning Model That Considers Temporal and Feature Dependencies in Rainfall–Runoff Simulation", REMOTE SENSING, vol. 15, no. 5, 1 March 2023 (2023-03-01) *
LIU Ziru: "Research on medium- and long-term runoff forecasting of the Qingjiang River basin based on neural networks and attention mechanisms" (in Chinese), Master's Thesis Electronic Journal, 15 January 2023 (2023-01-15) *
Also Published As
Publication number | Publication date |
---|---|
CN118013866B (en) | 2024-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104239706B (en) | A kind of preparation method of ground observation temperature space-time data collection | |
Deepa et al. | Multiclass model for agriculture development using multivariate statistical method | |
Agarwal et al. | Runoff modelling through back propagation artificial neural network with variable rainfall-runoff data | |
CN102222313B (en) | Urban evolution simulation structure cell model processing method based on kernel principal component analysis (KPCA) | |
CN114493050B (en) | Multi-dimensional fusion new energy power parallel prediction method and device | |
CN114254561A (en) | Waterlogging prediction method, waterlogging prediction system and storage medium | |
Kar et al. | Rain gauge network design for flood forecasting using multi-criteria decision analysis and clustering techniques in lower Mahanadi river basin, India | |
CN115182398B (en) | Groundwater level and earth surface subsidence prediction method for earthquake early warning area | |
CN114936620B (en) | Sea surface temperature numerical forecasting deviation correcting method based on attention mechanism | |
CN116910534A (en) | Space-time intelligent prediction method and device for ocean environmental elements in different sea areas | |
CN116341841A (en) | Runoff forecast error correction method, apparatus, device, medium and program product | |
CN118013866B (en) | Medium-and-long-term runoff prediction method based on horizontal and vertical attention | |
CN117233869A (en) | Site short-term wind speed prediction method based on GRU-BiTCN | |
CN110852415B (en) | Vegetation index prediction method, system and equipment based on neural network algorithm | |
CN107491841A (en) | Nonlinear optimization method and storage medium | |
CN117194918A (en) | Air temperature prediction method and system based on self-attention echo state network | |
Kasiviswanathan et al. | Radial basis function artificial neural network: Spread selection | |
CN105825347A (en) | Economy prediction model building method and economy prediction model prediction method | |
CN112883292B (en) | User behavior recommendation model establishment and position recommendation method based on spatio-temporal information | |
CN113761797B (en) | Wireless channel path loss model prediction method based on computer vision | |
CN118114057B (en) | Model training method, runoff prediction method, electronic device, and storage medium | |
CN110866363B (en) | Method for inverting arctic fusion pool distribution by using artificial neural network | |
Wong et al. | Addressing Deep Learning Model Uncertainty in Long-Range Climate Forecasting with Late Fusion | |
CN117726033B (en) | Flood forecasting method based on Multi-HEADED CNN model and satellite rainfall products | |
CN117421601B (en) | Sea surface evaporation waveguide near-future rapid forecasting method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |