CN113962819A

CN113962819A - Method for predicting dissolved oxygen in industrial aquaculture based on extreme learning machine

Info

Publication number: CN113962819A
Application number: CN202111170371.0A
Authority: CN
Inventors: 施珮; 唐玥; 匡亮; 余晓栋; 孙宁; 陆松
Original assignee: Wuxi University
Current assignee: Wuxi University
Priority date: 2021-10-08
Filing date: 2021-10-08
Publication date: 2022-01-21

Abstract

The invention discloses a method for predicting dissolved oxygen in industrial aquaculture based on an extreme learning machine, and belongs to the technical field of aquaculture. The method comprises the following steps: data preprocessing, factor screening, construction of an IELM network model, testing of the prediction method, and output of the prediction result. The invention corrects missing data using a data preprocessing method; screens the index factors using the Pearson correlation coefficient method, takes the 8 indexes most strongly correlated with the dissolved oxygen concentration as the inputs of the prediction method, and divides the preprocessed data set into a training set and a test set; then optimizes the initial weights and thresholds of the extreme learning machine using an artificial bee colony algorithm to obtain optimal parameter values, and constructs the IELM network model; finally, the dissolved oxygen values predicted by the IELM on the test set are compared with the predictions of the traditional ELM model. The IELM prediction method performs better and predicts the trend of dissolved oxygen in industrial aquaculture more accurately.

Description

Method for predicting dissolved oxygen in industrial aquaculture based on extreme learning machine

Technical Field

The invention belongs to the technical field of aquaculture, relates to a method for predicting dissolved oxygen in industrial aquaculture, and particularly relates to a method for predicting dissolved oxygen in industrial aquaculture based on an extreme learning machine.

Background

With its industrialized, intensive culture mode, industrial aquaculture offers new prospects for areas with limited natural resources and is a development trend of the aquaculture industry. In industrial aquaculture, the balance and quality of the water body are particularly important, and the accurate control and prediction of dissolved oxygen are the focus of aquaculture work. How to acquire and effectively use information on the aquaculture water environment and the meteorological environment to prevent fish from dying of hypoxia is an important problem that currently requires attention and research.

In current dissolved oxygen prediction research, traditional neural networks and support vector machines are the most widely studied prediction methods. However, conventional neural networks are not suited to dissolved oxygen prediction with high-dimensional, small-sample data, and support vector machines suffer from high computational complexity and slow training. As an effective prediction method, the extreme learning machine learns quickly and can overcome some shortcomings of the traditional algorithms; however, the choice of its weight and threshold parameters affects the accuracy of dissolved oxygen prediction, and high-dimensional, redundant network inputs degrade its prediction performance. At present there is no effective method for solving these problems of the extreme learning machine in dissolved oxygen prediction.

Disclosure of Invention

The purpose of the invention is as follows: the invention aims to provide an extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture. The method uses the Pearson correlation coefficient method to analyze the factors that influence changes in the dissolved oxygen concentration, thereby removing redundant inputs from the dissolved oxygen prediction, and achieves fast and accurate prediction of dissolved oxygen in the water body by obtaining the optimal weight and threshold parameters of the extreme learning machine.

The technical scheme is as follows: the invention relates to an extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture, which comprises the following specific operation steps of:

(1) data preprocessing;

(2) factor screening, and determining a dissolved oxygen prediction data set;

(3) constructing an IELM network model;

(4) testing the prediction method and outputting the prediction result.

Further, in step (1), the data preprocessing operation procedure is: deploying a dissolved oxygen sensor and a pH sensor in a test pond of an industrial aquaculture base, deploying an automatic weather station at the side of the pond, and acquiring water body parameter data and weather data in real time through a constructed wireless sensing network;

firstly, for the small amount of data with intermittent losses, linear interpolation is used to fill in the missing values, using the formula:

$$x_{k+i} = x_k + \frac{i\,(x_{k+j} - x_k)}{j}, \quad 0 < i < j \tag{1}$$

in the formula, $x_k$ and $x_{k+j}$ denote the water quality data monitored at the known times $k$ and $k+j$, respectively, and $x_{k+i}$ denotes the water quality monitoring value lost at time $k+i$;
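The interpolation step can be illustrated with a minimal Python sketch. It assumes a missing reading is stored as `None` and that gaps at the very start or end of a series (with no known neighbour on one side) are left untouched; the function name `fill_gaps_linear` and the sample values are hypothetical.

```python
from typing import List, Optional

def fill_gaps_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation between known neighbours."""
    filled = list(series)
    k = None  # index of the most recent known sample
    for idx, value in enumerate(series):
        if value is None:
            continue
        if k is not None and idx - k > 1:
            j = idx - k  # known values sit at k and k + j
            for i in range(1, j):
                # x[k+i] = x[k] + i * (x[k+j] - x[k]) / j
                filled[k + i] = series[k] + i * (value - series[k]) / j
        k = idx
    return filled

# Hypothetical dissolved oxygen readings with two lost samples
print(fill_gaps_linear([6.0, None, None, 7.5, 7.4]))
```

Because each lost value depends only on its two nearest known neighbours, the sketch is suited to short gaps, which matches the stated scope of the interpolation step.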

secondly, because the data collected have different dimensions (units), the Z-Score method is used to standardize the data set, using the formula:

$$Z_{mn} = \frac{X_{mn} - \bar{X}_n}{S_n} \tag{2}$$

wherein $m$ represents the number of index variables, $n$ represents the number of samples, $\bar{X}_n$ represents the mean value of $X_{mn}$, and $S_n$ represents the standard deviation of $X_{mn}$; the standardized values $Z_{mn}$ obtained by standardizing the raw data have a mean value of 0 and a variance of 1.
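A minimal sketch of the Z-Score step, using only the Python standard library. It assumes each row is one sample and each column one index variable, and it uses the population standard deviation so that the resulting variance is exactly 1; the source does not state which form of the standard deviation is used. The function name and the sample readings are hypothetical.

```python
from statistics import fmean, pstdev

def z_score_columns(rows):
    """Standardize each column (index variable) to mean 0 and variance 1."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation is an assumption; the text does not say
    # whether the sample or the population form is used.
    stds = [pstdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical readings: (pH, water temperature in degrees Celsius)
raw = [[7.8, 24.1], [8.0, 25.3], [8.1, 26.0], [7.9, 24.6]]
standardized = z_score_columns(raw)
```

A column with zero variance would cause a division by zero in this sketch; such a constant factor carries no information for prediction and would need to be dropped beforehand.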

Further, in step (2), the data set for dissolved oxygen prediction is determined by factor screening as follows: the standardized data set is analyzed using the Pearson correlation coefficient method; factors with little influence on the change in dissolved oxygen concentration are removed, and factors with a large influence are retained, thereby determining the data set for the prediction test;

factor screening by the Pearson correlation coefficient method mainly comprises the following steps:

first, m influence factors are defined and n denotes the number of samples, so that the influence factors are represented by an n × m matrix:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \tag{3}$$

secondly, the Pearson correlation coefficient between each influence factor and the dissolved oxygen concentration is calculated using the formula:

$$r = \frac{\sum_{i=1}^{l} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{l} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{l} (y_i - \bar{y})^2}} \tag{4}$$

in the formula, $x_i$ and $y_i$ represent the $i$-th elements of the two vectors $x$ and $y$ being correlated; $l$ represents the length of the vectors; $\bar{x}$ and $\bar{y}$ represent the mean values of the elements in vectors $x$ and $y$, respectively;

and finally, after the Pearson correlation coefficients between the factors and the dissolved oxygen concentration have been obtained, factors whose correlation coefficient is less than 0.1 are removed and factors whose correlation coefficient is greater than 0.1 are retained, which completes the factor screening process.
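The screening rule can be sketched in a few lines of Python. The sketch compares the absolute value of the coefficient with the 0.1 cut-off, which is an assumption: the text states the cut-off but not how negative correlations are treated. The factor names and toy values are hypothetical and are not measurements from the test pond.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def screen_factors(factors, target, threshold=0.1):
    """Keep the factors whose |r| with the target exceeds the threshold."""
    # Using the absolute value is an assumption; the text only gives the 0.1 cut-off.
    return [name for name, values in factors.items()
            if abs(pearson(values, target)) > threshold]

# Hypothetical toy data
dissolved_oxygen = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
factors = {
    "water_temp": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],      # perfectly correlated
    "wind_direction": [1.0, -1.0, 0.0, 0.0, -1.0, 1.0],  # uncorrelated
}
kept = screen_factors(factors, dissolved_oxygen)
```

In this toy case only `water_temp` survives the screening, since the second factor has zero correlation with the target.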

Further, in step (3), the building of the IELM network model is: the network model comprises an input layer unit, a hidden layer unit and an output layer unit, the weight and the threshold of the extreme learning machine are optimized by utilizing an artificial bee colony algorithm to obtain the optimal initial values of the weight and the threshold of the extreme learning machine, and the specific operation process is as follows:

first, let $n$ samples constitute the sample set $(x_i, t_i)$, $i = 1, 2, \ldots, n$, where the $m$-dimensional feature vector of the $i$-th sample is $x_i = [x_{i1}, x_{i2}, \ldots, x_{im}]$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]$ is the corresponding target; if the number of hidden layer units in the ELM network is $l$, the ELM network is:

$$\sum_{j=1}^{l} \beta_j\, g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, 2, \ldots, n \tag{5}$$

in the formula, $w_j$ represents the weights between the input layer units and the $j$-th hidden layer unit; $b_j$ represents the bias (threshold) of the $j$-th hidden layer unit; $\beta_j$ represents the output weight between the $j$-th hidden layer unit and the output layer unit; $o_i$ represents the network output for the $i$-th sample; $g(x)$ is the activation function of the network, for which the Sigmoid function

$$g(x) = \frac{1}{1 + e^{-x}}$$

is selected; letting the output value of the ELM network equal the desired value, equation (5) above can be converted into:

$$\sum_{j=1}^{l} \beta_j\, g(w_j \cdot x_i + b_j) = t_i, \quad i = 1, 2, \ldots, n \tag{6}$$

equation (6) can be written compactly in matrix form as follows:

$$H\beta = T \tag{7}$$

where $H$ is the hidden layer output matrix with entries $H_{ij} = g(w_j \cdot x_i + b_j)$; after the weights $w$ and the biases $b$ in the network have been obtained randomly, the weights $\beta$ between the hidden layer units and the output layer unit are solved by the least squares method, using the expression:

$$\beta = H^{+}T \tag{9}$$

in the formula, $H^{+}$ represents the generalized inverse of the output matrix $H$;
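The basic ELM training described above (random input weights and biases, Sigmoid hidden layer, output weights from the generalized inverse) can be sketched with NumPy. This is an illustrative sketch, not the patented implementation: the function names, the uniform initialization range, the hidden layer size, and the synthetic data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden, rng):
    """Draw random input weights and biases, then solve the output weights by least squares."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input-to-hidden weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden-layer biases
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # generalized-inverse solution of H beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Synthetic stand-in data: 200 samples with 8 input factors (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
T = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W, b, beta = elm_fit(X, T, n_hidden=30, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Only `beta` is fitted; `W` and `b` stay at their random draws, which is why their choice affects accuracy and why the method goes on to optimize them.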

secondly, the population of the artificial bee colony is initialized to generate $k$ particles, namely $k$ feasible solutions;

each particle has $D = l \cdot (n + 1)$ elements, where $l$ denotes the number of hidden layer units and $n$ denotes the number of input layer units, and each element takes a value in $[-1, 1]$; each particle represents one set of input weights and hidden layer thresholds, namely $[w_{11}, w_{12}, \ldots, w_{1l}, w_{21}, w_{22}, \ldots, w_{2l}, \ldots, w_{n1}, w_{n2}, \ldots, w_{nl}, b_1, b_2, \ldots, b_l]$, and a corresponding fitness value can be computed for each feasible solution;

$k/2$ particles are designated as employed bees, the optimal value and the corresponding employed bee are recorded, and the remaining particles serve as onlooker bees; each employed bee searches for a new honey source in its neighbourhood and is updated using the formula:

$$P'_j = P_j + (P_j - P_N) \cdot (\mathrm{rand} - 0.5) \cdot 2 \tag{10}$$

in the formula, $P'_j$ represents the updated employed bee, $P_j$ represents the employed bee before updating, $P_N$ represents a randomly selected employed bee, and $\mathrm{rand}$ is a random number in $[0, 1]$;

iterative movement follows the principle that a bee moves to the new honey source when that source has better fitness; the roulette method is used to decide whether an onlooker bee follows the information of an employed bee, and a honey source is selected by roulette according to the probability

$$P(\delta_i) = \frac{f(\delta_i)}{\sum_{t=1}^{T} f(\delta_t)}$$

depending on whether the objective function value $f_i$ is greater than 0, the fitness function $f(\delta_i)$ is expressed as:

$$f(\delta_i) = \begin{cases} \dfrac{1}{1 + f_i}, & f_i \ge 0 \\ 1 + |f_i|, & f_i < 0 \end{cases}$$

in the formula, $\delta_i$ denotes the $i$-th honey source, $i \in \{1, 2, 3, \ldots, T\}$, $T$ denotes the number of honey sources, and $f(\delta_i)$ denotes the fitness of the honey source at position $\delta_i$; the onlooker bees select a honey source by comparing $f(\delta_i)$; when the maximum number of honey collection attempts has been reached and the fitness has still not improved, the honey source is regarded as a local optimum and is abandoned, a new employed bee is obtained according to formula (10), and the search continues for a new honey source to replace it; when the maximum number of iterations is reached, the optimal fitness value and the optimal particle found during the optimization are obtained, and this optimal result is used as the input weights and thresholds of the ELM network model;
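The optimization loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the objective is taken to be the ELM training RMSE, the fitness mapping `1 / (1 + f)` for non-negative `f` is the form commonly used with artificial bee colony algorithms, candidates are clipped to `[-1, 1]`, `rand` is drawn as a single scalar per update, and all names, population sizes, and iteration limits are hypothetical.

```python
import numpy as np

def elm_train_rmse(params, X, T, n_hidden):
    """Objective f: training RMSE of an ELM whose input weights and thresholds are `params`."""
    n_in = X.shape[1]
    W = params[: n_in * n_hidden].reshape(n_in, n_hidden)
    b = params[n_in * n_hidden:]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # Sigmoid hidden-layer output
    beta = np.linalg.pinv(H) @ T             # output weights via the generalized inverse
    return float(np.sqrt(np.mean((H @ beta - T) ** 2)))

def fitness(f):
    """Assumed ABC fitness mapping; larger is better."""
    return 1.0 / (1.0 + f) if f >= 0 else 1.0 + abs(f)

def abc_optimize(X, T, n_hidden, k=20, max_iter=20, limit=5, seed=0):
    rng = np.random.default_rng(seed)
    dim = n_hidden * (X.shape[1] + 1)        # D = l * (n + 1)
    n_src = k // 2                           # k/2 employed bees, one honey source each
    src = rng.uniform(-1.0, 1.0, size=(n_src, dim))
    cost = np.array([elm_train_rmse(p, X, T, n_hidden) for p in src])
    trials = np.zeros(n_src, dtype=int)
    best, best_cost = src[cost.argmin()].copy(), float(cost.min())
    history = [best_cost]

    def neighbour(j):
        partner = int(rng.choice([i for i in range(n_src) if i != j]))
        cand = src[j] + (src[j] - src[partner]) * (rng.random() - 0.5) * 2  # update rule (10)
        return np.clip(cand, -1.0, 1.0)      # clipping to [-1, 1] is an assumption

    def explore(j):
        cand = neighbour(j)
        c = elm_train_rmse(cand, X, T, n_hidden)
        if c < cost[j]:                      # greedy move to the better honey source
            src[j], cost[j], trials[j] = cand, c, 0
        else:
            trials[j] += 1

    for _ in range(max_iter):
        for j in range(n_src):               # employed-bee phase
            explore(j)
        fit = np.array([fitness(c) for c in cost])
        for j in rng.choice(n_src, size=k - n_src, p=fit / fit.sum()):  # roulette selection
            explore(int(j))
        if cost.min() < best_cost:           # remember the best source before any is abandoned
            best, best_cost = src[cost.argmin()].copy(), float(cost.min())
        for j in np.flatnonzero(trials > limit):   # abandon exhausted sources
            src[j] = neighbour(int(j))
            cost[j] = elm_train_rmse(src[j], X, T, n_hidden)
            trials[j] = 0
        history.append(best_cost)
    return best, best_cost, history

# Synthetic stand-in data (hypothetical): 120 samples, 4 input factors
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(120, 4))
T = np.sin(X[:, 0]) + 0.5 * X[:, 1]
best, best_cost, history = abc_optimize(X, T, n_hidden=10)
```

The returned `best` vector has the layout described in the text: the first `n * l` entries are the input weights and the last `l` entries are the hidden layer thresholds. `history` records the best objective value after each iteration and never increases, because the best source is stored before any source is abandoned.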

finally, IELM neural network training is performed; building on the preprocessing and factor screening, the optimal weights and thresholds determined by the artificial bee colony optimization are applied to the ELM network; the training set is used for network training, and the predicted values at the time points of the training set and the root mean square error, denoted RMSE, are calculated:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where $y_i$ and $\hat{y}_i$ denote the measured and predicted dissolved oxygen concentration of the $i$-th sample and $N$ denotes the number of samples;

and the trained IELM network model that meets the error condition is saved.

Further, in step (4), testing the prediction method and outputting the prediction result means: based on the trained IELM network model, the prediction performance of the trained IELM network is tested on the test set and the prediction results for the test set are output; the traditional ELM network model is selected as the comparison algorithm, and the prediction results of the different algorithms on the test set are output.

Furthermore, the method uses the Pearson correlation coefficient method to screen the 11 listed index factors, eliminating weakly correlated influence factors and retaining strongly correlated ones, which avoids data redundancy and improves the accuracy and efficiency of dissolved oxygen prediction.

Furthermore, the initial weights and threshold parameters of the extreme learning machine are optimized using the artificial bee colony algorithm, which prevents the extreme learning machine from falling into a local optimum during optimization and improves the accuracy of aquaculture dissolved oxygen prediction.

Beneficial effects: compared with the prior art, the method of the invention collects 11 index parameters related to the dissolved oxygen concentration in industrial aquaculture and corrects missing data using a data preprocessing method; it screens the index factors using the Pearson correlation coefficient method, takes the 8 indexes most strongly correlated with the dissolved oxygen concentration as the inputs of the prediction method, and divides the preprocessed data set into a training set and a test set; it then optimizes the initial weights and thresholds of the extreme learning machine using an artificial bee colony algorithm to obtain optimal parameter values, and constructs the IELM network model; finally, the dissolved oxygen values predicted by the IELM on the test set are compared with the predictions of the traditional ELM model. The IELM prediction method performs better and predicts the trend of dissolved oxygen in industrial aquaculture more accurately.

Drawings

FIG. 1 is a flow chart of the operation of the present invention;

FIG. 2 is a diagram showing the dissolved oxygen prediction results of the IELM network model according to the present invention.

Detailed Description

The invention is further described below with reference to the following figures and specific examples.

As shown in FIG. 1, the method for predicting dissolved oxygen in industrial aquaculture based on the extreme learning machine comprises the following specific operation steps:

(1) data preprocessing; deploying a dissolved oxygen sensor and a pH sensor in a test pond of an industrial aquaculture base, deploying an automatic weather station at the side of the pond, and acquiring water body parameter data and weather data in real time through a constructed wireless sensing network;

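the collected data are then interpolated and standardized as set out in the Disclosure of Invention; in the Z-Score formula, $\bar{X}_n$ represents the mean value of $X_{mn}$ and $S_n$ represents the standard deviation of $X_{mn}$, and the standardized values $Z_{mn}$ obtained by standardizing the raw data have a mean value of 0 and a variance of 1;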

the dissolved oxygen concentration of the water body is affected by a variety of water body parameters and meteorological parameters; drawing on the culture experience of fish farmers and the research experience of relevant personnel, different sensors are selected in the test to collect data on both water body parameters and meteorological parameters, including dissolved oxygen, pH value, water temperature, CO₂ concentration, air pressure, temperature, humidity, wind speed, wind direction, illumination, photosynthetically active radiation and radiation illumination, thereby obtaining an initial data index system;

(2) factor screening, and determining a dissolved oxygen prediction data set; the standardized data set is analyzed using the Pearson correlation coefficient method; factors with little influence on the change in dissolved oxygen concentration are removed, and factors with a large influence are retained;

in the Pearson formula, $\bar{x}$ and $\bar{y}$ represent the mean values of the elements in vectors $x$ and $y$, respectively;

finally, after the Pearson correlation coefficients between the factors and the dissolved oxygen concentration have been obtained, factors whose correlation coefficient is less than 0.1 are removed and factors whose correlation coefficient is greater than 0.1 are retained, which completes the factor screening process;

in the model, the Pearson correlation coefficients of the 11 factors affecting the change in the dissolved oxygen concentration of the water body are calculated, and after factor screening 8 indexes, namely water temperature, pH value, humidity, temperature, illumination, wind speed, radiation illumination and photosynthetically active radiation, are retained as the inputs of the IELM dissolved oxygen prediction model;

(3) constructing an IELM network model; the network model comprises an input layer unit, a hidden layer unit and an output layer unit, the weight and the threshold of the extreme learning machine are optimized by utilizing an artificial bee colony algorithm to obtain the optimal initial values of the weight and the threshold of the extreme learning machine, and the specific operation process is as follows:

equation (6) is written compactly in matrix form as follows:

$$H\beta = T \tag{7}$$

and the output weights are solved by the least squares method as:

$$\beta = H^{+}T \tag{9}$$

in the formula, $H^{+}$ represents the generalized inverse of the output matrix $H$; the employed bees are updated according to:

$$P'_j = P_j + (P_j - P_N) \cdot (\mathrm{rand} - 0.5) \cdot 2 \tag{10}$$

in the fitness formula, $\delta_i$ denotes the $i$-th honey source, $i \in \{1, 2, 3, \ldots, T\}$, $T$ denotes the number of honey sources, and $f(\delta_i)$ denotes the fitness of the honey source at position $\delta_i$; the onlooker bees select a honey source by comparing $f(\delta_i)$; when the maximum number of honey collection attempts has been reached and the fitness has still not improved, the honey source is regarded as a local optimum and is abandoned, a new employed bee is obtained according to formula (10), and the search continues for a new honey source to replace it; when the maximum number of iterations is reached, the optimal fitness value and the optimal particle found during the optimization are obtained, and this optimal result is used as the input weights and thresholds of the ELM network model;

and the trained IELM network model that meets the error condition is saved;

in the model, the water body dissolved oxygen data collected during the test period from July 1, 2019 to July 30, 2019 are selected for prediction; first, the collected meteorological data and water body parameter data are preprocessed using the data preprocessing method to obtain 4320 groups of data; the first 3888 groups of the data set are used as the training set and the last 432 groups as the test set; after factor screening with the Pearson correlation coefficient, the screened factors are used to construct the input-output structure of the IELM prediction model; finally, the IELM network is trained and the training result is output;
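The split sizes can be checked with a short piece of arithmetic. The sketch below assumes the 4320 groups are uniformly spaced over the 30 days, which the text does not state explicitly; under that assumption there are 144 groups per day (one every 10 minutes), the training set covers the first 27 days, and the test set covers the last 3 days. The helper name `chronological_split` is hypothetical.

```python
def chronological_split(records, n_train):
    """Split time-ordered records into a leading training part and a trailing test part."""
    return records[:n_train], records[n_train:]

TOTAL, N_TRAIN, DAYS = 4320, 3888, 30
samples_per_day = TOTAL // DAYS                  # 144 if sampling is uniform
interval_minutes = 24 * 60 // samples_per_day    # 10-minute spacing under that assumption
records = list(range(TOTAL))                     # placeholder indices standing in for samples
train, test = chronological_split(records, N_TRAIN)
train_days = len(train) // samples_per_day       # 27 days
test_days = len(test) // samples_per_day         # 3 days
```

Keeping the split chronological rather than shuffled means the model is always evaluated on data recorded after everything it was trained on.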

(4) testing the prediction method and outputting the prediction result: based on the trained IELM network model, the prediction performance of the trained IELM network is tested on the test set and the prediction results for the test set are output; the traditional ELM network model is selected as the comparison algorithm, and the prediction results of the different algorithms on the test set are output;

the dissolved oxygen concentration is predicted using the IELM neural network model and the traditional ELM neural network model respectively, giving a prediction result graph of the dissolved oxygen concentration for the 432 groups of test set data; in the figure, the abscissa is the serial number of the test sample and the ordinate is the dissolved oxygen concentration value; taken together, the results show that both prediction models can predict the dissolved oxygen, but their prediction performance differs considerably; the dissolved oxygen prediction of the IELM neural network model is closer to the measured dissolved oxygen concentration; however, for both models the prediction curves fluctuate noticeably more between samples No. 140 to 185 and No. 275 to 339 of the test set than elsewhere; these intervals correspond to 00:00 to 07:00, the period of the day in which the dissolved oxygen concentration of the water is lowest and in which the respiration of microorganisms and plants in the water is frequent;

Comparative analysis of the prediction models:

based on the trained and tested IELM neural network prediction model, the dissolved oxygen concentration from July 28 to July 30, 2019 is predicted, giving the corresponding predicted values, the root mean square error (RMSE) and the mean absolute error (MAE); the results are compared with those of the traditional ELM neural network model, and, for reasons of space, only the 24 hourly dissolved oxygen predictions for July 28 are listed, as shown in Table 1.

TABLE 1 comparison of water dissolved oxygen prediction results for IELM and ELM prediction models

Time	Actual value	IELM prediction value	ELM prediction value
0:00	4.97	5.09	5.59
1:00	4.46	4.17	4.44
2:00	3.70	3.41	4.15
3:00	3.52	3.30	3.85
4:00	3.13	3.06	3.48
5:00	3.39	2.91	3.18
6:00	2.88	2.75	2.88
7:00	2.68	2.45	2.87
8:00	2.84	3.10	3.80
9:00	3.20	3.41	4.34
10:00	3.63	3.92	4.06
11:00	4.04	4.10	4.02
12:00	4.56	4.41	4.18
13:00	5.11	5.05	4.54
14:00	5.72	5.54	5.66
15:00	6.42	5.74	6.23
16:00	6.65	6.31	6.85
17:00	6.65	6.60	7.21
18:00	6.69	6.17	7.05
19:00	5.91	5.60	5.98
20:00	7.56	7.48	6.23
21:00	7.18	7.30	6.33
22:00	6.49	6.86	6.65
23:00	5.29	5.84	6.25
RMSE	/	0.35	0.64
MAE	/	0.25	0.44

According to the comparison, when the IELM neural network model is used to predict the dissolved oxygen concentration of the aquaculture water body, the root mean square error of the prediction is 0.35, clearly lower than the 0.64 of the ELM neural network model; likewise, the mean absolute errors of the IELM and ELM neural network model predictions for the whole day of July 28 are 0.25 and 0.44, respectively; the IELM network model therefore gives the higher prediction accuracy and the better prediction performance.
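The two error measures can be recomputed from the 24 hourly rows of Table 1 with a short Python sketch. It treats MAE as the mean absolute error, which is an assumption about the translated term. The table values are rounded to two decimals and the reported figures may have been computed on unrounded or additional samples, so the recomputed values need not match the reported ones exactly.

```python
import math

# (actual, IELM prediction, ELM prediction) for 0:00 to 23:00, copied from Table 1
ROWS = [
    (4.97, 5.09, 5.59), (4.46, 4.17, 4.44), (3.70, 3.41, 4.15), (3.52, 3.30, 3.85),
    (3.13, 3.06, 3.48), (3.39, 2.91, 3.18), (2.88, 2.75, 2.88), (2.68, 2.45, 2.87),
    (2.84, 3.10, 3.80), (3.20, 3.41, 4.34), (3.63, 3.92, 4.06), (4.04, 4.10, 4.02),
    (4.56, 4.41, 4.18), (5.11, 5.05, 4.54), (5.72, 5.54, 5.66), (6.42, 5.74, 6.23),
    (6.65, 6.31, 6.85), (6.65, 6.60, 7.21), (6.69, 6.17, 7.05), (5.91, 5.60, 5.98),
    (7.56, 7.48, 6.23), (7.18, 7.30, 6.33), (6.49, 6.86, 6.65), (5.29, 5.84, 6.25),
]

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [row[0] for row in ROWS]
ielm = [row[1] for row in ROWS]
elm = [row[2] for row in ROWS]
for name, pred in (("IELM", ielm), ("ELM", elm)):
    print(f"{name}: RMSE={rmse(actual, pred):.3f}  MAE={mae(actual, pred):.3f}")
```

On these 24 rows the IELM predictions have the lower error under both measures, which agrees with the direction of the comparison in the text.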

The invention takes the dissolved oxygen of the water body in industrial aquaculture as its research object and provides a prediction algorithm based on an extreme learning machine to predict it. Starting from raw data standardized by the data preprocessing method, the Pearson correlation coefficient method is used to screen the many factors that influence changes in dissolved oxygen, giving the inputs and output of the dissolved oxygen prediction model; the extreme learning machine is then improved with an artificial bee colony algorithm to construct the IELM neural network model. The method effectively avoids the computational problems caused by redundant information among multiple inputs in dissolved oxygen prediction and addresses the tendency of the traditional ELM neural network to fall into a local optimum during network training, thereby improving on the training speed and prediction accuracy of the traditional ELM network model. The method can be used to predict dissolved oxygen in industrial aquaculture production so as to obtain scientific, reasonable and accurate prediction results, safeguard aquaculture production and reduce aquaculture risk.

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture is characterized by comprising the following specific operation steps:

(1) data preprocessing;

(2) factor screening, and determining a dissolved oxygen prediction data set;

(3) constructing an IELM network model;

(4) testing the prediction method and outputting the prediction result.

2. The extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture according to claim 1,

in step (1), the data preprocessing operation process is as follows: deploying a dissolved oxygen sensor and a pH sensor in a test pond of an industrial aquaculture base, deploying an automatic weather station at the side of the pond, and acquiring water body parameter data and weather data in real time through a constructed wireless sensing network;

wherein, in the Z-Score standardization of the data set, $\bar{X}_n$ represents the mean value of $X_{mn}$, $S_n$ represents the standard deviation of $X_{mn}$, and the standardized values $Z_{mn}$ obtained by standardizing the raw data have a mean value of 0 and a variance of 1.

3. The extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture according to claim 1,

in step (2), the data set for dissolved oxygen prediction is determined by factor screening as follows: the standardized data set is analyzed using the Pearson correlation coefficient method; factors with little influence on the change in dissolved oxygen concentration are removed, and factors with a large influence are retained;

wherein $\bar{x}$ and $\bar{y}$ represent the mean values of the elements in vectors $x$ and $y$, respectively.

4. The extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture according to claim 1,

in step (3), the building of the IELM network model is: the network model comprises an input layer unit, a hidden layer unit and an output layer unit, the weight and the threshold of the extreme learning machine are optimized by utilizing an artificial bee colony algorithm to obtain the optimal initial values of the weight and the threshold of the extreme learning machine, and the specific operation process is as follows:

equation (6) is written compactly in matrix form as follows:

$$H\beta = T \tag{7}$$

$$\beta = H^{+}T \tag{9}$$

in the formula, $H^{+}$ represents the generalized inverse of the output matrix $H$;

$$P'_j = P_j + (P_j - P_N) \cdot (\mathrm{rand} - 0.5) \cdot 2 \tag{10}$$

iterative movement follows the principle that a bee moves to a new honey source when that source has better fitness, and the roulette method is used, according to the selection probability, to decide whether an onlooker bee follows the information of an employed bee;

and the trained IELM network model that meets the error condition is saved.

5. The extreme learning machine-based method for predicting dissolved oxygen in industrial aquaculture according to claim 1,

in step (4), testing the prediction method and outputting the prediction result means: based on the trained IELM network model, the prediction performance of the trained IELM network is tested on the test set and the prediction results for the test set are output; the traditional ELM network model is selected as the comparison algorithm, and the prediction results of the different algorithms on the test set are output.