CN115879039A - Quantitative analysis method for element content by combining support vector regression with gravity search - Google Patents

Info

Publication number
CN115879039A
Authority
CN
China
Prior art keywords
gsa
data
detected
content
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211396783.0A
Other languages
Chinese (zh)
Inventor
李福生
樊佳婧
杨婉琪
吕树彬
赵彦春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202211396783.0A
Publication of CN115879039A
Legal status: Pending

Landscapes

  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

The invention belongs to the field of X-ray fluorescence (XRF) quantitative elemental analysis and discloses a method for quantitatively analyzing element content by combining support vector regression (SVR) with the gravity search algorithm (GSA). The method comprises: determining the element to be detected and acquiring XRF spectral data of the sample to be detected with a spectrometer; determining peak information of the element to be detected from the spectral data; and constructing a GSA-SVR model, training it with a data set, and using the trained GSA-SVR model to predict the content of the element to be detected from its peak information. The data set is normalized and divided into a training set and a test set; an SVR prediction model is built from the training set and its performance is evaluated on the test set; a GSA-SVR model is then built from the training sample data after GSA optimization, and quantitative analysis of the element is carried out with this model. GSA-SVR-based quantitative analysis of element content can be widely applied in the field of XRF quantitative elemental analysis.

Description

Quantitative analysis method for element content by combining support vector regression with gravity search
Technical Field
The invention belongs to the field of quantitative elemental analysis with X-ray fluorescence (XRF) instruments, and particularly relates to a method for quantitative analysis of element content that combines support vector regression with gravity search.
Background
Currently, in XRF-based quantitative elemental analysis, the content of soil elements is conventionally calculated by measuring the intensities of the characteristic peaks of the elements in the spectrum, establishing a calibration model, and fitting the element content, for example with partial least squares regression (PLSR). In actual prediction of XRF element content, however, the spectral data are affected by many nonlinear interference factors: spectral lines of similar wavelength emitted by the X-ray tube, the sample and the optical path; lines that originate from the target material itself, including emission lines of the target element and related impurities (e.g., copper in a tungsten target); and interference lines emitted by other elements in the sample. Under these conditions, conventional linear analysis methods have shortcomings and their accuracy is not ideal. Nonlinear algorithms such as the convolutional neural network (CNN), support vector regression (SVR), the radial basis function (RBF) neural network and the back-propagation (BP) neural network have therefore been widely used in XRF quantitative elemental analysis, owing to their strong adaptive ability and their capacity to handle multi-element nonlinear data. Among these, the SVR algorithm has better generalization and prediction capability than the other algorithms and provides an accurate prediction model for small sample sets.
The support vector machine (SVM) was originally proposed for the binary classification problem, and support vector regression (SVR) is an important application branch of the SVM. SVR differs from SVM classification in that the sample points of SVR ultimately belong to a single class: the optimal hyperplane it seeks does not separate two or more classes of sample points with the widest margin, as in SVM classification, but minimizes the total deviation of all sample points from the hyperplane. Support vector regression is a supervised learning algorithm for predicting continuous values, and its basic idea is to find the best-fit line. In SVR, the best-fit line is the hyperplane that contains the largest number of sample points. Advantages of support vector regression include robustness to outliers, easy updating of the decision model, good generalization capability, high prediction accuracy and ease of implementation.
Through the above analysis, the problems and defects of the prior art are as follows: the existing soil element content analysis method has large analysis error and inaccurate and unreliable analysis result.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a method for quantitative analysis of element content that combines support vector regression with gravity search. Support vector regression is robust to outliers, allows the decision model to be updated easily, generalizes well, predicts accurately and is easy to implement; the gravity search algorithm optimizes the SVR parameters C and g, so as to address the influence of redundant features on model accuracy in the existing SVR modeling process.
The invention is realized in such a way that the element content quantitative analysis method combining support vector regression and gravity search comprises the following steps:
step one, selecting an element a to be detected, and collecting n selected samples;
step two, for the selected samples, acquiring XRF (X-ray fluorescence) spectral data of the samples to be detected with a spectrometer, and normalizing the data; determining peak information of the element to be detected based on the spectral data;
step three, based on the XRF spectral data obtained in step two, selecting the peak information and content information of the element a to be detected and the peak information and content information of m interference elements of the element a to be detected, to obtain a target sample set containing p features of the XRF spectrum;
step four, constructing a GSA-SVR model, training the GSA-SVR model with a data set, and using the trained GSA-SVR model to predict the content of the element to be detected from its peak information; to this end, dividing the data of step three into a training sample set and a test sample set: the first k target samples form the training sample set and the next n-k target samples form the test sample set; in both sets, the peak data of the element a to be detected are the input data of the SVM model and the content of the element a to be detected is the output data of the model;
step five, training and constructing the GSA-SVR model based on the k training samples of step four;
step six, substituting the input data of the n-k test samples of step four into the GSA-SVR model trained in step five for prediction, to obtain the content prediction results of the element a to be detected in the n-k test samples;
step seven, performing inverse normalization on the content prediction results of the element a to be detected obtained in step six;
further, each XRF spectrum data obtained in step two is obtained by ED-XRF fluorescence spectrometer test.
Further, before the GSA-SVR model is constructed and trained by using the data set, the following steps are required:
collecting a plurality of selected samples containing elements to be detected; for the selected sample, acquiring XRF (X-ray fluorescence) spectrum data of a target sample by using a spectrometer, and normalizing the spectrum data;
screening peak value information and content information of an element to be detected and peak value information and content information of a plurality of interference elements of the element to be detected based on the spectrum data to obtain a target sample set containing a plurality of characteristics of the XRF spectrum;
dividing a target sample set containing a plurality of characteristics of the obtained XRF spectrum to obtain a training sample set and a testing sample set;
the dividing the target sample set containing a plurality of features of the obtained XRF spectrum into a training sample set and a test sample set includes:
dividing first k target sample data of a target sample set containing p characteristics of an XRF spectrum into a training sample set;
taking the next n-k target sample data of a target sample set containing p characteristics of the XRF spectrum as a test sample set; where n represents the number of selected samples.
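The normalization, the "first k / next n-k" split and the later inverse normalization can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample values are hypothetical, and min-max scaling to [0,1] is assumed because the application example normalizes to the [0,1] interval.

```python
def minmax_fit(values):
    """Return (min, max) of a column so the scaling can be inverted later."""
    return min(values), max(values)


def minmax_apply(values, lo, hi):
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def minmax_invert(scaled, lo, hi):
    """Inverse normalization: map [0, 1] values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


def split_first_k(samples, k):
    """First k samples form the training set, the remaining n-k the test set."""
    return samples[:k], samples[k:]


# Hypothetical content values for n = 5 samples, with k = 3 used for training.
contents = [12.0, 30.0, 18.0, 24.0, 36.0]
lo, hi = minmax_fit(contents)
scaled = minmax_apply(contents, lo, hi)
train, test = split_first_k(scaled, 3)
restored = minmax_invert(test, lo, hi)
```

Keeping the (min, max) pair from the normalization step is what makes the inverse normalization of the predictions possible later.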
Further, the training of the GSA-SVR model by using the data set comprises:
firstly, determining the kernel function of the support vector machine as a Gaussian kernel function:
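$$K(x_i, x_j) = \exp\left(-g\,\|x_i - x_j\|^2\right);$$
wherein $K(x_i, x_j)$ represents the kernel function; $x_i$ and $x_j$ represent the input feature vectors of the i-th and j-th samples; g represents the kernel function parameter; $y_i$ represents the content value of the element a to be detected in the i-th sample, i.e. the regression target associated with $x_i$;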
secondly, carrying out parameter optimization on the support vector machine by using a GSA algorithm to obtain an optimal punishment parameter C and a kernel function parameter g;
the parameter optimization of the support vector machine by using the GSA algorithm comprises the following steps:
initializing the parameters of the support vector machine (SVM), and setting C and g in the SVM as the values to be optimized by the GSA (gravity search algorithm);
and initializing the positions and velocities for C and g, and training the SVM model with the training samples to obtain the GSA-SVR model with the optimal C and g.
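As a small illustration of the Gaussian kernel named above, the following Python sketch evaluates it for two feature vectors. The function name `rbf_kernel` and the example vectors are assumptions made for illustration; only the formula comes from the text.

```python
import math


def rbf_kernel(x_i, x_j, g):
    """Gaussian kernel exp(-g * ||x_i - x_j||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-g * sq_dist)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], g=0.5)   # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], g=0.5)  # squared distance 2
```

A larger g makes the kernel value fall off faster with distance, which is why g controls how local the fitted model is.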
Further, the parameter optimization of the support vector machine by using the GSA algorithm comprises the following steps:
respectively initializing position and speed in a solution space and a speed space, and setting iteration times, wherein the position represents the solution of the problem; determining the mass and the gravity of each individual by evaluating the objective function value of each individual, calculating the acceleration, and updating the speed and the position to obtain a GSA-SVR model with optimal C and g;
the step of determining the mass and the gravity of each individual, calculating the acceleration, and updating the speed and the position to obtain the GSA-SVR model with the optimal C and g comprises the following steps:
1) Calculating the mass of individual i:
$$m_i(t) = \frac{\mathrm{fit}_i(t) - \mathrm{worst}(t)}{\mathrm{best}(t) - \mathrm{worst}(t)}$$
$$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$$
wherein $\mathrm{fit}_i(t)$ and $M_i(t)$ respectively represent the fitness function value and the mass of the i-th individual at the t-th iteration, and N is the number of individuals; $\mathrm{best}(t)$ and $\mathrm{worst}(t)$ represent the best and the worst fitness function value among all individuals at the t-th iteration:
$$\mathrm{best}(t) = \min_{j \in \{1,\ldots,N\}} \mathrm{fit}_j(t), \qquad \mathrm{worst}(t) = \max_{j \in \{1,\ldots,N\}} \mathrm{fit}_j(t)$$
2) Calculating the attraction force:
$$F_{ij}^{d}(t) = G(t)\,\frac{M_{pi}(t)\,M_{aj}(t)}{R_{ij}(t) + \varepsilon}\left(x_j^{d}(t) - x_i^{d}(t)\right)$$
$$F_i^{d}(t) = \sum_{j \in kbest,\ j \neq i} \mathrm{rand}_j\, F_{ij}^{d}(t)$$
wherein $G(t)$ represents the value of the universal gravitation constant at the t-th iteration; $M_{aj}(t)$ is the active gravitational mass associated with individual j; $M_{pi}(t)$ is the passive gravitational mass associated with individual i; $R_{ij}(t)$ denotes the Euclidean distance between individuals i and j, $R_{ij}(t) = \|X_i(t), X_j(t)\|_2$; $\varepsilon$ is a constant used to prevent the denominator from being zero; $\mathrm{rand}_j$ is a random variable uniformly distributed on [0,1]; kbest denotes the first k individuals when the individuals are sorted by mass in descending order, where k decreases linearly with the number of iterations from an initial value of N to a final value of 1;
3) Calculating the acceleration:
$$a_i^{d}(t) = \frac{F_i^{d}(t)}{M_{ii}(t)}$$
wherein $M_{ii}(t)$ represents the inertial mass of individual i at the t-th iteration;
4) Updating the velocity and position:
$$v_i^{d}(t+1) = r \cdot v_i^{d}(t) + a_i^{d}(t)$$
$$x_i^{d}(t+1) = x_i^{d}(t) + v_i^{d}(t+1)$$
wherein r represents a random variable uniformly distributed on [0,1];
5) Judging whether the maximum number of iterations has been reached or the precision requirement is met; if so, outputting the GSA-SVR model with the optimal C and g; otherwise, returning to iterate again until the maximum number of iterations is reached or the precision requirement is met, and then outputting the GSA-SVR model with the optimal C and g.
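Steps 1) to 5) can be sketched as one loop. The Python below is a minimal, hedged illustration rather than the patented implementation. The exponential decay of G(t) with hypothetical constants `g0` and `alpha`, the clipping of positions to the search bounds, and the assumption that the passive and inertial masses of an individual are equal (so that they cancel in the acceleration) are not stated in the text.

```python
import math
import random


def gsa_minimize(fitness, bounds, n_agents=20, n_iter=50,
                 g0=100.0, alpha=20.0, eps=1e-12, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic gravity search loop.

    g0, alpha and the exponential decay of G(t) are assumptions; the text
    only says that G(t) is the gravitational constant at iteration t.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Positions start uniformly inside the bounds, velocities start at zero.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    V = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_fit = None, float("inf")

    for t in range(n_iter):
        fit = [fitness(x) for x in X]
        i_best = min(range(n_agents), key=lambda i: fit[i])
        if fit[i_best] < best_fit:
            best_fit, best_x = fit[i_best], list(X[i_best])

        # 1) Masses from fitness (minimization: best = min, worst = max).
        best, worst = min(fit), max(fit)
        if best == worst:
            m = [1.0] * n_agents
        else:
            m = [(f - worst) / (best - worst) for f in fit]
        total = sum(m)
        M = [mi / total for mi in m]

        # 2) G(t) and the kbest set; k shrinks linearly from N to 1.
        G = g0 * math.exp(-alpha * t / n_iter)
        k = max(1, round(n_agents - (n_agents - 1) * t / max(1, n_iter - 1)))
        kbest = sorted(range(n_agents), key=lambda j: M[j], reverse=True)[:k]

        # 3) Accelerations. Assuming M_pi = M_ii, the mass of individual i
        #    cancels in a_i = F_i / M_ii, which also avoids dividing by zero.
        A = []
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in kbest:
                if j == i:
                    continue
                R = math.dist(X[i], X[j])
                w = rng.random() * G * M[j] / (R + eps)  # one rand_j per pair
                for d in range(dim):
                    acc[d] += w * (X[j][d] - X[i][d])
            A.append(acc)

        # 4) Velocity and position update, clipped to the bounds.
        for i in range(n_agents):
            for d in range(dim):
                V[i][d] = rng.random() * V[i][d] + A[i][d]
                lo, hi = bounds[d]
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))

    # 5) Stop after n_iter iterations and return the best position seen.
    return best_x, best_fit
```

For the GSA-SVR use described here, each position would be a two-component vector holding C and g, and `fitness` would train an SVR with those values and return its prediction error.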
Further, the method for quantitatively analyzing the element content by combining the support vector regression with the gravity search further comprises the following steps:
testing the trained GSA-SVR model by using the test sample set to obtain a test result; carrying out reverse normalization processing on the obtained test result;
and evaluating the GSA-SVR model by calculating the mean square error and the goodness of fit of the GSA-SVR model.
Further, the calculating the mean square error and the goodness of fit of the GSA-SVR model comprises:
$$\mathrm{MSE} = \frac{1}{n-k}\sum_{i=1}^{n-k}\left(y_i - \hat{y}_i\right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n-k}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n-k}\left(y_i - \bar{y}\right)^2}$$
wherein $y_i$ represents the content value of the element a to be detected in the i-th test sample; $\hat{y}_i$ represents the predicted content of the element a to be detected in the i-th test sample after inverse normalization; $\bar{y}$ represents the average of the true content values of the element a to be detected over all test samples; i = 1, 2, …, n-k.
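The two evaluation indexes can be computed directly from the true and predicted contents. The following Python sketch assumes the usual definitions (mean of squared residuals, and one minus the ratio of the residual to the total sum of squares); the function names and the example values are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def goodness_of_fit(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted contents for four test samples.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
mse = mean_squared_error(y_true, y_pred)
r2 = goodness_of_fit(y_true, y_pred)
```

A lower MSE and an R² closer to 1 both indicate predictions that track the reference contents more closely.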
Another object of the present invention is to provide a system for quantitatively analyzing element content by combining support vector regression with gravity search, which implements the method for quantitatively analyzing element content by combining support vector regression with gravity search, the system for quantitatively analyzing element content by combining support vector regression with gravity search comprising:
the spectral data acquisition module for the element to be detected, which is used for determining the element to be detected and acquiring XRF spectral data of the sample to be detected with the spectrometer;
the peak information extraction module is used for determining the peak information of the element to be detected based on the spectral data;
the model construction module is used for constructing a GSA-SVR model;
the data set construction module is used for acquiring sample data containing elements to be detected and spectral data of the sample; screening peak value information and content information of a target element and peak value information and content information of a plurality of interference elements of an element to be detected based on the spectrum data to obtain a target sample set containing a plurality of characteristics of the XRF spectrum;
the model training module is used for training the constructed GSA-SVR model by combining the GSA with a training sample set;
and the content prediction module is used for predicting the content of the element to be detected based on the peak value information of the element to be detected by utilizing the trained GSA-SVR model.
Another object of the invention is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method for quantitative analysis of elemental content in support vector regression combined with gravity search.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the quantitative analysis of elemental content method in support vector regression in combination with gravity search.
Another object of the present invention is to provide an information data processing terminal for implementing the system for quantitative analysis of element content by support vector regression in combination with gravity search.
By combining the technical scheme and the technical problem to be solved, the technical scheme to be protected by the invention has the advantages and positive effects that:
in quantitative elemental analysis of XRF spectra, sample features and element content have a definite correspondence, so the training and prediction process of the model can be completed by dividing the data into a training set and a test set. However, the sample feature dimension is high and redundant feature information exists, so training on the original feature information takes a long time; the invention therefore introduces the GSA algorithm on the basis of SVR to search for the optimal parameters C and g. The data set is normalized and divided into a training set and a test set; an SVR prediction model is built from the training set and its performance is evaluated on the test set; the GSA-SVR model is then built from the training sample data after GSA optimization, and quantitative analysis of elements is carried out with it. Compared with the prior art, GSA-SVR-based quantitative analysis of element content has higher estimation accuracy and is a reliable way to improve prediction accuracy in XRF quantitative elemental analysis. It can be widely applied in the field of XRF quantitative elemental analysis.
The invention provides an element content quantitative analysis method combining support vector regression and gravity search, belongs to supervised machine learning, and carries out modeling work on the premise that XRF spectral data exist and the nonlinear relation between the characteristic and the element content can be acquired through data.
Drawings
FIG. 1 is a schematic diagram of a method for quantitative analysis of element content by support vector regression in combination with gravity search according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for quantitative analysis of element content by support vector regression in combination with gravity search according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an optimization training of a GSA-SVR model provided by an embodiment of the present invention;
FIG. 4 is a diagram of the effect of element analysis of a GSA-SVR model provided by an embodiment of the present invention;
FIG. 5 is a diagram of the effect of element analysis of an SVM model provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
This section is an illustrative example developed to explain the claims in order to enable those skilled in the art to fully understand how to implement the present invention.
As shown in fig. 1 to fig. 3, the method for quantitatively analyzing element content by combining support vector regression and gravity search provided in the embodiment of the present invention completes quantitative analysis of heavy metal element content in soil by using the analysis method, including:
s101, determining an element to be detected, and acquiring XRF (X-ray fluorescence) spectrum data of a sample to be detected by using a spectrometer; determining peak information of the element to be measured based on the spectral data;
s102, constructing a GSA-SVR model, training the GSA-SVR model by using a data set, and predicting the content of the element to be detected by using the trained GSA-SVR model based on the peak value information of the element to be detected.
The method for quantitatively analyzing the element content by combining the support vector regression with the gravity search specifically comprises the following steps:
Step 1: selecting an element a to be detected, and collecting n selected samples;
Step 2: for the selected samples obtained in step 1, measuring the XRF spectral data of the target samples with a spectrometer, and normalizing the data; each XRF spectrum is obtained with an ED-XRF fluorescence spectrometer.
Step 3: based on the XRF spectral data obtained in step 2, selecting the peak information and content information of the element a to be detected and the peak information and content information of m interference elements of the element a to be detected, to obtain a target sample set containing p features of the XRF spectrum;
Step 4: dividing the data of step 3 into a training sample set and a test sample set; the first k target samples form the training sample set and the next n-k target samples form the test sample set; in both sets, the peak data of the element a to be detected are the input data of the SVM model and the content of the element a to be detected is the output data of the model;
Step 5: training and constructing the GSA-SVR model based on the k training samples of step 4:
Step 5.1:
The SVM projects the samples into a high-dimensional space through a kernel function and finds an optimal classification hyperplane in that space, separating different samples as far as possible with the maximum margin. When the samples are not completely separable, a soft margin needs to be introduced to alleviate the problem. The soft margin allows the SVM to misclassify some ambiguous samples, and the optimization objective is shown in formula (3):
$$\min_{\omega,\,b,\,\xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\xi_i \tag{3}$$
$$\text{s.t.}\quad y_i\,(\omega \cdot x_i + b) \ge 1 - \xi_i,\quad i = 1, \ldots, N$$
In the formula, $\omega$ is the normal vector of the maximum-margin hyperplane; $\xi_i$ is a slack variable; b is the bias; C is the penalty parameter, which determines how strongly the SVM penalizes misclassified samples, realizes the compromise between the maximum classification margin and the minimum number of misclassified samples, and has a significant influence on the classification performance of the SVM.
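Step 5.2:
In order to solve the convex quadratic programming problem, a Lagrange function and a kernel function are introduced, and formula (3) is converted into formula (4):
$$K(x_i, x_j) = \exp\left(-g\,\|x_i - x_j\|^2\right)$$
$$\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j) \tag{4}$$
$$\text{s.t.}\quad \sum_{i=1}^{N}\alpha_i\,y_i = 0,\quad 0 \le \alpha_i \le C,\quad i = 1, \ldots, N$$
In the formula, $\alpha_i$ is the Lagrange multiplier and $K(x_i, x_j)$ is the kernel function.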
Step 5.3:
The invention selects the general-purpose Gaussian radial basis function as the kernel function of the SVM; its expression is shown in formula (5):
$$K(x_i, x_j) = \exp\left(-g\,\|x_i - x_j\|^2\right) \tag{5}$$
In the formula, g is the kernel function parameter, which controls the range of action of the Gaussian kernel and is another important parameter with a significant influence on the classification capability of the SVM. For different classification objects, the combination of C and g that gives the best SVM performance differs; the SVM cannot optimize C and g by itself, so another algorithm needs to be introduced.
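Steps 5.1 to 5.3 are written for the classification form of the SVM. For the regression use in this method, a commonly used counterpart is the $\epsilon$-insensitive SVR objective shown below. It is given only as a hedged illustration: the tube width $\epsilon_{\mathrm{ins}}$, the second slack variable $\xi_i^*$ and the feature map $\varphi$ are assumptions that do not appear in the text, and $\epsilon_{\mathrm{ins}}$ is unrelated to the constant $\varepsilon$ used in the gravity search formulas.

$$\min_{\omega,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right)$$
$$\text{s.t.}\quad y_i - \left(\omega \cdot \varphi(x_i) + b\right) \le \epsilon_{\mathrm{ins}} + \xi_i,\quad \left(\omega \cdot \varphi(x_i) + b\right) - y_i \le \epsilon_{\mathrm{ins}} + \xi_i^*,\quad \xi_i,\ \xi_i^* \ge 0$$

With the Gaussian kernel, the resulting predictor takes the form $f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,K(x_i, x) + b$, so C and g play the same roles as in the classification case: C trades flatness against tolerated deviations and g sets the width of the kernel.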
Step 5.4: The gravity search algorithm first initializes the positions and velocities in the solution space and the velocity space, respectively, and sets the number of iterations, wherein a position represents a solution of the problem. For example, the position and velocity of the i-th search individual in the d-dimensional space are respectively expressed as:
$$X_i = \left(x_i^1, x_i^2, \ldots, x_i^d\right)$$
$$V_i = \left(v_i^1, v_i^2, \ldots, v_i^d\right)$$
wherein $x_i^d$ and $v_i^d$ respectively represent the position component and the velocity component of individual i in the d-th dimension. By evaluating the objective function value of each individual, the mass of each individual and the gravitational force it experiences are determined, the acceleration is calculated, and the velocity and position are updated.
Step 5.5: Calculating the mass
The mass of individual i is defined as follows:
$$m_i(t) = \frac{\mathrm{fit}_i(t) - \mathrm{worst}(t)}{\mathrm{best}(t) - \mathrm{worst}(t)}$$
$$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$$
wherein $\mathrm{fit}_i(t)$ and $M_i(t)$ respectively represent the fitness function value and the mass of the i-th individual at the t-th iteration, and N is the number of individuals; $\mathrm{best}(t)$ and $\mathrm{worst}(t)$ represent the best and the worst fitness function value among all individuals at the t-th iteration, which for a minimization problem are defined as follows:
$$\mathrm{best}(t) = \min_{j \in \{1,\ldots,N\}} \mathrm{fit}_j(t)$$
$$\mathrm{worst}(t) = \max_{j \in \{1,\ldots,N\}} \mathrm{fit}_j(t)$$
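A short worked example may help with the mass definition. The Python sketch below maps a list of fitness values to normalized masses for a minimization problem; the function name `gsa_masses` and the equal-mass handling of the degenerate case where all fitness values coincide are assumptions not taken from the text.

```python
def gsa_masses(fit):
    """Normalized masses M_i(t) from fitness values, for a minimization problem."""
    best, worst = min(fit), max(fit)
    if best == worst:
        # Degenerate case not covered by the text: give every individual equal mass.
        return [1.0 / len(fit)] * len(fit)
    m = [(f - worst) / (best - worst) for f in fit]
    total = sum(m)
    return [mi / total for mi in m]


# Three individuals with fitness 3, 1 and 2: the one with fitness 1 is best.
masses = gsa_masses([3.0, 1.0, 2.0])
```

Here the intermediate values are 0, 1 and 0.5, so the normalized masses are 0, 2/3 and 1/3: the best individual is the heaviest and the worst individual exerts no attraction.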
Step 5.6: Calculating the attraction force
The algorithm is derived from a simulation of the law of universal gravitation, but is not bound to the exact expression of the gravitation formula in physics. In the d-th dimension, the attraction of individual j on individual i is defined as follows:
$$F_{ij}^{d}(t) = G(t)\,\frac{M_{pi}(t)\,M_{aj}(t)}{R_{ij}(t) + \varepsilon}\left(x_j^{d}(t) - x_i^{d}(t)\right)$$
wherein $G(t)$ represents the value of the universal gravitation constant at the t-th iteration; $M_{aj}(t)$ is the active gravitational mass associated with individual j; $M_{pi}(t)$ is the passive gravitational mass associated with individual i; $R_{ij}(t)$ denotes the Euclidean distance between individuals i and j, $R_{ij}(t) = \|X_i(t), X_j(t)\|_2$; $\varepsilon$ is a constant that prevents the denominator from being zero.
In the d-th dimension, the resultant force on individual i is:
$$F_i^{d}(t) = \sum_{j \in kbest,\ j \neq i} \mathrm{rand}_j\, F_{ij}^{d}(t)$$
wherein $\mathrm{rand}_j$ is a random variable uniformly distributed on [0,1]; kbest denotes the first k individuals when the individuals are sorted by mass in descending order, where k decreases linearly with the number of iterations from an initial value of N to a final value of 1.
Step 5.7: Calculating the acceleration
According to Newton's second law, the acceleration of individual i in the d-th dimension is:
$$a_i^{d}(t) = \frac{F_i^{d}(t)}{M_{ii}(t)}$$
wherein $M_{ii}(t)$ is the inertial mass of individual i at the t-th iteration.
Step 5.8: Updating the velocity and position
$$v_i^{d}(t+1) = r \cdot v_i^{d}(t) + a_i^{d}(t)$$
$$x_i^{d}(t+1) = x_i^{d}(t) + v_i^{d}(t+1)$$
wherein r represents a random variable uniformly distributed on [0,1].
Step 5.9: When the maximum number of iterations is reached or the precision requirement is met, the algorithm ends and outputs the optimal solution; otherwise it returns to step 5.2 and enters the next iteration.
In step 5, the process of training and constructing the GSA-SVR model provided by the embodiment of the present invention is as follows:
A Gaussian kernel function is adopted as the kernel function of the support vector machine, so the SVM is governed by the penalty factor C and the Gaussian kernel parameter g. The GSA algorithm is used to optimize these SVM parameters: the optimal C and g are searched through continuous iterative optimization, which improves the accuracy of the model and reduces the error recognition rate. The GSA-SVR algorithm proceeds as follows:
Step 5.10: Initializing the parameters of the SVM, and setting C and g in the SVM as the values to be optimized by the GSA.
Step 5.11: Initializing the positions and velocities for C and g, and training the SVM model with the training samples.
Step 5.12: Predicting the test samples with the GSA-SVR model, and evaluating the performance of the model.
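One way to connect steps 5.10 to 5.12 is to treat each GSA position as a pair (C, g) and to use the model error as the fitness. The Python sketch below shows only that glue code, under stated assumptions: the search ranges are hypothetical, and `train_and_score` is a hypothetical callback that is expected to train an SVR with the given C and g and return an error such as the MSE.

```python
def decode_position(position, c_range=(0.1, 100.0), g_range=(0.01, 10.0)):
    """Clip a 2-D GSA position to the (hypothetical) search ranges; return (C, g)."""
    c = min(max(position[0], c_range[0]), c_range[1])
    g = min(max(position[1], g_range[0]), g_range[1])
    return c, g


def make_fitness(train_and_score):
    """Wrap a callback (C, g) -> error into a fitness function of a GSA position."""
    def fitness(position):
        c, g = decode_position(position)
        return train_and_score(c, g)
    return fitness


# Stand-in for SVR training: a made-up error surface with its minimum at C=10, g=1.
fitness = make_fitness(lambda c, g: (c - 10.0) ** 2 + (g - 1.0) ** 2)
error_at_optimum = fitness([10.0, 1.0])
clipped = decode_position([1000.0, -5.0])
```

In practice the callback could, for example, fit scikit-learn's `SVR(kernel="rbf", C=c, gamma=g)` on the training samples and return the mean squared error on held-out data; that pairing is one option and is not specified by the text.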
Step 6: substituting the input data of the n-k test samples of step 4 into the GSA-SVR model trained in step 5 for prediction, to obtain the content prediction results of the element a to be detected in the n-k test samples;
Step 7: performing inverse normalization on the content prediction results of the element a to be detected obtained in step 6;
Step 8: calculating two performance indexes of the GSA-SVR model, the mean square error (MSE) and the goodness of fit ($R^2$):
$$\mathrm{MSE} = \frac{1}{n-k}\sum_{i=1}^{n-k}\left(y_i - \hat{y}_i\right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n-k}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n-k}\left(y_i - \bar{y}\right)^2}$$
wherein $y_i$ is the content value of the element a to be detected in the i-th test sample, $\hat{y}_i$ is the predicted content of the element a to be detected in the i-th test sample after inverse normalization, and $\bar{y}$ is the average of the true content values of the element a to be detected over all test samples.
To demonstrate the inventiveness and technical value of the technical solution of the invention, this section gives an application example of the claimed technical solution on a specific product or related technology.
The method for quantitatively analyzing the element content by combining the support vector regression with the gravity search, provided by the embodiment of the invention, is applied to the analysis of the heavy metal elements in the soil, and comprises the following specific steps:
step 1: the Cu element was designated as element a to be measured and n =57 national standard samples were used as selected samples, the instrument used in this example being a hand-held ED-XRF spectrometer manufactured by taconid, model No. TS-XH4000-SOIL, the X-ray tube parameters of the apparatus under normal operation being 45KV and 25uA. 2048 full channel spectral patterns for all samples measured by an ED-XRF spectrometer instrument step 2: normalizing the peak value and the content information selected in the step 1 to a [0,1] interval; the resulting XRF spectral data were obtained from ED-XRF fluorescence spectroscopy.
Step 3: From the normalized data of step 2, extract the peak and content information of the element Cu to be detected, together with the peak information of the five interference elements Fe, Ni, P, Co and Mn that interfere with Cu, to obtain the original data set A;
Step 4: On the basis of the data of step 3, divide the data into a training sample set and a test sample set. The first 45 target samples form the training sample set, in which the peak data of the element Cu to be detected is the input data of the SVM model and the content of the element Cu is the output data of the model; the last 12 target samples form the test sample set, in which the peak data of the element Cu to be detected is the input data of the SVM model and the content of the element Cu is the output data of the model;
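The sequential split of step 4 (first 45 samples for training, last 12 for testing) can be sketched as follows. The helper name and the placeholder samples are hypothetical; the six-element feature vector (Cu peak plus five interference peaks) is an assumption about how data set A is laid out.

```python
def sequential_split(samples, n_train):
    """Split samples in their given order: first n_train for training, the rest for testing."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must be between 1 and len(samples) - 1")
    return samples[:n_train], samples[n_train:]


# Placeholder data set: 57 (features, content) pairs with dummy values.
samples = [([0.0] * 6, 0.0) for _ in range(57)]
train_set, test_set = sequential_split(samples, 45)
```

Because the split is by position rather than random, the order of the samples in the data set determines which samples are held out.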
Step 5: Train and construct the GSA-SVR model on the 45 training samples from step 4.
The GSA-SVR model is constructed as follows:
Step 5.1:
The SVM maps the samples into a high-dimensional space through a kernel function and finds the optimal separating hyperplane in that space, separating the different samples with the maximum margin. When the samples are not completely separable, a soft margin is introduced. The soft margin allows the SVM to misclassify some disputed samples, and the optimization objective is given by equation (1):
$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\xi_i \qquad (1)$$
$$\text{s.t.}\quad y_i\,(\omega \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, N$$
where $\omega$ is the normal vector of the maximum-margin hyperplane; $\xi_i$ is the slack variable; $b$ is the bias; and $C$ is the penalty parameter, which determines how heavily the SVM penalizes misclassified samples, trades off maximizing the margin against minimizing the number of misclassified samples, and has a significant influence on the performance of the SVM.
Step 5.2:
To solve this convex quadratic programming problem, the Lagrange function and the kernel function are introduced and, by the duality principle, equation (1) is converted into equation (2):
$$\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \qquad (2)$$
$$\text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,\quad i = 1, \dots, N$$
where $\alpha_i$ is the Lagrange multiplier and $K(x_i, x_j)$ is the kernel function.
Step 5.3:
The invention selects the widely used Gaussian radial basis function as the kernel function of the SVM, as expressed in equation (3):
$$K(x_i, x_j) = \exp\left(-g\,\|x_i - x_j\|^2\right) \qquad (3)$$
where $g$ is the kernel function parameter, which controls the range of action of the Gaussian kernel and is the other important parameter with a significant influence on the performance of the SVM. The combination of C and g that gives the best performance differs from one data set to another, and the SVM cannot optimize C and g by itself, so another algorithm has to be introduced.
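The Gaussian radial basis kernel of equation (3) can be written directly from its definition. The sketch below uses only the standard library; the function name `rbf_kernel` and the example vectors are illustrative assumptions.

```python
import math


def rbf_kernel(x_i, x_j, g):
    """Gaussian RBF kernel: exp(-g * squared Euclidean distance)."""
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-g * sq_dist)


# The same pair of points looks less similar as g grows (narrower kernel).
k_wide = rbf_kernel([0.0, 0.0], [1.0, 1.0], g=0.1)
k_narrow = rbf_kernel([0.0, 0.0], [1.0, 1.0], g=10.0)
```

A small g lets distant samples influence each other, while a large g makes the kernel local, which is why g has to be tuned together with C.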
Step 5.4: The gravitational search algorithm first initializes the positions and velocities in the solution space and the velocity space, respectively, and sets the number of iterations; a position represents a candidate solution of the problem. For example, the position and velocity of the i-th search individual in a $D$-dimensional search space are expressed as:
$$X_i = \left(x_i^1, \dots, x_i^d, \dots, x_i^D\right)$$
$$V_i = \left(v_i^1, \dots, v_i^d, \dots, v_i^D\right)$$
where $x_i^d$ and $v_i^d$ are the position component and the velocity component of individual i in the d-th dimension. By evaluating the objective function value of each individual, the mass of each individual and the gravitational force acting on it are determined, the acceleration is calculated, and the velocity and position are updated.
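The initialization in step 5.4 can be sketched as below. This is a minimal sketch under stated assumptions: positions are drawn uniformly inside a search box, velocities start at zero, and the two dimensions stand for log2(C) and log2(g) with hypothetical ranges. None of these choices is specified in the source.

```python
import random


def init_population(n_individuals, bounds, rng):
    """Create random positions inside `bounds` and zero velocities.

    bounds: one (low, high) pair per dimension of the search space.
    """
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_individuals)]
    velocities = [[0.0] * len(bounds) for _ in range(n_individuals)]
    return positions, velocities


rng = random.Random(0)
# Assumed search box: dimension 0 is log2(C), dimension 1 is log2(g).
bounds = [(-5.0, 15.0), (-15.0, 3.0)]
positions, velocities = init_population(20, bounds, rng)
```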
Step 5.5: Calculate the mass
The mass of an individual i is defined as follows:
$$m_i(t) = \frac{fit_i(t) - worst(t)}{best(t) - worst(t)}$$
$$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$$
where $fit_i(t)$ and $M_i(t)$ are the fitness function value and the mass of the i-th individual at the t-th iteration, and $N$ is the number of individuals; $best(t)$ and $worst(t)$ are the best and the worst fitness function values among all individuals at the t-th iteration, which for a minimization problem are defined as follows:
$$best(t) = \min_{j \in \{1, \dots, N\}} fit_j(t)$$
$$worst(t) = \max_{j \in \{1, \dots, N\}} fit_j(t)$$
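The mass calculation of step 5.5 for a minimization problem can be sketched as follows. The function name is hypothetical, and the equal-mass fallback when all fitness values coincide is an added guard against division by zero that the source does not discuss.

```python
def masses(fitness):
    """Normalized masses for a minimization problem: lower fitness gives higher mass."""
    best, worst = min(fitness), max(fitness)
    if best == worst:
        # Assumption: all individuals are equally good, so share the mass equally.
        return [1.0 / len(fitness)] * len(fitness)
    raw = [(f - worst) / (best - worst) for f in fitness]
    total = sum(raw)
    return [m / total for m in raw]


# Three individuals with validation errors 0.1 (best), 0.3 and 0.5 (worst).
example_masses = masses([0.1, 0.3, 0.5])
```

The best individual receives the largest mass and the worst individual receives zero mass, so the worst individual attracts nobody.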
Step 5.6: Calculate the gravitational force
The algorithm is derived from a simulation of the law of universal gravitation, but it does not follow the exact expression of the gravitation formula in physics. In the d-th dimension, the attraction of an individual j on an individual i is defined as follows:
$$F_{ij}^d(t) = G(t)\,\frac{M_{pi}(t)\, M_{aj}(t)}{R_{ij}(t) + \varepsilon}\,\left(x_j^d(t) - x_i^d(t)\right)$$
where $G(t)$ is the value of the gravitational constant at the t-th iteration, $M_{aj}(t)$ is the active gravitational mass of individual j, $M_{pi}(t)$ is the passive gravitational mass of individual i, $R_{ij}(t) = \|X_i(t), X_j(t)\|_2$ is the Euclidean distance between individuals i and j, and $\varepsilon$ is a small constant that prevents the denominator from being zero.
In the d-th dimension, the resultant force acting on individual i is:
$$F_i^d(t) = \sum_{j \in kbest,\ j \ne i} rand_j\, F_{ij}^d(t)$$
where $rand_j$ is a random variable uniformly distributed in [0, 1], and $kbest$ is the set of the first k individuals when the individuals are sorted by mass in descending order. The value of k decreases linearly with the number of iterations, from an initial value of N to a final value of 1.
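The resultant force of step 5.6 can be sketched as below. The sketch assumes that the active and passive gravitational masses of an individual both equal its normalized mass, which is a common simplification and not stated in the source; the function names and the example values are hypothetical.

```python
import math
import random


def kbest_indices(mass, k):
    """Indices of the k heaviest individuals, heaviest first."""
    return sorted(range(len(mass)), key=lambda j: mass[j], reverse=True)[:k]


def total_force(i, positions, mass, kbest, g_const, rng, eps=1e-12):
    """Resultant force on individual i exerted by the individuals in `kbest`."""
    dim = len(positions[i])
    force = [0.0] * dim
    for j in kbest:
        if j == i:
            continue
        dist = math.dist(positions[i], positions[j])  # Euclidean distance R_ij
        # One random weight per attracting individual j.
        coeff = rng.random() * g_const * mass[i] * mass[j] / (dist + eps)
        for d in range(dim):
            force[d] += coeff * (positions[j][d] - positions[i][d])
    return force


example_positions = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
example_mass = [0.2, 0.5, 0.3]
example_kbest = kbest_indices(example_mass, 2)
example_force = total_force(0, example_positions, example_mass,
                            example_kbest, 1.0, random.Random(1))
```

Restricting the sum to `kbest` means that, late in the search, only the best individuals pull on the others, which shifts the algorithm from exploration to exploitation.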
Step 5.7: Calculate the acceleration
According to Newton's second law, the acceleration of an individual i in the d-th dimension is:
$$a_i^d(t) = \frac{F_i^d(t)}{M_{ii}(t)}$$
where $M_{ii}(t)$ is the inertial mass of individual i at the t-th iteration.
Step 5.8: Update the velocity and position
$$v_i^d(t+1) = r \cdot v_i^d(t) + a_i^d(t)$$
$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$$
where $r$ is a random variable uniformly distributed in [0, 1].
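Steps 5.7 and 5.8 can be sketched for a single individual as follows. The function name is hypothetical; drawing a fresh random number per dimension and adding a small `eps` to the inertial mass (so that a zero-mass individual does not cause a division by zero) are assumptions, not details given in the source.

```python
import random


def move(position, velocity, force, inertial_mass, rng, eps=1e-12):
    """Apply a = F / M, then v(t+1) = r * v(t) + a and x(t+1) = x(t) + v(t+1)."""
    new_position, new_velocity = [], []
    for x, v, f in zip(position, velocity, force):
        a = f / (inertial_mass + eps)   # acceleration in this dimension
        v_next = rng.random() * v + a   # randomly damped old velocity plus acceleration
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# An individual at rest at (1, 1), pushed by force (2, -4), with inertial mass 2.
new_x, new_v = move([1.0, 1.0], [0.0, 0.0], [2.0, -4.0], 2.0, random.Random(0))
```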
Step 5.9: When the maximum number of iterations is reached or the accuracy requirement is met, the algorithm ends and outputs the optimal solution; otherwise, it returns to the mass calculation of step 5.5 and enters the next iteration.
The process of training and constructing the GSA-SVR model is as follows:
A Gaussian kernel function is adopted as the kernel function of the support vector machine, so the SVM is governed by the penalty factor C and the Gaussian kernel parameter g. The GSA is used to optimize these SVM parameters: the optimal C and g are searched for by repeated iterative optimization, which improves the accuracy of the model and reduces its error rate. The GSA-SVR procedure is as follows:
Step 5.10: Initialize the parameters of the SVM, taking C and g of the SVM as the quantities to be optimized by the GSA.
Step 5.11: Initialize the positions and velocities according to C and g, and train the SVM model with the training samples.
Step 5.12: Predict the test samples with the GSA-SVR model and evaluate the performance of the model.
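The overall search for C and g can be sketched end to end as a basic gravitational search over the pair (log2 C, log2 g). This is a sketch under several assumptions that the source does not specify: the exponential decay of G(t) with `g0` and `alpha`, the log2 parameterization and its bounds, the clamping of positions to the search box, and the cancellation of the individual's own mass in the acceleration. The fitness used here is a hypothetical stand-in function; in practice it would be the validation MSE of an SVR trained with the decoded C and g.

```python
import math
import random


def gsa_minimize(fitness, bounds, n_individuals=20, n_iter=100,
                 g0=100.0, alpha=20.0, seed=0):
    """Minimize `fitness` inside the box `bounds` with a basic gravitational search."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_individuals)]
    vel = [[0.0] * dim for _ in range(n_individuals)]
    best_x, best_f = None, math.inf
    for t in range(n_iter + 1):
        fit = [fitness(x) for x in pos]
        for x, f in zip(pos, fit):
            if f < best_f:
                best_x, best_f = list(x), f
        if t == n_iter:
            break  # final evaluation only, no further move
        # Masses: lower fitness gives higher mass.
        lo_f, hi_f = min(fit), max(fit)
        if lo_f == hi_f:
            mass = [1.0 / n_individuals] * n_individuals
        else:
            raw = [(f - hi_f) / (lo_f - hi_f) for f in fit]
            total = sum(raw)
            mass = [m / total for m in raw]
        # Assumed schedules: exponential decay of G, linear decay of k from N to 1.
        g_t = g0 * math.exp(-alpha * t / n_iter)
        k = max(1, round(n_individuals - (n_individuals - 1) * t / max(1, n_iter - 1)))
        kbest = sorted(range(n_individuals), key=lambda j: mass[j], reverse=True)[:k]
        # Accelerations from the old positions (own mass cancels in F / M).
        accs = []
        for i in range(n_individuals):
            acc = [0.0] * dim
            for j in kbest:
                if j == i:
                    continue
                dist = math.dist(pos[i], pos[j])
                coeff = rng.random() * g_t * mass[j] / (dist + 1e-12)
                for d in range(dim):
                    acc[d] += coeff * (pos[j][d] - pos[i][d])
            accs.append(acc)
        # Velocity and position update, clamped to the search box.
        for i in range(n_individuals):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = rng.random() * vel[i][d] + accs[i][d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
    return best_x, best_f


def stand_in_validation_mse(x):
    """Hypothetical smooth surrogate for the validation MSE of an SVR with (C, g)."""
    log2_c, log2_g = x
    return (log2_c - 3.0) ** 2 + (log2_g + 4.0) ** 2


search_bounds = [(-5.0, 15.0), (-15.0, 3.0)]  # assumed ranges for log2(C), log2(g)
best_x, best_f = gsa_minimize(stand_in_validation_mse, search_bounds)
best_c, best_g = 2.0 ** best_x[0], 2.0 ** best_x[1]
```

Searching in log2 space is a design choice that lets one box cover several orders of magnitude of C and g; replacing `stand_in_validation_mse` with a function that trains an SVR and returns its validation error would turn the sketch into a GSA-SVR tuning loop.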
Step 6: Substitute the input data of the 12 test samples from step 4 into the GSA-SVR model trained in step 5 to obtain the predicted content of the element Cu to be detected in the 12 test samples;
Step 7: Perform inverse normalization on the predicted content of the element Cu obtained in step 6;
Step 8: Calculate two performance indexes of the GSA-SVR model, the mean square error (MSE) and the goodness of fit ($R^2$):
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ is the true content value of the element Cu to be detected in the i-th test sample, $\hat{y}_i$ is the predicted content value of the element Cu in the i-th test sample after inverse normalization, $\bar{y}$ is the mean of the true content values of the element Cu over all test samples, and $N = 12$ is the number of test samples.
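The two performance indexes of step 8 can be computed directly from their definitions. The sketch below uses the standard definitions of MSE and the coefficient of determination; the function names and the sample numbers are hypothetical and are not results from the patent.

```python
def mse(y_true, y_pred):
    """Mean square error between true and predicted contents."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
example_mse = mse(y_true, y_pred)
example_r2 = r2(y_true, y_pred)
```

Both indexes should be computed on the inverse-normalized predictions so that the MSE is expressed in the units of the original content values.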
The embodiment of the invention achieves positive effects in research, development and use, and has clear advantages over the prior art; these are described below with data and figures from the experiments.
The prediction results for the element Cu to be detected in soil obtained in this embodiment and in comparative example 1 are shown in FIGS. 4 to 5.
The following table compares the evaluation indexes of the embodiment and of comparative example 1 on the training set and the prediction set:
TABLE 1 Cu element content training set and prediction set performance index of GSA-SVR and SVM models
Figure BDA0003926135730000171
Table 1 lists two model performance evaluation indexes, R² and MSE, for the training set and the test set when the Cu content is predicted by the GSA-SVR and SVM models. As shown in the table, with GSA-SVR the training set R² and MSE are 0.992 and 0.005, respectively, and the test set R² and RMSE are 0.988 and 0.008, respectively; for the SVM model without GSA, the training set R² and RMSE are 0.918 and 0.039, respectively, and the test set R² and RMSE are 0.886 and 0.042, respectively. This shows that the established GSA-SVR model predicts the Cu element markedly better than the SVM model and fits the modeling data well; the MSE values of both the GSA-SVR and SVM models are small, and the errors are within an acceptable range.
In addition, the model evaluation indexes of the test set differ little from those of the training set, which shows that the model is not overfitted and generalizes well. Comparing the model evaluation indexes of GSA-SVR and SVM, the test set R² obtained with GSA-SVR is at least 11% higher than that of the SVM model (0.988 against 0.886), and the MSE values are all smaller than those of the SVM model; as can be seen from FIGS. 4 to 5, the rate of acceptable estimates of GSA-SVR is also higher. In conclusion, quantitative analysis of element content based on GSA-SVR has higher estimation precision and is a reliable method for improving prediction precision in XRF element quantitative analysis.
It should be noted that embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portions may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any modification, equivalent replacement, and improvement made by those skilled in the art within the technical scope of the present invention disclosed in the present invention should be covered within the scope of the present invention.

Claims (10)

1. A quantitative analysis method for element content combining support vector regression and gravity search is characterized by comprising the following steps:
determining an element to be detected, and acquiring XRF spectral data of a sample to be detected by using a spectrometer; determining peak information of the element to be measured based on the spectral data;
and constructing a GSA-SVR model, training the GSA-SVR model by using a data set, and predicting the content of the element to be detected by using the trained GSA-SVR model based on the peak information of the element to be detected.
2. The method for quantitative analysis of element content by support vector regression combined with gravity search as claimed in claim 1, specifically comprising:
step one, selecting an element a to be detected, and collecting n selected samples;
step two, acquiring XRF (X-ray fluorescence) spectral data of the samples to be detected by using a spectrometer, and normalizing the data; determining peak information of the element to be detected based on the spectral data;
step three, on the basis of the XRF spectral data obtained in step two, selecting the peak information and content information of the element a to be detected and the peak information and content information of m interference elements of the element a to be detected, to obtain a target sample set containing p features of the XRF spectrum;
step four, constructing a GSA-SVR model, training the constructed GSA-SVR model by using a data set, and predicting the content of the element to be detected based on the peak information of the element to be detected by using the trained GSA-SVR model; on the basis of the data of step three, dividing the data into a training sample set and a test sample set; the first k target samples form the training sample set, wherein the peak data of the element a to be detected is the input data of the SVM model and the content of the element a to be detected is the output data of the model; the remaining n-k target samples form the test sample set, wherein the peak data of the element a to be detected is the input data of the SVM model and the content of the element a to be detected is the output data of the model;
step five, training and constructing the GSA-SVR model based on the k training samples of step four;
step six, substituting the input data of the n-k test samples of step four into the GSA-SVR model trained in step five for prediction, to obtain the content prediction results of the element a to be detected in the n-k test samples;
step seven, performing inverse normalization on the content prediction results of the element a to be detected obtained in step six.
3. The method for quantitative analysis of element content by support vector regression in combination with gravity search as claimed in claim 1, wherein before the GSA-SVR model is constructed and trained by using the data set, the method further comprises:
collecting a plurality of selected samples containing elements to be tested; for the selected sample, acquiring XRF (X-ray fluorescence) spectrum data of a target sample by using a spectrometer, and normalizing the spectrum data;
screening peak value information and content information of an element to be detected and peak value information and content information of a plurality of interference elements of the element to be detected based on the spectrum data to obtain a target sample set containing a plurality of characteristics of the XRF spectrum;
dividing a target sample set containing a plurality of characteristics of the obtained XRF spectrum to obtain a training sample set and a testing sample set;
the dividing the target sample set containing a plurality of features of the obtained XRF spectrum into a training sample set and a testing sample set includes:
dividing first k target sample data of a target sample set containing p features of an XRF spectrum into a training sample set;
taking the next n-k target sample data of a target sample set containing p characteristics of the XRF spectrum as a test sample set; where n represents the number of selected samples.
4. The method for quantitative analysis of elemental content using support vector regression coupled with gravity search of claim 1, wherein the training of the GSA-SVR model using the data set comprises:
firstly, determining the kernel function of the support vector machine as a Gaussian kernel function:
$$K(x_i, x_j) = \exp\left(-g\,\|x_i - x_j\|^2\right);$$
wherein $K(x_i, x_j)$ represents the kernel function; $g$ represents the kernel function parameter; $x_i$ and $x_j$ represent the input feature vectors of the i-th and j-th samples;
secondly, carrying out parameter optimization on the support vector machine by using the GSA algorithm to obtain an optimal penalty parameter C and an optimal kernel function parameter g;
the parameter optimization of the support vector machine by using the GSA algorithm comprises the following steps:
initializing the parameters of the support vector machine (SVM), and taking C and g of the SVM as the quantities to be optimized by the GSA;
initializing positions and velocities according to C and g, and training the SVM model by using the training samples to obtain the GSA-SVR model with the optimal C and g.
5. The method for quantitative analysis of element content by support vector regression combined with gravity search as claimed in claim 4, wherein said parameter optimization of support vector machine by GSA algorithm comprises the steps of:
initializing the positions and velocities in the solution space and the velocity space, respectively, and setting the number of iterations, wherein a position represents a solution of the problem; determining the mass of each individual and the gravitational force acting on it by evaluating the objective function value of each individual, calculating the acceleration, and updating the velocity and position to obtain the GSA-SVR model with the optimal C and g;
the step of determining the mass of each individual and the gravitational force acting on it, calculating the acceleration, and updating the velocity and position to obtain the GSA-SVR model with the optimal C and g comprises the following steps:
1) Calculating the mass of individual i:
$$m_i(t) = \frac{fit_i(t) - worst(t)}{best(t) - worst(t)}$$
$$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$$
wherein $fit_i(t)$ and $M_i(t)$ respectively represent the fitness function value and the mass of the i-th individual at the t-th iteration; $best(t)$ and $worst(t)$ represent the best fitness function value and the worst fitness function value among all individuals at the t-th iteration:
$$best(t) = \min_{j \in \{1, \dots, N\}} fit_j(t), \qquad worst(t) = \max_{j \in \{1, \dots, N\}} fit_j(t)$$
2) Calculating the gravitational force:
$$F_{ij}^d(t) = G(t)\,\frac{M_{pi}(t)\, M_{aj}(t)}{R_{ij}(t) + \varepsilon}\,\left(x_j^d(t) - x_i^d(t)\right)$$
$$F_i^d(t) = \sum_{j \in kbest,\ j \ne i} rand_j\, F_{ij}^d(t)$$
wherein $G(t)$ represents the value of the gravitational constant at the t-th iteration, $M_{aj}(t)$ is the active gravitational mass of individual j, $M_{pi}(t)$ is the passive gravitational mass of individual i, $R_{ij}(t) = \|X_i(t), X_j(t)\|_2$ represents the Euclidean distance between individuals i and j, and $\varepsilon$ is a constant used to prevent the denominator from being zero; $rand_j$ represents a random variable uniformly distributed in [0, 1]; $kbest$ represents the first k individuals when the individuals are sorted by mass in descending order, and the value of k decreases linearly with the number of iterations, from an initial value of N to a final value of 1;
3) Calculating the acceleration:
$$a_i^d(t) = \frac{F_i^d(t)}{M_{ii}(t)}$$
wherein $M_{ii}(t)$ represents the inertial mass of individual i at the t-th iteration;
4) Updating the velocity and position:
$$v_i^d(t+1) = r \cdot v_i^d(t) + a_i^d(t)$$
$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$$
wherein $r$ represents a random variable uniformly distributed in [0, 1];
5) Judging whether the maximum number of iterations is reached or the accuracy requirement is met; if so, outputting the GSA-SVR model with the optimal C and g; otherwise, returning to the iteration until the maximum number of iterations is reached or the accuracy requirement is met, and then outputting the GSA-SVR model with the optimal C and g.
6. The method of claim 1, wherein the method further comprises:
testing the trained GSA-SVR model by using the test sample set to obtain a test result; performing inverse normalization on the obtained test result;
evaluating the GSA-SVR model by calculating the mean square error and the goodness of fit of the GSA-SVR model;
the calculating of the mean square error (MSE) and the goodness of fit ($R^2$) of the GSA-SVR model comprises:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
wherein $y_i$ represents the true content value of the element a to be detected in the i-th test sample, $\hat{y}_i$ represents the predicted content value of the element a to be detected in the i-th test sample after inverse normalization, $\bar{y}$ represents the mean of the true content values of the element a to be detected over all test samples, and $N$ represents the number of test samples.
7. An element content quantitative analysis system combining support vector regression and gravity search for implementing the method of element content quantitative analysis combining support vector regression and gravity search as claimed in any one of claims 1 to 6, wherein the element content quantitative analysis system combining support vector regression and gravity search comprises:
a spectral data acquisition module for the element to be detected, used for determining the element to be detected and acquiring XRF spectral data of the sample to be detected by using a spectrometer;
the peak information extraction module is used for determining the peak information of the element to be detected based on the spectral data;
the model construction module is used for constructing a GSA-SVR model;
the data set construction module is used for acquiring sample data containing elements to be detected and spectral data of the sample; screening peak value information and content information of a target element and peak value information and content information of a plurality of interference elements of an element to be detected based on the spectrum data to obtain a target sample set containing a plurality of characteristics of the XRF spectrum;
the model training module is used for training the constructed GSA-SVR model by combining the GSA with a training sample set;
and the content prediction module is used for predicting the content of the element to be detected based on the peak value information of the element to be detected by using the trained GSA-SVR model.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method for quantitative analysis of element content combining support vector regression and gravity search according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method for quantitative analysis of element content combining support vector regression and gravity search according to any one of claims 1 to 6.
10. An information data processing terminal, characterized in that the information data processing terminal is used for implementing the element content quantitative analysis system combining support vector regression and gravity search as claimed in claim 7.
CN202211396783.0A 2022-11-04 2022-11-04 Quantitative analysis method for element content by combining support vector regression with gravity search Pending CN115879039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211396783.0A CN115879039A (en) 2022-11-04 2022-11-04 Quantitative analysis method for element content by combining support vector regression with gravity search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211396783.0A CN115879039A (en) 2022-11-04 2022-11-04 Quantitative analysis method for element content by combining support vector regression with gravity search

Publications (1)

Publication Number Publication Date
CN115879039A true CN115879039A (en) 2023-03-31

Family

ID=85759558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211396783.0A Pending CN115879039A (en) 2022-11-04 2022-11-04 Quantitative analysis method for element content by combining support vector regression with gravity search

Country Status (1)

Country Link
CN (1) CN115879039A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116400028A (en) * 2023-05-29 2023-07-07 湖南汇湘轩生物科技股份有限公司 Essence quality detection method, system and medium based on smell sensor
CN116400028B (en) * 2023-05-29 2023-08-22 湖南汇湘轩生物科技股份有限公司 Essence quality detection method, system and medium based on smell sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination