CN110598836B - Metabolic analysis method based on improved particle swarm optimization algorithm - Google Patents


Publication number: CN110598836B
Authority: CN (China)
Prior art keywords: parameter, particle swarm, algorithm, iteration, grid search
Legal status: Active
Application number: CN201910967968.4A
Other languages: Chinese (zh)
Other versions: CN110598836A
Inventors: 王馨瑶, 唐业忠, 陆方, 王文波, 刘杨
Current assignee: Chengdu Institute of Biology of CAS (original assignee: same)
Priority application: CN201910967968.4A, filed by Chengdu Institute of Biology of CAS
Published as CN110598836A; granted as CN110598836B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention belongs to the field of artificial intelligence for medical biochemical testing and relates to machine learning and data analysis of metabolic components. The metabolome analysis and identification method based on an improved particle swarm optimization algorithm comprises the following steps: 1. establish a non-targeted metabolite database from human blood samples; 2. standardize the metabolome data; 3. obtain the optimal parameters of a support vector machine (SVM) with an improved particle swarm optimization algorithm; 4. model and classify the metabolome data with the SVM, and analyze and identify the characteristics of the blood samples. Applied to retinopathy of prematurity (ROP) metabolomics data and UCI machine-learning benchmark data sets, the method is shown to have excellent global optimization capability, high optimization speed, and good prediction accuracy and stability.

Description

Metabolic analysis method based on improved particle swarm optimization algorithm
Technical Field
The present invention relates to omics analysis, and in particular to the detection and analysis of human metabolites.
Background
In the post-genomic era, the various omics fields are developing rapidly, placing ever higher demands on the analysis of biomedical data. Omics research is characterized by the rapid, large-scale acquisition of biomedical data with high-sensitivity instruments, so statistical methods and analysis algorithms have become central to omics studies.
Unlike other omics such as genomics, transcriptomics, and proteomics, metabolomics detects the kind and abundance of metabolites through information such as the mass-to-charge ratio and retention time specific to each compound, and therefore contains no sequence information such as that of genes or proteins. Metabolomics also differs from earlier chemical analysis methods in being a non-targeted assay: all compounds are sampled indiscriminately, a metabolome profile is constructed from the kind and abundance of all compounds, and each sample is labelled with its profile. Because the metabolic components of a blood sample and their abundances are extremely complex, the corresponding computation speed becomes the technical bottleneck of the data analysis.
The support vector machine (SVM) is a machine learning method based on the structural risk minimization principle, with excellent generalization capability for small-sample, nonlinear, and high-dimensional pattern recognition. Parameter selection greatly influences the performance of a support vector machine, so finding the optimal SVM parameters as fast as possible is a key problem for its learning and generalization capability. Because the parameters can take many values and be combined in many ways, selecting them by manual experience is difficult and laborious. Scholars at home and abroad have therefore carried out much research on parameter optimization; the parameter optimization methods most common internationally at present are particle swarm optimization (PSO), the grid search algorithm, the genetic algorithm, and so on.
PSO is a global stochastic search algorithm inspired by the foraging behavior of bird flocks. In PSO, each potential solution of the optimization problem is a particle in the search space that updates itself by tracking the individual and global extrema. As a heuristic algorithm, PSO has the advantages of fast search, high efficiency, and simplicity, but it easily falls into local optima and has weak global search capability. The grid search method enumerates all permutations and combinations of the possible values of each parameter to form a grid; it is an exhaustive search over specified parameter values, with the drawbacks of high complexity and a heavy computational load.
Disclosure of Invention
In order to solve problems of the prior art in analyzing metabolomic data, such as long computation time and the complexity of the parameter combinations, the invention provides a fast, accurate, and efficient omics analysis method based on an improved particle swarm optimization algorithm.
To achieve the aim of the invention, the following technical scheme is adopted: a metabolomics analysis method based on an improved particle swarm optimization algorithm, comprising the following steps:
step one: non-target detection is carried out on the patient and the control blood sample, a metabolome database is obtained, and classification marking is carried out on the blood sample; dividing a training set sample and a test set sample; the kernel function parameter g and the penalty function c are used as super parameters in an SVM algorithm, and the maximum value, the minimum value and the difference f of parameter optimization are determined;
step two: initializing the speed, position and individual history optimization and global optimization of particles in a particle swarm;
step three: introducing a new asynchronous learning function and a new inertia weight function, and updating the speed and the position of the particles;
step four: analyzing the aggregation degree of the population after each iteration in the particle swarm algorithm, taking the aggregation degree of particles in a certain range near the particles finding the global optimal value as a trigger variation condition, and introducing a Logistic function to enable the particles selected in the range to carry out chaotic variation;
step five: carrying out self-adaptive adjustment on parameter combinations obtained by calculating particle swarm;
step six: carrying out data processing on the parameter combination obtained in the fifth step, and then carrying out coarse grid searching;
step seven: performing self-adaptive adjustment and parameter selection on the parameter combination obtained by searching the coarse grid;
step eight: performing fine grid search after data processing on the parameter combination obtained in the step seven, and outputting optimal parameters;
step nine: and constructing a support vector machine by utilizing the optimal parameters, and testing and analyzing the test set data.
In step one, both the training set and the test set contain samples from the ROP diseased group and the non-diseased group. After the training set and the test set are divided, a support vector machine based on the improved particle swarm optimization is constructed on the training set and used to detect and analyze the pathological characteristics of the test set samples.
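The patent gives no concrete implementation for the bookkeeping in step one. As a minimal sketch (the function name, split fraction, and penalty-parameter range are illustrative assumptions, not values from the patent), the train/test split and the optimization span f might look like:

```python
import random

def split_and_span(samples, labels, test_frac=0.3, c_min=0.01, c_max=1000.0, seed=0):
    """Shuffle-split a labelled metabolome table into train/test sets and
    record the optimization span f = max - min of the SVM penalty parameter c."""
    rng = random.Random(seed)                 # fixed seed for a reproducible split
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    f = c_max - c_min                         # span used later by the grid searches
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test], f)
```

Any stratified or cross-validated split would serve equally well here; the point is only that f is fixed before the swarm and grid stages run.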
Preferably, in step two, each particle in the PSO acquires and updates its velocity and position by the following formulas:
v_i = w × v_i + c_1 × r_1 × (pbest_i − x_i) + c_2 × r_2 × (gbest_i − x_i)
x_i = x_i + v_i
In the above formulas, w is the inertia weight; i = 1, 2, ..., N, where N is the total number of particles in the swarm; v_i is the velocity of the particle; r_1 and r_2 are random numbers in (0, 1); x_i is the current position of the particle; c_1 and c_2 are learning factors; pbest_i is the best position found by the individual; gbest_i is the best position found by any particle in the swarm so far.
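The update formulas above can be sketched for a single scalar particle (one-dimensional position; the function name is an illustrative assumption):

```python
import random

def pso_update(x, v, pbest, gbest, w, c1, c2):
    """One PSO step: v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x); x = x + v."""
    r1, r2 = random.random(), random.random()       # r1, r2 ~ U(0, 1)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```

With c_1 = c_2 = 0 the particle simply coasts on its inertia; with w = 0 it is pulled purely toward the individual and global bests.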
The new asynchronous learning functions in step three are:
c_1 = sin(2*t/T + 2) + 0.2*sin(10*t/T + 20) + 1
c_2 = -sin(2*t/T + 2) - 0.2*sin(10*t/T + 20) + 3
where t is the current iteration number, T is the total number of iterations, c_1 is the cognitive learning factor, and c_2 is the social learning factor.
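The two formulas above are mirror images: their oscillating parts cancel, so c_1 + c_2 = 4 at every iteration. A direct transcription (the function name is an assumption):

```python
import math

def learning_factors(t, T):
    """Cognitive (c1) and social (c2) learning factors from the patent's formulas."""
    base = math.sin(2 * t / T + 2) + 0.2 * math.sin(10 * t / T + 20)
    return base + 1, 3 - base   # c1 + c2 = 4 for every iteration t
```

Over a run, c_1 falls from roughly 2.1 toward 0 while c_2 rises correspondingly, shifting influence from self-cognition to the swarm.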
The weight factors include the inertia weight w and the learning factors c_1 and c_2. The inertia keeps the particles in motion, giving them a tendency to expand the search space and the capability to explore new regions. c_1 and c_2 are the weights of the statistical acceleration terms that push each particle toward the individual best and the population best positions; lower values let a particle wander outside the target region before being pulled back, while higher values make it rush suddenly toward, or overshoot, the target region.
The new piecewise inertia weight function in step three, which varies with the iteration number, is:
w = sin(t/(0.66*T) + 1.5) + 0.1*sin(10*(t/T) - 2)
where t is the current iteration number, T is the total number of iterations, and w is the inertia weight.
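A direct transcription of the inertia weight formula (the function name is an assumption); the weight stays positive and ends lower than it starts, consistent with the early-exploration/late-convergence design described below:

```python
import math

def inertia_weight(t, T):
    """Piecewise inertia weight w(t) from the patent's formula."""
    return math.sin(t / (0.66 * T) + 1.5) + 0.1 * math.sin(10 * (t / T) - 2)
```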
In step four, the number of particles within a certain range around the particle holding the global best is counted at each iteration; if this number exceeds the limit, a Logistic function is introduced to apply chaotic mutation to the selected particles, and the fitness of each new particle is computed; if the fitness of a new particle is better than that of the original particle, the new particle replaces the original one in the iteration; otherwise, the original particle continues to iterate.
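As an illustrative sketch of step four (the function names, the scalar one-dimensional setting, and the choice of 5 map iterations are assumptions), the aggregation trigger and the Logistic-map mutation might be:

```python
def aggregation_count(positions, gbest, radius):
    """Number of particles within `radius` of the global-best position."""
    return sum(1 for x in positions if abs(x - gbest) <= radius)

def logistic_mutate(x, lo, hi, steps=5, mu=4.0):
    """Chaotic mutation of a scalar position via the Logistic map z' = mu*z*(1-z)."""
    z = (x - lo) / (hi - lo)              # normalize into [0, 1]
    z = min(max(z, 1e-6), 1.0 - 1e-6)     # keep away from the map's fixed points
    for _ in range(steps):
        z = mu * z * (1.0 - z)            # fully chaotic regime at mu = 4
    return lo + z * (hi - lo)             # map back into the search interval
```

When `aggregation_count` exceeds the chosen limit, each selected particle would be replaced by `logistic_mutate(x, ...)` only if the mutated position has better fitness.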
The adaptive parameter selection strategy comprises the following steps:
cross-validate the parameter combinations obtained by the particle swarm algorithm and sort the resulting accuracies; take the parameter combinations in the top 5 positions of the descending accuracy ranking, and compute the difference between the penalty parameter value at each of the first 4 positions and the penalty parameter value at the 5th position; if every difference is greater than 1/10 of f, take the parameter combination with the smallest penalty parameter as the optimal combination found by the particle swarm algorithm; if the difference of some item is smaller than 1/10 of f, take, among those items, the parameter combination whose penalty parameter gives the highest accuracy as the optimal combination (c_0, g_0).
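One possible reading of this selection rule (the representation of results as `(accuracy, c, g)` triples and the tie handling are assumptions; the patent text leaves some cases ambiguous):

```python
def adaptive_select(results, f):
    """Adaptive parameter selection over cross-validation results.
    results: list of (accuracy, c, g); f: span of the penalty-parameter range."""
    top5 = sorted(results, key=lambda r: r[0], reverse=True)[:5]
    c5 = top5[4][1]                                 # penalty of the 5th-ranked combo
    close = [r for r in top5[:4] if abs(r[1] - c5) < f / 10]
    if not close:
        # all top-4 penalties differ strongly from rank 5: prefer the smallest c
        best = min(top5, key=lambda r: r[1])
    else:
        # some penalties are close to rank 5: prefer highest accuracy among those
        best = max(close + [top5[4]], key=lambda r: r[0])
    return best[1], best[2]
```

The intent is to avoid over-fitting: a very large penalty c is only accepted when no nearly-as-accurate combination with a smaller c exists.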
The coarse grid search in step six comprises the following:
the penalty parameter is fixed at the optimal penalty parameter obtained by the particle swarm optimization, and a coarse grid search with step size 2-5 is performed on the kernel parameter within a certain range around g_0. The search is centered on the kernel parameter obtained by the particle swarm optimization, with a search space equal to 1/5 of the overall optimization range of the kernel parameter.
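The candidate set for this coarse grid can be sketched as follows (the function name and the clipping of the window to the overall range are assumptions; the step size would be chosen in the 2-5 range the text specifies):

```python
def coarse_grid_candidates(g0, g_min, g_max, step=2.0):
    """Kernel-parameter candidates for the coarse grid search: centered on the
    swarm's g0, spanning 1/5 of the overall range [g_min, g_max], clipped to it."""
    half = (g_max - g_min) / 5.0 / 2.0
    lo, hi = max(g_min, g0 - half), min(g_max, g0 + half)
    out, g = [], lo
    while g <= hi + 1e-9:
        out.append(g)
        g += step
    return out
```

Each candidate g would then be cross-validated with the penalty fixed at c_0.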
The adaptive parameter selection strategy in step seven comprises the following:
cross-validate the parameter combinations obtained by the coarse grid search and sort the resulting accuracies; take the parameter combinations in the top 10 positions of the descending accuracy ranking, and compute the difference between the kernel parameter value at each of the first 9 positions and the kernel parameter value at the 10th position; if every difference is greater than 1/10 of f, take the parameter combination with the smallest kernel parameter g as the optimal combination found by the coarse grid search; if the difference of some item is smaller than 1/10 of f, take, among those items, the parameter combination whose kernel parameter gives the highest accuracy as the optimal combination found by the coarse grid search.
The fine grid search in step eight comprises the following:
a fine grid search with step size 1 is performed, centered on the optimal parameter combination obtained by the coarse grid search; the search space is 1/20 of the difference f between the maximum and the minimum of the overall parameter optimization, and each parameter combination is cross-validated; the combination with the highest accuracy is taken as the final optimal parameter combination. f is greater than 0; the minimum value is generally 0-1, the maximum value generally 0-1000, and f generally 0-1000, so precision is not affected.
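The fine-grid candidate set around the coarse-grid optimum can be sketched as (the function name and the symmetric two-axis window are assumptions):

```python
def fine_grid_candidates(c_star, g_star, f, step=1.0):
    """(c, g) candidates for the fine grid search: centered on the coarse-grid
    optimum, each axis spanning f/20, with step size 1."""
    half = f / 20.0 / 2.0
    axis, v = [], -half
    while v <= half + 1e-9:
        axis.append(v)
        v += step
    return [(c_star + dc, g_star + dg) for dc in axis for dg in axis]
```

Each candidate pair would be cross-validated, and the pair with the highest accuracy kept as the final optimum.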
The beneficial effects of the invention are as follows:
(1) For metabolic components of extreme complexity in kind and abundance, the invention improves the traditional particle swarm algorithm and provides a disease identification and diagnosis method, with good prediction accuracy and stability, that models the metabolome data and classifies the corresponding samples;
(2) The invention is suitable for non-targeted detection in metabolomics analysis and has good operability and practicality;
(3) Against the defect that the traditional particle swarm algorithm easily falls into local optima, the invention provides an improved particle swarm optimization algorithm; verification on UCI machine-learning benchmark data sets shows that the algorithm has superior global optimization capability, higher optimization speed, and good prediction accuracy and stability.
(4) Existing metabolomics studies focus on specific biomarkers: their main purpose is to find the metabolites that contribute most to the inter-group differences, i.e. the biomarkers. The present invention is an innovation in the analysis method itself: it distinguishes different groups as accurately as possible, which differs from the currently popular line of metabolomics analysis. In practice, for some samples no significantly different metabolites can be found with classical methods, which has much to do with the complexity of clinical samples. That, however, does not mean there is no distinguishable metabolic difference between the target group (e.g. a disease group) and the control group; for such cases our approach focuses on differences in the overall metabolic profile rather than in a handful of metabolites. The method is therefore suitable not only for studying disease mechanisms but also for developing rapid identification, screening, and diagnosis methods based on metabolic profiles.
(5) Based on an omics big-data environment, the method obtains more accurate metabolomics information and data analysis results; it can be widely applied to population management, medical insurance audit and supervision, drug research and development, identification of suspects, auxiliary disease detection, and other settings in which the general population and the target objects differ metabolically.
Drawings
FIG. 1 is a flow chart of the parameter optimization algorithm of the present invention;
FIG. 2 is a flow chart of the particle swarm algorithm of the present invention;
FIG. 3 is a graph of the new piecewise asynchronous learning functions of the present invention as a function of iteration number;
FIG. 4 is a graph of the new piecewise inertia weight function of the present invention as a function of iteration number;
FIG. 5 is a flow chart of the algorithm combination of the present invention;
FIG. 6 shows the adaptive parameter selection strategy used when combining the particle swarm algorithm with the coarse grid search;
FIG. 7 shows the adaptive parameter selection strategy used when combining the coarse grid search with the fine grid search.
Detailed Description
As shown in FIG. 1, FIG. 1 is the parameter optimization algorithm based on the combination of the improved particle swarm algorithm and the grid search method.
As a heuristic algorithm, the particle swarm algorithm easily falls into local optima. The grid search method is an exhaustive search that traverses all permutations and combinations of the possible values of every parameter, so its computational load and run time are extremely high. To address these problems, the invention combines the two algorithms and performs a fused search through the proposed adaptive parameter selection strategy.
The invention provides a parameter optimization algorithm based on the combination of an improved particle swarm algorithm and the grid search method, comprising the following steps:
step one: perform non-targeted detection on patient and control blood samples to obtain a metabolome database, and label the blood samples by class; divide the samples into a training set and a test set; take the kernel parameter g and the penalty parameter c as the hyperparameters of the SVM algorithm, and determine the maximum, the minimum, and their difference f for the parameter optimization;
step two: initialize the velocity and position of the particles in the swarm, as well as the individual historical bests and the global best;
step three: introduce a new asynchronous learning function and a new inertia weight function, and update the velocity and position of the particles;
step four: after each iteration of the particle swarm algorithm, analyze the aggregation of the population; use the aggregation of particles within a certain range around the particle holding the global best as the condition that triggers mutation, and introduce a Logistic function to apply chaotic mutation to the particles selected within that range;
step five: apply adaptive parameter selection to the parameter combinations computed by the particle swarm;
step six: after data processing of the obtained parameter combination, perform a coarse grid search;
step seven: apply adaptive parameter selection to the parameter combinations obtained by the coarse grid search;
step eight: after data processing of the obtained parameter combination, perform a fine grid search and output the optimal parameters;
step nine: construct a support vector machine with the optimal parameters, and test and analyze the test set data.
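The steps above chain the swarm search and the two grid searches. A heavily simplified sketch of that pipeline (stages five to eight), with the swarm's output passed in as a candidate list and the cross-validation accuracy abstracted behind an `evaluate(c, g)` callback, could look like this. The function names, the reduction of the adaptive selection rule to "best accuracy", and the fixed step sizes are all assumptions made for illustration:

```python
def optimize_parameters(evaluate, candidates, g_range, f, coarse_step=2.0):
    """Skeleton of the combined optimizer: swarm output -> selection ->
    coarse grid on g (c fixed) -> fine grid on (c, g) with step 1."""
    # stage 1: stand-in for swarm output plus adaptive selection
    c0, g0 = max(candidates, key=lambda p: evaluate(*p))
    # stage 2: coarse grid on g, span = 1/5 of the overall g range
    span = (g_range[1] - g_range[0]) / 5.0
    steps = int(span / 2.0 / coarse_step)
    gs = [g0 + k * coarse_step for k in range(-steps, steps + 1)]
    g1 = max((g for g in gs if g_range[0] <= g <= g_range[1]),
             key=lambda g: evaluate(c0, g))
    # stage 3: fine grid on (c, g), each axis spanning f/20, step 1
    half = f / 20.0 / 2.0
    offs = [-half + k * 1.0 for k in range(int(2.0 * half) + 1)]
    return max(((c0 + dc, g1 + dg) for dc in offs for dg in offs),
               key=lambda p: evaluate(*p))
```

In the real method, `evaluate` would be k-fold cross-validation of an SVM on the training set, and stage 1 would be the chaotic-mutation PSO with the full adaptive selection rule.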
In the first step, both the training set and the test set contain samples of the ROP diseased group and the non-diseased group.
As shown in FIG. 2, in step four, chaotic mutation is triggered by the degree of particle aggregation. Chaos lies between determinism and randomness and has rich spatio-temporal dynamics; its ergodicity can serve as an effective optimization mechanism for avoiding local minima during the search. As a heuristic algorithm, the particle swarm algorithm very easily falls into local minima during optimization, and most particles in the population end up gathered within a certain range around the particle holding the population best. This phenomenon increases the probability of getting stuck in a local minimum and also produces redundant computation in the particle swarm algorithm, increasing its run time. To solve these problems, chaotic mutation triggered by the degree of particle aggregation is proposed.
Chaotic mutation triggered by the degree of particle aggregation comprises the following:
at each iteration, count the number of particles within a certain range around the particle holding the global best; if the limit is exceeded, introduce a Logistic function to apply chaotic mutation to the selected particles and compute the fitness of the new particles; if the fitness of a new particle is better than that of the original, the new particle replaces the original in the iteration; otherwise the original particle continues to iterate.
In step three a new piecewise asynchronous learning function that varies with the iteration number is introduced. The learning factors c_1 and c_2 not only give the particles the ability to find the individual best and the population best, but also balance local and global search. A particle's ability to update from its own experience is related to c_1: if c_1 is 0, the particles have no self-cognition and rely only on social experience; the convergence speed increases, but the swarm easily falls into local optima. The ability of particles to cooperate and pass the information they obtain to other particles is related to c_2: if c_2 is 0, the particles lose this function and the convergence accuracy decreases.
As shown in FIG. 3, the new piecewise asynchronous learning functions introduced in step three, which vary with the iteration number, are characterized as follows:
the asynchronous learning functions studied by previous scholars focus on monotonic increase or decrease and ignore the rate of change of the learning factors, so they can hardly avoid premature convergence or late-stage oscillation during optimization. To solve these problems, new piecewise asynchronous learning functions varying with the iteration number are proposed:
c_1 = sin(2*t/T + 2) + 0.2*sin(10*t/T + 20) + 1
c_2 = -sin(2*t/T + 2) - 0.2*sin(10*t/T + 20) + 3
where t is the current iteration number, T is the total number of iterations, c_1 is the cognitive learning factor, and c_2 is the social learning factor.
(1) In the initial stage of the iteration, the particles should have strong global search capability; the influence of the individual bests on the particles weakens gradually while the influence of the global best strengthens. At this stage the velocities and positions of the particles are still highly random, so c_1 should decrease monotonically with a rate that grows from gentle to steep, giving the particles strong global search capability at this stage, while c_2 should increase monotonically with the iteration number with a rate that likewise grows from gentle to steep. The particles therefore do not converge quickly to local optima at this stage;
(2) In the middle stage of the iteration, the particles should retain some self-cognition while converging toward the global best, and still keep some ability to update their own bests. Hence c_1 should decrease monotonically with the iteration number with a gradually flattening rate, and c_2 should increase monotonically with a gradually flattening rate. The particles at this stage keep some global search capability while converging toward the optimal interval;
(3) At the end of the iteration, a reliable global best has been obtained after many iterations of the swarm. Considering the subsequent combination with the grid search methods, the particles should first converge quickly to the global best and then have their convergence speed restrained so that they retain some local search capability. Hence the decreasing rate of c_1 should first grow and then shrink with the iteration number, and the increasing rate of c_2 should likewise first grow and then shrink. The particles at this stage keep some local search capability while converging toward the global best and locate a suitable interval precisely, facilitating the subsequent grid search.
As shown in FIG. 4, a new piecewise inertia weight function varying with the iteration number is introduced in step three.
In a typical global optimization algorithm, strong global search capability is wanted in the early stage to locate a suitable interval, and fast convergence toward the individual and social bests is wanted in the later stage, so the traditional inertia weight strategy makes the weight monotonically decreasing. In the initial stage of optimization, however, the velocities and positions of the particles are highly random; a monotonically decreasing inertia weight at this stage makes the particles converge to the current population best and weakens global search. To solve this problem, a new piecewise inertia weight function varying with the iteration number is proposed:
w = sin(t/(0.66*T) + 1.5) + 0.1*sin(10*(t/T) - 2)
where t is the current iteration number, T is the total number of iterations, and w is the inertia weight.
The new piecewise inertia weight function varying with the iteration number is characterized as follows:
(1) In the initial stage of the iteration, the particles should have strong global search capability while converging toward the individual and global bests, so at this stage w should increase monotonically with the iteration number with a gradually shrinking rate, ensuring strong global search capability;
(2) In the middle stage of the iteration, the particles should have strong exploitation capability and a high convergence speed, so at this stage w should decrease monotonically with the iteration number with a gradually growing rate, improving the convergence speed and global search capability of the particles;
(3) At the end of the iteration, the particles should have strong local search capability while converging toward the global best, so at this stage w should decrease monotonically with the iteration number with a rate that first shrinks and then grows, ensuring strong local search early in the convergence toward the global best and fast, fine convergence late in it to improve the particle convergence rate.
As shown in FIG. 5, FIG. 5 is a flow chart of the algorithm combination of the present invention.
The traditional parameter selection strategy is as follows: several combinations of the penalty parameter c and the kernel parameter g may share the highest validation classification accuracy, and the combination with the smallest penalty parameter among them is selected as the optimal one. In practice, different parameter combinations often yield different validation accuracies, and combinations with larger parameter values often yield higher accuracies; an excessively high penalty parameter c, however, can cause over-fitting. The adaptive parameter selection strategy is proposed for these problems.
An adaptive tuning parameter selection strategy comprising the following:
1) In the sixth step, the parameter combination output by the particle swarm algorithm is subjected to self-adaptive adjustment and parameter selection strategies to obtain a group of optimal parameter combination, and coarse grid search is performed. Wherein the fixed penalty parameter is the optimal penalty parameter c obtained by the particle swarm algorithm, and g is as follows 0 And (5) carrying out rough grid search with the step length of 2-5 on the kernel function in a certain range nearby. The search range is grid search by taking the kernel function parameter obtained by the particle swarm optimization as the center and taking 1/5 of the integral optimizing range of the kernel function parameter as the search space.
2) In step eight, the parameter combination output by the coarse grid search is passed through the adaptive parameter selection strategy to obtain a single optimal combination, and a fine grid search is performed. The fine grid search uses a step size of 1, is centered on the optimal combination obtained by the coarse grid search, and has a search space of 1/20 of f (the difference between the maximum and minimum of the overall parameter optimization range). Each candidate combination is cross-validated, and the combination with the highest accuracy is taken as the final optimal parameter combination.
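The two grid stages in 1) and 2) can be sketched as follows. This is a hedged reconstruction: the function name `coarse_then_fine`, the `cv_accuracy` callback, and the concrete coarse step of 2 are illustrative assumptions, not the patent's own code:

```python
def frange(start, stop, step):
    """Inclusive float range helper."""
    vals, x = [], start
    while x <= stop + 1e-9:
        vals.append(x)
        x += step
    return vals

def coarse_then_fine(c0, g0, g_range, f, cv_accuracy, coarse_step=2.0):
    """Two-stage grid refinement around the PSO result (c0, g0).

    cv_accuracy(c, g) -> cross-validated accuracy (user-supplied stub);
    g_range is the overall (min, max) kernel-parameter range; f is the
    difference between the max and min of the overall parameter range.
    """
    # Coarse stage: c fixed at c0; g swept over a window equal to 1/5 of
    # its overall range, centred on g0, with a step of 2-5 (here 2).
    half = (g_range[1] - g_range[0]) / 5 / 2
    g1 = max(frange(g0 - half, g0 + half, coarse_step),
             key=lambda g: cv_accuracy(c0, g))
    # Fine stage: step 1 over a window of width f/20 around the coarse best.
    half = f / 20 / 2
    g2 = max(frange(g1 - half, g1 + half, 1.0),
             key=lambda g: cv_accuracy(c0, g))
    return c0, g2
```

With a toy accuracy surface peaked at g = 12, the two stages narrow the kernel parameter to within one fine step of the peak.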
Fig. 6 shows the adaptive parameter selection strategy used when combining the particle swarm algorithm with the coarse grid search method, as follows:
in step five, the parameter combinations obtained by the particle swarm algorithm are cross-validated and the resulting accuracies are sorted in descending order. The top 5 parameter combinations are taken, and the penalty parameter value of each of the top 4 is differenced against the penalty parameter value of the 5th. If every difference is greater than 1/10 of f, the combination with the smallest penalty parameter is taken as the optimal combination obtained by the particle swarm algorithm; if some difference is smaller than 1/10 of f, the combination whose penalty parameter has the highest accuracy among those items is taken as the optimal combination (c0, g0);
Fig. 7 shows the adaptive parameter selection strategy used when combining the coarse grid search method with the fine grid search method, as follows:
in step seven, the parameter combinations obtained by the coarse grid search are cross-validated and the resulting accuracies are sorted in descending order. The top 10 combinations are taken, and the kernel function parameter value of each of the top 9 is differenced against the kernel function parameter value of the 10th. If every difference is greater than 1/10 of the overall kernel-function-parameter optimization range, the combination with the smallest kernel function parameter is taken as the optimal combination obtained by the coarse grid search; if some difference is smaller than 1/10 of that range, the combination whose kernel function parameter has the highest corresponding accuracy among those items is taken as the optimal combination obtained by the coarse grid search.
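Both selection rules (top 5 on the penalty parameter, top 10 on the kernel function parameter) follow the same pattern. Below is a minimal sketch under one reasonable reading of the rule; the function name and the (parameter, accuracy) tuple layout are assumptions for illustration:

```python
def adaptive_select(results, top_k, threshold):
    """Adaptive parameter selection shared by the PSO stage (top_k=5,
    results keyed on the penalty parameter c) and the coarse-grid stage
    (top_k=10, results keyed on the kernel parameter g).

    results: list of (param_value, accuracy) pairs.
    threshold: 1/10 of the relevant overall optimisation range (e.g. f/10).
    """
    top = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
    last = top[-1][0]  # parameter value of the k-th ranked combination
    close = [r for r in top[:-1] if abs(r[0] - last) < threshold]
    if not close:
        # All leaders differ markedly from the k-th combination:
        # prefer the smallest parameter value to avoid over-fitting.
        return min(top, key=lambda r: r[0])
    # Some leaders are close in parameter value: among them, prefer
    # the one with the highest accuracy.
    return max(close, key=lambda r: r[1])
```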
The present invention was applied to metabolome analysis of retinopathy of prematurity (ROP). Table 1 shows the results obtained on 4 ROP metabolome data sets by the identification system based on the improved particle swarm optimization algorithm and by a conventional PSO identification system. R1 comprises metabolome samples from 134 ROP patients, with 126 blood components selected as features according to the metabolome analysis of a prior animal (rat) pathological model; R2 comprises the 119 samples obtained by removing outliers, after PCA, from the metabolome samples of 146 ROP patients, again with 126 blood components selected as features for each sample; R3 comprises metabolome samples from 134 ROP patients, with all 174 blood components extracted as features for each sample; R4 comprises metabolome samples from 134 ROP patients, with the 4 components common to the human and ROP rat model metabolomes extracted as features. Each data set was divided into training and test sets at a 5:1 ratio; the improved particle swarm algorithm was used to optimize a support vector machine, the optimized SVM was trained on the training set, and the resulting model was used for ROP classification of the test-set data. The experimental platform was Matlab R2014a on Windows 10, with an Intel(R) Core(TM) i5-9400 CPU and 8 GB of memory.
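The experiments above were run in Matlab. A rough Python equivalent of the evaluation pipeline (5:1 split, SVM training, test-set prediction), using scikit-learn and synthetic data standing in for the ROP metabolome sets, might look like this; the fixed (C, gamma) values are placeholders for the output of the improved PSO plus grid search:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for one metabolome set: 134 samples x 126 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(134, 126))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy ROP/normal labels

# 5:1 train/test split, as in the experiments.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/6, random_state=0)

# (C, gamma) would come from the improved PSO + grid search; fixed here.
clf = SVC(C=10.0, gamma=0.01).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
assert 0.0 <= acc <= 1.0
```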
TABLE 1
(Table 1 is provided as an image in the original publication and is not reproduced in this text.)
The categories in Table 1 are early-stage ROP and normal: class 1 is a group of neonates with stage I ROP, whose fundus images show the demarcation line between the avascular zone and the normal retina; class 2 is a group of normal neonates without ROP. The data in Table 1 show that, in metabolome identification, the improved particle swarm optimization algorithm combined with grid search has stronger global optimization capability than the traditional PSO algorithm combined with grid search, together with better identification stability and higher identification speed across the different metabolomes.
The invention provides an improved particle swarm optimization algorithm; validation on ROP metabolomic data shows superior global optimization capability along with good prediction accuracy and stability. Applied to metabolomic analysis of retinopathy of prematurity (ROP), the algorithm achieves good early-warning accuracy.

Claims (7)

1. A metabonomics analysis method based on an improved particle swarm optimization algorithm, comprising the steps of:
First: performing non-target detection on patient and control blood samples to obtain a metabolome database, constructing a mapping relationship between blood samples and pathological features, and dividing training-set and test-set samples;
Then: constructing a support vector machine based on improved particle swarm optimization from the training-set samples, and detecting and analyzing pathological features of the test-set samples; constructing the support vector machine based on improved particle swarm optimization comprises the following steps:
combining the improved particle swarm algorithm with the grid search method, and applying a new adaptive parameter selection strategy in the process of the algorithm combination;
wherein:
the improved particle swarm algorithm comprises:
initializing the velocity and position of the particles in the particle swarm, together with the individual historical best and the global best;
then introducing an asynchronous learning function and a new inertia weight function, and updating the velocity and position of the particles;
then, performing an aggregation-degree analysis on the population after each iteration of the particle swarm algorithm; the aggregation degree of particles within a certain range around the particle holding the global optimum is used as the condition triggering mutation, and a Logistic function is introduced so that the particles selected within that range undergo chaotic mutation; the algorithm combining the improved particle swarm algorithm with the grid search method comprises: performing a coarse grid search after the parameter combination obtained by the particle swarm algorithm has passed through the adaptive parameter selection strategy, and performing a fine grid search after the parameter combination obtained by the coarse grid search has passed through the adaptive parameter selection strategy; the asynchronous learning function has the formula:
c1 = sin(2*t/T + 2) + 0.2*sin(10*t/T + 20) + 1
c2 = -sin(2*t/T + 2) - 0.2*sin(10*t/T + 20) + 3
where c1 and c2 are the learning factors of the particle swarm, t is the current iteration number, and T is the total iteration number;
the new piecewise inertia weight function varies with the iteration count according to the formula:
w = sin(t/(0.66*T) + 1.5) + 0.1*sin(10*(t/T) - 2)
where w is the inertia weight of the particle swarm, t is the current iteration number, and T is the total iteration number.
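One property worth noting: the two learning-factor formulas are exact complements, c1 + c2 = 4 for every t, so the social weight grows exactly as the cognitive weight shrinks. A quick numeric check (illustrative only, not part of the claims):

```python
import math

def c1(t, T):
    """Cognitive learning factor from the claims."""
    return math.sin(2 * t / T + 2) + 0.2 * math.sin(10 * t / T + 20) + 1

def c2(t, T):
    """Social learning factor from the claims; equals 4 - c1."""
    return -math.sin(2 * t / T + 2) - 0.2 * math.sin(10 * t / T + 20) + 3

T = 100
for t in range(0, T + 1, 10):
    assert abs(c1(t, T) + c2(t, T) - 4.0) < 1e-12
# Over the full run, the cognitive factor decays and the social one grows.
assert c1(0, T) > c1(T, T) and c2(0, T) < c2(T, T)
```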
2. The metabonomics analysis method based on an improved particle swarm optimization algorithm according to claim 1, wherein performing the coarse grid search and the fine grid search on the optimal parameter combination obtained by the particle swarm algorithm through data processing comprises: fixing the penalty parameter at the optimal penalty parameter obtained by the particle swarm algorithm; performing a coarse grid search with a step size of 2-5 in the space near the kernel function parameter, the search range being centered on the kernel function parameter obtained by the particle swarm algorithm, with 1/5 of the overall kernel-function-parameter optimization range as the search space; and performing a fine grid search with a step size of 1 centered on the optimal parameter combination obtained by the coarse grid search through data processing, the search space being 1/20 of f, the difference between the maximum value and the minimum value of the overall parameter optimization range.
3. The metabonomics analysis method based on an improved particle swarm optimization algorithm according to claim 2, wherein the adaptive parameter selection strategy used when combining the particle swarm algorithm with the coarse grid search method comprises: cross-validating the parameter combinations obtained by the particle swarm algorithm and sorting the resulting accuracies; taking the top 5 parameter combinations in descending order of accuracy; computing the difference between the penalty parameter value of each of the top 4 combinations and that of the 5th; if every difference is greater than 1/10 of f, taking the combination with the smallest penalty parameter as the optimal parameter combination obtained by the particle swarm algorithm; and if some difference is smaller than 1/10 of f, taking the combination whose penalty parameter has the highest accuracy among those items as the optimal parameter combination obtained by the particle swarm algorithm.
4. The metabonomics analysis method based on an improved particle swarm optimization algorithm according to claim 2, wherein the adaptive parameter selection strategy used when combining the coarse grid search method with the fine grid search method comprises: cross-validating the parameter combinations obtained by the coarse grid search and sorting the resulting accuracies; taking the top 10 parameter combinations in descending order of accuracy; computing the difference between the kernel function parameter value of each of the top 9 combinations and that of the 10th; if every difference is greater than 1/10 of f, taking the combination with the smallest kernel function parameter as the optimal parameter combination obtained by the coarse grid search method; and if some difference is smaller than 1/10 of f, taking the combination whose kernel function parameter has the highest corresponding accuracy among those items as the optimal parameter combination obtained by the coarse grid search method.
5. The metabonomics analysis method based on an improved particle swarm optimization algorithm according to claim 1, wherein the asynchronous learning function behaves as follows:
(1) At the initial stage of the iteration, the decreasing trend of c1 strengthens gradually from a gentle start, and the increasing trend of c2 likewise strengthens as the iterations proceed, so that particles at this stage do not readily converge rapidly to a local optimum;
(2) In the middle of the iteration, the decreasing trend of c1 gradually weakens as the iteration count grows, and the increasing trend of c2 likewise weakens, so that particles at this stage retain a certain global search capability while converging toward the optimal interval;
(3) At the end of the iteration, the decreasing trend of c1 first strengthens and then weakens, as does the increasing trend of c2, so that particles at this stage retain a certain local search capability while converging to the global optimum, precisely locating a suitable interval for the subsequent grid search.
6. The metabonomics analysis method based on an improved particle swarm optimization algorithm according to claim 1, wherein the inertia weight function behaves as follows:
(1) In the initial stage of the iteration, w increases monotonically with the iteration count, with a gradually weakening increasing trend, giving the particles strong global search capability at this stage;
(2) In the middle stage of the iteration, w decreases monotonically with the iteration count, with a gradually strengthening decreasing trend, improving both the convergence speed and the global search capability of the particles at this stage;
(3) At the end of the iteration, w decreases monotonically with the iteration count, with a decreasing trend that first weakens and then strengthens, guaranteeing strong local search capability early in the convergence toward the global optimum and allowing rapid fine convergence later to improve the particle convergence rate.
7. The method of metabonomic analysis based on improved particle swarm optimization according to any of claims 1-6, wherein the blood sample is replaced by another biological sample of the subject.
CN201910967968.4A 2019-10-12 2019-10-12 Metabolic analysis method based on improved particle swarm optimization algorithm Active CN110598836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910967968.4A CN110598836B (en) 2019-10-12 2019-10-12 Metabolic analysis method based on improved particle swarm optimization algorithm


Publications (2)

Publication Number Publication Date
CN110598836A CN110598836A (en) 2019-12-20
CN110598836B true CN110598836B (en) 2023-04-28

Family

ID=68866797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910967968.4A Active CN110598836B (en) 2019-10-12 2019-10-12 Metabolic analysis method based on improved particle swarm optimization algorithm

Country Status (1)

Country Link
CN (1) CN110598836B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111110192A (en) * 2019-12-26 2020-05-08 北京中润普达信息技术有限公司 Skin abnormal symptom auxiliary diagnosis system
CN112331259B (en) * 2020-11-26 2024-04-12 浙江大学 Tissue metabolite information evaluation method, device and medium based on Bloch-McConnell equation simulation
CN112509704A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Acute coronary syndrome early warning method and device based on metabonomics data

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106963370A (en) * 2017-03-27 2017-07-21 广州视源电子科技股份有限公司 Electroencephalogram relaxation degree identification method and device based on support vector machine
CN108830372A (en) * 2018-06-08 2018-11-16 湖北工业大学 A kind of adaptive particle swarm optimization method of Traveling Salesman Problem
CN109033936A (en) * 2018-06-01 2018-12-18 齐鲁工业大学 A kind of cervical exfoliated cell core image-recognizing method


Non-Patent Citations (1)

Title
"基于支持向量机的火灾火焰识别算法研究";郑高;《中国优秀硕士学位论文全文数据库信息科技辑》;20130215;第4.2.5节 *

Also Published As

Publication number Publication date
CN110598836A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110598836B (en) Metabolic analysis method based on improved particle swarm optimization algorithm
CN108108762B (en) Nuclear extreme learning machine for coronary heart disease data and random forest classification method
Örkcü et al. Estimating the parameters of 3-p Weibull distribution using particle swarm optimization: A comprehensive experimental comparison
US20070208516A1 (en) Random forest modeling of cellular phenotypes
Schliebs et al. On the probabilistic optimization of spiking neural networks
Flores et al. Deep learning tackles single-cell analysis—a survey of deep learning for scRNA-seq analysis
CN105868775A (en) Imbalance sample classification method based on PSO (Particle Swarm Optimization) algorithm
CN110738362A (en) method for constructing prediction model based on improved multivariate cosmic algorithm
CN111402967B (en) Method for improving virtual screening capability of docking software based on machine learning algorithm
CN111079074A (en) Method for constructing prediction model based on improved sine and cosine algorithm
CN113255873A (en) Clustering longicorn herd optimization method, system, computer equipment and storage medium
Zhang et al. An improved MAHAKIL oversampling method for imbalanced dataset classification
Younis et al. A new sequential forward feature selection (SFFS) algorithm for mining best topological and biological features to predict protein complexes from protein–protein interaction networks (PPINs)
Phong et al. PSO-convolutional neural networks with heterogeneous learning rate
CN111429970A (en) Method and system for obtaining multi-gene risk scores by performing feature selection based on extreme gradient lifting method
Togatoropa et al. Optimizing Random Forest using Genetic Algorithm for Heart Disease Classification
Ahmed et al. Predicting and analysis of students’ academic performance using data mining techniques
CN116226629B (en) Multi-model feature selection method and system based on feature contribution
Abd Elaziz et al. Quantum artificial hummingbird algorithm for feature selection of social IoT
Pokhrel A comparison of AutoML hyperparameter optimization tools for tabular data
Ren et al. Investigating several fundamental properties of random lobster trees and random spider trees
Vanitha et al. Detection and diagnosis of hepatitis virus infection based on human blood smear data in machine learning segmentation technique
WO2022084696A1 (en) Drug optimisation by active learning
CN110782950B (en) Tumor key gene identification method based on preference grid and Lewy flight multi-target particle swarm algorithm
CN110334774A (en) A kind of Medical Images Classification algorithm improving MRMR and PSO optimization SVM based on weight

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant