CN104463251A

CN104463251A - Cancer gene expression profile data identification method based on integration of extreme learning machines

Info

Publication number: CN104463251A
Application number: CN201410773130.9A
Authority: CN
Inventors: 凌青华; 韩飞; 叶松林; 杨春; 崔宝祥
Original assignee: Jiangsu University of Science and Technology
Current assignee: Jiangsu University of Science and Technology
Priority date: 2014-12-15
Filing date: 2014-12-15
Publication date: 2015-03-25

Abstract

The invention discloses a cancer gene expression profile data identification method based on integration of extreme learning machines (ELMs). The method covers both the selection of member ELMs and their integration. Specifically, a cancer gene expression profile data set is preprocessed by gene selection and normalization of the expression profile data; N sample sets are generated by the Bagging method, and each sample set is divided into a training set and a validation set in a fixed proportion; N ELMs are trained on the N training sets, and the L ELMs (L < N) with the highest recognition rate on their corresponding validation sets are selected to form a candidate member ELM pool; K member ELMs (K < L) that make up the integrated system are selected from the L ELMs by the particle swarm optimization algorithm; the voting weights of the K member ELMs are computed by the minimum-norm least-squares method; the resulting integrated ELM system is then used to perform tumor recognition on new cancer gene expression profile samples. With this method, cancer gene expression profile data can be recognized quickly and accurately.

Description

Cancer gene expression profile data identification method based on integration of extreme learning machines

Technical field

The invention belongs to the field of computer analysis of cancer gene expression profile data, and specifically relates to a cancer gene expression profile data identification method based on an ensemble of extreme learning machines.

Background technology

In the life sciences, DNA microarray technology has brought unprecedented opportunities to biological and medical research, but the complex gene expression profile data it produces pose a major challenge to existing data analysis and processing methods. First, gene expression profile data have very high dimensionality (the number of genes), and the relationships among these gene dimensions are very complicated. Second, the data contain few samples, in sharp imbalance with the huge number of genes. Third, the data are inherently noisy and highly variable, which makes analysis difficult. Fourth, a large amount of the useful information in the data is hidden. Traditional computer analysis methods cannot meet practical needs in processing such data. How to use computational data analysis to identify the class of a gene expression profile (normal or abnormal, or the tumor subtype) quickly and accurately, so that clinical diagnosis becomes more objective and accurate, has become a key technology in the analysis of cancer gene expression profile data.

In recent years, research at home and abroad on identifying tumors from gene expression profiles with machine learning has been very active. It mainly includes the following.

(1) Neural networks trained by backpropagation (BP) and related gradient algorithms. For example, J. Khan et al. (Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks) used a neural-network model to identify the four subtypes of small round blue cell tumors (SRBCT) effectively, and also identified the most effective gene subset. However, BP and related gradient algorithms converge slowly and are easily trapped in local extrema, and the network structure is hard to determine, so tumor identification accuracy is low and the time overhead is large.

(2) Support vector machines (SVM). For example, T. S. Furey et al. (SVM classification and validation of cancer tissue samples using microarray expression data) used a standard SVM to identify sample classes in three data sets, ovarian cancer (Ovarian), leukemia (ALL/AML), and colon cancer (Colon), and obtained a high recognition rate on all of them. Although SVM suits high-dimensional, small-sample data, it is good only at two-class problems and performs less well on multi-class problems. In addition, choosing SVM parameters is time-consuming, and there is currently no effective theory to guide that choice.

(3) Extreme learning machines (ELM). For example, F. Han et al. (A Novel Strategy for Gene Selection of Microarray Data Based on Gene-to-Class Sensitivity Information) applied ELM to tumor identification after gene selection and, on six gene expression profile data sets (Leukemia, Colon, SRBCT, LUNG, Brain cancer, and Lymphoma), obtained identification accuracy better than that of classical methods. ELM obtains the unique weight solution of a single-hidden-layer feedforward neural network analytically, and this solution has been proved in theory to give both the minimum training error and the minimum norm of the output weights, so the algorithm can reach optimal generalization performance in a very short time, which other learning algorithms cannot match. ELM provides a unified platform for a variety of applications: it can approximate any continuous function and can classify any disjoint regions. Although some SVMs (such as MOC-LS-SVM and SVMs based on Bayes' rule) can solve multi-class problems, they increase computation time and complexity. Extensive experiments show that, compared with SVM, ELM has better scalability, similar (regression and two-class problems) or better (multi-class problems) generalization performance, and faster convergence. ELM is therefore a reasonable choice for improving the speed and accuracy of gene expression profile processing. On the other hand, because ELM selects the input-layer weights of the single-hidden-layer feedforward network at random, the network tends to need many hidden nodes, and the norm of the output-layer weights and the condition number of the hidden-layer output matrix grow, which degrades the response time and prediction performance of ELM on the test set.
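The analytic training attributed to ELM above can be illustrated with a minimal NumPy sketch. It is not the inventors' implementation: the function names `train_elm` and `predict_elm`, the uniform range [-1, 1] for the random weights and biases, and the synthetic two-class data are all assumptions made for illustration. The hidden layer is fixed at random, and the output weights are the minimum-norm least-squares solution obtained with the Moore-Penrose pseudoinverse.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng):
    """Fit a single-hidden-layer ELM: random hidden layer, analytic output weights."""
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_features))  # random input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden thresholds (assumed range)
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))                 # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # minimum-norm least-squares output weights
    return W, b, beta

def predict_elm(model, X):
    """Return the raw network outputs; argmax over columns gives the class."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Synthetic stand-in for expression data: 20 samples, 5 "genes", 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
T = np.eye(2)[(X[:, 0] > 0).astype(int)]   # one-hot class targets
model = train_elm(X, T, n_hidden=40, rng=rng)
pred = predict_elm(model, X).argmax(axis=1)
```

No iterative weight update appears anywhere in the sketch, which is the source of the speed advantage described above; the random hidden layer is also why the hidden-node count and the conditioning of `H` matter.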

The above analysis shows that, although a single classifier can be used to identify tumors from gene expression profiles, its recognition performance still leaves considerable room for improvement. An ensemble classifier can compensate well for the shortcomings of a single classifier: several single classifiers jointly classify a sample, and their results are combined by some ensemble rule, which effectively improves tumor identification accuracy. Ensemble learning exploits the performance of each member classifier and the complementarity among them, and thereby improves the generalization performance, recognition rate, and stability of the system.

Traditional ensemble classifiers are mostly ensembles of BP-based feedforward networks or of SVMs. For example, Zhi-Hua Zhou et al. (Ensembling neural networks: Many could be better than all) proposed the GASEN ensemble method, which first trains each member neural network and then uses a genetic algorithm (GA) to optimize the ensemble weights. Experimental results on several regression and classification data sets show that GASEN outperforms the classical Bagging and Boosting algorithms. Although such ensembles improve classification performance, the problems inherent in BP networks or SVMs still show up in the ensemble to some degree. Because of ELM's good performance, researchers have combined multiple ELMs to improve ensemble performance further. For example, Cheng Lian et al. (Ensemble of extreme learning machine for landslide displacement prediction based on time series analysis) used an ensemble of ELMs to predict successfully the displacement of the Baishuihe landslide in the Three Gorges reservoir area. However, these ELM ensembles simply combine multiple ELMs; both the selection of member ELMs and the design of the ensemble rule are too simple, so their performance still leaves considerable room for improvement. To date, there has been no report of using an ELM ensemble to identify tumors from gene expression profile data.

Summary of the invention

Object of the invention: the object of the invention is to provide a cancer gene expression profile data identification method based on an ensemble of extreme learning machines that identifies tumor classes both quickly and accurately.

Technical solution: a cancer gene expression profile data identification method based on an ensemble of extreme learning machines, comprising the selection of member ELMs and the integration of the member ELMs, includes the following steps:

Step 1: preprocess the cancer gene expression profile data set, including gene selection and normalization of the expression profile data;

Step 2: using the Bagging method, generate N sample sets from the data set obtained in step 1, and divide each of the N sample sets in a fixed proportion into a training set and a validation set, giving N training sets and N validation sets;

Step 3: train N extreme learning machines on the N training sets of step 2, and select the L ELMs (L<N) with the highest recognition rate on their corresponding validation sets to form a candidate member ELM pool;

Step 4: taking the diversity among member ELMs in the ensemble as the optimization objective, use the standard particle swarm optimization algorithm to select, from the L ELMs, the K member extreme learning machines (K<L) that make up the ensemble;

Step 5: compute the ensemble voting weights of the K member extreme learning machines by the minimum-norm least-squares method;

Step 6: combine the K member extreme learning machines using the K voting weights obtained, yielding an ensemble ELM system, and use this system to perform tumor identification on new cancer gene expression profile samples.
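Steps 1 to 3 can be sketched as three small helpers. This is an illustrative sketch only: the names `normalize_genes`, `bagging_splits`, and `top_l` are hypothetical, gene selection is omitted, and the way each bootstrap sample is cut into training and validation parts is an assumption, since the text fixes only the proportion.

```python
import numpy as np

def normalize_genes(X):
    """Linearly rescale every gene (column) of X to the interval [-1, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant genes map to -1 instead of dividing by zero
    return 2.0 * (X - lo) / span - 1.0

def bagging_splits(n_samples, n_sets, train_fraction, rng):
    """Draw n_sets bootstrap samples and cut each into training/validation index arrays."""
    n_train = int(round(train_fraction * n_samples))
    splits = []
    for _ in range(n_sets):
        idx = rng.integers(0, n_samples, size=n_samples)  # sampling with replacement
        splits.append((idx[:n_train], idx[n_train:]))     # assumed split rule: first part trains
    return splits

def top_l(val_accuracy, L):
    """Indices of the L ELMs with the highest validation accuracy (ties keep original order)."""
    return np.argsort(-np.asarray(val_accuracy), kind="stable")[:L]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))                 # synthetic stand-in: 30 samples, 8 genes
Xn = normalize_genes(X)
splits = bagging_splits(n_samples=30, n_sets=70, train_fraction=2 / 3, rng=rng)
pool = top_l([0.5, 0.9, 0.7, 0.9], L=2)      # toy accuracies for four hypothetical ELMs
```

Because sampling is with replacement, the same original sample can land in both the training and the validation part of one bootstrap set; whether the method avoids this is not stated in the text.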

Step 4 further comprises the following steps:

Step 4.1: initialize the position and velocity of every particle in the swarm. K ELMs are chosen at random from the L ELMs and their index numbers are used as the particle's initial position, so that each particle represents one group of member learning machines; each component of the initial velocity is drawn at random from (0, 1). In the K-dimensional space, the position of the i-th particle is the vector x_i = (x_{i1}, x_{i2}, ..., x_{iK}), where x_{ik} (k = 1, 2, ..., K) denotes the k-th member learning machine of the i-th ensemble, and the velocity of the particle is the vector v_i = (v_{i1}, v_{i2}, ..., v_{iK});

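Step 4.2: adjust the current velocity and position of each particle according to the following formulas:

v_{id}(t+1) = w \times v_{id}(t) + c_{1} \times \mathrm{rand}(t) \times (p_{id}(t) - x_{id}(t)) + c_{2} \times \mathrm{rand}(t) \times (p_{gd}(t) - x_{id}(t)) \qquad (1)

x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \qquad (2)

where w is the inertia weight, c_{1} and c_{2} are the acceleration constants, and p_{id} and p_{gd} are the d-th components of the particle's historical best position P_i and of the global best position P_g;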

Step 4.3: compute the fitness value of each particle according to formula (4);

The similarity among members serves as the fitness function of the standard particle swarm optimization. Here the similarity of any two ELMs is the cosine of the angle between the two vectors obtained by flattening each ELM's input-layer weight matrix and hidden-unit threshold vector. A smaller cosine means a larger angle between the two vectors, which indicates a larger difference between the input weight matrices and hidden-unit thresholds of the two ELMs, and hence greater diversity between them. The difference between members is thus expressed as the cosine of the angle between two vectors, and the diversity among members is computed from these cosines as follows:

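\cos θ = \frac{α \cdot β}{\| α \| \cdot \| β \|} \qquad (3)

\mathrm{fitness}(i) = \sum_{k_{1}=1}^{K} \sum_{k_{2}=k_{1}+1}^{K} \cos θ_{i k_{1} k_{2}} = \sum_{k_{1}=1}^{K} \sum_{k_{2}=k_{1}+1}^{K} \frac{α_{i k_{1}} \cdot α_{i k_{2}}^{T}}{\| α_{i k_{1}} \| \cdot \| α_{i k_{2}}^{T} \|} \qquad (4)

where

Z_{ik} = [WH_{ik}; B_{ik}], \quad α_{ik} = (Z_{ik}(:))^{T}, \quad B_{ik} = [b_{11}, b_{21}, \cdots, b_{H1}]_{H \times 1}^{T},

WH_{ik} = \begin{bmatrix} wh_{11} & wh_{12} & \cdots & wh_{1n} \\ wh_{21} & wh_{22} & \cdots & wh_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ wh_{H1} & wh_{H2} & \cdots & wh_{Hn} \end{bmatrix}_{H \times n};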

In formula (3), α and β denote two vectors and θ denotes the angle between them. WH_{ik}, B_{ik}, and Z_{ik} denote, respectively, the input weight matrix, the hidden-layer threshold vector, and their concatenated matrix for the member ELM in the k-th dimension of the i-th particle; α_{ik} denotes the row vector obtained by flattening Z_{ik} column by column; and cos θ_{i k₁ k₂} denotes the cosine of the angle between the two ELMs represented by the k₁-th and k₂-th components of the i-th particle. A smaller cosine (a larger angle) means lower similarity and therefore greater diversity between the two ELMs; a larger cosine means less diversity. fitness(i) is the sum of the similarity values over all pairs of member ELMs in the i-th particle; the smaller fitness(i) is, the lower the overall similarity among the ELMs in the particle, that is, the greater the diversity among the members of the ensemble that the particle represents. Because each component of a particle's position is the index of a selected member ELM, the velocity is rounded to an integer after every iteration. In addition, positions may only take values in [1, L]: a component greater than L is set to L, and a component less than 1 is set to 1;

Step 4.4: during the optimization, compare the similarity value of each particle with that of the best position P_i it has visited (its historical best position); if the current value is smaller, update the particle's historical best position to its current position;

Step 4.5: during the optimization, compare the similarity value of each particle with that of the global best position P_g; if the current value is smaller, update the global best particle to the current particle;

Step 4.6: if neither the set goal (a sufficiently good fitness value for the global best particle) nor the preset maximum number of iterations has been reached, return to step 4.2; otherwise go to step 4.7;

Step 4.7: output the global best particle, which represents the finally selected optimal set of member ELMs.
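The fitness of formulas (3) and (4) can be written in a few lines of NumPy. The sketch below is illustrative: the helper names `elm_vector`, `cosine`, and `fitness` are hypothetical, member indices are 0-based rather than the 1..L used in the text, and the flattening order inside `elm_vector` is an assumption (any fixed ordering applied to every ELM gives the same cosines).

```python
import numpy as np

def elm_vector(W, b):
    """Flatten one ELM's input weights W (H x n) and hidden thresholds b (H,) into one vector."""
    return np.concatenate([np.asarray(W, dtype=float).ravel(),
                           np.asarray(b, dtype=float).ravel()])

def cosine(a, b):
    """Formula (3): cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fitness(particle, vectors):
    """Formula (4): sum of pairwise cosines over the ELMs indexed by the particle.

    A lower value means the selected members are more diverse.
    """
    total = 0.0
    for p in range(len(particle)):
        for q in range(p + 1, len(particle)):
            total += cosine(vectors[particle[p]], vectors[particle[q]])
    return total

# Toy pool of three "ELMs", each with a 1 x 1 weight matrix and one threshold.
vectors = [elm_vector([[1.0]], [0.0]),   # -> (1, 0)
           elm_vector([[0.0]], [1.0]),   # -> (0, 1)
           elm_vector([[1.0]], [1.0])]   # -> (1, 1)
orthogonal_pair = fitness([0, 1], vectors)
similar_pair = fitness([0, 2], vectors)
all_three = fitness([0, 1, 2], vectors)
```

In this toy pool the orthogonal pair scores lower than the pair at 45 degrees, so a minimizing swarm would prefer it, which is the behavior the fitness is meant to reward.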

Step 5 further comprises the following steps:

Step 5.1: compute the outputs of the K ELMs selected in step 4 on the cancer gene expression profile data set, determine the output class of every sample, and record the class labels. Formula (5) gives the output class label matrix of the K ELMs on the Ntr samples of the data set, where YY_{l,k} denotes the output class of the k-th ELM on the l-th sample;

YY = \begin{bmatrix} YY_{1,1} & YY_{1,2} & \cdots & YY_{1,K} \\ YY_{2,1} & YY_{2,2} & \cdots & YY_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ YY_{Ntr,1} & YY_{Ntr,2} & \cdots & YY_{Ntr,K} \end{bmatrix} \qquad (5)

Step 5.2: obtain the voting weights of the ensemble by the minimum-norm least-squares method, that is, obtain the minimum-norm least-squares solution of β in formula (6) by means of formula (7);

The optimal voting weight vector β of the ensemble should satisfy formula (6), where T denotes the desired output class vector of the Ntr samples of the data set and β_k denotes the voting weight of the k-th ELM in the ensemble:

YY \, β = T \qquad (6)

where

T = [t_{1}, t_{2}, \ldots, t_{Ntr}]^{T}, \quad β = [β_{1}, β_{2}, \ldots, β_{K}]^{T}

The voting weight vector of the ensemble is obtained by the minimum-norm least-squares method:

\hat{β} = YY^{+} T \qquad (7)

where YY^{+} is the Moore-Penrose generalized inverse of the output class label matrix of the K ELMs on the Ntr samples.
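Formula (7) maps directly onto `numpy.linalg.pinv`. The sketch below uses a made-up label matrix for three member ELMs on six samples; coding the two classes as -1 and +1 and taking the sign of the weighted vote as the ensemble decision are assumptions, since the text does not say how class labels are encoded or how the weighted vote is thresholded.

```python
import numpy as np

# Hypothetical label matrix YY: 6 samples (rows) x 3 member ELMs (columns).
# Each member misclassifies exactly one, different, sample.
YY = np.array([[ 1,  1, -1],
               [ 1, -1,  1],
               [-1, -1, -1],
               [-1, -1, -1],
               [ 1,  1,  1],
               [ 1, -1, -1]], dtype=float)
T = np.array([1, 1, -1, -1, 1, -1], dtype=float)   # desired class labels

beta = np.linalg.pinv(YY) @ T          # formula (7): minimum-norm least-squares voting weights
ensemble_label = np.sign(YY @ beta)    # weighted vote, thresholded at zero (assumed decision rule)
```

Because the three members are equally accurate and their errors do not overlap, the weights come out equal (0.4 each) and the weighted vote recovers every desired label; members of unequal quality would receive unequal weights.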

Beneficial effects: high-dimensional, small-sample cancer gene expression profile data are variable and noisy, and although traditional methods can identify tumors from gene expression profiles, they suffer from low accuracy and excessive time overhead. Starting from ELM's very fast convergence and good accuracy, the invention builds an ensemble ELM tumor identification model on the basis of gene expression profile data analysis. Using gene expression profile data, it studies the diversity among member ELMs and proposes a member ELM selection method based on PSO and inter-member similarity, together with an ensemble rule based on the minimum-norm least-squares method. Compared with existing methods for identifying tumors from gene expression profiles, the invention greatly reduces the learning cost of the ensemble and greatly improves tumor identification accuracy.

Brief description of the drawings

Fig. 1 is a structural block diagram of the invention;

Fig. 2 is a flow chart of gene selection in the invention;

Fig. 3 is a flow chart of member ELM selection in the invention;

Fig. 4 is a flow chart of obtaining the ensemble weights of the member ELMs in the invention;

Fig. 5 is a curve of the prediction accuracy for Brain cancer subtypes in the invention;

Fig. 6 shows the relationship between the prediction accuracy for Brain cancer subtypes and the number of member ELMs in the invention;

Fig. 7 is the curve of the change in similarity among the member ELMs of the ensemble corresponding to the best particle in the invention.

Embodiment

A cancer gene expression profile data identification method based on an ensemble of extreme learning machines comprises the selection of member ELMs and the integration of the member ELMs, where ELM stands for extreme learning machine. The method specifically includes the following steps:

Step 4: taking the diversity among member ELMs in the ensemble as the optimization objective, use the standard particle swarm optimization (PSO) algorithm to select, from the L ELMs, the K member extreme learning machines (K<L) that make up the ensemble;

Step 4 further comprises the following steps:

Step 4.1: initialize the position and velocity of every particle in the swarm. K ELMs are chosen at random from the L ELMs and their index numbers are used as the particle's initial position, so that each particle represents one group of member learning machines; each component of the initial velocity is drawn at random from (0, 1). In the K-dimensional space, the position of the i-th particle is the vector x_i = (x_{i1}, x_{i2}, ..., x_{iK}), where x_{ik} (k = 1, 2, ..., K) denotes the k-th member learning machine of the i-th ensemble, and the velocity of the particle is the vector v_i = (v_{i1}, v_{i2}, ..., v_{iK}).

Step 4.2: adjust the current velocity and position of each particle according to formulas (1) and (2).

v_{id}(t+1) = w \times v_{id}(t) + c_{1} \times \mathrm{rand}(t) \times (p_{id}(t) - x_{id}(t)) + c_{2} \times \mathrm{rand}(t) \times (p_{gd}(t) - x_{id}(t)) \qquad (1)

x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \qquad (2)

Step 4.3: compute the fitness value of each particle according to formula (4).

To increase the diversity among member ELMs in the ensemble and thereby improve its generalization ability, the invention uses the similarity among members as the fitness function of the standard PSO. Here the similarity of any two ELMs is the cosine of the angle between the two vectors obtained by flattening each ELM's input-layer weight matrix and hidden-unit threshold vector. A smaller cosine means a larger angle between the two vectors, which indicates a larger difference between the input weight matrices and hidden-unit thresholds of the two ELMs, and hence greater diversity between them. The invention thus expresses the difference between members as the cosine of the angle between two vectors and computes the diversity among members from these cosines as follows:

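\cos θ = \frac{α \cdot β}{\| α \| \cdot \| β \|} \qquad (3)

\mathrm{fitness}(i) = \sum_{k_{1}=1}^{K} \sum_{k_{2}=k_{1}+1}^{K} \cos θ_{i k_{1} k_{2}} = \sum_{k_{1}=1}^{K} \sum_{k_{2}=k_{1}+1}^{K} \frac{α_{i k_{1}} \cdot α_{i k_{2}}^{T}}{\| α_{i k_{1}} \| \cdot \| α_{i k_{2}}^{T} \|} \qquad (4)

where

Z_{ik} = [WH_{ik}; B_{ik}], \quad α_{ik} = (Z_{ik}(:))^{T}, \quad B_{ik} = [b_{11}, b_{21}, \cdots, b_{H1}]_{H \times 1}^{T},

WH_{ik} = \begin{bmatrix} wh_{11} & wh_{12} & \cdots & wh_{1n} \\ wh_{21} & wh_{22} & \cdots & wh_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ wh_{H1} & wh_{H2} & \cdots & wh_{Hn} \end{bmatrix}_{H \times n};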

In formula (3), α and β denote two vectors and θ denotes the angle between them. WH_{ik}, B_{ik}, and Z_{ik} denote, respectively, the input weight matrix, the hidden-layer threshold vector, and their concatenated matrix for the member ELM in the k-th dimension of the i-th particle; α_{ik} denotes the row vector obtained by flattening Z_{ik} column by column; and cos θ_{i k₁ k₂} denotes the cosine of the angle between the two ELMs represented by the k₁-th and k₂-th components of the i-th particle. A smaller cosine (a larger angle) means lower similarity and therefore greater diversity between the two ELMs; a larger cosine means less diversity. fitness(i) is the sum of the similarity values over all pairs of member ELMs in the i-th particle; the smaller fitness(i) is, the lower the overall similarity among the ELMs in the particle, that is, the greater the diversity among the members of the ensemble that the particle represents.

In the invention, because each component of a particle's position is the index of a selected member ELM, the velocity must be rounded to an integer after every iteration. In addition, positions may only take values in [1, L]: a component greater than L is set to L, and a component less than 1 is set to 1.

Step 4.4: during the optimization, compare the similarity value of each particle with that of the best position P_i it has visited (its historical best position); if the current value is smaller, update the particle's historical best position to its current position.

Step 4.5: during the optimization, compare the similarity value of each particle with that of the global best position P_g; if the current value is smaller, update the global best particle to the current particle.

Step 5 further comprises the following steps:

Step 5.1: compute the outputs of the K ELMs selected in step 4 on the cancer gene expression profile data set, determine the output class of every sample, and record the class labels. Formula (5) gives the output class label matrix of the K ELMs on the Ntr samples of the data set, where YY_{l,k} denotes the output class of the k-th ELM on the l-th sample;

YY = \begin{bmatrix} YY_{1,1} & YY_{1,2} & \cdots & YY_{1,K} \\ YY_{2,1} & YY_{2,2} & \cdots & YY_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ YY_{Ntr,1} & YY_{Ntr,2} & \cdots & YY_{Ntr,K} \end{bmatrix} \qquad (5)

Step 5.2: obtain the voting weights of the ensemble by the minimum-norm least-squares method, that is, obtain the minimum-norm least-squares solution of β in formula (6) by means of formula (7).

YY \, β = T \qquad (6)

where

T = [t_{1}, t_{2}, \ldots, t_{Ntr}]^{T}, \quad β = [β_{1}, β_{2}, \ldots, β_{K}]^{T}

In the invention, the voting weight vector of the ensemble is obtained by the minimum-norm least-squares method:

\hat{β} = YY^{+} T \qquad (7)

Here YY^{+} is the Moore-Penrose generalized inverse of the output class label matrix of the K ELMs on the Ntr samples. Because the ensemble rules of existing ELM ensembles, such as majority voting, are too simple, the invention uses the minimum-norm least-squares method to obtain the voting weight of each member ELM, which further improves the generalization performance of the ensemble and thus the accuracy of tumor identification from gene expression profiles.
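As a reading aid for formula (7), the standard characterization of the Moore-Penrose solution can be written out explicitly. This is a general linear-algebra fact added for illustration, not a statement taken from the text: when the label matrix YY has full column rank, the generalized inverse reduces to the normal-equation form; otherwise it returns, among all least-squares solutions, the one with the smallest norm.

```latex
\hat{\beta} = YY^{+} T = \left( YY^{T}\, YY \right)^{-1} YY^{T}\, T
\qquad \text{if } \operatorname{rank}(YY) = K ,
\\[6pt]
\hat{\beta} = \arg\min_{\beta \in S} \lVert \beta \rVert_{2},
\qquad
S = \left\{ \beta : \lVert YY\,\beta - T \rVert_{2} \text{ is minimal} \right\}
\qquad \text{in general.}
```

The second case matters here because member ELMs that output identical label columns make YY rank-deficient, and the minimum-norm choice then splits the weight evenly between such members.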

The implementation of the invention is briefly illustrated below using cancer gene expression profile data as an example. This example uses the Brain cancer data set, which contains 60 samples in total, each with 7129 genes. The data set is divided into two classes: 46 samples from patients with the classic subtype and 14 samples from patients with the desmoplastic subtype. The data come from http://linus.nci.nih.gov/~brb/DataArchive_New.html. Although the Brain cancer data set has only two tumor subtypes, the difference in gene expression between the two classes is not pronounced, so the prediction accuracy of many classical methods (such as k-nearest neighbors and SVM) on this data set is very low. The specific steps of the invention on this data set are as follows:

As shown in Fig. 1, a cancer gene expression profile data identification method based on an ensemble of extreme learning machines comprises the selection of member ELMs and the construction of the ensemble rule, carried out in the following steps:

(1) Each gene expression level is normalized to [-1, 1] by a linear transformation. Gene selection is carried out with the KMeans-GCSI-MBPSO-ELM method that we proposed (Han F, Sun W, Ling Q-H (2014) A Novel Strategy for Gene Selection of Microarray Data Based on Gene-to-Class Sensitivity Information. PLoS ONE 9(5): e97530. doi:10.1371/journal.pone.0097530). As shown in Fig. 2, this method takes the gene-to-class sensitivity information (GCSI) of the data into account and combines K-means clustering with binary PSO and ELM to select a set of genes highly relevant to tumor class. Table 1 lists the 50 most frequently selected genes obtained by applying KMeans-GCSI-MBPSO-ELM gene selection to the Brain cancer expression profile data.

Table 1: The 50 most frequently selected genes obtained by applying KMeans-GCSI-MBPSO-ELM gene selection to the Brain cancer expression profile data

(2) The Brain cancer expression profile data set is divided 1:1 into an original sample data set and a new sample data set, and N (N=70) training sets and validation sets are then generated from the original sample data set by the Bagging method (random sampling with replacement) in a 2:1 ratio. An ELM is trained on each training set (in this embodiment each ELM has 300 hidden nodes and uses the sigmoid function as the hidden-unit activation function), and the L (L=20) ELMs with the highest validation accuracy on their corresponding validation sets are preselected to form the initial candidate member ELM pool.

(3) Use the standard particle swarm optimization algorithm to select, from the L ELMs, the K member extreme learning machines (K<L) of the ensemble (in this embodiment, K=9 was determined by comparing repeated experiments). As shown in Fig. 3, the specific steps are as follows:

1. Initialize the position and velocity of every particle in the swarm: K ELMs are chosen at random from the L ELMs and their index numbers are used as the particle's initial position, so that each position represents one group of member learning machines; the initial value of each component of the particle velocity v_i = (v_{i1}, v_{i2}, ..., v_{iK}) is drawn at random from (0, 1). In the K-dimensional space, the position of the i-th particle is the vector x_i = (x_{i1}, x_{i2}, ..., x_{iK}), where x_{ik} (k = 1, 2, ..., K) denotes the k-th member learning machine of the i-th ensemble. In this embodiment the swarm size is 30.

2. Adjust the current velocity and position of each particle according to formulas (1) and (2). In this embodiment the inertia weight w is set to 2, and the acceleration constants c₁ and c₂ are 0.9 and 1.7, respectively.

3. Compute the fitness value of each particle according to formula (4); the fitness function is defined by formulas (3) and (4). In this embodiment, because each component of a particle's position is the index of a selected member ELM, the velocity must be rounded to an integer in every iteration. In addition, positions may only take values in [1, L]: a component greater than L is set to L, and a component less than 1 is set to 1.

4. During the optimization, compare the similarity value of each particle with that of the best position P_i it has visited (its historical best position); if the current value is smaller, update the particle's historical best position to its current position.

5. During the optimization, compare the similarity value of each particle with that of the global best position P_g; if the current value is smaller, update the global best particle to the current particle.

6. If the preset maximum number of iterations (30 in this embodiment) has not been reached, return to step 2; otherwise output the global best particle, which represents the finally selected optimal set of member ELMs.
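The six sub-steps above can be sketched as one loop, using the embodiment's stated settings (swarm size 30, 30 iterations, w = 2, c₁ = 0.9, c₂ = 1.7, K = 9, L = 20). The function names `pairwise_cosine_sum` and `select_members` are hypothetical, indices run from 0 to L-1 instead of 1 to L, the random factors are drawn independently per component, and nothing prevents a particle from naming the same ELM twice; all of these are assumptions of the sketch, not details given in the text. The input is one pre-normalized flattened weight vector per candidate ELM, replaced here by random vectors.

```python
import numpy as np

def pairwise_cosine_sum(indices, unit_vectors):
    """Formula (4) for one particle; rows of unit_vectors already have unit length."""
    V = unit_vectors[indices]
    return float(np.triu(V @ V.T, k=1).sum())   # sum over pairs k1 < k2

def select_members(unit_vectors, K, swarm=30, iters=30, w=2.0, c1=0.9, c2=1.7, seed=0):
    rng = np.random.default_rng(seed)
    L = len(unit_vectors)
    # Sub-step 1: each particle starts as K distinct ELM indices, velocities in (0, 1).
    x = np.array([rng.choice(L, size=K, replace=False) for _ in range(swarm)])
    v = rng.uniform(0.0, 1.0, size=(swarm, K))
    pbest = x.copy()
    pbest_fit = np.array([pairwise_cosine_sum(p, unit_vectors) for p in x])
    g = int(pbest_fit.argmin())
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    for _ in range(iters):                       # sub-step 6: fixed iteration budget
        r1 = rng.uniform(size=(swarm, K))
        r2 = rng.uniform(size=(swarm, K))
        # Sub-step 2: formula (1), with the velocity rounded to an integer.
        v = np.rint(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
        # Formula (2), with positions clamped to the valid index range.
        x = np.clip(x + v, 0, L - 1).astype(int)
        # Sub-step 3: fitness of every particle.
        fit = np.array([pairwise_cosine_sum(p, unit_vectors) for p in x])
        # Sub-step 4: update each particle's historical best.
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        # Sub-step 5: update the global best.
        g = int(pbest_fit.argmin())
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    return gbest, gbest_fit

rng = np.random.default_rng(1)
U = rng.normal(size=(20, 50))                    # stand-in for 20 flattened ELM weight vectors
U /= np.linalg.norm(U, axis=1, keepdims=True)
best, best_fit = select_members(U, K=9)
```

A repeated index contributes a cosine of 1 to the sum, so duplicated members are penalized by the fitness itself rather than forbidden outright.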

(4) Solve for the ensemble voting weights of the K member extreme learning machines by the minimum-norm least-squares method. As shown in Fig. 4, the specific steps are as follows:

1. From the outputs, on the original sample data set, of the K ELMs selected in step (3), determine the output class of every sample in the original sample data set and record the class labels, as shown in formula (5).

2. Obtain the voting weights of the ensemble by the minimum-norm least-squares method, that is, obtain the minimum-norm least-squares solution of β in formula (6) by means of formula (7).

(5) Combine the K member extreme learning machines using the K voting weights obtained in step (4), yielding an ensemble ELM system, and use this system to perform tumor identification on the 30 test samples. Fig. 5 gives the tumor identification accuracy for 50 independent runs of the invention. The average accuracy reaches 93.48%, far above the accuracy of single classifiers on Brain cancer under the same gene selection method (80.40% for ELM and 80.55% for SVM) and above the accuracy of a simple ELM ensemble on Brain cancer (88.17%).

Claims

1. A cancer gene expression profile data identification method based on an ensemble of extreme learning machines, comprising the selection of member ELMs and the integration of the member ELMs, characterized in that it comprises the following steps:

2. The cancer gene expression profile data identification method based on an ensemble of extreme learning machines according to claim 1, characterized in that step 4 further comprises the following steps:

x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \qquad (2)

\cos θ = \frac{α \cdot β}{\| α \| \cdot \| β \|} \qquad (3)

\mathrm{fitness}(i) = \sum_{k_{1}=1}^{K} \sum_{k_{2}=k_{1}+1}^{K} \cos θ_{i k_{1} k_{2}} = \sum_{k_{1}=1}^{K} \sum_{k_{2}=k_{1}+1}^{K} \frac{α_{i k_{1}} \cdot α_{i k_{2}}^{T}}{\| α_{i k_{1}} \| \cdot \| α_{i k_{2}}^{T} \|} \qquad (4)

where

Z_{ik} = [WH_{ik}; B_{ik}], \quad α_{ik} = (Z_{ik}(:))^{T}, \quad B_{ik} = [b_{11}, b_{21}, \cdots, b_{H1}]_{H \times 1}^{T},

WH_{ik} = \begin{bmatrix} wh_{11} & wh_{12} & \cdots & wh_{1n} \\ wh_{21} & wh_{22} & \cdots & wh_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ wh_{H1} & wh_{H2} & \cdots & wh_{Hn} \end{bmatrix}_{H \times n};

3. The cancer gene expression profile data identification method based on an ensemble of extreme learning machines according to claim 1, characterized in that step 5 further comprises the following steps:

YY = \begin{bmatrix} YY_{1,1} & YY_{1,2} & \cdots & YY_{1,K} \\ YY_{2,1} & YY_{2,2} & \cdots & YY_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ YY_{Ntr,1} & YY_{Ntr,2} & \cdots & YY_{Ntr,K} \end{bmatrix} \qquad (5)

YY \, β = T \qquad (6)

where

T = [t_{1}, t_{2}, \ldots, t_{Ntr}]^{T}, \quad β = [β_{1}, β_{2}, \ldots, β_{K}]^{T}

\hat{β} = YY^{+} T \qquad (7)