WO2008072140A2 - Ranking of features - Google Patents

Ranking of features

Info

Publication number
WO2008072140A2
Authority
WO
WIPO (PCT)
Prior art keywords
feature
features
subsets
subset
pool
Prior art date
Application number
PCT/IB2007/054939
Other languages
English (en)
French (fr)
Other versions
WO2008072140A3 (en)
Inventor
Angel A. J. Janevski
James D. Schaffer
Mark R. Simpson
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to JP2009540911A
Publication of WO2008072140A2
Publication of WO2008072140A3


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/12 - Computing arrangements based on biological models using genetic models
    • G06N3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Definitions

  • the invention relates to pattern discovery, and in particular to the ranking of measurements comprised in said patterns. It finds particular application in the evaluation of classifiers for bioinformatics.
  • Microarrays include glass slides or plates on which arrays of small sample "dots" of c-DNA or another binder are disposed. Each dot comprises a specific c-DNA or other binder that bonds with a specific macromolecule of interest, and a single microarray may include hundreds, thousands, or more such dots.
  • a tissue sample is extracted from a patient, and the molecular species of interest (for example, DNA, RNA, etc.) is extracted, treated with a luminescent signaling agent or other marker, and washed over the microarray.
  • Specific types of macromolecules in the tissue collect at dots having binders keyed to those specific macromolecules in a process called hybridization.
  • a comparison or reference sample is treated with a different marker, for example a differently colored luminescent agent.
  • the marker or markers are excited, for example with a laser beam, into producing photoluminescence, and the response intensity is measured so as to characterize the concentration of macromolecules associated with the various dots.
  • a single microarray thus provides an assay of a large number of organic macromolecules, e.g., hundreds, thousands, or more.
  • Mass spectrogram analysis is another method for rapidly assaying concentrations of large numbers of macromolecules in a sample drawn from a patient.
  • the sample is ionized by a laser or other mechanism in a vacuum environment, and the distribution of molecular weight/electric charge ratios of the ionized molecular fragments is measured by an ion counter.
  • concentrations of various macromolecules can be derived from the mass spectrogram on the basis of known cracking patterns of various macromolecules.
  • the peaks of the mass spectrogram may be used as bioinformatics measurement data without correlating the mass spectrogram pattern with specific macromolecules.
  • Bioinformatics employs numerical methods to extract useful biological information from microarray measurements, mass spectrograms, or other genomic or organic macromolecular assays. For example, if a particular pattern in the microarray or mass spectrogram can be strongly correlated with a particular type of cancer, then the pattern can be used as a classifier for screening for that cancer. This enables early detection of cancers and other pathologies of interest by relatively non-invasive techniques such as drawing blood or cerebral spinal fluid, taking a sample of saliva, urine, feces, or so forth, or otherwise acquiring a fluid or tissue sample. A problem arises, however, due to the large amount of information available for developing such diagnostic medical tests.
  • selecting a fixed subset size implicitly assumes that, for example, a subset of five measurements is an optimum for the cancer screening test under development, which may be incorrect.
  • the optimum subset of measurements may be four measurements, six measurements, etc., and is usually unknown.
  • Another problem in developing genomic diagnostic medical tests is that the total number of measurements is large, but the number of patients from which these measurements are drawn is typically much smaller. For example, a typical study may use a 50x50 microarray and a test group of 40 test subjects in which 20 subjects have the cancer of interest and 20 subjects are controls who do not have the cancer.
  • the small number of subjects from which a large number of measurements are drawn is the reason why a useful ranking of measurements based on an evaluation of each measurement cannot be achieved.
  • the number of ranks available for each measurement is limited by the number of subjects and is accordingly far too small for evaluating each measurement.
  • each chromosome has a set of genes that indicates a subset of the set of measurements. For example, using a set of measurements generated by a 50x50 microarray, each gene has a value between 1 and 2500 corresponding to the indices of the 2500 measurements provided by the 2500 dots of the microarray. Five such genes in a single chromosome suitably specify a specific subset of five of the 2500 measurements.
  • a classifier uses genes specified by the chromosome to classify subjects into two or more classifications (for example, a cancer classification and a non-cancer classification).
  • a figure of merit measures how accurately the classifier identifies cancer in a group of patients and is used to select the best-fit chromosomes of the chromosome pool for propagation into future generations.
  • Offspring chromosomes are subsequently mutated by random or pseudorandom changes in the gene values, analogously to biological mutation processes. After many such generations of selection, propagation, and mutation, the chromosomes are optimized with respect to their capability of classifying subjects into two or more classes, for example a cancer class and a non-cancer class.
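  • The following is a minimal illustrative sketch in Python of such a chromosome encoding and mutation step, assuming the 50x50 microarray example above (2500 measurement indices, five genes per chromosome); the function names and the mutation rate are illustrative assumptions, not the patented implementation:
```python
import random

POOL_SIZE = 2500   # indices 1..2500 of the microarray measurements in the example above
SUBSET_SIZE = 5    # five genes per chromosome, as in the example above

def random_chromosome():
    """A chromosome: a subset of measurement indices drawn from the pool."""
    return random.sample(range(1, POOL_SIZE + 1), SUBSET_SIZE)

def mutate(chromosome, rate=0.2):
    """Pseudorandomly replace some gene values with other measurement indices."""
    return [random.randint(1, POOL_SIZE) if random.random() < rate else gene
            for gene in chromosome]

parent = random_chromosome()
child = mutate(parent)   # an offspring chromosome after mutation
```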
  • a genetic algorithm produces an optimized set of chromosomes, each chromosome comprising genes.
  • the genes are referred to as features, and chromosomes are referred to as feature subsets.
  • the set of genes representing the set of measurements is hereinafter referred to as a pool of features.
  • a subset of features from the pool of features is useful as a classifier for classifying the subjects on which measurements are taken, hereinafter referred to as study objects (e.g. patients or tissue samples), into two or more classes.
  • the optimized subsets of features produced by a genetic algorithm are of similar quality as regards their usefulness in classifying study objects into two or more classes.
  • a method of computing a rank of at least one feature from a pool of features comprising: obtaining a plurality of feature subsets, wherein each feature subset comprises features from the pool of features; and computing the rank of the at least one feature from the pool of features from the occurrence of the at least one feature in a feature subset.
  • the plurality of feature subsets of features from the pool of features may be obtained by any suitable method, e.g. a method based on a genetic algorithm.
  • the obtained plurality of feature subsets comprises feature subsets which are suitable for classifying a study object.
  • the feature subsets from the plurality of feature subsets are of predominantly high quality as regards their usefulness in classifying study objects.
  • features comprised in an obtained subset of features may be considered useful in classifying the study object.
  • a plurality of feature subsets may be obtained such that, for example, at least half of the classifiers of a set of study objects, each classifier being defined on the basis of a feature subset from the plurality of feature subsets, have a performance rating based on the set of study objects of greater than 50%.
  • a rank of a feature from the pool of features may depend on the number of feature subsets in which said feature occurs.
  • At the heart of the invention lies a conjecture that a feature that occurs in many feature subsets from the plurality of feature subsets is typically more useful in classifying a study object than a feature that occurs in few feature subsets from the plurality of feature subsets. This conjecture has been validated in numerous experiments.
  • the method may be advantageously applied to the ranking of features from the pool of features.
  • a new subset of features may be created comprising the top-rank features, which are potentially more useful in classifying study objects than a subset of features from the pool of features.
  • the rank of each feature from the pool of features thus inherently takes into account its classification power in combination with other features.
  • the individual features comprised in said feature subset are inherently complementary with respect to their performance in the classification of study objects.
  • the process of setting up selection criteria for obtaining the plurality of feature subsets and of setting up methods of computing feature ranks opens a powerful path towards finding useful features.
  • the top-ranked features may be combined into new feature subsets that are less likely to be spurious. Therefore, the invention provides a method of finding significant features and feature subsets that are more likely to be truly associated with a class in the classification of study objects, e.g. of finding significant features and feature subsets describing biomarkers useful in classifying a clinical condition of a patient.
  • the plurality of feature subsets is obtained from an evolutionary computing algorithm. For a large pool of features, the number of all feature subsets is very large. It is thus not viable to evaluate each feature subset of the pool of features.
  • An evolutionary computing algorithm is capable of producing feature subsets which are optimized on the basis of their ability to classify a set of study objects.
  • an evolutionary computing algorithm takes into account combined abilities of multiple features comprised in a feature subset to obtain a useful classification of a study object.
  • obtaining the plurality of feature subsets comprises selecting the plurality of feature subsets from a plurality of candidate feature subsets on the basis of a selection criterion. This renders it possible to select an optimum plurality of feature subsets from the plurality of candidate feature subsets produced, for example by means of an evolutionary algorithm.
  • each candidate feature subset from the plurality of candidate feature subsets is associated with a characteristic of the respective candidate feature subset, and the selection criterion is based on an evaluation of the characteristic of said respective candidate feature subset. Using characteristics of candidate feature subsets helps in selecting an optimized plurality of feature subsets from the plurality of candidate feature subsets.
  • computing the rank of the at least one feature from the pool of features is further based on the frequency of occurrence of the at least one feature in the plurality of feature subsets.
  • Features having a relatively higher frequency of occurrence, i.e. which occur in many feature subsets, receive a higher rank than features having a relatively lower frequency of occurrence, i.e. which occur in fewer feature subsets.
  • each subset of features from the plurality of feature subsets is associated with a characteristic of the respective subset of features.
  • the characteristic of the feature subset may be advantageously used for evaluating the feature subset according to its usefulness in computing a rank of a feature comprised in the feature subset. For example, a weight based on the feature characteristic may be assigned to each feature subset from the plurality of feature subsets.
  • computing the rank of the at least one feature from the pool of features is further based on the characteristic associated with the respective feature subset from the plurality of feature subsets.
  • a contribution of a feature subset to the rank of the at least one feature may be given a weight based on the characteristic associated with said feature subset.
  • the rank of the at least one feature from the pool of features is computed from a co-occurrence of two or more features from the pool of features in a feature subset from the plurality of feature subsets. For example, two features that always occur together in a feature subset may receive a higher rank, thus taking into account their combined power in classifying study objects.
  • the method further comprises creating a list of ranked features based on the computed rank of the at least one feature.
  • the list of ranked features may be very useful in creating an optimized feature subset for classifying study objects.
  • a module for computing a rank of at least one feature from a pool of features comprising: an obtaining unit for obtaining a plurality of feature subsets, wherein each feature subset comprises features from the pool of features; and a computation unit for computing the rank of the at least one feature from the pool of features on the basis of an occurrence of the at least one feature in a feature subset.
  • a computer program product is provided for instructing a processing unit to execute the method of claim 1 when said computer program product is run on a computer.
  • Fig. 1 is a flowchart of an implementation of the method of computing a rank of at least one feature from a pool of features;
  • Fig. 2 schematically illustrates an embodiment of a module for computing a rank of at least one feature from a pool of features.
  • Fig. 1 is a flowchart of an exemplary implementation of the method 10 of computing a rank of at least one feature from a pool of features.
  • the method 10 begins with obtaining 1 a plurality of feature subsets, each feature subset comprising features from the pool of features. After obtaining 1 the plurality of feature subsets, the method continues with computing 2 a rank of a feature from the pool of features on the basis of the occurrence of the at least one feature in a feature subset. If the feature from the pool of features occurs in a number of feature subsets from the plurality of feature subsets, wherein the number may be predetermined or based on a user input, the feature can be ranked as relevant and receive, for example, the rank 1.
  • Otherwise, the feature can be ranked as not relevant and receive the rank 0.
  • the method 10 may continue with computing 2 the rank of another feature from the pool of features based on occurrence of said another feature in a feature subset. The method continues computing 2 feature ranks until a condition for terminating computing 2 the feature ranks is met, e.g. when all features from the pool of features are ranked. Once the condition for terminating computing 2 feature ranks has been met, the method 10 terminates.
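  • A minimal sketch of this ranking step, assuming the obtained feature subsets are available as Python sets and the occurrence threshold is supplied by the user; all names are illustrative:
```python
def rank_features(pool, feature_subsets, threshold):
    """Rank each feature 1 (relevant) if it occurs in at least `threshold`
    of the obtained feature subsets, and 0 (not relevant) otherwise."""
    ranks = {}
    for feature in pool:
        occurrences = sum(1 for subset in feature_subsets if feature in subset)
        ranks[feature] = 1 if occurrences >= threshold else 0
    return ranks

# Example: three feature subsets over a pool of five features.
pool = {"f1", "f2", "f3", "f4", "f5"}
subsets = [{"f1", "f2"}, {"f1", "f3"}, {"f1", "f4"}]
print(rank_features(pool, subsets, threshold=2))  # f1 -> 1, all others -> 0
```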
  • the method further comprises creating 3 a list of ranked features on the basis of the computed rank of the at least one feature.
  • the list of ranked features may be used to determine a useful subset of features e.g. for classifying a study object.
  • the plurality of feature subsets of features is obtained from an evolutionary computing algorithm.
  • An example of the evolutionary computing algorithm is a genetic algorithm.
  • Although implementations of the method 10 of the invention are described with reference to genetic algorithms, the scope of the invention is not limited to algorithms of this type. In general, any algorithm that produces a plurality of feature subsets may be used by the method 10.
  • Such algorithms include, but are not limited to, evolutionary algorithms, evolutionary programming, evolution strategy, genetic programming, iterated local search, and learning classifier systems.
  • a genetic algorithm run typically comprises several experiments. Each experiment starts with a different initial ensemble of feature subsets. This ensemble of feature subsets is called the first generation of feature subsets. Each feature subset from the initial ensemble of feature subsets may comprise features randomly selected from the pool of features. Each feature subset from the ensemble of feature subsets is evaluated as to its usefulness in classifying study objects from a learning set of study objects. A performance rating based on this evaluation may be assigned to the feature subset. After each evaluation, each feature subset comprised in the ensemble of feature subsets may be modified by means of a mutation operation, a crossover operation, and/or other operations, whereby potentially useful features in each feature subset are retained and potentially not useful features are eliminated from each feature subset.
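  • A schematic sketch of one such experiment, under simplifying assumptions: a generic evaluate function stands in for the classifier-based performance rating, and the uniform crossover and mutation operators shown are illustrative, not the specific operators of the genetic algorithm referenced below:
```python
import random

def run_experiment(pool, evaluate, subset_size=5, population=50, generations=100):
    """Schematic generational loop: evaluate each feature subset, keep the
    fitter half, and refill the ensemble with mutated crossovers of survivors."""
    features = sorted(pool)
    ensemble = [random.sample(features, subset_size) for _ in range(population)]
    for _ in range(generations):
        scored = sorted(ensemble, key=evaluate, reverse=True)  # higher rating = better
        survivors = scored[: population // 2]
        offspring = []
        while len(survivors) + len(offspring) < population:
            a, b = random.sample(survivors, 2)
            child = [random.choice(genes) for genes in zip(a, b)]   # crossover
            if random.random() < 0.2:                               # mutation
                child[random.randrange(subset_size)] = random.choice(features)
            offspring.append(child)
        ensemble = survivors + offspring
    return ensemble
```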
  • the updated ensemble of feature subsets is called the next generation, e.g. the second generation, the third generation, etc., of feature subsets.
  • Each modified feature subset comprised in the updated ensemble of feature subsets is again evaluated.
  • the iteration of the modification-evaluation cycle continues until a termination condition is satisfied.
  • the termination condition may be based on a comparison of the ensembles of feature subsets before and after updating.
  • the iteration of the modification-evaluation cycle is terminated when the feature subsets comprised in the ensemble of feature subsets before and after modification are similar.
  • Each experiment may comprise multiple so-called soft restarts.
  • a soft restart may be performed when a condition to terminate the iteration of the modification-evaluation cycle occurs.
  • each feature subset comprised in the ensemble of feature subsets is again randomly initialized, i.e. some features are removed from the feature subsets and some features from the pool of features are added to the feature subsets, but at least one feature subset, typically the one with the best performance rating, is kept intact.
  • An experiment may be terminated after a predetermined number of soft restarts have been performed.
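  • A sketch of the soft-restart step, assuming the ensemble and a parallel list of performance ratings are available; only the best-rated feature subset is kept intact, the rest are randomly re-initialized:
```python
import random

def soft_restart(ensemble, ratings, pool, subset_size):
    """Keep the feature subset with the best performance rating intact and
    randomly re-initialize every other feature subset from the pool."""
    best_index = max(range(len(ensemble)), key=lambda i: ratings[i])
    features = sorted(pool)
    fresh = [random.sample(features, subset_size) for _ in range(len(ensemble) - 1)]
    return [ensemble[best_index]] + fresh
```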
  • a genetic algorithm is described in the published patent application WO 2005/078629 entitled "Genetic algorithms for optimization of genomics-based medical diagnostic tests", which is incorporated herein by reference. Further aspects of genetic algorithms and their applications are described in the published paper "A Genetic Algorithm Approach for Discovering Diagnostic Patterns in Molecular Measurement Data" by D. Schaffer, A. Janevski and M. Simpson, published in the Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005).
  • obtaining 1 the plurality of feature subsets comprises creating feature subsets by executing a genetic algorithm.
  • the plurality of feature subsets may be obtained from a run or from a plurality of runs of the genetic algorithm.
  • the plurality of feature subsets generated during the runs may be stored in a memory device such that they can be retrieved by the method 10 of the invention.
  • the obtained plurality of feature subsets comprises all subsets generated by a run of a genetic algorithm.
  • the plurality of feature subsets comprises all feature subsets comprised in the initial ensemble of feature subsets and in the updated ensembles of feature subsets at each soft restart of the genetic algorithm and in each experiment comprised in a run of the genetic algorithm.
  • obtaining 1 the plurality of feature subsets comprises selecting the plurality of feature subsets from a plurality of candidate feature subsets on the basis of a selection criterion.
  • the plurality of candidate feature subsets may comprise all subsets generated by a run of a genetic algorithm, whereas the plurality of feature subsets may comprise, for example, feature subsets generated in a predetermined number of iterations of the modification-evaluation cycle after each soft restart.
  • the feature subsets comprised in the last 100 iterations of the modification-evaluation cycle before a termination criterion is met may be included in the plurality of feature subsets.
  • each candidate feature subset from the plurality of candidate feature subsets is associated with a characteristic of the respective candidate feature subset, and the selection criterion is based on an evaluation of the characteristic of said respective candidate feature subset.
  • the characteristic of a candidate feature subset may be computed by a genetic algorithm.
  • characteristics computed by genetic algorithms comprise the performance rating of the feature subset during an evaluation, the feature subset size, and the maximum age of the feature subset, i.e. the maximum number of consecutively updated ensembles of feature subsets comprising the feature subset.
  • each candidate feature subset may be evaluated on the basis of its performance rating and/or its maximum age.
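  • A sketch of such a selection step, assuming each candidate feature subset carries its characteristics (performance rating and maximum age) computed by the genetic algorithm; the threshold values are illustrative:
```python
def select_subsets(candidates, min_rating=0.5, min_age=10):
    """Keep candidate feature subsets whose performance rating and maximum
    age (generations survived) both reach the given thresholds."""
    return [c["features"] for c in candidates
            if c["rating"] >= min_rating and c["max_age"] >= min_age]

candidates = [
    {"features": {"f1", "f2"}, "rating": 0.8, "max_age": 25},
    {"features": {"f3", "f4"}, "rating": 0.4, "max_age": 3},
]
print(select_subsets(candidates))  # only the first candidate survives
```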
  • obtaining 1 the plurality of feature subsets comprises grouping together feature subsets from the plurality of candidate feature subsets. For example, all those candidate feature subsets generated by a run of a genetic algorithm that comprise the same features from the pool of features may be considered as one feature subset, and only this one feature subset may be included in the plurality of feature subsets.
  • a characteristic of the one feature subset may be computed from the respective characteristics of the candidate feature subsets comprising the same features.
  • a characteristic of the one feature subset may comprise the number of feature subsets comprising the same features generated by a run of a genetic algorithm.
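  • A sketch of this grouping step, collapsing candidate feature subsets that comprise the same features and recording the number of times each distinct subset was generated as its characteristic:
```python
from collections import Counter

def group_candidates(candidate_subsets):
    """Collapse candidate feature subsets containing the same features into
    one subset, recording how often it was generated as a characteristic."""
    counts = Counter(frozenset(subset) for subset in candidate_subsets)
    return [{"features": set(fs), "count": n} for fs, n in counts.items()]

candidates = [{"f1", "f2"}, {"f2", "f1"}, {"f3", "f4"}]
print(group_candidates(candidates))  # {f1,f2} with count 2, {f3,f4} with count 1
```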
  • computing 2 the rank of the at least one feature from the pool of features is further based on the frequency of occurrence of the at least one feature in the plurality of feature subsets.
  • the rank r_a of a feature a may be equal to the number of feature subsets from the plurality of feature subsets in which the feature is comprised: r_a = |{ A : a ∈ A }|, where A runs over the plurality of feature subsets.
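  • In code, this rank is simply an occurrence count over the plurality of feature subsets (a minimal sketch):
```python
from collections import Counter

def occurrence_ranks(feature_subsets):
    """Rank r_a of feature a = number of feature subsets containing a."""
    return Counter(feature for subset in feature_subsets for feature in set(subset))

subsets = [{"f1", "f2"}, {"f1", "f3"}, {"f2", "f1"}]
print(occurrence_ranks(subsets))  # f1: 3, f2: 2, f3: 1
```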
  • each feature subset from the plurality of feature subsets of features is associated with a characteristic of the respective feature subset.
  • the characteristic of each feature subset may be computed by a genetic algorithm.
  • characteristics computed by genetic algorithms comprise a performance rating of the feature subset during an evaluation, the feature subset size, and the maximum age of the feature subset, i.e. the maximum number of consecutively updated ensembles of feature subsets comprising the feature subset.
  • computing 2 the rank of the at least one feature from the pool of features is further based on the characteristic associated with each feature subset from the plurality of feature subsets.
  • the characteristic associated with each feature subset A may be the performance rating p(A) of the feature subset A.
  • the performance rating p(A) may be defined as a fraction of study objects from a validation set of study objects which are correctly classified by the feature subset A.
  • the rank r_a of a feature a from the pool of features may then be defined as r_a = Σ_A p(A)·1[a ∈ A], where the sum runs over all feature subsets A from the plurality of feature subsets and 1[a ∈ A] is 1 if the feature a is comprised in the subset A and 0 otherwise.
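  • A sketch of this performance-weighted rank, assuming each feature subset A is paired with its performance rating p(A):
```python
def weighted_ranks(rated_subsets):
    """r_a = sum of p(A) over all feature subsets A that contain feature a,
    where `rated_subsets` is a list of (subset, performance_rating) pairs."""
    ranks = {}
    for subset, rating in rated_subsets:
        for feature in subset:
            ranks[feature] = ranks.get(feature, 0.0) + rating
    return ranks

rated = [({"f1", "f2"}, 0.9), ({"f1", "f3"}, 0.6)]
print(weighted_ranks(rated))  # f1: 1.5, f2: 0.9, f3: 0.6
```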
  • the characteristic associated with each feature subset A may also be the size s(A) of the feature subset. For example, each occurrence of a feature a may be weighted according to the size s(A) of the feature subset.
  • the rank of the at least one feature from the pool of features is computed from a co-occurrence of two or more features from the pool of features in a feature subset from the plurality of feature subsets.
  • the co-occurrences of features, i.e. occurrences of pairs, triples, etc. of features in the same feature subset, may thus be taken into account.
  • the co-occurrence of two or more features from the pool of features in a feature subset from the plurality of feature subsets is evaluated on the basis of an affinity network defined by the pool of features and the plurality of feature subsets.
  • An affinity network defined on the basis of the pool of features and of the plurality of feature subsets comprises nodes.
  • Each node of the affinity network corresponds to a feature from the pool of features.
  • Two nodes of the affinity network are connected by an edge if a feature subset exists in the plurality of feature subsets such that the features corresponding to the two nodes are comprised in said feature subset.
  • the features from the pool of features may be also referred to as nodes in the context of the affinity network.
  • Affinity networks and their parameters are described, for example, in the paper by Jari Saramaki et al. entitled "Generalizations of the clustering coefficient to weighted complex networks", available at http://arxiv.org/PS_cache/cond-mat/pdf/0608/0608670.pdf.
  • the affinity network may be described by an adjacency matrix.
  • the adjacency matrix of an affinity network is a matrix comprising elements M_ab, where the indices a, b are nodes of the affinity network.
  • the adjacency matrix element M_ab is equal to 1 if the two nodes a and b are connected by an edge, and 0 if they are not.
  • the affinity network may also be described by a weight matrix, which comprises elements W_ab, where the indices a, b are nodes of the affinity network.
  • the weight matrix element W_ab is equal to the frequency of occurrence of the pair of features a and b in the plurality of feature subsets.
  • the weight matrix element W_ab defines the weight of the edge connecting nodes a and b.
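  • A sketch of building the adjacency matrix M and the weight matrix W of such an affinity network from the plurality of feature subsets; the mapping of features to matrix indices is an arbitrary but fixed ordering chosen for illustration:
```python
from itertools import combinations

def affinity_matrices(pool, feature_subsets):
    """Adjacency matrix M (1 if two features co-occur in some subset) and
    weight matrix W (number of subsets in which the pair co-occurs)."""
    nodes = sorted(pool)
    index = {f: i for i, f in enumerate(nodes)}
    n = len(nodes)
    M = [[0] * n for _ in range(n)]
    W = [[0] * n for _ in range(n)]
    for subset in feature_subsets:
        for a, b in combinations(sorted(subset), 2):
            i, j = index[a], index[b]
            W[i][j] += 1
            W[j][i] += 1
            M[i][j] = M[j][i] = 1
    return nodes, M, W
```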
  • a plurality of ranks of a feature from the pool of features may be computed. Different ranks may be combined so as to compute another rank.
  • different ranks may be used for creating different lists of ranked features.
  • the creation and the choice of a list of ranked features for use in a particular application may be based on various factors such as external conditions in which study objects to be classified by the features from the ranked list of features were obtained (e.g. temperature, pressure, humidity, pollution), or on a population of study objects from which the study objects were obtained (e.g. farmers, women, men).
  • the method 10 comprises obtaining 1 two or more pluralities of feature subsets.
  • Each plurality of feature subsets comprises features from a pool of features.
  • the two or more pluralities of feature subsets may be obtained, for example, from two or more runs of a genetic algorithm.
  • for each plurality of feature subsets, a rank of a feature from the pool of features is computed. This implementation renders it possible to classify features into three groups on the basis of the two or more values of the rank:
    • the must-haves: features that consistently have a high rank for each plurality of feature subsets, i.e. features which are consistently present in most subsets of each plurality of feature subsets;
    • the swaps: features that have a high rank for some pluralities of feature subsets and a relatively low rank for other pluralities of feature subsets, i.e. features which are consistently present in most feature subsets from some pluralities of feature subsets and relatively often absent from feature subsets of other pluralities; and
    • the padders: features that consistently have a low rank for each plurality of feature subsets from the two or more pluralities of feature subsets.
  • Classifying features into the three groups, the must-haves, the swaps, and the padders, may also be carried out on the basis of a computation of parameters of the statistical distribution of the values of a rank of a feature.
  • the parameters may be the mean and the standard deviation of the rank values.
  • Features that show a large mean and a relatively small standard deviation are the must-haves.
  • Features that show a similar mean, e.g. of the same order of magnitude as that of the must-haves, but a relatively large standard deviation are the swaps.
  • Finally, features that show a small mean and a small standard deviation are not useful in classifying study objects; these are the padders.
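  • A sketch of this three-way classification, assuming each feature is associated with a list of rank values, one per plurality of feature subsets; the mean and standard-deviation thresholds are illustrative parameters, not values prescribed by the invention:
```python
from statistics import mean, stdev

def classify_features(rank_values, high_mean=10.0, low_std=2.0):
    """Split features into must-haves (large mean, small std), swaps (large
    mean, large std) and padders (small mean) from per-run rank values.
    Each feature needs at least two rank values (two or more pluralities)."""
    groups = {"must-haves": [], "swaps": [], "padders": []}
    for feature, values in rank_values.items():
        m, s = mean(values), stdev(values)
        if m >= high_mean and s <= low_std:
            groups["must-haves"].append(feature)
        elif m >= high_mean:
            groups["swaps"].append(feature)
        else:
            groups["padders"].append(feature)
    return groups
```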
  • Fig. 2 illustrates a schematic embodiment of a module 20 for computing a rank of at least one feature from a pool of features.
  • the exemplary embodiment of the module comprises: an obtaining unit 21 for obtaining a plurality of feature subsets, each feature subset comprising features from the pool of features; and a computation unit 22 for computing the rank of the at least one feature from the pool of features based on an occurrence of the at least one feature in a feature subset.
  • the exemplary embodiment of the module 20 further comprises: a list unit 23 for creating a list of ranked features based on the computed rank of the at least one feature; an input connector 27 for receiving input data; an output connector 28 for delivering output data; a memory unit 25 for storing the input data received from external devices via the input connector 27 and data computed by the units of the module 20; and a memory bus 26 for connecting the units of the module 20.
  • the module may comprise further units, for example a selection unit for selecting the plurality of feature subsets from a plurality of candidate feature subsets on the basis of a selection criterion.
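  • A schematic sketch of how the module 20 of Fig. 2 might be organized in software; the unit names mirror the figure, while the internal logic (a simple occurrence count) is illustrative only:
```python
class FeatureRankingModule:
    """Schematic counterpart of module 20: an obtaining unit, a computation
    unit and a list unit sharing a simple in-memory store."""

    def __init__(self, source):
        self.source = source      # e.g. stored results of genetic-algorithm runs
        self.memory = {}          # stands in for memory unit 25

    def obtain(self):             # obtaining unit 21
        self.memory["subsets"] = list(self.source())
        return self.memory["subsets"]

    def compute_ranks(self):      # computation unit 22
        ranks = {}
        for subset in self.memory["subsets"]:
            for feature in subset:
                ranks[feature] = ranks.get(feature, 0) + 1
        self.memory["ranks"] = ranks
        return ranks

    def ranked_list(self):        # list unit 23
        return sorted(self.memory["ranks"], key=self.memory["ranks"].get, reverse=True)
```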
  • the invention may be implemented in any suitable form comprising hardware, software, or firmware implementations, or any combination of these.
  • the invention or some features of the invention may be implemented as a computer program product to be executed on one or more data processors and/or digital signal processors.
  • the elements and components of an embodiment of the invention may be physically, functionally, and logically implemented in any suitable way.
  • a functionality of the module 20 may be implemented in a single unit or in a plurality of units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
PCT/IB2007/054939 2006-12-13 2007-12-06 Ranking of features WO2008072140A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009540911A JP2010514001A (ja) 2006-12-13 2007-12-06 Ranking of features

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US86973406P 2006-12-13 2006-12-13
US60/869,734 2006-12-13

Publications (2)

Publication Number Publication Date
WO2008072140A2 true WO2008072140A2 (en) 2008-06-19
WO2008072140A3 WO2008072140A3 (en) 2008-11-27

Family

ID=39414909

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/054939 WO2008072140A2 (en) 2006-12-13 2007-12-06 Ranking of features

Country Status (3)

Country Link
JP (1) JP2010514001A (ja)
CN (1) CN101558419A (ja)
WO (1) WO2008072140A2 (ja)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005078629A2 (en) * 2004-02-10 2005-08-25 Koninklijke Philips Electronics, N.V. Genetic algorithms for optimization of genomics-based medical diagnostic tests

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
D. PELAT: "Chapitre 16: estimation de la loi" BRUITS ET SIGNAUX, [Online] 1998, pages 289-297, XP007905704 Retrieved from the Internet: URL:http://cel.archives-ouvertes.fr/cel-00092937/> [retrieved on 2008-04-17] *
J. D. SCHAFFER, A. JANEVSKI, M. R. SIMPSON: "A Genetic Algorithm Approach for Discovering Diagnostic Patterns in Molecular Measurement Data" PROCEEDINGS OF THE CIBCB'05, 2005, XP010894138 cited in the application *
J. SARAMAKI, M. KIVELA, J.-P. ONNELA, K. KASKI, J. KERTESZ: "Generalizations of the clustering coefficient to weighted complex networks" ARXIV:COND-MAT/0608670V2, [Online] 7 December 2006 (2006-12-07), XP007905705 Retrieved from the Internet: URL:http://arxiv.org/abs/cond-mat/0608670v2> [retrieved on 2008-04-17] cited in the application *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9239963B2 (en) 2013-04-08 2016-01-19 Omron Corporation Image processing device and method for comparing feature quantities of an object in images

Also Published As

Publication number Publication date
CN101558419A (zh) 2009-10-14
WO2008072140A3 (en) 2008-11-27
JP2010514001A (ja) 2010-04-30

Similar Documents

Publication Publication Date Title
KR101054732B1 Method for identifying biological states based on hidden patterns of biological data
Hwang et al. A heterogeneous label propagation algorithm for disease gene discovery
KR101642270B1 Evolutionary clustering algorithm
US20020095260A1 (en) Methods for efficiently mining broad data sets for biological markers
JP6313757B2 (ja) 統合デュアルアンサンブルおよび一般化シミュレーテッドアニーリング技法を用いてバイオマーカシグネチャを生成するためのシステムおよび方法
EP1498825A1 (en) Apparatus and method for analyzing data
JP5180478B2 Genetic algorithms for optimization of genomics-based medical diagnostic tests
CN107066836A Gene testing management method and ***
Pashaei et al. Markovian encoding models in human splice site recognition using SVM
US20020184569A1 (en) System and method for using neural nets for analyzing micro-arrays
KR20200133067A Method and system for predicting disease using intestinal microorganisms
WO2008072140A2 (en) Ranking of features
KR102447359B1 Apparatus and method for predicting novel disease genes based on integration of gene association relationships
CN107710206B (zh) 用于根据生物学数据的亚群检测的方法、***和装置
KR20170000707A Method and apparatus for identifying phenotype-specific gene networks using gene expression data
Wahde et al. Improving the prediction of the clinical outcome of breast cancer using evolutionary algorithms
Makhtar et al. A multi-classifier method based deep learning approach for breast cancer
AU2016100563A4 (en) System and method for determining an association of at least one biological feature with a medical condition
Liu et al. Finding cancer biomarkers from mass spectrometry data by decision lists
Schäfer Systems biology of tumour evolution: estimating orders from omics data
Wu et al. Determining molecular archetype composition and expression from bulk tissues with unsupervised deconvolution
Feng et al. Statistical considerations in combining biomarkers for disease classification
Millard Methods for the design and analysis of disease-oriented multi-sample single-cell studies
Claude et al. Exploring variability of machine learning methods: first steps towards cancer biomarkers consensus signatures
Han et al. A hybrid unsupervised approach for accurate short read clustering and barcoded sample demultiplexing in nanopore sequencing

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780046259.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07849349

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2007849349

Country of ref document: EP

ENP Entry into the national phase in:

Ref document number: 2009540911

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07849349

Country of ref document: EP

Kind code of ref document: A2