TWI633453B - Method for screening biomarker, electronic apparatus, and biomarker - Google Patents

Method for screening biomarker, electronic apparatus, and biomarker

Info

Publication number
TWI633453B
Authority
TW
Taiwan
Prior art keywords
gene
chromosome
module
genes
fitness
Prior art date
Application number
TW102131680A
Other languages
Chinese (zh)
Other versions
TW201510759A (en)
Inventor
王孔政
陳昆皇
楊自森
鄧乃嘉
Original Assignee
國立臺灣科技大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立臺灣科技大學
Priority to TW102131680A
Publication of TW201510759A
Application granted
Publication of TWI633453B

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A biomarker screening method, an electronic device, and biomarkers are provided. The screening method includes the following steps. The gene expression level of each gene is obtained from a gene microarray. A genetic algorithm is executed, which includes generating a plurality of chromosome structures and performing a fitness method to obtain the fitness of each chromosome structure. Some of the chromosome structures are selected, and the genetic algorithm is repeated until a termination condition is reached. From the chromosome structures obtained when the genetic algorithm stops, the chromosome structure with the greatest fitness is selected, and a plurality of biomarkers corresponding to a disease are obtained from that chromosome structure.

Description

Biomarker screening method, electronic device, and biomarker

The present invention relates to a bioinformatics-based disease diagnosis technique, and more particularly to a biomarker screening method, an electronic device, and a biomarker.

With the rapid development of bioinformatics, gene expression data has become a key resource for studying disease classification, such as cancer classification. The widely used gene microarray technology can measure the expression levels of many genes at once and reveal the differences between normal cells and cancer cells, which makes large-scale screening for cancer marker genes feasible. Using gene microarray chips, research teams can identify cancer-specific gene expression patterns and apply them to cancer classification, cancer diagnosis, prognosis evaluation, the identification of genes relevant to targeted therapy, and so on.

Although gene expression levels can already be used to distinguish cancer cells from normal cells, the number of candidate genes found is very large, and many of them are noise, that is, unimportant genes. Screening out the important disease-related genes from among them is therefore a major challenge. In particular, cancer-related gene expression data typically has few samples and a very large number of genes, and such small-sample, high-dimensional data is difficult to process.

With gene microarrays, the expression of thousands or even tens of thousands of genes can be observed at the same time and the meaningful genes selected from among them. This not only sheds light on the physiological pathways of cells and the causes of disease, but also allows the gene expression of normal or tumor cells to be visualized through molecular imaging.

On the other hand, although current gene microarrays are highly automated and computerized, so that important disease-related genes can be extracted through data mining methods and automated mechanisms, most existing techniques in this area rely on basic statistical methods to search for and screen differentially expressed genes. When biological tissue samples are hard to obtain and the sample size is small, the number of independent variables (genes) often far exceeds the number of cases, which reduces statistical power. Moreover, because there is not enough data to set aside for independent-sample validation, the generalizability of the search and screening results is also limited.

In view of the above, how to accurately screen out important disease genes and diagnose disease has become an urgent research topic in the medical field.

The present invention provides a biomarker screening method, an electronic device, and biomarkers that can accurately screen out important disease genes, which are then used to diagnose and predict whether a patient has a disease.

Based on the gene expression levels presented by a gene microarray, the present invention executes a genetic algorithm together with a fitness method to effectively predict and evaluate the biomarkers corresponding to different diseases. The selected biomarkers can then be used to diagnose whether a subject has a disease. As a result, medical personnel can use the microarray's gene expression data to understand carcinogenic mechanisms and to diagnose and predict cancer accurately, which reduces the time and cost of testing and avoids delaying the patient's treatment.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100, 400‧‧‧electronic device

110, 410‧‧‧input module

120, 420‧‧‧artificial intelligence evolution module

1202, 4202‧‧‧chromosome generation module

1204, 4204‧‧‧evaluation module

1206, 4206‧‧‧selection module

1208, 4208‧‧‧crossover module

1210, 4210‧‧‧mutation module

130‧‧‧output module

32, 32’, 34, 34’, 36, 36’‧‧‧chromosome structure

302, 304, 306‧‧‧chromosome gene

440‧‧‧test data input module

450‧‧‧prediction module

C1~C4‧‧‧disease

CR‧‧‧crossover point

DT‧‧‧decision tree

L1~L4‧‧‧leaf

M1, M3, M5‧‧‧biomarker

N1, N2‧‧‧node

R‧‧‧root

S202~S216‧‧‧steps of the biomarker screening method

S502~S504‧‧‧steps of the disease diagnosis method

FIG. 1 is a block diagram of an electronic device according to an embodiment of the invention.

FIG. 2 is a flowchart of a biomarker screening method according to an embodiment of the invention.

FIG. 3A is a schematic diagram illustrating the roulette wheel method.

FIG. 3B is a schematic diagram illustrating the crossover evolution calculation.

FIG. 3C is a schematic diagram illustrating single-point mutation.

FIG. 3D is a schematic diagram of a decision tree that produces the importance information of each biomarker.

FIG. 4 is a block diagram of an electronic device according to an embodiment of the invention.

FIG. 5 is a flowchart of the disease diagnosis method of the present embodiment.

Current gene microarray technology can measure the expression levels of many genes at once and reveal the differences between normal cells and cancer cells, which makes large-scale screening for cancer marker genes feasible. Building on this, the present invention combines gene microarray data with evolutionary computation to accurately screen out the biomarkers corresponding to a disease, and uses these biomarkers to diagnose and predict whether a patient has the disease. To make the invention easier to understand, the following embodiments are given as examples by which the invention can actually be implemented.

FIG. 1 is a block diagram of an electronic device according to an embodiment of the invention. Referring to FIG. 1, the electronic device 100 includes an input module 110, an artificial intelligence evolution module 120, and an output module 130. The functions of these modules are described below.

The input module 110 receives the gene expression levels presented by a gene microarray to obtain the expression level of each gene in the microarray. By way of explanation, to find out how gene expression differs between normal cells and cancer cells, messenger ribonucleic acid (mRNA) is generally extracted from the normal cells and from the cancer cells separately, and reverse transcriptase is then used to synthesize complementary deoxyribonucleic acid (cDNA). The cDNA from normal cells is labeled with green fluorescence and the cDNA from cancer cells with red fluorescence. The two cDNAs are mixed, applied to the same gene microarray chip, and hybridized with the DNA on the chip. The chip thus reveals how the expression of each gene differs between normal cells and cancer cells. For example, a green spot on the chip indicates that the gene is expressed only in normal cells, and a red spot indicates that it is expressed only in cancer cells. For a gene expressed in both cell types, the color falls between the two: a reddish spot indicates higher expression in cancer cells, and a greenish spot indicates higher expression in normal cells.

In addition, to measure the degree of gene expression, the color developed on the gene microarray chip must be converted into a digital image. Specifically, in this embodiment a high-resolution flatbed scanner captures an image of the chip, and the captured image is then quantified: it is first converted into a 16-bit grayscale image and then converted to numerical values by analysis software. In this embodiment, the image intensity value of each gene on the chip is processed, according to the experimental design and requirements, by re-scaling, log transformation, data filtration, and normalization, after which it can be supplied to the input module 110 to carry out the biomarker screening method.
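The preprocessing chain named above can be illustrated with a minimal sketch. The text does not specify the exact operations, so the clamping floor, the base-2 logarithm, and the z-score normalization below are assumptions chosen for illustration, and `preprocess` is a hypothetical helper name.

```python
import math

def preprocess(intensities, floor=1.0):
    """Log-transform and standardize the raw intensity values of one array.

    intensities: raw image-intensity values, one per gene.
    floor: hypothetical lower bound applied before the log (an assumption).
    """
    # Stand-in for re-scaling and filtration: clamp values below the floor.
    clamped = [max(v, floor) for v in intensities]
    # Log transformation (base 2 is assumed here).
    logged = [math.log2(v) for v in clamped]
    # Normalization: zero mean and unit variance across the array.
    mean = sum(logged) / len(logged)
    var = sum((v - mean) ** 2 for v in logged) / len(logged)
    std = math.sqrt(var)
    if std < 1e-12:  # constant array: avoid dividing by zero
        std = 1.0
    return [(v - mean) / std for v in logged]

normalized = preprocess([1, 120, 4500, 99999])
```

The normalized values keep the ordering of the raw intensities while putting arrays on a comparable scale, which is the usual motivation for this kind of step.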

The artificial intelligence evolution module 120 executes a genetic algorithm. Following Darwin's principle of natural selection and survival of the fittest, the algorithm obtains a plurality of candidate solutions in the form of chromosome structures and evolves them to find the important genes that match the gene expression levels of the gene microarray chip; these genes serve as the important biomarkers. In detail, the operation of the artificial intelligence evolution module 120 can be divided into, for example, an initialization stage, an evaluation stage, a selection stage, and a generation stage, although this embodiment is not limited thereto. The artificial intelligence evolution module 120 repeats these stages until the termination condition set in the genetic algorithm is met.

Further, the artificial intelligence evolution module 120 of this embodiment may include a chromosome generation module 1202, an evaluation module 1204, and a selection module 1206 to carry out the above stages. The chromosome generation module 1202 performs the initialization stage and the generation stage. In the initialization stage, the chromosome generation module 1202 randomly generates the initial population (that is, the parent population). This population consists of a plurality of chromosome structures, and each chromosome structure has a plurality of chromosome genes. In the generation stage, the chromosome generation module 1202 determines which chromosome structures become the population of the next generation, until the termination condition of the genetic algorithm is met.

More specifically, in a genetic algorithm one generation is one round of evolution, and the population of each generation changes under the action of evolutionary operators, eventually evolving into the best population. That is, if the population first generated by the chromosome generation module 1202 is called the parent chromosome structures, the offspring population that the chromosome generation module 1202 produces after one generation of evolution is called the offspring chromosome structures. After many generations of evolution, the chromosome structures generated by the chromosome generation module 1202 yield offspring chromosome structures that match the gene expression levels of the gene microarray chip.

The evaluation module 1204 performs the evaluation stage, which evaluates the fitness of each chromosome structure in the population. Among the chromosome structures generated by the chromosome generation module 1202, some must be kept for subsequent evolution while others must be eliminated. The evaluation module 1204 therefore computes and evaluates the fitness of each chromosome structure as the basis for this selection.

The selection module 1206 performs the selection stage, which selects some of the chromosome structures to undergo the evolutionary operations and produce the next generation of chromosome structures, thereby eliminating chromosome structures with poor fitness. That is, the chromosome generation module 1202 generates the next-generation population (the offspring chromosome structures) from the chromosome structures chosen by the selection module 1206, until the termination condition of the genetic algorithm is met.
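The four stages described above (initialization, evaluation, selection, generation) can be sketched as a single loop. This is only a skeleton under stated assumptions: `run_ga`, the fixed generation count used as the termination condition, the one-bit flip used as a stand-in for the evolutionary operators, and the toy fitness function are all hypothetical and are not taken from the patent.

```python
import random

def run_ga(n_genes, fitness, pop_size=20, generations=30, seed=0):
    """Skeleton of the initialization/evaluation/selection/generation loop."""
    rng = random.Random(seed)
    # Initialization: random binary chromosome structures (parent population).
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # termination: fixed generation count (assumed)
        # Evaluation: score every chromosome structure.
        scores = [fitness(c) for c in population]
        # Selection: fitness-proportional sampling; the small epsilon keeps
        # the weights valid when every score is zero.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Generation: copy each selected structure and flip one random bit
        # (a stand-in for the crossover and mutation operators).
        population = []
        for parent in parents:
            child = parent[:]
            i = rng.randrange(n_genes)
            child[i] = 1 - child[i]
            population.append(child)
        # Remember the fittest structure seen so far.
        best = max(population + [best], key=fitness)
    return best

# Toy fitness (hypothetical): number of positions matching a hidden target.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(chromosome):
    return sum(1 for a, b in zip(chromosome, target) if a == b)

best = run_ga(len(target), toy_fitness)
```

In the method described here the fitness would come from the evaluation module rather than from a toy target, and the loop would return the chromosome structure with the greatest fitness when the termination condition is reached.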

In addition, the artificial intelligence evolution module 120 of this embodiment may further include a crossover module 1208 and a mutation module 1210 to produce chromosome structures with better combinations of chromosome genes. In detail, the crossover module 1208 performs the crossover evolution calculation. Crossover is one of the core mechanisms of a genetic algorithm; its main purpose is to let two selected chromosome structures exchange chromosome gene information with each other so as to produce better gene combinations. During evolution across generations, selection alone rarely introduces new chromosome gene combinations into the solutions, and improving one chromosome structure usually depends on complementary genes carried by other chromosome structures, so crossover is needed to achieve this. The mutation module 1210, on the other hand, performs the mutation evolution calculation. Mutation offers a way to jump out of the current search region and so avoid becoming trapped in a local optimum. In general, mutation serves two purposes: one is to open up new search regions, and the other is to reintroduce important information that the population has lost during evolution. Thus, through the crossover calculation performed by the crossover module 1208 and the mutation calculation performed by the mutation module 1210, this embodiment combines chromosome structures effectively to widen the search space of the genetic algorithm while avoiding local optima, in order to find the important biomarkers.
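As an illustration of the two operators, the following minimal sketch applies a single-point crossover and a single-point mutation to binary strings. The reference list names a crossover point CR and FIG. 3C concerns single-point mutation, but the function names and the example strings here are hypothetical.

```python
import random

def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length chromosomes at the crossover point."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def single_point_mutation(chromosome, position):
    """Flip the bit at one position (0 becomes 1, 1 becomes 0)."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]

rng = random.Random(42)
a, b = "11110000", "00001111"
cr = rng.randrange(1, len(a))  # crossover point chosen at random
child_a, child_b = single_point_crossover(a, b, cr)
mutant = single_point_mutation(child_a, rng.randrange(len(child_a)))
```

Crossover recombines gene selections that already exist in the population, whereas mutation can introduce a selection that no current chromosome structure carries.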

It should be noted that the input module 110, the artificial intelligence evolution module 120, the chromosome generation module 1202, the evaluation module 1204, the selection module 1206, and the output module 130 of this embodiment are each implemented as a hardware circuit composed of one or more logic gates. Alternatively, in another embodiment of the invention, these modules may be implemented as computer program code. For example, they may be code segments written in a programming language and implemented in an application, an operating system, a driver, or the like; the code segments are stored in a storage unit and executed by a processing unit.

It is also worth mentioning that, in other embodiments, the electronic device 100 further includes a processing unit and a storage unit. The processing unit is coupled to the input module 110, the artificial intelligence evolution module 120, the chromosome generation module 1202, the evaluation module 1204, the selection module 1206, and the output module 130 and drives these modules, which cooperate under the control of the processing unit to carry out the functions described above.

Further, the processing unit is hardware with computing capability (for example, a chipset or a processor) that controls the overall operation of the electronic device 100. The processing unit is, for example, a central processing unit (CPU), or another programmable microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), or other similar device.

In addition, the storage unit may be an embedded storage unit or an external storage unit. The embedded storage unit may be random access memory (RAM), read-only memory (ROM), flash memory, a magnetic disk storage device, or the like. The external storage unit may be a Compact Flash (CF) memory card, a Secure Digital (SD) memory card, a Micro SD memory card, a Memory Stick (MS), or the like. In this embodiment, the storage unit may store one or more pieces of program code for executing the biomarker screening method, together with data (for example, the gene expression levels read by the input module 110, the parameters the artificial intelligence evolution module 120 needs to execute the genetic algorithm, the chromosome structures generated by the artificial intelligence evolution module 120, and the fitness of those chromosome structures).

The steps of the biomarker screening method of this embodiment are described below with reference to the electronic device 100.

FIG. 2 is a flowchart of a biomarker screening method according to an embodiment of the invention. Referring to FIG. 1 and FIG. 2, in step S202 the input module 110 receives the gene expression levels presented by the gene microarray and analyzes the plurality of samples of the microarray and the plurality of sample genes corresponding to each sample, thereby obtaining the gene expression level of each sample gene. In this embodiment, the gene microarray presents, for example, the gene expression levels of cancer cells such as breast cancer cells, cervical cancer cells, hepatitis cells, lung cancer cells, or colorectal cancer cells, and is not limited to these. The number of samples received by the input module 110 is, for example, 500, the number of sample genes is, for example, 6500, and the gene expression level of each sample gene lies between 1 and 99,999. These sample counts, gene counts, and expression ranges are given only as examples, and this embodiment is not limited thereto.

Next, the artificial intelligence evolution module 120 executes the genetic algorithm to obtain a set of optimal solutions, which serve as the important biomarkers. The artificial intelligence evolution module 120 first generates a plurality of parent chromosome structures and then executes the genetic algorithm to produce a plurality of offspring chromosome structures, until the genetic algorithm reaches its termination condition.

In detail, in step S204 the chromosome generation module 1202 generates a plurality of chromosome structures. Each chromosome structure has a plurality of chromosome genes, each chromosome gene corresponds to the position of a sample gene of each sample, and each chromosome gene takes either a first value or a second value. The chromosome structures first generated by the chromosome generation module 1202 form the parent population, the so-called parent chromosome structures. The chromosome genes in all of these chromosome structures are generated randomly and serve as the initial values of the genetic algorithm.

It should be noted that this embodiment represents a solution to the problem (that is, a set of important biomarkers) as a chromosome structure and encodes each chromosome structure so that its chromosome genes are expressed as numbers or symbols. The encoding used in this embodiment is, for example, binary encoding, real-number encoding, integer or literal permutation encoding, or general data structure encoding, and is not limited to these. For example, a chromosome structure of this embodiment may be written as "10010001000...001" (a binary encoding), where "1" (the first value) denotes an important chromosome gene and "0" (the second value) denotes an unimportant one. If the number of sample genes is 6500, the chromosome structure contains 6500 digits in total, and the position of the chromosome gene represented by each "1" or "0" corresponds to a sample gene.
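A minimal sketch of the binary encoding follows, assuming chromosome structures are stored as strings of "0" and "1". The helper names `random_chromosome` and `selected_positions` are hypothetical; the gene count of 6500 is the example figure from the text, and the population size of 10 is arbitrary.

```python
import random

def random_chromosome(n_genes, rng):
    """Random binary string: '1' marks a gene as important, '0' as not."""
    return "".join(rng.choice("01") for _ in range(n_genes))

def selected_positions(chromosome):
    """1-based positions of the genes marked as important."""
    return [i + 1 for i, bit in enumerate(chromosome) if bit == "1"]

rng = random.Random(0)
# Ten parent chromosome structures over 6500 sample genes.
population = [random_chromosome(6500, rng) for _ in range(10)]
example = selected_positions("10010001")  # genes 1, 4 and 8 are selected
```

Because every position maps to one sample gene, decoding a chromosome structure back into a candidate biomarker set is just a matter of reading off the positions that hold the first value.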

In step S206, for each chromosome structure, the evaluation module 1204 applies a fitness method, such as a decision tree algorithm, to the gene expression levels at the positions of the chromosome genes that have the first value, and thereby obtains the fitness of each chromosome structure. To judge the quality of each parent chromosome structure, the evaluation module 1204 introduces an evaluation index into the genetic algorithm as the reference for the algorithm's operation; this index is the fitness mentioned above. Depending on the input variables, the evaluation module 1204 defines the way fitness is estimated as a fitness function. This function indicates whether an individual can adapt to the environment set by the model, and it is also the criterion for whether a chromosome structure continues to evolve. In this embodiment, the evaluation module 1204 computes the fitness from the chromosome genes that each chromosome structure marks as important.

Furthermore, the evaluation module 1204 may compute the fitness with a decision tree algorithm. The decision tree algorithm is one of the classification methods of data mining: it divides data into different classes according to the features of known data or classification items, and each class corresponds to a different decision pattern. In general, a decision tree classifies data through a series of test conditions. For example, a record enters the decision tree at the root node, where a test value decides which child node in the next level the record moves to; the test values are obtained by different algorithms. This process repeats until the record reaches a leaf node.
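The traversal just described can be sketched with a small nested-dictionary tree. The labels M1, M3, M5 and C1 to C4 are borrowed from the reference list, but the tree shape, the thresholds, and the sample values below are invented purely for illustration and do not reproduce FIG. 3D.

```python
def classify(tree, sample):
    """Walk from the root to a leaf, testing one gene at each internal node."""
    node = tree
    while isinstance(node, dict):  # internal node; a leaf is a plain label
        branch = "high" if sample[node["gene"]] > node["threshold"] else "low"
        node = node[branch]
    return node

# Hypothetical tree: the root tests M1, and its two children test M3 and M5.
toy_tree = {
    "gene": "M1", "threshold": 500.0,
    "low": {"gene": "M3", "threshold": 200.0, "low": "C1", "high": "C2"},
    "high": {"gene": "M5", "threshold": 900.0, "low": "C3", "high": "C4"},
}

label = classify(toy_tree, {"M1": 650.0, "M3": 120.0, "M5": 300.0})
```

Each root-to-leaf path reads as one rule; for instance, in this toy tree "M1 above 500 and M5 at most 900" leads to the leaf C3.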

In this embodiment, the evaluation module 1204 runs the decision tree algorithm on the gene expression levels at the positions of the chromosome genes that have the first value. Using information gain, it searches the data set for the variable with the largest information content to create a node, then builds branches according to the characteristics of the variables, and repeats the branch-and-node construction on each branch subset until the whole decision tree is complete. Each path of the decision tree represents one classification rule, and new data is predicted with these rules. This embodiment may classify the data with a decision tree method such as the Iterative Dichotomiser 3 (ID3) algorithm, the C4.5 algorithm, or the Classification and Regression Tree (CART) algorithm.
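The information gain criterion can be made concrete with a short calculation. The sketch below scores one candidate threshold on one gene's expression values using the standard entropy-based definition; the expression values, the class labels, and the threshold are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at the threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical expression values of one gene across four samples.
expression = [120.0, 150.0, 900.0, 1100.0]
status = ["normal", "normal", "cancer", "cancer"]
gain = information_gain(expression, status, threshold=500.0)
```

A tree builder of the ID3 family would compute this quantity for every candidate variable and create the node from the one with the largest gain.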

The evaluation module 1204 of this embodiment uses the accuracy computed by the decision tree algorithm as the fitness. Specifically, take the chromosome structure "01101" as an example, with its chromosome genes numbered 1, 2, 3, 4, and 5 in order. Because "1" denotes an important chromosome gene, the evaluation module 1204 feeds the gene expression levels of the sample genes at the positions holding "1" (that is, positions 2, 3, and 5) into the decision tree to compute the fitness of this chromosome structure.
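The "01101" example can be sketched as a projection followed by a scoring step. The text does not say how the accuracy is measured, so `train_and_score` below is a hypothetical callable standing in for the decision tree step, and `stump_accuracy` is a deliberately simple stand-in scorer (training accuracy of a one-threshold split), not the method of the patent. The sample data is invented.

```python
def project(sample, chromosome):
    """Keep only the expression values at positions marked '1'."""
    return [value for value, bit in zip(sample, chromosome) if bit == "1"]

def fitness(chromosome, samples, labels, train_and_score):
    """Fitness = accuracy reported by a classifier using the selected genes."""
    if "1" not in chromosome:
        return 0.0  # no gene selected (assumed convention)
    reduced = [project(s, chromosome) for s in samples]
    return train_and_score(reduced, labels)

def stump_accuracy(rows, labels):
    """Stand-in scorer: best single-threshold split on the first kept gene."""
    values = [row[0] for row in rows]
    classes = set(labels)
    best = 0.0
    for t in values:
        for low in classes:
            for high in classes:
                hits = sum((high if v > t else low) == y
                           for v, y in zip(values, labels))
                best = max(best, hits / len(labels))
    return best

# Hypothetical data: four samples, five genes each.
samples = [[10, 200, 30, 40, 900], [12, 220, 35, 42, 950],
           [11, 800, 31, 41, 100], [13, 820, 33, 43, 120]]
labels = ["normal", "normal", "cancer", "cancer"]

kept = project(samples[0], "01101")  # values of genes 2, 3 and 5
score = fitness("01101", samples, labels, stump_accuracy)
```

In practice the scorer would be replaced by a full decision tree, and the accuracy would more safely be estimated on held-out samples than on the training data.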

In step S208, the selection module 1206 selects some of the chromosome structures until the number of selection results reaches the population size, which is, for example, 10% of the number of sample genes. In this embodiment, the selection module 1206 may use a selection method such as roulette wheel selection to randomly pick one of the chromosome structures as a selection result. The selection module 1206 may also use other selection algorithms, such as (μ+λ)-selection, tournament selection, steady-state selection, ranking and scaling, or sharing, but is not limited to these.

Taking roulette wheel selection as an example, the selection module 1206 sums the fitness of all chromosome structures in the population and treats the total as the whole area of a wheel. Each individual occupies a share of the wheel proportional to its fitness, so the higher the fitness, the larger the share. To make a selection, the selection module 1206 draws a random number, finds where it falls on the wheel, and passes the chromosome structure at that location on to the next step of the genetic algorithm. Assuming the population size is N and the fitness of individual i is $f_i$, the probability that individual i is selected is $p_i = f_i / \sum_{j=1}^{N} f_j$.
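A minimal sketch of roulette wheel selection follows, assuming non-negative fitness values with a positive sum. The function names and the injectable `rng` parameter are illustrative choices, not taken from the patent.

```python
import random
from itertools import accumulate


def selection_probabilities(fitnesses):
    """Share of the wheel held by each individual: its fitness over the total."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def roulette_select(population, fitnesses, rng=random):
    """Spin the wheel once and return the individual whose slice is hit."""
    boundaries = list(accumulate(fitnesses))  # right edge of each slice
    spin = rng.uniform(0, boundaries[-1])
    for individual, boundary in zip(population, boundaries):
        if spin <= boundary:
            return individual
    return population[-1]  # guard against floating-point rounding
```

Calling `roulette_select` repeatedly until the number of results reaches the population size reproduces the selection step described above.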

FIG. 3A is a schematic diagram illustrating roulette wheel selection. Referring to FIG. 3A and Table 1 below, Table 1 numbers the chromosome structures 1 to 5 and lists each structure's fitness as a percentage of the total fitness; FIG. 3A shows the same percentages as a pie chart. The higher a chromosome structure's fitness, the larger its share, and therefore the more likely it is to be picked during roulette wheel selection. For example, if a position on the pie chart of FIG. 3A is chosen at random, it is more likely to fall in the region of number 2 than in any other region.

Table 1

After the selection module 1206 has selected some of the chromosome structures, as shown in step S210, the crossover module 1208 may apply a crossover operation to the selected chromosome structures according to a crossover rate, which is, for example, between 0.6 and 0.8. As in natural genetic evolution, not every pair of chromosome genes in a genetic algorithm actually crosses over; the crossover rate controls how often crossover occurs. The crossover rate is generally set to a relatively high value so that chromosomes recombine effectively and the genetic algorithm explores a larger search space. For example, for each pair of chromosome genes the crossover module 1208 draws a random number between 0 and 1. If the number is less than the crossover rate, the pair crosses over; otherwise it does not.

In this embodiment, the crossover module 1208 may perform the crossover operation by one-point crossover, two-point crossover, uniform crossover, mask crossover, or similar methods, randomly pairing the chromosome genes in the chromosome structures selected above. FIG. 3B is a schematic diagram of the crossover operation, illustrated with binary-encoded chromosome structures. Referring to FIG. 3B and taking one-point crossover as an example, the crossover module 1208 randomly picks a crossover point CR in chromosome structures 32 and 34 and swaps the chromosome genes 302 and 304 that follow the crossover point CR, producing a new pair of chromosome structures 32' and 34', that is, the offspring chromosome structures.
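One-point crossover gated by a crossover rate might look like the sketch below. Chromosomes are assumed to be equal-length binary strings of at least two genes; the names and the default rate of 0.7 (inside the 0.6 to 0.8 range given in the text) are illustrative.

```python
import random


def single_point_crossover(parent_a, parent_b, point):
    """Swap the genes after `point` between two equal-length chromosomes."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def maybe_crossover(parent_a, parent_b, crossover_rate=0.7, rng=random):
    """Cross the pair only when a random draw in [0, 1) is below the rate."""
    if rng.random() < crossover_rate:
        point = rng.randint(1, len(parent_a) - 1)  # keep at least one gene on each side
        return single_point_crossover(parent_a, parent_b, point)
    return parent_a, parent_b
```

With parents "11111" and "00000" and a crossover point of 2, the offspring are "11000" and "00111".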

After the crossover module 1208 completes the crossover operation, as shown in step S212, the mutation module 1210 may further apply a mutation operation to the selected chromosome structures according to a mutation rate, which is, for example, between 0.1 and 0.2. As in natural genetic evolution, mutation does not occur often, so in the genetic algorithm the mutation module 1210 uses a randomly generated number and the mutation rate to decide whether a chromosome gene mutates. The mutation rate is generally set to a relatively low value to avoid destroying the better chromosome structures produced by selection and crossover. For example, if the random number drawn for a chromosome gene is less than the mutation rate, that gene is mutated; otherwise it is not.

In this embodiment, the mutation module 1210 may perform the mutation operation by single-point mutation, inversion mutation, floating-point mutation, or similar methods. FIG. 3C is a schematic diagram of single-point mutation, illustrated with a binary-encoded chromosome structure. Referring to FIG. 3C and taking single-point mutation as an example, when the mutation module 1210 mutates a chromosome structure 36, it changes chromosome gene 306 in chromosome structure 36 from "0" to "1", producing the mutated chromosome structure 36'.
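The per-gene mutation test can be sketched as a bit flip on a binary string. The function names and the default rate of 0.1 (the low end of the 0.1 to 0.2 range in the text) are assumptions for the example.

```python
import random


def flip(bit):
    """Return the opposite binary value."""
    return "1" if bit == "0" else "0"


def mutate(chromosome, mutation_rate=0.1, rng=random):
    """Flip each gene whose random draw in [0, 1) is below the mutation rate."""
    return "".join(
        flip(bit) if rng.random() < mutation_rate else bit
        for bit in chromosome
    )
```

A rate of 0 leaves the chromosome unchanged and a rate of 1 flips every gene, which makes the two extremes easy to check.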

Next, referring to FIG. 2, in step S214 the chromosome generation module 1202 determines whether the genetic algorithm has reached a termination condition. One termination condition is that the number of times offspring chromosome structures have been generated reaches a preset number of generations. That is, if each round of offspring counts as one generation and the preset number is 1000, the chromosome generation module 1202 checks whether the offspring chromosome structures have reached the 1000th generation. In other embodiments, the chromosome generation module 1202 may instead check whether the fitness of the offspring chromosome structures has converged, for example whether the similarity of fitness among the offspring chromosome structures exceeds a set value.
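The two termination checks can be sketched as below. The text does not define how the similarity of fitness is measured, so the ratio of the lowest to the highest fitness used here is purely an assumption, as are the function names and the 0.95 threshold.

```python
def reached_generation_limit(generation, max_generations=1000):
    """True once the preset number of generations has been produced."""
    return generation >= max_generations


def fitness_similarity(fitnesses):
    """Assumed similarity measure: lowest fitness divided by highest fitness."""
    best = max(fitnesses)
    if best == 0:
        return 1.0  # every fitness is zero, so they are identical
    return min(fitnesses) / best


def has_converged(fitnesses, similarity_threshold=0.95):
    """True when the offspring fitness values are more similar than the threshold."""
    return fitness_similarity(fitnesses) > similarity_threshold
```

Either check, or both combined, can serve as the loop condition of the genetic algorithm.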

If the genetic algorithm executed by the artificial intelligence evolution module 120 has not yet reached the termination condition, the flow returns to step S204 to generate more offspring chromosome structures. In other words, the artificial intelligence evolution module 120 repeats the genetic algorithm, generating offspring chromosome structures and checking the termination condition each time.

If the genetic algorithm executed by the artificial intelligence evolution module 120 has reached the termination condition, then as shown in step S216 the output module 130 selects, from the offspring chromosome structures obtained when the genetic algorithm stops, the one with the greatest fitness and outputs multiple biomarkers according to it. These biomarkers are consistent with the gene expression levels of the sample genes in the gene microarray, and some of them correspond to a disease. In other words, the output module 130 outputs the important biomarkers and may further generate importance information for each biomarker.
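Putting steps S204 to S216 together, the overall loop might be sketched as follows. This is an illustrative toy under several assumptions: chromosomes are binary strings, the fitness function is supplied by the caller (in the patent it is the decision tree accuracy), the population size is even, and only the generation-count termination condition is used. All names are hypothetical.

```python
import random


def run_ga(fitness, n_genes, population_size=10, crossover_rate=0.7,
           mutation_rate=0.1, max_generations=50, seed=0):
    """Evolve binary chromosomes and return the fittest one in the final generation."""
    rng = random.Random(seed)
    # S204: random initial population of binary-encoded chromosomes
    population = ["".join(rng.choice("01") for _ in range(n_genes))
                  for _ in range(population_size)]
    for _ in range(max_generations):  # S214: stop after a preset generation count
        # S206: evaluate fitness; the tiny offset keeps all-zero scores selectable
        weights = [fitness(c) + 1e-9 for c in population]
        # S208: roulette wheel selection (fitness-proportional sampling)
        parents = rng.choices(population, weights=weights, k=population_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            # S210: one-point crossover, applied with probability crossover_rate
            if rng.random() < crossover_rate:
                point = rng.randint(1, n_genes - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            children += [a, b]
        # S212: flip each gene with probability mutation_rate
        population = [
            "".join(("1" if g == "0" else "0") if rng.random() < mutation_rate else g
                    for g in child)
            for child in children
        ]
    # S216: the fittest chromosome of the final generation
    return max(population, key=fitness)
```

The positions marked "1" in the returned chromosome are the candidate biomarkers.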

For example, suppose the offspring chromosome structure with the greatest fitness obtained by the artificial intelligence evolution module 120 at the end of the genetic algorithm is "10101", as shown in Table 2 below. The chromosome genes numbered 1, 3, and 5 are then the important biomarkers output by the output module 130.

In addition, to evaluate the importance of the biomarkers it outputs, the output module 130 may use these biomarkers and the decision tree algorithm to determine which important biomarkers correspond to which diseases. FIG. 3D is a schematic diagram of a decision tree that generates importance information for each biomarker, described here with the chromosome structure of Table 2 ("10101"). Referring to FIG. 3D and Table 2, the decision tree DT has a root R, nodes N1 and N2, and leaves L1 to L4. Each of the leaves L1 to L4 corresponds to its own disease prediction, and the data for the important biomarkers are assigned to different nodes according to the decision tree DT. Each path in the decision tree DT from the root R to one of the leaves L1 to L4 therefore relates a set of important biomarkers to a disease prediction. For example, on the path formed by root R, node N1, and leaf L1, the corresponding biomarkers M1 and M3 are the important biomarkers for disease C1. On the path formed by root R, node N2, and leaf L3, the corresponding biomarkers M1 and M5 are the important biomarkers for disease C3. The disease predictions at leaves L1 to L4 may be the same or different diseases. In this way, based on the root and nodes of the decision tree DT, the output module 130 can produce the important biomarkers for each disease prediction.
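Reading marker sets off the root-to-leaf paths can be sketched with a small recursive walk. The nested-dictionary tree layout is an assumption made for this example, and the labels "C2" and "C4" for leaves L2 and L4 are hypothetical placeholders, since the text names only C1 and C3.

```python
def marker_paths(node, markers=()):
    """Yield (markers on the path, disease at the leaf) for every root-to-leaf path."""
    if "disease" in node:  # a leaf carries only a disease prediction
        yield list(markers), node["disease"]
        return
    for child in node["children"]:
        yield from marker_paths(child, markers + (node["marker"],))


# Root R tests M1, node N1 tests M3, node N2 tests M5 (layout assumed from the text).
tree = {
    "marker": "M1",
    "children": [
        {"marker": "M3", "children": [{"disease": "C1"}, {"disease": "C2"}]},
        {"marker": "M5", "children": [{"disease": "C3"}, {"disease": "C4"}]},
    ],
}

paths = list(marker_paths(tree))
```

Here `paths` pairs M1 and M3 with disease C1 and pairs M1 and M5 with disease C3, as in the two example paths above.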

To help those skilled in the art further understand the biomarker screening method of this embodiment and the electronic device that uses it, another embodiment is described below. It is again described with the electronic device 100 of FIG. 1 and uses biomarker prediction for uterine cancer, cervical cancer, ovarian cancer, tongue cancer, oral cancer, and esophageal cancer as an example.

In this embodiment, cancer cells are obtained from patients with uterine cancer, cervical cancer, ovarian cancer, tongue cancer, oral cancer, and esophageal cancer, and the gene expression levels shown by the gene microarray are analyzed to find which genes are highly expressed in these cancer cells. After correction, logarithmic transformation, data filtering, standardization, and similar processing as required by the experimental design, the data can be supplied to the input module 110 of this embodiment. The input module 110 obtains the gene expression level of each sample gene from the expression shown by the gene microarray. The artificial intelligence evolution module 120 then executes the genetic algorithm, and the output module 130 uses the result to find multiple biomarkers consistent with the expression shown by the gene microarray and analyzes the importance of each biomarker for the different cancers. That is, the output module 130 outputs the important biomarkers for uterine cancer, cervical cancer, ovarian cancer, tongue cancer, oral cancer, and esophageal cancer respectively; the biomarkers for these cancers are shown in Table 3 below.
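Two of the preprocessing steps named above, logarithmic transformation and standardization, could be sketched as follows. The text does not specify the formulas, so the base-2 logarithm and the z-score are assumptions, as are the function names; raw intensities are assumed to be strictly positive.

```python
import math
import statistics


def log_transform(values):
    """Base-2 logarithm of raw intensities (assumed strictly positive)."""
    return [math.log2(v) for v in values]


def standardize(values):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]  # constant input carries no contrast
    return [(v - mean) / spread for v in values]
```

Applying `standardize(log_transform(raw))` to one gene's intensities across samples gives values with zero mean and unit spread.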

The gene names above and their corresponding sequences are listed in Table 4 below.

In this embodiment, based on the gene expression levels shown by the gene microarray, the artificial intelligence evolution module 120 executes the genetic algorithm and uses the decision tree algorithm to evaluate the result of each generation, which allows it to predict important biomarkers effectively. The artificial intelligence evolution module 120 can also evaluate the importance of these biomarkers to find the biomarkers that correspond to different diseases. Medical personnel can therefore use the biomarker screening method and electronic device of this embodiment to understand carcinogenic mechanisms and use the expression levels of the genes corresponding to the biomarkers as a basis for clinical diagnosis, treatment, and follow-up, reducing the time and cost of testing and avoiding delays in patients' treatment.

It is worth noting that this embodiment can also use the biomarkers obtained by the above screening method to predict whether the genes of a test subject that correspond to these biomarkers are highly expressed.

FIG. 4 is a block diagram of an electronic device according to an embodiment of the invention. Referring to FIG. 4, the electronic device 400 is similar to the electronic device 100 of FIG. 1, except that the electronic device 400 additionally has a test data input module 440 and a prediction module 450. The test data input module 440 receives test gene expression data, for example gene expression data from a test subject. The prediction module 450 analyzes the test genes at the positions of the biomarkers to determine whether their test expression levels are high, and thereby determines whether the test subject has a disease such as uterine cancer, cervical cancer, ovarian cancer, tongue cancer, oral cancer, or esophageal cancer.

FIG. 5 is a flowchart of the method for diagnosing a disease in this embodiment. Referring to FIG. 4 and FIG. 5, in step S502 the test data input module 440 receives the test gene expression data. The test gene expression data contain multiple test genes; each test gene corresponds to the position of a sample gene of the samples, and each test gene has a corresponding gene expression level (the test expression level).

In step S504, the prediction module 450 runs the decision tree algorithm on the test expression levels of the test genes at the positions of the biomarkers. For example, if the biomarkers found by the screening method are at positions 1, 3, and 5 of the gene sequence, the prediction module 450 applies the decision tree algorithm to the test expression levels of the test genes at positions 1, 3, and 5. In terms of the decision tree, these test expression levels are sorted along paths made up of a root, nodes, and leaves, and each path corresponds to one prediction result: one of uterine cancer, cervical cancer, ovarian cancer, tongue cancer, oral cancer, and esophageal cancer, some other disease, or no cancer.

As shown in step S506, the prediction module 450 then uses the prediction result of the decision tree algorithm to determine whether the test expression levels of the test genes are associated with a disease. If the decision tree maps the test expression levels to a disease, the prediction module 450 determines that the test subject has that disease. If it does not, the prediction module 450 determines that the test subject does not have the disease.
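Steps S504 and S506 amount to walking a trained decision tree with the subject's expression levels. The sketch below assumes a tree stored as nested dictionaries with a numeric threshold at each node; the layout, the labels, and the function names are hypothetical and not taken from the patent.

```python
def predict(node, expression):
    """Follow the tree from the root to a leaf and return the leaf's label.

    `expression` maps a 1-based gene position to its test expression level.
    """
    while "label" not in node:
        value = expression[node["position"]]
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node["label"]


def has_disease(label, healthy_label="none"):
    """S506: the subject is judged to have a disease unless the leaf says otherwise."""
    return label != healthy_label
```

Only the positions that appear in the tree (for example 1, 3, and 5) need to be measured for the test subject.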

In this embodiment, because different diseases can correspond to different biomarkers, finding out whether a test subject has one or more diseases only requires obtaining the subject's test genes that correspond to the biomarkers and using the decision tree algorithm to determine whether their test expression levels are high. Medical personnel can thus use the biomarker screening method and the disease diagnosis method of this embodiment with gene microarray expression data to diagnose and predict accurately whether a patient has cancer, reducing the time and cost of testing and avoiding delays in patients' treatment.

In summary, in the biomarker screening method, electronic device, and biomarkers of this embodiment, the electronic device executes the genetic algorithm on the gene expression levels shown by the gene microarray and uses the decision tree algorithm to evaluate the result of each generation, effectively predicting and evaluating the biomarkers for different diseases. To find out whether a test subject has one or more diseases, it is only necessary to obtain the subject's test genes that correspond to the biomarkers and use the decision tree algorithm to determine whether their test expression levels are high. Medical personnel can thus use gene microarray expression data to understand carcinogenic mechanisms and to diagnose and predict accurately whether a patient has cancer, reducing the time and cost of testing and avoiding delays in patients' treatment.

Although the invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection is defined by the appended claims.

<110> National Taiwan University of Science and Technology (國立台灣科技大學); 王孔政, 陳昆皇, 楊自森, 鄧乃嘉
<120> Method for screening biomarker, electronic apparatus, and biomarker
<130> 46371-TW-PA
<160> 6
<170> PatentIn version 3.5

<210> 1
<211> 2908
<212> DNA
<213> Human adenovirus type 1
<400> 1

<210> 2
<211> 5745
<212> DNA
<213> Human adenovirus type 1
<400> 2

<210> 3
<211> 6282
<212> DNA
<213> Human adenovirus type 1
<400> 3

<210> 4
<211> 2970
<212> DNA
<213> Human adenovirus type 1
<400> 4

<210> 5
<211> 2666
<212> DNA
<213> Human adenovirus type 1
<400> 5

<210> 6
<211> 2004
<212> DNA
<213> Human adenovirus type 1
<400> 6

Claims (4)

1. A method for screening biomarkers, for use in an electronic device, the method comprising: analyzing multiple samples of a gene microarray and multiple sample genes corresponding to each sample to obtain a gene expression level of each sample gene; executing a genetic algorithm, comprising: randomly generating multiple chromosome structures, wherein each chromosome structure has multiple chromosome genes, each chromosome gene corresponds to the position of a sample gene of each sample, and each chromosome structure is binary encoded so that each chromosome gene has either a first value or a second value; for each chromosome structure, executing a fitness method on the gene expression levels corresponding to the positions of the chromosome genes having the first value, wherein the fitness method is a decision tree algorithm, to obtain a fitness of each chromosome structure; and selecting some of the chromosome structures, performing a crossover algorithm on each chromosome structure according to a crossover rate, wherein the crossover rate is between 0.6 and 0.8, performing a mutation algorithm on each chromosome structure according to a mutation rate, wherein the mutation rate is between 0.1 and 0.2, and repeating the genetic algorithm accordingly to generate multiple offspring chromosome structures until a termination condition is reached, the termination condition being that the similarity of the fitness among the offspring chromosome structures has exceeded a set value; and selecting, from the offspring chromosome structures obtained when the genetic algorithm stops, the offspring chromosome structure with the greatest fitness, and obtaining multiple biomarkers according to that offspring chromosome structure, wherein the biomarkers correspond to a disease.

2. The method for screening biomarkers of claim 1, wherein the number of chromosome structures is a population size for executing the genetic algorithm, the population size is 10% of the number of sample genes, and the step of selecting some of the chromosome structures comprises: randomly picking, by a selection method, one of the chromosome structures as a selection result until the number of selection results reaches the population size.

3. The method for screening biomarkers of claim 1, further comprising, after the step of obtaining the biomarkers: receiving test gene expression data, wherein the test gene expression data have multiple test genes, each test gene corresponds to the position of a sample gene of each sample, and each test gene has a corresponding test expression level; executing the fitness method, according to the test expression levels, on the test genes corresponding to the positions of the chromosome genes having the first value; and determining, according to a prediction result obtained by executing the fitness method, whether the test expression level of the test gene is associated with the disease.

4. An electronic device, comprising: an input module that receives the gene expression shown by a gene microarray and analyzes multiple samples of the gene microarray and multiple sample genes corresponding to each sample to obtain a gene expression level of each sample gene; an artificial intelligence evolution module that executes a genetic algorithm, the artificial intelligence evolution module comprising: a chromosome generation module for randomly generating multiple chromosome structures, wherein each chromosome structure has multiple chromosome genes, each chromosome gene corresponds to the position of a sample gene of each sample, and each chromosome structure is binary encoded so that each chromosome gene has either a first value or a second value; an evaluation module for executing, for each chromosome structure, a fitness method on the gene expression levels corresponding to the positions of the chromosome genes having the first value, wherein the fitness method is a decision tree algorithm, to obtain a fitness of each chromosome structure; a selection module for selecting some of the chromosome structures; a crossover module for performing a crossover operation according to a crossover rate, the crossover rate being between 0.6 and 0.8; and a mutation module for performing a mutation operation according to a mutation rate, the mutation rate being between 0.1 and 0.2, wherein the artificial intelligence evolution module repeats the genetic algorithm to generate multiple offspring chromosome structures until a termination condition is reached, the termination condition being that the similarity of the fitness among the offspring chromosome structures has exceeded a set value; and an output module that selects, from the offspring chromosome structures obtained when the genetic algorithm stops, the offspring chromosome structure with the greatest fitness and outputs multiple biomarkers according to that offspring chromosome structure, wherein the biomarkers correspond to a disease.
TW102131680A 2013-09-03 2013-09-03 Method for screening biomarker, electronic apparatus, and biomarker TWI633453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW102131680A TWI633453B (en) 2013-09-03 2013-09-03 Method for screening biomarker, electronic apparatus, and biomarker

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW102131680A TWI633453B (en) 2013-09-03 2013-09-03 Method for screening biomarker, electronic apparatus, and biomarker

Publications (2)

Publication Number Publication Date
TW201510759A (en) 2015-03-16
TWI633453B (en) 2018-08-21

Family ID: 53186726

Family Applications (1)

Application Number Priority Date Filing Date Title
TW102131680A TWI633453B (en) 2013-09-03 2013-09-03 Method for screening biomarker, electronic apparatus, and biomarker

Country Status (1)

Country Link
TW (1) TWI633453B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102272764A (en) * 2009-01-06 2011-12-07 Koninklijke Philips Electronics N.V. (皇家飞利浦电子股份有限公司) Evolutionary clustering algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
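吳奇儒, 2007, "A Study of Solution Methods for Multi-Objective Path Optimization Problems" (多目標路徑最佳化問題求解方法之研究), master's thesis, Department of Urban Planning and Spatial Information, Feng Chia University *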

Also Published As

Publication number Publication date
TW201510759A (en) 2015-03-16

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees