JP2007207113A

JP2007207113A - Phylogenetic tree display system

Info

Publication number: JP2007207113A
Application number: JP2006027528A
Authority: JP
Inventors: Hisayoshi Yamamoto; 悠喜山本
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 2006-02-03
Filing date: 2006-02-03
Publication date: 2007-08-16

Abstract

PROBLEM TO BE SOLVED: To provide a phylogenetic tree display system with which the sequence similarity and function of a gene or protein can be grasped visually on a single phylogenetic tree. SOLUTION: The phylogenetic tree display system creates a phylogenetic tree that displays the sequence similarity and the function of a gene or protein simultaneously. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a phylogenetic tree display system that visually displays the sequence similarity of base sequences and amino acid sequences.

In recent years, computer performance has improved markedly, greatly benefiting every industry. In biotechnology, the contribution to sequencer technology has been especially significant. Sequencer technology refers to decoding, with the aid of computers, the genome sequence information of organisms such as humans and mice. A representative project of computer-based sequencer technology is the Human Genome Project, which was proposed in the United States in 1986 and which Japan joined in 1991 through the University of Tokyo, RIKEN, and others. The project was initially scheduled for completion in 2005, but thanks to the development of high-speed sequencers, completion of decoding was declared in 2003. Since then, high-speed sequencers have been used to decode the base sequences of the mouse genome, the rice genome, and others.

Organizing sequence information is indispensable for discovering new knowledge from the huge amount of base sequence data obtained in this way and of amino acid sequence data obtained by translating those base sequences. For example, to predict the three-dimensional structure of a protein, the amino acid sequences of proteins whose three-dimensional structures are known are compiled into a database in advance, a homology search is run against that database, and the structure is modeled on the three-dimensional structure of a protein with high sequence homology. Another technique for organizing sequence information calculates a large number of sequence similarities and displays the result visually as a phylogenetic tree. This is effective for finding the bases and amino acids that are important within each group of highly similar sequences.

For example, Patent Document 1 describes a technique for realizing phylogenetic tree display, with the aim of making the displayed results easy to view and compare. In the example described in Patent Document 1, phylogenetic tree information is generated from a similarity matrix by combining the two sequences with the maximum similarity into one new sequence; the maximum similarity of each node is obtained from this phylogenetic tree information; the phylogenetic tree is generated in order of maximum similarity; and the nodes are arranged so that the similarity increases gradually.
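
The merging step described above resembles generic agglomerative clustering. The following Python sketch illustrates that generic idea only and is not the method of Patent Document 1: the function name merge_most_similar, the dict-of-pairs similarity matrix, and the size-weighted average used for the new node's similarity (the rule used by UPGMA) are all assumptions made for illustration. The final "arrange by increasing similarity" step is not shown.

```python
from itertools import combinations


def merge_most_similar(names, sim):
    """Repeatedly merge the two most similar nodes into one new node.

    names: list of leaf labels.
    sim:   dict mapping frozenset({a, b}) -> similarity (higher = more similar).
    Returns the merge history as (left, right, similarity) tuples and the root label.
    The size-weighted average for new-node similarities is an assumed rule.
    """
    sim = dict(sim)                      # work on a copy of the matrix
    size = {n: 1 for n in names}         # number of leaves under each node
    active = list(names)
    history = []
    while len(active) > 1:
        # choose the pair with the maximum similarity
        a, b = max(combinations(active, 2), key=lambda p: sim[frozenset(p)])
        history.append((a, b, sim[frozenset((a, b))]))
        new = f"({a},{b})"
        active.remove(a)
        active.remove(b)
        # similarity of the new node to each remaining node
        for c in active:
            sim[frozenset((new, c))] = (
                size[a] * sim[frozenset((a, c))] + size[b] * sim[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[new] = size[a] + size[b]
        active.append(new)
    return history, active[0]
```

Keying the matrix by frozenset keeps it symmetric without storing both (a, b) and (b, a).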

Furthermore, computer analysis technology has contributed greatly not only to the sequence analysis of genes and proteins but also to research elucidating their functions. For example, a biomolecular interaction analyzer can quantitatively measure the binding strength between proteins, between a compound and a protein, and between a gene and a protein. The binding strength of a biomolecular interaction can be determined as a binding constant. In this way, advances in analytical instruments have made it possible to handle information on the functions of genes and proteins quantitatively.
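
For reference, the binding constant mentioned above is conventionally defined, for a 1:1 interaction A + B ⇌ AB at equilibrium, as the association constant below; the dissociation constant is its reciprocal. This is standard notation, not notation taken from this patent:

```latex
K_A = \frac{[AB]}{[A][B]}, \qquad
K_D = \frac{1}{K_A} = \frac{[A][B]}{[AB]}
```

A larger K_A (equivalently, a smaller K_D) indicates stronger binding.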

Patent Document 1: Japanese Patent Laid-Open No. 9-44523

Information about the biology of organisms such as humans and mice falls mainly into three types: (1) sequence, (2) structure, and (3) function. In general, (1) sequence refers to the base sequence of a gene and the amino acid sequence of a protein; (2) structure refers to the three-dimensional structure of a protein formed from its amino acid sequence; and (3) function refers to protein-protein binding, protein-DNA binding, and protein-compound binding.

The base sequence of a gene is translated via mRNA into an amino acid sequence. The order of the amino acids determines the three-dimensional structure, and differences in three-dimensional structure are reflected in each protein's specific function. Sequence, structure, and function are thus closely related, and it is desirable to study them comprehensively.

However, the technique described in Patent Document 1 calculates sequence similarity from base sequences and amino acid sequences alone and displays it as a phylogenetic tree. By displaying (1) the base sequences of genes and the amino acid sequences of proteins as a planar phylogenetic tree, this technique organizes the sequence information and can show the user the bases and amino acids that are important in each group of highly similar sequences. Information on (2) the three-dimensional structure of proteins and (3) the functions of genes and proteins, however, is not included in the phylogenetic tree screen.

As a result, function cannot be examined directly from the displayed phylogenetic tree. The functions of genes, proteins, or groups of highly similar sequences must instead be checked in laboratory notebooks and the literature, which is very laborious.

An object of the present invention is to provide a phylogenetic tree display system with which the sequence similarity and function of a gene or protein can be grasped visually on a single phylogenetic tree.

According to the present invention, a phylogenetic tree that displays the sequence similarity and the function of a gene or protein simultaneously is created.

According to the present invention, the sequence similarity and function of a gene or protein can be grasped visually on a single phylogenetic tree.

An embodiment of the present invention is described in detail below with reference to the drawings. The phylogenetic tree display system of this example comprises a phylogenetic tree creation server 10 and a client terminal 20 connected to each other via a communication network 30; that is, it is a client-server system.

The client terminal 20 comprises a control unit 21; a sequence similarity and function data input processing unit 22, which handles input of sequence similarity and function data; a name and sequence input processing unit 23, which handles input of names and sequences; a function-annotated phylogenetic tree display processing unit 24, which handles display of the function-annotated phylogenetic tree; an input unit 25; and a display unit 26.

The phylogenetic tree creation server 10 comprises a control unit 11; an authentication unit 12; a sequence similarity calculation unit 13, which calculates sequence similarity; a function-annotated phylogenetic tree file creation unit 14, which creates function-annotated phylogenetic tree files; a sequence similarity and function database 15, which stores sequence similarities and functions; and a phylogenetic tree file database 16, which stores the function-annotated phylogenetic tree files.

In the phylogenetic tree display system of this example, when the name of a known gene or protein is entered at the client terminal 20, a function-annotated phylogenetic tree in which the function of the entered gene or protein is highlighted is created and displayed.

The name can be highlighted, for example, by rendering it in bold, in italics, or in a different color. Function is displayed by drawing a line perpendicular to the horizontal line representing the lineage, with the length of that perpendicular line proportional to the strength of the function. This adds a three-dimensional display of function to an otherwise planar phylogenetic tree.
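
The display rule above (a perpendicular line whose length is proportional to function strength, plus a highlighted name) can be sketched as geometry. In this Python sketch the names function_segments and leaf_label, the (name, x, y, strength) leaf layout, scaling the largest strength to max_height, and the use of "**" as a stand-in for bold type are all hypothetical choices, not the patent's implementation.

```python
def function_segments(leaves, max_height=1.0):
    """Return one z-axis segment per leaf, with height proportional to strength.

    leaves: list of (name, x, y, strength); (x, y) is the leaf tip on the planar
            tree and strength is the binding strength, or None when unknown.
    The largest known strength is drawn at max_height; unknown strengths get
    no segment at all.
    """
    known = [leaf[3] for leaf in leaves if leaf[3] is not None]
    top = max(known, default=0)
    segments = []
    for name, x, y, strength in leaves:
        if strength is None or top <= 0:
            continue
        height = max_height * strength / top
        # the segment rises perpendicular to the tree plane at the leaf tip
        segments.append((name, (x, y, 0.0), (x, y, height)))
    return segments


def leaf_label(name, highlight_name):
    """Mark the user-entered name; '**' stands in for bold type in this sketch."""
    return f"**{name}**" if name == highlight_name else name
```

Drawing the segments along a third axis is what turns the planar tree into the three-dimensional function display described above.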

When a sequence of unknown function is entered at the client terminal 20, a function-annotated phylogenetic tree showing the sequence similarity of the entered sequence is created and displayed. The function of that sequence can therefore be estimated from its sequence similarity.

Unlike existing phylogenetic tree display systems, which show only sequence similarity, the phylogenetic tree display system of this example displays function together with sequence similarity in the phylogenetic tree, so that both can be grasped visually.

A method of creating the data in the sequence similarity and function database 15 is described with reference to FIG. 2. In step S101, a web browser (not shown) on the client terminal 20 displays an input screen for sequence similarity and function on the display unit 26; FIG. 4 shows an example of this screen. The client enters the sequence similarity and function via the input unit 25. In step S102, the sequence similarity and function data input processing unit 22 of the client terminal 20 transmits the sequence similarity and function to the phylogenetic tree creation server 10. In step S103, the authentication unit 12 of the phylogenetic tree creation server 10 verifies the user ID and password. In step S104, the control unit 11 of the phylogenetic tree creation server 10 stores the sequence similarity and function received from the client terminal 20 in the sequence similarity and function database 15.
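
Steps S103 and S104 on the server side can be sketched with Python's standard sqlite3 module. The class RegistrationServer, the table name seq_function, its column layout (modeled on the six fields of FIG. 4), and the in-memory user table are hypothetical. Passwords are compared in plain text purely to keep the sketch short.

```python
import sqlite3

# Hypothetical table mirroring the six input fields 407-412 of FIG. 4.
SCHEMA = """CREATE TABLE IF NOT EXISTS seq_function (
    id INTEGER PRIMARY KEY, name TEXT, similarity REAL,
    binding REAL, registrant TEXT, registered TEXT)"""


class RegistrationServer:
    """Sketch of steps S103-S104: authenticate, then store the submitted rows."""

    def __init__(self, users):
        self.users = users                        # user ID -> password (sketch only)
        self.db = sqlite3.connect(":memory:")     # stands in for database 15
        self.db.execute(SCHEMA)

    def register(self, user_id, password, rows):
        # S103: verify the user ID and password
        if self.users.get(user_id) != password:
            raise PermissionError("authentication failed")
        # S104: store the rows; the context manager commits, or rolls back on error
        with self.db:
            self.db.executemany(
                "INSERT INTO seq_function VALUES (?, ?, ?, ?, ?, ?)", rows
            )
        return len(rows)
```

For example, RegistrationServer({"u1": "pw"}).register("u1", "pw", [(1, "A1", 0.12, 3.5, "u1", "2006-02-03")]) stores one row and returns 1.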

A method of creating a function-annotated phylogenetic tree file and storing it in the phylogenetic tree file database is described with reference to FIG. 3. In step S201, the web browser of the client terminal 20 displays an input screen for names and sequences on the display unit 26; FIG. 5 shows an example of this screen. The client enters a name and sequence via the input unit 25. To create a function-annotated phylogenetic tree file in which the function of a known gene or protein is highlighted, the client enters the name of that gene or protein. To create a function-annotated phylogenetic tree file for a sequence of unknown function, the client enters its base sequence or amino acid sequence.

In step S202, the name and sequence input processing unit 23 of the client terminal 20 transmits the name and sequence to the phylogenetic tree creation server 10. In step S203, the authentication unit 12 of the phylogenetic tree creation server 10 verifies the user ID and password. In step S204, the control unit 11 of the phylogenetic tree creation server 10 searches the sequence similarity and function database 15 for a name and sequence identical to those transmitted from the client terminal 20. In step S205, it is determined whether an identical name and sequence exist. If they do not, the process returns to step S201; if they do, it proceeds to step S206. In step S206, the sequence similarity calculation unit 13 of the phylogenetic tree creation server 10 calculates sequence similarity. When a name was transmitted from the client terminal 20, the unit calculates the sequence similarity between the sequences read from the sequence similarity and function database 15. When a sequence was transmitted, it calculates the sequence similarity between that sequence and the sequences read from the sequence similarity and function database 15. An inter-sequence distance, for example, is calculated as the sequence similarity.
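
The text only says that an inter-sequence distance is one possible similarity measure. As one simple, hypothetical choice, the Python sketch below uses the p-distance: the fraction of positions at which two sequences differ, assuming they are already aligned to equal length. The dict standing in for database 15 and the helper names find_entry and pairwise_distances are illustrative assumptions.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if not s1 or len(s1) != len(s2):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def find_entry(db, name=None, seq=None):
    """S204/S205: return the name of an identical entry in db, or None if absent.

    db: dict name -> aligned sequence (stands in for database 15).
    A None result corresponds to returning to step S201.
    """
    for entry_name, entry_seq in db.items():
        if name == entry_name or (seq is not None and seq == entry_seq):
            return entry_name
    return None


def pairwise_distances(db):
    """S206: p-distance for every pair of sequences read from the database."""
    return {
        frozenset((a, b)): p_distance(db[a], db[b])
        for a, b in combinations(sorted(db), 2)
    }
```

A distance is the inverse of a similarity (0 means identical), which matches the "distance" attribute used in the XML file described later.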

In step S207, the phylogenetic tree file creation unit 14 creates a phylogenetic tree file based on the calculated sequence similarity and adds to it the functions read from the sequence similarity and function database 15, producing a function-annotated phylogenetic tree file. If the client entered a name in step S201, a function-annotated phylogenetic tree in which the function for that name is highlighted is created. If the client entered a sequence of unknown function in step S201, then even if an identical sequence exists in the sequence similarity and function database 15, the function of that sequence is normally not registered in the database 15. The function of that sequence is therefore not displayed in the resulting function-annotated phylogenetic tree.
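
Adding stored functions to a computed tree, as in step S207, might look like the Python sketch below. The nested-tuple tree representation and the name annotate are assumptions; the text does not specify how the tree is held in memory. Leaves without a registered function map to None, which reflects why a newly entered sequence shows no function.

```python
def annotate(tree, functions):
    """Return a copy of a nested-tuple tree whose leaves become (name, function) pairs.

    tree:      a leaf name (str) or a tuple of subtrees,
               e.g. (("A1", "B2"), ("C3", ("D4", "E5"))).
    functions: dict name -> binding strength read from database 15.
    Names without a registered function map to None, so nothing is drawn for them.
    """
    if isinstance(tree, str):
        return (tree, functions.get(tree))
    return tuple(annotate(child, functions) for child in tree)
```

For example, annotate((("A1", "B2"), "E5"), {"A1": 3.5, "B2": 1.0}) yields ((("A1", 3.5), ("B2", 1.0)), ("E5", None)).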

In step S208, the control unit 11 stores the function-annotated phylogenetic tree file in the phylogenetic tree file database 16.

FIG. 4 shows an example of the input screen with which the client registers sequence similarities and functions in the sequence similarity and function database 15. This input screen is used in step S101 of FIG. 2. The input screen 401 is sent from the phylogenetic tree creation server 10 to the client terminal 20 and displayed by the web browser of the client terminal 20. It has a registration switching button 402, a phylogenetic tree switching button 403, a file load button 404, a registration button 405, a clear button 406, and six data input fields 407 to 412.

The registration switching button 402 displays the input screen 401 for registering sequence similarities and functions. In the state shown, the registration switching button 402 has been clicked, so FIG. 4 shows the input screen 401. The phylogenetic tree switching button 403 switches input screens: clicking it displays the input screen shown in FIG. 5.

The file load button 404 retrieves sequence similarities and functions from the sequence similarity and function database 15. The registration button 405 is used to register sequence similarities and functions in the sequence similarity and function database 15. The clear button 406 erases all displayed data at once.

The first column 407 of the data input fields holds an ID identifying each row; the ID is a serial number. The second column 408 holds a name identifying each row; the name is the identifier of the gene or protein that the row represents. The third column 409 holds the sequence similarity of the base sequence or amino acid sequence. The fourth column 410 holds the binding strength, which serves as the function. Binding strengths are values obtained by experiment or reported in the literature. Although binding strength is used as the function here, the mRNA expression level measured by RT-PCR (Reverse Transcriptase-Polymerase Chain Reaction) may be used as the function instead.

The fifth column 411 holds the registered name, for example the name of the user who entered the sequence similarity, binding strength, and other data. The sixth column 412 holds the date on which that data was entered. If the user ID and login time used at login are stored in advance and the user name and date are filled in automatically when the client registers sequence similarities and functions, manual entry in the fifth column 411 and sixth column 412 can be omitted.
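
The automatic filling of the fifth and sixth columns can be sketched with Python dataclasses. The classes Session and SeqFunctionRow, the function autofill, and the ISO date format are hypothetical; the only behavior taken from the text is that the stored login user ID and login time supply the registrant and date.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    """Login information remembered by the server (hypothetical)."""
    user_id: str
    login_time: datetime


@dataclass
class SeqFunctionRow:
    """One row of the FIG. 4 input screen (fields 407-412)."""
    row_id: int           # 407: serial number
    name: str             # 408: gene or protein identifier
    similarity: float     # 409: sequence similarity
    binding: float        # 410: binding strength (or, e.g., mRNA expression level)
    registrant: str = ""  # 411: registered name
    registered: str = ""  # 412: registration date


def autofill(row, session):
    """Fill fields 411 and 412 from the login session when they were left blank."""
    if not row.registrant:
        row.registrant = session.user_id
    if not row.registered:
        row.registered = session.login_time.date().isoformat()
    return row
```

Values the user typed explicitly are left untouched; only blank fields are filled.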

FIG. 5 shows an example of the input screen the client uses to create a function-annotated phylogenetic tree file. This input screen is used in step S201 of FIG. 3. The input screen 501 is sent from the phylogenetic tree creation server 10 to the client terminal 20 and displayed by the web browser of the client terminal 20. It has a registration switching button 502, a phylogenetic tree switching button 503, an execute button 504, a clear button 505, a name input field 506, and a sequence input field 507. The registration switching button 502 and phylogenetic tree switching button 503 function in the same way as the registration switching button 402 and phylogenetic tree switching button 403 of the input screen 401 in FIG. 4.

The name input field 506 receives a name identifying the gene or protein to be highlighted in the function-annotated phylogenetic tree. Examples of such names appear in the name column (second column 408) of FIG. 4. The sequence input field 507 receives the base sequence or amino acid sequence of unknown function when a function-annotated phylogenetic tree file containing such a sequence is created. If the function is unknown but the name of the sequence is known, the name is entered as well. The execute button 504 creates the function-annotated phylogenetic tree file, and the clear button 505 erases the contents of the name input field 506 and the sequence input field 507 at once.

FIG. 6 shows examples of phylogenetic tree files in XML format; files in other formats may also be used. The phylogenetic tree file 601 in FIG. 6(a) is an example in which the name E5 of a sequence of unknown function was entered in the sequence input field 507 of the input screen 501 in FIG. 5. Each element has the tag name tag, with a name attribute giving the name and a distance attribute giving the sequence similarity; the text of each tag element holds the binding strength. Because the function of E5 is unknown, null appears as the text of its <tag> element in place of a binding strength.

The phylogenetic tree file 602 in FIG. 6(b) is an example in which the name A1 was entered in the name input field 506 and the name E5 in the sequence input field 507 of the input screen 501 in FIG. 5. Whether the name entered in the name input field 506 exists in the sequence similarity and function database 15 is indicated by the flg attribute of tag in the phylogenetic tree file 602: flg is 1 if the entered name is contained in the sequence similarity and function database 15, and 0 if it is not. In this example, the name A1 entered in the name input field 506 is contained in the sequence similarity and function database 15, so the flg attribute of tag is "1".
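
A file in the style of FIG. 6 can be produced with Python's standard xml.etree.ElementTree. Several details here are assumptions not given in the text: the root element name tree, the flat (non-nested) list of tag elements, and placing flg on every element ("1" on the element for the name entered in field 506, "0" elsewhere) only when a name was entered. The function name build_tree_file is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tree_file(entries, highlight=None):
    """Serialize entries as <tag> elements in the style described for FIG. 6.

    entries:   list of (name, distance, binding); binding is None when unknown.
    highlight: the name typed into field 506, or None (as in FIG. 6(a)).
    The root name "tree", the flat layout, and flg on every element are assumptions.
    """
    root = ET.Element("tree")
    for name, distance, binding in entries:
        tag = ET.SubElement(root, "tag", name=name, distance=str(distance))
        if highlight is not None:
            # entries come from database 15, so a match means the name was found
            tag.set("flg", "1" if name == highlight else "0")
        # binding strength as element text; "null" when the function is unknown
        tag.text = "null" if binding is None else str(binding)
    return ET.tostring(root, encoding="unicode")
```

For example, build_tree_file([("A1", 0.1, 3.5), ("E5", 0.3, None)], highlight="A1") returns a string in which A1's element carries flg="1" and E5's text is null.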

FIG. 7 shows examples of phylogenetic tree files displayed on screen. FIG. 7(a) shows the phylogenetic tree of the phylogenetic tree file 601 in FIG. 6(a). This tree represents only sequence similarity, that is, the distances between sequences, and does not display function. A planar phylogenetic tree of this kind is effective for finding the bases and amino acids that are important in each group of highly similar sequences, but because it contains no functional information it is ill suited to estimating the functions of base sequences and amino acid sequences of unknown function. Estimating function with such a tree requires working out the function group by group or sequence by sequence.

FIG. 7(b) shows the phylogenetic tree of the phylogenetic tree file 602 in FIG. 6(b). This tree displays both sequence similarity and function: on the planar phylogenetic tree, the binding strength is shown as a height. Displaying function three-dimensionally in this way allows sequence similarity and function to be shown in a single phylogenetic tree. The stronger the binding of a gene or protein, the greater its height.

In this example, the name A1 was entered in the name input field 506 and the name E5 in the sequence input field 507 of the input screen 501 in FIG. 5. When the name A1 exists in the sequence similarity and function database 15, it is displayed in bold on the phylogenetic tree display screen 702.

In this example, sequence similarity and function are displayed three-dimensionally and are therefore easy to grasp visually, which makes it possible to estimate the function of a sequence of unknown function. For example, the calculated sequence similarity shows that E5, whose function is unknown, is more similar to the group of C3 and D4 than to the group of A1 and B2. Sequence similarity and functional similarity generally tend to be correlated, although there are exceptions. It can therefore be estimated that the function of E5 is close to that of the group of C3 and D4.
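
The reasoning above (assign the unknown sequence to the closest group and infer a similar function) can be sketched in Python. This is a hypothetical quantitative proxy, not the patent's procedure, which relies on visual inspection: the average-distance criterion, the mean binding strength as the estimate, and the names nearest_group and estimate_function are all assumptions.

```python
def nearest_group(query, groups, dist):
    """Return the label of the group whose members are, on average, closest to query.

    groups: dict group label -> list of member names.
    dist:   dict frozenset({a, b}) -> distance (smaller = more similar).
    Average distance is an assumed criterion for illustration.
    """
    def mean_dist(members):
        return sum(dist[frozenset((query, m))] for m in members) / len(members)

    return min(groups, key=lambda label: mean_dist(groups[label]))


def estimate_function(query, groups, dist, functions):
    """Estimate the query's function as the mean known binding strength of its nearest group."""
    label = nearest_group(query, groups, dist)
    values = [functions[m] for m in groups[label] if functions.get(m) is not None]
    return label, (sum(values) / len(values) if values else None)
```

With E5 closer on average to C3 and D4 than to A1 and B2, the sketch returns the C3/D4 group and the mean of its binding strengths, mirroring the estimate in the text.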

FIG. 8 shows the transitions of the operating mode between the client terminal 20 and the phylogenetic tree creation server 10. First, the client enters a user ID and password in the web browser to log in. The phylogenetic tree creation server 10 verifies the user ID and password and displays the login result in the web browser. Next, the client chooses either to create sequence similarity and function data or to enter a name or sequence to create a function-annotated phylogenetic tree file.

To create sequence similarity and function data, the client terminal 20 sends the sequence similarity and function to the phylogenetic tree creation server 10, which stores them in the sequence similarity and function database 15.

To create a function-annotated phylogenetic tree file, the client terminal 20 sends a name and sequence to the phylogenetic tree creation server 10. The phylogenetic tree creation server 10 searches the sequence similarity and function database 15 and reads out the identical name and sequence. It then calculates the sequence similarity between the sequences, creates a function-annotated phylogenetic tree file, and stores it in the phylogenetic tree file database 16.

The client terminal 20 obtains the function-annotated phylogenetic tree file stored in the phylogenetic tree file database 16 from the web server, parses the file, and displays the phylogenetic tree on the screen.

A method of displaying a function-annotated phylogenetic tree is described with reference to FIG. 9. In step S301, the client enters a phylogenetic tree file name via the input unit 25. In step S302, the client terminal 20 sends the phylogenetic tree file name to the phylogenetic tree creation server 10. In step S303, the authentication unit 12 of the phylogenetic tree creation server 10 verifies the user ID and password. In step S304, the control unit 11 of the phylogenetic tree creation server 10 reads the function-annotated phylogenetic tree file from the phylogenetic tree file database 16 and sends it to the client terminal 20. In step S305, the function-annotated phylogenetic tree display processing unit 24 creates image data of the function-annotated phylogenetic tree. In step S306, the image of the function-annotated phylogenetic tree is displayed on the display unit 26.
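
Before image data can be created in step S305, the client must parse the received file. A minimal Python sketch using the standard xml.etree.ElementTree follows; it assumes the tag, name, distance, and flg conventions described for FIG. 6, and the function name parse_tree_file and the record layout are hypothetical. Because iter() visits all descendants, the sketch also works if the real file nests its elements.

```python
import xml.etree.ElementTree as ET


def parse_tree_file(xml_text):
    """S305 sketch: turn a FIG. 6-style file into records ready for drawing.

    Each record holds the name, the distance (float), the binding strength
    (float, or None for "null"), and whether the element carries flg="1".
    """
    records = []
    for tag in ET.fromstring(xml_text).iter("tag"):
        text = (tag.text or "").strip()
        records.append({
            "name": tag.get("name"),
            "distance": float(tag.get("distance")),
            "binding": None if text in ("", "null") else float(text),
            "highlighted": tag.get("flg") == "1",   # draw this name in bold
        })
    return records
```

The highlighted flag drives the bold label and the binding value drives the height of each perpendicular line.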

Although an example of the present invention has been described above, those skilled in the art will readily understand that the present invention is not limited to this example and that various modifications are possible within the scope of the invention described in the claims.

FIG. 1 is a diagram showing the overall configuration of the phylogenetic tree display system according to the present invention.
FIG. 2 is a diagram showing the process of storing sequence similarities and functions in the sequence similarity and function database.
FIG. 3 is a diagram showing the process of creating a function-annotated phylogenetic tree file and storing it in the phylogenetic tree file database.
FIG. 4 is a diagram showing an example of the input screen for registering sequence similarity and function data.
FIG. 5 is a diagram showing an example of the input screen for entering names and sequences.
FIG. 6 is a diagram showing a screen with examples of phylogenetic tree files in XML format.
FIG. 7 is a diagram showing screens that display a function-annotated phylogenetic tree.
FIG. 8 is a diagram showing the transitions of the operating mode.
FIG. 9 is a diagram showing the process of displaying a function-annotated phylogenetic tree.

Explanation of Reference Numerals

10 … Phylogenetic tree creation server
11 … Control unit
12 … Authentication unit
13 … Sequence similarity calculation unit
14 … Function-annotated phylogenetic tree file creation unit
15 … Sequence similarity and function database
16 … Phylogenetic tree file database
20 … Client terminal
22 … Sequence similarity and function data input processing unit
23 … Name and sequence input processing unit
24 … Function-annotated phylogenetic tree display processing unit
25 … Input unit
26 … Display unit
30 … Communication network

Claims

A phylogenetic tree display system comprising:
a database storing the sequence similarity and function of base sequences and amino acid sequences;
a sequence similarity calculation unit that calculates sequence similarity;
a phylogenetic tree file creation unit that creates a phylogenetic tree file representing sequence similarity and function; and
a display unit that receives the phylogenetic tree file as input and displays a phylogenetic tree,
wherein, when a gene or protein name and a base sequence or amino acid sequence are input, the phylogenetic tree creation unit searches the database for entries identical to the gene or protein name and to the base sequence or amino acid sequence, and creates a phylogenetic tree that simultaneously displays the sequence similarity calculated by the sequence similarity calculation unit and the function obtained from the database.

A computer-readable program for causing a computer to execute a phylogenetic tree display method comprising:
inputting a gene or protein name and a base sequence or amino acid sequence via an input device;
searching a database for entries identical to the gene or protein name and to the base sequence or amino acid sequence;
calculating the similarity between sequences;
creating a phylogenetic tree based on the calculation result;
displaying the function of the gene or protein in the phylogenetic tree; and
highlighting the gene or protein name.