JPH05168500A

JPH05168500A - Determination of nucleic acid base sequence

Info

Publication number: JPH05168500A
Application number: JP34435491A
Authority: JP
Inventors: Tetsuo Nishikawa; 哲夫西川; Hideki Kanbara; 秀記神原; Katsuji Murakawa; 克二村川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-12-26
Filing date: 1991-12-26
Publication date: 1993-07-02

Abstract

PURPOSE: To extend the range of base lengths that can be determined by analyzing, with a specified method, the electrophoretic pattern of nucleic acid fragments generated by the dideoxy method. CONSTITUTION: From the peak spectrum 8 of signals from nucleic acid fragments generated by the dideoxy method, the number Np of fragment species, differing in length by one base each, that correspond to each peak is obtained. The area enclosed by the perpendicular line 11 through the bottom of a valley in a peak group, the contour line 12 of the peak, and the base line 13 is measured and normalized by a normalization factor for each peak. To determine the normalization factor, the average area of a prescribed number of peaks on either side of, and including, the peak of interest is calculated and multiplied by a prescribed factor to give a threshold value; peaks whose area is larger than the threshold are excluded, and the average is calculated again. Even when peaks overlap and cannot be separated by detecting local maxima, the number of DNA fragment species, differing in length by one base each, that make up each peak can be detected. The base sequence can thereby be determined.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to an algorithm and software for determining the base sequence of nucleic acids.

[0002]

[Prior Art] As a method for determining the base sequence of DNA, a method that detects fluorescence from fluorophore-labeled DNA in real time during electrophoresis has recently come into use (Bio/Technology, 1988, 6, pp. 816-821). To sequence long DNA such as human gene DNA, it is desirable to make the base length that can be determined in a single measurement as large as possible. That length is limited, and the limit is set by the maximum base length at which DNA fragments can still be separated by polyacrylamide gel electrophoresis. That is, in gel electrophoresis, peaks from DNA fragments that differ by one base become harder to separate as the base length increases, and beyond a certain base length they can no longer be separated and detected. This is because the peak half-width decreases more slowly with increasing base length than the peak interval does, so that beyond a certain base length the half-width becomes large relative to the interval and adjacent peaks can no longer be separated. To increase the maximum base length at which single-base separation is possible, the accuracy of the method for recognizing peaks in the fluorescence spectrum is important, along with lengthening the migration plate. In the conventional method, peak recognition is basically based on recognizing local maxima; it allows single-base separation and recognition up to about 500 bases with a migration path length of 30 cm, and up to about 800 bases with a migration path length of 90 cm.

[0003]

[Problems to Be Solved by the Invention] Because the conventional peak recognition method determines the base sequence solely by recognizing local maxima, peaks can no longer be separated and recognized once they overlap. As a result, single-base separation and recognition was possible only up to about 500 bases with a migration path length of 30 cm, and only up to about 800 bases with a migration path length of 90 cm. The object of the present invention is to provide a base sequence determination method that increases the maximum base length that can be sequenced: even at long DNA base lengths, where peaks overlap and cannot be separated by recognizing local maxima, the method recognizes from how many nucleic acid fragment species, differing in length by one base each, the signal of a given peak is composed (the number of nucleic acid fragment species in the peak).

[0004]

[Means for Solving the Problems] To achieve the above object, on a graph in which the signal from the nucleic acid fragments is plotted against the detection time, or against a variable obtained by transforming the detection time with a smooth, monotonically increasing function, the peak area enclosed by the contour line of each signal peak and the base line of the graph is measured. This peak area is used as an index of the number of nucleic acid fragment species in the peak, which in effect separates and recognizes the signal peaks from consecutive nucleic acid fragments.

[0005]

[Operation] The above means make the following possible. Even when peaks overlap and cannot be separated by recognizing local maxima, the base sequence can be determined by recognizing the number of nucleic acid fragment species contained in each peak. As a result, for a peak spectrum in which the conventional method achieves single-base separation up to about 500 bases, the base sequence can be determined up to about 700 bases; for a peak spectrum in which the conventional method achieves single-base separation up to about 800 bases, the base sequence can be determined up to about 1000 bases. This raises the efficiency of base sequence determination several-fold and is highly effective for sequencing long DNA such as human gene DNA.

[0006]

[Examples] Examples of the present invention are described below.

[0007] (Example 1) Example 1 is described with reference to FIGS. 1, 2, 3, and 4.

[0008] FIG. 1 is a flowchart of the peak separation recognition algorithm using peak areas. FIG. 2 is a peak spectrum of A-reaction DNA fragments, and FIG. 3 is a peak spectrum of A-reaction DNA fragments separated and detected by 93 cm migration. FIG. 4 is a graph of the normalized areas of the A-reaction DNA fragments.

[0009] The basic idea of the method is explained with reference to FIG. 2. The graph in FIG. 2 shows the peak spectrum 7 of A-reaction DNA fragments with base length on the horizontal axis. The DNA fragments were obtained by performing the sequencing A reaction with M13mp8 phage as the template. The ends of the DNA fragments were labeled with fluorescein isothiocyanate, and the fluorescence spectrum was obtained by excitation with an argon laser. The migration distance is 30 cm. Up to 500 bases, adjacent peaks in the fluorescence spectrum are detected separately, and the base sequence can be determined accurately. Above 500 bases, the peaks become broad and adjacent peaks are no longer separated. Accordingly, all peaks at 500 bases or less are peaks from DNA fragments of a single length, whereas among the peaks above 500 bases there are some in which two or more peaks, differing in base length by one base each, overlap and are observed as a single peak. As FIG. 2 shows, the intensity of the single peaks at 500 bases or less varies very uniformly with base length, and the deviation from the average curve is within about 15%. This uniformity is achieved by using a reaction buffer containing Mn in the sequencing reaction (reference: J. Biol. Chem., 1990, 265, pp. 8322-8328). A uniformly varying peak intensity means a uniformly varying peak area. Hence the area of a peak formed by n overlapping peaks is about n times the area of a nearby single peak. Conversely, by measuring and monitoring the peak areas, it is possible to estimate from how many fragment species of different base lengths the signal of each unseparated peak is actually composed.
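The area argument in this paragraph can be checked numerically. The following is a minimal sketch, not part of the patent: it assumes Gaussian peak shapes of equal height and width (the patent does not specify a peak shape), and the names `gaussian` and `trapezoid_area` are illustrative. Three fragments spaced more closely than the peak width merge into one observed maximum, yet the merged area stays close to three times the single-peak area.

```python
import math

def gaussian(x, center, height, sigma):
    """Assumed peak shape; the patent does not prescribe one."""
    return height * math.exp(-0.5 * ((x - center) / sigma) ** 2)

def trapezoid_area(xs, ys):
    """Area under sampled points by the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))

xs = [i * 0.01 for i in range(2001)]   # sampling axis from 0 to 20
sigma = 0.6                            # peak width (arbitrary units)

# One isolated fragment peak.
single = [gaussian(x, 10.0, 1.0, sigma) for x in xs]

# Three fragments spaced 0.5 apart, closer than the peak width,
# so they are observed as one merged peak.
merged = [sum(gaussian(x, c, 1.0, sigma) for c in (9.5, 10.0, 10.5))
          for x in xs]

# Count local maxima in the merged trace.
n_maxima = sum(1 for i in range(1, len(merged) - 1)
               if merged[i - 1] < merged[i] > merged[i + 1])

ratio = trapezoid_area(xs, merged) / trapezoid_area(xs, single)
print(n_maxima, round(ratio, 3))
```

Maximum detection sees a single peak here, while the area ratio still reflects the three underlying fragments, which is the property the method relies on.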

[0010] Next, the peak separation recognition algorithm is described in detail with reference to the flowchart of FIG. 1. The method consists of division of each peak group 1, measurement of the divided areas 2, calculation of the peak area normalization factor 3, normalization of each peak area 4, estimation of the number of peaks 5, and display of the number of peaks 6.

[0011] First, division of each peak group 1 is performed, as shown in FIG. 3. FIG. 3 is the peak spectrum 8 of A-reaction fragments separated and detected by 93 cm migration, and contains peaks from 850 to 1063 bases. The horizontal axis is shown in base length, but the actual detection signal is plotted with the detection time from the start of migration as the variable. By plotting the signal against a variable obtained by transforming the detection time with a suitable smooth, monotonically increasing function, the horizontal axis can be regarded as a linear base length axis, as shown in the figure. Above each peak is shown the number Np of fragment species, differing in length by one base each, that correspond to that peak. As FIG. 3 shows, signals from two fragment species that differ by one base are not separated. Although not shown in the figure, in the peak spectrum up to 800 bases under the same migration conditions, consecutive peaks are detected separately. For the peak spectrum beyond 800 bases, the local minimum points of those valleys whose minimum value is smaller than a predetermined threshold are first connected to form the base line 13. When small valleys whose minimum value exceeds the threshold lie between these minimum points, that is, when peaks adjoin one another in a continuous group, the group must be divided into individual peaks. The peak group is therefore divided by drawing a perpendicular line 11 through the bottom (local minimum point) of each valley in the group. The threshold is preferably set for each predetermined region according to the plotted graph.
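The valley classification described above can be sketched as follows. This is an illustrative reading rather than the patent's implementation: the function name `split_valleys`, the sample trace, and the choice to treat a valley exactly at the threshold as a division point are assumptions, and a single global threshold is used even though the text prefers one per region.

```python
def split_valleys(signal, threshold):
    """Classify strict local minima of a sampled trace.

    Valleys below the threshold become base line anchor points;
    valleys at or above it become division points inside a peak group.
    Returns (baseline_points, split_points) as lists of sample indices.
    """
    baseline_points, split_points = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] > signal[i] < signal[i + 1]:
            if signal[i] < threshold:
                baseline_points.append(i)
            else:
                split_points.append(i)
    return baseline_points, split_points

# Hypothetical trace: an isolated peak, a group of three adjoining
# peaks, and another isolated peak.
signal = [0.1, 1.0, 0.1, 1.0, 0.6, 1.2, 0.7, 0.9, 0.1, 1.0, 0.1]
baseline_points, split_points = split_valleys(signal, threshold=0.3)
print(baseline_points, split_points)
```

In this trace the deep valleys at indices 2 and 8 anchor the base line, and the shallow valleys at indices 4 and 6 are where perpendicular division lines would be drawn.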

[0012] Measurement of the divided areas 2 is performed by measuring the area enclosed by the perpendicular line 11, the peak contour line 12, and the base line 13. For a peak that is isolated to begin with, the area enclosed by the peak contour line 12 and the base line 13 is measured. The same operations are also performed on the peaks obtained from the C, G, and T reaction fragments.
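A minimal sketch of this area measurement is given below, assuming a sampled trace, a straight base line segment between two anchor points, and trapezoidal integration; none of these details are specified in the patent, and the names `baseline_at` and `segment_area` are illustrative.

```python
def baseline_at(x, p0, p1):
    """Height of the straight base line through anchor points p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def segment_area(xs, ys, left, right, p0, p1):
    """Area between the peak contour and the base line, bounded by
    vertical lines at sample indices left and right (trapezoidal rule)."""
    heights = [ys[i] - baseline_at(xs[i], p0, p1)
               for i in range(left, right + 1)]
    return sum((xs[left + k + 1] - xs[left + k])
               * (heights[k] + heights[k + 1]) / 2
               for k in range(right - left))

# Hypothetical peak group: base line anchors at indices 0 and 6,
# one shallow valley (division point) at index 3.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 3, 4, 2, 1]
p0, p1 = (xs[0], ys[0]), (xs[6], ys[6])

left_area = segment_area(xs, ys, 0, 3, p0, p1)    # first divided peak
right_area = segment_area(xs, ys, 3, 6, p0, p1)   # second divided peak
print(left_area, right_area)
```

The two divided areas add up to the area of the whole group above the base line, so the division redistributes the signal between peaks without losing any of it.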

[0013] Next, calculation of the peak area normalization factor 3 is performed as follows. First, the areas of a fixed number of peaks on either side of, and including, the peak of interest (for example, five peaks in total) are averaged, and this average is denoted P. To extract from this set the peaks with Np = 1, that is, the signal peaks of fragments of a single base length, peaks whose area is at least a constant multiple of P (for example, 1.1 times) are excluded and the remaining areas are averaged again. This gives the peak area normalization factor P'. In this averaging, the range may be defined not by a number of peaks but by a fixed time before and after the peak of interest (for example, 5 minutes). Alternatively, the smallest peak area within the fixed number of peaks or the fixed time on either side may be adopted as the normalization factor.
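The two-pass average can be sketched as follows, using the example values from the text (five peaks in total, a multiple of 1.1). The function name, the handling of windows at the ends of the list, and the sample areas are assumptions made for illustration.

```python
def normalization_factor(areas, index, half_window=2, multiple=1.1):
    """Two-pass average P' around the peak at position index.

    Pass 1: average P of the window (half_window peaks on each side
    plus the peak itself; the window is clipped at the list ends).
    Pass 2: drop peaks with area >= multiple * P and average again.
    Assumes all areas are positive, so at least one peak is kept.
    """
    lo = max(0, index - half_window)
    hi = min(len(areas), index + half_window + 1)
    window = areas[lo:hi]
    p = sum(window) / len(window)
    kept = [a for a in window if a < multiple * p]
    return sum(kept) / len(kept)

# Hypothetical peak areas; the middle peak holds two fragment species.
areas = [10.0, 11.0, 21.0, 9.0, 10.0]
p_prime = normalization_factor(areas, index=2)

# Alternative mentioned in the text: smallest area in the window.
p_min = min(areas)
print(p_prime, p_min)
```

Here the first pass gives P = 12.2, the peak of area 21.0 exceeds 1.1 × P and is dropped, and the second pass gives P' = 10.0. The sketch inherits the method's implicit assumption that single-fragment peaks are common enough in the window for the first average to separate them from merged peaks.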

[0014] Normalization of each peak area 4 is performed by dividing the area of each peak by that peak's normalization factor P'. Estimation of the number of DNA fragment species 5 is performed by rounding the normalized area to the nearest integer. FIG. 4 shows a graph in which the normalized peak areas, obtained from the peak spectrum of FIG. 3 by the method described above, are plotted against base length. Different symbols are used according to the value of Np (14: Np = 1; 15: Np = 2; 16: Np = 3; 17: Np = 4). Denoting the normalized peak areas of the peaks with Np = 1, 2, 3, and 4 as S1, S2, S3, and S4, the average values of S1, S2, S3, and S4 are 1.0, 2.0, 3.0, and 3.95, respectively. The deviations of S1, S2, S3, and S4 from 1, 2, 3, and 4 are +0.15 to -0.15, +0.3 to -0.2, +0.35 to -0.25, and -0.1, respectively, so the rounded values of S1, S2, S3, and S4 are 1, 2, 3, and 4 for every peak, and the number of peaks is recognized without error. If the same is done for the peaks obtained from the C, G, and T reaction fragments, the base sequence is determined accurately up to about 1050 bases.
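The normalization and rounding steps can be sketched as below. The helper names and the sample areas are illustrative assumptions; the second check uses the extreme normalized values implied by the deviation ranges quoted in this paragraph. Rounding is implemented as round-half-up with `math.floor(x + 0.5)`, because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math

def round_half_up(value):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(value + 0.5)

def estimate_fragment_counts(areas, factors):
    """Np for each peak: its area divided by its own P', then rounded."""
    return [round_half_up(a / f) for a, f in zip(areas, factors)]

# Hypothetical areas, each with its own normalization factor P'.
areas = [10.0, 21.0, 28.5, 39.0]
factors = [10.0, 10.0, 10.0, 10.0]
counts = estimate_fragment_counts(areas, factors)

# Extreme normalized areas implied by the quoted deviation ranges:
# S1 in [0.85, 1.15], S2 in [1.8, 2.3], S3 in [2.75, 3.35], S4 = 3.9.
extremes = [0.85, 1.15, 1.8, 2.3, 2.75, 3.35, 3.9]
rounded_extremes = [round_half_up(v) for v in extremes]
print(counts, rounded_extremes)
```

Every extreme stays more than 0.1 away from the nearest half-integer boundary, which is consistent with the statement that rounding recovers Np for all peaks in FIG. 4.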

[0015] Display of the number of DNA fragment species 6 obtained by the above method is performed as shown in the peak spectrum in FIG. 2. Even with a migration distance of 30 cm, applying the peak separation recognition method using normalized areas allows the base sequence to be determined accurately up to about 700 bases.

[0016]

[Effects of the Invention] According to the present invention, even when peaks overlap and cannot be separated by recognizing local maxima, the base sequence can be determined by recognizing the number of DNA fragment species, differing in length by one base each, that make up each peak. As a result, for a peak spectrum in which single-base separation is possible up to about 500 bases, the base sequence can be determined up to about 700 bases; for a peak spectrum in which single-base separation is possible up to about 800 bases, the base sequence can be determined up to about 1000 bases. This raises the efficiency of base sequence determination several-fold and is highly effective for sequencing long DNA such as human gene DNA.

[Brief Description of the Drawings]

[FIG. 1] An explanatory diagram of Example 1 of the present invention: a flowchart of the peak separation recognition algorithm using peak areas.

[FIG. 2] A peak spectrum of A-reaction DNA fragments.

[FIG. 3] A peak spectrum of A-reaction DNA fragments separated and detected by 93 cm migration.

[FIG. 4] A graph of the normalized areas of A-reaction DNA fragments.

[Explanation of Symbols]

1: division of each peak group; 2: measurement of the divided areas; 3: calculation of the peak area normalization factor; 4: normalization of each peak area; 5: estimation of the number of DNA fragment species; 6: display of the number of DNA fragment species; 7, 8: peak spectra of A-reaction DNA fragments; 10: peak group; 11: perpendicular line through the bottom of a valley in a peak group; 12: peak contour line; 13: base line; 14: peak with Np = 1; 15: peak with Np = 2; 16: peak with Np = 3; 17: peak with Np = 4.

Continuation of the front page: (51) Int.Cl.⁵, identification code, office reference number, FI, technical display location: G06F 15/20 F 7218-5L

Claims

[Claims]

1. A nucleic acid base sequence determination method that analyzes the electrophoretic pattern of nucleic acid fragments generated by the dideoxy method, characterized by: preparing a graph in which the signal from the nucleic acid fragments is plotted against the detection time, or against a variable obtained by transforming the detection time with a smooth, monotonically increasing function; measuring, on the graph, the peak area enclosed by the contour line of each signal peak and the base line of the graph; and determining, by an index obtained from the peak area, from how many nucleic acid fragment species, differing in length by one base each, the signal of each peak is composed (the number of nucleic acid fragment species in the peak), thereby separating and recognizing the signal peaks from consecutive nucleic acid fragments.

2. The nucleic acid base sequence determination method according to claim 1, wherein the base line is a line connecting the local minimum points of those valleys of the graph whose minimum value is smaller than a predetermined threshold; a group of signal peaks that are continuous across valleys whose minimum value is larger than the threshold is divided by dividing lines that pass through the local minimum points of those valleys and are perpendicular to the time axis; and the area enclosed by the base line, the contour line, and the dividing lines is taken as the peak area of each individual signal peak.

3. The nucleic acid base sequence determination method according to claim 1, wherein the index is the peak area normalized by a quantity obtained from the area information of the peaks detected within a fixed time before and after the appearance time of the peak, or of a fixed number of peaks before and after the peak.

4. The nucleic acid base sequence determination method according to claim 3, wherein the normalization factor for the peak area is the smallest of the areas of the peaks detected within a fixed time before and after the appearance time of the peak, or of a fixed number of peaks before and after the peak.

5. The nucleic acid base sequence determination method according to claim 3, wherein the normalization factor for the peak area is the value obtained by calculating the average area of the peaks detected within a fixed time before and after the appearance time of the peak, or of a fixed number of peaks before and after the peak, excluding the peaks whose area is at least a constant multiple of that average, and averaging again.

6. The nucleic acid base sequence determination method according to claim 3, wherein the time range used to obtain the normalization factor for the peak area is 5 minutes or more before and after the appearance time of the peak to be normalized.

7. The nucleic acid base sequence determination method according to claim 3, wherein two or more peaks before and after the peak to be normalized are used as the peaks for obtaining the normalization factor for the peak area.

8. The nucleic acid base sequence determination method according to claim 5, wherein a value from 1.1 to 2 is used as the constant when obtaining the average value of the peak areas.

9. The nucleic acid base sequence determination method according to claim 1, wherein the index obtained from the peak area of each peak is displayed for that peak.

10. The nucleic acid base sequence determination method according to claim 1, wherein a spectrum from fragments produced by a sequencing reaction using an Mn buffer is used as the electrophoretic pattern.