WO2017047580A1

WO2017047580A1 - Peptide assignment method and peptide assignment system

Info

Publication number: WO2017047580A1
Application number: PCT/JP2016/076963
Authority: WO
Inventors: Masaki Murase (村瀬 雅樹); Koichi Tanaka (田中 耕一)
Original assignee: Shimadzu Corporation (株式会社島津製作所)
Priority date: 2015-09-14
Filing date: 2016-09-13
Publication date: 2017-03-23
Also published as: US20190041393A1; JPWO2017047580A1; JP6489224B2

Abstract

A peptide assignment method in which a database creation unit 11 generates target peptide sequences on the basis of endogenous peptides whose peptide sequences are known, from among endogenous peptides produced in vivo, and the full-length sequences of the precursor proteins of those endogenous peptides. Each target peptide sequence contains one or more residues of a partial sequence of such an endogenous peptide, and the generated sequences are collected into a target peptide sequence database 111 containing multiple target peptide sequences. Next, a mass spectrometry unit 12 performs mass spectrometry on a peptide sample. A peptide assignment unit 14 then determines the peptide sequence of an endogenous peptide contained in the peptide sample on the basis of the multiple target peptide sequences created by the database creation unit 11 and a mass spectrum obtained by the mass spectrometry unit 12.

Description

Peptide assignment method and peptide assignment system

The present invention relates to a peptide assignment method and a peptide assignment system for determining the peptide sequences of endogenous peptides produced in vivo.

Known general peptide assignment methods for proteins include methods using database search (see, for example, Non-Patent Document 1 below) and methods using de novo sequencing (see, for example, Non-Patent Document 2 below).

Methods using database search employ a database search engine such as Mascot, provided by Matrix Science (see, for example, Non-Patent Document 1 below). Specifically, all peptide fragments that can be expected from the protein amino acid sequences recorded in a protein database are obtained by in silico digestion. The molecular weight of each peptide fragment is then compared with the MS² precursor ion mass, and theoretical product ion masses are calculated for the fragments that match within a predetermined mass tolerance. The calculated theoretical product ion masses are compared with the MS² measurement data to search for peptides with a high degree of agreement.
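As an illustration of the in silico digestion step described above, the following is a minimal sketch for trypsin. It assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P; that rule, the function name `tryptic_digest`, and the `missed_cleavages` parameter are illustrative assumptions and are not taken from this document or from Mascot.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from an in silico tryptic digest of `protein`.

    Assumed rule (not from the source): cleave after K or R,
    except when the following residue is P.
    """
    # Split the protein into fully cleaved pieces.
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])

    # Join neighbouring pieces to allow up to `missed_cleavages` skipped sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKRPGK"))
    print(tryptic_digest("AKRPGK", missed_cleavages=1))
```

Because cleavage is restricted to a few sites, the number of candidate peptides stays close to the number of K and R residues, which is what keeps the search space of a conventional database search small.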

In de novo sequencing, on the other hand, the amino acid sequence is read from the measurement data without using a database. Specifically, a group of peptide fragments is generated in which amino acid residues have been removed one at a time from the end of the peptide by some method, and the amino acid sequence is read from the mass differences between the ion peaks derived from those fragments. Software called PEAKS is widely known as a representative implementation (see, for example, Non-Patent Document 2 below).

In protein analysis, whichever of the above peptide assignment methods is used, the protein is often fragmented into peptides with a digestive enzyme before analysis in order to make the analysis easier. Fragmenting the protein into smaller pieces in this way promotes ionization during mass spectrometry and improves analytical sensitivity.

In this case, the choice of digestive enzyme is also important for data analysis. For example, in a database search using a protein database, selecting a digestive enzyme that cleaves proteins at specific sites (specific sequences) makes the search space smaller than a method that cleaves at random. As a result, the number of peptide identifications can be increased while search time and misidentifications are kept at a practical level. In de novo sequencing, a digestive enzyme can be selected so that product ions are generated from which the amino acid distribution within the peptide is easy to sequence. For example, when trypsin is selected as the digestive enzyme, y/b series ions are generated specifically when the ions are fragmented by CID (collision-induced dissociation), which makes sequencing easier.

Endogenous peptides are peptides produced in vivo. Some are transported through body fluids such as blood as molecules involved in signaling and functional regulation in the body, and others are excreted in urine as metabolites. Analyzing the structure of these endogenous peptides can yield information useful for developing new drugs and diagnosing diseases. However, it has been difficult to apply the conventional peptide assignment methods described above to the analysis of endogenous peptides.

Specifically, endogenous peptides are produced when proteins are cleaved by in vivo processing or metabolic mechanisms. A single protein gives rise to many different peptide fragments, which can include fragments that share a partial sequence as well as fragments that do not. In proteome analysis, a digestive enzyme cleaves the protein at specific sites, so the whole protein can be assigned by determining the sequences of a few peptide fragments that are detected in large amounts (that is, with high ionization efficiency).

However, endogenous peptides are produced in vivo by a variety of cleavage mechanisms, so the amounts produced differ even among peptides derived from the same protein. A more sensitive assignment technique is therefore needed for peptides that are hard to detect because of low abundance or low ionization efficiency. In addition, except for endogenous peptides whose cleavage mechanism is known, the sites at which they are cleaved from the protein are not known in advance. Consequently, to analyze the structure of endogenous peptides by the conventional database search using a protein database, the search must assume peptide fragments produced by cleavage at every possible site, and the search space grows enormously. This growth not only lengthens the search time but also reduces the number of peptides identified, that is, it lowers the identification sensitivity.
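To give a sense of scale for the search-space growth described above: if cleavage is allowed at every site, a protein of n residues yields one candidate peptide per contiguous subsequence. The count below is a back-of-the-envelope illustration and not a figure from this document; the protein length of 500 residues is an arbitrary example, and a practical search would usually also bound the peptide length.

```latex
N_{\text{nonspecific}}(n) = \sum_{\ell=1}^{n} (n - \ell + 1) = \frac{n(n+1)}{2},
\qquad
N_{\text{nonspecific}}(500) = \frac{500 \cdot 501}{2} = 125{,}250
```

By contrast, an enzyme that cleaves only at specific residues yields, without missed cleavages, one more peptide than the number of cleavage sites, so the count grows only linearly with the number of sites.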

Furthermore, because the spatial distribution of amino acids within endogenous peptides varies widely, the patterns of product ions generated are diverse and complex. Unlike proteome analysis, where a digestive enzyme is introduced by design to make the amino acid distribution of the peptide fragments uniform, analysis by de novo sequencing is therefore also difficult.

The present invention has been made in view of the above circumstances, and its object is to provide a peptide assignment method and a peptide assignment system capable of determining the peptide sequences of a larger number of endogenous peptides with high sensitivity.

The peptide assignment method according to the present invention includes a database creation step, a mass spectrometry step, and a peptide assignment step. In the database creation step, on the basis of endogenous peptides whose peptide sequences are known, from among endogenous peptides produced in vivo, and the full-length sequences of the precursor proteins of those endogenous peptides, peptide sequences each containing one or more residues of a partial sequence of such an endogenous peptide are generated as target peptide sequences, thereby creating a target peptide sequence database containing a plurality of target peptide sequences. In the mass spectrometry step, mass spectrometry is performed on a peptide sample. In the peptide assignment step, the peptide sequences of endogenous peptides contained in the peptide sample are determined on the basis of the plurality of target peptide sequences created in the database creation step and the mass spectrum obtained in the mass spectrometry step.

With this configuration, a database of peptide sequences, each containing one or more residues of a partial sequence of an endogenous peptide, is generated on the basis of endogenous peptides whose peptide sequences are known and the full-length sequences of their precursor proteins. A peptide sequence that shares part of its sequence with a known endogenous peptide (a target peptide sequence) may well be present, as the sequence of an unknown endogenous peptide, in mass spectra that conventional methods leave unassigned.

Therefore, generating a database of target peptide sequences (a target peptide sequence database) effectively prevents the search space from growing. By preferentially searching for the peptide sequences of target peptides contained in the peptide sample, on the basis of the target peptide sequence database and the mass spectrum obtained by mass spectrometry of the peptide sample, the peptide sequences of a larger number of endogenous peptides can be determined with high sensitivity.

The peptide assignment system according to the present invention includes a database creation unit, a mass spectrometry unit, and a peptide assignment unit. On the basis of endogenous peptides whose peptide sequences are known, from among endogenous peptides produced in vivo, and the full-length sequences of the precursor proteins of those endogenous peptides, the database creation unit generates, as target peptide sequences, peptide sequences each containing one or more residues of a partial sequence of such an endogenous peptide, thereby creating a target peptide sequence database containing a plurality of target peptide sequences. The mass spectrometry unit performs mass spectrometry on a peptide sample. The peptide assignment unit determines the peptide sequences of endogenous peptides contained in the peptide sample on the basis of the plurality of target peptide sequences created by the database creation unit and the mass spectrum obtained by the mass spectrometry unit.

According to the present invention, generating the target peptide sequence database effectively prevents the search space from growing, and the peptide sequences of a larger number of endogenous peptides can be determined with high sensitivity on the basis of the target peptide sequence database and the mass spectrum obtained by mass spectrometry of the peptide sample.

A block diagram showing a configuration example of the peptide assignment system according to the first embodiment of the present invention.

A diagram for explaining how the database creation unit generates target peptide sequences.

A flowchart showing the flow of processing by the database creation unit.

A flowchart showing the flow of processing by the mass spectrometry unit and the peak list creation unit.

A flowchart showing the flow of processing by the peptide assignment unit.

A diagram showing the results of an actual analysis of MS² spectra obtained from a urine sample.

A block diagram showing a configuration example of the peptide assignment system according to the second embodiment of the present invention.

<First Embodiment>
1. Configuration of the Peptide Assignment System According to the First Embodiment
FIG. 1 is a block diagram showing a configuration example of the peptide assignment system 1 according to the first embodiment of the present invention.

The peptide assignment system 1 determines the peptide sequences of endogenous peptides produced in vivo from a sample to be analyzed (a peptide sample), and includes a database creation unit 11, a mass spectrometry unit 12, a peak list creation unit 13, a peptide assignment unit 14, and the like. At least some of these units 11 to 14 are implemented by an information processing apparatus including a CPU (Central Processing Unit).

The peptide assignment system 1 determines the peptide sequences of endogenous peptides in the peptide sample using a plurality of peptide sequences stored in an endogenous peptide sequence database 2. The endogenous peptide sequence database 2 stores the peptide sequences of a plurality of endogenous peptides whose peptide sequences are known. Here, "the peptide sequence is known" covers both the case where the peptide sequence is recorded in a published sequence database of endogenous peptides or in the literature, and the case where the peptide sequence has been assigned with high confidence by a conventional analysis method (including manual analysis).

Known examples of published sequence databases and literature on endogenous peptides include Mosaiques DB, a sequence database of endogenous peptides contained in urine (http://mosaiques-diagnostics.de/diapatpcms/mosaiquescms/front_content.php?idcat=257; Siwy et al., "Human urinary peptide database for multiple disease biomarker discovery", Proteomics Clin. Appl., 2011, 5, 367-374), and literature that has not been compiled into a database (Smith, et al., "Deciphering the peptidome of urine from ovarian cancer patients and healthy controls", Clin. Proteomics, 2014, 11(1):23).

The database creation unit 11 creates a target peptide sequence database 111 containing a plurality of target peptide sequences (database creation step). It does so by generating, as target peptide sequences, peptide sequences that differ from the known endogenous peptides, on the basis of the peptide sequences of the plurality of endogenous peptides stored in the endogenous peptide sequence database 2 and the full-length sequences of the precursor proteins of those endogenous peptides stored in a protein sequence database 3. The protein sequence database 3 is a database of full-length protein sequences that is consulted when a peptide sequence is extended with residues not registered in the endogenous peptide sequence database 2. Each target peptide sequence generated here contains one or more residues of a partial sequence of an endogenous peptide whose peptide sequence is stored in the endogenous peptide sequence database 2. In other words, each target peptide sequence generated by the database creation unit 11 has a partial sequence (or its entire sequence) in common with an endogenous peptide whose peptide sequence is known.

The mass spectrometry unit 12 performs mass spectrometry on the peptide sample (mass spectrometry step). The method of mass spectrometry used by the mass spectrometry unit 12 is not particularly limited; for example, a method using an ion trap time-of-flight mass spectrometer (IT-TOF MS) can be adopted. In that case, mass spectrometry is performed on the peptide sample using, for example, an IT-TOF MS equipped with an ionization unit, an ion trap, and a TOF MS (none of which are shown).

Specifically, the peptide sample is ionized in the ionization unit, and the resulting ions are captured by the ion trap. A three-dimensional quadrupole ion trap, for example, can be used, but the ion trap is not limited to this type. Some of the captured ions are selectively retained in the ion trap and fragmented by CID (collision-induced dissociation). The fragmented ions are sent from the ion trap to the TOF MS (time-of-flight mass analyzer).

In the TOF MS, ions that have flown through the flight space are detected by a detector. Specifically, ions accelerated by an electric field formed in the flight space are separated in time according to their m/z (mass-to-charge ratio) while flying through the flight space, and are detected sequentially by the detector. The relationship between m/z and the detection intensity at the detector is thereby measured as a mass spectrum, which accomplishes the mass spectrometry. The instrument is not limited to an IT-TOF MS: mass spectrometry may also be performed using, for example, a tandem time-of-flight mass spectrometer without an ion trap (tandem TOF (TOF-TOF) MS), or a hybrid mass spectrometer such as a quadrupole time-of-flight mass spectrometer (Q-TOF MS) or a quadrupole ion trap mass spectrometer (Qq-IT MS). Likewise, the ion fragmentation method is not limited to CID, and other fragmentation methods such as ETD (electron transfer dissociation) and ECD (electron capture dissociation) may be used.
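The time separation by m/z described above follows from energy conservation in an idealized linear TOF analyzer. The relation below is the textbook form, under assumptions that are not stated in this document: every ion starts at rest, is accelerated through the same potential difference U, and then drifts over a field-free length L. Here m is the ion mass, z its charge number, e the elementary charge, v the drift velocity, and t the flight time.

```latex
z e U = \tfrac{1}{2} m v^{2}
\quad\Longrightarrow\quad
t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}
\quad\Longrightarrow\quad
\frac{m}{z} = \frac{2 e U}{L^{2}}\, t^{2}
```

Under these assumptions the flight time scales with the square root of m/z, so lighter ions of the same charge reach the detector first. Real instruments rely on calibration rather than on this ideal expression.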

In mass spectrometry using an IT-TOF MS, MSⁿ analysis (where n is an integer of 2 or more) can be performed by repeating the sequence of fragmenting ions in the ion trap and analyzing them in the TOF MS, so that an MSⁿ spectrum is measured as the mass spectrum.

On the basis of the MSⁿ spectrum obtained by the mass spectrometry unit 12, the peak list creation unit 13 creates a peak list (MSⁿ peak list) by extracting the peaks contained in that MSⁿ spectrum.
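The document does not specify how peaks are extracted from a spectrum. As one possible illustration only, the sketch below picks local intensity maxima above a threshold from a profile spectrum; the function name `pick_peaks` and the `min_intensity` parameter are hypothetical, and practical peak picking would normally add smoothing, centroiding, and deisotoping.

```python
def pick_peaks(mz, intensity, min_intensity=0.0):
    """Return (m/z, intensity) pairs at local maxima of a profile spectrum.

    Illustrative assumption: a peak is a point strictly higher than its left
    neighbour, at least as high as its right neighbour, and not below
    `min_intensity`. The first and last points are never reported.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        rises = intensity[i] > intensity[i - 1]
        falls = intensity[i] >= intensity[i + 1]
        if rises and falls and intensity[i] >= min_intensity:
            peaks.append((mz[i], intensity[i]))
    return peaks


if __name__ == "__main__":
    mz = [100.0, 100.1, 100.2, 100.3, 100.4]
    intensity = [0.0, 5.0, 1.0, 8.0, 0.0]
    print(pick_peaks(mz, intensity, min_intensity=2.0))
```

The asymmetric comparison (strictly greater on the left, greater or equal on the right) reports a flat-topped peak once rather than once per plateau point.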

The peptide assignment unit 14 determines the peptide sequences of endogenous peptides contained in the peptide sample on the basis of the plurality of target peptide sequences stored in the target peptide sequence database 111 and the peak lists created by the peak list creation unit 13 (peptide assignment step). The peptide assignment unit 14 includes functional units such as a sequence estimation unit 141 and a product ion matching unit 142, which are realized, for example, by the CPU executing a program.

For an MS² precursor ion in the MS¹ peak list, for example, the sequence estimation unit 141 searches the target peptide sequence database 111 for peptide sequences whose mass matches the MS² precursor ion mass within a predetermined mass tolerance. The peptide sequences found by the sequence estimation unit 141 become candidates for the peptide sequence of an endogenous peptide contained in the peptide sample (peptide sequence candidates).

The product ion matching unit 142 scores the peptide sequence candidates obtained by the sequence estimation unit 141. When a sufficient number of peptide sequence candidates has been obtained, a statistically significant peptide sequence candidate can be identified from the distribution of the candidates' scores, and that peptide sequence can be determined to be the peptide sequence of an endogenous peptide contained in the peptide sample.

2. Processing by the Database Creation Unit
FIG. 2 is a diagram for explaining how the database creation unit 11 generates target peptide sequences. FIG. 3 is a flowchart showing the flow of processing by the database creation unit 11.

The example of FIG. 2 describes a case in which a protein with a known full-length sequence (the assigned protein) contains an endogenous peptide P whose peptide sequence is known. That is, in this example the peptide sequence of the endogenous peptide P is assumed to be stored in the endogenous peptide sequence database 2. An endogenous peptide P whose peptide sequence is stored in the endogenous peptide sequence database 2 is assigned to a protein, and preferably the full-length sequence of that protein and the start and end residues of the endogenous peptide P within the full-length sequence are provided.

In this case, the database creation unit 11 reads the endogenous peptide sequence database 2 (step S101) and generates the peptide sequences of target peptides (target peptide sequences) on the basis of the peptide sequence of each endogenous peptide P that was read (step S102). Specifically, the database creation unit 11 generates target peptide sequences by extending and shortening the peptide sequence while retaining one or more residues of a part (partial sequence) of the peptide sequence of the endogenous peptide P. In doing so, the database creation unit 11 extends and shortens the peptide sequence with reference to the full-length sequence of the assigned protein that contains the endogenous peptide P.

The database creation unit 11 stores the generated target peptide sequences in the target peptide sequence database 111 (step S103). Steps S102 and S103 are performed for every endogenous peptide P stored in the endogenous peptide sequence database 2. When all endogenous peptides P have been processed (Yes in step S104), all variations of the target peptide sequences are stored in the target peptide sequence database 111.

For example, as shown in FIG. 2, the peptide sequence of a target peptide P1 can be generated by extending the N-terminal side of the peptide sequence of the endogenous peptide P and shortening its C-terminal side, and the peptide sequence of a target peptide P2 can be generated by extending the C-terminal side and shortening the N-terminal side. The peptide sequence of a target peptide P3 can be generated by shortening both the N-terminal and C-terminal sides, and the peptide sequence of a target peptide P4 can be generated by extending both. However, as indicated by P5 and P6 in FIG. 2, peptide sequences that have no partial sequence in common with the endogenous peptide P are not generated as target peptide sequences. The search space can therefore be kept even smaller than when peptide sequences are generated by non-specific cleavage of the full-length sequence of the protein to which the known peptide is assigned. When the assigned protein has isoforms, so that the registered sequence is identical but the sequence into which it extends differs, the resulting sequences are generated as different variations of the target peptide sequence and stored in the target peptide sequence database 111.
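The extension and shortening described above amounts to enumerating every contiguous stretch of the assigned protein that overlaps the known peptide by at least one residue. The following is a minimal sketch under that reading; the function name, the 0-based inclusive `start`/`end` convention, the optional length limits, and the choice to leave out the known peptide itself by default are all illustrative assumptions, and isoforms are not handled.

```python
def generate_target_peptides(protein, start, end,
                             min_len=1, max_len=None, include_known=False):
    """Enumerate target peptide sequences for one known endogenous peptide.

    `protein` is the full-length sequence of the assigned protein;
    `start` and `end` are the 0-based, inclusive positions of the known
    peptide P within it (hypothetical convention). A span [s, e] is kept
    only if it shares at least one residue with [start, end].
    """
    targets = set()
    for s in range(0, end + 1):                        # overlap needs s <= end
        for e in range(max(s, start), len(protein)):   # and e >= start
            length = e - s + 1
            if length < min_len:
                continue
            if max_len is not None and length > max_len:
                break
            if (s, e) == (start, end) and not include_known:
                continue
            targets.add(protein[s:e + 1])
    return targets


if __name__ == "__main__":
    protein = "ACDEFGHI"            # toy sequence, known peptide P = "DEF"
    targets = generate_target_peptides(protein, start=2, end=4)
    total = len(protein) * (len(protein) + 1) // 2
    print(len(targets), "target sequences vs", total, "non-specific fragments")
```

In this toy case, fragments such as "AC" or "GHI", which correspond to P5 and P6 in FIG. 2, are never generated, which is where the reduction in search space comes from.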

3. Processing by the Mass Spectrometry Unit and the Peak List Creation Unit
FIG. 4 is a flowchart showing the flow of processing by the mass spectrometry unit 12 and the peak list creation unit 13.

The mass spectrometry unit 12 ionizes a peptide sample containing endogenous peptides and measures an MS¹ spectrum by performing mass spectrometry on the ions (step S201). The peak list creation unit 13 then creates an MS¹ peak list by extracting peaks from the measured MS¹ spectrum (step S202).

The mass spectrometry unit 12 then selects, by a predetermined method, a plurality of MS² precursor ions from the created MS¹ peak list as targets for MS² spectrum measurement (step S203), and measures an MS² spectrum by fragmenting each MS² precursor ion and performing mass spectrometry (step S204). Step S204 is performed for every MS² precursor ion. When all MS² precursor ions have been processed (Yes in step S205), the peak list creation unit 13 creates an MS² peak list by extracting peaks from the measured MS² spectra (step S206).

4. Processing by the Peptide Assignment Unit
FIG. 5 is a flowchart showing the flow of processing by the peptide assignment unit 14.

For each MS² precursor ion in the MS¹ peak list, the sequence estimation unit 141 searches the target peptide sequence database 111 for peptide sequences whose mass matches the MS² precursor ion mass within a predetermined mass tolerance (step S301). If one or more matching peptide sequences (peptide sequence candidates) are found (Yes in step S302), those peptide sequence candidates are scored (step S303).
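The precursor mass search of step S301 can be pictured as follows. This is a minimal sketch, not the implementation described in this document: it assumes unmodified peptides, standard monoisotopic residue masses, and a tolerance expressed in ppm (the document only says "a predetermined mass tolerance"). The names `peptide_mass`, `neutral_mass`, and `match_precursor` are hypothetical.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def neutral_mass(mz, charge):
    """Convert an observed m/z of an [M + zH]^z+ ion to a neutral mass."""
    return (mz - PROTON) * charge


def match_precursor(precursor_mass, target_db, tol_ppm=10.0):
    """Return target sequences whose mass is within tol_ppm of the precursor."""
    hits = []
    for seq in target_db:
        m = peptide_mass(seq)
        if abs(m - precursor_mass) <= m * tol_ppm * 1e-6:
            hits.append(seq)
    return hits


if __name__ == "__main__":
    db = ["GAS", "SAG", "GAT"]
    print(match_precursor(neutral_mass(234.10844, 1), db))
```

Sequences with the same composition, such as "GAS" and "SAG" here, cannot be told apart by precursor mass alone, which is why the candidates are subsequently scored against the MS² data.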

When a peptide sequence candidate is scored, the theoretical product ion masses of its main product ions (for example, y/b series ions) are calculated, and, for each product ion in the MS² peak list, peptide sequence candidates are sought whose theoretical product ion mass matches within a predetermined mass tolerance. A main product ion is a product ion for which the site that is easily cleaved is known in advance; because that site is known, its theoretical mass (theoretical product ion mass) can be calculated.
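For the b and y series mentioned above, theoretical product ion masses can be computed by summing residue masses from either terminus. The sketch below assumes singly charged ions, unmodified residues, and standard monoisotopic masses; the function name `by_ions` is hypothetical and the choice of ion types is only the example given in the text.

```python
# Standard monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(seq):
    """Return (b, y): singly charged m/z lists for b1..b(n-1) and y1..y(n-1)."""
    masses = [MONO[aa] for aa in seq]

    b, running = [], 0.0
    for m in masses[:-1]:              # N-terminal fragments
        running += m
        b.append(running + PROTON)

    y, running = [], 0.0
    for m in reversed(masses[1:]):     # C-terminal fragments
        running += m
        y.append(running + WATER + PROTON)

    return b, y


if __name__ == "__main__":
    b, y = by_ions("GAS")
    print([round(x, 4) for x in b])
    print([round(x, 4) for x in y])
```

A useful sanity check is that complementary ions b(i) and y(n-i) add up to the neutral peptide mass plus two protons.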

Each peptide sequence candidate found in this way is scored using, for example, the intensities and the number of the matched peaks. Any of the various score calculation methods used in database searches against protein databases can be adopted.

Steps S301 to S303 are performed for all MS² precursor ions. When processing of all MS² precursor ions is complete (Yes in step S304), the peptide sequence candidates are narrowed down based on their scores (step S305). At this point, the candidates are narrowed down to a single peptide sequence based on a significant difference in scores, and that peptide sequence is output as the analysis result (step S306). If a statistical index cannot be calculated, for example because there are too few peptide sequence candidates, processing may stop at ranking the candidates by score, with the final unique selection left to the user.

5. Effects
In this embodiment, a database of peptide sequences each containing one or more residues of a partial sequence of an endogenous peptide whose peptide sequence is known (the target peptide sequence database 111) is generated based on that endogenous peptide. Peptide sequences that share part of a partial sequence with an endogenous peptide of known sequence (target peptide sequences) may remain, as the peptide sequences of unknown endogenous peptides, in mass spectra that conventional methods fail to assign.

Therefore, generating a database of target peptide sequences (the target peptide sequence database 111) effectively prevents the search space from growing. By preferentially searching for the peptide sequences of target peptides contained in the peptide sample, based on the target peptide sequence database 111 and the mass spectra obtained by mass spectrometry of the peptide sample in the mass spectrometry unit 12, the peptide sequences of a larger number of endogenous peptides can be determined with high sensitivity.

6. Example
Based on 944 peptide sequences of endogenous peptides, comprising those recorded in the above-mentioned MosaiqueDB and those assigned from measurement data, 944,390 target peptide sequence variations of 7 to 80 residues in length were generated to create a target peptide database.
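One way such variations could be enumerated is sketched below. This is an assumption for illustration only: it treats a target peptide sequence as any contiguous stretch of the precursor protein, 7 to 80 residues long, that shares at least one residue with the known endogenous peptide. The function name `target_sequences` and the 0-based half-open coordinates are hypothetical.

```python
def target_sequences(precursor, start, end, min_len=7, max_len=80):
    """Enumerate substrings of `precursor` that share at least one residue with
    the known peptide occupying 0-based half-open positions [start, end)."""
    found = set()
    n = len(precursor)
    for s in range(n):
        for length in range(min_len, max_len + 1):
            e = s + length
            if e > n:
                break  # longer windows starting at s would also run off the end
            if s < end and e > start:  # window overlaps the known peptide
                found.add(precursor[s:e])
    return sorted(found)
```

For example, `target_sequences("ABCDEFGHIJKL", 4, 6, min_len=7, max_len=8)` yields every 7- or 8-residue window of the toy precursor that touches the peptide `EF`. Restricting the enumeration to windows around known peptides is what keeps the database far smaller than one built from all substrings of all proteins.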

For the MS² and MS³ measurement data of about 38 peaks (precursor ions: m/z = 793 to 2943; 70 spectra in total) measured from a urine sample by the mass spectrometry unit, the sequence estimation unit estimated peptide sequences. As a result, for 35 peaks (excluding peaks whose precursor ion masses overlap within the mass tolerance and whose peptide sequences differ), peptide sequence candidates matching target peptide sequences stored in the target peptide sequence database within the mass tolerance were obtained, with about 50 candidates per peak on average (just over 1,800 in total).

For the peptide sequence candidates estimated by the sequence estimation unit, the theoretical masses of the y/b-series ions were calculated, and product ions were matched against a total of 70 MSⁿ spectra (n = 2 or 3) obtained from the peaks under analysis. Scoring was then performed as follows, using a score calculation method similar to that of X! Tandem, a known search engine. The scoring method is not limited to that of this example, however; any of the various methods used in conventional database search methods may be adopted.

Scoring was performed using the following formulas (1) and (2).

Here, Score is the score actually calculated from a peptide sequence candidate and the measurement data. I_i is the intensity of a peak matched in the comparison, N is the total number of matched peaks, TIC is the total ion chromatogram of the MS² spectrum being searched, and n_b and n_y are the numbers of b-series ions and y-series ions, respectively, matched in the product ion comparison; here N = n_b + n_y. Based on the score distribution of the peptide sequence candidates, an index and a threshold can be established for selecting statistically significant sequences from among the candidates. For example, a discrimination threshold can be set using as the index a significance probability (p-value) or an expectation value (E-value) calculated from the score distribution. The index for determining whether a significant difference exists is not limited to these, however; in this example, discrimination by E-value could also be substituted (reproduced) by a discrimination method that uses the score difference between the first-ranked candidate and lower-ranked candidates as the threshold.

FIG. 6 shows the results of actually analyzing the MS² spectra obtained from the urine sample. "UniProt Accession" is the protein ID in UniProt, a protein database. "UniProt Name" is the name of the protein registered in UniProt. "Start" and "End" indicate the positions of the first and last residues of the peptide within the sequence registered in UniProt. "Sequence" is the amino acid sequence of the assigned urinary peptide. "Precursor Ion Mass" is the mass-to-charge ratio of the singly charged peptide ion observed by mass spectrometry.

For 16 high-quality MS² spectra obtained from the urine sample for evaluation, Mascot and X! Tandem, which are conventional protein database search methods, identified only five peptide sequences (A in FIG. 6). This was the case even though their identification thresholds were set to 1.0 and 0.1, respectively, values greatly relaxed compared with the thresholds used in proteome analysis, and false-positive hits were tolerated.

In contrast, analysis according to the present invention estimated peptide sequence candidates from all 16 high-quality MS² spectra, covering not only the above five peptide sequences (A in FIG. 6) but also the remaining 11 peptide sequences (B in FIG. 6). As a result of scoring by the product ion matching unit, the score difference between the first-ranked candidate and the second-ranked and lower candidates was 10 or more for every spectrum, so the first-ranked candidate was judged to be a significant estimation result. Visual verification also confirmed that every estimation result was valid.
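Discrimination by score difference between the first-ranked and second-ranked candidates can be sketched as follows. The threshold value `min_gap=10.0` is an assumption taken from the score difference observed in this example, not a threshold stated by the specification, and the function name `select_unique` is hypothetical.

```python
def select_unique(scored, min_gap=10.0):
    """Rank (sequence, score) candidates and pick the top one if it leads clearly.

    Returns (ranked, winner). winner is None when fewer than two candidates
    exist or when the lead of the first candidate is below min_gap, in which
    case the ranked list can be handed to the user for the final choice.
    """
    ranked = sorted(scored, key=lambda c: c[1], reverse=True)
    if len(ranked) < 2:
        return ranked, None
    gap = ranked[0][1] - ranked[1][1]
    winner = ranked[0][0] if gap >= min_gap else None
    return ranked, winner
```

Returning the ranked list together with the decision mirrors the fallback described for step S305, where ranking alone is performed and the unique selection is left to the user.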

＜Second Embodiment＞
FIG. 7 is a block diagram showing a configuration example of a peptide assignment system 100 according to a second embodiment of the present invention.

The first embodiment described a configuration in which the product ion matching unit 142 calculates theoretical product ion masses from the peptide sequence candidates when matching the main product ions. In the second embodiment, by contrast, the product ion matching unit 142 uses MSⁿ measurement data (reference data) of the endogenous peptides whose peptide sequences are stored in the endogenous peptide sequence database 2, from which the peptide sequence candidates were created, and calculates its similarity to the MSⁿ measurement data under analysis (query data). The remaining configuration is the same as in the first embodiment, so the same reference numerals are used in the figure and the description is omitted.

The peptide assignment system 100 includes an endogenous peptide spectrum library 21. The endogenous peptide spectrum library 21 stores, for each endogenous peptide whose peptide sequence is stored in the endogenous peptide sequence database 2, the MSⁿ spectra obtained by performing mass spectrometry on that peptide. The product ion matching unit 142 uses the MSⁿ spectra stored in the endogenous peptide spectrum library 21 to calculate the similarity to the MSⁿ spectra measured by the mass spectrometry unit 12. When Δm, obtained by subtracting the precursor ion mass of the query data from the precursor ion mass of the reference data, is larger than the mass tolerance, the results of matching against peaks obtained by subtracting Δm from the product ion masses of the query data are also used in calculating the similarity. Furthermore, when Δm is larger than the mass Δn of the sequence (one or more amino acids) at either terminus of the query peptide sequence, the results of matching against peaks obtained by subtracting Δn from the query product ion masses may be used in calculating the similarity.
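The use of Δm-shifted peaks can be sketched as follows. This is an illustrative reading under stated assumptions: peaks are (m/z, intensity) pairs, the tolerance is an absolute value in Da, a direct match takes priority over a shifted match, and the condition on Δm is applied literally as written. The function name `matched_pairs` is hypothetical.

```python
def matched_pairs(query, library, delta_m, tol=0.2):
    """Pair query product ions with library peaks, directly or after a shift.

    delta_m is the library (reference) precursor mass minus the query
    precursor mass. When delta_m exceeds tol, each query peak is also tried
    at (m/z - delta_m). Returns tuples of
    (query m/z, query intensity, library m/z, library intensity, shift used).
    """
    shifts = [0.0]
    if delta_m > tol:
        shifts.append(delta_m)
    pairs = []
    for q_mz, q_int in query:
        for shift in shifts:
            hit = next(((l_mz, l_int) for l_mz, l_int in library
                        if abs((q_mz - shift) - l_mz) <= tol), None)
            if hit is not None:
                pairs.append((q_mz, q_int, hit[0], hit[1], shift))
                break  # a direct match takes priority over a shifted one
    return pairs
```

The intensities in the returned pairs can then feed whichever similarity measure is chosen, so that fragments carrying the mass difference still contribute to the comparison.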

Any of the various methods used in known spectral library search methods can be used to calculate the similarity (for example, Stein, S. E. & Scott, D. R.: Optimization and Testing of Mass Spectral Library Search Algorithms for Compound Identification. JASMS, 5, 859-866 (1994)). In this case, for example, the query data and the reference data may be compared, and the normalized product of the peak intensities of the ion peaks that match within the mass tolerance may be used as the similarity.
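A normalized product of matched peak intensities can be sketched as a normalized dot product, one common choice in spectral library searching. The greedy nearest-peak pairing, the absolute tolerance in Da and the function name `similarity` are assumptions made for this illustration.

```python
import math


def similarity(query, library, tol=0.2):
    """Normalized dot product over peaks matched within tol Da.

    query and library are lists of (m/z, intensity). Each library peak is
    paired with at most one query peak, choosing the nearest unused one.
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        best = None
        for j, (l_mz, _) in enumerate(library):
            if j in used or abs(q_mz - l_mz) > tol:
                continue
            if best is None or abs(q_mz - l_mz) < abs(q_mz - library[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += q_int * library[best][1]
    norm = (math.sqrt(sum(i * i for _, i in query))
            * math.sqrt(sum(i * i for _, i in library)))
    return dot / norm if norm else 0.0
```

With this normalization, identical spectra score 1.0 and spectra with no peaks in common score 0.0, so the value can be compared across spectra of different overall intensity.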

An endogenous peptide may be cleaved at sites that cannot be predicted, so with a configuration that calculates theoretical product ion masses, as in the first embodiment, scoring may not work as the theory predicts. In the second embodiment, by contrast, actual MSⁿ measurement data of the endogenous peptides whose peptide sequences are stored in the endogenous peptide sequence database 2 is used, so peptide sequences can in some cases be determined with higher sensitivity.

Description of Symbols
　　　　1　　Peptide assignment system
　　　　2　　Endogenous peptide sequence database
　　　11　　Database creation unit
　　　12　　Mass spectrometry unit
　　　13　　Peak list creation unit
　　　14　　Peptide assignment unit
　　　21　　Endogenous peptide spectrum library
　　100　　Peptide assignment system
　　111　　Target peptide sequence database
　　141　　Sequence estimation unit
　　142　　Product ion matching unit

Claims

A peptide assignment method comprising: a database creation step of creating a target peptide sequence database containing a plurality of target peptide sequences by generating, as target peptide sequences, peptide sequences each containing one or more residues of a partial sequence of an endogenous peptide, based on an endogenous peptide whose peptide sequence is known, from among endogenous peptides produced in vivo, and on the full-length sequence of a precursor protein of that endogenous peptide;
a mass spectrometry step of performing mass spectrometry on a peptide sample; and
a peptide assignment step of determining the peptide sequence of an endogenous peptide contained in the peptide sample, based on the plurality of target peptide sequences created in the database creation step and a mass spectrum obtained in the mass spectrometry step.
A peptide assignment system comprising: a database creation unit that creates a target peptide sequence database containing a plurality of target peptide sequences by generating, as target peptide sequences, peptide sequences each containing one or more residues of a partial sequence of an endogenous peptide, based on an endogenous peptide whose peptide sequence is known, from among endogenous peptides produced in vivo, and on the full-length sequence of a precursor protein of that endogenous peptide;
a mass spectrometry unit that performs mass spectrometry on a peptide sample; and
a peptide assignment unit that determines the peptide sequence of an endogenous peptide contained in the peptide sample, based on the plurality of target peptide sequences created by the database creation unit and a mass spectrum obtained by the mass spectrometry unit.