WO2017047580A1 - Peptide assignment method and peptide assignment system - Google Patents

Publication number
WO2017047580A1
Authority
WO
WIPO (PCT)
Prior art keywords
peptide
sequence
endogenous
peptide sequence
database
Application number
PCT/JP2016/076963
Other languages
French (fr)
Japanese (ja)
Inventor
雅樹 村瀬 (Masaki Murase)
田中 耕一 (Koichi Tanaka)
Original Assignee
株式会社島津製作所 (Shimadzu Corporation)
Application filed by 株式会社島津製作所 (Shimadzu Corporation)
Priority to US15/759,659 (US20190041393A1)
Priority to JP2017539911A (JP6489224B2)
Publication of WO2017047580A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR

Definitions

  • The present invention relates to a peptide assignment method and a peptide assignment system for determining the peptide sequence of an endogenous peptide produced (generated) in vivo.
  • As general peptide assignment methods for proteins, a method using database search (see, for example, Non-Patent Document 1 below) and a method using de novo sequencing (see, for example, Non-Patent Document 2 below) are known.
  • In the database search approach, a search method such as Mascot, provided by Matrix Science, is used (see, for example, Non-Patent Document 1 below).
  • Specifically, all candidate peptide fragments that could arise from the protein amino acid sequences recorded in a protein database are enumerated by in silico digestion.
  • The molecular weight of each fragment is compared with the MS² precursor ion mass, and theoretical product ion masses are calculated for the fragments that match within a predetermined mass tolerance.
  • The calculated theoretical product ion masses are compared with the MS² measurement data to find peptides with a high degree of agreement.
  • In de novo sequencing, by contrast, amino acid sequences are read from the measurement data without using a database. Specifically, a group of peptide fragments is generated in which amino acid residues are removed one at a time from a terminus of the peptide by some method, and the amino acid sequence is read from the mass differences between the ion peaks derived from these fragments.
  • Software called PEAKS is a widely known implementation (see, for example, Non-Patent Document 2 below).
  • In protein analysis, whichever of these assignment methods is used, the protein is often fragmented into peptides with a digestive enzyme before analysis to make the analysis easier.
  • Fragmenting the protein into smaller pieces promotes ionization during mass spectrometry and improves analysis sensitivity.
  • In a database search against a protein database, choosing a digestive enzyme that cleaves proteins at specific sites (specific sequences) makes the search space smaller than with random cleavage.
  • In de novo sequencing, the digestive enzyme can be chosen so that it yields product ions from which the amino acid sequence of the peptide is easy to read. For example, when trypsin is used, y/b series ions are generated specifically when the ions are cleaved by CID (collision-induced dissociation), which facilitates sequencing.
  • CID: collision-induced dissociation
  • Endogenous peptides are peptides produced in vivo. Some are transported through body fluids such as blood as molecules involved in signaling and functional regulation in the body, and some are excreted in urine as metabolites. Analyzing the structure of endogenous peptides can yield useful information for drug development and disease diagnosis. However, the conventional peptide assignment methods described above have been difficult to apply to the analysis of endogenous peptides.
  • An endogenous peptide is produced when a protein is cleaved by an in vivo processing or metabolic mechanism.
  • A single protein gives rise to many different peptide fragments, which may include fragments that share partial sequences and fragments that do not.
  • In proteome analysis, by contrast, a digestive enzyme cleaves the protein at specific sites, so the whole protein can be assigned by determining the sequences of a few peptide fragments with high detected abundance (ionization efficiency).
  • The present invention was made in view of these circumstances, and its object is to provide a peptide assignment method and a peptide assignment system that can determine the peptide sequences of more endogenous peptides with high sensitivity.
  • The peptide assignment method includes a database creation step, a mass analysis step, and a peptide assignment step.
  • In the database creation step, peptide sequences that include one or more residues of a partial sequence of an endogenous peptide are generated as target peptide sequences, based on endogenous peptides produced in vivo whose peptide sequences are known and on the full-length sequences of their precursor proteins, thereby creating a target peptide sequence database containing a plurality of target peptide sequences.
  • In the mass analysis step, mass analysis is performed on a peptide sample.
  • In the peptide assignment step, the peptide sequence of an endogenous peptide contained in the peptide sample is determined based on the target peptide sequences created in the database creation step and the mass spectrum obtained in the mass analysis step.
  • With this method, a database of peptide sequences that include one or more residues of a partial sequence of an endogenous peptide is generated, based on endogenous peptides of known sequence and the full-length sequences of their precursor proteins.
  • Peptide sequences that share a partial sequence with endogenous peptides of known sequence (target peptide sequences) may remain, as sequences of unknown endogenous peptides, among the mass spectra that conventional methods cannot assign.
  • Because a database of these target peptide sequences (the target peptide sequence database) is generated, an increase in search space can be effectively prevented.
  • Moreover, by preferentially searching for the sequences of target peptides contained in the peptide sample, more endogenous peptides can be detected and their peptide sequences determined with high sensitivity.
  • The peptide assignment system includes a database creation unit, a mass analysis unit, and a peptide assignment unit.
  • The database creation unit generates, as target peptide sequences, peptide sequences that include one or more residues of a partial sequence of an endogenous peptide, based on endogenous peptides produced in vivo whose peptide sequences are known and on the full-length sequences of their precursor proteins, thereby creating a target peptide sequence database containing a plurality of target peptide sequences.
  • The mass analysis unit performs mass analysis on a peptide sample.
  • The peptide assignment unit determines the peptide sequence of an endogenous peptide contained in the peptide sample based on the target peptide sequences created by the database creation unit and the mass spectrum obtained by the mass analysis unit.
  • According to the present invention, generating the target peptide sequence database effectively prevents an increase in search space, and determining sequences based on this database and the mass spectrum obtained by mass analysis of a peptide sample makes it possible to determine the peptide sequences of more endogenous peptides with high sensitivity.
  • FIG. 1 is a block diagram showing an example configuration of a peptide assignment system 1 according to the first embodiment of the present invention.
  • The peptide assignment system 1 determines, from a sample to be analyzed (peptide sample), the peptide sequence of an endogenous peptide produced in a living body. It includes a database creation unit 11, a mass analysis unit 12, a peak list creation unit 13, and a peptide assignment unit 14. At least part of each of units 11 to 14 is implemented by an information processing apparatus that includes a CPU (central processing unit).
  • CPU: Central Processing Unit
  • The peptide assignment system 1 determines the peptide sequence of an endogenous peptide in the peptide sample using the peptide sequences stored in an endogenous peptide sequence database 2.
  • The endogenous peptide sequence database 2 stores the peptide sequences of a plurality of endogenous peptides whose sequences are known.
  • Here, “the peptide sequence is known” covers cases where the sequence is recorded in a publicly available sequence database of endogenous peptides or in the literature, as well as cases where it has been assigned with high confidence by conventional analysis methods (including manual analysis).
  • Examples of public endogenous peptide sequence databases and literature include the Mosaiques DB (http://mosaiques-diagnostics.de/diapatpcms/mosaiquescms/front_content.php?idcat=257; Siwy et al., “Human urinary peptide database for multiple disease biomarker discovery”, Proteomics Clin. Appl., 2011, 5, 367-374), a sequence database of endogenous peptides contained in urine, and non-database literature such as Smith et al., “Deciphering the peptidome of urine from ovarian cancer patients and healthy controls”, Clin. Proteomics, 2014, 11(1): 23.
  • Based on the peptide sequences of the endogenous peptides stored in the endogenous peptide sequence database 2 and the full-length sequences of their precursor proteins stored in a protein sequence database 3, the database creation unit 11 generates peptide sequences different from these endogenous peptides as target peptide sequences, thereby creating a target peptide sequence database 111 containing a plurality of target peptide sequences (database creation step).
  • The protein sequence database 3 is a database of full-length protein sequences that is consulted when a sequence is extended beyond what is registered in the endogenous peptide sequence database 2.
  • Each target peptide sequence generated here includes one or more residues of a partial sequence of an endogenous peptide stored in the endogenous peptide sequence database 2. In other words, every target peptide sequence generated by the database creation unit 11 shares part (or all) of its sequence with an endogenous peptide of known sequence.
  • The mass analysis unit 12 performs mass analysis on the peptide sample (mass analysis step).
  • The mass spectrometry method used by the mass analysis unit 12 is not particularly limited; for example, an ion trap time-of-flight mass spectrometer (IT-TOF MS) can be used. In that case, the peptide sample is analyzed with an IT-TOF MS equipped with, for example, an ionization section, an ion trap, and a TOF MS (none shown).
  • The peptide sample is ionized in the ionization section, and the resulting ions are captured in the ion trap.
  • The ion trap can be, for example, a three-dimensional quadrupole type, but is not limited to it.
  • Some of the trapped ions are selectively retained in the ion trap and cleaved by CID (collision-induced dissociation).
  • The fragment ions are sent from the ion trap to the TOF MS (time-of-flight mass analyzer).
  • Ions flying through the flight space are detected by a detector. Specifically, ions accelerated by an electric field formed in the flight space separate in time according to m/z (mass-to-charge ratio) as they fly, and reach the detector one after another. The relationship between m/z and detected intensity is thereby measured as a mass spectrum, which realizes mass spectrometry.
  • m/z: mass-to-charge ratio
  • Instead of an IT-TOF MS, mass spectrometry may be performed with, for example, a tandem time-of-flight mass spectrometer without an ion trap (tandem TOF (TOF-TOF) MS), a quadrupole time-of-flight mass spectrometer (Q-TOF MS), or a hybrid mass spectrometer such as a quadrupole ion trap mass spectrometer (Qq-IT MS).
  • The ion cleavage method is not limited to CID either; other methods such as ETD (electron transfer dissociation) or ECD (electron capture dissociation) may be used.
  • MSⁿ analysis (n is an integer of 2 or more) is performed by repeating the cycle of cleaving ions in the ion trap and mass-analyzing them with the TOF MS, so that MSⁿ spectra can be measured as mass spectra.
  • Based on the MSⁿ spectra obtained by the mass analysis unit 12, the peak list creation unit 13 creates a peak list (MSⁿ peak list) of the peaks extracted from those spectra.
  • Based on the target peptide sequences stored in the target peptide sequence database 111 and the peak list created by the peak list creation unit 13, the peptide assignment unit 14 determines the peptide sequence of the endogenous peptide contained in the peptide sample (peptide assignment step).
  • The peptide assignment unit 14 includes functional units such as a sequence estimation unit 141 and a product ion matching unit 142, implemented, for example, by the CPU executing a program.
  • The sequence estimation unit 141 searches the target peptide sequence database 111 for peptide sequences that match the MS² precursor ion mass within a predetermined mass tolerance.
  • The peptide sequences found by the sequence estimation unit 141 become peptide sequence candidates for the endogenous peptide contained in the peptide sample.
  • The product ion matching unit 142 scores the peptide sequence candidates obtained by the sequence estimation unit 141. When enough candidates are obtained, statistically significant candidates can be identified from the distribution of their scores, and the peptide sequence of the endogenous peptide contained in the peptide sample can thereby be determined.
  • FIG. 2 illustrates how the database creation unit 11 generates target peptide sequences.
  • FIG. 3 is a flowchart showing the processing performed by the database creation unit 11.
  • An endogenous peptide P of known peptide sequence is contained in a protein whose full-length sequence is known (the assigned protein).
  • The peptide sequence of the endogenous peptide P is stored in the endogenous peptide sequence database 2, and P is assigned to that protein.
  • Preferably, the full-length sequence of this protein and the start and end residues of the endogenous peptide P within that full-length sequence are also recorded.
  • The database creation unit 11 reads the endogenous peptide sequence database 2 (step S101) and generates target peptide sequences based on the sequence of each endogenous peptide P read (step S102). Specifically, it generates each target peptide sequence by extending or shortening the peptide sequence while keeping one or more residues (a partial sequence) of the sequence of P, referring to the full-length sequence of the assigned protein that contains P.
  • The database creation unit 11 stores the generated target peptide sequences in the target peptide sequence database 111 (step S103).
  • Steps S102 and S103 are performed for every endogenous peptide P stored in the endogenous peptide sequence database 2. When all have been processed (Yes in step S104), all target peptide sequence variations have been stored in the target peptide sequence database 111.
  • For example, the peptide sequence of a target peptide P1 can be generated, or the C-terminal side can be extended to generate that of a target peptide P2. Likewise, the peptide sequence of a target peptide P3 can be generated, or both the N-terminal and C-terminal sides can be extended to generate that of a target peptide P4.
  • A peptide sequence that shares no partial sequence with the endogenous peptide P is not generated as a target peptide sequence. The search space is therefore smaller even than when the full-length sequence of the protein to which a known peptide is assigned is cleaved non-specifically to generate peptide sequences. When the assigned protein has isoforms whose registered sequences match but whose extended sequences differ, these are generated as different target peptide sequence variations and stored in the target peptide sequence database 111.
  • FIG. 4 is a flowchart showing the processing performed by the mass analysis unit 12 and the peak list creation unit 13.
  • The mass analysis unit 12 ionizes a peptide sample containing endogenous peptides and measures an MS¹ spectrum by mass-analyzing the ions (step S201).
  • The peak list creation unit 13 creates an MS¹ peak list by extracting peaks from the measured MS¹ spectrum (step S202).
  • The mass analysis unit 12 selects, by a predetermined method, a plurality of MS² precursor ions for MS² measurement from the MS¹ peak list (step S203), then cleaves each MS² precursor ion and performs mass analysis to measure its MS² spectrum (step S204).
  • Step S204 is performed for all MS² precursor ions. When all have been processed (Yes in step S205), the peak list creation unit 13 extracts peaks from the measured MS² spectra to create an MS² peak list (step S206).
  • FIG. 5 is a flowchart showing the processing performed by the peptide assignment unit 14.
  • The sequence estimation unit 141 searches the target peptide sequence database 111 for peptide sequences that match the MS² precursor ion mass within a predetermined mass tolerance (step S301). When one or more matching peptide sequences (peptide sequence candidates) are found (Yes in step S302), the candidates are scored (step S303).
  • In the scoring, the theoretical masses of the main product ions (e.g., y/b series ions) of each peptide sequence candidate are calculated, and the product ions in the MS² peak list are matched against these theoretical masses within a predetermined mass tolerance.
  • The main product ions are product ions whose readily cleaved sites are known in advance; their theoretical product ion masses can therefore be calculated by assuming cleavage at those sites.
  • Scoring uses, for example, the intensities and the number of matched peaks.
  • Any of the score calculation methods used in database searches against protein databases can be employed.
  • Steps S301 to S303 are performed for all MS² precursor ions. When all have been processed (Yes in step S304), the peptide sequence candidates are narrowed down based on their scores (step S305). The candidates are narrowed to a unique one based on a significant difference in scores, and its peptide sequence is output as the analysis result (step S306). When there are too few candidates to calculate a statistical index, processing may, for example, stop at ranking by score and leave the final unique selection to the user.
  • In short, a database of peptide sequences containing one or more residues of a partial sequence of an endogenous peptide is generated based on endogenous peptides whose peptide sequences are known.
  • Peptide sequences that share a partial sequence with endogenous peptides of known sequence (target peptide sequences) may remain, as sequences of unknown endogenous peptides, among the mass spectra that conventional methods cannot assign.
  • Because a database of these target peptide sequences (the target peptide sequence database 111) is generated, an increase in search space can be effectively prevented. And by preferentially searching for the sequences of target peptides in the peptide sample, based on the target peptide sequence database 111 and the mass spectrum obtained by the mass analysis unit 12, the peptide sequences of more endogenous peptides can be determined with high sensitivity.
  • Example: Based on the 944 endogenous peptide sequences, together with their assignment and measurement data, recorded in the Mosaiques DB mentioned above, 944,390 target peptide sequence variations 7 to 80 residues long were generated to create a target peptide sequence database.
  • Peptide sequences were then estimated by the sequence estimation unit. For 35 peaks (excluding those whose precursor ion masses overlapped within the mass tolerance but whose peptide sequences differed), peptide sequence candidates matching target peptide sequences in the database within the mass tolerance were obtained, about 50 per peak on average (about 1,800 in total).
  • Scoring was performed as follows, using a score calculation method similar to that of the well-known search engine X! Tandem.
  • The scoring method is not limited to that of this embodiment; any of the various methods adopted in conventional database search methods may be used.
  • Score is the score actually calculated from a peptide sequence candidate and the measurement data.
  • I_i is the intensity of each peak matched in the comparison.
  • N is the total number of matched peaks.
  • TIC is the total ion current of the MS² spectrum being searched.
  • A threshold for discriminating significant candidates can be set using, as an index, the significance probability (p-value) or expected value (E-value) calculated from the score distribution.
  • p-value: significance probability
  • E-value: expected value
  • The index for judging whether a significant difference exists is not limited to those above. In this embodiment, the E-value-based judgment could be substituted (reproduced) by a discrimination method that applies a threshold to the score difference between the top candidate and lower-ranked candidates.
  • FIG. 6 shows the results of actually analyzing MS² spectra obtained from a urine sample.
  • “UniProt Access” is the protein ID in UniProt, a protein database.
  • “UniProt Name” is the name of the protein as registered in UniProt.
  • “Start” and “End” are the positions of the first and last residues of the peptide within the UniProt registered sequence.
  • “Sequence” is the amino acid sequence of the assigned urinary peptide.
  • “Precursor Ion Mass” is the m/z of the singly charged peptide ion observed by mass spectrometry.
  • FIG. 7 is a block diagram showing an example configuration of a peptide assignment system 100 according to the second embodiment of the present invention.
  • In the second embodiment, the product ion matching unit 142 calculates the similarity between the MSⁿ measurement data to be analyzed (query data) and the MSⁿ measurement data (reference data) of the endogenous peptides, stored in the endogenous peptide sequence database 2, from which the peptide sequence candidates were generated. The other components are the same as in the first embodiment, are given the same reference numerals in the drawings, and are not described again.
  • The peptide assignment system 100 includes an endogenous peptide spectrum library 21.
  • The endogenous peptide spectrum library 21 stores the MSⁿ spectrum obtained by mass spectrometry of each endogenous peptide whose sequence is stored in the endogenous peptide sequence database 2.
  • The product ion matching unit 142 uses the MSⁿ spectra stored in the endogenous peptide spectrum library 21 to calculate the similarity to the MSⁿ spectrum measured by the mass analysis unit 12. When Δm, obtained by subtracting the precursor ion mass of the query data from that of the reference data, exceeds the mass tolerance, the result of matching against peaks obtained by subtracting Δm from the product ion masses of the query data may also be used to calculate the similarity.
  • Alternatively, the query data and the reference data may be matched directly, and the normalized product of the intensities of ion peaks that match within the mass tolerance may be used as the similarity.
  • Endogenous peptides may be cleaved at unpredictable sites. With a configuration that calculates theoretical product ion masses, as in the first embodiment, scoring may therefore not work as theory assumes.
  • In such cases, matching against measured spectra as in this embodiment may allow the peptide sequence to be determined with higher sensitivity.
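The second embodiment's spectrum-to-spectrum matching can be sketched in Python as follows. Under our reading, similarity is a normalized dot product of the intensities of matched peaks (1.0 for identical spectra), and when the precursor masses differ by more than the tolerance, query peaks may also be matched after shifting by the precursor difference Δm. Which fragment series carries the shift depends on which terminus differs, so, as an assumption, this sketch tries both +Δm and -Δm; the greedy one-to-one peak pairing and the name `spectral_similarity` are likewise hypothetical, not the patent's implementation.

```python
import math


def spectral_similarity(query, reference, prec_query, prec_ref, tol=0.02):
    """Normalized dot product between a measured (query) spectrum and a library
    (reference) spectrum. Spectra are lists of (m/z, intensity) pairs."""
    dm = prec_ref - prec_query
    # Shifted matches are only considered when the precursors differ beyond tolerance.
    offsets = [0.0] if abs(dm) <= tol else [0.0, dm, -dm]
    used = set()  # each reference peak may be matched at most once
    dot = 0.0
    for mz_q, int_q in query:
        best = None  # (m/z error, reference index) of the closest match so far
        for offset in offsets:
            for k, (mz_r, _) in enumerate(reference):
                err = abs(mz_q + offset - mz_r)
                if k not in used and err <= tol and (best is None or err < best[0]):
                    best = (err, k)
        if best is not None:
            used.add(best[1])
            dot += int_q * reference[best[1]][1]
    norm = math.sqrt(sum(i * i for _, i in query) * sum(i * i for _, i in reference))
    return dot / norm if norm else 0.0
```

For a query peptide that is the reference extended by 50 Da at one end, the unshifted comparison finds only the shared-terminus fragment, while allowing the shift also aligns the fragment that carries the extension, so the similarity rises from 0.2 to 1.0 in the toy spectra used to check this sketch.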

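The scoring and narrowing steps of the first embodiment (steps S303 to S306) can be sketched in Python. The score described above uses the matched peak intensities I_i, the number of matched peaks N, and the TIC of the MS² spectrum, but its exact formula is not given in the text; as a stand-in, this sketch uses the X! Tandem-style hyperscore (sum of matched intensities multiplied by the factorials of the matched b- and y-ion counts). The narrowing step follows the score-difference approach described for the embodiment: the top candidate is accepted only if it beats the runner-up by a threshold; otherwise the ranking is returned for the user. The threshold value and the names `hyperscore` and `narrow_down` are hypothetical.

```python
import math


def hyperscore(b_intensities, y_intensities):
    """X! Tandem-style hyperscore: (sum of matched intensities) * Nb! * Ny!."""
    total = sum(b_intensities) + sum(y_intensities)
    return total * math.factorial(len(b_intensities)) * math.factorial(len(y_intensities))


def narrow_down(candidates, min_gap):
    """candidates: list of (score, sequence) pairs for one MS2 spectrum.

    Returns (sequence_or_None, ranking). The top sequence is returned only when
    its score exceeds the runner-up's by at least min_gap; with fewer than two
    candidates no gap can be computed, so the choice is left to the user.
    """
    ranking = sorted(candidates, key=lambda c: c[0], reverse=True)
    if len(ranking) >= 2 and ranking[0][0] - ranking[1][0] >= min_gap:
        return ranking[0][1], ranking
    return None, ranking
```

Because hyperscores grow factorially with the number of matched ions, a gap threshold is likely to be more stable when applied to log10(hyperscore) than to the raw value.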
Abstract

A peptide assignment method in which, on the basis of endogenous peptides of known peptide sequence, among endogenous peptides produced in vivo, and the full-length sequences of the precursor proteins of those endogenous peptides, a database creation unit 11 generates target peptide sequences, which are peptide sequences containing one or more residues of a partial sequence of an endogenous peptide, so as to create a target peptide sequence database 111 containing multiple target peptide sequences. Next, a mass spectrometry unit 12 conducts mass spectrometry of a peptide sample. Then, a peptide assignment unit 14 determines the peptide sequence of an endogenous peptide contained in the peptide sample on the basis of the multiple target peptide sequences created by the database creation unit 11 and a mass spectrum obtained by the mass spectrometry unit 12.
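The target-sequence generation step summarized in the abstract can be sketched in Python under one reading of the text, not as the patent's implementation: a target peptide sequence is taken to be any contiguous window of the precursor protein that shares at least one residue with a known endogenous peptide, differs from the known endogenous peptides, and is 7 to 80 residues long (the length range used in the Example). Positions are 0-based and inclusive; `target_windows` and `build_target_db` are hypothetical names.

```python
def target_windows(protein_len, pep_start, pep_end, min_len=7, max_len=80):
    """Return (start, end) windows, 0-based and inclusive, that overlap the
    known peptide [pep_start, pep_end] by at least one residue."""
    windows = []
    for s in range(pep_end + 1):  # start no later than the peptide's last residue
        for e in range(max(s, pep_start), protein_len):  # end no earlier than its first
            length = e - s + 1
            if length > max_len:
                break  # later ends only make the window longer
            if length >= min_len and (s, e) != (pep_start, pep_end):
                windows.append((s, e))  # the known peptide itself is skipped
    return windows


def build_target_db(known_peptides, proteins, min_len=7, max_len=80):
    """known_peptides: iterable of (protein_id, start, end) for assigned endogenous
    peptides; proteins: dict mapping protein_id to its full-length sequence."""
    known_peptides = list(known_peptides)
    db = set()
    for protein_id, start, end in known_peptides:
        sequence = proteins[protein_id]
        for s, e in target_windows(len(sequence), start, end, min_len, max_len):
            db.add(sequence[s:e + 1])
    # Keep only sequences that differ from every known endogenous peptide.
    known_seqs = {proteins[p][s:e + 1] for p, s, e in known_peptides}
    return db - known_seqs
```

Windows starting before the known peptide are N-terminal extensions, windows ending after it are C-terminal extensions, and windows inside it are shortened variants. Because the database is a set of strings, isoforms whose extended sequences differ automatically yield distinct entries, while windows that share no residue with any known peptide are never generated.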

Description

Peptide assignment method and peptide assignment system
The present invention relates to a peptide assignment method and a peptide assignment system for determining the peptide sequence of an endogenous peptide produced (generated) in vivo.
As general peptide assignment methods for proteins, a method using database search (see, for example, Non-Patent Document 1 below) and a method using de novo sequencing (see, for example, Non-Patent Document 2 below) are known.
In the method using database search, a search method such as Mascot, provided by Matrix Science, is used (see, for example, Non-Patent Document 1 below). Specifically, all candidate peptide fragments that could arise from the protein amino acid sequences recorded in a protein database are enumerated by in silico digestion. The molecular weight of each fragment is then compared with the MS² precursor ion mass, and theoretical product ion masses are calculated for the fragments that match within a predetermined mass tolerance. These theoretical product ion masses are compared with the MS² measurement data to find peptides with a high degree of agreement.
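The database-search workflow in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not Mascot's algorithm: it assumes unmodified peptides, trypsin-like in silico digestion (cleavage after K or R unless the next residue is P), singly charged precursor and fragment ions, b/y product ions only, and a plain count of matched peaks as the score. The function names and tolerance defaults are hypothetical.

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added across the two peptide termini
PROTON = 1.007276   # charge carrier for singly charged ions


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(protein, missed_cleavages=0):
    """In silico digestion with a trypsin-like rule: cut after K/R unless the next residue is P."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of skipped (missed) cleavage sites
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


def by_ions(seq):
    """Theoretical m/z of singly charged b and y ions (fragment lengths 1 .. len-1)."""
    b = [sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b + y


def search(proteins, precursor_mh, peaks, prec_tol=0.02, frag_tol=0.02):
    """Return (matched_peak_count, peptide) pairs, best first, for digest peptides
    whose [M+H]+ lies within prec_tol of the measured MS2 precursor."""
    results = []
    for protein in proteins:
        for pep in tryptic_digest(protein):
            if abs(peptide_mass(pep) + PROTON - precursor_mh) > prec_tol:
                continue  # fails the precursor-mass filter
            theoretical = by_ions(pep)
            matched = sum(any(abs(p - t) <= frag_tol for t in theoretical) for p in peaks)
            results.append((matched, pep))
    return sorted(results, reverse=True)
```

Real engines replace the matched-peak count with probability-based scoring; the point here is only the three-stage structure: enumerate digest peptides, filter by precursor mass, then match product ions.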
In de novo sequencing, by contrast, the amino acid sequence is read from the measurement data without using a database. Specifically, a group of peptide fragments is generated in which amino acid residues are removed one at a time from a terminus of the peptide by some method, and the amino acid sequence is read from the mass differences between the ion peaks derived from these fragments. A typical implementation is the widely known software PEAKS (see, for example, Non-Patent Document 2 below).
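The ladder-reading idea can be illustrated with a minimal Python sketch. It assumes an idealized, complete ladder of singly charged fragment ions from one terminus (for example, a b-ion series) with no missing or noise peaks, and assigns each consecutive mass difference to the nearest standard residue mass within a tolerance; it is not how PEAKS works internally, and `read_ladder` is a hypothetical name.

```python
# Monoisotopic residue masses (Da); L is listed before I, so isobaric gaps read as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder_mz, tol=0.02):
    """Read a sequence from consecutive mass differences in a fragment-ion ladder.

    Each gap is assigned to the residue with the closest mass; '?' marks a gap
    that matches no residue within tol.
    """
    ladder = sorted(ladder_mz)
    residues = []
    for lower, upper in zip(ladder, ladder[1:]):
        gap = upper - lower
        aa, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - gap))
        residues.append(aa if abs(mass - gap) <= tol else "?")
    return "".join(residues)
```

Leucine and isoleucine have identical masses, so this approach cannot tell them apart and reports both as `L`; a ladder built for "GIL" therefore reads back as "GLL".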
In protein analysis, whichever of the above peptide assignment methods is used, the protein is often fragmented into peptides with a digestive enzyme before analysis to make the analysis easier. Fragmenting the protein into smaller pieces in this way promotes ionization during mass spectrometry and improves analysis sensitivity.
In this case, the choice of digestive enzyme also matters for data analysis. In a database search against a protein database, for example, choosing an enzyme that cleaves proteins at specific sites (specific sequences) shrinks the search space compared with assuming random cleavage. As a result, more peptides can be identified while search time and misidentification stay at practical levels. In de novo sequencing, the enzyme can be chosen so that it yields product ions from which the peptide's amino acid sequence is easy to read. For example, when trypsin is used and the ions are fragmented by CID (collision-induced dissociation), y/b-series ions are produced preferentially, which makes sequencing easier.
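A rough count shows why a specific enzyme shrinks the search space. The sketch below compares two candidate counts:

- nonspecific cleavage, where every contiguous subsequence within a length window is a candidate;
- an enzyme that produces a given number of fully cleaved fragments.

The length window, fragment count and missed-cleavage allowance are hypothetical parameters chosen for illustration.

```python
def nonspecific_count(length: int, min_len: int = 7, max_len: int = 50) -> int:
    """Contiguous subsequences with min_len <= len <= max_len residues."""
    return sum(length - k + 1 for k in range(min_len, min(max_len, length) + 1))


def specific_count(n_fragments: int, missed_cleavages: int = 2) -> int:
    """Peptides from an enzyme yielding n_fragments fully cleaved pieces,
    allowing up to `missed_cleavages` uncleaved sites (length limits ignored)."""
    max_pieces = min(missed_cleavages + 1, n_fragments)
    return sum(n_fragments - k + 1 for k in range(1, max_pieces + 1))


print(nonspecific_count(500), specific_count(50))  # 20790 147
```

For a hypothetical 500-residue protein that trypsin cuts into 50 pieces, nonspecific cleavage yields 20,790 candidates against 147 for the specific enzyme.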
Endogenous peptides are peptides produced in vivo. Some are transported through body fluids such as blood as molecules involved in signaling and functional regulation. Others are excreted in urine as metabolites. Analyzing the structure of endogenous peptides can yield information useful for drug development and disease diagnosis. However, the conventional peptide assignment methods described above have been difficult to apply to endogenous peptides.
Specifically, endogenous peptides are produced when proteins are cleaved by in vivo processing or metabolic mechanisms. A single protein gives rise to many different peptide fragments. These may include fragments that share partial sequences with one another and fragments that do not. In proteome analysis, by contrast, a digestive enzyme cleaves the protein at specific sites. The whole protein can therefore be assigned by determining the sequences of only a few fragments that are detected in high abundance (high ionization efficiency).
Endogenous peptides, however, are produced in vivo by a variety of cleavage mechanisms, so even peptides derived from the same protein are produced in different amounts. A more sensitive assignment technique is therefore needed for peptides that are hard to detect because their production or ionization efficiency is low.

Moreover, except for endogenous peptides whose cleavage mechanism is known, the sites at which they are cut from the protein are not known in advance. A conventional database search against a protein database must therefore consider fragments produced by cleavage at every possible site, which enlarges the search space dramatically. Such a large search space not only lengthens the search. It also reduces the number of peptides identified, that is, it lowers identification sensitivity.
In addition, the spatial distribution of amino acids within endogenous peptides is diverse, so their product ion patterns are diverse and complex. In proteome analysis, a digestive enzyme is introduced precisely so that the amino acid distribution of the fragments becomes uniform. Endogenous peptides offer no such design, so de novo sequencing is also difficult for them.
The present invention has been made in view of these circumstances. Its object is to provide a peptide assignment method and a peptide assignment system that can determine the peptide sequences of more endogenous peptides with high sensitivity.
The peptide assignment method according to the present invention includes a database creation step, a mass spectrometry step and a peptide assignment step.

In the database creation step, a target peptide sequence database containing a plurality of target peptide sequences is created. Each target peptide sequence contains one or more residues of the sequence of an endogenous peptide. The sequences are generated from (i) endogenous peptides, among those produced in vivo, whose peptide sequences are known and (ii) the full-length sequences of their precursor proteins.

In the mass spectrometry step, mass spectrometry is performed on a peptide sample.

In the peptide assignment step, the peptide sequences of endogenous peptides contained in the peptide sample are determined. This determination is based on the target peptide sequences created in the database creation step and the mass spectra obtained in the mass spectrometry step.
With this configuration, a database is generated from known endogenous peptides and the full-length sequences of their precursor proteins. Each peptide sequence in it contains one or more residues of a known endogenous peptide's sequence. Such sequences (target peptide sequences) share part of their sequence with an endogenous peptide of known sequence. They may belong to unknown endogenous peptides whose mass spectra conventional methods leave unassigned.
Generating a database of target peptide sequences (the target peptide sequence database) therefore effectively prevents the search space from growing. The peptide sample is then searched preferentially for target peptide sequences, using this database and the mass spectra obtained by mass spectrometry of the sample. In this way, the peptide sequences of more endogenous peptides can be determined with high sensitivity.
The peptide assignment system according to the present invention includes a database creation unit, a mass spectrometry unit and a peptide assignment unit.

The database creation unit creates a target peptide sequence database containing a plurality of target peptide sequences. Each target peptide sequence contains one or more residues of the sequence of an endogenous peptide. The sequences are generated from endogenous peptides, among those produced in vivo, whose peptide sequences are known and from the full-length sequences of their precursor proteins.

The mass spectrometry unit performs mass spectrometry on a peptide sample.

The peptide assignment unit determines the peptide sequences of endogenous peptides contained in the peptide sample. It bases this on the target peptide sequences created by the database creation unit and the mass spectra obtained by the mass spectrometry unit.
According to the present invention, generating a target peptide sequence database effectively prevents the search space from growing. The peptide sequences of more endogenous peptides can then be determined with high sensitivity, based on that database and the mass spectra obtained by mass spectrometry of a peptide sample.
FIG. 1 is a block diagram showing an example configuration of the peptide assignment system according to the first embodiment of the present invention.
FIG. 2 illustrates how the database creation unit generates target peptide sequences.
FIG. 3 is a flowchart showing the flow of processing by the database creation unit.
FIG. 4 is a flowchart showing the flow of processing by the mass spectrometry unit and the peak list creation unit.
FIG. 5 is a flowchart showing the flow of processing by the peptide assignment unit.
FIG. 6 shows the results of actually analyzing MS² spectra obtained from a urine sample.
FIG. 7 is a block diagram showing an example configuration of the peptide assignment system according to the second embodiment of the present invention.
<First Embodiment>
1. Configuration of the Peptide Assignment System According to the First Embodiment
FIG. 1 is a block diagram showing an example configuration of a peptide assignment system 1 according to the first embodiment of the present invention.
Peptide assignment system 1 determines, from a sample to be analyzed (peptide sample), the peptide sequences of endogenous peptides produced in vivo. Its components include a database creation unit 11, a mass spectrometry unit 12, a peak list creation unit 13 and a peptide assignment unit 14. At least part of each of units 11 to 14 is implemented by an information processing apparatus that includes a CPU (central processing unit).
Peptide assignment system 1 determines the peptide sequences of endogenous peptides in a peptide sample using the peptide sequences stored in an endogenous peptide sequence database 2. This database holds the peptide sequences of a plurality of endogenous peptides whose sequences are known.

Here, "the peptide sequence is known" covers two cases. In the first, the sequence is recorded in a public endogenous peptide sequence database or in the literature. In the second, it has been assigned with high confidence by a conventional analysis method (including manual analysis).
Examples of public endogenous peptide sequence databases and literature include:

- Mosaiques DB, a sequence database of endogenous peptides found in urine (http://mosaiques-diagnostics.de/diapatpcms/mosaiquescms/front_content.php?idcat=257; Siwy et al., "Human urinary peptide database for multiple disease biomarker discovery", Proteomics Clin. Appl., 2011, 5, 367-374);
- literature not compiled into a database, such as Smith et al., "Deciphering the peptidome of urine from ovarian cancer patients and healthy controls", Clin. Proteomics, 2014, 11(1):23.
Database creation unit 11 creates a target peptide sequence database 111 containing a plurality of target peptide sequences (database creation step). It does so by generating, as target peptide sequences, peptide sequences that differ from the known endogenous peptides. Two inputs are used:

- the peptide sequences of the endogenous peptides stored in endogenous peptide sequence database 2;
- the full-length sequences of their precursor proteins, stored in a protein sequence database 3.

Protein sequence database 3 is a database of full-length protein sequences. It is consulted when a sequence is extended beyond the region registered in endogenous peptide sequence database 2. Each target peptide sequence generated here contains one or more residues of the sequence of an endogenous peptide stored in endogenous peptide sequence database 2. In other words, each target peptide sequence shares part (or all) of its sequence with an endogenous peptide of known sequence.
Mass spectrometry unit 12 performs mass spectrometry on the peptide sample (mass spectrometry step). The method is not particularly limited; an ion trap time-of-flight mass spectrometer (IT-TOF MS) can be used, for example. In that case, the sample is analyzed with an IT-TOF MS that includes, for example, an ionization section, an ion trap and a TOF MS (none shown).
The peptide sample is ionized in the ionization section, and the resulting ions are captured in the ion trap. A three-dimensional quadrupole ion trap can be used, for example, although the ion trap is not limited to this type. Some of the trapped ions are selectively retained in the ion trap and fragmented by CID (collision-induced dissociation). The fragment ions are then sent from the ion trap to the TOF MS (time-of-flight mass analyzer).
In the TOF MS, a detector detects the ions that have traveled through the flight space. Ions accelerated by an electric field separate in time according to their m/z (mass-to-charge ratio) as they travel through the flight space, and the detector records them one after another. The relationship between m/z and detected intensity is thereby measured as a mass spectrum.

The instrument is not limited to an IT-TOF MS. Mass spectrometry may also be performed with, for example:

- a tandem time-of-flight mass spectrometer without an ion trap (tandem TOF, or TOF-TOF MS);
- a quadrupole time-of-flight mass spectrometer (Q-TOF MS);
- a quadrupole ion trap mass spectrometer (Qq-IT MS) or another hybrid instrument.

Likewise, the fragmentation method is not limited to CID. Other methods such as ETD (electron transfer dissociation) or ECD (electron capture dissociation) may be used.
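For an idealized linear TOF analyzer, the time-to-m/z relationship can be written down directly. An ion of charge ze accelerated through a potential U gains kinetic energy zeU = mv²/2. It then drifts a field-free length L in time t = L/v, so m/z = 2eUt²/L². The sketch below assumes exactly this idealized geometry, with no time spent in the acceleration region and no initial energy spread. Real instruments, including reflectron designs, are calibrated empirically. The function names and example parameters are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def mz_from_flight_time(t_s: float, drift_len_m: float, accel_v: float) -> float:
    """Idealized linear TOF: z*e*U = m*v**2/2 and v = L/t give
    m/z = 2*e*U*t**2 / L**2, returned in Da per elementary charge."""
    return 2 * E_CHARGE * accel_v * t_s**2 / (drift_len_m**2 * DALTON)


def flight_time_from_mz(mz: float, drift_len_m: float, accel_v: float) -> float:
    """Inverse of mz_from_flight_time: flight time in seconds."""
    return drift_len_m * math.sqrt(mz * DALTON / (2 * E_CHARGE * accel_v))


t = flight_time_from_mz(1000.0, drift_len_m=1.0, accel_v=1.0e4)
print(f"{t * 1e6:.1f} us")
```

With L = 1 m and U = 10 kV, an ion at m/z 1000 arrives after about 22.8 µs. Because m/z scales with t², doubling the flight time corresponds to a fourfold higher m/z.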
With an IT-TOF MS, a cycle of fragmenting ions in the ion trap and analyzing them in the TOF MS can be repeated. This enables MSⁿ analysis (where n is an integer of 2 or more) and the measurement of MSⁿ spectra.
From the MSⁿ spectra obtained by mass spectrometry unit 12, peak list creation unit 13 creates a peak list (MSⁿ peak list) of the peaks extracted from each spectrum.
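The text does not specify how peaks are extracted. The sketch below assumes a deliberately naive approach: a local-maximum picker with a threshold relative to the base peak. Practical tools add smoothing, centroiding and isotope handling. `pick_peaks` and `rel_threshold` are hypothetical.

```python
def pick_peaks(mz: list[float], intensity: list[float],
               rel_threshold: float = 0.01) -> list[tuple[float, float]]:
    """Return (m/z, intensity) for local maxima at or above
    rel_threshold * base-peak intensity. Profile data sorted by m/z assumed."""
    if not intensity:
        return []
    floor = rel_threshold * max(intensity)
    peaks = []
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        # Higher than the left neighbour and not lower than the right one,
        # so a flat top contributes a single point.
        if y >= floor and y > intensity[i - 1] and y >= intensity[i + 1]:
            peaks.append((mz[i], y))
    return peaks


print(pick_peaks([100.0, 100.1, 100.2, 100.3, 100.4, 100.5, 100.6],
                 [0, 10, 3, 0, 50, 20, 0], rel_threshold=0.1))
```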
Peptide assignment unit 14 determines the peptide sequences of endogenous peptides contained in the peptide sample (peptide assignment step). It works from the target peptide sequences stored in target peptide sequence database 111 and the peak lists created by peak list creation unit 13. Its functional units include a sequence estimation unit 141 and a product ion matching unit 142, implemented, for example, by a CPU executing a program.
For each MS² precursor ion in the MS¹ peak list, for example, sequence estimation unit 141 searches target peptide sequence database 111. It looks for peptide sequences whose mass matches the MS² precursor ion mass within a predetermined mass tolerance. The sequences it finds are candidates (peptide sequence candidates) for the peptide sequences of endogenous peptides in the sample.
Product ion matching unit 142 scores the peptide sequence candidates obtained by sequence estimation unit 141. When enough candidates are available, statistically significant candidates can be identified from the distribution of their scores. Those sequences can then be assigned as the peptide sequences of endogenous peptides in the sample.
2. Processing by the Database Creation Unit
FIG. 2 illustrates how database creation unit 11 generates target peptide sequences. FIG. 3 is a flowchart showing the flow of processing by database creation unit 11.
FIG. 2 shows the case where an endogenous peptide P of known peptide sequence lies within a protein of known full-length sequence (the assigned protein). In this example, the peptide sequence of endogenous peptide P is stored in endogenous peptide sequence database 2. Preferably, each endogenous peptide P stored there is assigned to a protein. The full-length sequence of that protein should be available, together with the start and end residues of endogenous peptide P within it.
Database creation unit 11 first reads endogenous peptide sequence database 2 (step S101). It then generates target peptide sequences from the peptide sequence of each endogenous peptide P read (step S102). Each target peptide sequence is produced by extending or truncating the sequence of endogenous peptide P while keeping at least one of its residues. The extension follows the full-length sequence of the assigned protein that contains endogenous peptide P.
Database creation unit 11 stores the generated target peptide sequences in target peptide sequence database 111 (step S103). Steps S102 and S103 are performed for every endogenous peptide P stored in endogenous peptide sequence database 2. Once all of them have been processed (Yes in step S104), every target peptide sequence variant has been stored in target peptide sequence database 111.
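Steps S101 to S104 can be sketched as below. The sketch assumes the preferred case described for FIG. 2: each known peptide comes with its precursor's full-length sequence and its start and end positions. A window is kept if it shares at least one residue with the known peptide. The 7 to 80 residue limits mirror the Example but are parameters here.

`KnownPeptide`, `target_windows` and `build_target_db` are hypothetical names. Only sequence strings are stored; a practical database would also record the source protein and position.

```python
from typing import Iterable, NamedTuple


class KnownPeptide(NamedTuple):
    protein: str  # full-length precursor protein sequence
    start: int    # 0-based index of the peptide's first residue
    end: int      # 0-based index one past the peptide's last residue


def target_windows(kp: KnownPeptide, min_len: int = 7,
                   max_len: int = 80) -> Iterable[str]:
    """Step S102: yield every subsequence of kp.protein with min_len..max_len
    residues that shares at least one residue with the known peptide."""
    n = len(kp.protein)
    for a in range(kp.end):                   # window starts before the peptide ends
        lo = max(a + min_len, kp.start + 1)   # ...and ends after the peptide starts
        hi = min(a + max_len, n)
        for b in range(lo, hi + 1):
            yield kp.protein[a:b]


def build_target_db(known: Iterable[KnownPeptide], min_len: int = 7,
                    max_len: int = 80) -> set[str]:
    """Steps S101-S104: loop over all known peptides and pool unique targets."""
    db: set[str] = set()
    for kp in known:
        db.update(target_windows(kp, min_len, max_len))  # step S103
    return db
```

Pooling into a set merges identical strings produced from different seeds. Isoforms with different extended regions yield different strings, so they survive as separate variants.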
FIG. 2 shows four ways of deriving target peptides from the sequence of endogenous peptide P:

- target peptide P1: the N-terminal side is extended and the C-terminal side is truncated;
- target peptide P2: the C-terminal side is extended and the N-terminal side is truncated;
- target peptide P3: both the N-terminal and C-terminal sides are truncated;
- target peptide P4: both sides are extended.

However, sequences that share no residues with endogenous peptide P, indicated by P5 and P6 in FIG. 2, are not generated as target peptide sequences. The search space is therefore smaller than when peptide sequences are generated by nonspecifically cleaving the full-length sequence of the protein to which the known peptide is assigned.

The assigned protein may have isoforms whose sequences agree over the registered region but differ in the extended region. In that case, the resulting sequences are generated as distinct target peptide sequence variants and stored in target peptide sequence database 111.
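For a single known peptide, the saving described above can be quantified by counting windows rather than generating them. The sketch compares two counts:

- all windows of a protein within a length range (nonspecific cleavage);
- only the windows that overlap the known peptide.

The protein length, peptide position and length limits are hypothetical.

```python
def count_nonspecific(n: int, min_len: int, max_len: int) -> int:
    """All windows of min_len..max_len residues in a protein of length n."""
    return sum(max(0, n - k + 1) for k in range(min_len, max_len + 1))


def count_overlapping(n: int, start: int, end: int,
                      min_len: int, max_len: int) -> int:
    """Windows sharing >= 1 residue with the known peptide [start, end)."""
    return sum(
        max(0, min(a + max_len, n) - max(a + min_len, start + 1) + 1)
        for a in range(end)  # only windows starting before the peptide ends
    )


print(count_nonspecific(300, 7, 80), count_overlapping(300, 100, 120, 7, 80))
```

The example assumes a 300-residue protein with a known peptide at 0-based positions 100 to 119. Nonspecific cleavage gives 19,055 windows of 7 to 80 residues, of which only 4,625 overlap the known peptide. These are position counts; deduplicating identical strings across seeds would shrink the database further.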
3. Processing by the Mass Spectrometry Unit and the Peak List Creation Unit
FIG. 4 is a flowchart showing the flow of processing by mass spectrometry unit 12 and peak list creation unit 13.
Mass spectrometry unit 12 ionizes a peptide sample containing endogenous peptides and measures an MS¹ spectrum by mass-analyzing the ions (step S201). Peak list creation unit 13 then creates an MS¹ peak list by extracting peaks from the measured MS¹ spectrum (step S202).
Next, mass spectrometry unit 12 selects, by a predetermined method, a plurality of MS² precursor ions from the MS¹ peak list as targets for MS² measurement (step S203). It fragments each precursor ion and mass-analyzes the products to measure its MS² spectrum (step S204). Step S204 is performed for every MS² precursor ion. When all have been processed (Yes in step S205), peak list creation unit 13 creates MS² peak lists by extracting peaks from the measured MS² spectra (step S206).
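The text leaves the selection method in step S203 open. One common choice in data-dependent acquisition is top-N selection by intensity. A small exclusion window keeps nearby peaks, such as the first isotope peak of an ion already chosen, from being picked again. The sketch below assumes that strategy; `select_precursors`, `top_n` and `exclusion_da` are hypothetical.

```python
def select_precursors(ms1_peaks: list[tuple[float, float]], top_n: int = 5,
                      exclusion_da: float = 1.5) -> list[float]:
    """Pick up to top_n precursor m/z values in descending intensity order,
    skipping any peak within exclusion_da of one already chosen."""
    chosen: list[float] = []
    for mz, _intensity in sorted(ms1_peaks, key=lambda p: p[1], reverse=True):
        if all(abs(mz - c) > exclusion_da for c in chosen):
            chosen.append(mz)
            if len(chosen) == top_n:
                break
    return chosen


peaks = [(500.0, 100.0), (501.0, 60.0), (800.0, 90.0), (650.0, 10.0)]
print(select_precursors(peaks, top_n=3))  # [500.0, 800.0, 650.0]
```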
4. Processing by the Peptide Assignment Unit
FIG. 5 is a flowchart showing the flow of processing by peptide assignment unit 14.
For each MS² precursor ion in the MS¹ peak list, sequence estimation unit 141 searches target peptide sequence database 111. It looks for peptide sequences whose mass matches the MS² precursor ion mass within a predetermined mass tolerance (step S301). If one or more matching sequences (peptide sequence candidates) are found (Yes in step S302), the candidates are scored (step S303).
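Step S301 can be sketched with a mass-sorted index and binary search. The sketch makes these assumptions:

- peptides are unmodified and masses are monoisotopic;
- precursors are protonated, [M+zH]^z+;
- the tolerance is expressed in ppm, with a default of 10 ppm.

The default and all function names are hypothetical.

```python
import bisect

PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # H2O, Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def build_mass_index(seqs: list[str]) -> tuple[list[float], list[str]]:
    """Sort target sequences by neutral mass for binary search."""
    pairs = sorted((peptide_mass(s), s) for s in seqs)
    return [m for m, _ in pairs], [s for _, s in pairs]


def candidates(mz: float, z: int, masses: list[float], seqs: list[str],
               tol_ppm: float = 10.0) -> list[str]:
    """Sequences whose neutral mass matches the precursor within tol_ppm."""
    neutral = (mz - PROTON) * z                # [M+zH]^z+ -> M
    delta = neutral * tol_ppm * 1e-6
    lo = bisect.bisect_left(masses, neutral - delta)
    hi = bisect.bisect_right(masses, neutral + delta)
    return seqs[lo:hi]
```

I and L have identical masses, so isobaric candidates such as PEPTIDE and PEPTLDE are both returned. They must be told apart, if at all, at the product ion stage.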
To score the candidates, the theoretical masses of each candidate's principal product ions (for example, y/b-series ions) are calculated. Each product ion in the MS² peak list is then compared with these masses, and candidates whose theoretical product ion masses match within a predetermined mass tolerance are identified. Principal product ions are those produced by cleavage at sites known in advance to cleave readily. Because those sites are known, the theoretical product ion masses can be calculated.
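The theoretical y/b-ion calculation and the matching against a peak list might look like this. The sketch assumes singly charged, unmodified ions:

- b_i is the sum of the first i residues plus a proton;
- y_i is the sum of the last i residues plus water and a proton.

It also assumes an absolute tolerance in Da. `by_ions` and `match_peaks` are hypothetical names.

```python
PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # H2O, Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def by_ions(seq: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    res = [RESIDUE_MASS[aa] for aa in seq]
    b, y = [], []
    for i in range(1, len(seq)):
        b.append(sum(res[:i]) + PROTON)           # N-terminal fragment
        y.append(sum(res[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b, y


def match_peaks(theoretical: list[float], peaks: list[tuple[float, float]],
                tol: float = 0.02) -> list[tuple[float, float]]:
    """(theoretical m/z, observed intensity) for each theoretical ion with an
    observed peak within tol Da; the most intense peak wins if several."""
    hits = []
    for t in theoretical:
        near = [inten for mz, inten in peaks if abs(mz - t) <= tol]
        if near:
            hits.append((t, max(near)))
    return hits


b, y = by_ions("PEPTIDE")
print(round(b[1], 4), round(y[0], 4))  # 227.1026 148.0604
```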
Each candidate found is scored using, for example, the intensities and number of matched peaks. Any of the scoring methods used in database searches against protein databases can be adopted.
Steps S301 to S303 are performed for every MS² precursor ion. When all have been processed (Yes in step S304), the candidates are narrowed down based on their scores (step S305). The candidates are narrowed to a single sequence based on a significant difference in scores, and that sequence is output as the analysis result (step S306). A statistical index may be impossible to compute, for example because there are too few candidates. In that case, processing may stop at ranking the candidates by score, and the final selection is left to the user.
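Steps S305 and S306 can be sketched with the score-gap criterion mentioned in the Example below: the top-ranked candidate is accepted only if it beats the runner-up by at least a threshold. Otherwise the ranking is returned for the user to review. The threshold value, the choice to defer when there is only one candidate, and the function name are assumptions.

```python
from typing import Optional


def narrow(cands: dict[str, float],
           min_gap: float) -> tuple[Optional[str], list[tuple[str, float]]]:
    """Rank candidates by score (step S305). Return (winner, ranking); winner
    is None when there is no runner-up to compare against or the top score
    does not exceed the runner-up by at least min_gap."""
    ranking = sorted(cands.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranking) >= 2 and ranking[0][1] - ranking[1][1] >= min_gap:
        return ranking[0][0], ranking   # unique assignment (step S306)
    return None, ranking                # leave the final choice to the user


print(narrow({"PEPTIDE": 10.0, "PEPTLDE": 3.0, "GASPK": 2.0}, min_gap=5.0)[0])
```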
5. Effects
In this embodiment, target peptide sequence database 111 is generated from the endogenous peptides whose sequences are known. Each peptide sequence in it contains one or more residues of a known endogenous peptide's sequence. Such sequences (target peptide sequences) share part of their sequence with an endogenous peptide of known sequence. They may belong to unknown endogenous peptides whose mass spectra conventional methods leave unassigned.
Generating the database of target peptide sequences (target peptide sequence database 111) therefore effectively prevents the search space from growing. The peptide sample is then searched preferentially for target peptide sequences. This search uses target peptide sequence database 111 and the mass spectra obtained by mass spectrometry unit 12. In this way, the peptide sequences of more endogenous peptides can be determined with high sensitivity.
6. Example
A target peptide sequence database was created from 944 peptide sequences. These comprised endogenous peptides recorded in the Mosaiques DB described above and endogenous peptides assigned from measurement data. From them, 944,390 target peptide sequence variants 7 to 80 residues long were generated.
The mass spectrometry unit measured a urine sample, yielding MS² and MS³ data on about 38 peaks (precursor ions at m/z 793 to 2943; 70 spectra in total). The sequence estimation unit estimated peptide sequences from these data.

For 35 peaks, candidates were obtained that matched target peptide sequences in the database within the mass tolerance. Peaks whose precursor ion masses overlapped within the mass tolerance but whose peptide sequences differed were excluded from this count. The 35 peaks yielded about 50 candidates each on average, slightly over 1,800 in total.
For each peptide sequence candidate estimated by the sequence estimation unit, the theoretical masses of the b- and y-series ions were calculated. These were matched as product ions against the 70 MSⁿ spectra (n = 2 or 3) obtained from the peaks being analyzed. Scores were then assigned as described below, using a scoring method similar to that of the well-known search engine X!Tandem. The scoring method is not limited to this example; any of the various methods used in conventional database search approaches may be adopted.
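Formulas (1) and (2) are not reproduced in the text. The Python sketch below therefore uses the published X!Tandem-style hyperscore as a stand-in: the summed intensity of matched peaks, here divided by TIC, multiplied by n_b! × n_y!. Treat it as an assumption, not as the patent's exact score. The sketch also makes these assumptions:

- only singly charged b and y ions are generated;
- a fixed tolerance in Da is used;
- each observed peak's intensity is counted once, however many theoretical ions hit it.

`fragment_ions`, `match_ions` and `hyperscore` are hypothetical names.

```python
import math
from typing import List, Sequence, Set, Tuple

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
Peak = Tuple[float, float]  # (m/z, intensity)


def fragment_ions(seq: str) -> Tuple[List[float], List[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ion m/z values."""
    masses = [MONO[aa] for aa in seq]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(seq))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b, y


def match_ions(theoretical: Sequence[float], peaks: Sequence[Peak],
               tol: float) -> Tuple[Set[int], int]:
    """Indices of observed peaks hit by any theoretical ion, and how many ions hit."""
    hit_peaks: Set[int] = set()
    n_ions = 0
    for t in theoretical:
        hits = [k for k, (mz, _) in enumerate(peaks) if abs(mz - t) <= tol]
        if hits:
            n_ions += 1
            hit_peaks.update(hits)
    return hit_peaks, n_ions


def hyperscore(seq: str, peaks: Sequence[Peak], tol: float = 0.5) -> float:
    """X!Tandem-like score: (matched intensity / TIC) * n_b! * n_y!."""
    b, y = fragment_ions(seq)
    hits_b, n_b = match_ions(b, peaks, tol)
    hits_y, n_y = match_ions(y, peaks, tol)
    tic = sum(intensity for _, intensity in peaks)
    if tic == 0:
        return 0.0
    matched_intensity = sum(peaks[k][1] for k in hits_b | hits_y)
    return matched_intensity / tic * math.factorial(n_b) * math.factorial(n_y)
```

The factorial terms reward candidates that explain long, consecutive ion ladders. In practice such scores are often compared on a log scale.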
Scoring was performed using formulas (1) and (2) below.
[Formulas (1) and (2): equation image JPOXMLDOC01-appb-M000001, not reproduced in text]
Here, the terms in formulas (1) and (2) are defined as follows:

- Score is the score actually calculated from a peptide sequence candidate and the measurement data.
- I_i is the intensity of a peak matched in the comparison.
- N is the total number of matched peaks.
- TIC is the total ion current of the MS² spectrum being searched.
- n_b and n_y are the numbers of b-series and y-series ions, respectively, matched in the product ion comparison, so that N = n_b + n_y.

Based on the score distribution of the peptide sequence candidates, an index and a threshold can be set for selecting statistically significant sequences from among the candidates. For example, a discrimination threshold can be set using, as the index, a significance probability (p-value) or an expected value (E-value) calculated from the score distribution. The index for judging significance is not limited to these, however. In this example, discrimination by E-value could also be replaced (reproduced) by a method that uses the score difference between the top-ranked candidate and the lower-ranked candidates as the threshold.
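The score-difference criterion above can be sketched in a few lines of Python. In the reported results, a gap of 10 or more between the top-ranked and second-ranked candidates was treated as significant, so 10 is used as the default here. The function name `select_top_candidate` is hypothetical. Treating a lone candidate's missing runner-up as a score of 0 is an assumption; the text does not specify that case.

```python
from typing import List, Optional, Tuple


def select_top_candidate(scored: List[Tuple[str, float]],
                         min_gap: float = 10.0) -> Optional[str]:
    """Return the top-scoring sequence if it beats the runner-up by >= min_gap.

    scored: (peptide_sequence, score) pairs for one spectrum.
    Returns None when there are no candidates or the top hit is not separated.
    """
    if not scored:
        return None
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best_seq, best_score = ranked[0]
    # Assumption: with a single candidate, the runner-up score is taken as 0.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_seq if best_score - runner_up >= min_gap else None
```

A gap threshold needs no model of the score distribution. In return, it is only meaningful on the scale of the particular score used, whereas an E-value is comparable across spectra.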
FIG. 6 shows the results of actually analyzing the MS² spectra obtained from the urine sample. Its columns are as follows:

- “UniProt Accession” is the protein ID in UniProt, a protein database.
- “UniProt Name” is the name of the protein registered in UniProt.
- “Start” and “End” give the positions of the first and last residues of the peptide within the UniProt registered sequence.
- “Sequence” is the amino acid sequence of the assigned urinary peptide.
- “Precursor Ion Mass” is the mass-to-charge ratio of the singly charged peptide ion observed by mass spectrometry.
For evaluation, 16 high-quality MS² spectra obtained from the urine sample were searched with Mascot and X!Tandem, conventional protein database search methods. Their identification thresholds were set to 1.0 and 0.1, respectively. These values are far more lenient than those used in proteome analysis, so false-positive hits were tolerated. Even so, only five peptide sequences (A in FIG. 6) were identified.
In contrast, the analysis according to the present invention estimated peptide sequence candidates from all 16 high-quality MS² spectra. These covered not only the above five peptide sequences (A in FIG. 6) but also the remaining 11 peptide sequences (B in FIG. 6). Scoring by the product ion matching unit showed, for every spectrum, a score difference of 10 or more between the top-ranked candidate and the second-ranked and lower candidates. The top-ranked candidate was therefore judged to be a significant estimation result. Visual verification also confirmed that every estimation result was valid.
<Second Embodiment>
FIG. 7 is a block diagram showing a configuration example of the peptide assignment system 100 according to the second embodiment of the present invention.
In the first embodiment, the product ion matching unit 142 calculates theoretical product ion masses from the peptide sequence candidates when matching the main product ions. In the second embodiment, by contrast, the product ion matching unit 142 uses MSⁿ measurement data (reference data) of endogenous peptides whose sequences are stored in the endogenous peptide sequence database 2. This is the same database from which the peptide sequence candidates were generated. The unit calculates the similarity between this reference data and the MSⁿ measurement data being analyzed (query data). All other components are the same as in the first embodiment. They are given the same reference signs in the drawings and their description is omitted.
The peptide assignment system 100 includes an endogenous peptide spectrum library 21. For each endogenous peptide whose sequence is stored in the endogenous peptide sequence database 2, the library stores the MSⁿ spectrum obtained by mass spectrometry of that peptide. The product ion matching unit 142 uses these stored spectra to calculate their similarity to the MSⁿ spectrum measured by the mass spectrometry unit 12. Let Δm be the precursor ion mass of the reference data minus the precursor ion mass of the query data. When Δm is larger than the mass tolerance, the similarity calculation also uses the result of matching against peaks obtained by subtracting Δm from the product ion masses of the query data. In addition, Δm may be larger than the mass Δn of a sequence (one or more amino acids) at either terminus of the query peptide sequence. In that case, the result of matching against peaks obtained by subtracting Δn from the query product ion masses may also be used in calculating the similarity.
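The rule above decides which mass offsets are tried in addition to a direct comparison. It can be sketched as a small Python helper. The sketch makes these assumptions:

- masses are singly protonated;
- standard monoisotopic residue masses are used;
- "either terminus" means every N-terminal prefix and C-terminal suffix of one or more residues, short of the full sequence.

`candidate_shifts` is a hypothetical name. The sign convention (subtracting the offset from the query product ion masses) follows the wording of the text.

```python
from typing import List

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_shifts(query_seq: str, query_mh: float, library_mh: float,
                     tol: float) -> List[float]:
    """Offsets to subtract from query product-ion masses before matching.

    0.0 (direct match) is always included. When dm = library_mh - query_mh
    exceeds the tolerance, dm itself is added, plus the mass dn of every
    terminal stretch (one or more residues) of the query sequence with dn < dm.
    """
    shifts = [0.0]
    delta_m = library_mh - query_mh
    if delta_m > tol:
        shifts.append(delta_m)
        masses = [MONO[aa] for aa in query_seq]
        for k in range(1, len(masses)):
            # N-terminal prefix and C-terminal suffix of length k.
            for delta_n in (sum(masses[:k]), sum(masses[-k:])):
                if delta_m > delta_n:
                    shifts.append(delta_n)
    return sorted(set(shifts))
```

The returned offsets can be passed to whatever similarity function compares the query against a library spectrum.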
Various methods used in known spectral library search approaches can be used to calculate the similarity (e.g., Stein, S. E. & Scott, D. R.: Optimization and Testing of Mass Spectral Library Search Algorithms for Compound Identification. JASMS, 5, 859-866 (1994)). For example, the query data may be compared with the reference data, and the normalized product of the peak intensities of ion peaks that match within the mass tolerance may be used as the similarity.
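The normalized product of matched peak intensities described above is essentially a normalized dot product (cosine similarity) between two peak lists. The Python sketch below is a minimal version under stated assumptions: peaks are (m/z, intensity) pairs, the tolerance is in Da, and optional offsets are subtracted from the query m/z values, following the text's convention. For each library peak, only the best-matching query peak over all offsets contributes. `shifted_dot_similarity` is a hypothetical name. The sketch is not the weighting scheme evaluated by Stein and Scott.

```python
import math
from typing import Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def shifted_dot_similarity(query: Sequence[Peak], library: Sequence[Peak],
                           tol: float, shifts: Sequence[float] = (0.0,)) -> float:
    """Normalized dot product of intensities over peaks matched within tol.

    Each offset in `shifts` is subtracted from the query m/z values before
    comparison; each library peak keeps its best intensity product.
    """
    norm = (math.sqrt(sum(i * i for _, i in query))
            * math.sqrt(sum(i * i for _, i in library)))
    if norm == 0:
        return 0.0
    total = 0.0
    for lib_mz, lib_int in library:
        best = 0.0
        for shift in shifts:
            for q_mz, q_int in query:
                if abs((q_mz - shift) - lib_mz) <= tol:
                    best = max(best, q_int * lib_int)
        total += best
    return total / norm
```

With `shifts=(0.0,)` this reduces to an ordinary tolerance-matched cosine score. Adding offsets lets a truncated endogenous peptide still match the fragments that its library counterpart shares with it.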
Endogenous peptides are sometimes cleaved at unpredictable sites. A configuration that calculates theoretical product ion masses, as in the first embodiment, may therefore fail to score such peptides as the theory assumes. The second embodiment instead uses the actual MSⁿ measurement data of the endogenous peptides whose sequences are stored in the endogenous peptide sequence database 2. As a result, peptide sequences can in some cases be determined with higher sensitivity.
DESCRIPTION OF SYMBOLS
    1  Peptide assignment system
    2  Endogenous peptide sequence database
   11  Database creation unit
   12  Mass spectrometry unit
   13  Peak list creation unit
   14  Peptide assignment unit
   21  Endogenous peptide spectrum library
  100  Peptide assignment system
  111  Target peptide sequence database
  141  Sequence estimation unit
  142  Product ion matching unit

Claims (2)

  1.  A peptide assignment method comprising:
     a database creation step of creating a target peptide sequence database containing a plurality of target peptide sequences by generating, as target peptide sequences, peptide sequences each containing one or more residues of a partial sequence of an endogenous peptide, based on an endogenous peptide of known peptide sequence among endogenous peptides produced in vivo and on the full-length sequence of the precursor protein of that endogenous peptide;
     a mass spectrometry step of performing mass spectrometry on a peptide sample; and
     a peptide assignment step of determining the peptide sequence of an endogenous peptide contained in the peptide sample based on the plurality of target peptide sequences created in the database creation step and the mass spectrum obtained in the mass spectrometry step.
  2.  A peptide assignment system comprising:
     a database creation unit that creates a target peptide sequence database containing a plurality of target peptide sequences by generating, as target peptide sequences, peptide sequences each containing one or more residues of a partial sequence of an endogenous peptide, based on an endogenous peptide of known peptide sequence among endogenous peptides produced in vivo and on the full-length sequence of the precursor protein of that endogenous peptide;
     a mass spectrometry unit that performs mass spectrometry on a peptide sample; and
     a peptide assignment unit that determines the peptide sequence of an endogenous peptide contained in the peptide sample based on the plurality of target peptide sequences created by the database creation unit and the mass spectrum obtained by the mass spectrometry unit.
PCT/JP2016/076963 2015-09-14 2016-09-13 Peptide assignment method and peptide assignment system WO2017047580A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/759,659 US20190041393A1 (en) 2015-09-14 2016-09-13 Peptide assignment method and peptide assignment system
JP2017539911A JP6489224B2 (en) 2015-09-14 2016-09-13 Peptide assignment method and peptide assignment system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015-181031 2015-09-14
JP2015181031 2015-09-14

Publications (1)

Publication Number Publication Date
WO2017047580A1 true WO2017047580A1 (en) 2017-03-23

Family

ID=58288676

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/076963 WO2017047580A1 (en) 2015-09-14 2016-09-13 Peptide assignment method and peptide assignment system

Country Status (3)

Country Link
US (1) US20190041393A1 (en)
JP (1) JP6489224B2 (en)
WO (1) WO2017047580A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005283430A (en) * 2004-03-30 2005-10-13 Shimadzu Corp Method for analyzing structure of biological sample
JP2009092411A (en) * 2007-10-04 2009-04-30 Nec Corp Peptide identification method
JP2013047624A (en) * 2011-08-29 2013-03-07 Shimadzu Corp Modified protein identification method using mass analysis and identification apparatus
JP2015021739A (en) * 2013-07-16 2015-02-02 国立大学法人 熊本大学 Creation method of database for identification/determination of peptide peak in mass analysis
JP2015049056A (en) * 2013-08-30 2015-03-16 株式会社島津製作所 Mass analysis data analyzer and analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAZUKI SASAKI ET AL.: "Proteome Kaisei Kaiseki to Metabolome Kaiseki 4. Peptidome Kaiseki no Genjo to Tenbo", EXPERIMENTAL MEDICINE, vol. 23/4, 2005, pages 585 - 592 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021510829A (en) * 2018-02-26 2021-04-30 レコ コーポレイションLeco Corporation A method for classifying library hits in mass spectrometry
JP7108697B2 (en) 2018-02-26 2022-07-28 レコ コーポレイション Methods for Ranking Candidate Analytes
JP2019200183A (en) * 2018-05-18 2019-11-21 株式会社島津製作所 Method for creating spectra library for endogenous peptide identification, endogenous peptide identification method and endogenous peptide identification device

Also Published As

Publication number Publication date
US20190041393A1 (en) 2019-02-07
JPWO2017047580A1 (en) 2018-05-31
JP6489224B2 (en) 2019-03-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16846451

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017539911

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16846451

Country of ref document: EP

Kind code of ref document: A1