CN110706747B - Method and device for detecting tumor neoantigen polypeptide - Google Patents

Info

Publication number
CN110706747B
Authority
CN
China
Prior art keywords
polypeptide
mutation
neoantigen
hla
tumor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910878145.4A
Other languages
Chinese (zh)
Other versions
CN110706747A (en)
Inventor
周涛
陈利斌
郭璟
楼峰
曹善柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiangxin Medical Technology Co ltd
Tianjin Xiangxin Biotechnology Co ltd
Beijing Xiangxin Biotechnology Co ltd
Original Assignee
Beijing Xiangxin Medical Technology Co ltd
Tianjin Xiangxin Biotechnology Co ltd
Beijing Xiangxin Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiangxin Medical Technology Co ltd, Tianjin Xiangxin Biotechnology Co ltd, Beijing Xiangxin Biotechnology Co ltd filed Critical Beijing Xiangxin Medical Technology Co ltd
Priority to CN201910878145.4A priority Critical patent/CN110706747B/en
Publication of CN110706747A publication Critical patent/CN110706747A/en
Application granted granted Critical
Publication of CN110706747B publication Critical patent/CN110706747B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The invention provides a method and a device for detecting tumor neoantigen polypeptides. The method comprises: obtaining somatic mutations and germline mutations of tumor tissue; performing HLA typing using sequencing data of a tumor control blood cell sample to obtain an HLA typing result; performing neoantigen polypeptide prediction on the somatic and germline mutations using the HLA typing result to obtain candidate neoantigen polypeptides; and scoring and ranking the candidate neoantigen polypeptides, the polypeptide with the highest score being the neoantigen polypeptide. Because the prediction uses the union of the mutations from both sources, the mutation sources are more comprehensive and the prediction result is relatively more accurate. Scoring and ranking the predicted candidates and taking the highest-scoring candidate as the neoantigen polypeptide makes it easier to obtain a more accurate neoantigen polypeptide, which in turn improves its value in guiding subsequent immunotherapy medication.

Description

Method and device for detecting tumor neoantigen polypeptide
Technical Field
The invention relates to the field of gene sequencing data analysis, in particular to a method and a device for detecting tumor neoantigen polypeptide.
Background
In recent years, interest in tumor immunotherapy has grown steadily, new clinical trials are continually being reported, and effective remission and cure rates have improved greatly. Accurate and rapid detection of tumor neoantigens is the most basic and important part of this work.
At present, tumor neoantigen detection is mainly based on whole-exome data from tumor and normal tissue: GATK is used to detect somatic mutations, and NetMHCpan is used to predict the MHC-I binding capacity of the detected mutations. This approach is incomplete, and the detected tumor neoantigens are not ranked by quality, which causes considerable difficulty for subsequent immunotherapy experiments.
Disclosure of Invention
The main object of the invention is to provide a method and a device for detecting tumor neoantigen polypeptides, so as to solve the problem in the prior art that tumor neoantigen detection results are incomplete and inaccurate.
In order to achieve the above object, according to one aspect of the present invention, there is provided a method for detecting a tumor neoantigen polypeptide, the method comprising: obtaining somatic mutations and germline mutations of tumor tissue; performing HLA typing using sequencing data of a tumor control blood cell sample to obtain an HLA typing result; performing neoantigen polypeptide prediction on the somatic and germline mutations using the HLA typing result to obtain candidate neoantigen polypeptides; and scoring and ranking the candidate neoantigen polypeptides, the polypeptide with the highest score being the neoantigen polypeptide.
Further, prior to performing neoantigen polypeptide prediction on the somatic and germline mutations using the HLA typing result, the method further comprises: merging the somatic-specific mutations and the germline mutations and annotating them with VEP to obtain, for each mutation, the gene, the transcript and the altered polypeptide fragment.
Further, performing neoantigen polypeptide prediction on the somatic and germline mutations comprises: performing MHC affinity tests on, and scoring, the polypeptide fragments altered by each of the somatic and germline mutations to obtain a sum score for each polypeptide fragment; wherein the MHC affinity test comprises: (1) using polypeptide fragments of 9-11 amino acids in length; (2) using polypeptide fragments in which the mutation occupies a plurality of different positions; and (3) using a plurality of affinity test methods, the test methods comprising at least one of: MHCflurry, MHCnuggetsI, NNalign, and NetMHC.
Further, the MHC affinity test further comprises at least one of: (4) using polypeptide fragments from multiple transcripts; (5) using a plurality of different HLA types.
Further, scoring the candidate neoantigen polypeptides comprises: ranking the candidate neoantigen polypeptides by the sum score of each polypeptide fragment, the polypeptide with the highest score being the neoantigen polypeptide.
Further, performing HLA typing using the sequencing data of the tumor control blood cell sample to obtain the HLA typing result comprises: aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in the IMGT/HLA database to obtain an alignment matrix; merging, deleting and sorting the alignment matrix to obtain a sorted matrix; and processing the sorted matrix with an optimization algorithm to obtain the HLA typing result.
Further, the alignment matrix comprises a plurality of columns and a plurality of rows, the columns being all HLA types and the rows being all reads, and merging, deleting and sorting the alignment matrix comprises: merging identical rows to obtain a weight row (the weight row records the number of repeated reads); and, for any two columns a and b, deleting column a if column b contains every read of column a and also contains reads that column a does not.
Further, an ILP (integer linear programming) optimization algorithm is used to solve the sorted matrix to obtain the HLA typing result.
Furthermore, Optitype software is used to align the sequencing data of the tumor control blood cell sample against known HLA alleles in the IMGT/HLA database to obtain the alignment matrix.
Further, obtaining somatic and germline mutations of tumor tissue comprises: performing paired somatic mutation detection using the tumor control blood cell sample and the tumor tissue sample to obtain the somatic mutations of the tumor tissue; and performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations.
In order to achieve the above object, according to another aspect of the present invention, there is provided a device for detecting a tumor neoantigen polypeptide, the device comprising: an acquisition module, an HLA typing module, a candidate neoantigen prediction module and a neoantigen prediction module; the acquisition module is used for obtaining somatic mutations and germline mutations of tumor tissue; the HLA typing module is used for performing HLA typing using sequencing data of a tumor control blood cell sample to obtain an HLA typing result; the candidate neoantigen prediction module is used for performing neoantigen polypeptide prediction on the somatic and germline mutations using the HLA typing result to obtain candidate neoantigen polypeptides; and the neoantigen prediction module is used for scoring and ranking the candidate neoantigen polypeptides and marking the polypeptide with the highest score as the neoantigen polypeptide.
Further, the device further comprises a mutation merging and annotation module, which is used for merging the somatic-specific mutations and the germline mutations and annotating them with VEP to obtain, for each mutation, the gene, the transcript and the altered polypeptide fragment.
Further, the candidate neoantigen prediction module comprises: an MHC affinity testing module for performing MHC affinity tests on, and scoring, the polypeptide fragments altered by each of the somatic and germline mutations to obtain a sum score for each polypeptide fragment; wherein the MHC affinity test comprises: (1) using polypeptide fragments of 9-11 amino acids in length; (2) using polypeptide fragments in which the mutation occupies a plurality of different positions; and (3) using a plurality of affinity test tools, the tools comprising at least one of: MHCflurry, MHCnuggetsI, NNalign, and NetMHC.
Further, the MHC affinity test further comprises at least one of: (4) using polypeptide fragments from multiple transcripts; (5) using a plurality of different HLA types.
Further, the neoantigen prediction module is used for ranking the candidate neoantigen polypeptides by the sum score of each polypeptide fragment, the polypeptide with the highest score being the neoantigen polypeptide.
Further, the HLA typing module comprises: an alignment unit, a merging and deleting unit and an HLA typing unit, wherein the alignment unit is used for aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in the IMGT/HLA database to obtain an alignment matrix; the merging and deleting unit is used for merging, deleting and sorting the alignment matrix to obtain a sorted matrix; and the HLA typing unit is used for processing the sorted matrix with an optimization algorithm to obtain the HLA typing result.
Further, the alignment matrix comprises a plurality of columns and a plurality of rows, the columns being all HLA types and the rows being all reads, and the merging and deleting unit comprises: a merging subunit and a deleting subunit, wherein the merging subunit is configured to merge identical rows (identical here means completely identical) to obtain a weight row (the weight row records the number of repeated reads); and the deleting subunit is configured, for any two columns a and b, to delete column a if column b contains every read of column a and also contains reads that column a does not.
Further, the HLA typing unit comprises an ILP optimization module for solving the sorted matrix using an ILP optimization algorithm to obtain the HLA typing result.
Further, the alignment unit is Optitype software.
Further, the acquisition module comprises: a somatic mutation acquisition unit and a germline mutation acquisition unit, wherein the somatic mutation acquisition unit is used for performing paired somatic mutation detection using the tumor control blood cell sample and the tumor tissue sample to obtain the somatic mutations of the tumor tissue; and the germline mutation acquisition unit is used for performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations.
According to another aspect of the present invention, there is provided a storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium is located to perform any of the above-described methods for detecting a tumor neoantigen polypeptide.
According to another aspect of the present invention, there is provided a processor for executing a program, wherein the program is executed for performing any of the above-mentioned methods for detecting a tumor neoantigen polypeptide.
By applying the technical solution of the invention, neoantigen polypeptides are predicted from the union of the somatic mutations and the germline mutations of the tumor tissue, so the mutation sources are more comprehensive and the prediction result is relatively more accurate. In addition, the predicted candidate neoantigen polypeptides are scored and ranked, and the candidate with the highest score is taken as the neoantigen polypeptide, which makes it easier to obtain a more accurate neoantigen polypeptide and in turn improves its value in guiding subsequent immunotherapy medication.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 shows a schematic flow diagram of a method for detecting a tumor neoantigen polypeptide according to a preferred embodiment of the present application;
FIG. 2 shows a detailed schematic flow chart of a method for detecting a tumor neoantigen polypeptide according to a preferred embodiment of the present application; and
FIG. 3 shows a schematic structural diagram of an apparatus for detecting a tumor neoantigen polypeptide according to a preferred embodiment of the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail with reference to examples.
HLA: human leukocyte antigen, the expression product of the human Major Histocompatibility Complex (MHC), is an alloantigen with high polymorphism.
The IMGT/HLA database contains mainly sequence information on alleles of HLA type I and type II genes, and also contains some alleles of non-HLA genes (Allele).
The HLA type I gene mainly comprises classical HLA-A, HLA-B, HLA-C and other genes, and also comprises partial pseudogenes.
HLA type II genes mainly comprise DR, DQ, DP, DO and DM series genes.
For the different alleles, in addition to the standard nomenclature specified by the WHO committee, an HLA number is also provided. For example: HLA00001, A*01:01:01:01; HLA02169, A*01:01:01:02N; HLA01244, A*01:01:02. Taking A*01:01:01:02N as an example, the HLA allele name is divided into four colon-separated fields followed by a modifying suffix, for a total of 5 fields.
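The field structure described above can be illustrated with a short parser. This is only a sketch: the names `HlaAllele` and `parse_hla_allele` and the regular expression are hypothetical, not part of the IMGT/HLA database or of any tool named in this document, and the pattern covers only names of the form gene, asterisk, two to four colon-separated numeric fields, and an optional one-letter suffix.

```python
import re
from typing import NamedTuple, Optional, Tuple


class HlaAllele(NamedTuple):
    gene: str                  # e.g. "A" or "DRB1"
    fields: Tuple[str, ...]    # up to four colon-separated numeric fields
    suffix: Optional[str]      # optional modifying suffix such as "N"


# Hypothetical pattern: optional "HLA-" prefix, gene, "*", 1-4 numeric fields, optional suffix.
_ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){0,3})([A-Z]?)$")


def parse_hla_allele(name: str) -> HlaAllele:
    """Split an allele name such as 'A*01:01:01:02N' into gene, fields and suffix."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised HLA allele name: {name!r}")
    gene, digits, suffix = match.groups()
    return HlaAllele(gene, tuple(digits.split(":")), suffix or None)
```

For instance, `parse_hla_allele("A*01:01:01:02N")` separates the gene `A`, the four fields and the suffix `N`.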
The database provides various formats such as fasta, msf, pir and the like, and provides sequences of DNA, RNA and protein at three different levels.
Example 1
In a preferred embodiment of the present application, a method for detecting a tumor neoantigen polypeptide is provided, and FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step S101, obtaining somatic mutations and germline mutations of tumor tissue;
Step S102, performing HLA typing using the sequencing data of the tumor control blood cell sample to obtain an HLA typing result;
Step S103, performing neoantigen polypeptide prediction on the somatic and germline mutations using the HLA typing result to obtain candidate neoantigen polypeptides;
Step S104, scoring and ranking the candidate neoantigen polypeptides, the polypeptide with the highest score being the neoantigen polypeptide.
According to this method, neoantigen polypeptides are predicted from the union of the somatic mutations and the germline mutations of the tumor tissue, so the mutation sources are more comprehensive and the prediction result is relatively more accurate. In addition, the predicted candidate neoantigen polypeptides are scored and ranked, and the candidate with the highest score is taken as the neoantigen polypeptide, which makes it easier to obtain a more accurate neoantigen polypeptide and in turn improves its value in guiding subsequent immunotherapy medication.
Before the HLA typing result is used to predict neoantigen polypeptides from the somatic and germline mutations, it is also necessary to determine, for each mutation, the resulting polypeptide fragment and whether that fragment is altered. This determination can be made using existing methods. In a preferred embodiment, the method further comprises: merging the somatic-specific mutations and the germline mutations and annotating them with VEP to obtain, for each mutation, the gene, the transcript and the altered polypeptide fragment.
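The merging step can be pictured as taking the union of two call sets while remembering where each call came from. The following is a minimal sketch under stated assumptions: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples already extracted from the two VCF files, and `merge_variants` is an illustrative helper, not the GATK or VEP interface.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical minimal variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def merge_variants(somatic: Iterable[Variant],
                   germline: Iterable[Variant]) -> List[dict]:
    """Return the union of both call sets, each variant labelled with its origin(s)."""
    origin: Dict[Variant, Set[str]] = {}
    for label, calls in (("somatic", somatic), ("germline", germline)):
        for variant in calls:
            origin.setdefault(variant, set()).add(label)
    # Sort by variant key so that the merged list has a stable order.
    return [
        {"variant": variant, "origin": sorted(labels)}
        for variant, labels in sorted(origin.items(), key=lambda item: item[0])
    ]
```

Keeping the origin label alongside each variant lets later steps report whether a candidate polypeptide stems from a somatic or a germline mutation.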
In a preferred embodiment, performing neoantigen polypeptide prediction on the somatic and germline mutations comprises: performing MHC affinity tests on, and scoring, the polypeptide fragments altered by each of the somatic and germline mutations to obtain a sum score for each polypeptide fragment; wherein the MHC affinity test comprises: (1) using polypeptide fragments of 9-11 amino acids in length; (2) using polypeptide fragments in which the mutation occupies a plurality of different positions; and (3) using a plurality of affinity test methods, the test methods comprising at least one of: MHCflurry, MHCnuggetsI, NNalign, and NetMHC.
In a further preferred embodiment, the MHC affinity test further comprises at least one of: (4) using polypeptide fragments from multiple transcripts; (5) using a plurality of different HLA types.
For the polypeptide fragments formed by each mutation, MHC affinity is tested using fragments of several lengths, generally 9-11 amino acids. Besides varying the length, for a single mutation site the position of the mutation within the polypeptide fragment can also be varied; and if the mutation lies near an alternative splice site, the polypeptide differs from transcript to transcript. In addition, different HLA types have different MHC affinities for a given polypeptide fragment.
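Varying both the fragment length and the position of the mutation within the fragment amounts to enumerating every window of 9, 10 or 11 residues that covers the mutated residue. A minimal sketch, assuming the mutated protein sequence and the 0-based index of the altered residue are already known; `peptide_windows` is a hypothetical helper name.

```python
from typing import Iterator, Sequence, Tuple


def peptide_windows(protein: str, mut_index: int,
                    lengths: Sequence[int] = (9, 10, 11)) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every window of the given lengths that contains mut_index."""
    if not 0 <= mut_index < len(protein):
        raise ValueError("mut_index must lie inside the protein sequence")
    for k in lengths:
        first = max(0, mut_index - k + 1)          # leftmost start that still covers the mutation
        last = min(mut_index, len(protein) - k)    # rightmost start that fits in the protein
        for start in range(first, last + 1):
            yield start, protein[start:start + k]
```

Each yielded peptide places the mutation at a different offset, so a mutation far from either end of the protein gives 9 + 10 + 11 = 30 candidate fragments per transcript and HLA type.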
These polypeptide fragments all derive from one mutation; although their sequences are similar, their MHC affinities differ, so analyzing the MHC affinity of polypeptide fragments chosen along several dimensions for the same mutation gives a more comprehensive assessment of whether that mutation can give rise to a neoantigen polypeptide. In addition, the MHC affinity test uses several methods, including MHCflurry, MHCnuggetsI, NNalign and NetMHC; testing the above polypeptides from one mutation yields multiple affinity scores, which are summed for each polypeptide. Using several algorithms for the affinity analysis improves the robustness of the analysis and the reliability of its result.
In a preferred embodiment, scoring the candidate neoantigen polypeptides comprises: ranking the candidate neoantigen polypeptides by the sum score of each polypeptide fragment, the polypeptide with the highest score being the neoantigen polypeptide.
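The summing and ranking can be sketched as follows. The document does not state how the per-algorithm outputs are put on a common scale, so this sketch makes an assumption: each tool reports a predicted IC50 in nM (lower means stronger binding), which is mapped to a 0-1 score with the commonly used transform 1 - log(IC50)/log(50000) so that a higher sum means stronger predicted binding. The function names and the input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def ic50_to_score(ic50_nm: float, max_ic50: float = 50000.0) -> float:
    """Assumed conversion: map a predicted IC50 (nM) to a score where higher is better."""
    return 1.0 - math.log(ic50_nm) / math.log(max_ic50)


def rank_by_sum_score(predictions: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """predictions maps peptide -> {algorithm name: predicted IC50 in nM}.

    Returns (peptide, sum score) pairs sorted from highest to lowest sum score.
    """
    totals = {
        peptide: sum(ic50_to_score(ic50) for ic50 in per_algorithm.values())
        for peptide, per_algorithm in predictions.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The first element of the returned list is then the highest-scoring candidate. If the tools in use already report scores where higher is better, the conversion step would simply be dropped.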
HLA typing using the sequencing data of the tumor control blood cell sample can be performed by existing methods, for example by typing the tumor control blood cell sample with both BWA-HLA and Polysolver and taking the intersection of the two typing results. In a preferred embodiment, performing HLA typing using the sequencing data of the tumor control blood cell sample to obtain the HLA typing result comprises: aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in the IMGT/HLA database to obtain an alignment matrix; merging, deleting and sorting the alignment matrix to obtain a sorted matrix; and processing the sorted matrix with an optimization algorithm to obtain the HLA typing result.
In a preferred embodiment, the alignment matrix comprises a plurality of columns and a plurality of rows, the columns being all HLA types and the rows being all reads, and merging, deleting and sorting the alignment matrix comprises: merging identical rows (identical here means completely identical) to obtain a weight row (the weight row records the number of repeated reads); and, for any two columns a and b, deleting column a if column b contains every read of column a and also contains reads that column a does not.
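The two reduction rules can be made concrete on a small 0/1 hit matrix. The sketch below is an illustration under assumptions, not Optitype's code: `compress_hit_matrix` is a hypothetical name, each row is a tuple with a 1 where the read aligns to that column's allele, and the weights are returned as a separate list rather than stored in the matrix.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def compress_hit_matrix(alleles: Sequence[str], rows: Sequence[Sequence[int]]
                        ) -> Tuple[List[str], List[Tuple[int, ...]], List[int]]:
    """Merge identical rows into weighted rows, then drop dominated allele columns."""
    # Rule 1: identical rows collapse into one row whose weight is the number of reads.
    counts = Counter(tuple(row) for row in rows)
    unique_rows = list(counts)
    weights = [counts[row] for row in unique_rows]

    # Set of (merged) reads supporting each allele column.
    support = [
        frozenset(i for i, row in enumerate(unique_rows) if row[j])
        for j in range(len(alleles))
    ]
    # Rule 2: delete column a if some column b holds all of a's reads plus others
    # ("<" on frozensets tests for a proper subset).
    keep = [
        a for a in range(len(alleles))
        if not any(support[a] < support[b] for b in range(len(alleles)))
    ]
    kept_alleles = [alleles[j] for j in keep]
    reduced_rows = [tuple(row[j] for j in keep) for row in unique_rows]
    return kept_alleles, reduced_rows, weights
```

Both rules shrink the matrix without changing which reads any surviving allele explains, which keeps the subsequent optimization small.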
In a preferred embodiment, an ILP optimization algorithm is used to solve the sorting matrix to obtain the HLA typing result.
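The optimization can be read as choosing, for each HLA locus, at most two alleles so that the total weight of reads explained by at least one chosen allele is as large as possible. The sketch below states that objective with an exhaustive search standing in for an ILP solver; it is an assumption-laden illustration (hypothetical function name, locus taken as the text before the asterisk, no regularisation term), not the formulation used by Optitype.

```python
from itertools import combinations_with_replacement, product
from typing import Dict, List, Sequence, Tuple


def best_genotype(alleles: Sequence[str], rows: Sequence[Sequence[int]],
                  weights: Sequence[int]) -> Tuple[List[Tuple[str, str]], int]:
    """Brute-force stand-in for the ILP: pick one allele pair per locus maximising explained reads."""
    # Group column indices by locus, taken here as the text before '*' (an assumption).
    by_locus: Dict[str, List[int]] = {}
    for j, name in enumerate(alleles):
        by_locus.setdefault(name.split("*")[0], []).append(j)

    # Candidate pairs per locus; a pair (j, j) represents a homozygous call.
    pairs_per_locus = [
        list(combinations_with_replacement(cols, 2)) for cols in by_locus.values()
    ]

    best_score, best_choice = -1, ()
    for choice in product(*pairs_per_locus):
        selected = {j for pair in choice for j in pair}
        # A weighted row counts as explained if it hits at least one selected allele.
        score = sum(w for row, w in zip(rows, weights) if any(row[j] for j in selected))
        if score > best_score:  # ties keep the first combination found
            best_score, best_choice = score, choice

    genotype = [(alleles[a], alleles[b]) for a, b in best_choice]
    return genotype, best_score
```

Exhaustive search is only practical after the matrix has been reduced to a handful of columns; an ILP solver serves the same objective at realistic scale.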
In a preferred embodiment, the sequencing data of the tumor control blood cell samples are aligned to known HLA alleles in the IMGT/HLA database using Optitype software to generate an alignment matrix.
BWA-HLA is an older method with poor results, and taking the intersection of the BWA-HLA and Polysolver results is likely to miss the correct HLA type, whereas the typing accuracy of Optitype is significantly higher than that of Polysolver. Specifically, HLA typing is performed using Optitype (v2.1.0) on the sequencing fastq file of the blood cell sample of the tumor patient to obtain the typing result.
The somatic and germline mutations can be obtained using existing methods. In a preferred embodiment, obtaining somatic and germline mutations of tumor tissue comprises: performing paired somatic mutation detection using the tumor control blood cell sample and the tumor tissue sample to obtain the somatic mutations of the tumor tissue; and performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations.
Specifically, paired somatic mutation detection is performed on the tumor tissue and blood cells of the tumor patient using the Mutect2 module of GATK (v4.0.5.1) to obtain a somatic mutation VCF file, and mutations that pass the filtering criteria are selected for subsequent use. Germline mutation detection is performed on the blood cells of the tumor patient using the HaplotypeCaller module of GATK (v4.0.5.1) to obtain a germline mutation VCF file, and mutations that pass the filtering criteria are selected for subsequent use. The filtering criteria are typically a mutation frequency greater than 10% and a supporting read count greater than 100.
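The stated filtering criteria translate directly into a predicate over variant calls. A minimal sketch, assuming each call has already been parsed from the VCF file into a dictionary; the keys `vaf` (mutation frequency as a fraction) and `supporting_reads` are hypothetical names, not VCF fields.

```python
from typing import Dict, List


def passes_filter(call: Dict, min_vaf: float = 0.10, min_reads: int = 100) -> bool:
    """Keep a call only if its mutation frequency exceeds 10% and its read support exceeds 100."""
    return call["vaf"] > min_vaf and call["supporting_reads"] > min_reads


def select_calls(calls: List[Dict]) -> List[Dict]:
    """Return the calls that pass both thresholds, preserving their order."""
    return [call for call in calls if passes_filter(call)]
```

Both comparisons are strict, matching the wording "greater than"; the thresholds are parameters because the text describes them as typical values.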
In a preferred embodiment of the present application, as shown in FIG. 2, a method for detecting tumor neoantigen polypeptides is provided, comprising:
Step A: using the fastq file of the sequencing result of the tumor control blood cell sample, aligning the reads in the fastq file against the known type I and type II HLA sequences in the IMGT/HLA database to obtain an alignment matrix.
Step B: after merging, deleting and sorting the alignment matrix, using an optimization algorithm to obtain the HLA typing result.
Step C: performing paired somatic mutation detection using the tumor control blood cell sample and the cancer tissue sample to obtain the somatic mutations of the tumor tissue.
Step D: performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations of the patient.
Step E: merging the somatic-specific mutations and the germline mutations and annotating them with VEP to obtain the polypeptide fragments corresponding to the mutations.
Step F: performing neoantigen prediction using the HLA typing result and the merged mutations to obtain an initial neoantigen list (i.e., the candidate neoantigens).
Step G: scoring and ranking the predicted polypeptides in the initial neoantigen list using the mutation frequency and the affinity between the polypeptide fragments and MHC molecules, to obtain a final scored and ranked neoantigen list, and taking the highest-scoring neoantigen as the neoantigen.
Example 2
Objective: to detect tumor neoantigens in a tumor patient (sample ID 180504502TT1).
The method comprises the following steps:
1. HLA typing was performed using Optitype (v2.1.0) on the sequencing fastq files of the blood cell sample from the tumor patient to obtain the typing results (see Table 1).
2. Paired somatic mutation detection was performed on the tumor tissue and blood cells of the tumor patient using the Mutect2 module of GATK (v4.0.5.1) to obtain a somatic mutation VCF file, and mutations that passed the filtering criteria were selected for subsequent use.
3. Germline mutation detection was performed on the blood cells of the tumor patient using the HaplotypeCaller module of GATK (v4.0.5.1) to obtain a germline mutation VCF file, and mutations that passed the filtering criteria were selected for subsequent use.
4. The somatic mutation file was merged with the germline mutation file using the CombineVariants module of GATK (v4.0.5.1).
5. The merged file was annotated with VEP (v94.5) to obtain, for each mutation, the gene, the transcript, and the altered polypeptide fragment.
6. Neoantigen detection and score ranking were performed on the HLA typing results and the annotated mutation file using the pVACseq module of pVACtools (v1.3.4); the ranking results are shown in Table 2.
Table 1: HLA typing results
A          B          C
A*24:03    B*46:01    C*01:02
Table 2: ranking results of neoantigen scoring
[Table 2 appears as an image (Figure BDA0002205020330000081) in the original publication.]
As can be seen from Table 2, the highest-scoring neoantigen polypeptide is a mutated antigen sequence on chromosome 1.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) or a processor, and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Corresponding to the above method, the present application also provides a device for detecting tumor neoantigen polypeptides, which is used to implement the above embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the embodiments below are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
This is further illustrated below in connection with alternative embodiments.
Example 3
In this embodiment, there is also provided a device for detecting a tumor neoantigen polypeptide, as shown in fig. 3, the device comprising: the system comprises an acquisition module 10, an HLA typing module 20, a candidate neoantigen prediction module 30 and a neoantigen prediction module 40, wherein the acquisition module 10 is used for acquiring somatic mutation and germ line mutation of tumor tissues; the HLA typing module 20 is used for carrying out HLA typing by utilizing the sequencing data of the tumor control blood cell sample to obtain an HLA typing result; a candidate neoantigen prediction module 30, configured to perform neoantigen polypeptide prediction on somatic mutation and germline mutation by using an HLA typing result, so as to obtain a candidate neoantigen polypeptide; and the neoantigen prediction module 40 is used for scoring and sequencing the candidate neoantigen polypeptides and marking the polypeptide with the highest score as the neoantigen polypeptide.
The device carries out prediction on the polypeptide of the neoantigen by acquiring the sum of the mutations of two parts of sources including somatic mutation and germ line mutation of tumor tissues, the sources of the mutations are more comprehensive, and therefore the prediction result is relatively more accurate. In addition, the method also scores and orders the predicted candidate neoantigen polypeptides, and takes the subsequent neoantigen polypeptide with the highest score as the neoantigen polypeptide according to the scoring result, so that more accurate neoantigen polypeptide can be obtained conveniently, and the guiding significance of subsequent immunotherapy medication is further improved.
Optionally, the device further comprises a mutation merging and annotation module, which is used for merging the somatic-specific mutations and the germline mutations and annotating them with VEP to obtain, for each mutation, the gene, the transcript and the polypeptide fragment changed by that mutation.
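The merging step can be pictured with the following minimal Python sketch. It is an illustration only and is not taken from the application: the `Mutation` record, the `merge_mutations` function and the rule that a somatic call takes precedence over an identical germline call are all assumptions, and the VEP annotation step is not shown.

```python
from typing import Iterable, List, NamedTuple


class Mutation(NamedTuple):
    """Hypothetical minimal variant record (not a format defined by the application)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    origin: str  # "somatic" or "germline"


def merge_mutations(somatic: Iterable[Mutation],
                    germline: Iterable[Mutation]) -> List[Mutation]:
    """Merge two call sets into one list without duplicate variants.

    Variants are keyed by (chrom, pos, ref, alt). Somatic calls are visited
    first, so an identical germline call is dropped (an assumed tie rule).
    """
    merged = {}
    for mutation in list(somatic) + list(germline):
        key = (mutation.chrom, mutation.pos, mutation.ref, mutation.alt)
        merged.setdefault(key, mutation)  # keep the first record seen for a key
    # Sort for a stable, reproducible order before annotation.
    return sorted(merged.values(), key=lambda m: (m.chrom, m.pos, m.ref, m.alt))
```

The merged list would then be passed to the annotation step; how the real device encodes variants or resolves duplicates is not stated in the source.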
Optionally, the candidate neoantigen prediction module comprises an MHC affinity testing module, which is used for performing MHC affinity testing and scoring on the polypeptide fragments changed by each of the somatic mutations and germline mutations to obtain the sum score of each polypeptide fragment; wherein the MHC affinity test comprises: (1) using polypeptide fragments of 9-11 amino acids in length; (2) using polypeptide fragments at a plurality of different positions; (3) using a plurality of affinity testing devices, the testing devices comprising at least one of the following: MHCflurry, MHCnuggetsI, NNalign and NetMHC.
Optionally, the MHC affinity test further comprises at least one of: (4) using polypeptide fragments from multiple transcripts; (5) using a plurality of different HLA types.
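As an illustration of items (1) and (2), the following Python sketch enumerates every 9- to 11-residue window of a mutated protein sequence that contains the mutated residue, so that the mutation appears at each possible position within the fragment. The function name `mutant_peptides` and the 0-based `mut_index` parameter are assumptions made for this sketch; the application does not specify how the fragments are generated.

```python
from typing import List, Sequence


def mutant_peptides(protein: str, mut_index: int,
                    lengths: Sequence[int] = (9, 10, 11)) -> List[str]:
    """Return all windows of the given lengths that cover position mut_index.

    protein   : mutated protein sequence (one-letter amino acid codes)
    mut_index : 0-based index of the mutated residue in `protein`
    """
    peptides = []
    for k in lengths:
        # Earliest start that still covers the mutated residue.
        first = max(0, mut_index - k + 1)
        # Latest start: the window must begin at or before the mutation
        # and must not run past the end of the protein.
        last = min(mut_index, len(protein) - k)
        for start in range(first, last + 1):
            peptides.append(protein[start:start + k])
    return peptides
```

Each returned fragment would then be submitted, together with each HLA type of the sample, to the affinity predictors listed in item (3).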
Optionally, the neoantigen prediction module is used for ranking the candidate neoantigen polypeptides by the sum score of each polypeptide fragment, the polypeptide with the highest score being the neoantigen polypeptide.
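The ranking by sum score can be sketched as follows. This is a hedged illustration, not the application's scoring formula: the input layout (a dictionary of per-predictor scores for each fragment), the function name `rank_by_sum_score` and the convention that a larger score means stronger predicted binding are assumptions. Predictors that report affinity as IC50 in nM, where smaller is better, would need to be converted to such a convention first.

```python
from typing import Dict, List, Tuple


def rank_by_sum_score(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sum the per-predictor scores of each fragment and rank the fragments.

    scores maps a peptide to {predictor name: score}, with higher = better
    (an assumed convention). Ties are broken alphabetically for stability.
    """
    totals = {peptide: sum(by_tool.values()) for peptide, by_tool in scores.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

The first element of the returned list corresponds to the fragment that would be recorded as the neoantigen polypeptide.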
Optionally, the HLA typing module comprises an alignment unit, a merging and deleting unit and an HLA typing unit. The alignment unit is used for aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in the IMGT/HLA database to obtain an alignment matrix; the merging and deleting unit is used for merging, deleting and sorting the alignment matrix to obtain a sorted matrix; and the HLA typing unit is used for processing the sorted matrix with an optimization algorithm to obtain the HLA typing result.
Optionally, the alignment matrix comprises a plurality of columns and a plurality of rows, each column being an HLA type and each row being a read, and the merging and deleting unit comprises a merging subunit and a deleting subunit. The merging subunit is used for merging identical rows (identical here means completely identical) to obtain weighted rows (a weighted row records the number of repeats of the reads); and the deleting subunit is used for deleting column a, for any two columns a and b, if column b completely contains the reads of column a and also contains other reads that are not in column a.
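The two reductions described above can be sketched in Python as follows. The binary list-of-rows representation, the function names and the example allele labels are assumptions made for illustration; the application does not disclose its data structures.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def merge_identical_rows(matrix: Sequence[Sequence[int]]) -> Tuple[List[Row], List[int]]:
    """Collapse completely identical rows (reads) into one row plus a weight."""
    counts = Counter(tuple(row) for row in matrix)
    rows = list(counts)                      # unique rows, first-seen order
    weights = [counts[row] for row in rows]  # number of reads behind each row
    return rows, weights


def drop_dominated_columns(rows: Sequence[Row],
                           alleles: Sequence[str]) -> Tuple[List[Row], List[str]]:
    """Delete column a when some column b holds all of a's reads and more."""
    n = len(alleles)
    # Set of row indices (reads) hit by each column (HLA type).
    col_reads = [{i for i, row in enumerate(rows) if row[j]} for j in range(n)]
    keep = [j for j in range(n)
            # `<` on sets is the proper-subset test: contained, and b has extras.
            if not any(col_reads[j] < col_reads[k] for k in range(n) if k != j)]
    reduced = [tuple(row[j] for j in keep) for row in rows]
    return reduced, [alleles[j] for j in keep]


# Example: reads x alleles hit matrix (1 = the read aligns to the allele).
rows, weights = merge_identical_rows([[1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
rows, alleles = drop_dominated_columns(rows, ["A*01:01", "A*02:01", "B*07:02"])
```

Deleting columns can make previously distinct rows identical again, so an implementation might repeat the row merge afterwards; whether the device does so is not stated in the source.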
Optionally, the HLA typing unit comprises an ILP optimization module, which is used for solving the sorted matrix with an ILP (integer linear programming) optimization algorithm to obtain the HLA typing result.
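The kind of selection problem that such an optimization solves can be illustrated with the following Python sketch. It deliberately replaces the ILP solver with an exhaustive search, which is only practical for very small inputs, and it is not the formulation used by the application or by any particular software. The objective (maximize the weighted number of reads explained by the chosen alleles), the limit of one or two alleles per locus, the tie rule favouring fewer alleles and the function name `best_allele_set` are all assumptions.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple


def best_allele_set(rows: Sequence[Sequence[int]], weights: Sequence[int],
                    alleles: Sequence[str]) -> Tuple[List[str], int]:
    """Exhaustive stand-in for an ILP: choose 1 or 2 alleles per locus.

    rows[i][j] is 1 if weighted read i aligns to allele j. The score of a
    choice is the total weight of reads hit by at least one chosen allele.
    """
    # Group column indices by locus, taken as the text before '*' (assumed naming).
    by_locus = {}
    for j, name in enumerate(alleles):
        by_locus.setdefault(name.split("*")[0], []).append(j)
    # Per locus: every single allele (homozygous) and every pair (heterozygous).
    per_locus = [[c for r in (1, 2) for c in combinations(cols, r)]
                 for cols in by_locus.values()]
    best_score, best_cols = -1, ()
    for choice in product(*per_locus):
        cols = [j for group in choice for j in group]
        score = sum(w for row, w in zip(rows, weights) if any(row[j] for j in cols))
        # On equal score prefer the smaller allele set (assumed tie rule).
        if score > best_score or (score == best_score and len(cols) < len(best_cols)):
            best_score, best_cols = score, tuple(cols)
    return [alleles[j] for j in best_cols], best_score
```

An actual ILP formulation would express the same choice with binary decision variables and linear constraints and hand it to a solver, which scales to realistic numbers of alleles.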
Optionally, the alignment unit is the OptiType software.
Optionally, the acquisition module comprises a somatic mutation acquisition unit and a germline mutation acquisition unit. The somatic mutation acquisition unit is used for performing paired somatic mutation detection using a tumor control blood cell sample and a tumor tissue sample to obtain the somatic mutations of the tumor tissue; and the germline mutation acquisition unit is used for performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations.
From the above description, it can be seen that the above embodiments of the present invention achieve the following technical effects: the neoantigen polypeptide is predicted from the combined somatic mutations and germline mutations of the tumor tissue, so the mutation sources are more comprehensive and the prediction result is relatively more accurate. In addition, the method scores and ranks the predicted candidate neoantigen polypeptides and, according to the scoring result, takes the candidate neoantigen polypeptide with the highest score as the neoantigen polypeptide, which makes it easier to obtain a more accurate neoantigen polypeptide and further improves its value in guiding subsequent immunotherapy medication.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method for detecting a tumor neoantigen polypeptide, the method comprising:
obtaining somatic mutations and germline mutations of tumor tissue;
performing HLA typing using the sequencing data of the tumor control blood cell sample to obtain an HLA typing result;
performing neoantigen polypeptide prediction on the somatic mutations and the germline mutations using the HLA typing result to obtain candidate neoantigen polypeptides;
scoring and ranking the candidate neoantigen polypeptides to obtain the neoantigen polypeptide with the highest score;
wherein performing HLA typing using the sequencing data of the tumor control blood cell sample to obtain an HLA typing result comprises the following steps:
aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in an IMGT/HLA database to obtain an alignment matrix;
merging, deleting and sorting the alignment matrix to obtain a sorted matrix;
processing the sorted matrix with an optimization algorithm to obtain the HLA typing result;
wherein the alignment matrix comprises a plurality of columns and a plurality of rows, each column being an HLA type and each row being a read, and the merging, deleting and sorting of the alignment matrix comprises the following steps:
merging identical rows to obtain weighted rows;
for any two columns a and b, deleting column a if column b completely contains the reads of column a and also contains other reads that are not in column a.
2. The method of claim 1, wherein prior to performing neoantigen polypeptide prediction on the somatic mutations and germline mutations using the HLA typing result, the method further comprises:
merging the somatic-specific mutations and the germline mutations, and annotating them with VEP to obtain, for each mutation, the gene, the transcript and the polypeptide fragment changed by that mutation.
3. The method of claim 2, wherein performing neoantigen polypeptide prediction on the somatic mutations and germline mutations comprises:
performing MHC affinity testing and scoring on the polypeptide fragments changed by each of the somatic mutations and the germline mutations to obtain the sum score of each polypeptide fragment;
wherein the MHC affinity test comprises:
(1) using polypeptide fragments of 9-11 amino acids in length;
(2) using polypeptide fragments at a plurality of different positions;
(3) using a plurality of affinity test methods, the test methods comprising at least one of the following: MHCflurry, MHCnuggetsI, NNalign and NetMHC.
4. The method of claim 3, wherein the MHC affinity test further comprises at least one of: (4) using polypeptide fragments from multiple transcripts; (5) using a plurality of different HLA types.
5. The method of claim 3, wherein scoring and ranking the candidate neoantigen polypeptides comprises:
ranking the candidate neoantigen polypeptides by the sum score of each polypeptide fragment, wherein the polypeptide with the highest score is the neoantigen polypeptide.
6. The method of claim 1, wherein the sorted matrix is solved using an ILP optimization algorithm to obtain the HLA typing result.
7. The method of claim 1, wherein the alignment matrix is obtained by aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in an IMGT/HLA database using the OptiType software.
8. The method of claim 1, wherein obtaining somatic mutations and germline mutations of tumor tissue comprises:
performing paired somatic mutation detection using a tumor control blood cell sample and a tumor tissue sample to obtain the somatic mutations of the tumor tissue;
performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations.
9. A device for detecting a tumor neoantigen polypeptide, the device comprising:
an acquisition module, used for acquiring somatic mutations and germline mutations of tumor tissue;
an HLA typing module, used for performing HLA typing using the sequencing data of the tumor control blood cell sample to obtain an HLA typing result;
a candidate neoantigen prediction module, used for performing neoantigen polypeptide prediction on the somatic mutations and the germline mutations using the HLA typing result to obtain candidate neoantigen polypeptides;
a neoantigen prediction module, used for scoring and ranking the candidate neoantigen polypeptides and recording the polypeptide with the highest score as the neoantigen polypeptide;
wherein the HLA typing module comprises:
an alignment unit, used for aligning the sequencing data of the tumor control blood cell sample against known HLA alleles in an IMGT/HLA database to obtain an alignment matrix;
a merging and deleting unit, used for merging, deleting and sorting the alignment matrix to obtain a sorted matrix;
an HLA typing unit, used for processing the sorted matrix with an optimization algorithm to obtain the HLA typing result;
wherein the alignment matrix comprises a plurality of columns and a plurality of rows, each column being an HLA type and each row being a read, and the merging and deleting unit comprises:
a merging subunit, used for merging identical rows to obtain weighted rows; and
a deleting subunit, used for deleting column a, for any two columns a and b, if column b completely contains the reads of column a and also contains other reads that are not in column a.
10. The device of claim 9, further comprising:
a mutation merging and annotation module, used for merging the somatic-specific mutations and the germline mutations and annotating them with VEP to obtain, for each mutation, the gene, the transcript and the polypeptide fragment changed by that mutation.
11. The device of claim 10, wherein the candidate neoantigen prediction module comprises:
an MHC affinity testing module, used for performing MHC affinity testing and scoring on the polypeptide fragments changed by each of the somatic mutations and the germline mutations to obtain the sum score of each polypeptide fragment;
wherein the MHC affinity test comprises:
(1) using polypeptide fragments of 9-11 amino acids in length;
(2) using polypeptide fragments at a plurality of different positions;
(3) using a plurality of affinity testing devices, the testing devices comprising at least one of the following: MHCflurry, MHCnuggetsI, NNalign and NetMHC.
12. The device of claim 11, wherein the MHC affinity test further comprises at least one of: (4) using polypeptide fragments from multiple transcripts; (5) using a plurality of different HLA types.
13. The device of claim 11, wherein the neoantigen prediction module is used for ranking the candidate neoantigen polypeptides by the sum score of each polypeptide fragment, the polypeptide with the highest score being the neoantigen polypeptide.
14. The device of claim 9, wherein the HLA typing unit comprises an ILP optimization module, used for solving the sorted matrix with an ILP optimization algorithm to obtain the HLA typing result.
15. The device of claim 9, wherein the alignment unit is the OptiType software.
16. The device of claim 9, wherein the acquisition module comprises:
a somatic mutation acquisition unit, used for performing paired somatic mutation detection using a tumor control blood cell sample and a tumor tissue sample to obtain the somatic mutations of the tumor tissue;
and a germline mutation acquisition unit, used for performing germline mutation detection on the tumor control blood cell sample using GATK to obtain the germline mutations.
17. A storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium is located to perform the method for detecting a tumor neoantigen polypeptide according to any one of claims 1 to 8.
18. A processor configured to execute a program, wherein the program is configured to execute the method for detecting a tumor neoantigen polypeptide according to any one of claims 1 to 8.
CN201910878145.4A 2019-09-17 2019-09-17 Method and device for detecting tumor neoantigen polypeptide Active CN110706747B (en)


Publications (2)

Publication Number Publication Date
CN110706747A (en) 2020-01-17
CN110706747B (en) 2021-09-07

Family

ID=69195946


Country Status (1)

Country Link
CN (1) CN110706747B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424740B (en) * 2022-09-30 2023-11-17 四川大学华西医院 Tumor immunotherapy effect prediction system based on NGS and deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011143656A2 (en) * 2010-05-14 2011-11-17 The General Hospital Corporation Compositions and methods of identifying tumor specific neoantigens
CN108388773A (en) * 2018-02-01 2018-08-10 杭州纽安津生物科技有限公司 A kind of identification method of tumor neogenetic antigen
CN108601731A (en) * 2015-12-16 2018-09-28 磨石肿瘤生物技术公司 Discriminating, manufacture and the use of neoantigen
CN109584960A (en) * 2018-12-14 2019-04-05 上海鲸舟基因科技有限公司 Predict the method, apparatus and storage medium of tumor neogenetic antigen



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant