CN114005490B - Circulating tumor DNA fusion detection method based on second-generation sequencing technology - Google Patents


Info

Publication number
CN114005490B
Authority
CN
China
Prior art keywords
sequence
read
sequencing
dna
sequences
Prior art date
Legal status
Active
Application number
CN202111640988.4A
Other languages
Chinese (zh)
Other versions
CN114005490A
Inventor
姬晓勇
汪彦荣
潘晓西
高司航
王欢欢
伍启熹
王建伟
Current Assignee
Beijing Youxun Medical Devices Co ltd
Original Assignee
Beijing Youxun Medical Devices Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youxun Medical Devices Co ltd
Priority to CN202111640988.4A
Publication of CN114005490A
Application granted
Publication of CN114005490B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1003: Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor
    • C12N15/1006: Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor by means of a solid support carrier, e.g. particles, polymers
    • C12N15/1013: Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor by means of a solid support carrier, e.g. particles, polymers by using magnetic beads
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Plant Pathology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the field of gene detection, in particular to a method for detecting fusions in circulating tumor DNA based on second-generation (next-generation) sequencing. The invention provides a gene fusion detection method based on split (soft-clipped) reads. No sequence assembly is performed at any point in the computation, which improves running speed, and comparing sequence similarity through a sequence-to-number conversion makes it faster still. In addition, the invention optimizes the ctDNA extraction method and the DNA damage repair method, which further improves fusion gene detection.

Description

Circulating tumor DNA fusion detection method based on second-generation sequencing technology
Technical Field
The invention relates to the field of gene detection, in particular to a circulating tumor DNA fusion detection method based on a second-generation sequencing technology.
Background
As early as 1948, Mandel and Métais first reported the presence of cell-free nucleic acids (cfNA) in human blood. The significance of cell-free DNA (cfDNA) went largely unrecognized until 1994, when RAS gene mutations were detected in the blood of cancer patients. Following the detection of microsatellite alterations in the cfDNA of cancer patients, and with extensive research over the past decade on cfNAs (DNA, mRNA, microRNAs) in the blood of cancer patients, the research value of cfDNA has become increasingly apparent.
Compared with conventional tissue biopsy, liquid biopsy is fast, convenient and minimally invasive. Clinicians can use it to monitor tumor response to treatment and to predict recurrence; in the long term, liquid biopsy may also help detect tumors at the earliest, asymptomatic stage. The cfDNA level reflects tumor growth, but it also fluctuates in healthy individuals. In general, patients with malignancies have higher cfDNA levels than patients without tumors, although benign lesions, inflammation and tissue trauma can also raise cfDNA levels and must be distinguished by quantification. Which physiological changes drive the development and progression of cancer is still not fully understood, but tumor development and progression can be monitored, and the relevant mutated genes identified, by studying circulating tumor DNA (ctDNA). In addition, circulating miRNAs have recently been shown to be potential cancer biomarkers.
Cancer treatment modalities are diverse, including radiotherapy, chemotherapy, and the emerging targeted therapies and immunotherapies. Targeted therapy is an important component of precision medicine, and identifying the patients who carry the targeted gene alterations is particularly important: genetic testing of a cancer directly determines how effective a drug will be for the patient. Currently, the main sample types used for testing are tumor tissue and needle biopsy samples. However, tissue sampling is invasive and sometimes difficult to perform; it depends on the size and location of the tumor and on the patient's general condition, and a satisfactory tissue sample cannot always be obtained. Necrosis and apoptosis of tumor cells, including tumor cells shed into the bloodstream, release ctDNA into peripheral blood. In recent years, plasma ctDNA has been recognized as a sample type in which tumor-specific alterations can be detected.
Using ctDNA to detect tumor mutations has several advantages: sampling is non-invasive or minimally invasive; samples can be obtained at any stage of the disease; ctDNA can serve as a tumor marker for real-time, dynamic monitoring; and it overcomes the heterogeneity of tumor tissue. However, detecting gene mutations in ctDNA still faces technical challenges: 1) ctDNA content varies between individuals and is low in most people; 2) ctDNA fragments are short, mostly about 180 bp, with lengths distributed between 100 bp and 400 bp; 3) the proportion of tumor-derived DNA in cell-free DNA varies greatly between individuals and is often too small to detect easily. These factors limit the wide application of ctDNA in tumor detection, so efficient and convenient ctDNA extraction is a key factor in its wider use.
Many methods exist for detecting circulating tumor DNA mutations, but methods based on next-generation sequencing are the most widely applied and offer the richest set of detection approaches. NGS-based methods rely on two main technical approaches: target region capture or amplification for high-depth sequencing, and library construction with added molecular barcodes (molecular tags). Both are built on conventional NGS library construction and can detect not only high-frequency ctDNA mutations but also low- and even ultra-low-frequency mutations (≥ 0.1%).
Gene fusion is widespread in the genome: two unrelated genes are joined into a new gene through chromosomal translocation, deletion or inversion. Many studies show that gene fusions are closely associated with the onset and progression of various diseases, especially cancer, and are even the direct cause of some cancers; gene fusion has therefore become an important topic in current omics big-data analysis. Because fusion genes may also be potential drug targets, in-depth research on them is necessary.
Breakpoint coordinates reported by current structural variation detection software are often imprecise, whereas validation experiments need the exact breakpoint position for primer design, and most tools also require sequence assembly. In addition, current structural variation detection software is slow and resource-intensive.
Disclosure of Invention
The invention aims to provide a fusion detection method based on split reads that reports accurate breakpoint coordinates without sequence assembly, thereby improving the running speed of fusion detection.
Specifically, the invention first provides a method for detecting fusion genes based on next-generation sequencing, comprising the following steps:
aligning the sequencing reads to a reference genome to obtain an original BAM file, and removing duplicate reads and reads aligned to multiple positions to obtain a final BAM file for further detection;
extracting the reads containing soft clips from the final BAM file, and splitting them into read groups according to their breakpoint coordinates and clip direction;
comparing the reads of each read group pairwise and removing reads that are too short, have too low sequence similarity to the other reads of the group, or contain repetitive sequences; then selecting the read with the highest match to the other reads for the next step, and taking the number of reads remaining in the group as the number of variant-supporting reads;
realigning the soft-clipped part of the read selected in the previous step to the reference genome; if the alignment score is too low, or the realigned position is too close to the original alignment position, the read is not carried forward;
annotating the original alignment coordinates and the realignment coordinates; taking the depth at the original alignment coordinate as the depth and the ratio of variant-supporting reads to the depth as the variant frequency, and writing the variant frequency to a result file.
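The realignment filter and the frequency calculation in the last two steps can be sketched in Python as below. This is a minimal sketch under stated assumptions: the text only says the score is "too low" and the positions "too close", so `MIN_SCORE` and `MIN_DISTANCE` are hypothetical placeholder values, and the `Candidate` record and its field names are illustrative rather than part of the method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    chrom: str                  # chromosome of the original split-read alignment
    pos: int                    # breakpoint coordinate of the original alignment
    support: int                # variant-supporting reads (T value)
    depth: int                  # read depth at the original coordinate
    realn_chrom: Optional[str]  # chromosome of the realigned soft clip (None if unaligned)
    realn_pos: Optional[int]    # position of the realigned soft clip
    realn_score: int            # alignment score of the realigned soft clip


# Hypothetical thresholds: the method only says "too low" and "too close".
MIN_SCORE = 30
MIN_DISTANCE = 10_000


def passes_realignment(c: Candidate, min_score: int = MIN_SCORE,
                       min_distance: int = MIN_DISTANCE) -> bool:
    """Keep a candidate only if the soft clip realigns well and far from the original hit."""
    if c.realn_chrom is None or c.realn_pos is None or c.realn_score < min_score:
        return False
    if c.realn_chrom != c.chrom:
        return True  # a partner on another chromosome can never be "too close"
    return abs(c.realn_pos - c.pos) >= min_distance


def variant_frequency(c: Candidate) -> float:
    """Variant frequency = supporting reads / depth at the original coordinate."""
    if c.depth <= 0:
        raise ValueError("depth must be positive")
    return c.support / c.depth
```

For example, 8 supporting reads at a depth of 400 give a variant frequency of 8 / 400 = 0.02 (2%).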
Preferably, extracting and grouping the reads containing soft clips specifically comprises:
determining the alignment pattern of each sequencing read from its CIGAR string: a read with no soft clip has pattern M, a read soft-clipped on the left has pattern SM, and a read soft-clipped on the right has pattern MS. For each soft-clipped read, the original aligned chromosome and coordinate are used as the key and the soft-clipped sequence as the value in a hash table, which also stores the read's strand and the base qualities of the soft-clipped part.
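A minimal Python sketch of this extraction and grouping step, assuming reads arrive as plain tuples (chromosome, 1-based position, strand, CIGAR, sequence, qualities) rather than through a BAM library. Only the pure SM and MS shapes described above are handled; the choice of breakpoint coordinate (first aligned base for SM, last aligned base for MS) and the `L`/`R` side tag in the key are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class Clip(NamedTuple):
    seq: str     # soft-clipped bases
    qual: str    # base qualities of the soft-clipped bases
    strand: str  # "+" or "-"


Key = Tuple[str, int, str]  # (chromosome, breakpoint coordinate, clip side)


def cigar_ops(cigar: str) -> List[Tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def group_soft_clips(records: Iterable[Tuple[str, int, str, str, str, str]]) -> Dict[Key, List[Clip]]:
    """records: (chrom, 1-based pos, strand, cigar, seq, qual) for each read."""
    groups: Dict[Key, List[Clip]] = defaultdict(list)
    for chrom, pos, strand, cigar, seq, qual in records:
        ops = cigar_ops(cigar)
        pattern = "".join(op for _, op in ops)
        if pattern == "SM":    # left soft clip: breakpoint at the first aligned base
            n = ops[0][0]
            groups[(chrom, pos, "L")].append(Clip(seq[:n], qual[:n], strand))
        elif pattern == "MS":  # right soft clip: breakpoint at the last aligned base
            m = ops[0][0]
            groups[(chrom, pos + m - 1, "R")].append(Clip(seq[m:], qual[m:], strand))
        # pattern "M" (no soft clip) and any other CIGAR shape are skipped in this sketch
    return groups
```

A hash table keyed this way puts every read that breaks at the same coordinate in the same direction into one group, so the later pairwise comparison never has to look across groups.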
Preferably, the pairwise comparison of the reads in the same read group specifically comprises:
(1) selecting the read in the group with the highest match to the remaining reads; if its length is L, extracting L sequences starting from the breakpoint of the original read with a step of 1 base, and converting each sequence into a number according to the following rules:
s1, constructing binary sequences: for each of A, T, C and G, building a binary sequence from the read in which 1 marks a position holding that base and 0 marks any other base, giving a binary sequence of length L per base; then concatenating the four length-L sequences in the order A, T, C, G into one binary sequence of length 4L;
s2, defining one 2×2 matrix to represent 1 and another 2×2 matrix to represent 0 (both are given as images in the original publication and are not reproduced here); multiplying the matrices for the 4L bits in order to obtain a single 2×2 matrix; left-multiplying the weight matrix (also given only as an image) by this matrix to obtain the final matrix, and taking the trace of the final matrix as the sequence number of the sequence; computing the L sequence numbers and storing them in an array;
(2) initializing variables T and F to 0; traversing the remaining soft-clipped sequences of reads aligned to the same coordinate, computing the sequence number of each, and checking whether it exists in the array: if so, incrementing T, otherwise incrementing F; after the traversal, if T is greater than a threshold (default 4) and greater than a set multiple (default 0.5) of F, the soft-clipped sequences of the group pass the filter, and the sequence and its base qualities are output with the original alignment position of the longest read and the T value as the ID; the T value is the number of variant-supporting reads.
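The sequence-to-number conversion and the T/F filter can be sketched together in Python. The publication shows the matrices for 1 and 0 and the weight matrix only as images, so `M1`, `M0` and `W` below are hypothetical stand-ins: `M1` and `M0` are the two elementary unitriangular 2×2 matrices, whose products differ for every distinct bit string, and `W` was chosen so that the four single bases get different numbers. Because only the trace is kept, two different sequences can still share a number, so a match means "probably the same sequence". Reading the "L sequences" as the prefixes of lengths 1 to L that start at the breakpoint is also an interpretation; for a left (SM) clip the caller would reverse the clip first so that index 0 is the base next to the breakpoint.

```python
from typing import Iterable, List, Tuple

Matrix = Tuple[int, int, int, int]  # 2x2 matrix stored row-major as (a, b, c, d)

# HYPOTHETICAL matrices: the publication gives M1, M0 and W only as images.
M1: Matrix = (1, 1, 0, 1)  # represents bit 1
M0: Matrix = (1, 0, 1, 1)  # represents bit 0
W: Matrix = (1, 2, 3, 7)   # weight matrix


def matmul(x: Matrix, y: Matrix) -> Matrix:
    a, b, c, d = x
    e, f, g, h = y
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)


def binary_encoding(seq: str) -> List[int]:
    """s1: one 0/1 indicator sequence per base, concatenated in A, T, C, G order (length 4L)."""
    return [1 if b == base else 0 for base in "ATCG" for b in seq.upper()]


def sequence_number(seq: str) -> int:
    """s2: multiply the bit matrices in order, left-multiply W by the product, return the trace."""
    p: Matrix = (1, 0, 0, 1)  # identity
    for bit in binary_encoding(seq):
        p = matmul(p, M1 if bit else M0)
    a, _, _, d = matmul(p, W)
    return a + d


def tf_filter(ref_clip: str, other_clips: Iterable[str],
              t_min: int = 4, ratio: float = 0.5) -> Tuple[bool, int, int]:
    """Step (2): count clips whose number is among the reference clip's L numbers."""
    # Interpretation: the L sequences are the breakpoint-anchored prefixes of lengths 1..L.
    numbers = {sequence_number(ref_clip[:k]) for k in range(1, len(ref_clip) + 1)}
    t = f = 0
    for clip in other_clips:
        if sequence_number(clip) in numbers:
            t += 1
        else:
            f += 1
    return (t > t_min and t > ratio * f), t, f
```

With the default thresholds, a group in which five clips match the reference clip's numbers passes, while a group with exactly four matches does not, because the rule is T > 4. Python integers are unbounded, so the matrix entries, which grow quickly with 4L, never overflow.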
Comparing sequence similarity with this sequence-to-number conversion improves the software's running speed and reduces its resource requirements.
In particular embodiments, the sequencing reads may be preprocessed before alignment to the reference genome to remove low-quality reads, reads containing N and reads containing adapter sequences.
As a preferred embodiment of the present invention, the method for detecting fusion genes based on next-generation sequencing comprises the following steps:
1. Data preprocessing: removing low-quality reads, reads containing N and reads containing adapters using fqtools;
2. Alignment, deduplication and extraction of uniquely aligned reads: aligning the preprocessed reads to the human genome hg19 to obtain a BAM file, and removing duplicate reads and reads aligned to multiple positions using Picard and Samtools respectively;
3. Extraction of the soft-clipped parts of sequencing reads that may carry a gene fusion signal: determining the alignment pattern of each sequencing read from its CIGAR string (M for a read with no soft clip, SM for a read soft-clipped on the left, MS for a read soft-clipped on the right); for each soft-clipped read, the original aligned chromosome and coordinate are used as the key and the soft-clipped sequence as the value in a hash table, which also stores the read's strand and the base qualities of the soft-clipped part;
4. Filtering the extracted soft-clip information:
the original read is compared with the soft-clipped sequences of all reads aligned to the same genomic coordinate, as follows:
(1) selecting the read in the group with the highest match to the remaining reads; if its length is L, extracting L sequences from the breakpoint of the original read with a step of 1 base, and converting each into a number according to the following rules (taking a sequence of length L as an example):
s1, constructing binary sequences: the sequence consists of the four bases A, G, C and T; four binary sequences are built for A, T, C and G, in which 1 marks a position holding that base and 0 marks a different base, giving a binary sequence of length L per base; the four length-L sequences are then concatenated in the order A, T, C, G into one binary sequence of length 4L;
s2, defining one 2×2 matrix to represent 1 and another 2×2 matrix to represent 0 (both are given as images in the original publication and are not reproduced here); multiplying the matrices for the 4L bits in order to obtain a single 2×2 matrix, left-multiplying the weight matrix (image not reproduced) by it to obtain the final matrix, and taking the trace of the final matrix as the sequence number of the sequence;
since the sequence length is L, L sequence numbers are computed and stored in an array;
(2) initializing T and F to 0; traversing the soft-clipped sequences of the remaining reads aligned to the same coordinate, computing the sequence number of each and checking whether it exists in the array: if so, incrementing T, otherwise incrementing F. After the traversal, if T is greater than 4 and greater than 0.5 times F, the soft-clipped sequences of the group pass the filter, and the sequence and its base qualities are written to a new FASTQ file with the original alignment position of the longest read and the T value as the ID;
5. Realignment: aligning the newly generated FASTQ file to the reference genome using BWA;
6. Filtering by realignment coordinates and alignment quality: if the realignment score of the soft-clipped sequence is too low, or the realigned position is too close to the original alignment position, the sequence is not carried forward;
7. Coordinate annotation: annotating the original alignment coordinates and the realignment coordinates and writing them to the result file;
8. Variant frequency and depth: taking the T value as the number of variant-supporting reads, the depth at the original alignment coordinate as the depth, and the ratio of variant-supporting reads to depth as the variant frequency, and writing these values to the result file.
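Two small pieces of glue from steps 2 and 4 can be sketched in Python. The embodiment performs step 2 with Picard and Samtools; the flag test below only illustrates the same filtering criteria on already-parsed records. The FLAG bit values come from the SAM specification; the MAPQ cut-off is a hypothetical choice (BWA typically reports MAPQ 0 for reads with equally good hits elsewhere), and the `chrom:pos_T` read-ID layout is an assumed format, since the text only says the ID combines the original position and the T value.

```python
# Standard SAM FLAG bits (SAM format specification).
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400

# Hypothetical cut-off: reads with MAPQ below this are treated as multi-mapped.
MIN_MAPQ = 1


def keep_unique_alignment(flag: int, mapq: int, min_mapq: int = MIN_MAPQ) -> bool:
    """Step 2 criteria: drop unmapped, secondary and duplicate records and likely multi-mappers."""
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_DUPLICATE):
        return False
    return mapq >= min_mapq


def fastq_record(chrom: str, pos: int, t_value: int, seq: str, qual: str) -> str:
    """Step 4 output: one FASTQ record whose ID carries the original position and T value."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    return f"@{chrom}:{pos}_T{t_value}\n{seq}\n+\n{qual}\n"
```

Putting the position and T value into the read ID means they survive the BWA realignment in step 5 unchanged, so steps 6 to 8 can recover them from the realigned record without a separate lookup table.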
Although the detection method reduces the software's resource requirements, better extraction efficiency further benefits the detection of real samples. Commercial ctDNA extraction kits mainly use either spin columns or magnetic beads. Column extraction is efficient but strongly affected by the sample matrix, and it requires equipment such as centrifuges and vacuum pumps. Magnetic bead extraction is simple and fast, needs no complex equipment, and can be combined with an automated extraction workstation to reduce manual work, but current commercial magnetic bead kits have only moderate extraction efficiency, which urgently needs to be improved.
The invention therefore further optimizes the ctDNA extraction method, giving the following preferred scheme.
Preferably, when detecting a ctDNA fusion gene, the method further comprises:
extracting ctDNA from the sample to be tested using a nucleic acid precipitation aid, and sequencing it to obtain sequencing reads;
the nucleic acid precipitation aid contains 1 µg/µL to 5 µg/µL carrier RNA and 3 ± 0.5 M sodium acetate.
The invention finds that the nucleic acid precipitation aid improves the extraction efficiency of cell-free circulating tumor DNA.
Preferably, ctDNA is extracted from the sample to be tested by the magnetic bead method, specifically comprising:
mixing the sample with a proteinase K solution, a magnetic bead suspension, a lysis/binding solution and the nucleic acid precipitation aid so that the ctDNA binds to the beads, washing the beads, and finally eluting the ctDNA from the beads;
the lysis/binding solution contains 1–10% sodium dodecyl sulfate, 45 mmol/L Tris-HCl, 120 mmol/L NaCl, 30 mmol/L disodium EDTA, 10–30 mol/L guanidine isothiocyanate, 2–4 mol/L potassium acetate and 5–10 wt% Tween 20, at pH 4.8 ± 0.2.
Preferably, the proteinase K solution contains 45-75 mmol/L Tris-HCl and 100-120 mmol/L NaCl.
Preferably, washing the beads specifically comprises washing them first with a first washing solution and then with a second washing solution;
the first washing solution contains 45–75 mmol/L Tris-HCl, 100–120 mmol/L NaCl, 30–60 mmol/L disodium EDTA and 1.5 ± 0.2 wt% Triton, at pH 5.5 ± 0.2;
the second washing solution contains 45–75 mmol/L Tris-HCl and 75 ± 5 vol% ethanol.
Preferably, nuclease-free water is used as the elution solution.
As a preferred embodiment, extracting ctDNA from the sample to be tested specifically comprises:
1) dissolving proteinase K in proteinase dissolving buffer in advance at 1 mg per 500–1000 µL to form the proteinase K solution, and adding 20 µL of the proteinase K solution and 20 µL of the magnetic bead suspension to a centrifuge tube;
2) transferring 300 µL of serum or plasma to the centrifuge tube;
3) adding 450 µL of lysis/binding solution, vortexing for 30 seconds, adding 2 µL of nucleic acid precipitation aid, and mixing by inversion at room temperature for 15 minutes;
4) placing the tube on a magnetic rack for 3 minutes to capture the beads, then aspirating and discarding the solution;
5) adding 500 µL of first washing solution and vortexing for 30 seconds;
6) placing the tube on the magnetic rack for 3 minutes, then aspirating and discarding the solution;
7) adding 500 µL of second washing solution and vortexing for 30 seconds;
8) placing the tube on the magnetic rack for 3 minutes, then aspirating and discarding the solution;
9) repeating steps 7) and 8) once;
10) briefly centrifuging to collect droplets from the tube wall, placing the tube on the magnetic rack, and aspirating and discarding the remaining solution;
11) air-drying for 10 minutes;
12) adding 35 µL of elution solution, vortexing to resuspend the beads, and incubating at room temperature for 5–10 minutes, shaking 2–3 times during this period to speed DNA dissolution;
13) placing the tube on the magnetic rack for 3 minutes and transferring the eluted DNA, which is the cell-free circulating tumor DNA, to a new centrifuge tube for storage.
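The volumes in steps 1) to 3) of the protocol above fix how much of each component ends up in the binding mix. The short calculation below works this out, assuming volumes are simply additive; the dictionary keys are descriptive labels only, not names used in the method.

```python
# Volumes (µL) from steps 1) to 3) of the extraction protocol.
VOLUMES_UL = {
    "proteinase K solution": 20,
    "magnetic bead suspension": 20,
    "serum or plasma": 300,
    "lysis/binding solution": 450,
    "nucleic acid precipitation aid": 2,
}
AID_UL = VOLUMES_UL["nucleic acid precipitation aid"]


def binding_volume_ul() -> int:
    """Total volume of the binding mix, assuming additive volumes."""
    return sum(VOLUMES_UL.values())


def carrier_rna_ug(conc_ug_per_ul: float) -> float:
    """Carrier RNA delivered by the aid at a given stock concentration (1 to 5 µg/µL)."""
    return conc_ug_per_ul * AID_UL


def final_sodium_acetate_mm(stock_m: float = 3.0) -> float:
    """Sodium acetate contributed by the aid to the binding mix, in mM."""
    return stock_m * 1000 * AID_UL / binding_volume_ul()


def proteinase_k_ug(buffer_ul_per_mg: float) -> float:
    """Proteinase K in the 20 µL added, for 1 mg dissolved in 500 to 1000 µL of buffer."""
    return 1000 / buffer_ul_per_mg * VOLUMES_UL["proteinase K solution"]
```

With these volumes the binding mix is 792 µL; the 2 µL of aid delivers 2 to 10 µg of carrier RNA and contributes roughly 7.6 mM sodium acetate, and the 20 µL of proteinase K solution contains 20 to 40 µg of enzyme.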
With existing methods, the amount of ctDNA extracted from 4 mL of plasma ranges from 10 ng to 1000 ng. For low-frequency detection at some drug-related sites, and even for early tumor screening, efficiently converting a limited library input into library is critical. In the prior art, a stable library input of more than 50 ng is considered reliable for detection at 1% frequency. Even when more than 50 ng is extracted, many samples still cannot yield enough library (at least 1000 ng) for hybridization capture. Increasing the number of PCR cycles to reach the capture input leads to a high duplication rate in downstream bioinformatic analysis, wasting sequencing data and its cost. A high proportion of samples still fall below the 50 ng library input, and detection at 0.5% frequency or lower gives unsatisfactory results.
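The link between library input and detectable frequency can be made concrete by counting genome copies. The sketch below assumes about 3.3 pg per haploid human genome and ignores losses during library construction, so its numbers are upper bounds rather than measured values.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(input_ng: float) -> float:
    """Haploid genome copies contained in a given DNA mass."""
    return input_ng * 1000 / HAPLOID_GENOME_PG


def expected_mutant_copies(input_ng: float, allele_frequency: float) -> float:
    """Copies carrying a variant at the given frequency, before any library losses."""
    return genome_equivalents(input_ng) * allele_frequency


if __name__ == "__main__":
    for ng in (50, 20):
        for vaf in (0.01, 0.005, 0.001):
            print(f"{ng} ng, VAF {vaf:.1%}: {expected_mutant_copies(ng, vaf):.1f} copies")
```

Under these assumptions, 50 ng corresponds to roughly 15,000 haploid genome copies: about 150 variant copies at 1%, about 76 at 0.5% and about 15 at 0.1%. At 20 ng the 0.5% figure drops to about 30 copies, which is why library conversion efficiency matters so much at low input.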
The invention finds that ctDNA carries certain unidentified damage. Repairing this damage before library construction and building the library from the repaired DNA increases library conversion, meets the input required for hybridization capture, improves library quality and the sequencing depth of target fragments, reduces duplicate reads in the output data, improves detection accuracy, and lowers sequencing cost.
The following method applies not only to ctDNA but also to DNA fragments obtained by shearing genomic DNA and to otherwise damaged DNA fragments.
Preferably, when detecting a fusion gene in ctDNA or damaged DNA, the method further comprises:
repairing the extracted DNA with a repair working solution before sequencing;
the repair working solution contains DNA damage repair enzymes, namely a mixture of UDG (uracil-DNA glycosylase), endonuclease IV and T4 PDG; per µL of repair working solution, UDG is present at 3–4 U, endonuclease IV at 6–8 U and T4 PDG at 6–8 U.
The invention finds that this DNA damage repair enzyme mix repairs DNA damage well; at the same time, the amount of library constructed from ctDNA increases markedly, the target fragments in the library are sequenced more deeply and the amount of usable data is higher, so the sequencing results are more accurate and low-frequency mutations can be detected.
Preferably, the method for detecting fusion genes based on next-generation sequencing further comprises: constructing a library from the repaired DNA to obtain a DNA library.
Further preferably, the extracted DNA is repaired with the repair working solution during DNA library construction;
the repair working solution contains the DNA damage repair enzymes and DNA end repair enzymes; the end repair enzymes are a mixture of T4 DNA polymerase and PNK (polynucleotide kinase); per µL of repair working solution, T4 DNA polymerase is present at 50–100 U and PNK at 100–200 U.
In this way, DNA damage is repaired during the end-repair step, which increases the amount of library constructed and produces more usable data.
More preferably, the volume ratio of the DNA damage repair enzymes to the DNA end repair enzymes is 1:0.8 to 1:1.2, most preferably 1:1, which helps the repair reaction proceed fully.
In another preferred embodiment, the repair working solution further contains an end-repair buffer, which may be an existing end-repair buffer system or a self-prepared one; this application uses a self-prepared buffer whose main components are Tris-HCl, dNTPs, ATP and H2O.
More preferably, repair is performed at 20–30 °C, preferably for 20–30 min. Within these temperature and time ranges, damaged DNA is repaired efficiently and with a high success rate.
The amount of library constructed from damage-repaired DNA is higher than that obtained by direct library construction. Moreover, for samples where direct library construction fails or does not yield enough library for hybridization capture, repairing the DNA before library construction significantly increases the library yield.
Libraries constructed by this method meet the input requirement for capture, contain few duplicate reads in the output data and a high amount of usable data, and enable detection of low-frequency mutations.
Where the amount of DNA obtained by existing ctDNA extraction methods cannot meet the requirements of low-frequency mutation detection, one option is to use the extraction method of the invention to improve extraction efficiency; the repair method of the invention can also be applied to the extracted DNA, achieving good detection performance. When the sample matrix is complex, the DNA is severely damaged or the DNA content is low, the extraction and repair methods can be used together to ensure DNA quality before sequencing.
The above-described schemes can be combined by those skilled in the art to obtain preferred embodiments of the method of the present invention.
Based on the above technical solutions, the invention has the following beneficial effects:
(1) The invention provides a gene fusion detection method based on segmented sequences. The computation involves no sequence assembly at any stage, which improves running speed, and using the sequence-to-number conversion to compare sequence similarity makes it faster still.
(2) The extraction method provided by the invention significantly improves the extraction efficiency and yield of ctDNA, which in turn benefits fusion gene detection.
(3) Repairing ctDNA with the DNA damage repair system of the invention increases library conversion at the current library input amount, provides enough library for hybrid capture, and effectively ensures authentic and reliable results at a library input of 20 ng. The library construction method also removes 1-2 pre-PCR cycles from the original process. In downstream analysis of the output data, the proportion of duplicate sequences is reduced, the effective data volume is increased, and computing costs are lowered.
Drawings
FIG. 1 is an IGV view of the sample of Example 1 at ALK.
FIG. 2 is an IGV view of the sample of Example 1 at EML4.
FIG. 3 is a sequence obtained by segmentation in step 4 of example 1.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Where specific techniques or conditions are not indicated in the examples, those described in the literature in the field or in the product instructions are followed. Reagents or instruments whose manufacturers are not indicated are conventional commercially available products.
Example 1
This example provides a method for detecting fusion genes based on next-generation sequencing technology, using real sample data as an example. The sample carries the classic EML4-ALK fusion; the IGV views at ALK and EML4 are shown in FIG. 1 and FIG. 2, respectively. The detection method comprises the following steps:
1. Perform capture sequencing on the sample with a PE100 sequencing strategy;
2. Preprocess the raw sequencing data, align it, remove duplicates and extract uniquely aligned reads to form the final BAM file, and index the BAM file with Samtools;
3. Extract the soft-clipped portions of sequencing reads that may carry gene fusion signals: from the BAM file, extract the soft-clipped part of each read whose CIGAR string follows the 'MS' or 'SM' pattern;
4. Split the reads into read groups according to their breakpoint coordinates and clipping directions, and compare the reads within each group pairwise:
(1) Suppose the longest soft-clipped sequence at chromosome 2 coordinate 29446853 is "CTAAAAAGCATAAATGCCCATCT" (SEQ ID No. 1), 23 bases long. It can be divided into 23 sequences running from the breakpoint to the end of the read, as shown in FIG. 3; a sequence number is calculated for each sequence (see the Summary of the Invention for the method) and stored in an array;
(2) Initialize the variables T and F to 0. Traverse the soft-clipped sequences of the other reads originally aligned to the same position, computing each one's sequence number and checking whether it is in the array: if it is, add 1 to T; otherwise add 1 to F. After traversal, compare T and F. If T > 4 and T > 0.5 × F, use the original alignment position of the longest sequencing read together with the T value as the ID, and write the sequence and its base qualities to a new FASTQ file. For this sequence, with a T value of 11, the output is:
@Chr2_29446853_11
CTAAAAAGCATAAATGCCCATCT
+
EBDGEGEGEGEGEGEFEGEFEFEG
5. Realign the FASTQ file generated in the previous step to the reference genome to generate a new BAM file;
6. Filter by realignment coordinates and alignment quality: if the realignment score is below 10, or the sequence realigns to the same chromosome as the original sequence within 5000 of the original coordinate, the candidate fails the filter; otherwise it passes;
7. Annotate coordinates: annotate the original alignment coordinates and the realignment coordinates, and write them to the final result file;
8. Variant frequency and depth: take the T value as the number of variant-supporting sequences, take the depth at the original alignment coordinate as the depth, and take the ratio of variant-supporting sequences to depth as the variant frequency; write these values, together with the number of variant-supporting sequences, to the result file. The results are shown in Table 1.
TABLE 1
[Table 1 image not reproduced]
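In step 4(2) of Example 1, each passing read group is written as a FASTQ record whose ID joins the original chromosome, coordinate and T value. A minimal Python sketch of that formatting, assuming the `Chr<chromosome>_<coordinate>_<T>` pattern seen in @Chr2_29446853_11 applies generally; the function name `make_fastq_record` and the placeholder quality string are hypothetical:

```python
def make_fastq_record(chrom: str, pos: int, t_value: int, seq: str, qual: str) -> str:
    """Format one FASTQ record whose ID encodes position and T value.

    The ID pattern Chr<chrom>_<pos>_<T> mirrors the Example 1 output
    (@Chr2_29446853_11); this helper is a hypothetical illustration.
    """
    # FASTQ requires exactly one quality character per base.
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    header = f"@Chr{chrom}_{pos}_{t_value}"
    # Four-line FASTQ layout: header, sequence, separator, qualities.
    return "\n".join([header, seq, "+", qual]) + "\n"


if __name__ == "__main__":
    # Placeholder qualities ("I" * 23); real values would come from the BAM record.
    record = make_fastq_record("2", 29446853, 11, "CTAAAAAGCATAAATGCCCATCT", "I" * 23)
    print(record, end="")
```

The length check is there because FASTQ parsers expect the quality line to match the sequence line base for base.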
Test example 1
For the sample of Example 1, the detection method of Example 1 was compared with several commonly used fusion-detection tools; the results are shown in Table 2 below. The results show that the method of the invention has clear advantages in both accuracy and running speed.
TABLE 2
[Table 2 image not reproduced]
Example 2
The present embodiment provides a method for detecting fusion gene based on next-generation sequencing technology, which is used for detecting clinical peripheral blood plasma samples.
Specifically, the extraction method comprises the following steps: 1) dissolve proteinase K in advance in proteinase dissolution buffer at 1 mg per 550 μL to form a proteinase K solution, and add 20 μL of the proteinase K solution and 20 μL of magnetic bead suspension to a centrifuge tube; 2) transfer 300 μL of serum or plasma sample into the centrifuge tube; 3) add 450 μL of lysis/binding solution, vortex for 30 seconds, add 2 μL of nucleic acid precipitation aid, and mix by inversion at room temperature for 15 minutes; 4) place the tube on a magnetic rack for 3 minutes to capture the beads, then aspirate and discard the solution; 5) add 500 μL of first wash solution and vortex for 30 seconds; 6) place the tube on the magnetic rack for 3 minutes to capture the beads, then aspirate and discard the solution; 7) add 500 μL of second wash solution and vortex for 30 seconds; 8) place the tube on the magnetic rack for 3 minutes to capture the beads, then aspirate and discard the solution; 9) repeat steps 7) and 8) once; 10) centrifuge briefly to collect droplets from the tube wall, place the tube on the magnetic rack, and aspirate and discard the remaining solution; 11) air-dry for 10 minutes; 12) add 35 μL of eluent, vortex to resuspend the beads, and let stand at room temperature for 10 minutes, shaking 3 times during this period to accelerate DNA dissolution; 13) place the tube on the magnetic rack for 3 minutes and transfer the eluted DNA to a new centrifuge tube for storage; this eluted DNA is the cell-free circulating tumor DNA.
The nucleic acid precipitation aid contains 25 μg/μL carrier RNA and 3 M sodium acetate. The lysis/binding solution contains 10% sodium dodecyl sulfate, 45 mmol/L Tris-HCl, 120 mmol/L NaCl, 30 mmol/L disodium EDTA, 20 mol/L guanidine isothiocyanate, 3 mol/L potassium acetate and 6 wt% Tween 20, at pH 4.8. The proteinase K solution contains 50 mmol/L Tris-HCl and 115 mmol/L NaCl. The first wash solution contains 50 mmol/L Tris-HCl, 110 mmol/L NaCl, 50 mmol/L disodium EDTA and 0.2 wt% Triton, at pH 5.5; the second wash solution contains 50 mmol/L Tris-HCl and 80 vol% ethanol. The eluent is nuclease-free water.
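As a quick check on step 1), the amount of proteinase K added per extraction follows from the 1 mg per 550 μL preparation and the 20 μL addition. A minimal sketch, assuming the final solution volume equals the 550 μL of buffer (the enzyme's own volume is ignored); the function name is hypothetical:

```python
from typing import Tuple


def proteinase_k_amount(enzyme_mg: float = 1.0, buffer_ul: float = 550.0,
                        added_ul: float = 20.0) -> Tuple[float, float]:
    """Return (concentration in ug/uL, amount in ug delivered by added_ul).

    Assumes the dissolved enzyme occupies no extra volume, so the
    final volume equals buffer_ul.
    """
    conc_ug_per_ul = enzyme_mg * 1000.0 / buffer_ul  # 1 mg = 1000 ug
    return conc_ug_per_ul, conc_ug_per_ul * added_ul


if __name__ == "__main__":
    conc, amount = proteinase_k_amount()
    print(f"{conc:.2f} ug/uL, {amount:.1f} ug per 300 uL sample")
```

Under that assumption, each 300 μL plasma or serum sample receives roughly 36 μg of proteinase K.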
The detection method comprises the following steps:
1. Perform capture sequencing on the sample with a PE100 sequencing strategy;
2. Preprocess the raw sequencing data, align it, remove duplicates and extract uniquely aligned reads to form the final BAM file, and index the BAM file with Samtools;
3. Extract the soft-clipped portions of sequencing reads that may carry gene fusion signals: from the BAM file, extract the soft-clipped part of each read whose CIGAR string follows the 'MS' or 'SM' pattern;
4. Split the reads into read groups according to their breakpoint coordinates and clipping directions, and compare the reads within each group pairwise:
(1) Suppose the selected soft-clipped sequence in a read group has length n. It can be divided into n sequences running from the breakpoint to the end of the read; a sequence number is calculated for each sequence (see the Summary of the Invention for the method) and stored in an array;
(2) Initialize the variables T and F to 0. Traverse the soft-clipped sequences of the other reads originally aligned to the same position, computing each one's sequence number and checking whether it is in the array: if it is, add 1 to T; otherwise add 1 to F. After traversal, compare T and F. If T > 4 and T > 0.5 × F, use the original alignment position of the longest sequencing read together with the T value as the ID, and write the sequence and its base qualities to a new FASTQ file. For the sample of this example, the maximum T value among the read groups was 3, so no fusion variant was detected.
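Step 4(2) reduces each read group to two counters and a threshold test. The sketch below assumes the thresholds used in Examples 1-3 (T > 4 and T > 0.5 × F) as defaults. Any hashable key works for the membership test, so in the usage lines the soft-clipped strings themselves stand in for the sequence numbers; the function names and the demo reads are hypothetical:

```python
from typing import Hashable, Iterable, Set, Tuple


def count_support(reference_keys: Set[Hashable], query_keys: Iterable[Hashable]) -> Tuple[int, int]:
    """Return (T, F): how many query keys are / are not in the reference array."""
    t = f = 0
    for key in query_keys:
        if key in reference_keys:
            t += 1  # matches a sub-sequence of the selected soft clip
        else:
            f += 1  # no match in the array
    return t, f


def passes_group_filter(t: int, f: int, min_t: int = 4, f_ratio: float = 0.5) -> bool:
    """Group passes when T > min_t and T > f_ratio * F (defaults from the examples)."""
    return t > min_t and t > f_ratio * f


if __name__ == "__main__":
    longest = "CTAAAAAGCATAAATGCCCATCT"
    # Hypothetical reference array: breakpoint-anchored prefixes of the longest clip.
    reference = {longest[:k] for k in range(1, len(longest) + 1)}
    t, f = count_support(reference, ["CTAAA", "CTAAAAAGCA", "GGGG"])
    print(t, f, passes_group_filter(t, f))  # 2 1 False
```

With the Example 2 maximum of T = 3, `passes_group_filter(3, 0)` returns False, which matches the statement that no fusion was reported for that sample.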
Example 3
This example provides a method for detecting fusion genes based on second-generation sequencing technology, used to detect mutations in a sample with low mutation frequency.
Specifically, the extraction method comprises the following steps: 1) dissolve proteinase K in advance in proteinase dissolution buffer at 1 mg per 550 μL to form a proteinase K solution, and add 20 μL of the proteinase K solution and 20 μL of magnetic bead suspension to a centrifuge tube; 2) transfer 300 μL of serum or plasma sample into the centrifuge tube; 3) add 450 μL of lysis/binding solution, vortex for 30 seconds, add 2 μL of nucleic acid precipitation aid, and mix by inversion at room temperature for 15 minutes; 4) place the tube on a magnetic rack for 3 minutes to capture the beads, then aspirate and discard the solution; 5) add 500 μL of first wash solution and vortex for 30 seconds; 6) place the tube on the magnetic rack for 3 minutes to capture the beads, then aspirate and discard the solution; 7) add 500 μL of second wash solution and vortex for 30 seconds; 8) place the tube on the magnetic rack for 3 minutes to capture the beads, then aspirate and discard the solution; 9) repeat steps 7) and 8) once; 10) centrifuge briefly to collect droplets from the tube wall, place the tube on the magnetic rack, and aspirate and discard the remaining solution; 11) air-dry for 10 minutes; 12) add 35 μL of eluent, vortex to resuspend the beads, and let stand at room temperature for 10 minutes, shaking 3 times during this period to accelerate DNA dissolution; 13) place the tube on the magnetic rack for 3 minutes and transfer the eluted DNA to a new centrifuge tube for storage; this eluted DNA is the cell-free circulating tumor DNA.
The formulation of each reagent was the same as in Example 2.
DNA repair and library construction were then carried out with the damage repair enzyme system, as follows:
1) End repair and A-tailing
First, thaw the end repair buffer completely at room temperature, then vortex for 10 s and centrifuge briefly for 3 s;
the end repair buffer is a white solid when frozen at -20 ℃ and becomes a colorless transparent liquid once fully thawed at room temperature. If white crystals or particles remain in the liquid, extend the room-temperature equilibration time and vortex to accelerate their dissolution; use the buffer only after it has become a colorless clear liquid;
2) Prepare the end repair working solution according to Table 3 below:
TABLE 3
[Table 3 image not reproduced]
Here, the DNA damage repair enzyme is a mixture of UDG, endonuclease IV and T4 PDG; per μL of end repair working solution, the content of UDG is 3.5 U, of endonuclease IV 7 U, and of T4 PDG 7 U.
The end repair enzyme is a mixture of T4 DNA polymerase and PNK kinase, at 60 U and 150 U per μL of end repair working solution, respectively.
3) Add the end repair working solution (13 μL per sample) to the fragmented DNA, vortex for 10 s, centrifuge briefly for 3 s, program the PCR instrument according to Table 4 below, and incubate the end repair reaction, with the PCR instrument's heated lid at 85 ℃;
TABLE 4
[Table 4 image not reproduced]
4) While the end repair reaction is running, thaw the ligation buffer in a 4 ℃ refrigerator; take out the required adapters according to the task list and keep them in a 4 ℃ refrigerator;
5) After the end repair reaction is complete, remove the samples from the PCR instrument, press the PCR tube caps down firmly, vortex for 4 s, and centrifuge briefly for 3 s.
Prepare the ligation working solution (ligation buffer + ligase) according to Table 5 below.
TABLE 5
[Table 5 image not reproduced]
Vortex the prepared ligation working solution for 3 seconds, centrifuge briefly for 3 seconds, and keep it on ice;
add the adapter to the end repair reaction according to Table 6 below;
TABLE 6
[Table 6 image not reproduced]
After adding the adapter, vortex for 10 s, centrifuge briefly for 3 s, and add 45 μL of the prepared ligation working solution; then vortex for 10 seconds and centrifuge briefly for 3 seconds;
6) Program the PCR instrument for the ligation reaction according to Table 7 below, with the heated lid set to 45 ℃.
TABLE 7
[Table 7 image not reproduced]
7) Purify the ligation product with AMPure XP magnetic beads: thoroughly mix the AMPure XP bead suspension, add a 0.8× volume of the mixed suspension to a new 1.5 mL centrifuge tube, transfer 70 μL of ligation product into the tube, vortex for 5 s, and incubate at room temperature for 10 min; place on a magnetic rack for 3 min, remove the supernatant, wash twice with 80% ethanol, air-dry, and add 23 μL of nuclease-free water to elute the DNA.
8) During elution, prepare the PCR amplification working solution according to Table 8 below:
TABLE 8
[Table 8 image not reproduced]
Keep the prepared PCR amplification working solution on ice;
9) Place the centrifuge tube in a PCR instrument with the heated lid at 105 ℃ and run the reaction using the program in Table 9 below:
TABLE 9
[Table 9 image not reproduced]
10) Purify the PCR product with AMPure XP magnetic beads: thoroughly mix the AMPure XP bead suspension, add a 1.0× volume of the mixed suspension to a new 1.5 mL centrifuge tube, transfer 50 μL of PCR product into the tube, vortex for 5 s, and incubate at room temperature for 10 min; place on a magnetic rack for 3 min, remove the supernatant, wash twice with 80% ethanol, air-dry, and add 23 μL of nuclease-free water to elute the DNA.
11) Take 1 μL to measure the DNA concentration with a Qubit 3.0 Fluorometer, and transfer about 23 μL of the supernatant to a new 1.5 mL tube; the magnetic beads can be discarded at this step.
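The volumes and enzyme quantities in steps 3), 7) and 10) follow from simple arithmetic. A sketch, assuming that an AMPure XP ratio such as 0.8× means bead volume = ratio × sample volume, and that the 13 μL added in step 3) is end repair working solution at the per-μL contents given above; the dictionary and function names are hypothetical. It also checks those contents against the ranges recited in claims 7 and 9:

```python
from typing import Dict, Tuple

# Per-uL contents of the end repair working solution in Example 3 (units, U).
EXAMPLE3_U_PER_UL: Dict[str, float] = {
    "UDG": 3.5, "endonuclease IV": 7.0, "T4 PDG": 7.0,
    "T4 DNA polymerase": 60.0, "PNK": 150.0,
}
# Ranges (U per uL of repair working solution) recited in claims 7 and 9.
CLAIMED_RANGES: Dict[str, Tuple[float, float]] = {
    "UDG": (3.0, 4.0), "endonuclease IV": (6.0, 8.0), "T4 PDG": (6.0, 8.0),
    "T4 DNA polymerase": (50.0, 100.0), "PNK": (100.0, 200.0),
}


def bead_volume(sample_ul: float, ratio: float) -> float:
    """Bead-to-sample ratio convention: bead suspension volume = ratio x sample volume."""
    return ratio * sample_ul


def units_per_sample(u_per_ul: Dict[str, float], volume_ul: float) -> Dict[str, float]:
    """Total enzyme units delivered by volume_ul of working solution."""
    return {name: u * volume_ul for name, u in u_per_ul.items()}


def within_claimed_ranges(u_per_ul: Dict[str, float]) -> Dict[str, bool]:
    """True for each enzyme whose per-uL content lies inside its claimed range."""
    return {name: low <= u_per_ul[name] <= high
            for name, (low, high) in CLAIMED_RANGES.items()}


if __name__ == "__main__":
    print(bead_volume(70, 0.8), bead_volume(50, 1.0))  # beads for steps 7) and 10)
    print(units_per_sample(EXAMPLE3_U_PER_UL, 13))     # enzyme units per sample, step 3)
    print(within_claimed_ranges(EXAMPLE3_U_PER_UL))
```

Under these assumptions, step 7) uses about 56 μL of beads, step 10) uses 50 μL, and each sample receives 45.5 U of UDG.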
Detection was then carried out as follows:
1. Perform capture sequencing on the sample with a PE100 sequencing strategy;
2. Preprocess the raw sequencing data, align it, remove duplicates and extract uniquely aligned reads to form the final BAM file, and index the BAM file with Samtools;
3. Extract the soft-clipped portions of sequencing reads that may carry gene fusion signals: from the BAM file, extract the soft-clipped part of each read whose CIGAR string follows the 'MS' or 'SM' pattern;
4. Split the reads into read groups according to their breakpoint coordinates and clipping directions, and compare the reads within each group pairwise:
(1) Suppose the selected soft-clipped sequence in a read group has length n. It can be divided into n sequences running from the breakpoint to the end of the read; a sequence number is calculated for each sequence (see the Summary of the Invention for the method) and stored in an array;
(2) Initialize the variables T and F to 0. Traverse the soft-clipped sequences of the other reads originally aligned to the same position, computing each one's sequence number and checking whether it is in the array: if it is, add 1 to T; otherwise add 1 to F. After traversal, compare T and F. If T > 4 and T > 0.5 × F, use the original alignment position of the longest sequencing read together with the T value as the ID, and write the sequence and its base qualities to a new FASTQ file. For the sample of this example, the read group at chromosome 6 coordinate 117649657 exceeded the threshold; its sequence was written in FASTQ format and carried forward to the next step;
5. Realign the FASTQ file generated in the previous step to the reference genome to generate a new BAM file; the soft-clipped sequence (the other end of the fusion) realigned to chromosome 4 coordinate 25679566;
6. Filter by realignment coordinates and alignment quality: if the realignment score is below 10, or the sequence realigns to the same chromosome as the original sequence within 5000 of the original coordinate, the candidate fails the filter; otherwise it passes;
7. Annotate coordinates: annotate the original alignment coordinates and the realignment coordinates and write them to the final result file. For this sample, one end was annotated between ROS1 exon 32 and exon 33, and the other end to SLC34A2 exon 13;
8. Variant frequency and depth: take the T value as the number of variant-supporting sequences, take the depth at the original alignment coordinate as the depth, and take the ratio of variant-supporting sequences to depth as the variant frequency; write these values, together with the number of variant-supporting sequences, to the result file. The detection result is SLC34A2_13:ROS1_33, with a frequency of 0.036.
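Steps 6 and 8 reduce to a score/distance test and a ratio. A minimal sketch, assuming the alignment score is an integer comparable with the threshold of 10 and reading "within 5000" as a distance of at most 5000 bases. The class and function names, the score of 60, and the read counts in the usage lines are hypothetical; the two coordinates are those reported for the SLC34A2-ROS1 sample above:

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    pos: int


def passes_realignment_filter(original: Locus, realigned: Locus, score: int,
                              min_score: int = 10, min_distance: int = 5000) -> bool:
    """Step 6: reject low-scoring realignments and partners too close to the original."""
    if score < min_score:
        return False
    same_chrom = realigned.chrom == original.chrom
    if same_chrom and abs(realigned.pos - original.pos) <= min_distance:
        return False  # treated as too close to be a fusion partner
    return True


def variant_frequency(supporting_reads: int, depth: int) -> float:
    """Step 8: frequency = T / depth at the original alignment coordinate."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return supporting_reads / depth


if __name__ == "__main__":
    ros1 = Locus("6", 117649657)     # original alignment in Example 3
    slc34a2 = Locus("4", 25679566)   # realignment of the clipped part in Example 3
    print(passes_realignment_filter(ros1, slc34a2, score=60))  # True (hypothetical score)
    print(variant_frequency(11, 200))                          # 0.055 (hypothetical counts)
```

Because the two ends map to different chromosomes, the distance rule cannot reject this candidate; only a low realignment score could.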
Test example 2
The extraction method of Example 2 was used as the experimental group and the pre-optimization extraction method as the control group. Multiple samples were extracted, and the concentration and total amount of extracted nucleic acid were recorded. Each group was run in triplicate and the results were averaged. The results are shown in Table 10.
The pre-optimization extraction method differs from that of Example 2 only in that no nucleic acid precipitation aid was used.
TABLE 10
[Table 10 image not reproduced]
The results show that the nucleic acid extraction efficiency of the method of Example 2 is significantly higher than that of the non-optimized group, with the total amount of extracted nucleic acid increased 7.6-fold on average. The extraction method of the invention thus significantly improves nucleic acid extraction efficiency and suppresses interference from other matrix components in nucleic acid quantification.
Test example 3
The repair and library construction method of Example 3 was used as the experimental group, and direct library construction without repair (otherwise identical to Example 3) as the control group. Libraries were constructed from multiple DNA samples and library yields were recorded. Each group was run in triplicate and the results were averaged. The results are shown in Tables 11 and 12.
TABLE 11
[Table 11 image not reproduced]
TABLE 12
[Table 12 image not reproduced]
As shown in Tables 11 and 12, library construction from damage-repaired DNA is more efficient than direct library construction: for samples S1-S10, the average library yield after optimization is 248% of that obtained by direct library construction. For samples S11-S20, for which direct library construction did not yield enough library for hybrid capture, library construction after DNA damage repair reached the amount required for hybrid capture, with an average yield after optimization 11.9 times that before optimization.
Although the invention has been described in detail hereinabove with respect to a general description and specific embodiments thereof, it will be apparent to those skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Sequence listing
<110> Beijing Youxin medical instruments Co., Ltd
<120> circulating tumor DNA fusion detection method based on second-generation sequencing technology
<130> KHP211117803.4YS
<160> 1
<170> SIPOSequenceListing 1.0
<210> 1
<211> 23
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
ctaaaaagca taaatgccca tct 23

Claims (9)

1. A method for detecting fusion genes based on next-generation sequencing technology, characterized by comprising the following steps:
aligning the sequencing reads to a reference genome to obtain an original BAM file, and removing duplicate reads and reads aligned to multiple positions to obtain a final BAM file for further detection;
extracting reads containing soft clipping from the final BAM file, and splitting them into read groups according to their breakpoint coordinates and clipping directions;
comparing the reads within each read group pairwise, removing reads that are too short, have too low sequence similarity to the other reads of the group, or contain repetitive sequences, then extracting the read with the highest matching degree to the other reads for the next step, and taking the number of reads remaining in the read group as the number of variant-supporting reads; wherein the pairwise comparison of the reads within a read group specifically comprises:
(1) extracting the read in the group with the highest matching degree to the remaining reads; if its length is L, extracting L sequences starting from the breakpoint of the read with a step of 1 base, and converting each sequence into a number according to the following rules:
S1, constructing binary sequences: for a read, constructing a binary sequence for each of A, T, C and G, with 1 at positions where the read has that base and 0 elsewhere, giving a binary sequence of length L per base; then concatenating the 4 binary sequences of length L in the order A, T, C, G to obtain 1 binary sequence of length 4L;
S2, first defining a second-order (2 × 2) matrix [matrix image not reproduced] representing 1 and a second-order matrix [matrix image not reproduced] representing 0; then, following the order of the 4L-length binary sequence obtained in the previous step, multiplying the corresponding matrices in turn to obtain a single second-order matrix, and using this matrix to left-multiply the weight matrix [matrix image not reproduced] to obtain the final matrix; calculating the trace of the final matrix and defining it as the sequence number of the sequence; calculating the L sequence numbers and storing them in an array;
(2) setting the initial values of the variables T and F to 0, traversing in turn the remaining soft-clipped sequences of the reads originally aligned to the same coordinate, calculating the sequence number of each and checking whether it exists in the array; if so, adding 1 to T, otherwise adding 1 to F; after traversal, comparing T and F: if T is greater than a threshold and greater than a set multiple of F, the soft-clipped sequences of the group are considered to pass the filter, and the original alignment position of the longest sequencing read together with the T value is used as the ID to output the sequence and its base qualities to a new FASTQ file; wherein the T value is the number of variant-supporting sequences;
realigning the newly generated FASTQ file from the previous step to the reference genome, and excluding the candidate from further detection if the alignment score is too low or the realigned genomic position is too close to the original alignment position;
annotating the original alignment coordinates and the realignment coordinates; taking the depth at the original alignment coordinate as the depth and the ratio of the number of variant-supporting reads to the depth as the variant frequency, and outputting them to a result file.
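Steps S1 and S2 of claim 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the three 2 × 2 matrices appear only as images in the source, so `ONE`, `ZERO` and `WEIGHT` below are hypothetical stand-ins, and the L sub-sequences are assumed to be the prefixes of the soft clip anchored at the breakpoint (lengths 1 to L). With other matrices the numbers change, and this sketch does not rule out two different sequences sharing the same trace:

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical stand-ins: the patent's actual matrices are not reproduced in the text.
ONE: Matrix = [[1, 1], [0, 1]]     # represents bit 1
ZERO: Matrix = [[1, 0], [1, 1]]    # represents bit 0
WEIGHT: Matrix = [[1, 2], [3, 4]]  # weight matrix


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 integer matrices."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def binary_encode(seq: str) -> List[int]:
    """S1: one indicator sequence per base, concatenated in A, T, C, G order (length 4L)."""
    bits: List[int] = []
    for base in "ATCG":
        bits.extend(1 if b == base else 0 for b in seq.upper())
    return bits


def sequence_number(seq: str) -> int:
    """S2: multiply ONE/ZERO in bit order, then P x WEIGHT, and return the trace."""
    product: Matrix = [[1, 0], [0, 1]]  # identity
    for bit in binary_encode(seq):
        product = matmul(product, ONE if bit else ZERO)
    final = matmul(product, WEIGHT)     # the product left-multiplies the weight matrix
    return final[0][0] + final[1][1]


def anchored_sequence_numbers(clip: str) -> List[int]:
    """Sequence numbers of the L breakpoint-anchored prefixes (assumed reading of claim 1)."""
    return [sequence_number(clip[:k]) for k in range(1, len(clip) + 1)]


if __name__ == "__main__":
    array = anchored_sequence_numbers("CTAAAAAGCATAAATGCCCATCT")
    print(len(array), sequence_number("CTAAA") in array)  # 23 True
```

Python integers have arbitrary precision, so the rapidly growing matrix entries for a 23-base clip (a 92-step product) cannot overflow; a fixed-width implementation would need modular arithmetic or a similar safeguard.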
2. The method for detecting fusion genes based on next-generation sequencing technology according to claim 1, wherein extracting and grouping the reads containing soft clipping specifically comprises:
determining the alignment mode of each sequencing read from its CIGAR information: if the read has no soft clipping, the CIGAR mode is M; if the read is soft-clipped on the left, the mode is SM; if it is soft-clipped on the right, the mode is MS; reading the soft-clipped reads into a hash table with the original alignment chromosome and coordinate as the key and the soft-clipped sequence as the value, the hash table also retaining the strand of each read and the base qualities of the soft-clipped portion.
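The grouping in claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation, under these assumptions: reads are passed in as plain records (in practice they would come from a BAM parser), the key adds the clip direction (SM or MS) to the chromosome and coordinate so that left- and right-clipped reads at the same position form separate groups, as step 4 of the examples requires, and the breakpoint is taken as the first aligned base for SM reads and the last aligned base for MS reads. All names are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


class Read(NamedTuple):
    chrom: str
    pos: int          # 1-based leftmost aligned reference position (SAM POS)
    cigar: str
    seq: str
    qual: str
    is_reverse: bool


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def cigar_mode(cigar: str) -> str:
    """Collapse a CIGAR to its M/S pattern: '10S90M' -> 'SM', '90M10S' -> 'MS', '100M' -> 'M'."""
    mode = ""
    for _, op in parse_cigar(cigar):
        sym = "M" if op in "M=X" else ("S" if op == "S" else "")
        if sym and not mode.endswith(sym):
            mode += sym
    return mode


def group_soft_clips(reads: List[Read]) -> Dict[Tuple[str, int, str], List[Tuple[str, str, str]]]:
    """Hash table keyed by (chromosome, breakpoint, mode); values are (clip, strand, qualities)."""
    groups: Dict[Tuple[str, int, str], List[Tuple[str, str, str]]] = defaultdict(list)
    for read in reads:
        mode = cigar_mode(read.cigar)
        ops = [(n, op) for n, op in parse_cigar(read.cigar) if op != "H"]  # hard clips are absent from SEQ
        strand = "-" if read.is_reverse else "+"
        if mode == "SM":
            n = ops[0][0]                       # left soft-clip length
            bp = read.pos                       # first aligned base
            clip, qual = read.seq[:n], read.qual[:n]
        elif mode == "MS":
            n = ops[-1][0]                      # right soft-clip length
            ref_len = sum(k for k, op in ops if op in REF_CONSUMING)
            bp = read.pos + ref_len - 1         # last aligned base
            clip, qual = read.seq[-n:], read.qual[-n:]
        else:
            continue                            # unclipped, or clipped on both sides
        groups[(read.chrom, bp, mode)].append((clip, strand, qual))
    return dict(groups)
```

Keeping the strand and the clipped base qualities in each value mirrors the claim, and the qualities are exactly what a later step needs when writing a passing group to FASTQ.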
3. The method for detecting fusion genes based on next-generation sequencing technology according to claim 1 or 2, wherein, for detecting fusion genes in ctDNA, the method further comprises:
extracting ctDNA from the sample to be tested using a nucleic acid precipitation aid, and sequencing it to obtain sequencing reads;
the nucleic acid precipitation aid contains 1-5 μg/μL carrier RNA and 3 ± 0.5 M sodium acetate.
4. The method for detecting fusion genes based on next-generation sequencing technology according to claim 3, wherein ctDNA in the sample to be tested is extracted by a magnetic bead method, specifically comprising:
mixing the sample to be tested with a proteinase K solution, a magnetic bead suspension, a lysis/binding solution and the nucleic acid precipitation aid so that the ctDNA binds to the magnetic beads, washing the magnetic beads, and finally eluting the ctDNA from the magnetic beads;
the lysis/binding solution contains 1-10% sodium dodecyl sulfate, 45 mmol/L Tris-HCl, 120 mmol/L NaCl, 30 mmol/L disodium EDTA, 10-30 mol/L guanidine isothiocyanate, 2-4 mol/L potassium acetate and 5-10 wt% Tween 20, at pH 4.8 ± 0.2.
5. The method for detecting fusion genes based on the next-generation sequencing technology according to claim 4, wherein the proteinase K solution contains 45-75 mmol/L Tris-HCl and 100-120 mmol/L NaCl.
6. The method for detecting fusion genes based on next-generation sequencing technology according to claim 4 or 5, wherein washing the magnetic beads specifically comprises: washing the magnetic beads with the first wash solution and then with the second wash solution;
the first wash solution contains 45-75 mmol/L Tris-HCl, 100-120 mmol/L NaCl, 30-60 mmol/L disodium EDTA and 1.5 ± 0.2 wt% Triton, at pH 5.5 ± 0.2;
the second wash solution contains 45-75 mmol/L Tris-HCl and 75 ± 5 vol% ethanol;
and/or, for elution, the eluent is nuclease-free water.
7. The method for detecting fusion genes based on next-generation sequencing technology according to claim 1 or 2, wherein detecting fusion genes of ctDNA or damaged DNA further comprises:
repairing the extracted DNA with a repair working solution before sequencing;
the repair working solution contains a DNA damage repair enzyme, which is a mixture of UDG, endonuclease IV and T4 PDG; per µL of repair working solution, the content of UDG is 3-4 U, the content of endonuclease IV is 6-8 U, and the content of T4 PDG is 6-8 U.
8. The method for detecting fusion genes based on next-generation sequencing technology according to claim 7, further comprising: constructing a library from the repaired DNA to obtain a DNA library.
9. The method for detecting fusion genes based on next-generation sequencing technology according to claim 8, wherein the extracted DNA is repaired with the repair working solution during construction of the DNA library;
the repair working solution contains the DNA damage repair enzyme and a DNA end repair enzyme; the DNA end repair enzyme is a mixture of T4 DNA polymerase and PNK kinase; per µL of repair working solution, the content of T4 DNA polymerase is 50-100 U and the content of PNK kinase is 100-200 U.
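Because claims 7 and 9 state enzyme content per µL of repair working solution, the total activity delivered by a given volume is a simple multiplication. The sketch below uses a hypothetical `enzyme_units` helper and an arbitrary 5 µL example volume; it assumes the per-µL ranges apply to the working solution as dispensed, and combines both enzyme sets as in the claim 9 working solution.

```python
from typing import Dict, Tuple

# Enzyme content per µL of repair working solution (U/µL), transcribed from claims 7 and 9.
# The claim 9 working solution contains both the damage repair and the end repair enzymes.
UNITS_PER_UL: Dict[str, Tuple[float, float]] = {
    "UDG": (3, 4),                    # claim 7, DNA damage repair
    "endonuclease IV": (6, 8),        # claim 7, DNA damage repair
    "T4 PDG": (6, 8),                 # claim 7, DNA damage repair
    "T4 DNA polymerase": (50, 100),   # claim 9, DNA end repair
    "PNK kinase": (100, 200),         # claim 9, DNA end repair
}


def enzyme_units(volume_ul: float) -> Dict[str, Tuple[float, float]]:
    """Total units (min, max) of each enzyme contained in volume_ul of working solution."""
    if volume_ul < 0:
        raise ValueError("volume must be non-negative")
    return {name: (lo * volume_ul, hi * volume_ul)
            for name, (lo, hi) in UNITS_PER_UL.items()}


# Example: units supplied by 5 µL of working solution (first line printed: "UDG: 15-20 U").
for name, (lo, hi) in enzyme_units(5).items():
    print(f"{name}: {lo:g}-{hi:g} U")
```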

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111640988.4A CN114005490B (en) 2021-12-30 2021-12-30 Circulating tumor DNA fusion detection method based on second-generation sequencing technology

Publications (2)

Publication Number Publication Date
CN114005490A (en) 2022-02-01
CN114005490B (en) 2022-04-22

Family

ID=79932273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111640988.4A Active CN114005490B (en) 2021-12-30 2021-12-30 Circulating tumor DNA fusion detection method based on second-generation sequencing technology

Country Status (1)

Country Link
CN (1) CN114005490B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116564415B (en) * 2023-07-10 2023-10-17 深圳华大基因科技服务有限公司 Stream sequencing analysis method, device, storage medium and computer equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106845150A (en) * 2016-12-29 2017-06-13 安诺优达基因科技(北京)有限公司 Device for detecting gene fusion in circulating tumor DNA samples
CN107368708A (en) * 2017-08-14 2017-11-21 东莞博奥木华基因科技有限公司 Method and system for precisely analyzing structural variation breakpoints of the DMD gene
CN111326212A (en) * 2020-02-18 2020-06-23 福建和瑞基因科技有限公司 Detection method of structural variation
CN112687341A (en) * 2021-03-12 2021-04-20 上海思路迪医学检验所有限公司 Method for identifying chromosome structure variation by taking breakpoint as center
CN112802553A (en) * 2020-12-29 2021-05-14 北京优迅医疗器械有限公司 Method for comparing genome sequencing sequence and reference genome based on suffix tree algorithm

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10964410B2 (en) * 2017-05-25 2021-03-30 Koninklijke Philips N.V. System and method for detecting gene fusion

Also Published As

Publication number Publication date
CN114005490A (en) 2022-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant