CN111128304A - Quality detection method and device for second-generation sequencing data - Google Patents

Quality detection method and device for second-generation sequencing data

Info

Publication number
CN111128304A
Authority
CN
China
Prior art keywords
target sample
qualified
detected
data
evaluation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911292413.0A
Other languages
Chinese (zh)
Inventor
孙丰龙
吕小莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital China Health Technologies Co ltd
Original Assignee
Digital China Health Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital China Health Technologies Co ltd
Priority to CN201911292413.0A
Publication of CN111128304A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application provides a quality detection method and a quality detection device for second-generation sequencing data. The method comprises the following steps: acquiring data to be detected of a target sample; determining evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistic and hybridization capture quality value; and determining whether the target sample is a qualified sample according to the evaluation information.

Description

Quality detection method and device for second-generation sequencing data
Technical Field
The application relates to the field of data analysis, in particular to a quality detection method and device for second-generation sequencing data.
Background
With the continuous development of second-generation sequencing technology, its price keeps falling. Human WES (whole exome sequencing) and WGS (whole genome sequencing) are increasingly used in the diagnosis of genetic diseases and cancer. However, there are currently more than two hundred sequencing service providers in the domestic market, and each applies different quality control standards to laboratory library construction and subsequent bioinformatics analysis, which seriously affects the subsequent interpretation of genetic disease loci. The cost of human genome sequencing is currently falling; in particular, WES costs about one third as much as WGS, so the volume of sequencing data will inevitably grow. How to establish a strict and complete quality control system for sequencing data has therefore become a bottleneck for the development of the industry.
In the prior art, data quality is controlled by applying self-defined thresholds to individual sub-data items of the second-generation sequencing data, so the reliability and validity of the data results cannot be guaranteed.
Disclosure of Invention
In view of this, an object of the present application is to provide a method and an apparatus for detecting quality of second-generation sequencing data, so as to solve the problem of how to improve the reliability of the quality control result of the second-generation sequencing data in the prior art.
In a first aspect, an embodiment of the present application provides a method for detecting quality of second-generation sequencing data, where the method includes:
acquiring to-be-detected data of a target sample;
determining evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistic and hybridization capture quality value;
and determining whether the target sample is a qualified sample according to the evaluation information.
According to the first aspect, the embodiments of the present application provide a first possible implementation manner of the first aspect, wherein determining whether the target sample is a qualified sample according to the evaluation information includes:
judging whether each kind of sub-evaluation information is in a qualified state;
and if every kind of sub-evaluation information is in a qualified state, determining that the target sample is a qualified sample.
According to a first possible embodiment of the first aspect, the present examples provide a second possible embodiment of the first aspect, wherein whether the base distribution ratio is in a qualified state is determined as follows:
calculating the base distribution ratio of the target sample according to the data to be detected, and judging whether the base distribution ratio meets a first condition; the first condition is that the base distribution proportion of the target sample does not exceed a preset base distribution proportion interval;
if the data to be detected meet the first condition, determining that the base distribution ratio is in a qualified state;
and if the base distribution ratio and other sub-evaluation information are in qualified states, determining that the target sample is a qualified sample.
According to the first possible embodiment of the first aspect, the embodiments of the present application provide a third possible embodiment of the first aspect, wherein whether the base number ratio is in a qualified state is determined as follows:
calculating the base number proportion of each quality value of the target sample according to the data to be detected, and judging whether the base number proportion meets a second condition; the second condition is that the base number proportion of each quality value of the target sample does not exceed the corresponding preset base number proportion interval;
if the data to be detected meet the second condition, determining that the base number ratio is in a qualified state;
and if the base number proportion and other sub-evaluation information are in qualified states, determining the target sample as a qualified sample.
According to the first possible embodiment of the first aspect, the embodiments of the present application provide a fourth possible embodiment of the first aspect, wherein whether the high-quality alignment ratio is in a qualified state is determined as follows:
performing genome alignment on the data to be detected to obtain converted data to be detected;
calculating the high-quality alignment ratio of the target sample according to the converted data to be detected, and judging whether the high-quality alignment ratio meets a third condition; the third condition is that the high-quality alignment ratio of the target sample is not less than a preset alignment ratio threshold;
if the converted data to be detected meets the third condition, determining that the high-quality alignment ratio is in a qualified state;
and if the high-quality alignment ratio and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
According to the first possible embodiment of the first aspect, the embodiments of the present application provide a fifth possible embodiment of the first aspect, wherein whether the cross contamination statistic is in a qualified state is determined as follows:
performing genome alignment on the data to be detected to obtain converted data to be detected;
calculating a cross contamination statistic value of the target sample according to the to-be-detected conversion data, and judging whether the cross contamination statistic value meets a fourth condition; the fourth condition is that the cross contamination statistic value of the target sample is not greater than a preset cross contamination statistic threshold value;
if the conversion data to be detected meets the fourth condition, determining that the cross contamination statistic value is in a qualified state;
and if the cross contamination statistic value and other sub-evaluation information are both in a qualified state, determining that the target sample is a qualified sample.
According to the first possible embodiment of the first aspect, the embodiments of the present application provide a sixth possible embodiment of the first aspect, wherein whether the hybridization capture quality value is in a qualified state is determined as follows:
performing genome alignment on the data to be detected to obtain converted data to be detected;
calculating a plurality of hybridization capture quality values of the target sample according to the to-be-detected conversion data, and judging whether the plurality of hybridization capture quality values meet a fifth condition; the fifth condition is that each hybridization capture quality value of the target sample is within a corresponding preset hybridization capture quality value interval;
if the conversion data to be detected meets the fifth condition, determining that the hybridization capture quality value is in a qualified state;
and if the hybridization capture quality value and the other sub-evaluation information are both in a qualified state, determining that the target sample is a qualified sample.
In a second aspect, the present application provides an apparatus for detecting quality of second-generation sequencing data, the apparatus including:
the acquisition module is used for acquiring to-be-detected data of the target sample;
the calculation module is used for determining evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistic and hybridization capture quality value;
and the judging module is used for determining whether the target sample is a qualified sample according to the evaluation information.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of the method according to any one of the first aspect and possible implementation manners when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of the method of the first aspect or any of its possible implementations.
According to the quality detection method for the second-generation sequencing data, the obtained data to be detected of the target sample is analyzed, the evaluation information of the target sample is determined, and whether the target sample is a qualified sample is determined according to at least two kinds of evaluation sub-information contained in the evaluation information. The quality detection method for the second-generation sequencing data, provided by the embodiment of the application, can effectively detect the sample with the problem in the second-generation sequencing data, and improve the reliability of the quality control result of the second-generation sequencing data, so that the availability of the qualified sample after quality control is improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic flow chart of a method for detecting quality of second-generation sequencing data according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a method for detecting quality of second-generation sequencing data according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an apparatus for detecting quality of second-generation sequencing data according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a quality detection method for second-generation sequencing data, as shown in FIG. 1, comprising the following steps:
step S101, acquiring data to be detected of a target sample;
step S102, determining evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistic and hybridization capture quality value;
and step S103, determining whether the target sample is a qualified sample according to the evaluation information.
Specifically, to ensure the reliability of the quality control result, the embodiment of the present application uses several indexes of the second-generation sequencing data as the basis for quality control of the sample. The evaluation information of the target sample is calculated from the data to be detected, and at least two kinds of sub-evaluation information among the base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistic and hybridization capture quality value are used to determine whether the target sample is qualified.
In an alternative embodiment, the step S103 of determining whether the target sample is a qualified sample according to the evaluation information includes, as shown in fig. 2:
step S1031, judging whether each kind of sub-evaluation information is in a qualified state;
step S1032, if every kind of sub-evaluation information is in a qualified state, determining that the target sample is a qualified sample.
To ensure the availability of the target sample, every kind of sub-evaluation information in the evaluation information must be in a qualified state before the target sample is determined to be a qualified sample.
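The all-must-pass rule of steps S1031 and S1032 can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the function name `is_qualified_sample` and the dictionary keys are hypothetical.

```python
from typing import Dict


def is_qualified_sample(sub_evaluations: Dict[str, bool]) -> bool:
    """Return True only if every kind of sub-evaluation information passed.

    Keys are hypothetical metric names; values say whether that metric
    is in a qualified state.
    """
    # The method requires at least two kinds of sub-evaluation information.
    if len(sub_evaluations) < 2:
        raise ValueError("at least two kinds of sub-evaluation information are required")
    # Step S1032: the sample is qualified only if every metric is qualified.
    return all(sub_evaluations.values())


example = {
    "base_distribution_ratio": True,
    "base_number_ratio": True,
    "high_quality_alignment_ratio": False,
}
print(is_qualified_sample(example))  # False: one metric is not qualified
```

A single failing metric is enough to reject the sample, which is what distinguishes this scheme from thresholding one index in isolation.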
The qualified state of each kind of sub-evaluation information is determined using a standard library built from standard samples in the 1000G database (1000 Genomes Project) and the ExAC database (Exome Aggregation Consortium), together with quality control index thresholds set on the basis of that standard library.
In an alternative embodiment, whether the base distribution ratio is in the acceptable state is determined as follows:
step 2011, calculating the base distribution ratio of the target sample according to the data to be detected, and judging whether the base distribution ratio meets a first condition; the first condition is that the base distribution ratio of the target sample does not fall outside a preset base distribution ratio interval;
step 2012, if the data to be detected meets the first condition, determining that the base distribution ratio is in a qualified state;
step 2013, if the base distribution ratio and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
Specifically, the acquired data to be detected of the target sample is an initial FastQ-formatted file, and the base distribution ratio of the target sample can be obtained by analyzing the data in the FastQ-formatted file.
For example, for target samples all of which are chinese, based on the relevant data of chinese in the 1000G database and the ExAC database, the average value of the base distribution ratio of the standard sample can be calculated as the standard value of the base distribution ratio, and the preset base distribution ratio interval is determined according to the allowable error range.
If the base distribution ratio of the target sample falls outside the preset base distribution ratio interval, the experimental sequencing process of the target sample may be problematic, and the sample cannot be used as a qualified sample.
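As a rough sketch of steps 2011 to 2013, the following code computes a per-base fraction from FastQ records and checks it against preset intervals. The text does not define the base distribution ratio precisely, so treating it as the fraction of each base over all sequenced bases is an assumption, as are the function names and the interval values in the example. The sketch also assumes the common four-lines-per-record FastQ layout.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def base_distribution(fastq_lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of each base over all sequence lines (4 lines per record assumed)."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts.update(line.strip().upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases found")
    return {base: n / total for base, n in counts.items()}


def within_intervals(ratios: Dict[str, float],
                     intervals: Dict[str, Tuple[float, float]]) -> bool:
    """First condition: every base fraction lies inside its preset interval."""
    return all(lo <= ratios.get(base, 0.0) <= hi
               for base, (lo, hi) in intervals.items())


records = ["@r1", "ACGT", "+", "IIII",
           "@r2", "AACC", "+", "IIII"]
ratios = base_distribution(records)
# Hypothetical intervals, chosen only for illustration.
preset = {base: (0.1, 0.4) for base in "ACGT"}
print(ratios["A"], within_intervals(ratios, preset))  # 0.375 True
```

In practice the intervals would come from the standard value of the standard samples plus the allowable error range described above.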
In an alternative embodiment, whether the base number ratio is in the qualified state is determined as follows:
step 2021, calculating the ratio of the number of bases of each quality value of the target sample according to the data to be detected, and judging whether the ratio of the number of bases meets a second condition; the second condition is that the ratio of the number of bases of each quality value of the target sample does not exceed the corresponding preset base number ratio interval;
step 2022, if the data to be detected satisfies the second condition, determining that the ratio of the number of bases is in a qualified state;
step 2023, if the base number ratio and other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
Specifically, the base number ratio of Q20 and Q30 of the target sample can be found by analyzing the data in the file in the FastQ format.
Based on the data of Chinese individuals in the 1000G database, the mean and standard deviation of the Q20 base number ratio of the standard samples were calculated to be 0.907234047 and 0.030598, respectively, and the mean and standard deviation of the Q30 base number ratio were 0.632925 and 0.062931, respectively. These serve as the standard values of the base number ratio. Following the 3σ principle (mean minus three standard deviations as the lower bound), the preset interval of the Q20 base number ratio is (0.815440047, 1) and the preset interval of the Q30 base number ratio is (0.444132, 1).
If the Q20 or Q30 base number ratio of the target sample falls outside the corresponding preset base number ratio interval, the sequencing quality of the target sample is poor and it cannot be used as a qualified sample.
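The interval arithmetic and the Q20/Q30 calculation can be sketched as follows. The means and standard deviations are the ones stated in the text; the function names, the Phred+33 quality encoding and the four-lines-per-record FastQ layout are assumptions made for this illustration.

```python
from typing import Iterable, Tuple


def q_ratio(fastq_lines: Iterable[str], min_q: int, phred_offset: int = 33) -> float:
    """Fraction of bases whose Phred quality is at least min_q (Phred+33 assumed)."""
    total = 0
    passing = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the fourth line of each record holds the qualities
            for ch in line.strip():
                total += 1
                if ord(ch) - phred_offset >= min_q:
                    passing += 1
    if total == 0:
        raise ValueError("no quality values found")
    return passing / total


def three_sigma_interval(mean: float, std: float) -> Tuple[float, float]:
    """Lower bound is mean - 3*std; the upper bound is 1, as in the text."""
    return (max(0.0, mean - 3 * std), 1.0)


q20_interval = three_sigma_interval(0.907234047, 0.030598)
q30_interval = three_sigma_interval(0.632925, 0.062931)
print(round(q20_interval[0], 9), round(q30_interval[0], 6))  # 0.815440047 0.444132

# One toy record with qualities 40, 40, 20 and 10.
record = ["@r1", "ACGT", "+", "II5+"]
q20 = q_ratio(record, 20)  # 3 of 4 bases reach Q20
q30 = q_ratio(record, 30)  # 2 of 4 bases reach Q30
second_condition = (q20_interval[0] < q20 <= 1.0) and (q30_interval[0] < q30 <= 1.0)
print(q20, q30, second_condition)  # 0.75 0.5 False
```

The toy record fails because its Q20 ratio of 0.75 lies below the lower bound of 0.815440047.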
In an alternative embodiment, whether the high-quality alignment ratio is in the qualified state is determined as follows:
step 2031, performing genome alignment on the data to be detected to obtain converted data to be detected;
step 2032, calculating the high-quality alignment ratio of the target sample according to the converted data to be detected, and judging whether the high-quality alignment ratio meets a third condition; the third condition is that the high-quality alignment ratio of the target sample is not less than a preset alignment ratio threshold;
step 2033, if the converted data to be detected meets the third condition, determining that the high-quality alignment ratio is in a qualified state;
step 2034, if the high-quality alignment ratio and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
Specifically, a Bam-format file of the data to be detected of the target sample is obtained by performing genome alignment on the FastQ-format file, and the high-quality alignment ratio of the target sample, namely the proportion of high-quality reads that align to the reference genome, can be obtained by analyzing the data in the Bam-format file.
By analyzing the standard samples, the average proportion of high-quality reads aligned to the reference genome is 0.991966 and the minimum is 0.976797, so the preset alignment ratio threshold is chosen between these two values. It is preferably 0.98 in the embodiment of the present application.
If the high-quality alignment ratio of the target sample is less than 0.98, the target sample is judged to be contaminated by DNA from other species.
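A minimal sketch of the third condition follows, working on SAM text records (the plain-text counterpart of a Bam file) so that it needs no third-party library. The text does not say what makes a read "high-quality"; treating it as a mapped primary alignment with mapping quality of at least 20 is an assumption, as is the function name.

```python
from typing import Iterable


def high_quality_alignment_ratio(sam_lines: Iterable[str], min_mapq: int = 20) -> float:
    """Share of reads that are mapped with MAPQ >= min_mapq (assumed definition)."""
    total = 0
    high_quality = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        if flag & (0x100 | 0x800):
            continue  # skip secondary and supplementary records
        total += 1
        if not (flag & 0x4) and mapq >= min_mapq:  # 0x4 marks an unmapped read
            high_quality += 1
    if total == 0:
        raise ValueError("no alignment records found")
    return high_quality / total


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t5\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tchr1\t300\t40\t4M\t*\t0\t0\tACGT\tIIII",
]
ratio = high_quality_alignment_ratio(sam)
print(ratio, ratio >= 0.98)  # 0.5 False
```

Only r1 and r4 count as high-quality here, so the toy sample falls well below the preferred threshold of 0.98.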
In an alternative embodiment, whether the cross contamination statistic is in the qualified state is determined as follows:
step 2041, performing genome alignment on the data to be detected to obtain converted data to be detected;
step 2042, calculating the cross contamination statistic of the target sample according to the converted data to be detected, and judging whether the cross contamination statistic meets a fourth condition; the fourth condition is that the cross contamination statistic of the target sample is not greater than a preset cross contamination statistic threshold;
step 2043, if the converted data to be detected meets the fourth condition, determining that the cross contamination statistic is in a qualified state;
step 2044, if the cross contamination statistic and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
Specifically, genome alignment is performed on the FastQ-format file to obtain a Bam-format file of the data to be detected of the target sample, and the cross contamination statistic of the target sample can be obtained by analyzing the data in the Bam-format file.
The cross-contamination statistical threshold is set from the data in the ExAC database, preferably the calculated cross-contamination statistical threshold is set to 0.075.
When the cross-contamination statistic for the target sample is greater than 0.075, the target sample is already contaminated and cannot be used as a qualified sample.
In an alternative embodiment, whether the hybridization capture quality value is in the qualified state is determined as follows:
step 2051, performing genome alignment on the data to be detected to obtain converted data to be detected;
step 2052, calculating a plurality of hybridization capture quality values of the target sample according to the to-be-detected conversion data, and judging whether the plurality of hybridization capture quality values meet a fifth condition; the fifth condition is that each hybridization capture quality value of the target sample is within a corresponding preset hybridization capture quality value interval;
step 2053, if the conversion data to be detected meets the fifth condition, determining that the hybridization capture quality value is in a qualified state;
and step 2054, if the hybridization capture quality value and the other sub-evaluation information are both in a qualified state, determining that the target sample is a qualified sample.
Specifically, a Bam-format file of the data to be detected of the target sample is obtained by performing genome alignment on the FastQ-format file, and a plurality of hybridization capture quality values of the target sample can be obtained by analyzing the data in the Bam-format file, including: the average target capture region depth, the enrichment factor (fold enrichment), the percentage of the target design region that is uncovered, and the percentage of total reads that align uniquely to the reference genome.
For the average target capture region depth, the standard samples from the 1000G database give an average of 106.7 and a minimum of 50, so the preset average target capture region depth can be set to 50; if the average target capture region depth of the target sample is lower than 50, the target sample is unqualified.
The lowest enrichment factor calculated from the standard samples of the 1000G database is 18.4; preferably, the preset enrichment factor is set to 15.
The maximum uncovered percentage of the target design region calculated from the standard samples of the 1000G database is 10.4%; preferably, the preset uncovered percentage is set to 10%, and if the uncovered percentage of the target sample exceeds 10%, the target sample is unqualified.
The percentage of total reads that align uniquely to the reference genome varies widely across the standard samples of the 1000G database, but is generally greater than 70%. The preset percentage is therefore preferably set to 70%, and if this index of the target sample is less than 70%, the target sample is unqualified.
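The four preferred thresholds above (50, 15, 10% and 70%) can be collected into a single check, sketched below. The class and field names are hypothetical, and reading the enrichment factor threshold of 15 as a lower bound is an assumption, since the text states only the preset value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaptureMetrics:
    mean_target_depth: float     # average target capture region depth
    fold_enrichment: float       # enrichment factor
    pct_target_uncovered: float  # percent of the target design region uncovered
    pct_unique_aligned: float    # percent of total reads aligned uniquely


def capture_failures(m: CaptureMetrics) -> List[str]:
    """Return the names of the hybridization capture metrics that fail."""
    failures = []
    if m.mean_target_depth < 50:
        failures.append("mean_target_depth")
    if m.fold_enrichment < 15:  # assumed lower bound
        failures.append("fold_enrichment")
    if m.pct_target_uncovered > 10:
        failures.append("pct_target_uncovered")
    if m.pct_unique_aligned < 70:
        failures.append("pct_unique_aligned")
    return failures


# Hypothetical samples, for illustration only.
good = CaptureMetrics(106.7, 18.4, 5.0, 85.0)
bad = CaptureMetrics(45.0, 18.4, 12.0, 85.0)
print(capture_failures(good))  # []
print(capture_failures(bad))   # ['mean_target_depth', 'pct_target_uncovered']
```

The fifth condition is met only when the returned list is empty; returning the failing metric names rather than a bare boolean makes it easier to see why a sample was rejected.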
The embodiment of the present application provides a quality detection apparatus for second-generation sequencing data, as shown in fig. 3, the apparatus includes:
the acquiring module 30 is used for acquiring to-be-detected data of the target sample;
the calculating module 31 is configured to determine evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistic and hybridization capture quality value;
and the judging module 32 is used for determining whether the target sample is a qualified sample according to the evaluation information.
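One possible way to wire the three modules together in software is sketched below. The class name, the callable-based interfaces and the toy metrics in the usage example are all hypothetical; the text describes the modules only functionally.

```python
from typing import Callable, Dict, List

Data = List[str]


class QualityDetectionApparatus:
    """Chains an acquiring, a calculating and a judging module."""

    def __init__(self,
                 acquire: Callable[[str], Data],
                 calculators: Dict[str, Callable[[Data], float]],
                 checks: Dict[str, Callable[[float], bool]]) -> None:
        if len(calculators) < 2:
            raise ValueError("at least two kinds of sub-evaluation information are required")
        if set(calculators) != set(checks):
            raise ValueError("every metric needs exactly one check")
        self.acquire = acquire
        self.calculators = calculators
        self.checks = checks

    def run(self, sample_id: str) -> bool:
        data = self.acquire(sample_id)                                    # acquiring module 30
        evaluation = {k: f(data) for k, f in self.calculators.items()}    # calculating module 31
        return all(self.checks[k](v) for k, v in evaluation.items())      # judging module 32


# Toy metrics that are NOT from the text, used only to exercise the wiring.
apparatus = QualityDetectionApparatus(
    acquire=lambda sample_id: ["ACGT", "AACC"],
    calculators={
        "read_count": lambda reads: float(len(reads)),
        "mean_read_length": lambda reads: sum(map(len, reads)) / len(reads),
    },
    checks={
        "read_count": lambda v: v >= 2,
        "mean_read_length": lambda v: v >= 4,
    },
)
print(apparatus.run("sample-1"))  # True
```

Keeping each metric as a separate callable mirrors the remark later in the text that the division into units is only a logical one and may be arranged differently in practice.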
Corresponding to the quality detection method of the second-generation sequencing data in fig. 1, an embodiment of the present application further provides a computer device 400, as shown in fig. 4, the device includes a memory 401, a processor 402, and a computer program stored on the memory 401 and executable on the processor 402, wherein the processor 402 implements the quality detection method of the second-generation sequencing data when executing the computer program.
Specifically, the memory 401 and the processor 402 can be general-purpose memories and processors, which are not limited in this respect, and when the processor 402 runs a computer program stored in the memory 401, the method for detecting the quality of the second-generation sequencing data can be executed, so as to solve the problem in the prior art of how to improve the reliability of the quality control result of the second-generation sequencing data.
Corresponding to the quality detection method of the second-generation sequencing data in fig. 1, the embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the quality detection method of the second-generation sequencing data.
Specifically, the storage medium can be a general storage medium, such as a mobile disk, a hard disk, and the like, and when a computer program on the storage medium is run, the method for detecting the quality of the second-generation sequencing data can be executed, so that the problem of how to improve the reliability of the quality control result of the second-generation sequencing data in the prior art is solved. The quality detection method for the second-generation sequencing data, provided by the embodiment of the application, can effectively detect the sample with the problem in the second-generation sequencing data, and improve the reliability of the quality control result of the second-generation sequencing data, so that the availability of the qualified sample after quality control is improved.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that like reference numbers and letters refer to like items in the following figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application. They are used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the present application and are intended to be covered by its protection scope. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A quality detection method for second-generation sequencing data, characterized by comprising the following steps:
acquiring data to be detected of a target sample;
determining evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistical value, and hybridization capture quality value;
and determining whether the target sample is a qualified sample according to the evaluation information.
2. The method of claim 1, wherein determining whether the target sample is a qualified sample according to the evaluation information comprises:
judging whether each kind of sub-evaluation information is in a qualified state;
and if every kind of sub-evaluation information is in a qualified state, determining that the target sample is a qualified sample.
3. The method according to claim 2, wherein whether the base distribution ratio is in a qualified state is determined as follows:
calculating the base distribution ratio of the target sample according to the data to be detected, and judging whether the base distribution ratio meets a first condition; the first condition is that the base distribution ratio of the target sample does not fall outside a preset base distribution ratio interval;
if the data to be detected meets the first condition, determining that the base distribution ratio is in a qualified state;
and if the base distribution ratio and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
4. The method according to claim 2, wherein whether the base number ratio is in a qualified state is determined as follows:
calculating the base number ratio for each quality value of the target sample according to the data to be detected, and judging whether the base number ratio meets a second condition; the second condition is that the base number ratio for each quality value of the target sample does not fall outside the corresponding preset base number ratio interval;
if the data to be detected meets the second condition, determining that the base number ratio is in a qualified state;
and if the base number ratio and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
5. The method according to claim 2, wherein whether the high-quality alignment ratio is in a qualified state is determined as follows:
performing genome alignment on the data to be detected to obtain converted data to be detected;
calculating the high-quality alignment ratio of the target sample according to the converted data to be detected, and judging whether the high-quality alignment ratio meets a third condition; the third condition is that the high-quality alignment ratio of the target sample is not less than a preset alignment ratio threshold;
if the converted data to be detected meets the third condition, determining that the high-quality alignment ratio is in a qualified state;
and if the high-quality alignment ratio and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
6. The method according to claim 2, wherein whether the cross contamination statistical value is in a qualified state is determined as follows:
performing genome alignment on the data to be detected to obtain converted data to be detected;
calculating the cross contamination statistical value of the target sample according to the converted data to be detected, and judging whether the cross contamination statistical value meets a fourth condition; the fourth condition is that the cross contamination statistical value of the target sample is not greater than a preset cross contamination statistical threshold;
if the converted data to be detected meets the fourth condition, determining that the cross contamination statistical value is in a qualified state;
and if the cross contamination statistical value and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
7. The method according to claim 2, wherein whether the hybridization capture quality value is in a qualified state is determined as follows:
performing genome alignment on the data to be detected to obtain converted data to be detected;
calculating a plurality of hybridization capture quality values of the target sample according to the converted data to be detected, and judging whether the plurality of hybridization capture quality values meet a fifth condition; the fifth condition is that each hybridization capture quality value of the target sample is within the corresponding preset hybridization capture quality value interval;
if the converted data to be detected meets the fifth condition, determining that the hybridization capture quality value is in a qualified state;
and if the hybridization capture quality value and the other sub-evaluation information are all in a qualified state, determining that the target sample is a qualified sample.
8. A quality detection device for second-generation sequencing data, characterized by comprising:
an acquisition module, configured to acquire data to be detected of a target sample;
a calculation module, configured to determine evaluation information of the target sample according to the data to be detected, wherein the evaluation information comprises at least two of the following kinds of sub-evaluation information: base distribution ratio, base number ratio, high-quality alignment ratio, cross contamination statistical value, and hybridization capture quality value;
and a judging module, configured to determine whether the target sample is a qualified sample according to the evaluation information.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
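
By way of illustration only, the judging logic of claims 1 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: every threshold and interval is a hypothetical placeholder, the capture metric names (on_target_rate, coverage_uniformity) are invented for the example, reads are assumed to be already parsed into (sequence, Phred score list) pairs, and the three alignment-derived values are assumed to have been computed elsewhere after genome alignment. The sketch checks all five kinds of sub-evaluation information, whereas claim 1 requires only at least two.

```python
from collections import Counter

# Hypothetical placeholder thresholds; none of these values come from the claims.
BASE_INTERVAL = (0.15, 0.35)                             # allowed fraction for each of A, C, G, T
QUALITY_INTERVALS = {20: (0.90, 1.0), 30: (0.80, 1.0)}   # fraction of bases with Phred >= key
MIN_HQ_ALIGNMENT_RATIO = 0.90
MAX_CROSS_CONTAMINATION = 0.02
CAPTURE_INTERVALS = {"on_target_rate": (0.50, 1.0), "coverage_uniformity": (0.80, 1.0)}


def in_interval(value, interval):
    """Return True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


def base_distribution(reads):
    """Fraction of A, C, G and T among all bases in the reads."""
    counts = Counter(base for seq, _ in reads for base in seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases in the data to be detected")
    return {b: counts.get(b, 0) / total for b in "ACGT"}


def quality_fractions(reads, cutoffs):
    """Fraction of bases whose Phred score is at least each cutoff."""
    quals = [q for _, qual in reads for q in qual]
    if not quals:
        raise ValueError("no quality values in the data to be detected")
    return {c: sum(q >= c for q in quals) / len(quals) for c in cutoffs}


def evaluate_sample(reads, alignment_metrics):
    """Return (qualified, per-item states); qualified only if every item passes."""
    dist = base_distribution(reads)
    qfrac = quality_fractions(reads, QUALITY_INTERVALS)
    capture = alignment_metrics["capture"]
    states = {
        # First condition: every base fraction inside the preset interval.
        "base_distribution_ratio": all(in_interval(v, BASE_INTERVAL) for v in dist.values()),
        # Second condition: each quality-value fraction inside its own interval.
        "base_number_ratio": all(in_interval(qfrac[c], iv) for c, iv in QUALITY_INTERVALS.items()),
        # Third condition: not less than the preset threshold.
        "high_quality_alignment_ratio": alignment_metrics["hq_alignment_ratio"] >= MIN_HQ_ALIGNMENT_RATIO,
        # Fourth condition: not greater than the preset threshold.
        "cross_contamination": alignment_metrics["cross_contamination"] <= MAX_CROSS_CONTAMINATION,
        # Fifth condition: each capture value inside its own interval.
        "hybridization_capture": all(in_interval(capture[k], iv) for k, iv in CAPTURE_INTERVALS.items()),
    }
    return all(states.values()), states


if __name__ == "__main__":
    reads = [("ACGT", [35, 36, 31, 30]), ("TGCA", [32, 33, 34, 25])]
    metrics = {
        "hq_alignment_ratio": 0.95,
        "cross_contamination": 0.01,
        "capture": {"on_target_rate": 0.70, "coverage_uniformity": 0.90},
    }
    qualified, states = evaluate_sample(reads, metrics)
    print(qualified, states)
```

Returning the per-item states alongside the overall verdict is a design choice of this sketch; it makes it possible to see which kind of sub-evaluation information caused a sample to be rejected.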
CN201911292413.0A 2019-12-16 2019-12-16 Quality detection method and device for second-generation sequencing data Pending CN111128304A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911292413.0A CN111128304A (en) 2019-12-16 2019-12-16 Quality detection method and device for second-generation sequencing data

Publications (1)

Publication Number Publication Date
CN111128304A true CN111128304A (en) 2020-05-08

Family

ID=70500059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911292413.0A Pending CN111128304A (en) 2019-12-16 2019-12-16 Quality detection method and device for second-generation sequencing data

Country Status (1)

Country Link
CN (1) CN111128304A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170228496A1 (en) * 2014-07-25 2017-08-10 Ontario Institute For Cancer Research System and method for process control of gene sequencing
CN110444255A (en) * 2019-08-30 2019-11-12 深圳裕策生物科技有限公司 Biological information quality control method, device and storage medium based on the sequencing of two generations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
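Yu Dong et al.: "Discussion of several issues in data quality control and analysis in the clinical application of high-throughput sequencing" *
Zheng Guangyong et al.: "Standard quality control workflow for metagenomic big data analysis" *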

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115376612A (en) * 2022-09-13 2022-11-22 郑州思昆生物工程有限公司 Data evaluation method and device, electronic equipment and storage medium
CN115376612B (en) * 2022-09-13 2023-10-13 郑州思昆生物工程有限公司 Data evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508
