CN117594130A - Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN117594130A (CN202410077991.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Abstract
The disclosure relates to the field of nucleic acid sequencing, and in particular to a nanopore sequencing signal evaluation method and device, electronic equipment, and a storage medium. An original sequencing signal obtained by detecting a nucleic acid sequence by nanopore sequencing is acquired. The original sequencing signal is divided to obtain a first signal, which is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule; a second signal, which is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore; and a third signal, which is a current signal acquired once the nucleic acid sequence starts to pass continuously through the pore. Statistical analysis is carried out on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index, and on the third signal to obtain a fourth evaluation index and a fifth evaluation index. The original sequencing signal is evaluated according to at least one evaluation index. Through signal segmentation and statistical calculation, the method obtains multiple evaluation indexes with which the sequencing process and the sequencing result of nanopore sequencing can be evaluated.
Description
Technical Field
The present disclosure relates to the field of nucleic acid sequencing, and in particular, to a nanopore sequencing signal evaluation method, device, electronic apparatus, and storage medium.
Background
Nanopore sequencing is a new generation of sequencing technology. One of its biggest differences from second-generation sequencing technology is that the sequencing signal changes from an optical signal to a current signal. Evaluating the quality of the sequencing signal is a necessary step in ensuring the accuracy of the sequencing result. However, the conventional sequencing signal evaluation methods of second-generation sequencing technology are not suitable for third-generation sequencing, so establishing a new sequencing signal evaluation system is a key problem in the development of third-generation sequencing technology.
Disclosure of Invention
In view of this, the present disclosure proposes a nanopore sequencing signal evaluation method, device, electronic apparatus and storage medium, aiming at evaluating a nanopore sequencing process and a sequencing result.
According to a first aspect of the present disclosure, there is provided a nanopore sequencing signal evaluation method, the method comprising:
acquiring an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode;
dividing the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired once the nucleic acid sequence starts to pass continuously through the pore;
carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
In one possible implementation manner, the performing statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index, and a third evaluation index includes:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
In one possible implementation manner, the performing statistical analysis on the first signal to obtain a first evaluation index includes:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating the front first signal average value and the rear first signal average value, wherein the signal average value is an average value of current values of each signal sampling point in corresponding signals;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
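The steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the first signal is split at its midpoint and that the "first signal noise" is the population standard deviation of the current values, since the text names only the inputs (sample count, per-sample current, mean current). The function name `first_signal_index` and the `split_index` parameter are hypothetical.

```python
import math


def first_signal_index(first_signal, split_index=None):
    """Return front mean, rear mean and noise of the blank (first) signal."""
    n = len(first_signal)
    if n < 2:
        raise ValueError("need at least two sampling points")
    # Assumption: split at the midpoint unless a split position is given.
    if split_index is None:
        split_index = n // 2
    if not 0 < split_index < n:
        raise ValueError("split_index must leave both parts non-empty")
    front, rear = first_signal[:split_index], first_signal[split_index:]
    mean_all = sum(first_signal) / n
    # Assumption: noise is taken as the population standard deviation.
    noise = math.sqrt(sum((x - mean_all) ** 2 for x in first_signal) / n)
    return {
        "front_mean": sum(front) / len(front),
        "rear_mean": sum(rear) / len(rear),
        "noise": noise,
    }
```

Comparing `front_mean` with `rear_mean` gives a simple view of drift in the blank current, while `noise` summarizes its spread.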
In one possible implementation manner, the performing statistical analysis on the second signal to obtain a second evaluation index includes:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
In one possible implementation manner, the performing a comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index includes:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
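A short sketch of the second and third evaluation indexes, assuming both signals are plain sequences of current values. The function name `second_and_third_index` is hypothetical.

```python
def second_and_third_index(first_signal, second_signal):
    """Return (mean of second signal, first mean / second mean)."""
    if not first_signal or not second_signal:
        raise ValueError("both signals must contain sampling points")
    first_mean = sum(first_signal) / len(first_signal)
    second_mean = sum(second_signal) / len(second_signal)
    if second_mean == 0:
        raise ZeroDivisionError("second signal mean is zero")
    # Second index: mean current of the second signal.
    # Third index: ratio of the first signal mean to the second signal mean.
    return second_mean, first_mean / second_mean
```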
In one possible implementation manner, the performing statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index includes:
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index.
In a possible implementation manner, the removing the linker sequence signal in the third signal according to the preset linker sequence signal template to obtain the target sequence signal includes:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
and removing the signal segment from the third signal to obtain a target sequence signal.
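The text does not specify the "dynamic adjustment algorithm". The sketch below swaps in a simpler stand-in, a sliding-window sum of squared differences, purely to illustrate the search-and-remove idea; unlike an alignment-based method it does not tolerate stretching of the signal in time. The function name `remove_best_match` is hypothetical.

```python
def remove_best_match(signal, template):
    """Find the window most similar to template and cut it out of signal."""
    m = len(template)
    if m == 0 or m > len(signal):
        raise ValueError("template must be non-empty and no longer than signal")

    def cost(start):
        # Sum of squared differences between the window and the template.
        return sum((signal[start + i] - template[i]) ** 2 for i in range(m))

    # Stand-in similarity search: the window with the smallest cost wins.
    best = min(range(len(signal) - m + 1), key=cost)
    target = list(signal[:best]) + list(signal[best + m:])
    return best, target
```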
In one possible implementation manner, the performing statistical analysis on the target sequence signal to obtain a fourth evaluation index includes:
determining the average value and the median of the current value of each signal sampling point in the target sequence signal to obtain a third signal average value and a third signal median;
and determining a fourth evaluation index according to the third signal mean value and the third signal median.
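As a small sketch, the two statistics can be computed with the Python standard library. How they are combined into a single index is not stated in the text, so the hypothetical function `fourth_index` simply returns both.

```python
import statistics


def fourth_index(target_signal):
    """Return the mean and median current of the target sequence signal."""
    if not target_signal:
        raise ValueError("target signal is empty")
    return {
        "mean": statistics.fmean(target_signal),
        "median": statistics.median(target_signal),
    }
```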
In one possible implementation manner, the performing a fragmentation process on the target sequence signal to obtain a target signal curve includes:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal segment to obtain a corresponding current signal;
and sorting the current signals in ascending order and plotting them as a curve to obtain a target signal curve.
In one possible implementation manner, the fragmenting the target sequence signal to obtain at least one sequence signal fragment includes:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signal belongs to two different signal segments in response to the signal observation value being greater than a threshold value;
and dividing the target sequence signal according to the signal segments to which the different interval signals belong, so as to obtain at least one sequence signal fragment.
In one possible implementation manner, the calculating a signal observation value between adjacent interval signals includes:
calculating a signal observation value t according to a formula (not reproduced in this text) whose inputs are the numbers of signal sampling points included in the two adjacent interval signals, the mean current values of the signal sampling points included in the two adjacent interval signals, and the current value variances of the signal sampling points included in the two adjacent interval signals, respectively.
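The formula itself is not recoverable here; only its inputs (sample counts, means, variances) are named. The sketch below therefore assumes an unequal-variance (Welch) two-sample statistic, $t = |\bar{x}_1 - \bar{x}_2| / \sqrt{s_1^2/n_1 + s_2^2/n_2}$, which uses exactly those inputs but may differ from the disclosed formula. It also assumes non-overlapping windows (step equal to window size) so that segment boundaries fall on window edges. The names `welch_t` and `fragment_signal` and the default `window` and `threshold` values are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Assumed observation value: absolute Welch t statistic of two intervals."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    denom = math.sqrt(v1 / n1 + v2 / n2)
    if denom == 0.0:
        return 0.0 if m1 == m2 else math.inf
    return abs(m1 - m2) / denom


def fragment_signal(signal, window=4, threshold=5.0):
    """Split signal wherever adjacent windows differ by more than threshold."""
    if window < 2:
        raise ValueError("window must hold at least two sampling points")
    # Assumption: the step equals the window size (non-overlapping intervals).
    intervals = [signal[i:i + window]
                 for i in range(0, len(signal) - window + 1, window)]
    boundaries = [0]
    for k in range(1, len(intervals)):
        if welch_t(intervals[k - 1], intervals[k]) > threshold:
            boundaries.append(k * window)
    boundaries.append(len(signal))
    # Trailing samples that do not fill a window stay in the last segment.
    return [list(signal[s:e]) for s, e in zip(boundaries, boundaries[1:])]
```

A larger `threshold` merges more intervals into one fragment; a smaller one splits the signal more aggressively.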
In one possible implementation manner, the sorting and plotting the current signals from small to large to obtain a target signal curve includes:
sorting the current signals in ascending order and plotting them as a curve to obtain a candidate signal curve;
and carrying out smooth filtering on the candidate signal curve based on a filtering method of local polynomial least square fitting to obtain a target signal curve.
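A minimal sketch of the curve construction and smoothing, assuming a fixed 5-point window and a quadratic local fit. For that case the least-squares fit reduces to the well-known Savitzky-Golay weights (-3, 12, 17, 12, -3)/35. The window size, the polynomial order and the choice to leave the two points at each end unsmoothed are assumptions, and the function name `smoothed_target_curve` is hypothetical.

```python
def smoothed_target_curve(segments):
    """Sort segment means ascending, then smooth with a 5-point quadratic fit."""
    # Candidate curve: mean current of each fragment, sorted in ascending order.
    candidate = sorted(sum(seg) / len(seg) for seg in segments if seg)
    # Savitzky-Golay weights for window 5, polynomial order 2 (sum to 35).
    coeffs = (-3, 12, 17, 12, -3)
    target = list(candidate)
    for i in range(2, len(candidate) - 2):
        window = candidate[i - 2:i + 3]
        target[i] = sum(c * v for c, v in zip(coeffs, window)) / 35
    # Assumption: the two points at each end are left unsmoothed.
    return candidate, target
```

Because the local fit is quadratic, a candidate curve that is already a parabola passes through unchanged, while isolated jumps between neighbouring points are damped.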
In one possible implementation manner, the performing statistical analysis on the target signal curve to obtain a fifth evaluation index includes:
and calculating, from the target signal curve, the maximum value, the minimum value, the median, a preset percentile, the point of maximum curvature, the point of minimum curvature, the tangent point between the curve and the line connecting the curve's extreme values, the slope of that connecting line, the degree of offset after the fragmentation processing of the target sequence signal, the curve signal noise and the curve signal-to-noise ratio, to obtain a fifth evaluation index.
In one possible implementation, the method further includes:
drawing a visual signal diagram respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and summarizing each visual signal diagram to obtain a signal evaluation summary diagram and displaying the signal evaluation summary diagram.
According to a second aspect of the present disclosure, there is provided a nanopore sequencing signal evaluation device, the device comprising:
the sequence determining module is used for acquiring an original sequencing signal obtained by detecting the nucleic acid sequence in a nanopore sequencing mode;
the signal segmentation module is used for segmenting the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired once the nucleic acid sequence starts to pass continuously through the pore;
the first statistical module is used for carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
the second statistical module is used for carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and the effect evaluation module is used for evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
In one possible implementation, the first statistics module is further configured to:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
In one possible implementation, the first statistics module is further configured to:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating the front first signal average value and the rear first signal average value, wherein the signal average value is an average value of current values of each signal sampling point in corresponding signals;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
In one possible implementation, the first statistics module is further configured to:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
In one possible implementation, the first statistics module is further configured to:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
In one possible implementation manner, the second statistics module is further configured to:
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index.
In one possible implementation manner, the second statistics module is further configured to:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
and removing the signal segment from the third signal to obtain a target sequence signal.
In one possible implementation manner, the second statistics module is further configured to:
determining the average value and the median of the current value of each signal sampling point in the target sequence signal to obtain a third signal average value and a third signal median;
and determining a fourth evaluation index according to the third signal mean value and the third signal median.
In one possible implementation manner, the second statistics module is further configured to:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal segment to obtain a corresponding current signal;
and sorting the current signals in ascending order and plotting them as a curve to obtain a target signal curve.
In one possible implementation manner, the second statistics module is further configured to:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signal belongs to two different signal segments in response to the signal observation value being greater than a threshold value;
and dividing the target sequence signal according to the signal segments to which the different interval signals belong, so as to obtain at least one sequence signal fragment.
In one possible implementation manner, the second statistics module is further configured to:
calculating a signal observation value t according to a formula (not reproduced in this text) whose inputs are the numbers of signal sampling points included in the two adjacent interval signals, the mean current values of the signal sampling points included in the two adjacent interval signals, and the current value variances of the signal sampling points included in the two adjacent interval signals, respectively.
In one possible implementation manner, the second statistics module is further configured to:
sorting the current signals in ascending order and plotting them as a curve to obtain a candidate signal curve;
and carrying out smooth filtering on the candidate signal curve based on a filtering method of local polynomial least square fitting to obtain a target signal curve.
In one possible implementation manner, the second statistics module is further configured to:
and calculating, from the target signal curve, the maximum value, the minimum value, the median, a preset percentile, the point of maximum curvature, the point of minimum curvature, the tangent point between the curve and the line connecting the curve's extreme values, the slope of that connecting line, the degree of offset after the fragmentation processing of the target sequence signal, the curve signal noise and the curve signal-to-noise ratio, to obtain a fifth evaluation index.
In one possible implementation, the apparatus further includes:
the signal diagram drawing module is used for drawing visual signal diagrams respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and the image summarizing module is used for summarizing each visual signal diagram to obtain a signal evaluation summarizing diagram and displaying the signal evaluation summarizing diagram.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-described method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
In an embodiment of the disclosure, an original sequencing signal obtained by detecting a nucleic acid sequence by nanopore sequencing is acquired. The original sequencing signal is divided to obtain a first signal, which is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule; a second signal, which is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore; and a third signal, which is a current signal acquired once the nucleic acid sequence starts to pass continuously through the pore. Statistical analysis is carried out on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index, and on the third signal to obtain a fourth evaluation index and a fifth evaluation index. The original sequencing signal is evaluated according to at least one evaluation index. Through signal segmentation and statistical calculation, the method obtains multiple evaluation indexes with which the sequencing process and the sequencing result of nanopore sequencing can be evaluated.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a nanopore sequencing signal evaluation method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a signal splitting process according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of a rear first signal according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of a fragmentation processing result according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of a target signal profile according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of a signal evaluation summary graph, according to an embodiment of the disclosure.
Fig. 7 shows a schematic diagram of a nanopore sequencing signal evaluation process according to an embodiment of the present disclosure.
Fig. 8 shows a schematic diagram of a nanopore sequencing signal evaluation device according to an embodiment of the present disclosure.
Fig. 9 shows a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
The nanopore sequencing signal evaluation method of the embodiment of the disclosure can be executed by electronic equipment such as terminal equipment or a server. The terminal device may be any fixed or mobile terminal such as a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, etc. The server may be a single server or a server cluster composed of a plurality of servers. Any electronic device may implement the nanopore sequencing signal evaluation method of the embodiments of the present disclosure by way of a processor invoking computer readable instructions stored in a memory.
Fig. 1 shows a flowchart of a nanopore sequencing signal evaluation method according to an embodiment of the present disclosure. As shown in fig. 1, the nanopore sequencing signal evaluation method of the embodiments of the present disclosure may include the following steps S10 to S50.
Step S10, acquiring an original sequencing signal obtained by detecting the nucleic acid sequence in a nanopore sequencing mode.
In one possible implementation, the electronic device acquires an original sequencing signal obtained by detecting a nucleic acid sequence by nanopore sequencing, where the original sequencing signal is a current signal sampled at a preset sampling rate. The detection itself may be carried out by a sequencer: the sequencer detects the nucleic acid sequence by nanopore sequencing, obtains an original sequencing signal at the preset sampling rate, and sends it to the electronic device. During nanopore sequencing, the nucleic acid sequence to be detected binds to the nanopore and passes through it, and the sequencer collects the current signal at the preset sampling rate throughout the sequencing process, thereby obtaining the original sequencing signal.
Optionally, under the condition that the electronic device has a sequencing function, the electronic device can be used for directly detecting the nucleic acid sequence based on a nanopore sequencing mode to obtain an original sequencing signal with a predetermined sampling rate.
Step S20, dividing the original sequencing signal to obtain a first signal, a second signal and a third signal.
In one possible implementation manner, after the electronic device obtains the original sequencing signal, it may divide the original sequencing signal according to the signal characteristics of its different portions, so as to obtain three signals: a first signal, a second signal and a third signal. The first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired once the nucleic acid molecule in the nucleic acid sequence, having bound to the nanopore, starts to pass continuously through the pore. That is, the first signal is essentially an open-pore current signal, the second signal is a Spacer current signal, and the third signal is the actual sequencing current signal.
Fig. 2 shows a schematic diagram of a signal splitting process according to an embodiment of the present disclosure. As shown in fig. 2, the black waveform is the original sequencing signal. The part before the first vertical line on the left is the blank current signal acquired when the nanopore is not bound to a nucleic acid molecule; the part between the first vertical line on the left and the middle vertical line is the current signal acquired after the nucleic acid molecule has bound to the nanopore but before it has started passing through the pore; and the part between the middle vertical line and the first vertical line on the right is the current signal acquired while the nucleic acid molecules in the nucleic acid sequence continuously pass through the pore. Thus, the original sequencing signal can be split into the first signal, the second signal and the third signal according to the positions of the three vertical lines in the figure. Optionally, before segmenting the original sequencing signal, the electronic device may low-pass filter it with a mean (moving-average) filter of a preset window length, so as to remove part of the interference contained in the original sequencing signal and improve the accuracy of the three signals obtained by segmentation.
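The optional pre-filtering step can be illustrated with a minimal sketch. The function name `moving_average`, the odd window length, and the shrinking of the window at the trace edges are assumptions made for illustration; the disclosure only states that a mean filter with a preset window length is used.

```python
def moving_average(signal, window):
    """Smooth a current trace with a centered mean filter of odd length `window`."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)        # assumption: the window shrinks at the edges
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


# Hypothetical current values with one spike at index 3.
raw = [100.0, 102.0, 98.0, 140.0, 101.0, 99.0, 100.0]
filtered = moving_average(raw, window=3)
```

A longer window suppresses more interference but also blurs the transitions between the three signals, so the window length would presumably be chosen well below the shortest segment of interest.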
Step S30, performing statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index.
In one possible implementation, the electronic device may statistically analyze the first signal and the second signal to obtain three evaluation indexes, namely a first evaluation index, a second evaluation index and a third evaluation index, which are used to evaluate the quality of the original sequencing signal obtained by sequencing. The first signal may be statistically analyzed to obtain the first evaluation index, the second signal may be statistically analyzed to obtain the second evaluation index, and the first signal and the second signal may be jointly analyzed to obtain the third evaluation index. Optionally, each evaluation index may include one or more index contents.
Optionally, the first evaluation index may include a front first signal mean value, a rear first signal mean value, and a first signal noise corresponding to the first signal. The front first signal and the rear first signal are obtained by further dividing the first signal: the front first signal is the part of the first signal located before a section of second signal, and the rear first signal is the part of the first signal located after a section of third signal. Fig. 3 shows a schematic diagram of a rear first signal according to an embodiment of the present disclosure. As shown in fig. 3, the signal to the left of the vertical line is a third signal in a section of the original sequencing signal, and the signal to the right of the vertical line is the part of the first signal that follows that third signal. That is, because a section of first signal typically follows the third signal of one nucleic acid molecule and precedes the second signal of the next, the electronic device may take the front portion of a section of first signal as the rear first signal and its rear portion as the front first signal. The durations of the front first signal and the rear first signal may be selected as desired.
When determining the first evaluation index, the electronic device may divide the first signal to obtain the front first signal and the rear first signal, and calculate the front first signal mean value and the rear first signal mean value, where a signal mean value is the mean of the current values of the signal sampling points in the corresponding signal. The first signal noise of the first signal is calculated according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the mean current value of all the signal sampling points. The first evaluation index is then determined according to the front first signal mean value, the rear first signal mean value and the first signal noise. The first signal noise can be calculated by the formula $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $N$ is the number of signal sampling points in the first signal, $x_i$ is the current value of signal sampling point $i$, and $\bar{x}$ is the mean current value of all signal sampling points in the first signal.
Further, when determining the second evaluation index, the electronic device may calculate the mean of the current values of the signal sampling points in the second signal and use the resulting second signal mean value as the second evaluation index. When determining the third evaluation index, the electronic device may calculate the ratio of the current value mean of the first signal to the current value mean of the second signal, that is, the ratio of the first signal mean value to the second signal mean value, and use it as the third evaluation index.
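The first three evaluation indexes can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the noise is taken to be the root-mean-square deviation from the mean, and the names `signal_noise`, `blank_and_spacer_indexes` and the parameter `edge` (how many samples are taken from each end of the first signal) are hypothetical.

```python
import math
from statistics import fmean


def signal_noise(samples):
    """Root-mean-square deviation of the current values from their mean (assumed definition)."""
    mean = fmean(samples)
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))


def blank_and_spacer_indexes(first, second, edge):
    """Return the (first, second, third) evaluation indexes.

    `first` is one section of blank (first) signal, `second` is the Spacer
    (second) signal, and `edge` is the hypothetical number of samples taken
    from each end of `first`.
    """
    rear_first = first[:edge]      # front portion: follows a third signal
    front_first = first[-edge:]    # rear portion: precedes a second signal
    first_index = {
        "front_mean": fmean(front_first),
        "rear_mean": fmean(rear_first),
        "noise": signal_noise(first),
    }
    second_index = fmean(second)
    third_index = fmean(first) / second_index   # ratio of the two means
    return first_index, second_index, third_index


# Hypothetical current values.
first = [200.0, 202.0, 198.0, 200.0, 201.0, 199.0]
second = [100.0, 100.0, 100.0, 100.0]
idx1, idx2, idx3 = blank_and_spacer_indexes(first, second, edge=2)
```

Comparing `front_mean` with `rear_mean` gives a simple view of whether the blank current drifted across the section.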
Step S40, performing statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index.
In one possible implementation, the electronic device may statistically analyze the third signal to obtain two evaluation indexes, a fourth evaluation index and a fifth evaluation index, each of which may include one or more index contents. The electronic device may remove the linker sequence signal from the third signal according to a preset linker sequence signal template to obtain a target sequence signal; that is, the electronic device may identify the part of the third signal that is linker sequence signal and delete it, so as to remove interference information from the third signal. Statistical analysis is then performed on the target sequence signal to obtain the fourth evaluation index. The target sequence signal is fragmented to obtain a target signal curve, and statistical analysis is performed on the target signal curve to obtain the fifth evaluation index.
Optionally, the linker sequence signal in the third signal may be identified based on a dynamic adjustment algorithm; that is, the electronic device may search the third signal for the signal fragment most similar to the linker sequence signal template based on the dynamic adjustment algorithm, and remove that signal fragment from the third signal to obtain the target sequence signal. The dynamic adjustment algorithm may be the DTW (Dynamic Time Warping) algorithm, which measures the similarity between two time series by non-linearly aligning them in the time domain. Its basic idea is to find a path through the linker sequence signal template and the third signal such that the sum of the Euclidean distances of the points on the path is minimized. The search for the signal fragment must satisfy preset constraints: (1) the start point of the signal fragment must correspond to the start point of the linker sequence signal template, and its end point must correspond to the end point of the linker sequence signal template; (2) the signal fragment must be continuous, i.e. the next point on the search path must be adjacent to the current point in the third signal; (3) the search path must be monotonic, i.e. the next point must be to the right of the current point. When searching the third signal for the signal fragment most similar to the linker sequence signal template, a two-dimensional array may be created based on the DTW algorithm, in which each element represents the distance between one point in the linker sequence signal template and one point in the third signal. A path from the start point to the end point is then found in the array such that the sum of the distances along the path is minimal; this search can be implemented by dynamic programming.
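The search described above corresponds to what is commonly called subsequence DTW, and a minimal dynamic-programming sketch is given below. The function names, the use of the absolute difference as the point-to-point distance for scalar current values, and the tie-breaking during backtracking are assumptions for illustration.

```python
def find_template_segment(template, signal):
    """Locate the stretch of `signal` that best matches `template`.

    Returns (start, end, cost) with `end` inclusive.
    """
    m, n = len(template), len(signal)
    cost = [[float("inf")] * n for _ in range(m)]
    # The match may start anywhere in the signal, so row 0 carries no history.
    for j in range(n):
        cost[0][j] = abs(template[0] - signal[j])
    for i in range(1, m):
        cost[i][0] = cost[i - 1][0] + abs(template[i] - signal[0])
        for j in range(1, n):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(template[i] - signal[j]) + best_prev
    # The match may also end anywhere: take the cheapest cell of the last row.
    end = min(range(n), key=lambda j: cost[m - 1][j])
    # Backtrack along the monotonic, continuous path to recover the start.
    i, j = m - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    return j, end, cost[m - 1][end]


def remove_segment(signal, start, end):
    """Drop signal[start..end] to leave the target sequence signal."""
    return signal[:start] + signal[end + 1:]


# Hypothetical linker template and third signal.
template = [1.0, 5.0, 1.0]
third = [9.0, 9.0, 1.0, 5.0, 1.0, 9.0, 9.0]
start, end, distance = find_template_segment(template, third)
target = remove_segment(third, start, end)
```

The table has one cell per template point and signal point, so memory and time grow with the product of the two lengths; a practical implementation would likely restrict the search to the region where the linker is expected.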
Further, after determining the target sequence signal from the third signal, the electronic device performs statistical analysis on the target sequence signal to obtain the fourth evaluation index. Specifically, the mean and the median of the current values of the signal sampling points in the target sequence signal may be determined to obtain a third signal mean value and a current median. The fourth evaluation index may include the third signal mean value and the current median.
In one possible implementation, after determining the target sequence signal, the electronic device may further fragment the target sequence signal to obtain the target signal curve. Optionally, the target signal curve may be determined as follows: the target sequence signal is first fragmented to obtain at least one sequence signal fragment; the mean of the current values of the signal sampling points in each sequence signal fragment is calculated to obtain a corresponding current signal; and the current signals are sorted from small to large and plotted as a curve to obtain the target signal curve. The fragmentation may be implemented with an independent samples t-test (Independent Samples T-test): for signals that are locally normally or approximately normally distributed, a sliding window is used to run an independent samples t-test between two adjacent groups of signal points. If the t statistic of the test is smaller than a preset critical value, the two groups of signal points are regarded as current signals of the same k-mer, and the mean of the current signal points contained in a k-mer is finally taken as the current value of that k-mer, where a k-mer refers to a sub-sequence of length k within a long signal point sequence.
Specifically, the electronic device may first sequentially obtain a plurality of interval signals from the target sequence signal by moving a first sliding window of a preset size with a preset first step length. A signal observation value between adjacent interval signals is then calculated, and when the signal observation value is greater than a critical value, the adjacent interval signals are determined to belong to two different signal fragments. The target sequence signal is then divided according to the signal fragments to which the different interval signals belong, to obtain at least one sequence signal fragment. The signal observation value $t$ may be calculated according to the formula $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $n_1$ and $n_2$ are the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are the means of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are the variances of the current values of the signal sampling points included in the two adjacent interval signals, respectively.
Fig. 4 shows a schematic diagram of a fragmentation processing result according to an embodiment of the present disclosure. As shown in fig. 4, after the fragmentation process, the original target sequence signal (non-square waveform curve) is divided into a curve signal (square waveform curve) composed of a plurality of sequence signal fragments.
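The fragmentation step can be sketched as below. The sketch assumes a pooled-variance independent samples t statistic, windows that tile the signal without overlap (first step length equal to the window size), and a comparison on the absolute value of t; the names `t_observation` and `fragment` and the critical value are hypothetical.

```python
import math
from statistics import fmean, variance


def t_observation(a, b):
    """Pooled-variance t statistic between two adjacent interval signals."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    denom = math.sqrt(pooled * (1 / n1 + 1 / n2))
    diff = fmean(a) - fmean(b)
    if denom == 0.0:
        # Two perfectly flat windows: identical levels or an infinitely sharp step.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / denom


def fragment(signal, window, critical):
    """Split `signal` wherever adjacent windows differ significantly."""
    if window < 2:
        raise ValueError("window must hold at least two samples")
    starts = range(0, len(signal) - window + 1, window)
    boundaries = [0]
    for left, right in zip(starts, starts[1:]):
        t = t_observation(signal[left:left + window], signal[right:right + window])
        if abs(t) > critical:
            boundaries.append(right)   # a new fragment starts at this window
    boundaries.append(len(signal))
    return [signal[a:b] for a, b in zip(boundaries, boundaries[1:])]


# Hypothetical target sequence signal with two current levels.
target = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 20.0, 20.1, 19.9, 20.2, 19.8, 20.0]
fragments = fragment(target, window=3, critical=4.0)
levels = [fmean(f) for f in fragments]   # one current signal per fragment
```

With equal window sizes the pooled-variance statistic coincides with the unequal-variance (Welch) form, so the choice between the two only matters if the interval signals differ in length.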
Further, after fragmenting the target sequence signal, the electronic device may determine a corresponding current signal from the mean of the current values of the signal sampling points included in each sequence signal fragment, and then sort the current signals and plot them as a curve to obtain the target signal curve. Specifically, a candidate signal curve may be obtained by sorting all current signals from small to large and plotting them as a curve, and the candidate signal curve may then be smoothed with a filtering method based on local polynomial least squares fitting to obtain the target signal curve.
Optionally, the filtering method based on local polynomial least squares fitting may be the Savitzky-Golay algorithm, which filters in the time domain by local polynomial least squares fitting. Its core idea is to fit a polynomial of order k to the data points within a window of a certain length and take the fitted result as the output. The Savitzky-Golay method can filter out signal noise while keeping the shape and width of the signal unchanged. In the filtering process, curve segments of the candidate signal curve may be acquired repeatedly with a second sliding window of size 2Q+1 moved by a preset second step length, and the center signal point $x$ of each curve segment is smoothed according to the formula $\hat{y}(x) = \sum_{q=-Q}^{Q} h_q\, y(x+q)$, where $h_q$ is a preset weight factor based on local polynomial least squares fitting and $y$ is the candidate signal curve.
Fig. 5 shows a schematic diagram of a target signal curve according to an embodiment of the present disclosure. As shown in fig. 5, the black curve is the target signal curve drawn by the electronic device from the target sequence signal after the fragmentation processing; its values rise monotonically from small to large, and the straight line indicates the slope of the target signal curve.
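Sorting the fragment currents and smoothing the resulting candidate curve can be sketched as follows. The weights are the standard Savitzky-Golay coefficients for a five-point window (Q = 2) with a quadratic or cubic fit; the choice of this window, the step length of one, and leaving the first and last Q points unsmoothed are assumptions for illustration.

```python
# Standard Savitzky-Golay smoothing weights for a 5-point window (Q = 2).
SG_WEIGHTS_Q2 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def target_curve(fragment_currents, weights=SG_WEIGHTS_Q2):
    """Sort fragment currents into a candidate curve, then smooth it."""
    candidate = sorted(fragment_currents)        # small to large
    q = len(weights) // 2
    smoothed = list(candidate)                   # assumption: edges are kept as-is
    for x in range(q, len(candidate) - q):
        # Weighted sum over the window centered on point x.
        smoothed[x] = sum(w * candidate[x + k - q] for k, w in enumerate(weights))
    return smoothed


# Hypothetical fragment currents; a straight ramp passes through unchanged.
curve = target_curve([5.0, 1.0, 3.0, 2.0, 4.0, 6.0, 7.0])
```

Because the weights come from a least-squares polynomial fit, any curve that is locally a polynomial of degree three or lower is reproduced exactly, which is why the filter preserves the overall shape while removing point-to-point jitter.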
Further, after determining the target signal curve, the electronic device may obtain the fifth evaluation index by statistically calculating, for the target signal curve, the signal maximum value, the signal minimum value, the signal median, a preset signal percentile, the point of maximum curve curvature, the point of minimum curve curvature, the tangent point of the curve maximum value connecting line and the curve, the slope of the curve maximum value connecting line, the curve signal noise, the curve signal-to-noise ratio, and the offset degree of the target sequence signal after the fragmentation processing. There may be at least one preset percentile, for example the 90th percentile and the 10th percentile. The curve curvature may be calculated according to the formula $\kappa = \frac{|y''|}{(1 + y'^2)^{3/2}}$, where $y'$ is the first derivative of the target signal curve and $y''$ is the second derivative of the target signal curve. The curve signal noise and the curve signal-to-noise ratio may each be calculated according to a corresponding formula defined over the following quantities: $M$ is the number of sequence signal fragments of the target sequence signal corresponding to the target signal curve, $n_i$ is the number of signal sampling points included in the $i$-th sequence signal fragment, $x_{ij}$ is the current value of the $j$-th signal sampling point in the $i$-th sequence signal fragment, and $\bar{x}_i$ is the mean of the current values of all signal sampling points in the $i$-th sequence signal fragment. The offset degree of the target sequence signal after the fragmentation processing may be calculated according to the Euclidean distance between the signals before and after the fragmentation processing of the target sequence signal.
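Several of the curve statistics listed above can be sketched as follows. The curvature uses central finite differences with unit spacing between curve points, the percentiles use the inclusive method of the Python `statistics` module, and the connecting line is assumed to join the first and last points of the sorted curve; all function names are hypothetical. The curve signal noise and signal-to-noise ratio are not sketched here.

```python
import math
from statistics import fmean, median, quantiles


def curvature(y):
    """Discrete |y''| / (1 + y'^2)^(3/2) at the interior points of the curve."""
    kappa = []
    for i in range(1, len(y) - 1):
        d1 = (y[i + 1] - y[i - 1]) / 2.0          # central first difference
        d2 = y[i + 1] - 2.0 * y[i] + y[i - 1]     # central second difference
        kappa.append(abs(d2) / (1.0 + d1 * d1) ** 1.5)
    return kappa


def curve_statistics(curve):
    """Summary values of a target signal curve sorted from small to large."""
    deciles = quantiles(curve, n=10, method="inclusive")
    kappa = curvature(curve)
    return {
        "max": max(curve),
        "min": min(curve),
        "median": median(curve),
        "p10": deciles[0],
        "p90": deciles[-1],
        "line_slope": (curve[-1] - curve[0]) / (len(curve) - 1),
        # +1 converts an interior-point index back to an index into `curve`.
        "max_curvature_index": 1 + max(range(len(kappa)), key=kappa.__getitem__),
        "min_curvature_index": 1 + min(range(len(kappa)), key=kappa.__getitem__),
    }


def offset_degree(fragments):
    """Euclidean distance between the signal and its per-fragment mean levels."""
    total = 0.0
    for frag in fragments:
        level = fmean(frag)
        total += sum((x - level) ** 2 for x in frag)
    return math.sqrt(total)


# Hypothetical target signal curve and fragments.
stats = curve_statistics([1.0, 2.0, 3.0, 5.0, 9.0])
offset = offset_degree([[1.0, 3.0], [10.0, 10.0]])
```

A small offset degree indicates that replacing each fragment by its mean level discards little of the original target sequence signal.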
Step S50, evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
In one possible implementation, the electronic device obtains the first, second, third, fourth and fifth evaluation indexes by statistically analyzing the first signal, the second signal and the third signal obtained by dividing the original sequencing signal, and evaluates the original sequencing signal according to at least one of these evaluation indexes based on the actual requirements of the specific application scenario. The first evaluation index reflects the sequencing stability of the nanopore used to acquire the original sequencing signal and the stability of the nanopore itself. The second evaluation index, in combination with the first evaluation index, reflects the sequencing sensitivity of the nanopore, and the third evaluation index is used to evaluate the sequencing sensitivity of the nanopore. The fourth evaluation index reflects the current magnitude of the original sequencing signal obtained by nanopore sequencing, and the fifth evaluation index reflects the sequencing performance of the nanopore as well as the noise and signal-to-noise ratio of the original sequencing signal. The smoother the target signal curve and the closer its shape is to a sloped straight line, the better the resolving power and the sequencing performance of the nanopore. The curve signal noise and the curve signal-to-noise ratio reflect the noise and the signal-to-noise ratio of the original sequencing signal, respectively.
Further, after the above evaluation indexes are obtained, the electronic device may also display the important ones visually. For example, visual signal graphs corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index may be drawn, and the visual signal graphs may then be combined into a signal evaluation summary graph and displayed. At least one visual signal graph may be drawn for each index, and each visual signal graph may be a violin plot or a box plot.
Fig. 6 shows a schematic diagram of a signal evaluation summary graph according to an embodiment of the present disclosure. As shown in fig. 6, a single sequencing run includes multiple biological replicates, and the distribution of each index is visualized with violin plots and box plots to obtain the signal evaluation summary graph. The signal evaluation summary graph includes five visual signal graphs drawn according to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index: the first evaluation index corresponds to a visual signal graph representing comprehensive stability, the third evaluation index corresponds to a visual signal graph representing sequencing sensitivity, the fourth evaluation index corresponds to a visual signal graph representing the absolute magnitude of the sequencing current, and the fifth evaluation index corresponds to a visual signal graph representing sequencing signal noise and a visual signal graph representing the signal-to-noise ratio of the sequencing signal. In each visual signal graph, the width of the violin plot represents the probability density of the observations of the original sequencing signal at a given value; the boundaries of the box plot represent, from top to bottom, the upper edge, the upper quartile, the median, the lower quartile and the lower edge of the observed values; and the solid black points represent outliers.
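The box plot boundaries described above can be computed as in the following sketch. It assumes the common convention that the edges are the most extreme observations within 1.5 interquartile ranges of the quartiles and that everything beyond them is an outlier; the disclosure does not state which convention is used, and the function name `box_summary` is hypothetical.

```python
from statistics import quantiles


def box_summary(values, whisker=1.5):
    """Five box plot boundaries plus outliers for one evaluation index."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "upper_edge": max(inside),
        "upper_quartile": q3,
        "median": q2,
        "lower_quartile": q1,
        "lower_edge": min(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical values of one index across replicates.
summary = box_summary([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
```

The same values could be passed to a plotting library to draw the violin and box plots themselves; only the numeric summary is shown here.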
Fig. 7 shows a schematic diagram of a nanopore sequencing signal evaluation process according to an embodiment of the present disclosure. As shown in fig. 7, after acquiring an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode, the electronic device segments the original sequencing signal to obtain a first signal (a hole signal), a second signal (a Spacer signal) and a third signal (a sequencing sequence signal). The first signal is statistically analyzed to obtain a first evaluation index, the second signal is statistically analyzed to obtain a second evaluation index, and a third evaluation index is determined based on the first evaluation index and the second evaluation index. The linker sequence signal in the third signal is removed based on the linker sequence signal template to obtain a target sequence signal. Statistical analysis is performed directly on the target sequence signal to obtain a fourth evaluation index; the target sequence signal is fragmented to obtain a target signal curve (K-mer signal curve), and statistical analysis is performed on the target signal curve to obtain a fifth evaluation index. The original sequencing signal is then evaluated according to at least one of the first, second, third, fourth and fifth evaluation indexes. In addition, the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index may be summarized, and a corresponding signal evaluation summary graph may be drawn for visual evaluation.
Based on the technical characteristics, the embodiment of the disclosure can obtain different parts in an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode through signal segmentation, count different part signals in a targeted manner through different methods to obtain various evaluation indexes, and comprehensively and accurately evaluate a sequencing process and a sequencing result of the nanopore sequencing mode.
Fig. 8 shows a schematic diagram of a nanopore sequencing signal evaluation device according to an embodiment of the present disclosure. As shown in fig. 8, the nanopore sequencing signal evaluation device of the embodiment of the present disclosure may include:
a sequence determining module 80, configured to obtain an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing manner;
a signal segmentation module 81, configured to segment the original sequencing signal to obtain a first signal, a second signal and a third signal, where the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it has started passing through the pore, and the third signal is a current signal acquired while the nucleic acid sequence continuously passes through the pore;
a first statistics module 82, configured to perform a statistical analysis based on the first signal and the second signal, to obtain a first evaluation index, a second evaluation index, and a third evaluation index;
a second statistics module 83, configured to perform statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
an effect evaluation module 84 for evaluating the raw sequencing signal according to at least one of the first, second, third, fourth, and fifth evaluation indices.
In one possible implementation, the first statistics module 82 is further configured to:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
In one possible implementation, the first statistics module 82 is further configured to:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating the front first signal average value and the rear first signal average value, wherein the signal average value is an average value of current values of each signal sampling point in corresponding signals;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
In one possible implementation, the first statistics module 82 is further configured to:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
In one possible implementation, the first statistics module 82 is further configured to:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
In a possible implementation manner, the second statistics module 83 is further configured to:
removing the joint sequence signal in the third signal according to a preset joint sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index.
In a possible implementation manner, the second statistics module 83 is further configured to:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
and removing the signal segment from the third signal to obtain a target sequence signal.
In a possible implementation manner, the second statistics module 83 is further configured to:
determining the average value and the median of the current value of each signal sampling point in the target sequence signal to obtain a third signal average value and a current median;
and determining a fourth evaluation index according to the third signal mean value and the current median.
In a possible implementation manner, the second statistics module 83 is further configured to:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal segment to obtain a corresponding current signal;
and sorting the current signals from small to large and plotting them as a curve to obtain a target signal curve.
In a possible implementation manner, the second statistics module 83 is further configured to:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signals belong to two different signal fragments in response to the signal observation value being greater than a critical value;
and dividing the target sequence signal according to the signal fragments to which the different interval signals belong, so as to obtain at least one sequence signal fragment.
In a possible implementation manner, the second statistics module 83 is further configured to:
calculating a signal observation value $t$ according to the formula $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $n_1$ and $n_2$ are the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are the means of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are the variances of the current values of the signal sampling points included in the two adjacent interval signals, respectively.
In a possible implementation manner, the second statistics module 83 is further configured to:
sorting the current signals from small to large and plotting them as a curve to obtain a candidate signal curve;
and carrying out smooth filtering on the candidate signal curve based on a filtering method of local polynomial least square fitting to obtain a target signal curve.
In a possible implementation manner, the second statistics module 83 is further configured to:
statistically calculating, for the target signal curve, the signal maximum value, the signal minimum value, the signal median, the preset signal percentile, the point of maximum curve curvature, the point of minimum curve curvature, the tangent point of the curve maximum value connecting line and the curve, the slope of the curve maximum value connecting line, the offset degree of the target sequence signal after the fragmentation processing, the curve signal noise and the curve signal-to-noise ratio, to obtain a fifth evaluation index.
In one possible implementation, the apparatus further includes:
the signal diagram drawing module is used for drawing visual signal diagrams respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and the image summarizing module is used for summarizing each visual signal diagram to obtain a signal evaluation summarizing diagram and displaying the signal evaluation summarizing diagram.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
Fig. 9 shows a schematic diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server or terminal device. Referring to FIG. 9, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958 (I/O interface). The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse passing through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, so that the electronic circuitry can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A method of nanopore sequencing signal evaluation, the method comprising:
acquiring an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode;
dividing the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired while the nucleic acid sequence continuously passes through the pore;
carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
2. The method of claim 1, wherein performing statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index, and a third evaluation index comprises:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
3. The method of claim 2, wherein the performing a statistical analysis on the first signal to obtain a first evaluation index comprises:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating a front first signal average value and a rear first signal average value, wherein a signal average value is the average of the current values of the signal sampling points in the corresponding signal;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
4. A method according to claim 2 or 3, wherein said statistically analyzing the second signal to obtain a second evaluation index comprises:
calculating the average of the current values of the signal sampling points in the second signal to obtain a second signal average value as the second evaluation index.
5. The method of claim 2, wherein performing a comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index comprises:
calculating the ratio of the mean current value of the first signal to the mean current value of the second signal to obtain the third evaluation index.
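The calculations named in claims 3 to 5 can be illustrated with a minimal sketch. It assumes the first (blank) signal and the second (bound, not yet passing through the pore) signal are already available as lists of current values. The split of the first signal at its midpoint, the use of the population standard deviation as the "signal noise", and all function and variable names are assumptions made for illustration; the claims only list the inputs of these calculations.

```python
from statistics import fmean, pstdev


def first_index(blank):
    """Front/rear means and noise of the first (blank) signal."""
    half = len(blank) // 2  # assumed split point; the claims do not fix it
    return {
        "front_mean": fmean(blank[:half]),
        "rear_mean": fmean(blank[half:]),
        # Assumed noise definition: population standard deviation, which uses
        # the sample count, each current value and the overall mean.
        "noise": pstdev(blank),
    }


def second_index(bound):
    """Mean current of the second signal."""
    return fmean(bound)


def third_index(blank, bound):
    """Ratio of the first-signal mean to the second-signal mean."""
    return fmean(blank) / fmean(bound)


# Hypothetical current values, for illustration only
blank = [200.0, 202.0, 198.0, 200.0]
bound = [100.0, 98.0, 102.0, 100.0]
idx1 = first_index(blank)
idx2 = second_index(bound)
idx3 = third_index(blank, bound)
```

Comparing the front and rear means is one plausible way to expose drift in the blank current, while the ratio in `third_index` relates the open-pore level to the bound level.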
6. The method of claim 1, wherein the performing a statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index comprises:
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index;
wherein removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal comprises:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
removing the signal segment from the third signal to obtain a target sequence signal;
wherein performing statistical analysis on the target sequence signal to obtain a fourth evaluation index comprises:
determining the mean and the median of the current values of the signal sampling points in the target sequence signal to obtain a third signal mean value and a third signal median;
determining a fourth evaluation index according to the third signal mean value and the third signal median;
wherein fragmenting the target sequence signal to obtain a target signal curve comprises:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the mean of the current values of the signal sampling points in each sequence signal fragment to obtain a corresponding current signal;
sorting the current signals from small to large and drawing a curve to obtain a target signal curve;
wherein fragmenting the target sequence signal to obtain at least one sequence signal fragment comprises:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that adjacent interval signals belong to two different signal fragments in response to the signal observation value being greater than a threshold value;
dividing the target sequence signal according to the signal fragments to which the interval signals belong, so as to obtain at least one sequence signal fragment;
wherein calculating the signal observation value between adjacent interval signals comprises:
calculating a signal observation value $t$ according to a formula in which $n_1$ and $n_2$ are respectively the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are respectively the mean current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are respectively the current value variances of the signal sampling points included in the two adjacent interval signals;
wherein sorting the current signals from small to large and drawing a curve to obtain a target signal curve comprises:
sorting the current signals from small to large and drawing a curve to obtain a candidate signal curve;
smoothing the candidate signal curve with a filter based on local polynomial least-squares fitting to obtain the target signal curve;
wherein performing statistical analysis on the target signal curve to obtain a fifth evaluation index comprises:
computing statistics of the target signal curve, including the maximum value, the minimum value, the median, a preset percentile, the point of maximum curvature, the point of minimum curvature, the tangent point between the curve and the line connecting the extreme values of the curve, the slope of that line, the degree of offset of the target sequence signal after fragmentation, the curve signal noise, and the curve signal-to-noise ratio of the target sequence signal, to obtain the fifth evaluation index.
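The fragmentation and curve-building steps of claim 6 can be sketched as follows. The formula for the signal observation value is not legible in this text, so the sketch assumes the two-sample statistic in Welch's form, which is built from exactly the quantities the claim lists (sample counts, means and variances of the two adjacent intervals), and takes its absolute value. The smoothing step uses the fixed 5-point quadratic Savitzky-Golay weights as one instance of local polynomial least-squares fitting. The window, step and threshold values, the choice to leave the two end points on each side unsmoothed, and all names are hypothetical.

```python
import math
from statistics import fmean, variance


def welch_t(a, b):
    """Absolute two-sample t statistic from sizes, means and sample variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(fmean(a) - fmean(b))
    if se == 0.0:  # both intervals are perfectly flat
        return math.inf if diff > 0 else 0.0
    return diff / se


def fragment(signal, window, step, threshold):
    """Cut the signal wherever adjacent windows differ by more than threshold.

    window must be at least 2 so that a sample variance exists.
    """
    starts = range(0, len(signal) - window + 1, step)
    intervals = [(s, signal[s:s + window]) for s in starts]
    cuts = [0]
    for (_, left), (start, right) in zip(intervals, intervals[1:]):
        if welch_t(left, right) > threshold:
            cuts.append(start)  # a new fragment begins at the right-hand window
    cuts.append(len(signal))
    return [signal[i:j] for i, j in zip(cuts, cuts[1:])]


def sorted_mean_curve(fragments):
    """Mean current of each fragment, sorted from small to large."""
    return sorted(fmean(f) for f in fragments)


def smooth5(curve):
    """5-point quadratic Savitzky-Golay filter; end points are left as they are."""
    weights = (-3, 12, 17, 12, -3)  # least-squares quadratic fit over 5 points
    out = list(curve)
    for i in range(2, len(curve) - 2):
        out[i] = sum(w * y for w, y in zip(weights, curve[i - 2:i + 3])) / 35
    return out


# Hypothetical two-level signal, for illustration only
signal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 20.0, 20.2, 19.8, 20.1, 19.9, 20.0]
fragments = fragment(signal, window=3, step=3, threshold=5.0)
curve = smooth5(sorted_mean_curve(fragments))
```

Statistics such as the maximum, minimum, median or a percentile of `curve` could then be read directly from the smoothed list; a longer window or higher polynomial order would need a general least-squares fit rather than fixed weights.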
7. The method according to claim 1, wherein the method further comprises:
drawing visual signal diagrams corresponding respectively to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and summarizing the visual signal diagrams to obtain a signal evaluation summary diagram and displaying the signal evaluation summary diagram.
8. A nanopore sequencing signal evaluation device, the device comprising:
the sequence determining module is used for acquiring an original sequencing signal obtained by detecting the nucleic acid sequence in a nanopore sequencing mode;
the signal segmentation module is used for segmenting the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired while the nucleic acid sequence continuously passes through the pore;
the first statistical module is used for carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
the second statistical module is used for carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and the effect evaluation module is used for evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1 to 7 when executing the instructions stored by the memory.
10. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410077991.7A CN117594130A (en) | 2024-01-19 | 2024-01-19 | Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117594130A (en) | 2024-02-23 |
Family
ID=89918814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410077991.7A Pending CN117594130A (en) | 2024-01-19 | 2024-01-19 | Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117594130A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020197618A1 (en) * | 2001-01-20 | 2002-12-26 | Sampson Jeffrey R. | Synthesis and amplification of unstructured nucleic acids for rapid sequencing |
CN103278548A (en) * | 2013-05-02 | 2013-09-04 | 华中科技大学 | Electrical signal calibration method for solid-state nanopore DNA sequencing |
US20190127807A1 (en) * | 2017-10-27 | 2019-05-02 | Sysmex Corporation | Quality evaluation method, quality evaluation apparatus, program, storage medium, and quality control sample |
CN111254190A (en) * | 2020-01-20 | 2020-06-09 | 中国医学科学院病原生物学研究所 | Nanopore third-generation sequencing detection method for plasma virology |
CN112646868A (en) * | 2020-12-23 | 2021-04-13 | 赣南医学院 | Method for detecting pathogenic molecules based on nanopore sequencing |
CN113470751A (en) * | 2021-06-30 | 2021-10-01 | 南方科技大学 | Protein nanopore amino acid sequence screening method, protein nanopore and application of protein nanopore |
CN116246703A (en) * | 2023-03-24 | 2023-06-09 | 赛纳生物科技(广州)有限公司 | Quality assessment method for nucleic acid sequencing data |
CN116434843A (en) * | 2023-03-29 | 2023-07-14 | 赛纳生物科技(北京)有限公司 | Base sequencing quality assessment method |
CN116881634A (en) * | 2023-09-06 | 2023-10-13 | 北京齐碳科技有限公司 | Method, apparatus and storage medium for cleaning nanopore signal data |
Non-Patent Citations (2)
Title |
---|
FENG XU et al.: "Evaluation of real-time nanopore sequencing for Salmonella serotype prediction", Food Microbiology, vol. 89, 31 August 2020 (2020-08-31), pages 1 - 9 * |
LI Jin; LAN Haiqing; LIANG Hong: "Noise analysis based on nanopore DNA sequencing", Life Science Instruments, no. 06, 25 December 2019 (2019-12-25), pages 67 - 74 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Barla et al. | Machine learning methods for predictive proteomics | |
US10097687B2 (en) | Nuisance call detection device and method | |
US10628433B2 (en) | Low memory sampling-based estimation of distinct elements and deduplication | |
CN113361578B (en) | Training method and device for image processing model, electronic equipment and storage medium | |
CN112988753B (en) | Data searching method and device | |
CN112364014B (en) | Data query method, device, server and storage medium | |
WO2020054292A1 (en) | Spectral calibration device and spectral calibration method | |
CN109558600B (en) | Translation processing method and device | |
CN116881430B (en) | Industrial chain identification method and device, electronic equipment and readable storage medium | |
CN112148841B (en) | Object classification and classification model construction method and device | |
CN112559559A (en) | List similarity calculation method and device, computer equipment and storage medium | |
US20230325662A1 (en) | Methods and systems for determining a representative input data set for post-training quantization of artificial neural networks | |
CN117594130A (en) | Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium | |
CN114758720B (en) | Method, apparatus and medium for detecting copy number variation | |
CN116451081A (en) | Data drift detection method, device, terminal and storage medium | |
CN111383766A (en) | Computer data processing method, device, medium and electronic equipment | |
CN115579069A (en) | Construction method and device of scRNA-Seq cell type annotation database and electronic equipment | |
CN109918293B (en) | System test method and device, electronic equipment and computer readable storage medium | |
CN114722401A (en) | Equipment safety testing method, device, equipment and storage medium | |
CN110083807B (en) | Contract modification influence automatic prediction method, device, medium and electronic equipment | |
CN114581711A (en) | Target object detection method, apparatus, device, storage medium, and program product | |
CN113392902A (en) | Data set processing method and device, storage medium and electronic equipment | |
Penariu et al. | A parallel approach on airport runways detection using MPI and CImg | |
CN113721978B (en) | Method and system for detecting open source component in mixed source software | |
CN116992450B (en) | File detection rule determining method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||