CN117594130A - Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117594130A
Authority
CN
China
Prior art keywords
signal
evaluation index
curve
sequencing
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410077991.7A
Other languages
Chinese (zh)
Inventor
杨邵谊
孙琛
王大千
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Puyi Biotechnology Co ltd
Original Assignee
Beijing Puyi Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Puyi Biotechnology Co ltd
Priority to CN202410077991.7A
Publication of CN117594130A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Abstract

The disclosure relates to the field of nucleic acid sequencing, and in particular to a nanopore sequencing signal evaluation method, a device, electronic equipment and a storage medium. An original sequencing signal obtained by detecting a nucleic acid sequence by nanopore sequencing is acquired. The original sequencing signal is divided to obtain a first signal, which is the blank current signal acquired when the nanopore is not bound to a nucleic acid molecule; a second signal, which is the current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore; and a third signal, which is the current signal acquired once the nucleic acid sequence starts to pass continuously through the pore. Statistical analysis of the first signal and the second signal yields a first evaluation index, a second evaluation index and a third evaluation index, and statistical analysis of the third signal yields a fourth evaluation index and a fifth evaluation index. The original sequencing signal is evaluated according to at least one of these evaluation indexes. Through signal segmentation and statistical calculation, the method obtains multiple evaluation indexes with which to evaluate the sequencing process and the sequencing result of nanopore sequencing.

Description

Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of nucleic acid sequencing, and in particular, to a nanopore sequencing signal evaluation method, device, electronic apparatus, and storage medium.
Background
Nanopore sequencing is a new-generation sequencing technology. One of its biggest differences from second-generation sequencing is that the sequencing signal changes from an optical signal to a current signal. Evaluating the quality of the sequencing signal is a necessary step in ensuring the accuracy of the sequencing result. However, the conventional signal evaluation methods of second-generation sequencing are not suitable for third-generation sequencing, so establishing a new sequencing signal evaluation system is a key problem in the development of third-generation sequencing technology.
Disclosure of Invention
In view of this, the present disclosure proposes a nanopore sequencing signal evaluation method, device, electronic apparatus and storage medium, aiming at evaluating a nanopore sequencing process and a sequencing result.
According to a first aspect of the present disclosure, there is provided a nanopore sequencing signal evaluation method, the method comprising:
acquiring an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode;
dividing the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired when the nucleic acid sequence starts to pass continuously through the pore;
carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
In one possible implementation manner, the performing statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index, and a third evaluation index includes:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
In one possible implementation manner, the performing statistical analysis on the first signal to obtain a first evaluation index includes:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating the front first signal average value and the rear first signal average value, wherein the signal average value is an average value of current values of each signal sampling point in corresponding signals;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
In one possible implementation manner, the performing statistical analysis on the second signal to obtain a second evaluation index includes:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
In one possible implementation manner, the performing a comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index includes:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
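The second and third evaluation indexes described above reduce to a mean and a ratio of means. A minimal sketch follows, assuming each signal is a plain list of current values; the function name and the sample values are hypothetical and are not taken from the disclosure.

```python
from statistics import fmean


def second_and_third_index(first_signal, second_signal):
    """Return (second index, third index) for two lists of current values."""
    if not first_signal or not second_signal:
        raise ValueError("both signals need at least one sampling point")
    first_mean = fmean(first_signal)
    second_mean = fmean(second_signal)  # second evaluation index
    if second_mean == 0:
        raise ValueError("second signal mean is zero; ratio is undefined")
    # Third evaluation index: ratio of the first mean to the second mean.
    return second_mean, first_mean / second_mean


# Hypothetical current values, made up purely for illustration.
blank = [200.0, 202.0, 198.0, 200.0]
spacer = [100.0, 101.0, 99.0, 100.0]
second_index, third_index = second_and_third_index(blank, spacer)
```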
In one possible implementation manner, the performing statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index includes:
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index.
In a possible implementation manner, the removing the linker sequence signal in the third signal according to the preset linker sequence signal template to obtain the target sequence signal includes:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
and removing the signal segment from the third signal to obtain a target sequence signal.
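The text names only a "dynamic adjustment algorithm" for locating the segment most similar to the template. As an illustration, the sketch below substitutes subsequence dynamic time warping, which is an assumption and may not be the algorithm the disclosure intends. The function names are hypothetical, and the distance between two samples is taken to be their absolute difference.

```python
def find_template_span(signal, template):
    """Return (start, end) of the span of `signal` that best matches
    `template` under subsequence dynamic time warping; `end` is exclusive."""
    n, m = len(template), len(signal)
    if n == 0 or m == 0:
        raise ValueError("signal and template must be non-empty")
    # First template sample: the match may begin at any signal position.
    cost = [abs(template[0] - s) for s in signal]
    start = list(range(m))
    for i in range(1, n):
        new_cost = [0.0] * m
        new_start = [0] * m
        new_cost[0] = cost[0] + abs(template[i] - signal[0])
        new_start[0] = start[0]
        for j in range(1, m):
            best_cost, best_start = min(
                (cost[j - 1], start[j - 1]),          # both advance
                (cost[j], start[j]),                  # only the template advances
                (new_cost[j - 1], new_start[j - 1]),  # only the signal advances
                key=lambda c: c[0],
            )
            new_cost[j] = abs(template[i] - signal[j]) + best_cost
            new_start[j] = best_start
        cost, start = new_cost, new_start
    # The match may also end anywhere: take the cheapest end position.
    end = min(range(m), key=cost.__getitem__)
    return start[end], end + 1


def remove_linker(signal, template):
    """Cut the best-matching span out of `signal`."""
    start, end = find_template_span(signal, template)
    return signal[:start] + signal[end:]
```

Only the start position of each partial alignment is carried through the table, so the matched span is recovered without storing a full backtracking path.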
In one possible implementation manner, the performing statistical analysis on the target sequence signal to obtain a fourth evaluation index includes:
determining the average value and the median of the current value of each signal sampling point in the target sequence signal to obtain a third signal average value and a third signal median;
and determining a fourth evaluation index according to the third signal mean value and the third signal median.
In one possible implementation manner, the performing a fragmentation process on the target sequence signal to obtain a target signal curve includes:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal fragment to obtain a corresponding current signal;
and sorting the current signals from small to large, and drawing a curve to obtain a target signal curve.
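The averaging and sorting steps can be sketched as follows, assuming the fragments are already available as lists of current values. The function name and the sample values are hypothetical. The returned list holds the y-values of the curve in rank order; plotting is left out.

```python
from statistics import fmean


def sorted_fragment_means(fragments):
    """Average each non-empty fragment, then sort the averages ascending."""
    return sorted(fmean(fragment) for fragment in fragments if fragment)


# Hypothetical fragments, made up purely for illustration.
fragments = [[90.0, 92.0], [60.0, 62.0, 61.0], [75.0, 75.0]]
curve = sorted_fragment_means(fragments)
```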
In one possible implementation manner, the fragmenting the target sequence signal to obtain at least one sequence signal fragment includes:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signal belongs to two different signal segments in response to the signal observation value being greater than a threshold value;
and dividing the target sequence signal according to the signal segments to which the different interval signals belong, so as to obtain at least one sequence signal fragment.
In one possible implementation manner, the calculating a signal observation value between adjacent interval signals includes:
calculating a signal observation value $t$ according to the formula $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, wherein $n_1$ and $n_2$ are respectively the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are respectively the average values of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are respectively the current value variances of the signal sampling points included in the two adjacent interval signals.
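The windowed comparison can be sketched as below, assuming the observation t is a Welch-style two-sample statistic built from the counts, means and variances of the two windows. Using the absolute difference, the sample variance, and the default window, step and threshold values are all assumptions made for illustration; the function names are hypothetical.

```python
import math
from statistics import fmean, variance


def t_observation(a, b):
    """Welch-style t statistic between two windows of current values."""
    var_term = variance(a) / len(a) + variance(b) / len(b)
    diff = abs(fmean(a) - fmean(b))
    if var_term == 0:
        # Two perfectly flat windows: identical levels or a clean step.
        return math.inf if diff != 0 else 0.0
    return diff / math.sqrt(var_term)


def split_fragments(signal, window=4, step=4, threshold=5.0):
    """Cut `signal` wherever adjacent windows differ by more than `threshold`."""
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step >= 1")
    starts = range(0, len(signal) - window + 1, step)
    windows = [(s, signal[s:s + window]) for s in starts]
    cuts = [0]
    for (_, left), (right_start, right) in zip(windows, windows[1:]):
        if t_observation(left, right) > threshold:
            cuts.append(right_start)  # a new fragment begins at this window
    cuts.append(len(signal))
    return [signal[a:b] for a, b in zip(cuts, cuts[1:])]
```

Any samples left over after the last full window stay attached to the final fragment.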
In one possible implementation manner, the sorting and plotting the current signals from small to large to obtain a target signal curve includes:
sorting the current signals from small to large, and drawing a curve to obtain a candidate signal curve;
and carrying out smooth filtering on the candidate signal curve based on a filtering method of local polynomial least square fitting to obtain a target signal curve.
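Filtering by local polynomial least-squares fitting is commonly known as Savitzky-Golay filtering. The sketch below uses the standard 5-point quadratic kernel. The window length, the polynomial order and the choice to leave the two points at each end unchanged are assumptions, since the text does not specify them; the function name is hypothetical.

```python
def savgol5(values):
    """Smooth a curve with the 5-point quadratic Savitzky-Golay kernel."""
    kernel = (-3.0, 12.0, 17.0, 12.0, -3.0)  # weights, to be divided by 35
    smoothed = list(values)  # the two points at each end are kept as-is
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(k * v for k, v in zip(kernel, window)) / 35.0
    return smoothed
```

Because the kernel comes from a quadratic fit, data that already lie on a parabola pass through unchanged, while an isolated spike is attenuated.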
In one possible implementation manner, the performing statistical analysis on the target signal curve to obtain a fifth evaluation index includes:
and statistically calculating, for the target signal curve, the maximum value, the minimum value, the median, a preset percentile, the maximum point of the curve curvature, the minimum point of the curve curvature, the tangent point of the curve maximum value connecting line and the curve, the slope of the curve maximum value connecting line, the degree of offset after the fragmentation of the target sequence signal, the curve signal noise and the curve signal-to-noise ratio of the target sequence signal, to obtain a fifth evaluation index.
In one possible implementation, the method further includes:
drawing a visual signal diagram respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and summarizing each visual signal diagram to obtain a signal evaluation summary diagram and displaying the signal evaluation summary diagram.
According to a second aspect of the present disclosure, there is provided a nanopore sequencing signal evaluation device, the device comprising:
the sequence determining module is used for acquiring an original sequencing signal obtained by detecting the nucleic acid sequence in a nanopore sequencing mode;
the signal segmentation module is used for segmenting the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired when the nucleic acid sequence starts to pass continuously through the pore;
the first statistical module is used for carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
the second statistical module is used for carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and the effect evaluation module is used for evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
In one possible implementation, the first statistics module is further configured to:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
In one possible implementation, the first statistics module is further configured to:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating the front first signal average value and the rear first signal average value, wherein the signal average value is an average value of current values of each signal sampling point in corresponding signals;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
In one possible implementation, the first statistics module is further configured to:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
In one possible implementation, the first statistics module is further configured to:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
In one possible implementation manner, the second statistics module is further configured to:
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index.
In one possible implementation manner, the second statistics module is further configured to:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
and removing the signal segment from the third signal to obtain a target sequence signal.
In one possible implementation manner, the second statistics module is further configured to:
determining the average value and the median of the current value of each signal sampling point in the target sequence signal to obtain a third signal average value and a third signal median;
and determining a fourth evaluation index according to the third signal mean value and the third signal median.
In one possible implementation manner, the second statistics module is further configured to:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal fragment to obtain a corresponding current signal;
and sorting the current signals from small to large, and drawing a curve to obtain a target signal curve.
In one possible implementation manner, the second statistics module is further configured to:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signal belongs to two different signal segments in response to the signal observation value being greater than a threshold value;
and dividing the target sequence signal according to the signal segments to which the different interval signals belong, so as to obtain at least one sequence signal fragment.
In one possible implementation manner, the second statistics module is further configured to:
calculating a signal observation value $t$ according to the formula $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, wherein $n_1$ and $n_2$ are respectively the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are respectively the average values of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are respectively the current value variances of the signal sampling points included in the two adjacent interval signals.
In one possible implementation manner, the second statistics module is further configured to:
sorting the current signals from small to large, and drawing a curve to obtain a candidate signal curve;
and carrying out smooth filtering on the candidate signal curve based on a filtering method of local polynomial least square fitting to obtain a target signal curve.
In one possible implementation manner, the second statistics module is further configured to:
and statistically calculating, for the target signal curve, the maximum value, the minimum value, the median, a preset percentile, the maximum point of the curve curvature, the minimum point of the curve curvature, the tangent point of the curve maximum value connecting line and the curve, the slope of the curve maximum value connecting line, the degree of offset after the fragmentation of the target sequence signal, the curve signal noise and the curve signal-to-noise ratio of the target sequence signal, to obtain a fifth evaluation index.
In one possible implementation, the apparatus further includes:
the signal diagram drawing module is used for drawing visual signal diagrams respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and the image summarizing module is used for summarizing each visual signal diagram to obtain a signal evaluation summarizing diagram and displaying the signal evaluation summarizing diagram.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-described method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
In an embodiment of the disclosure, an original sequencing signal obtained by detecting a nucleic acid sequence by nanopore sequencing is acquired. The original sequencing signal is divided to obtain a first signal, which is the blank current signal acquired when the nanopore is not bound to a nucleic acid molecule; a second signal, which is the current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore; and a third signal, which is the current signal acquired once the nucleic acid sequence starts to pass continuously through the pore. Statistical analysis of the first signal and the second signal yields a first evaluation index, a second evaluation index and a third evaluation index, and statistical analysis of the third signal yields a fourth evaluation index and a fifth evaluation index. The original sequencing signal is evaluated according to at least one of these evaluation indexes. Through signal segmentation and statistical calculation, the method obtains multiple evaluation indexes with which to evaluate the sequencing process and the sequencing result of nanopore sequencing.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a nanopore sequencing signal evaluation method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a signal splitting process according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of a rear first signal according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of a fragmentation processing result according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of a target signal profile according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of a signal evaluation summary graph, according to an embodiment of the disclosure.
Fig. 7 shows a schematic diagram of a nanopore sequencing signal evaluation process according to an embodiment of the present disclosure.
Fig. 8 shows a schematic diagram of a nanopore sequencing signal evaluation device according to an embodiment of the present disclosure.
Fig. 9 shows a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
The nanopore sequencing signal evaluation method of the embodiment of the disclosure can be executed by electronic equipment such as terminal equipment or a server. The terminal device may be any fixed or mobile terminal such as a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, etc. The server may be a single server or a server cluster composed of a plurality of servers. Any electronic device may implement the nanopore sequencing signal evaluation method of the embodiments of the present disclosure by way of a processor invoking computer readable instructions stored in a memory.
Fig. 1 shows a flowchart of a nanopore sequencing signal evaluation method according to an embodiment of the present disclosure. As shown in fig. 1, the nanopore sequencing signal evaluation method of the embodiments of the present disclosure may include the following steps S10 to S50.
Step S10, acquiring an original sequencing signal obtained by detecting the nucleic acid sequence in a nanopore sequencing mode.
In one possible implementation, the electronic device obtains an original sequencing signal produced by detecting a nucleic acid sequence by nanopore sequencing, where the original sequencing signal is a current signal acquired at a preset sampling rate. The detection itself can be carried out by a sequencer: the sequencer detects the nucleic acid sequence by nanopore sequencing, obtains an original sequencing signal at the preset sampling rate, and sends it to the electronic device. During nanopore sequencing, the nucleic acid sequence to be detected binds to the nanopore and passes through it, and the sequencer collects the current signal at the preset sampling rate throughout the sequencing process, thereby obtaining the original sequencing signal.
Optionally, under the condition that the electronic device has a sequencing function, the electronic device can be used for directly detecting the nucleic acid sequence based on a nanopore sequencing mode to obtain an original sequencing signal with a predetermined sampling rate.
Step S20, dividing the original sequencing signal to obtain a first signal, a second signal and a third signal.
In one possible implementation manner, after obtaining the original sequencing signal, the electronic device may divide it according to the signal characteristics of its different portions, so as to obtain three signals, namely a first signal, a second signal and a third signal. The first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore, and the third signal is a current signal acquired once the nucleic acid molecule in the nucleic acid sequence has bound to the nanopore and started to pass continuously through the pore. That is, the first signal is essentially an open-pore current signal, the second signal is a spacer current signal, and the third signal is the actual sequencing current signal.
Fig. 2 shows a schematic diagram of a signal splitting process according to an embodiment of the present disclosure. As shown in fig. 2, the black waveform is the original sequencing signal. The part before the first vertical line on the left is the blank current signal acquired when the nanopore is not bound to a nucleic acid molecule; the part between the first vertical line on the left and the middle vertical line is the current signal acquired after the nucleic acid molecule has bound to the nanopore but before it starts to pass through the pore; and the part between the middle vertical line and the first vertical line on the right is the current signal acquired once the nucleic acid molecule in the nucleic acid sequence has bound to the nanopore and started to pass continuously through the pore. Thus, the original sequencing signal can be split into a first signal, a second signal and a third signal according to the positions of the three vertical lines in the figure. Optionally, before segmenting the original sequencing signal, the electronic device may low-pass filter it with a mean filter of a preset window length, so as to remove part of the interference contained in the original sequencing signal and improve the accuracy of the three signals obtained by segmentation.
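The optional pre-filtering step can be sketched as a centered moving average. The default window length and the shrinking of the window near the two ends are assumptions, since the text specifies only a preset window length; the function name is hypothetical.

```python
def moving_average(signal, window=5):
    """Centered mean filter; the window shrinks near both ends of the signal."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```

A longer window suppresses more high-frequency interference but also blurs the level changes that the subsequent segmentation relies on, so the window length is a trade-off.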
Step S30, carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index.
In one possible implementation manner, the electronic device may obtain three kinds of evaluation indexes, i.e., a first evaluation index, a second evaluation index and a third evaluation index, by performing statistical analysis on two kinds of information of the first signal and the second signal, so as to be used for evaluating the quality of the original sequencing signal obtained by sequencing. The first signal can be subjected to statistical analysis to obtain a first evaluation index, the second signal can be subjected to statistical analysis to obtain a second evaluation index, and the first signal and the second signal are subjected to comprehensive statistical analysis to obtain a third evaluation index. Alternatively, one or more index contents may be included in each evaluation index.
Optionally, the first evaluation index may include a front first signal average value, a rear first signal average value, and a first signal noise corresponding to the first signal. The front first signal and the rear first signal are obtained by further dividing the first signal: the front first signal is the part of the first signal located before a section of second signal, and the rear first signal is the part of the first signal located after a section of third signal. Fig. 3 shows a schematic diagram of a rear first signal according to an embodiment of the present disclosure. As shown in fig. 3, the signal to the left of the vertical line is the third signal in a section of the original sequencing signal, and the signal to the right of the vertical line is the portion of the first signal that follows the third signal. That is, within one section of first signal, the electronic device typically takes the leading portion (which follows the preceding third signal) as the rear first signal and the trailing portion (which precedes the next second signal) as the front first signal. The durations of the front first signal and the rear first signal may be selected as desired.
When determining the first evaluation index, the electronic device may divide the first signal to obtain the front first signal and the rear first signal, and calculate the front first signal average value and the rear first signal average value, where a signal average value is the average of the current values of the signal sampling points in the corresponding signal. The first signal noise of the first signal is calculated from the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points. The first evaluation index is then determined from the front first signal average value, the rear first signal average value and the first signal noise. In the formula for the first signal noise, $N$ is the number of signal sampling points in the first signal, $x_i$ is the current value of signal sampling point $i$, and $\bar{x}$ is the average current value of all signal sampling points in the first signal.
Further, when determining the second evaluation index, the electronic device may calculate the average of the current values of the signal sampling points in the second signal, and use the resulting second signal average value as the second evaluation index. When determining the third evaluation index, the electronic device may calculate the ratio of the first signal average value to the second signal average value, and use this ratio as the third evaluation index.
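The first, second and third evaluation indexes can be illustrated with a short Python sketch. The noise formula itself is not legible in this text, so the sketch assumes the first signal noise is the population standard deviation built from the three quantities the description names (sample count, per-sample current and mean current); this is an assumption, not the confirmed formula. The function and key names are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, Sequence


def signal_noise(samples: Sequence[float]) -> float:
    """ASSUMED noise definition: population standard deviation of the samples."""
    n = len(samples)
    mean = fmean(samples)
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / n)


def first_to_third_indices(
    front_first: Sequence[float],
    rear_first: Sequence[float],
    first: Sequence[float],
    second: Sequence[float],
) -> Dict[str, float]:
    """Collect the first, second and third evaluation indexes in one mapping."""
    second_mean = fmean(second)
    return {
        # first evaluation index
        "front_first_mean": fmean(front_first),
        "rear_first_mean": fmean(rear_first),
        "first_noise": signal_noise(first),
        # second evaluation index
        "second_mean": second_mean,
        # third evaluation index: first signal mean over second signal mean
        "first_to_second_ratio": fmean(first) / second_mean,
    }


if __name__ == "__main__":
    first = [100.0, 102.0, 98.0, 100.0]
    print(first_to_third_indices(first[2:], first[:2], first, [50.0, 50.0]))
```

In the usage example the leading half of the first signal is passed as the rear first signal and the trailing half as the front first signal, following the convention described for fig. 3.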
And step S40, carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index.
In one possible implementation, the electronic device may perform statistical analysis on the third signal to obtain two kinds of evaluation indexes, namely the fourth evaluation index and the fifth evaluation index, each of which may include one or more index contents. The electronic device may remove the linker sequence signal from the third signal according to a preset linker sequence signal template to obtain a target sequence signal; that is, the electronic device may identify the portion of the third signal that is linker sequence signal and delete it, thereby removing interference information from the third signal. Statistical analysis is then performed on the target sequence signal to obtain the fourth evaluation index; the target sequence signal is fragmented to obtain a target signal curve; and statistical analysis is performed on the target signal curve to obtain the fifth evaluation index.
Optionally, the linker sequence signal in the third signal may be identified based on a dynamic adjustment algorithm; that is, the electronic device may use the dynamic adjustment algorithm to search the third signal for the signal segment most similar to the linker sequence signal template, and remove that signal segment from the third signal to obtain the target sequence signal. The dynamic adjustment algorithm may be the DTW (Dynamic Time Warping) algorithm, which measures the similarity between two time series by nonlinearly aligning them in the time domain; the basic idea is to find a path through the linker sequence signal template and the third signal such that the sum of the Euclidean distances of the points on the path is minimized. When searching for the signal segment with the dynamic adjustment algorithm, the following preset constraints must be met: (1) the start point of the signal segment must correspond to the start point of the linker sequence signal template, and its end point must correspond to the end point of the template; (2) the signal segment must be continuous, i.e. the next point on the search path must be adjacent to the current point in the third signal; (3) the search path must be monotonic, i.e. the next point must be to the right of the current point. To search the third signal for the segment most similar to the linker sequence signal template, a two-dimensional array may be created according to the DTW algorithm, in which each element represents the distance between one point of the linker sequence signal template and one point of the third signal. A path from the start point to the end point is then found in the array such that the sum of the distances along the path is minimal; this search can be implemented by dynamic programming.
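The dynamic-programming search described above can be sketched as a subsequence variant of DTW, in which the template must be matched from its first to its last point while the matching segment may begin and end anywhere in the third signal. The sketch below is an illustration under stated assumptions, not the disclosed implementation: `find_linker` and `remove_linker` are hypothetical names, the absolute difference is used as the one-dimensional Euclidean distance, and the classic three-move step pattern (previous template point, previous signal point, or both) is assumed.

```python
from typing import List, Sequence, Tuple


def find_linker(
    template: Sequence[float], signal: Sequence[float]
) -> Tuple[int, int, float]:
    """Return (start, end, cost) of the signal segment best matching the template.

    cost[i][j] is the minimal summed distance of a warping path that aligns
    template[0..i] with a signal segment ending at index j.
    """
    m, n = len(template), len(signal)
    if m == 0 or n == 0:
        raise ValueError("template and signal must be non-empty")
    cost = [[0.0] * n for _ in range(m)]
    start = [[0] * n for _ in range(m)]  # start index of the best path ending at (i, j)
    for j in range(n):  # the segment may start at any signal index
        cost[0][j] = abs(template[0] - signal[j])
        start[0][j] = j
    for i in range(1, m):
        for j in range(n):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = abs(template[i] - signal[j]) + best_cost
            start[i][j] = best_start
    end = min(range(n), key=lambda j: cost[m - 1][j])
    return start[m - 1][end], end, cost[m - 1][end]


def remove_linker(template: Sequence[float], signal: Sequence[float]) -> List[float]:
    """Delete the best-matching segment to obtain the target sequence signal."""
    s, e, _ = find_linker(template, signal)
    return list(signal[:s]) + list(signal[e + 1:])


if __name__ == "__main__":
    print(find_linker([5.0, 9.0, 5.0], [1.0, 1.0, 5.0, 9.0, 9.0, 5.0, 1.0, 1.0]))
```

The full cost table takes memory proportional to the product of the template length and the third signal length; for long reads a banded or windowed search would be a natural refinement, though the text does not say whether one is used.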
Further, after determining the target sequence signal based on the third signal, the electronic device performs statistical analysis on the target sequence signal to obtain the fourth evaluation index. Specifically, the average and the median of the current values of the signal sampling points in the target sequence signal may be determined to obtain a third signal average value and a current median, and the fourth evaluation index may include the third signal average value and the current median.
In one possible implementation, after determining the target sequence signal, the electronic device may further fragment the target sequence signal to obtain the target signal curve. Optionally, the target signal curve may be determined as follows: the target sequence signal is first fragmented to obtain at least one sequence signal fragment; the average of the current values of the signal sampling points in each sequence signal fragment is calculated to obtain a corresponding current signal; and the current signals are sorted in ascending order and plotted to obtain the target signal curve. The fragmentation may be implemented with an independent-samples t-test (Independent Samples T-test): for a signal that is locally normally or approximately normally distributed, a sliding window is used to perform an independent-samples t-test between two adjacent groups of signal points; if the t statistic of the test is smaller than a preset critical value, the two groups of signal points are regarded as current signals of the same k-mer; finally, the average of the current signal points contained in a k-mer is taken as the current value of that k-mer. Here a k-mer refers to a subsequence of signal points of length k within a long sequence of signal points.
Specifically, the electronic device may first slide a first sliding window of a preset size over the target sequence signal with a preset first step length to obtain a plurality of interval signals in sequence. A signal observation value is calculated between adjacent interval signals, and when the signal observation value is larger than a critical value, the adjacent interval signals are determined to belong to two different signal fragments. The target sequence signal is then divided according to the signal fragments to which the interval signals belong, to obtain at least one sequence signal fragment. The signal observation value $t$ is calculated by the independent-samples t-test formula from the following quantities: $n_1$ and $n_2$ are the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are the averages of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are the variances of the current values of the signal sampling points included in the two adjacent interval signals, respectively.
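The fragmentation step can be sketched as follows. Because the exact t formula is not legible in this text, the sketch assumes the unequal-variance (Welch) form of the two-sample t statistic, which uses exactly the sample counts, means and variances listed above; a pooled-variance form could be substituted without changing the rest of the procedure. The names `t_statistic` and `fragment` are hypothetical, and comparing the magnitude of t with the critical value is also an assumption.

```python
import math
from statistics import fmean, variance
from typing import List, Sequence


def t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Magnitude of the two-sample t statistic (ASSUMED Welch form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(fmean(a) - fmean(b))
    if se == 0.0:  # both intervals are perfectly flat
        return 0.0 if diff == 0.0 else math.inf
    return diff / se


def fragment(
    signal: Sequence[float], window: int, step: int, critical: float
) -> List[List[float]]:
    """Cut the signal wherever two adjacent interval signals differ significantly."""
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step >= 1")
    starts = range(0, len(signal) - window + 1, step)
    intervals = [signal[s:s + window] for s in starts]
    cuts = [0]
    for k in range(1, len(intervals)):
        if t_statistic(intervals[k - 1], intervals[k]) > critical:
            cuts.append(starts[k])  # a new fragment begins with interval k
    cuts.append(len(signal))
    return [list(signal[a:b]) for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    sig = [10.0, 10.2, 9.8, 10.1, 20.0, 20.1, 19.9, 20.2]
    print(fragment(sig, window=4, step=4, critical=5.0))
```

With a step equal to the window size the interval signals do not overlap; a smaller step gives finer boundary resolution at the cost of comparing overlapping intervals, so the critical value would need to be tuned together with the step.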
Fig. 4 shows a schematic diagram of a fragmentation processing result according to an embodiment of the present disclosure. As shown in fig. 4, after the fragmentation process, the original target sequence signal (non-square waveform curve) is divided into a curve signal (square waveform curve) composed of a plurality of sequence signal fragments.
Further, after fragmenting the target sequence signal, the electronic device may determine the corresponding current signal from the average of the current values of the signal sampling points included in each sequence signal fragment, and then sort the current signals and plot them to obtain the target signal curve. Specifically, a candidate signal curve may be obtained by sorting all current signals in ascending order and plotting them, and the candidate signal curve is then smoothed with a filtering method based on local polynomial least-squares fitting to obtain the target signal curve.
Optionally, the filtering method based on local polynomial least-squares fitting may be the Savitzky-Golay algorithm, a time-domain filtering method based on local polynomial least-squares fitting. Its core idea is to fit a polynomial of order k to the data points within a window of a given length and take the fitted result as the output. The Savitzky-Golay method filters out signal noise while keeping the shape and width of the signal unchanged. The filtering may proceed by repeatedly taking curve segments from the candidate signal curve with a second sliding window of size 2Q+1 and a preset second step length, and smoothing the center signal point $x$ of each curve segment according to the formula $\hat{y}(x)=\sum_{q=-Q}^{Q} w_q\,y(x+q)$, where $w_q$ is a preset weight factor obtained from the local polynomial least-squares fit and $y$ is the candidate signal curve.
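A minimal sketch of building the candidate signal curve and smoothing it is given below. It assumes Q = 2 (a five-point window) with the well-known Savitzky-Golay weights for a quadratic fit, (-3, 12, 17, 12, -3)/35, a second step length of 1, and that the first and last Q points are left unsmoothed; none of these choices is stated in the text. The names `candidate_curve` and `savgol_smooth` are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence

# Preset weights w_q for Q = 2 and a quadratic local fit (ASSUMED configuration).
SG_WEIGHTS_Q2 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def candidate_curve(fragments: Sequence[Sequence[float]]) -> List[float]:
    """Mean current of each sequence signal fragment, sorted in ascending order."""
    return sorted(fmean(frag) for frag in fragments)


def savgol_smooth(
    y: Sequence[float], weights: Sequence[float] = SG_WEIGHTS_Q2
) -> List[float]:
    """Replace each interior point by the weighted sum over its 2Q+1 window."""
    q = len(weights) // 2
    out = list(y)  # the first and last Q points are kept as they are
    for x in range(q, len(y) - q):
        out[x] = sum(w * y[x + k - q] for k, w in enumerate(weights))
    return out


if __name__ == "__main__":
    frags = [[3.0, 5.0], [1.0, 1.0], [2.0, 4.0], [9.0, 9.0], [6.0, 6.0], [7.0, 8.0]]
    curve = candidate_curve(frags)
    print(curve)
    print(savgol_smooth(curve))
```

Because the weights come from a quadratic least-squares fit, a curve that is already a polynomial of degree two or lower passes through the filter unchanged, which is one way to sanity-check an implementation.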
Fig. 5 shows a schematic diagram of a target signal curve according to an embodiment of the present disclosure. As shown in fig. 5, the black curve is the target signal curve drawn by the electronic device from the fragmented target sequence signal; its values rise from small to large, and the straight line indicates the slope of the target signal curve.
Further, after determining the target signal curve, the electronic device may statistically calculate, for the target signal curve, the signal maximum value, the signal minimum value, the signal median, preset signal percentiles, the point of maximum curve curvature, the point of minimum curve curvature, the tangent point between the curve and the curve's extreme-value connecting line, the slope of the extreme-value connecting line, the curve signal noise, the curve signal-to-noise ratio, and the offset degree of the target sequence signal after the fragmentation processing, to obtain the fifth evaluation index. The preset percentiles may be at least one preset percentile, for example the 90th and 10th percentiles. The curve curvature may be calculated according to the formula $K=\frac{|y''|}{\left(1+(y')^{2}\right)^{3/2}}$, where $y'$ is the first derivative of the target signal curve and $y''$ is the second derivative of the target signal curve. The curve signal noise and the curve signal-to-noise ratio are each calculated from the following quantities: $M$ is the number of sequence signal fragments of the target sequence signal corresponding to the target signal curve, $n_i$ is the number of signal sampling points included in the $i$-th sequence signal fragment, $x_{ij}$ is the current value of the $j$-th signal sampling point in the $i$-th sequence signal fragment, and $\bar{x}_i$ is the average of the current values of all signal sampling points in the $i$-th sequence signal fragment. The offset degree of the target sequence signal after the fragmentation processing may be calculated from the Euclidean distance between the target sequence signal before and after the fragmentation processing.
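Part of the fifth evaluation index can be illustrated with the sketch below, which covers the curvature, a few summary statistics and the offset degree. It rests on several assumptions: the curve is sampled with unit spacing, derivatives are approximated by central finite differences, curvature is taken in its unsigned form, and the slope of the extreme-value connecting line is taken between the first and last points of the ascending curve. The curve signal noise and signal-to-noise ratio are left out because their formulas are not legible in this text. All function and key names are hypothetical.

```python
import math
from statistics import fmean, median
from typing import Dict, List, Sequence


def curvature(y: Sequence[float]) -> List[float]:
    """Discrete curvature |y''| / (1 + y'^2)^(3/2) at the interior points."""
    ks = []
    for i in range(1, len(y) - 1):
        d1 = (y[i + 1] - y[i - 1]) / 2.0      # central first difference
        d2 = y[i + 1] - 2.0 * y[i] + y[i - 1]  # central second difference
        ks.append(abs(d2) / (1.0 + d1 * d1) ** 1.5)
    return ks


def curve_summary(curve: Sequence[float]) -> Dict[str, float]:
    """Summary of an ascending target signal curve with at least three points."""
    ks = curvature(curve)
    return {
        "min": curve[0],
        "max": curve[-1],
        "median": median(curve),
        "extreme_line_slope": (curve[-1] - curve[0]) / (len(curve) - 1),
        "max_curvature_index": 1 + max(range(len(ks)), key=ks.__getitem__),
    }


def fragmentation_offset(fragments: Sequence[Sequence[float]]) -> float:
    """Euclidean distance between the signal and its piecewise-constant version."""
    total = 0.0
    for frag in fragments:
        level = fmean(frag)  # every sample is replaced by its fragment mean
        total += sum((x - level) ** 2 for x in frag)
    return math.sqrt(total)


if __name__ == "__main__":
    print(curve_summary([0.0, 1.0, 2.0, 3.0, 10.0]))
    print(fragmentation_offset([[1.0, 3.0], [5.0, 5.0]]))
```

A signed curvature (without the absolute value) would distinguish upward from downward bends; which of the two the method intends is not stated.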
And S50, evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
In one possible implementation, the electronic device obtains the first, second, third, fourth and fifth evaluation indexes through statistical analysis of the first signal, the second signal and the third signal obtained by segmenting the original sequencing signal, and evaluates the original sequencing signal according to at least one of these evaluation indexes, based on the actual requirements of the specific application scenario. The first evaluation index reflects the sequencing stability of the nanopore used to acquire the original sequencing signal and the stability of the nanopore itself. The second evaluation index, in combination with the first evaluation index, reflects the sequencing sensitivity of the nanopore, and the third evaluation index is used to evaluate that sequencing sensitivity. The fourth evaluation index reflects the current magnitude of the original sequencing signal obtained by nanopore sequencing, and the fifth evaluation index reflects the sequencing performance of the nanopore as well as the noise and signal-to-noise ratio of the original sequencing signal. The smoother the target signal curve and the closer its shape is to a ramp, the better the resolution of the nanopore and the better its sequencing performance. The curve signal noise and the curve signal-to-noise ratio reflect the noise and the signal-to-noise ratio of the original sequencing signal, respectively.
Further, after the above several evaluation indexes are obtained, the electronic device may also visually display important indexes therein. For example, visual signal graphs corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index respectively can be drawn, and then each visual signal graph is summarized to obtain a signal evaluation summary graph and displayed. The visual signal diagram drawn by each index can comprise at least one, and each visual signal diagram can be a violin diagram or a box diagram.
Fig. 6 shows a schematic diagram of a signal evaluation summary graph according to an embodiment of the present disclosure. As shown in fig. 6, where a single sequencing run includes multiple biological replicates, violin plots and box plots are used to visualize the distribution of the indexes, yielding the signal evaluation summary graph. The signal evaluation summary graph comprises five visual signal graphs drawn according to the first, third, fourth and fifth evaluation indexes: the first evaluation index corresponds to the visual signal graph representing overall stability, the third evaluation index corresponds to the visual signal graph representing sequencing sensitivity, the fourth evaluation index corresponds to the visual signal graph representing the absolute magnitude of the sequencing current, and the fifth evaluation index corresponds to two visual signal graphs, one representing the sequencing signal noise and one representing the signal-to-noise ratio of the sequencing signal. In each visual signal graph, the width of the violin plot represents the probability density of the observations of the original sequencing signal at that value; the boundaries of the box plot represent, from top to bottom, the upper edge, the upper quartile, the median, the lower quartile and the lower edge of the observed values; and the solid black points represent outliers.
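The quantities a box plot displays can be computed as in the sketch below. The text does not say how the upper and lower edges or the outliers are defined, so the sketch assumes the common Tukey convention: edges are the most extreme observations within 1.5 times the interquartile range of the quartiles, and anything beyond them is an outlier. The function name `box_plot_stats` and the `whisker` parameter are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Sequence, Union


def box_plot_stats(
    values: Sequence[float], whisker: float = 1.5
) -> Dict[str, Union[float, List[float]]]:
    """Five box-plot boundaries plus outliers (ASSUMED Tukey 1.5 * IQR rule)."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "upper_edge": inside[-1],
        "upper_quartile": q3,
        "median": med,
        "lower_quartile": q1,
        "lower_edge": inside[0],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # e.g. one evaluation index measured over several biological replicates
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

A plotting library would normally compute these values internally; spelling them out is mainly useful for checking which replicates are drawn as outlier points.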
Fig. 7 shows a schematic diagram of a nanopore sequencing signal evaluation process according to an embodiment of the present disclosure. As shown in fig. 7, after acquiring an original sequencing signal obtained by detecting a nucleic acid sequence by a nanopore sequencing method, the electronic device segments the original sequencing signal to obtain a first signal (a hole signal), a second signal (a Space signal) and a third signal (a sequencing sequence signal). Further, the first signal is subjected to statistical analysis to obtain a first evaluation index, the second signal is subjected to statistical analysis to obtain a second evaluation index, and a third evaluation index is determined based on the first evaluation index and the second evaluation index. And removing the linker sequence signal in the third signal based on the linker sequence signal template to obtain a target sequence signal. And directly carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index, carrying out fragmentation treatment on the target sequence signal to obtain a target signal curve (K-mer signal curve), and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index. Further, the original sequencing signal is evaluated according to at least one of the first, second, third, fourth and fifth evaluation indexes. Further, the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index may be summarized, and a corresponding signal evaluation summary graph may be drawn for visual evaluation.
Based on the technical characteristics, the embodiment of the disclosure can obtain different parts in an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode through signal segmentation, count different part signals in a targeted manner through different methods to obtain various evaluation indexes, and comprehensively and accurately evaluate a sequencing process and a sequencing result of the nanopore sequencing mode.
Fig. 8 shows a schematic diagram of a nanopore sequencing signal evaluation device according to an embodiment of the present disclosure. As shown in fig. 8, the nanopore sequencing signal evaluation device of the embodiment of the present disclosure may include:
a sequence determining module 80, configured to obtain an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing manner;
the signal segmentation module 81 is configured to segment the original sequencing signal to obtain a first signal, a second signal and a third signal, where the first signal is a blank current signal collected when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal collected after the nucleic acid molecule has bound to the nanopore but before translocation through the pore has started, and the third signal is a current signal collected while the nucleic acid sequence continuously translocates through the pore;
a first statistics module 82, configured to perform a statistical analysis based on the first signal and the second signal, to obtain a first evaluation index, a second evaluation index, and a third evaluation index;
a second statistics module 83, configured to perform statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
an effect evaluation module 84 for evaluating the raw sequencing signal according to at least one of the first, second, third, fourth, and fifth evaluation indices.
In one possible implementation, the first statistics module 82 is further configured to:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
In one possible implementation, the first statistics module 82 is further configured to:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating the front first signal average value and the rear first signal average value, wherein the signal average value is an average value of current values of each signal sampling point in corresponding signals;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
In one possible implementation, the first statistics module 82 is further configured to:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
In one possible implementation, the first statistics module 82 is further configured to:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
In a possible implementation manner, the second statistics module 83 is further configured to:
removing the joint sequence signal in the third signal according to a preset joint sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
and carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index.
In a possible implementation manner, the second statistics module 83 is further configured to:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
and removing the signal segment from the third signal to obtain a target sequence signal.
In a possible implementation manner, the second statistics module 83 is further configured to:
determining the average value and the median of the current values of the signal sampling points in the target sequence signal to obtain a third signal mean value and a current median;
and determining a fourth evaluation index according to the third signal mean value and the current median.
In a possible implementation manner, the second statistics module 83 is further configured to:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal segment to obtain a corresponding current signal;
and sorting the current signals in ascending order and plotting them to obtain a target signal curve.
In a possible implementation manner, the second statistics module 83 is further configured to:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signals belong to two different signal fragments in response to the signal observation value being greater than a critical value;
and dividing the target sequence signal according to the signal fragments to which the interval signals belong, so as to obtain at least one sequence signal fragment.
In a possible implementation manner, the second statistics module 83 is further configured to:
calculating the signal observation value $t$ by the independent-samples t-test formula, where $n_1$ and $n_2$ are the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are the averages of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are the variances of the current values of the signal sampling points included in the two adjacent interval signals, respectively.
In a possible implementation manner, the second statistics module 83 is further configured to:
sorting the current signals in ascending order and plotting them to obtain a candidate signal curve;
and carrying out smooth filtering on the candidate signal curve based on a filtering method of local polynomial least square fitting to obtain a target signal curve.
In a possible implementation manner, the second statistics module 83 is further configured to:
statistically calculating, for the target signal curve, the signal maximum value, the signal minimum value, the signal median, the preset signal percentiles, the point of maximum curve curvature, the point of minimum curve curvature, the tangent point between the curve and the curve's extreme-value connecting line, the slope of the extreme-value connecting line, the offset degree of the target sequence signal after the fragmentation processing, the curve signal noise and the curve signal-to-noise ratio, to obtain a fifth evaluation index.
In one possible implementation, the apparatus further includes:
the signal diagram drawing module is used for drawing visual signal diagrams respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and the image summarizing module is used for summarizing each visual signal diagram to obtain a signal evaluation summarizing diagram and displaying the signal evaluation summarizing diagram.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
Fig. 9 shows a schematic diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server or terminal device. Referring to fig. 9, electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions executable by processing component 1922, such as application programs. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, processing component 1922 is configured to execute the instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method of nanopore sequencing signal evaluation, the method comprising:
acquiring an original sequencing signal obtained by detecting a nucleic acid sequence in a nanopore sequencing mode;
dividing the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before translocation through the pore has started, and the third signal is a current signal acquired while the nucleic acid sequence continuously translocates through the pore;
carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
2. The method of claim 1, wherein the performing a statistical analysis based on the first signal and the second signal results in a first evaluation index, a second evaluation index, and a third evaluation index, comprising:
carrying out statistical analysis on the first signal to obtain a first evaluation index;
carrying out statistical analysis on the second signal to obtain a second evaluation index;
and carrying out comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index.
3. The method of claim 2, wherein the performing a statistical analysis on the first signal to obtain a first evaluation index comprises:
dividing the first signal to obtain a front first signal and a rear first signal;
calculating a front first signal average value and a rear first signal average value, wherein a signal average value is the average of the current values of the signal sampling points in the corresponding signal;
calculating first signal noise of the first signal according to the number of signal sampling points in the first signal, the current value of each signal sampling point and the average current value of all the signal sampling points;
and determining a first evaluation index according to the front first signal average value, the rear first signal average value and the first signal noise.
4. A method according to claim 2 or 3, wherein said statistically analyzing the second signal to obtain a second evaluation index comprises:
and calculating the average value of the current value of each signal sampling point in the second signal to obtain a second signal average value as a second evaluation index.
5. The method of claim 2, wherein performing a comprehensive statistical analysis on the first signal and the second signal to obtain a third evaluation index comprises:
and calculating the ratio of the current value mean value of the first signal to the current value mean value of the second signal to obtain a third evaluation index.
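The quantities named in claims 3 to 5 can be illustrated with a minimal Python sketch. Several details are assumptions that the claims do not fix: the first signal is split at its midpoint, the first signal noise is taken as the population standard deviation (which is computed from the number of sampling points, each current value and the overall mean), and the function names `open_pore_metrics` and `capture_metrics` are hypothetical.

```python
from statistics import fmean, pstdev


def open_pore_metrics(first_signal):
    """Front/rear means and noise of the blank (unbound nanopore) current."""
    half = len(first_signal) // 2  # assumption: split at the midpoint
    front, rear = first_signal[:half], first_signal[half:]
    return {
        "front_mean": fmean(front),
        "rear_mean": fmean(rear),
        # assumption: noise is the population standard deviation
        "noise": pstdev(first_signal),
    }


def capture_metrics(first_signal, second_signal):
    """Second-signal mean and the first-to-second mean ratio."""
    second_mean = fmean(second_signal)
    ratio = fmean(first_signal) / second_mean
    return second_mean, ratio


# Toy current values in arbitrary units, made up for illustration only.
first = [100.0, 102.0, 98.0, 100.0]
second = [50.0, 50.0]
m = open_pore_metrics(first)
second_mean, ratio = capture_metrics(first, second)
```

Comparing `front_mean` with `rear_mean` gives a simple view of drift in the blank current, while `ratio` relates the blank current level to the level after the nucleic acid molecule has bound.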
6. The method of claim 1, wherein the performing a statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index comprises:
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal;
carrying out statistical analysis on the target sequence signal to obtain a fourth evaluation index;
fragmenting the target sequence signal to obtain a target signal curve;
carrying out statistical analysis on the target signal curve to obtain a fifth evaluation index;
removing the linker sequence signal in the third signal according to a preset linker sequence signal template to obtain a target sequence signal, including:
searching for a signal segment in the third signal that is most similar to the linker sequence signal template based on a dynamic adjustment algorithm;
removing the signal segment from the third signal to obtain a target sequence signal;
the step of performing statistical analysis on the target sequence signal to obtain a fourth evaluation index includes:
determining the average value and the median of the current value of each signal sampling point in the target sequence signal to obtain a third signal average value and a third signal median;
determining a fourth evaluation index according to the third signal mean value and the third signal median;
the step of carrying out fragmentation processing on the target sequence signal to obtain a target signal curve comprises the following steps:
fragmenting the target sequence signal to obtain at least one sequence signal fragment;
calculating the average value of the current values of the signal sampling points in each sequence signal fragment to obtain a corresponding current signal;
sorting the current signals in ascending order and drawing a curve to obtain a target signal curve;
the step of performing fragmentation processing on the target sequence signal to obtain at least one sequence signal fragment comprises the following steps:
sequentially acquiring a plurality of interval signals in the target sequence signal according to a first sliding window with a preset size and a preset first step length;
calculating signal observation values between adjacent interval signals;
determining that the adjacent interval signal belongs to two different signal segments in response to the signal observation value being greater than a threshold value;
dividing the target sequence signal according to the signal segments to which the different interval signals belong, so as to obtain at least one sequence signal fragment;
the calculating the signal observation value between the adjacent interval signals comprises the following steps:
calculating the signal observation value $t$ according to the formula $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, wherein $n_1$ and $n_2$ are respectively the numbers of signal sampling points included in the two adjacent interval signals, $\bar{x}_1$ and $\bar{x}_2$ are respectively the average values of the current values of the signal sampling points included in the two adjacent interval signals, and $s_1^2$ and $s_2^2$ are respectively the current value variances of the signal sampling points included in the two adjacent interval signals;
the step of sorting the current signals in ascending order and drawing a curve to obtain a target signal curve comprises the following steps:
sorting the current signals in ascending order and drawing a curve to obtain a candidate signal curve;
smoothing filtering is carried out on the candidate signal curve based on a filtering method of local polynomial least square fitting, and a target signal curve is obtained;
the step of performing statistical analysis on the target signal curve to obtain a fifth evaluation index includes:
and statistically calculating, for the target signal curve, the maximum value, the minimum value, the median, a preset percentile, the point of maximum curve curvature, the point of minimum curve curvature, the tangent point between the curve and the line connecting the curve extreme values, the slope of the line connecting the curve extreme values, the degree of offset of the target sequence signal after the fragmentation processing, the curve signal noise and the curve signal-to-noise ratio, to obtain the fifth evaluation index.
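The fragmentation and curve-building steps of claim 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the observation value is taken to be a Welch-style two-sample statistic built from the window sizes, means and sample variances, with an absolute value so that rises and drops are treated alike; a segment boundary is placed at the start of the later window; and the local polynomial least-squares smoothing is represented by the standard 5-point quadratic Savitzky-Golay kernel with edge points left unchanged. Linker removal is omitted, and all function names, window sizes and thresholds are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(a, b):
    """Absolute two-sample statistic between two windows (assumed Welch form)."""
    diff = abs(fmean(a) - fmean(b))
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return diff / se


def segment(signal, window=4, step=4, threshold=5.0):
    """Split the signal where adjacent windows differ by more than threshold."""
    starts = range(0, len(signal) - window + 1, step)
    windows = [signal[s:s + window] for s in starts]
    bounds = [0]
    for i in range(1, len(windows)):
        if welch_t(windows[i - 1], windows[i]) > threshold:
            bounds.append(starts[i])  # assumption: cut at the later window
    bounds.append(len(signal))
    return [signal[a:b] for a, b in zip(bounds, bounds[1:])]


def target_curve(segments):
    """Per-segment mean currents sorted in ascending order."""
    return sorted(fmean(seg) for seg in segments)


def smooth5(curve):
    """5-point quadratic Savitzky-Golay smoothing; edge points are kept."""
    coeffs = (-3, 12, 17, 12, -3)
    out = list(curve)
    for i in range(2, len(curve) - 2):
        out[i] = sum(k * v for k, v in zip(coeffs, curve[i - 2:i + 3])) / 35
    return out


# Toy signal with three current levels, made up for illustration only.
signal = [80.0, 81.0, 79.0, 80.0, 60.0, 61.0, 59.0, 60.0,
          100.0, 101.0, 99.0, 100.0]
segments = segment(signal)
curve = smooth5(target_curve(segments))
```

Statistics such as the maximum, minimum, median and percentiles of `curve` can then be read off directly. A production implementation would more likely use an established smoothing routine with a configurable window length and polynomial order than the fixed kernel shown here.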
7. The method according to claim 1, wherein the method further comprises:
drawing a visual signal diagram respectively corresponding to the first evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index;
and summarizing the visual signal diagrams to obtain a signal evaluation summary diagram and displaying the signal evaluation summary diagram.
8. A nanopore sequencing signal evaluation device, the device comprising:
the sequence determining module is used for acquiring an original sequencing signal obtained by detecting the nucleic acid sequence in a nanopore sequencing mode;
the signal segmentation module is used for segmenting the original sequencing signal to obtain a first signal, a second signal and a third signal, wherein the first signal is a blank current signal acquired when the nanopore is not bound to a nucleic acid molecule, the second signal is a current signal acquired after the nucleic acid molecule has bound to the nanopore but before translocation through the pore has started, and the third signal is a current signal acquired while the nucleic acid sequence continuously translocates through the pore;
the first statistical module is used for carrying out statistical analysis based on the first signal and the second signal to obtain a first evaluation index, a second evaluation index and a third evaluation index;
the second statistical module is used for carrying out statistical analysis based on the third signal to obtain a fourth evaluation index and a fifth evaluation index;
and the effect evaluation module is used for evaluating the original sequencing signal according to at least one of the first evaluation index, the second evaluation index, the third evaluation index, the fourth evaluation index and the fifth evaluation index.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1 to 7 when executing the instructions stored by the memory.
10. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
CN202410077991.7A 2024-01-19 2024-01-19 Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium Pending CN117594130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410077991.7A CN117594130A (en) 2024-01-19 2024-01-19 Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117594130A true CN117594130A (en) 2024-02-23

Family

ID=89918814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410077991.7A Pending CN117594130A (en) 2024-01-19 2024-01-19 Nanopore sequencing signal evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117594130A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020197618A1 (en) * 2001-01-20 2002-12-26 Sampson Jeffrey R. Synthesis and amplification of unstructured nucleic acids for rapid sequencing
CN103278548A (en) * 2013-05-02 2013-09-04 华中科技大学 Electrical signal calibration method for solid-state nanopore DNA sequencing
US20190127807A1 (en) * 2017-10-27 2019-05-02 Sysmex Corporation Quality evaluation method, quality evaluation apparatus, program, storage medium, and quality control sample
CN111254190A (en) * 2020-01-20 2020-06-09 中国医学科学院病原生物学研究所 Nanopore third-generation sequencing detection method for plasma virology
CN112646868A (en) * 2020-12-23 2021-04-13 赣南医学院 Method for detecting pathogenic molecules based on nanopore sequencing
CN113470751A (en) * 2021-06-30 2021-10-01 南方科技大学 Protein nanopore amino acid sequence screening method, protein nanopore and application of protein nanopore
CN116246703A (en) * 2023-03-24 2023-06-09 赛纳生物科技(广州)有限公司 Quality assessment method for nucleic acid sequencing data
CN116434843A (en) * 2023-03-29 2023-07-14 赛纳生物科技(北京)有限公司 Base sequencing quality assessment method
CN116881634A (en) * 2023-09-06 2023-10-13 北京齐碳科技有限公司 Method, apparatus and storage medium for cleaning nanopore signal data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENG XU et al.: "Evaluation of real-time nanopore sequencing for Salmonella serotype prediction", 《FOOD MICROBIOLOGY》, vol. 89, 31 August 2020 (2020-08-31), pages 1 - 9 *
LI Jin; LAN Haiqing; LIANG Hong: "Noise analysis based on nanopore DNA sequencing", Life Science Instruments (生命科学仪器), no. 06, 25 December 2019 (2019-12-25), pages 67 - 74 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination