CN113837036A - Characterization method, device and equipment of biological polymer and computer storage medium - Google Patents

Publication number
CN113837036A
Authority
CN
China
Prior art keywords
feature extraction
matrix
time sequence
combined
biopolymer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111054279.8A
Other languages
Chinese (zh)
Other versions
CN113837036B (en)
Inventor
魏强
卓远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Qitan Technology Ltd
Original Assignee
Chengdu Qitan Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Qitan Technology Ltd
Priority to CN202111054279.8A (CN113837036B)
Priority claimed from CN202111054279.8A (CN113837036B)
Publication of CN113837036A
Priority to PCT/CN2022/104435 (WO2023035757A1)
Application granted
Publication of CN113837036B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a method, an apparatus, a device, and a computer storage medium for characterizing a biopolymer. A preset neural network model performs local feature extraction and time sequence feature extraction on a first electrical signal of a target biopolymer to obtain combined feature information that associates local features with time sequence features, and the target biopolymer is characterized according to the combined feature information. Because the combined feature information obtained through the preset neural network model accounts for both local features and time sequence features, and is used to identify the structural unit sequence or the modification information of the target biopolymer, the characterization result is more accurate.

Description

Characterization method, device and equipment of biological polymer and computer storage medium
Technical Field
The present disclosure relates to the field of biotechnology, and in particular, to a method, an apparatus, a device, and a computer storage medium for characterizing a biopolymer.
Background
Biopolymer characterization uses the nanopores (or similar pores) of a membrane array in a biopolymer characterization device: the electrical signals generated as the biopolymer passes through a pore are collected, and the internal structural composition of the biopolymer is then identified from the collected signals by a convolutional neural network (CNN). For example, in nanopore gene sequencing, a double-stranded DNA helix is unwound into a single strand by an unwinding, rate-controlling enzyme and transported into the nanopore, where an electrical signal corresponding to each base is obtained. The CNN extracts feature information from the electrical signals of the bases, so that the base sequence can be identified and the sequencing completed.
However, factors such as the nanopore and the rate-controlling enzyme cause different biopolymers to pass through the nanopore at different speeds, so the step durations of the acquired electrical signals vary widely. Constrained by its network structure, a CNN cannot accurately and comprehensively extract feature information from electrical signals with different step durations, so the accuracy of the final characterization result is low.
Disclosure of Invention
Embodiments of the present disclosure provide a method, an apparatus, a device, and a computer storage medium for characterizing a biopolymer, which can improve the accuracy of gene sequencing.
In one aspect, embodiments of the present disclosure provide a method for characterizing a biopolymer, including:
obtaining a first electrical signal as a target biopolymer passes through a pore array, the first electrical signal being an electrical signal data set comprising a plurality of electrical parameter values;
arranging the electrical parameter values in the electrical signal data set according to a time sequence order to generate a first matrix;
inputting the first matrix into a preset neural network model, and performing local feature extraction and time sequence feature extraction to obtain combined feature information associated with local features and time sequence features; the neural network model comprises at least two combined feature extraction networks, and each combined feature extraction network comprises a local feature extraction network and a time sequence feature extraction network which are sequentially connected;
and characterizing the target biopolymer according to the combined characteristic information.
In some embodiments, inputting the first matrix into a preset neural network model, performing local feature extraction and time sequence feature extraction, and obtaining combined feature information associated with the local feature and the time sequence feature, includes:
inputting the first matrix into a local feature extraction network in the combined feature extraction network, and performing convolution processing to obtain a second matrix containing the local feature information of the first electric signal;
inputting data contained in the second matrix into a time sequence characteristic extraction network according to a time sequence to obtain a third matrix containing the time sequence characteristic information of the first electric signal;
when the number of iterations performed has not reached a preset number of iterations, taking the third matrix as the new first matrix and repeating the step of inputting the first matrix into the local feature extraction network in the combined feature extraction network, until the number of iterations performed reaches the preset number of iterations;
and when the number of iterations performed reaches the preset number of iterations, outputting the combined feature information of the first electrical signal according to the third matrix.
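The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `extract_combined_features`, `toy_local` and `toy_temporal` are hypothetical names, and the two toy functions are simple stand-ins for the local feature extraction network and the time sequence feature extraction network, not the networks of the disclosure.

```python
from typing import Callable, List

Matrix = List[float]


def extract_combined_features(
    first_matrix: Matrix,
    local_net: Callable[[Matrix], Matrix],
    temporal_net: Callable[[Matrix], Matrix],
    preset_iterations: int,
) -> Matrix:
    """Apply local then time sequence extraction a preset number of times."""
    if preset_iterations < 1:
        raise ValueError("preset_iterations must be at least 1")
    current = first_matrix
    for _ in range(preset_iterations):
        second_matrix = local_net(current)          # local feature information
        third_matrix = temporal_net(second_matrix)  # time sequence feature information
        current = third_matrix                      # third matrix becomes the next first matrix
    return current  # combined feature information after the last iteration


# Hypothetical stand-ins for the two networks (assumptions for illustration).
def toy_local(x: Matrix) -> Matrix:
    # Sum of each pair of adjacent samples: a 2-tap sliding window.
    return [x[i] + x[i + 1] for i in range(len(x) - 1)]


def toy_temporal(x: Matrix) -> Matrix:
    # Running sum: each output depends on all earlier inputs in time order.
    out, state = [], 0.0
    for value in x:
        state += value
        out.append(state)
    return out


combined = extract_combined_features([1.0, 2.0, 3.0, 4.0], toy_local, toy_temporal, 2)
print(combined)  # [11.0, 34.0]
```

Passing the networks in as callables keeps the loop independent of how each network is implemented, which matches the description that the output of one combined feature extraction pass is the input of the next.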
In some embodiments, outputting the combined feature information of the first electrical signal according to the third matrix comprises:
determining the third matrix as the combined feature information of the first electrical signal;
and outputting the combined feature information.
In some embodiments, characterizing the target biopolymer from the combined feature information includes:
and analyzing the combined characteristic information through a preset machine learning model to obtain the recognition result of the structural unit in the target biopolymer and/or the recognition result of the modification information in the target biopolymer.
In some embodiments, the preset machine learning model is a conditional random field (CRF) model or a hidden Markov model (HMM).
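As a rough illustration of how a conditional random field or hidden Markov model can turn per-time-step scores into a label sequence, the following sketch implements standard Viterbi decoding. The text does not specify how the combined feature information is mapped to scores or how decoding is performed, so the function name `viterbi`, the score layout and the example numbers are all assumptions.

```python
from typing import List


def viterbi(emission_scores: List[List[float]],
            transition_scores: List[List[float]]) -> List[int]:
    """Return the highest-scoring state path.

    emission_scores[t][s]: log-score of state s at time step t.
    transition_scores[p][s]: log-score of moving from state p to state s.
    """
    n_states = len(emission_scores[0])
    best = list(emission_scores[0])   # best score of any path ending in each state
    back: List[List[int]] = []        # back-pointers, one list per later time step
    for scores in emission_scores[1:]:
        pointers, new_best = [], []
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: best[p] + transition_scores[p][s])
            pointers.append(prev)
            new_best.append(best[prev] + transition_scores[prev][s] + scores[s])
        back.append(pointers)
        best = new_best
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Two hypothetical states (for example, two structural units), three time steps.
emissions = [[0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
transitions = [[-1.0, -1.0], [-1.0, -1.0]]
print(viterbi(emissions, transitions))  # [0, 1, 1]
```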
In some embodiments, before acquiring the first electrical signal as the target biopolymer passes through the pore array, the method further comprises:
obtaining test sample data; the test sample data comprises combined characteristic information of a second electrical signal corresponding to a biopolymer structural unit sequence, wherein the combined characteristic information of the second electrical signal comprises a fifth matrix associated with local characteristic information and time sequence characteristic information of the second electrical signal;
and the fifth matrix is used for training a local feature extraction network and a time sequence feature extraction network to obtain a neural network model.
In another aspect, embodiments of the present disclosure provide a device for characterizing a biopolymer, the device including:
an acquisition module for acquiring a first electrical signal of a target biopolymer as it passes through a pore array, the first electrical signal being an electrical signal data set comprising a plurality of electrical parameter values;
the generating module is used for arranging the electrical parameter values in the electrical signal data set according to a time sequence order to generate a first matrix;
the extraction module is used for inputting the first matrix into a preset neural network model, performing local feature extraction and time sequence feature extraction and obtaining combined feature information associated with the local features and the time sequence features; the neural network model comprises at least two combined feature extraction networks, and each combined feature extraction network comprises a local feature extraction network and a time sequence feature extraction network which are sequentially connected;
and the characterization module is used for characterizing the target biopolymer according to the combined characteristic information.
In some embodiments, the extraction module specifically comprises:
the local feature extraction submodule is used for inputting the first matrix into a local feature extraction network in the combined feature extraction network, and performing convolution processing to obtain a second matrix containing the local feature information of the first electric signal;
the time sequence characteristic extraction submodule is used for inputting the data contained in the second matrix into a time sequence characteristic extraction network according to a time sequence order to obtain a third matrix containing the time sequence characteristic information of the first electric signal;
the iteration submodule is used for, when the number of iterations performed has not reached a preset number of iterations, taking the third matrix as the new first matrix and repeating the step of inputting the first matrix into the local feature extraction network in the combined feature extraction network, until the number of iterations performed reaches the preset number of iterations;
and the output submodule is used for outputting the combined feature information of the first electrical signal according to the third matrix when the number of iterations performed reaches the preset number of iterations.
In yet another aspect, embodiments of the present disclosure provide a device for characterizing a biopolymer, the device including: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the method for characterizing a biopolymer according to any embodiment of the method aspect above.
In yet another aspect, embodiments of the present disclosure provide a computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement the method for characterizing a biopolymer according to any embodiment of the method aspect above.
With the method, apparatus, device, and computer storage medium for characterizing a biopolymer provided by the embodiments, a preset neural network model performs local feature extraction and time sequence feature extraction on a first electrical signal of a target biopolymer to obtain combined feature information that associates local features with time sequence features, and the target biopolymer is characterized according to the combined feature information. Because the combined feature information accounts for both local features and time sequence features, and is used to characterize the structural unit sequence or the modification information of the target biopolymer, the characterization result is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the embodiments of the present disclosure will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a method for characterization of a biopolymer provided by one embodiment of the present disclosure;
FIG. 2 is a schematic flow diagram of a method of characterization of a biopolymer in one particular example of the present disclosure;
FIG. 3 is a schematic diagram of a neural network model in another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of local feature extraction in one example of the present disclosure;
FIG. 5 is a schematic diagram of timing feature extraction in another example of the present disclosure; wherein, A is a structural schematic diagram of a neuron in the time sequence feature extraction module;
FIG. 6 is a schematic diagram of multiple-use combined feature extraction in yet another example of the present disclosure;
FIG. 7 is a schematic flow diagram of a method of characterization of a biopolymer in yet another example of the present disclosure;
FIG. 8 is a schematic structural diagram of a biopolymer characterization device provided by another embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a biopolymer characterization device according to yet another embodiment of the present disclosure.
Detailed Description
Features and exemplary embodiments of various aspects of the present disclosure will be described in detail below, and in order to make objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting of the disclosure. It will be apparent to one skilled in the art that the present disclosure may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present disclosure by illustrating examples of the present disclosure.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Biopolymer characterization uses a membrane array of nanopores or similar pores in a biopolymer characterization device. Because of the voltage difference across the membrane, the biopolymer produces different impedances as it passes through a pore, which transiently changes the current through the pore; the internal structural composition of the biopolymer can then be identified by detecting how the current signal changes over time.
As biopolymers pass through a nanopore or similar pore, their translocation speeds differ, so the step durations of their electrical signals are inconsistent. When electrical signals collected by a biopolymer characterization device with a fixed sampling rate are input into a convolutional neural network (CNN) for identification, the size of the convolution kernel in the CNN is fixed; because features are computed as the inner product of the kernel and the input signal data, electrical signals with inconsistent step durations reduce the accuracy of the feature information the CNN extracts.
In addition, the electrical signals acquired by the biopolymer characterization device are obtained as the structural units of the biopolymer pass through the pore in sequence, but convolution in a CNN captures only local information in the electrical signal and not the time sequence information between structural units, which further degrades the accuracy of the characterization result.
In order to solve the problems of the prior art, embodiments of the present disclosure provide a method, an apparatus, a device, and a computer storage medium for characterization of a biopolymer. The following first presents a method for characterization of biopolymers provided in the examples of the present disclosure.
FIG. 1 shows a schematic flow diagram of a method for characterizing a biopolymer provided by one embodiment of the present disclosure. As shown in FIG. 1, the method includes steps S101 to S104:
S101: acquiring a first electrical signal as the target biopolymer passes through the pore array, the first electrical signal being an electrical signal data set comprising a plurality of electrical parameter values.
The target biopolymer may be any one of a polynucleotide, a polypeptide, a polysaccharide, and a lipid, wherein the polynucleotide includes DNA (deoxyribonucleic acid) and/or RNA (ribonucleic acid).
The target biopolymer comprises a plurality of structural units, which pass sequentially through a pore in the pore array, yielding a first electrical signal that comprises an electrical parameter value corresponding to each structural unit. For example, an unwound single-stranded DNA molecule includes a plurality of bases, and the electrical parameter value of each base is acquired as the single-stranded DNA passes through the pore.
The pore array may be a nanopore array or an array of similar pores.
The electrical parameter value may be a current value or another electrical signal value; the embodiments of the present application are not limited in this respect.
A preset speed control mechanism drives the target biopolymer through the nanopore array or similar pore array at a certain speed, and the first electrical signal of the target biopolymer is obtained under the voltage applied across the pores. For example, each structural unit of the target biopolymer produces a different current value as it passes through the pore, and the current values obtained form a current value data set that serves as the first electrical signal.
S102: arranging the electrical signal data in the electrical signal data set in time sequence order to generate a first matrix.
The electrical signal data in the electrical signal data set may be raw data or preprocessed data.
The preprocessing may be signal normalization or signal segmentation; this embodiment is not limited in this respect.
The raw or preprocessed electrical signal data are arranged in time sequence order to generate the first matrix.
S103: inputting the first matrix into a preset neural network model, performing local feature extraction and time sequence feature extraction, and obtaining combined feature information associated with the local features and the time sequence features; the neural network model comprises at least two combined feature extraction networks, and the combined feature extraction networks comprise a local feature extraction network and a time sequence feature extraction network which are sequentially connected.
The preset neural network model comprises at least two combined feature extraction networks connected in sequence; that is, the output of one combined feature extraction network can be used as the input of the next.
Each combined feature extraction network comprises a local feature extraction network and a time sequence feature extraction network which are connected in sequence, and the output of the local feature extraction network can be used as the input of the time sequence feature extraction network. The local feature extraction network can adopt a convolutional neural network, and the time sequence feature extraction network can adopt a recurrent neural network or a Transformer network.
After the first electrical signal passes through the combined feature extraction networks, combined feature information associating the local features with the time sequence features is obtained and is used for the subsequent characterization of the target biopolymer.
S104: characterizing the target biopolymer according to the combined feature information.
Characterization of the target biopolymer includes, among other things, sequence recognition (i.e., sequencing) of structural units in the biopolymer and/or recognition of modification information in the biopolymer, where the polymer modification information can be polymer methylation.
According to the method provided by the embodiment, the local feature extraction and the time sequence feature extraction can be performed on the first electric signal of the target biopolymer through the preset neural network model, so that the combined feature information of the associated local feature and time sequence feature is obtained; and characterizing the biopolymer based on the combined characteristic information. Therefore, in the characterization process of the biopolymer, the combined characteristic information which gives consideration to the local characteristic and the time sequence characteristic can be obtained through the preset neural network model and is used for characterizing the structural sequence of the target biopolymer, so that the characterization result of the biopolymer is more accurate.
Illustratively, in the process of acquiring the first electrical signal in step S101, under the action of the speed control mechanism, when the target biopolymer passes through the nanopore array or the similar pore array, impedances of different magnitudes are generated, and under the condition that the voltage at two ends of the nanopore is not changed, currents of different magnitudes are generated, so as to form the first electrical signal including a plurality of current values.
Wherein, before the first electric signal including a plurality of current values is formed, the current values generated by the obtained target biopolymer can be screened according to the judgment condition of the effective signal, and the obviously invalid current values are eliminated.
The valid signal determination conditions include, but are not limited to, discarding abnormal current values caused by conditions such as blockage of the nanopore or similar pore, or current leakage, where an abnormal current value is a value outside a preset threshold range.
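A minimal sketch of this screening step is given below. The function name `screen_current_values` and the threshold values are hypothetical; the text only states that values outside a preset threshold range are treated as abnormal.

```python
from typing import List


def screen_current_values(values: List[float], lower: float, upper: float) -> List[float]:
    """Keep only current values inside the preset threshold range [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


# Hypothetical readings; 0.2 could indicate a blocked pore and 400.0 a leak.
raw = [80.0, 95.5, 0.2, 400.0, 102.3]
valid = screen_current_values(raw, lower=50.0, upper=200.0)
print(valid)  # [80.0, 95.5, 102.3]
```

The filter preserves the original order of the retained samples, so the time sequence information needed by the later steps is not disturbed.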
The screened effective current value can be preprocessed to obtain higher-quality electric signal data.
The preprocessing may include normalization, segmentation, and the like.
The normalization process scales the electrical signal data so that they fall within a small specified interval. For example, min-max (dispersion) normalization, also called (0, 1) normalization, maps the raw electrical signal data to the [0, 1] interval by a linear transformation. Other normalization methods, such as med-mad normalization (median and median absolute deviation), may also be used; this embodiment is not limited in this respect.
The segmentation process splits the complete electrical signal data, because in sequencing the structural units of a biopolymer the sequencing file obtained (i.e. the electrical signal data) may contain data from a plurality of samples. In this example, the segmentation may be implemented by a script file that cuts out, for example, open-pore current and interrupted reads; this embodiment is not limited in this respect.
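The two normalization options mentioned above can be sketched as follows. The function names are hypothetical, and the med-mad variant shown here divides by the unscaled median absolute deviation, which is one common convention and an assumption on our part.

```python
from statistics import median
from typing import List


def min_max_normalize(signal: List[float]) -> List[float]:
    """Linearly map the signal to the [0, 1] interval."""
    lo, hi = min(signal), max(signal)
    if hi == lo:  # constant signal: avoid division by zero
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


def med_mad_normalize(signal: List[float]) -> List[float]:
    """Center on the median and scale by the median absolute deviation."""
    med = median(signal)
    mad = median(abs(v - med) for v in signal)
    if mad == 0:  # more than half the samples equal the median
        return [0.0 for _ in signal]
    return [(v - med) / mad for v in signal]


print(min_max_normalize([2.0, 4.0, 6.0]))              # [0.0, 0.5, 1.0]
print(med_mad_normalize([1.0, 2.0, 3.0, 4.0, 100.0]))  # [-2.0, -1.0, 0.0, 1.0, 97.0]
```

The second example shows why a median-based method can be preferable for noisy current traces: the outlier 100.0 does not compress the remaining values, whereas min-max scaling would squeeze them into a narrow band near zero.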
The preprocessed current value data set may be used as the first electric signal.
For example, after the first electrical signal is acquired, in step S102, a plurality of current values included in the first electrical signal are arranged in a time-series order, and a first matrix is generated.
In this example, the first matrix may be a one-dimensional matrix.
In other examples, the first matrix may also be a two-dimensional matrix, for example, a differential two-dimensional matrix obtained by adding differential signals, etc. The current values in the two-dimensional matrix are also arranged in chronological order.
The generated first matrix is arranged by current values according to a time sequence order, and time sequence information can be reserved to be extracted when subsequent feature extraction is carried out.
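One possible way to build the first matrix is sketched below. The `(timestamp, current)` input format, the function names, and the choice of a first-order difference with a leading zero for the differential row are assumptions; the text only states that the values are ordered in time and that a two-dimensional matrix may be obtained by adding differential signals.

```python
from typing import List, Tuple


def build_first_matrix(samples: List[Tuple[float, float]]) -> List[float]:
    """Order (timestamp, current) samples by time and return the 1-D matrix of currents."""
    return [current for _, current in sorted(samples, key=lambda s: s[0])]


def add_differential_row(first_matrix: List[float]) -> List[List[float]]:
    """Stack the signal with its first-order difference to form a 2-D matrix."""
    diffs = [0.0] + [b - a for a, b in zip(first_matrix, first_matrix[1:])]
    return [first_matrix, diffs]


one_d = build_first_matrix([(2, 5.0), (0, 1.0), (1, 3.0)])
two_d = add_differential_row(one_d)
print(one_d)  # [1.0, 3.0, 5.0]
print(two_d)  # [[1.0, 3.0, 5.0], [0.0, 2.0, 2.0]]
```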
For example, as shown in FIG. 2, the feature extraction in step S103, in which the first matrix is input into the preset neural network model and local feature extraction and time sequence feature extraction are performed to obtain the combined feature information associated with the local features and the time sequence features, may specifically include:
S201: inputting the first matrix into the local feature extraction network in the combined feature extraction network and performing convolution processing to obtain a second matrix containing the local feature information of the first electrical signal.
In this example, the preset neural network model may refer to the schematic structural diagram shown in fig. 3.
As shown in fig. 3, the combined feature extraction network 301 in the preset neural network model 300 includes a local feature extraction network 302 and a time series feature extraction network 303 connected in sequence, and output data of the local feature extraction network may be used as input data of the time series feature extraction network.
For example, the local feature extraction network may use a single-layer or multi-layer convolutional neural network (CNN). The time sequence feature extraction network may use a recurrent neural network (RNN); in other examples, it may also use a neural network with an attention mechanism, such as a Transformer network, or another neural network capable of extracting time sequence features, which is not limited in the embodiments of the present disclosure.
In this example, referring to FIG. 4, the local feature extraction network is a CNN that includes a convolutional layer 401 and a pooling layer 402. When step S201 is executed, the first matrix 403 is input to the one-dimensional convolutional layer 401 of the CNN, inner products are computed in the convolutional layer 401 by sliding a convolution kernel window across the input, and the results are passed through a preset activation function for nonlinear processing. The activation function may be, but is not limited to, a rectified linear unit (ReLU).
After redundant information is removed from the calculation result processed by the activation function through the pooling layer 402, a second matrix 404 containing local characteristic information of the first electrical signal is output.
Optionally, the calculation result processed by the activation function may also be a stride (convolution step) set in the convolution layer 401 to remove redundant information, and output a second matrix including local characteristic information of the first electrical signal.
In this example, the second matrix 404 resulting from the convolution processing of the first matrix is also a one-dimensional matrix.
In the CNN network, the local features in the data set are extracted by sliding a convolution kernel window. Because the current values in the first matrix 403 are arranged in time sequence order and the convolution kernel moves over the first matrix 403 with a fixed step length, the resulting second matrix 404 retains the time sequence information in addition to the local feature information of the first electrical signal, so that the time sequence features can be extracted in step S202.
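The convolution, activation and pooling of step S201 can be illustrated with a minimal sketch in plain Python. The function names, the sample current values and the two-tap kernel are assumptions chosen for illustration and do not come from the disclosure; a trained CNN would use learned kernels and several channels.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` and take the inner product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def local_features(first_matrix, kernel, pool_size=2):
    """Convolution -> ReLU -> pooling, in the order described for step S201."""
    return max_pool1d(relu(conv1d(first_matrix, kernel)), pool_size)


# Hypothetical current samples in time order and a hypothetical kernel
# that responds to an upward step in the signal.
first_matrix = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 2.0, 2.0]
kernel = [-1.0, 1.0]

second_matrix = local_features(first_matrix, kernel)   # [0.0, 2.0, 0.0]

# Optional variant: reduce redundancy with a convolution stride instead of pooling.
strided = relu(conv1d(first_matrix, kernel, stride=2))  # [0.0, 2.0, 0.0, 0.0]
```

In both variants the outputs stay in the same order as the inputs, which is why the result can still be fed to a time sequence feature extraction network.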
S202: inputting the data contained in the second matrix into the corresponding time sequence feature extraction network in time sequence order to obtain a third matrix containing the time sequence feature information of the first electric signal.
In this step, referring to fig. 5, the time sequence feature extraction network specifically adopts an LSTM (Long Short-Term Memory) neural network 500, where the LSTM neural network 500 includes a plurality of neurons 501 connected in sequence, and data included in the second matrix is respectively input to the neurons 501 in the LSTM neural network according to a time sequence order to perform time sequence feature extraction.
As shown in block A of fig. 5, a neuron 501 of the LSTM neural network 500 includes a forgetting gate 502, an input gate 503, and an output gate 504. The structure of a single neuron 501 in the LSTM neural network 500 shown in block A of fig. 5 is taken as an example for illustration:
The forgetting gate (forget gate) 502 can be represented by the following formula (1):

$$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f) \tag{1}$$

In formula (1): $f_t$ is the gate control of the forgetting gate, $W_f$ is the weight matrix of the forgetting gate, $h_{t-1}$ is the output vector of the neuron at time t-1, $X_t$ is the input data of the neuron at time t, $b_f$ is the bias term of the forgetting gate, and $\sigma$ is the sigmoid function.

The calculation of the input gate (input gate) 503 may include the following formulas (2) and (3):

$$i_t = \sigma(W_i \cdot [h_{t-1}, X_t] + b_i) \tag{2}$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, X_t] + b_c) \tag{3}$$

In formulas (2) and (3): $i_t$ is the gate control of the input gate, $\tilde{C}_t$ represents the state vector input at time t, $W_i$ and $W_c$ are the weight matrices of the input gate, $h_{t-1}$ is the output vector of the neuron at time t-1, $X_t$ is the input data of the neuron at time t, $b_i$ and $b_c$ are the bias terms of the input gate, and $\sigma$ is the sigmoid function.

The calculation of the output gate 504 may include the following formulas (4) to (6):

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{4}$$

$$O_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o) \tag{5}$$

$$h_t = O_t * \tanh(C_t) \tag{6}$$

In formulas (4) to (6): $C_t$ and $C_{t-1}$ are the neuron state vectors at times t and t-1, respectively, $O_t$ is the gate control of the output gate, $W_o$ is the weight matrix of the output gate, $h_{t-1}$ is the output vector of the neuron at time t-1, $X_t$ is the input data of the neuron at time t, $b_o$ is the bias term of the output gate, $\sigma$ is the sigmoid function, $h_t$ is the output vector of the neuron at time t, and tanh is the activation function.
Referring to fig. 5, the second matrix obtained in step S201 contains the data $\{x_1, \ldots, x_{t-1}, x_t, x_{t+1}\}$. The data in the second matrix are input into the corresponding neurons of the LSTM neural network in time sequence order. At each moment, the forgetting gate, the input gate and the output gate in the neuron calculate $\{f_t, i_t, C_t, O_t\}$, the data $\{h_1, \ldots, h_{t-1}, h_t, h_{t+1}\}$ are output, and the third matrix is generated from the output data $\{h_1, \ldots, h_{t-1}, h_t, h_{t+1}\}$.
In the process of extracting the time sequence features through the LSTM neural network, the forgetting gate controls how much of the previously stored long-term state continues to be retained, the input gate controls how the new instantaneous state is written into the long-term state, and the output gate controls which part of the information is finally output. By selectively remembering and forgetting previous information in this way, the LSTM neural network outputs a long-term memory of the relevant information, and the time sequence features are thereby extracted.
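As an illustration of the gate calculations, the following plain-Python sketch performs one neuron update per input value using the standard LSTM formulation, in which the forgetting gate, input gate, candidate state, state update, output gate and output are computed in that order. The parameter dictionary `p`, its key names, the hidden size of 1 and the numeric weights are assumptions made for the example; a trained model would supply learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W . v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One neuron update; the comments give the formula numbers (1) to (6)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, X_t]
    f_t = [sigmoid(a) for a in affine(p["Wf"], z, p["bf"])]          # (1)
    i_t = [sigmoid(a) for a in affine(p["Wi"], z, p["bi"])]          # (2)
    c_tilde = [math.tanh(a) for a in affine(p["Wc"], z, p["bc"])]    # (3)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]         # (4)
    o_t = [sigmoid(a) for a in affine(p["Wo"], z, p["bo"])]          # (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]               # (6)
    return h_t, c_t


def lstm_forward(second_matrix, p, hidden_size):
    """Feed the second matrix in time order and collect h_1 ... h_T."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in second_matrix:
        h, c = lstm_step([x], h, c, p)
        outputs.append(h)
    return outputs


# Hypothetical parameters: hidden size 1, input size 1, so each row is [w_h, w_x].
params = {name: [[0.0, 0.0]] for name in ("Wf", "Wi", "Wc", "Wo")}
params.update({name: [0.0] for name in ("bf", "bi", "bc", "bo")})
params["Wc"] = [[0.0, 1.0]]  # the candidate state looks only at the current input

third_matrix = lstm_forward([0.0, 2.0, 0.0], params, hidden_size=1)
```

Each entry of `third_matrix` is the output vector of the neuron at one time step, so the list as a whole plays the role of the third matrix.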
S203: when the number of iterations has not reached the preset number of iterations, updating the third matrix to be the first matrix, and iteratively executing the step of inputting the first matrix into the local feature extraction network in the combined feature extraction network, until the number of iterations reaches the preset number of iterations or the result meets the preset requirement.
Because the step durations of the electric signal corresponding to the biopolymer structural unit sequence are complex, and because the LSTM neural network is limited in how far it can transmit time sequence information, the features extracted by the LSTM network are incomplete to some extent. To prevent this incompleteness from affecting the accuracy of the final sequencing result, the mixed feature extraction is performed multiple times by executing step S203 so as to obtain richer feature information.
In this example, step S203 may specifically include:
S2031: when the number of iterations has not reached the preset number of iterations, updating the third matrix to be the first matrix, and inputting the first matrix again into the local feature extraction network in the combined feature extraction network for convolution processing.
Referring to fig. 6, when the number of iterations has not reached the preset number of iterations, the third matrix is updated to be the first matrix, the first matrix is input into the combined feature extraction network 601 and passes through the local feature extraction network 602, and step S201 is executed to perform convolution processing and extract the local feature information again.
S2032: the data output in step S2031 (likewise a matrix) is input to the time sequence feature extraction network 603, and step S202 is executed to extract the time sequence features.
In an actual application scenario, steps S2031 to S2032 may be executed cyclically by a single combined feature extraction network 601 included in the neural network model 600. Alternatively, a plurality of sequentially connected combined feature extraction networks 601 may be provided in the neural network model 600, and steps S2031 to S2032 may be executed once in each combined feature extraction network 601. In either case, the combined feature extraction is repeated until the target number of times is reached.
S2033: stopping steps S2031 to S2032 once the number of iterations reaches the preset number of iterations.
After the combined feature extraction has been repeated multiple times, more accurate and richer feature information can be obtained.
S204: when the number of iterations reaches the preset number of iterations, outputting the combined feature information of the first electric signal according to the third matrix.
When the number of iterations reaches the preset number of iterations, the finally obtained third matrix is determined to be the combined feature information of the first electric signal, and the combined feature information is output.
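The control flow of steps S201 to S204 can be sketched as follows. The two stand-in extractors (`pairwise_mean` and `running_max`) are deliberately trivial placeholders, not a CNN or an LSTM; they and all function names are assumptions used only to show how the third matrix is fed back as the first matrix, either by cycling through one combined network or by passing through several connected ones.

```python
def combined_feature_extraction(first_matrix, local_net, temporal_net, preset_iterations):
    """Cycle one combined feature extraction network a preset number of times."""
    matrix = first_matrix
    for _ in range(preset_iterations):
        second_matrix = local_net(matrix)            # S201: local features
        third_matrix = temporal_net(second_matrix)   # S202: time sequence features
        matrix = third_matrix                        # S203: third matrix becomes first matrix
    return matrix                                    # S204: combined feature information


def stacked_feature_extraction(first_matrix, blocks):
    """Alternative: pass once through each of several sequentially connected networks."""
    matrix = first_matrix
    for local_net, temporal_net in blocks:
        matrix = temporal_net(local_net(matrix))
    return matrix


def pairwise_mean(matrix):
    """Placeholder local extractor: mean of each pair of neighbouring values."""
    return [(a + b) / 2 for a, b in zip(matrix, matrix[1:])]


def running_max(matrix):
    """Placeholder temporal extractor: carries state forward along the time axis."""
    out, current = [], float("-inf")
    for value in matrix:
        current = max(current, value)
        out.append(current)
    return out


signal = [1.0, 3.0, 2.0, 5.0, 4.0]  # hypothetical first matrix
looped = combined_feature_extraction(signal, pairwise_mean, running_max, preset_iterations=2)
stacked = stacked_feature_extraction(signal, [(pairwise_mean, running_max)] * 2)
# Both give [2.25, 3.0, 4.0] here because the two stacked blocks are identical.
```

With separately trained blocks the stacked form would generally differ from the looped form, since each block would then carry its own weights.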
The method extracts the electric signal features of the structural unit sequence in the target biopolymer by mixing a local feature extraction network and a time sequence feature extraction network multiple times. While the time sequence feature extraction network acquires the time sequence features of the electric signal, the ability of the local feature extraction network to associate local information compensates for the limited range over which the time sequence feature extraction network can relate time sequence features. This improves the ability of the time sequence feature extraction network to extract the electric signal features and helps to obtain a biopolymer sequencing result with higher accuracy.
After the combination feature information is obtained through the steps S201 to S204, the combination feature information is identified through the step S104 to characterize the sequence and/or modification information of the target biopolymer.
Optionally, in step S104, the combined feature information may be analyzed through a preset machine learning model, so as to obtain a characterization result of the target biopolymer.
For example, the preset machine learning model may be a conditional random field (CRF) model or a hidden Markov model (HMM).
In step S104, the combined feature information is decoded by using the preset machine learning model, and the decoding method may include Viterbi algorithm decoding, beam search, or the like. That is, the probability distribution of each electric signal datum in the combined feature information is solved and the corresponding structural unit is determined, so that the characterization result of the biopolymer is obtained. In the process of identifying the polymer modification information, the size of the probability matrix in the neural network can be revised according to the structure type of the modification information.
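A generic Viterbi decoder over per-step label scores can be sketched as below. It is a textbook illustration and is not presented as the decoder of this disclosure; the two labels, the emission probabilities and the transition probabilities are made-up values, and scores are kept in the log domain.

```python
import math


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions[t][k]   log-score of label k at time step t
    transitions[j][k] log-score of moving from label j to label k
    """
    n = len(labels)
    score = list(emissions[0])
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n):
            best_j = max(range(n), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    # Trace the best final label back to the first time step.
    k = max(range(n), key=lambda j: score[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return [labels[k] for k in reversed(path)]


def log_matrix(rows):
    return [[math.log(p) for p in row] for row in rows]


labels = ["A", "C"]  # hypothetical structural units
emissions = log_matrix([[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]])
uniform = log_matrix([[0.5, 0.5], [0.5, 0.5]])
sticky = log_matrix([[0.9, 0.1], [0.1, 0.9]])  # favours staying on the same label

path_uniform = viterbi(emissions, uniform, labels)  # ['A', 'C', 'C']
path_sticky = viterbi(emissions, sticky, labels)    # ['C', 'C', 'C']
```

The two results differ because the decoder weighs the whole path: with transitions that favour staying on one label, the weak first-step evidence for "A" is outweighed by the cost of switching.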
Optionally, in step S104, the obtained combined feature information may also be passed to an MLP (multilayer perceptron) layer or a CNN layer and converted into an output whose size equals the number of output data. The output of the MLP layer or the CNN layer is then converted by a softmax layer (i.e., a classifier) into a final result that can be interpreted as a probability distribution over the output data, and this result is used as the characterization result of the target biopolymer.
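The softmax conversion can be shown with a short plain-Python sketch. The per-step score vectors stand in for the output of an MLP or CNN layer, and the labels and numbers are assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits_per_step, labels):
    """Pick the most probable label at each step, together with its probability."""
    calls = []
    for logits in logits_per_step:
        probs = softmax(logits)
        best = max(range(len(labels)), key=probs.__getitem__)
        calls.append((labels[best], probs[best]))
    return calls


labels = ["A", "C"]                 # hypothetical structural units
scores = [[2.0, 0.0], [0.0, 3.0]]   # hypothetical MLP/CNN layer outputs
calls = classify(scores, labels)
called_units = [label for label, _ in calls]
```

Subtracting the maximum score before exponentiating leaves the result unchanged and avoids overflow for large scores.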
According to the method disclosed by the embodiment of the disclosure, the local feature extraction and the time sequence feature extraction can be performed on the first electric signal of the structural unit sequence in the target biopolymer through a preset neural network model, so as to obtain the combined feature information of the associated local feature and time sequence feature; and characterizing the structural unit sequence or modification information of the target biopolymer according to the combined characteristic information. Therefore, in the characterization process of the biopolymer, the combined characteristic information which gives consideration to the local characteristic and the time sequence characteristic can be obtained through the preset neural network model and is used for characterizing the structural information of the target biopolymer, so that the characterization result is more accurate.
In a specific embodiment, before the first electrical signal is acquired, the method further includes a step of training the neural network model, which specifically includes:
S701: obtaining test sample data.
In this step, the test sample data includes combined feature information of a second electrical signal. The second electrical signal is acquired when a biopolymer structural unit sequence passes through the hole array, and the combined feature information of the second electrical signal includes a fifth matrix associated with the local feature information and the time sequence feature information of the second electrical signal.
S702: training the local feature extraction network and the time sequence feature extraction network in sequence according to the fifth matrix to obtain the neural network model.
The fifth matrix is input as label data into the CNN network of the local feature extraction network module and into the LSTM network or Transformer network of the time sequence feature extraction network module, and the neural network model is obtained through training.
It is understood that training a neural network using test sample data is a well-established technique in the art and will not be described in detail herein.
Fig. 8 shows a schematic structural diagram of a characterization device for a biopolymer provided by an embodiment of the present disclosure. As shown in fig. 8, the apparatus includes:
an acquiring module 801 for acquiring a first electrical signal of a target biopolymer as it passes through an array of wells, the first electrical signal being a data set comprising a plurality of electrical parameter values;
a generating module 802, configured to arrange electrical parameter values in the first electrical signal according to a time sequence order to generate a first matrix;
an extraction module 803, configured to input the first matrix to a preset neural network model and perform local feature extraction and time sequence feature extraction to obtain combined feature information associated with the local features and the time sequence features; the neural network model comprises at least two combined feature extraction networks, and each combined feature extraction network comprises a local feature extraction network and a time sequence feature extraction network which are sequentially connected;
a characterization module 804 for characterizing the target biopolymer according to the combined feature information.
For example, the obtaining module 801 may perform the step S101 shown in fig. 1, the generating module 802 may perform the step S102 shown in fig. 1, the extracting module 803 may perform the step S103 shown in fig. 1, and the characterizing module 804 may perform the step S104 shown in fig. 1.
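The data flow between the four modules can be sketched as a small Python class. The class name, the callables and the stub behaviour are all hypothetical; in particular, the extraction and characterization stubs only mark where the neural network model and the decoder would be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp, electrical parameter value)


def generate_first_matrix(signal: Sequence[Sample]) -> List[float]:
    """Generating module 802 / step S102: order the values by time."""
    return [value for _, value in sorted(signal, key=lambda s: s[0])]


@dataclass
class CharacterizationDevice:
    acquire: Callable[[], Sequence[Sample]]         # acquiring module 801, step S101
    extract: Callable[[List[float]], List[float]]   # extraction module 803, step S103
    characterize: Callable[[List[float]], str]      # characterization module 804, step S104

    def run(self) -> str:
        first_signal = self.acquire()
        first_matrix = generate_first_matrix(first_signal)
        combined_features = self.extract(first_matrix)
        return self.characterize(combined_features)


# Stub modules with made-up behaviour, used only to exercise the pipeline.
device = CharacterizationDevice(
    acquire=lambda: [(2.0, 5.0), (0.0, 1.0), (1.0, 3.0)],
    extract=lambda matrix: matrix,
    characterize=lambda feats: "".join("A" if v < 2.0 else "C" for v in feats),
)
result = device.run()
```

Keeping each module behind a plain callable makes it possible to swap the extraction stub for a real model without touching the rest of the pipeline.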
Optionally, the extracting module 803 specifically includes:
the local feature extraction submodule 8031 is configured to input the first matrix to a local feature extraction network in the combined feature extraction network, and perform convolution processing to obtain a second matrix including the local feature information of the first electrical signal;
the timing characteristic extraction submodule 8032 is configured to input data included in the second matrix to the timing characteristic extraction network according to a timing sequence, so as to obtain a third matrix including the timing characteristic information of the first electrical signal;
an iteration submodule 8033, configured to update the third matrix to be the first matrix when the number of iterations has not reached the preset number of iterations, and to iteratively execute the step of inputting the first matrix into the local feature extraction network in the combined feature extraction network until the number of iterations reaches the preset number of iterations;
the output sub-module 8034 is configured to, when the iteration number satisfies a preset iteration number, output the combination characteristic information of the first electrical signal according to the third matrix.
For example, the local feature extraction sub-module 8031 may perform step S201 in the above embodiment, the time sequence feature extraction sub-module 8032 may perform step S202 in the above embodiment, the iteration sub-module 8033 may perform step S203 in the above embodiment, and the output sub-module 8034 may perform step S204 in the above embodiment.
It should be noted that all relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and the corresponding technical effect can be achieved, and for brevity, no further description is provided herein.
Fig. 9 shows a hardware structure diagram of a characterization device for a biopolymer provided by an embodiment of the present disclosure.
The characterization device for the biopolymer may comprise a processor 901 and a memory 902 in which computer program instructions are stored.
In particular, the processor 901 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present disclosure.
Memory 902 may include mass storage for data or instructions. By way of example, and not limitation, memory 902 may include a Hard Disk Drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 902 may include removable or non-removable (or fixed) media, where appropriate. The memory 902 may be internal or external to the characterization device for the biopolymer, where appropriate. In a particular embodiment, the memory 902 is a non-volatile solid-state memory.
In general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., a memory device) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the method according to an aspect of the disclosure.
The processor 901 realizes the method of characterizing a biopolymer in any of the above embodiments by reading and executing computer program instructions stored in the memory 902.
In one example, the biopolymer characterization device may also include a communication interface 903 and a bus 910. As shown in fig. 9, the processor 901, the memory 902, and the communication interface 903 are connected via a bus 910 to complete communication with each other.
The communication interface 903 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present disclosure.
Bus 910 includes hardware, software, or both to couple the components of the characterization device of the biopolymer to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or other suitable bus or a combination of two or more of these. Bus 910 can include one or more buses, where appropriate. Although this embodiment of the disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus or interconnect.
In addition, in combination with the characterization method of the biopolymer in the above embodiments, the embodiments of the present disclosure may provide a computer storage medium for implementation. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the method of characterizing a biopolymer in any of the above embodiments.
It is to be understood that this disclosure is not limited to the particular configurations and processes described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present disclosure are not limited to the specific steps described and illustrated, and those skilled in the art may make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present disclosure.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present disclosure are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present disclosure is not limited to the order of the above-described steps; that is, the steps may be performed in the order mentioned in the embodiments, in an order different from that in the embodiments, or several steps may be performed at the same time.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present disclosure are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the present disclosure, and these modifications or substitutions should be covered within the scope of the present disclosure.

Claims (10)

1. A method of characterizing a biopolymer, comprising:
obtaining a first electrical signal as the target biopolymer passes through the array of wells, the first electrical signal being a data set comprising a plurality of electrical parameter values;
arranging the electrical parameter values in the first electrical signals according to a time sequence order to generate a first matrix;
inputting the first matrix into a preset neural network model, and performing local feature extraction and time sequence feature extraction to obtain combined feature information associated with local features and time sequence features; the neural network model comprises at least two combined feature extraction networks, wherein each combined feature extraction network comprises a local feature extraction network and a time sequence feature extraction network which are sequentially connected;
characterizing the target biopolymer according to the combined feature information.
2. The method according to claim 1, wherein the inputting the first matrix into a preset neural network model, performing local feature extraction and time series feature extraction, and obtaining combined feature information associated with local features and time series features comprises:
inputting the first matrix into a local feature extraction network in the combined feature extraction network, and performing convolution processing to obtain a second matrix containing the local feature information of the first electric signal;
inputting the data contained in the second matrix into a time sequence characteristic extraction network according to the time sequence to obtain a third matrix containing the time sequence characteristic information of the first electric signal;
under the condition that the iteration times do not meet the preset iteration times, updating the third matrix into the first matrix, and iteratively executing the step of inputting the first matrix into the local feature extraction network in the combined feature extraction network until the iteration times meet the preset iteration times;
and under the condition that the iteration times meet the preset iteration times, outputting the combined characteristic information of the first electric signal according to the third matrix.
3. The method of claim 2, wherein outputting the combined signature information of the first electrical signal according to the third matrix comprises:
determining the third matrix as the combined characteristic information of the first electrical signal;
and outputting the combined characteristic information.
4. The method of claim 2, wherein characterizing the target biopolymer based on the combined feature information comprises:
analyzing the combined characteristic information through a preset machine learning model to obtain the recognition result of the structural unit in the target biopolymer and/or the recognition result of the modification information in the target biopolymer.
5. The method of claim 4, wherein the preset machine learning model is a conditional random field model or a hidden Markov model (HMM).
6. The method of claim 1, wherein prior to said obtaining the first electrical signal as the target biopolymer passes through the array of wells, the method further comprises:
obtaining test sample data; the test sample data comprises combined characteristic information of a second electrical signal corresponding to a biopolymer structural unit sequence, wherein the combined characteristic information of the second electrical signal comprises a fifth matrix associated with local characteristic information and time sequence characteristic information of the second electrical signal;
and training the local feature extraction network and the time sequence feature extraction network according to the fifth matrix to obtain the neural network model.
7. A biopolymer characterization device, the device comprising:
an acquisition module for acquiring a first electrical signal of a target biopolymer as it passes through the array of wells, the first electrical signal being a data set comprising a plurality of electrical parameter values;
the generating module is used for arranging the electrical parameter values in the first electrical signal according to a time sequence order to generate a first matrix;
the extraction module is used for inputting the first matrix into a preset neural network model, performing local feature extraction and time sequence feature extraction and obtaining combined feature information associated with the local features and the time sequence features; the neural network model comprises at least two combined feature extraction networks, wherein each combined feature extraction network comprises a local feature extraction network and a time sequence feature extraction network which are sequentially connected;
and the characterization module is used for characterizing the target biopolymer according to the combined characteristic information.
8. The apparatus according to claim 7, wherein the extraction module specifically comprises:
the local feature extraction submodule is used for inputting the first matrix into a local feature extraction network in the combined feature extraction network and performing convolution processing to obtain a second matrix containing the local feature information of the first electric signal;
the time sequence characteristic extraction submodule is used for inputting the data contained in the second matrix into a time sequence characteristic extraction network according to the time sequence to obtain a third matrix containing the time sequence characteristic information of the first electric signal;
the iteration submodule is used for updating the third matrix into the first matrix under the condition that the iteration times do not meet the preset iteration times, and iteratively executing the step of inputting the first matrix into the local feature extraction network in the combined feature extraction network until the iteration times meet the preset iteration times;
and the output submodule is used for outputting the combined characteristic information of the first electric signal according to the third matrix under the condition that the iteration times meet the preset iteration times.
9. A biopolymer characterization device, the device comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a method of characterizing a biopolymer according to any one of claims 1-6.
10. A computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method of characterizing a biopolymer according to any one of claims 1-6.
CN202111054279.8A 2021-09-09 2021-09-09 Method, device, equipment and computer storage medium for characterizing biopolymer Active CN113837036B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111054279.8A CN113837036B (en) 2021-09-09 Method, device, equipment and computer storage medium for characterizing biopolymer
PCT/CN2022/104435 WO2023035757A1 (en) 2021-09-09 2022-07-07 Biopolymer characterization method, apparatus, and device, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111054279.8A CN113837036B (en) 2021-09-09 Method, device, equipment and computer storage medium for characterizing biopolymer

Publications (2)

Publication Number Publication Date
CN113837036A true CN113837036A (en) 2021-12-24
CN113837036B CN113837036B (en) 2024-08-02

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023035757A1 (en) * 2021-09-09 2023-03-16 成都齐碳科技有限公司 Biopolymer characterization method, apparatus, and device, and computer storage medium
CN117423423A (en) * 2023-12-18 2024-01-19 四川互慧软件有限公司 Health record integration method, equipment and medium based on convolutional neural network

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012021149A1 (en) * 2010-08-12 2012-02-16 Winters-Hilt Stephen N Methods and systems for nanopore biosensing
CN104321441A (en) * 2012-02-16 2015-01-28 牛津楠路珀尔科技有限公司 Analysis of measurements of a polymer
CN106844701A (en) * 2017-01-03 2017-06-13 宁波亿拍客网络科技有限公司 A kind of specific markers and application method that identification is perceived based on computer vision
CN107679466A (en) * 2017-09-21 2018-02-09 百度在线网络技术(北京)有限公司 Information output method and device
CN110073301A (en) * 2017-08-02 2019-07-30 强力物联网投资组合2016有限公司 The detection method and system under data collection environment in industrial Internet of Things with large data sets
US20190303535A1 (en) * 2018-04-03 2019-10-03 International Business Machines Corporation Interpretable bio-medical link prediction using deep neural representation
CN110706738A (en) * 2019-10-30 2020-01-17 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for predicting structure information of protein
CN111243674A (en) * 2020-01-08 2020-06-05 华南理工大学 Method, device and storage medium for identifying base sequence
CN111462822A (en) * 2020-04-29 2020-07-28 北京晶派科技有限公司 Method and device for generating protein sequence characteristics and computing equipment
CN112069883A (en) * 2020-07-28 2020-12-11 浙江工业大学 Deep learning signal classification method fusing one-dimensional and two-dimensional convolutional neural network
CN112183486A (en) * 2020-11-02 2021-01-05 中山大学 Method for rapidly identifying single-molecule nanopore sequencing base based on deep network
CN113168890A (en) * 2018-12-10 2021-07-23 生命科技股份有限公司 Deep base recognizer for Sanger sequencing

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
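HAOYANG YU et al.: "Deep Reinforcement Learning for Protein Folding in the Hydrophobic-Polar Model with Pull Moves", 《GITHUB》, 31 December 2020 (2020-12-31), pages 1 - 9 *
何晓旭: "Research on Several Key Problems in Time Series Data Mining", 《China Doctoral Dissertations Full-text Database, Basic Sciences》, no. 10, 15 October 2014 (2014-10-15), pages 002 - 66 *
周暄焯: "Research on DNA Methylation Prediction Models Based on RNN and Its Fusion Methods", 《China Master's Theses Full-text Database, Basic Sciences》, no. 07, 15 July 2020 (2020-07-15), pages 006 - 140 *
巨荣辉: "Research on Early Disease Diagnosis and Risk Prediction Methods Based on Deep Learning and Medical Data", 《China Master's Theses Full-text Database, Medicine and Health Sciences》, no. 06, 15 June 2019 (2019-06-15), pages 060 - 1 *
李薇: "Identity Recognition Based on Dynamic Time Series Analysis of ECG Signals", 《E-Science Technology & Application》, vol. 5, no. 3, 31 May 2014 (2014-05-31), pages 19 - 19 *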

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023035757A1 (en) * 2021-09-09 2023-03-16 成都齐碳科技有限公司 Biopolymer characterization method, apparatus, and device, and computer storage medium
CN117423423A (en) * 2023-12-18 2024-01-19 四川互慧软件有限公司 Health record integration method, equipment and medium based on convolutional neural network
CN117423423B (en) * 2023-12-18 2024-02-13 四川互慧软件有限公司 Health record integration method, equipment and medium based on convolutional neural network

Also Published As

Publication number Publication date
WO2023035757A1 (en) 2023-03-16

Similar Documents

Publication Publication Date Title
WO2023035757A1 (en) Biopolymer characterization method, apparatus, and device, and computer storage medium
CN109048492B (en) Tool wear state detection method, device and equipment based on convolutional neural network
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN110245685B (en) Method, system and storage medium for predicting pathogenicity of genome single-site variation
CN111368920B (en) Quantum twin neural network-based classification method and face recognition method thereof
CN110363220B (en) Behavior class detection method and device, electronic equipment and computer readable medium
CN111192631A (en) Method and system for constructing model for predicting protein-RNA interaction binding site
CN114862838A (en) Unsupervised learning-based defect detection method and equipment
CN111507155A (en) U-Net + + and UDA combined microseism effective signal first-arrival pickup method and device
CN114139624A (en) Method for mining time series data similarity information based on integrated model
CN115184054B (en) Mechanical equipment semi-supervised fault detection and analysis method, device, terminal and medium
CN105184286A (en) Vehicle detection method and detection device
CN111863151A (en) Prediction method of polymer molecular weight distribution based on Gaussian process regression
CN113838524B (en) S-nitrosylation site prediction method, model training method and storage medium
Qu et al. Open-set gas recognition: A case-study based on an electronic nose dataset
CN117315263B (en) Target contour device, training method, segmentation method, electronic equipment and storage medium
CN113837036B (en) Method, device, equipment and computer storage medium for characterizing biopolymer
CN105006231A (en) Distributed large population speaker recognition method based on fuzzy clustering decision tree
CN112329810A (en) Image recognition model training method and device based on saliency detection
CN114358096B (en) Deep learning Morse code identification method and device based on step-by-step threshold judgment
CN116361454A (en) Automatic course teaching case assessment method based on Bloom classification method
CN114841216A (en) Electroencephalogram signal classification method based on model uncertainty learning
CN114913921A (en) System and method for identifying marker gene
CN113297376A (en) Legal case risk point identification method and system based on meta-learning
CN113064497A (en) Statement identification method, device, equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 2021-12-24
Assignee: Chengdu Qicarbon Taike Biotechnology Co., Ltd.
Assignor: CHENGDU QITAN TECHNOLOGY LTD.
Contract record no.: X2023980041554
Denomination of invention: Characterization methods, devices, equipment, and computer storage media for biopolymers
License type: Common License
Record date: 2023-09-12

GR01 Patent grant