CN112288783B - Method for constructing sequencing template based on image, base identification method and device - Google Patents

Publication number
CN112288783B
Authority
CN
China
Prior art keywords
image
registered
pixel
images
bright spots
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810961277.9A
Other languages
Chinese (zh)
Other versions
CN112288783A (en)
Inventor
李林森
徐伟彬
金欢
姜泽飞
周志良
颜钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genemind Biosciences Co Ltd
Original Assignee
Genemind Biosciences Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genemind Biosciences Co Ltd
Priority to CN201810961277.9A
Publication of CN112288783A
Application granted
Publication of CN112288783B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/32: Determination of transform parameters for the alignment of images, i.e. image registration using correlation-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention discloses a method and a device for constructing a sequencing template based on images. The images comprise a first, a second, a third and a fourth image corresponding to the same field of view of the four base extension reactions A, T/U, G and C, respectively; the first, second, third and fourth images comprise images M1 and M2, images N1 and N2, images P1 and P2, and images Q1 and Q2, respectively. The method of constructing a sequencing template comprises: combining any two of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 for bright spot matching, such that the images M1, N1, N2, P1, P2, Q1 and Q2 each participate in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots on a combined image at a distance smaller than a first predetermined pixel being one first coincident bright spot; and merging the first coincident bright spots on the plurality of combined images to obtain a bright spot set corresponding to the sequencing template. The method can effectively obtain the bright spot set corresponding to the nucleic acid template.

Description

Method for constructing sequencing template based on image, base identification method and device
Technical Field
The present invention relates to the field of image processing and information recognition, and in particular, to a method for constructing a sequencing template based on an image, a base recognition method, an apparatus for constructing a sequencing template based on an image, a base recognition apparatus, and a computer product.
Background
In the related art, on sequencing platforms that use an imaging system to acquire images of nucleic acid molecules (templates) in a biochemical reaction multiple times in order to determine their nucleotide sequence, how to process and correlate the images acquired at the different time points, including the information they contain, so as to efficiently and accurately obtain the nucleotide composition and order of at least a portion of the nucleic acid template is a matter of concern.
Disclosure of Invention
Embodiments of the present invention are directed to solving at least one of the technical problems occurring in the related art or at least providing an alternative practical solution.
According to one embodiment of the present invention, there is provided a method for constructing a sequencing template based on images. The images comprise a first image, a second image, a third image and a fourth image corresponding to the same field of view of the four base extension reactions A, T/U, G and C, respectively. During the base extension reactions the field of view contains a plurality of nucleic acid molecules with optically detectable labels, and at least a portion of the nucleic acid molecules appear as bright spots on the images. Performing the four types of base extension reactions, sequentially or simultaneously, is defined as one round of sequencing reaction. The first image comprises image M1 and image M2, the second image comprises image N1 and image N2, the third image comprises image P1 and image P2, and the fourth image comprises image Q1 and image Q2; image M1 and image M2 are from two rounds of sequencing reactions, respectively, as are image N1 and image N2, image P1 and image P2, and image Q1 and image Q2. The method comprises: combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 for bright spot matching, such that the image M1, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 each participate in the combining at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots on a combined image at a distance smaller than a first predetermined pixel being one first coincident bright spot; and merging the first coincident bright spots on the plurality of combined images to obtain a set of bright spots corresponding to the sequencing template.
According to an embodiment of the present invention, there is provided an apparatus for constructing a sequencing template based on images, which is used to implement all or part of the steps of the method for constructing a sequencing template based on images in the above-mentioned embodiments of the present invention. The images include a first image, a second image, a third image and a fourth image corresponding to the same field of view of the four base extension reactions A, T/U, G and C, respectively. During the base extension reactions the field of view contains a plurality of nucleic acid molecules with optically detectable labels, and at least a part of the nucleic acid molecules appear as bright spots on the images. Performing the four types of base extension reactions, sequentially or simultaneously, is defined as one round of sequencing reaction. The first image includes an image M1 and an image M2, the second image includes an image N1 and an image N2, the third image includes an image P1 and an image P2, and the fourth image includes an image Q1 and an image Q2; the images M1 and M2 are from two rounds of sequencing reactions, respectively, as are the images N1 and N2, the images P1 and P2, and the images Q1 and Q2. The apparatus includes: a combination unit for combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 for bright spot matching, such that the image M1, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 each participate in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots on a combined image at a distance smaller than a first predetermined pixel being one first coincident bright spot; and a merging unit for merging the first coincident bright spots on the plurality of combined images from the combination unit to obtain a bright spot set corresponding to the sequencing template.
According to an embodiment of the present invention, there is provided a computer-readable storage medium for storing a program for execution by a computer, the execution of the program comprising performing the method for image-based construction of a sequencing template according to any of the above embodiments. Computer-readable storage media include, but are not limited to, read-only memory, random-access memory, magnetic or optical disks, and the like.
There is also provided, in accordance with an embodiment of the present invention, a terminal, a computer product, including instructions, which when executed by a computer, cause the computer to perform the method for constructing a sequencing template based on images in the above-described embodiment of the present invention.
The sequencing template constructed by the method, the device, the computer readable storage medium and/or the computer product based on the image is a bright spot set corresponding to the sequencing template, the bright spot set can effectively, accurately and comprehensively reflect the information of the sequencing template, and is favorable for further accurate base identification (base call), namely, at least one part of nucleotide sequence of the template nucleic acid is accurately identified and obtained.
According to another embodiment of the present invention, there is provided a method of base recognition, the method including matching a spot on an image obtained from a base extension reaction to a set of spots corresponding to a sequencing template, base recognition being performed based on the matched spots, a plurality of nucleic acid molecules with an optically detectable label being present in a field of view corresponding to the image obtained from the base extension reaction, at least a portion of the nucleic acid molecules appearing as spots on the image obtained from the base extension reaction, the set of spots corresponding to the sequencing template being constructed and obtained by the method, apparatus, computer-readable storage medium, and/or computer product for constructing a sequencing template based on the image according to the above-described embodiments of the present invention.
According to an embodiment of the present invention, there is provided a base recognition apparatus for performing the base recognition method according to the above-described embodiment of the present invention, the apparatus being configured to match a spot on an image obtained from a base extension reaction to a set of spots corresponding to a sequencing template, and perform base recognition based on the matched spots, wherein a plurality of nucleic acid molecules with optically detectable labels are present in a field of view corresponding to the image obtained from the base extension reaction, at least a part of the nucleic acid molecules appear as spots on the image obtained from the base extension reaction, and the set of spots corresponding to the sequencing template is constructed by the method and/or apparatus for constructing a sequencing template based on the image according to the above-described embodiment of the present invention.
According to an embodiment of the present invention, there is provided a computer-readable storage medium storing a program for execution by a computer, the execution of the program including performing the base recognition method in any one of the above embodiments. Computer-readable storage media include, but are not limited to, read-only memory, random-access memory, magnetic or optical disks, and the like.
According to an embodiment of the present invention, there is also provided a computer product including instructions for performing base recognition, which, when the program is executed by a computer, cause the computer to perform the method of base recognition in the above-described embodiment of the present invention.
By using the base identification method, the base identification device, the computer-readable storage medium and/or the computer product, the type of base incorporated onto the template nucleic acid during a base extension reaction can be identified based on the constructed bright spot set corresponding to the sequencing template, which can be used to accurately determine the template nucleic acid sequence.
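The base identification step described above can be illustrated with a minimal sketch. It assumes bright spots are given as (x, y) coordinates, that a template spot and an image spot "match" when they lie closer than a hypothetical `max_dist`, and that a location with no match or with more than one match is reported as "N". None of these details (names, threshold, "N" convention) come from the source.

```python
from math import hypot


def call_bases(template, round_images, max_dist=1.05):
    """Assign one base per template spot for a single round of sequencing.

    template     -- list of (x, y) template bright spot coordinates
    round_images -- dict mapping a base letter to the list of (x, y) bright
                    spots detected on that base's image (hypothetical layout)
    max_dist     -- assumed matching distance in pixels
    """
    calls = []
    for tx, ty in template:
        # Collect every base whose image has a spot close to this template spot.
        hits = [
            base
            for base, spots in round_images.items()
            if any(hypot(tx - x, ty - y) < max_dist for x, y in spots)
        ]
        # Exactly one matching image gives a call; anything else is "N" (assumption).
        calls.append(hits[0] if len(hits) == 1 else "N")
    return calls


template = [(10, 10), (30, 30), (50, 50)]
round_images = {
    "A": [(10.2, 9.9)],
    "T": [(30.1, 30.3)],
    "G": [],
    "C": [(80, 80)],
}
calls = call_bases(template, round_images)
```

Spots on an image that do not fall on any template location, such as (80, 80) here, are simply ignored, which is the practical benefit of calling bases against a pre-built template.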
Additional aspects and advantages of embodiments of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the invention.
Drawings
FIG. 1 is a schematic flow chart of a method for image-based construction of a sequencing template in an embodiment of the invention.
FIG. 2 is a schematic diagram of combining and merging images Repeat1, Repeat5, Repeat6 and Repeat7 based on bright spots to construct a sequencing template in an embodiment of the present invention.
FIG. 3 is a schematic diagram of a deviation correction process and a deviation correction result in an embodiment of the present invention.
FIG. 4 is a diagram of a matrix corresponding to candidate bright spots and associated pixels in accordance with an embodiment of the present invention.
FIG. 5 is a schematic diagram of pixel values in a range of m1 × m2 centered on a central pixel point of the pixel point matrix according to the embodiment of the present invention.
FIG. 6 is a schematic diagram illustrating comparison between bright spot detection results before and after the determination according to the second bright spot detection threshold in the embodiment of the present invention.
FIG. 7 is a schematic diagram of an apparatus for constructing a sequencing template based on an image according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, the terms "first", "second", "third", "fourth", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any order or number of indicated technical features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to FIG. 1, the present invention provides a method for constructing a sequencing template based on images. The images are collected from the same field of view and comprise a first image, a second image, a third image and a fourth image collected from the four base extension reactions A, T/U, G and C, respectively. During the base extension reactions a plurality of nucleic acid molecules with optically detectable labels exist in the field of view, and at least a part of the nucleic acid molecules appear as bright spots on the images. Performing the four types of base extension reactions, sequentially or simultaneously, is defined as one round of sequencing reaction. The first image comprises an image M1 and an image M2, the second image comprises an image N1 and an image N2, the third image comprises an image P1 and an image P2, and the fourth image comprises an image Q1 and an image Q2; the image M1 and the image M2 are from two rounds of sequencing reactions, respectively, as are the image N1 and the image N2, the image P1 and the image P2, and the image Q1 and the image Q2. The method comprises: S10, combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1, and the image Q2 for bright spot matching, such that the image M1, the image N1, the image N2, the image P1, the image P2, the image Q1, and the image Q2 each participate in the combining at least once, to obtain a plurality of combined images including first coincident bright spots, two or more bright spots on a combined image at a distance smaller than a first predetermined pixel being one first coincident bright spot; and S20, merging the first coincident bright spots on the plurality of combined images to obtain a set of bright spots corresponding to the sequencing template.
The term "bright spots", also referred to as "spots" or "peaks", refers to light-emitting points on an image, where one light-emitting point occupies at least one pixel. The term "pixel point" is used interchangeably with "pixel".
According to the method, the intersection of the bright spots on the images is taken first and the union is taken afterwards, which yields the bright spot set corresponding to the template nucleic acid molecules. The sequencing template obtained by the method is this bright spot set; it reflects the information of the sequencing template effectively, accurately and comprehensively, and can be further used for accurate base identification (base calling), i.e., accurately obtaining at least a portion of the nucleotide sequence of the template nucleic acid.
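Steps S10 and S20 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes bright spots are already detected and registered and are given as (x, y) coordinates, it represents a coincident spot by the midpoint of the matched pair, and the function names and the dictionary layout are hypothetical.

```python
from math import hypot


def coincident_spots(spots_a, spots_b, max_dist=1.05):
    """S10 for one image pair: keep only spots of A that have a partner in B
    closer than max_dist pixels (an intersection); unmatched spots are dropped."""
    matched = []
    for xa, ya in spots_a:
        for xb, yb in spots_b:
            if hypot(xa - xb, ya - yb) < max_dist:
                # Assumption: the coincident spot is placed at the pair midpoint.
                matched.append(((xa + xb) / 2, (ya + yb) / 2))
                break
    return matched


def merge_into(template, spots, max_dist=1.05):
    """S20: add spots that are not yet present in the template (a union)."""
    for x, y in spots:
        if not any(hypot(x - tx, y - ty) < max_dist for tx, ty in template):
            template.append((x, y))
    return template


def build_template(images, pairs, max_dist=1.05):
    """Run S10 on every listed image pair and merge the results (S20)."""
    template = []
    for name_a, name_b in pairs:
        matched = coincident_spots(images[name_a], images[name_b], max_dist)
        merge_into(template, matched, max_dist)
    return template


images = {
    "M1": [(10, 10), (50, 50)],
    "M2": [(10.3, 10.2)],
    "N1": [(30, 30)],
    "N2": [(30.1, 29.8), (70, 70)],
}
template = build_template(images, [("M1", "M2"), ("N1", "N2")])
```

In this toy input the spots at (50, 50) and (70, 70) appear in only one image of their pair, so they are discarded as non-coincident, and the template ends up with two locations.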
One round of sequencing reaction, in which the four types of base extension reactions are realized sequentially or simultaneously, may be a round in which the four types of base reaction substrates (e.g., nucleotide analogs/base analogs) react simultaneously in one base extension reaction system; a round in which two types of base analogs react in one base extension reaction system and the other two types of reaction substrates react in the next base extension reaction system; or a round in which one type of base analog is added per base extension reaction system and the four types of base analogs are added in turn in four consecutive base extension reaction systems. It is understood that the first image, the second image, the third image and the fourth image may be acquired from two or more base extension reactions. In addition, one base extension reaction may comprise one image acquisition or multiple image acquisitions.
In one example, a round of sequencing reaction includes a plurality of base extension reactions. For instance, in monochromatic sequencing the reaction substrates (nucleotide analogs) corresponding to the four types of bases all carry the same fluorescent dye, a round of sequencing reaction includes four base extension reactions (4 repeats), and, for one field of view, one base extension reaction includes one image acquisition; image M1, image N1, image P1, and image Q1 are from the same field of view of the four base extension reactions of one round of sequencing reaction, respectively.
In another example, such as a single-molecule two-color sequencing reaction, two of the reaction substrates (nucleotide analogs) corresponding to the four types of bases carry one fluorescent dye and the other two carry another fluorescent dye, the two dyes having different excitation wavelengths. One round of sequencing reaction includes two base extension reactions; in each base extension reaction, two types of base reaction substrates carrying different dyes take part in the binding reaction, and, for one field of view, one base extension reaction includes two image acquisitions at different excitation wavelengths. Image M1, image N1, image P1 and image Q1 are respectively from the same field of view at the two excitation wavelengths of the two base extension reactions of one round of sequencing reaction.
In yet another example, a round of sequencing reaction includes a single base extension reaction, such as the two-color sequencing reaction of a second-generation sequencing platform, in which the four types of base reaction substrates (e.g., nucleotide analogs) carry dye a, dye b, both dye a and dye b, and no dye, respectively, the excitation wavelengths of dye a and dye b being different. The four types of reaction substrates realize one round of sequencing reaction in the same base extension reaction, and one base extension reaction comprises two image acquisitions at different excitation wavelengths; the first image and the third image, the second image and the fourth image, and the image M1 and the image N1 are respectively from different rounds of sequencing reactions, or from the same field of view at different excitation wavelengths in the same round of sequencing reaction.
In certain embodiments, merging the first coincident bright spots on the plurality of combined images in S20 includes matching the first coincident bright spots in different combined images one or more times to obtain the set of bright spots corresponding to the sequencing template. This helps obtain an accurate set of bright spots in one-to-one correspondence with the template nucleic acid molecules and build an accurate template from the image information.
In certain embodiments, image M1, image N1, image P1, and image Q1 are obtained sequentially, and image M2, image N2, image P2, and image Q2 are obtained sequentially, i.e., image M1, image N1, image P1, and image Q1 are obtained in one round of sequencing reaction and image M2, image N2, image P2, and image Q2 are obtained in another round of sequencing reaction, and S10 includes: combining the images M1, M2, N1, N2, P1, P2, Q1 and Q2 in pairs at intervals of S images to obtain K combined images, matching the bright spots on the combined images, and discarding the non-coincident bright spots on the combined images, where S is an integer, $0 \le S \le S_{max}$, and $S_{max}$ = total number of images participating in the combination - 4. Denoting the total number of images participating in the combination by B, it can be calculated that $K = [(B - S - 1) + 1](B - S - 1)/2$, i.e.
$$K = \frac{(B - S)(B - S - 1)}{2}$$
For example, with the 8 images above (B = 8), when S is 2, K is 15. Thus, a complete sequencing template can be constructed while using as little image information as possible and using it fully.
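The value of K can be checked by enumerating the pairs directly. The sketch below assumes the images are numbered 1 to B in acquisition order and that "at intervals of S images" means that at least S images lie between the two members of a pair; the function names are illustrative only.

```python
def spaced_pairs(b, s):
    """All pairs (i, j) of images 1..b with at least s images between them."""
    return [(i, j) for i in range(1, b + 1) for j in range(i + s + 1, b + 1)]


def pair_count(b, s):
    """Closed form: K = [(B - S - 1) + 1](B - S - 1) / 2."""
    return (b - s) * (b - s - 1) // 2


pairs_8_2 = spaced_pairs(8, 2)
```

For B = 8 and S = 2 the enumeration yields the 15 pairs (1,4), (1,5), ..., (5,8), in agreement with the closed form; for B = 12 and S = 2 the same formula gives 45.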
For sequencing in which one round of sequencing reaction comprises four base extension reactions, i.e., each base extension reaction contains only one type of nucleotide analog, S is preferably greater than 1, and more preferably greater than 2. This helps avoid or reduce the interference of noise introduced by biochemical experimental factors with the image-based construction of the sequencing template, and helps determine the template effectively and accurately.
In one embodiment, the total number of images participating in the combination is 12 and S is 2, so that a more complete sequencing template can be obtained and the loss of sequencer output data (reads) is reduced.
In another embodiment, the total number of images participating in the combination is 8, and S = 2. When Repeat is 5, coincident bright spot matching is performed on the image pairs Repeat1-5 (the images of Repeat1 and Repeat5) and Repeat2-5, and the matching results are merged into a template container (Template; initially empty); in one example, since Repeat4 images are used for construction of reference images, template construction starts with image Repeat5 to reduce the amount of computation. When Repeat is 6, coincident bright spot matching is performed on the image pairs Repeat1-6, Repeat2-6 and Repeat3-6, and the matching results are merged into Template; when Repeat is 7, coincident bright spot matching is performed on the image pairs Repeat1-7, Repeat2-7, Repeat3-7 and Repeat4-7, and the matching results are merged into Template; when Repeat is 8, coincident bright spot matching is performed on the image pairs Repeat1-8, Repeat2-8, Repeat3-8, Repeat4-8 and Repeat5-8, and the matching results are merged into Template. Finally, the bright spots in all Template containers are counted and output, where each bright spot coordinate represents one strand, i.e., one read. After the template is successfully constructed, the total number of reads, TotalRead, is known. FIG. 2 is a schematic diagram of the process: the upper four images in FIG. 2 are Repeat1, Repeat5, Repeat6 and Repeat7 in sequence, the middle images show the process and the result of coincident bright spot matching of Repeat1 and Repeat5, and the lower image shows the result of coincident bright spot matching of Repeat1, Repeat5, Repeat6 and Repeat7.
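The pairing schedule in this embodiment can be written out as a short sketch. The function name and parameters are hypothetical; the rule that each new Repeat is paired with every earlier Repeat that has at least S images in between is inferred from the pairs listed above.

```python
def pairing_schedule(total_images=8, s=2, first_repeat=5):
    """Map each Repeat index to the image pairs matched when it arrives."""
    schedule = {}
    for repeat in range(first_repeat, total_images + 1):
        # Earlier images 1 .. repeat - s - 1 leave at least s images in between.
        schedule[repeat] = [(earlier, repeat) for earlier in range(1, repeat - s)]
    return schedule


schedule = pairing_schedule()
total_pairs = sum(len(pairs) for pairs in schedule.values())
```

With the defaults the schedule contains 2 + 3 + 4 + 5 = 14 pairs. That is one fewer than K = 15 for 8 images and S = 2, because construction starts at Repeat5 and the pair Repeat1-4 is therefore never formed.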
In one example, in the imaging system, the electronic sensor has a pixel size of 6.5 μm and the microscope magnification is 60×, so the smallest dimension that can be seen is about 0.1 μm (6.5 μm / 60). The bright spot corresponding to one nucleic acid molecule is typically smaller than 10 × 10 pixels.
In one example, the first predetermined pixel is 1.05 pixels.
In one example, two first coincident bright spots separated by more than 1.85 pixels are treated as two distinct first coincident bright spots.
In one example, a coincident bright spot whose distance to another coincident bright spot is greater than 1.05 pixels but less than 1.85 pixels is discarded. This helps construct an accurate sequencing template.
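The three distance rules in these examples can be summarised in a short sketch. The handling of distances exactly equal to a threshold and the choice to discard both members of an ambiguous pair are assumptions made here for illustration; the source does not specify either, and the function names are hypothetical.

```python
from math import hypot

MERGE_DIST = 1.05     # below this, spots count as one coincident bright spot
SEPARATE_DIST = 1.85  # above this, spots count as two distinct bright spots


def classify_distance(d):
    """Classify a spot-to-spot distance (in pixels) by the two thresholds."""
    if d < MERGE_DIST:
        return "same"
    if d > SEPARATE_DIST:
        return "distinct"
    return "ambiguous"  # boundary values land here (assumption)


def drop_ambiguous(spots):
    """Discard every spot that has a neighbour in the ambiguous band.

    Assumption: both members of an ambiguous pair are removed.
    """
    kept = []
    for i, (x, y) in enumerate(spots):
        ambiguous = any(
            classify_distance(hypot(x - ox, y - oy)) == "ambiguous"
            for j, (ox, oy) in enumerate(spots)
            if j != i
        )
        if not ambiguous:
            kept.append((x, y))
    return kept


filtered = drop_ambiguous([(0, 0), (1.5, 0), (10, 10)])
```

Here the spots at (0, 0) and (1.5, 0) are 1.5 pixels apart, too far to be the same spot and too close to be reliably separate, so only (10, 10) survives.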
In some embodiments, the images are registered images. Therefore, the method is beneficial to accurately acquiring the bright spot set corresponding to the sequencing template.
The embodiments of the invention do not limit how image registration, i.e., deviation correction, is implemented. In some examples, image registration is performed using a method comprising: performing a first registration on an image to be registered based on a reference image, wherein the reference image and the image to be registered correspond to the same object and both comprise a plurality of bright spots, the first registration comprising determining a first offset between a predetermined area on the image to be registered and the corresponding predetermined area on the reference image, and moving all the bright spots on the image to be registered based on the first offset to obtain the image to be registered after the first registration; and performing a second registration on the image to be registered after the first registration based on the reference image, the second registration comprising merging the image to be registered after the first registration with the reference image to obtain a merged image, calculating the offsets of all coincident bright spots in a predetermined area on the merged image to determine a second offset, two or more bright spots at a distance smaller than a predetermined pixel being one coincident bright spot, and moving all the bright spots on the image to be registered after the first registration based on the second offset, thereby registering the image to be registered. The two associated registrations can be regarded, relative to each other, as coarse registration and fine registration. Because the fine registration uses the bright spots on the image, the method can quickly achieve high-precision deviation correction based on a small amount of data, and is particularly suitable for scenarios that require high-precision image deviation correction, for example single-molecule-level image detection such as images of sequencing reactions from third-generation sequencing platforms. The term single-molecule level refers to a resolution on the scale of a single molecule or a few molecules, e.g., 10, 8, 5, 4 or fewer than 3 molecules.
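A minimal sketch of the two-stage registration follows, using NumPy. The source does not say how the first offset is computed; circular cross-correlation via FFT is used here as an assumption. Taking the second offset as the mean displacement of the coincident bright spots, the 1.05-pixel matching distance, the (row, column) spot layout and all function names are likewise assumptions.

```python
import numpy as np


def rasterize(spots, shape):
    """Helper for the demo: draw spots (row, col) onto an empty image."""
    img = np.zeros(shape)
    idx = np.rint(spots).astype(int)
    img[idx[:, 0], idx[:, 1]] = 1.0
    return img


def coarse_offset(ref_region, mov_region):
    """First offset: integer (d_row, d_col) to ADD to the moving image so it
    lines up with the reference, from the cross-correlation peak."""
    spectrum = np.fft.fft2(ref_region) * np.conj(np.fft.fft2(mov_region))
    corr = np.fft.ifft2(spectrum).real
    d_row, d_col = np.unravel_index(np.argmax(corr), corr.shape)
    rows, cols = corr.shape
    # Peaks past the midpoint correspond to negative shifts (circular wrap).
    if d_row > rows // 2:
        d_row -= rows
    if d_col > cols // 2:
        d_col -= cols
    return int(d_row), int(d_col)


def fine_offset(ref_spots, mov_spots, max_dist=1.05):
    """Second offset: mean displacement over coincident bright spots."""
    diffs = []
    for spot in mov_spots:
        delta = ref_spots - spot
        dist = np.hypot(delta[:, 0], delta[:, 1])
        nearest = int(np.argmin(dist))
        if dist[nearest] < max_dist:
            diffs.append(delta[nearest])
    return np.mean(diffs, axis=0) if diffs else np.zeros(2)


def register(ref_img, mov_img, ref_spots, mov_spots, max_dist=1.05):
    """Coarse registration followed by fine registration of the spot list."""
    moved = mov_spots + np.array(coarse_offset(ref_img, mov_img), dtype=float)
    return moved + fine_offset(ref_spots, moved, max_dist)


ref_spots = np.array([[10.0, 12.0], [30.0, 40.0], [50.0, 20.0]])
mov_spots = ref_spots - np.array([3.3, -2.4])  # simulated stage drift
shape = (64, 64)
registered = register(rasterize(ref_spots, shape), rasterize(mov_spots, shape),
                      ref_spots, mov_spots)
```

In this synthetic case the coarse step recovers the integer part of the drift, (3, -2), and the fine step removes the remaining sub-pixel residual of (0.3, -0.4).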
In some embodiments, the image to be registered, i.e., the image from which the sequencing template is constructed, comes from a sequencing platform that determines sequences using optical imaging. The term sequencing, also known as sequence determination, refers to nucleic acid sequencing, including DNA sequencing and/or RNA sequencing and including long-fragment sequencing and/or short-fragment sequencing; the biochemical reaction of sequencing includes base extension. Sequencing can be carried out on a sequencing platform, which can be selected from, but is not limited to, the HiSeq/MiSeq/NextSeq sequencing platforms of Illumina, the Ion Torrent platform of Thermo Fisher/Life Technologies, the BGISEQ platform of BGI, and single-molecule sequencing platforms; the sequencing mode can be single-end sequencing or paired-end sequencing; the sequencing results/data obtained, i.e., the fragments read, are called reads, and their length is called the read length. The "bright spots" correspond to the optical signal of an extended base or base cluster.
The predetermined area on the image may be the entire image or a part of the image. In one example, the predetermined region on the image is a portion of the image, such as a 512 x 512 region in the center of the image. The center of the image is the center of the field of view, the intersection point of the optical axis of the imaging system and the imaging plane can be referred to as the image center point, and the region centered on the center point can be regarded as the image center region.
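Selecting the central predetermined area can be sketched as below with NumPy. The sketch assumes that the image centre coincides with the geometric centre of the pixel array, which holds only if the optical axis meets the sensor at its centre; the function name is illustrative.

```python
import numpy as np


def central_region(image, size=512):
    """Return the size x size block around the geometric centre of the image."""
    rows, cols = image.shape[:2]
    if rows < size or cols < size:
        raise ValueError("image is smaller than the requested region")
    top = (rows - size) // 2
    left = (cols - size) // 2
    return image[top:top + size, left:left + size]


image = np.arange(2048 * 2048).reshape(2048, 2048)
region = central_region(image)
```

Working on a 512 x 512 crop instead of the full frame keeps the offset computation cheap, and the centre of the field of view is commonly the least distorted part of the image.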
In some embodiments, the image to be registered is from a nucleic acid sequencing platform that includes an imaging system and a nucleic acid sample carrying system. The nucleic acid molecules to be detected, which carry optically detectable labels, are fixed in a reactor, also called a chip; the chip is loaded on a movable stage, and the stage moves the chip so that images of the nucleic acid molecules can be acquired at different positions (different fields of view) of the chip. In general, the movement of the optical system and/or the movable stage has limited precision; for example, the position specified by a movement command deviates from the position the mechanical structure actually reaches, which matters especially in application scenarios with high precision requirements. Consequently, when the hardware is moved according to commands to image the same position (field of view) at different time points, the images of the same field of view acquired at those time points are difficult to align completely. Correcting and aligning the images helps accurately determine the nucleotide sequence of the nucleic acid molecule from the changes in information across the images acquired at the multiple time points.
In some embodiments, the reference image is obtained by construction; it may be constructed while the image to be registered is being registered, or it may be constructed in advance, stored, and retrieved when needed.
In some examples, constructing the reference image includes: acquiring a fifth image and a sixth image, wherein the fifth image and the sixth image correspond to the same object as the image to be registered; performing coarse registration on the sixth image based on the fifth image, wherein the coarse registration comprises determining the offset between the sixth image and the fifth image and moving the sixth image based on the offset to obtain a coarsely registered sixth image; and merging the fifth image and the coarsely registered sixth image to obtain the reference image, wherein the fifth image and the sixth image both contain a plurality of bright spots. An image containing more, or relatively more complete, information is thus obtained by construction, and using it as the rectification reference favors more accurate image registration. For images obtained from nucleic acid sequencing, constructing the reference image from a plurality of images helps the reference image capture the complete bright spot information of the corresponding nucleic acid molecules and favors bright-spot-based image rectification.
In some embodiments, the fifth image and the sixth image are from the same field of view at different times of a nucleic acid sequencing reaction (sequencing reaction). In one example, a round of sequencing reactions includes multiple base extension reactions, such as monochromatic sequencing, using reaction substrates (nucleotide analogs) corresponding to four types of bases all with the same fluorescent dye, a round of sequencing reactions includes four base extension reactions (4repeats), one base extension reaction includes one image acquisition for one field, and the fifth and sixth images are from the same field of different base extension reactions, respectively. Therefore, the reference image obtained by processing and collecting the information of the fifth image and the sixth image is used as the basis for deviation correction, and more accurate image deviation correction is facilitated.
In another example, in a single-molecule two-color sequencing reaction, two of the four types of base reaction substrates (nucleotide analogs) carry one fluorescent dye and the other two carry another fluorescent dye, the two dyes having different excitation wavelengths. One round of sequencing reaction includes two base extension reactions; in one base extension reaction, two types of base reaction substrates carrying different dyes undergo the binding reaction, and one base extension reaction includes two image acquisitions of one field of view at the different excitation wavelengths. The fifth image and the sixth image come from the same field of view in different base extension reactions, or at different excitation wavelengths in the same base extension reaction. Therefore, the reference image obtained by processing and collecting the information of the fifth image and the sixth image is used as the basis for deviation correction, and more accurate image deviation correction is facilitated.
In yet another example, a round of sequencing reactions includes a single base extension reaction, such as a two-color sequencing reaction of a second generation sequencing platform, with four types of base reaction substrates (e.g., nucleotide analogs) with dye a, dye b, dye a and dye b, and without any dye, respectively, the excitation wavelengths of dye a and dye b being different; the four types of reaction substrates realize one round of sequencing reaction in the same base extension reaction, and the fifth image and the sixth image are respectively from the same field of view under different excitation wavelengths in different rounds of sequencing reactions or the same round of sequencing reactions. Therefore, the reference image obtained by processing and collecting the information of the fifth image and the sixth image is used as the basis for deviation correction, and more accurate image deviation correction is facilitated.
The fifth image and/or the sixth image may each be one image or a plurality of images. In one example, the fifth image is the first image and the sixth image is the second image. Further, in some embodiments, the method further includes constructing the reference image using a seventh image and an eighth image. The image to be registered, the fifth image, the sixth image, the seventh image and the eighth image come from the same field of view of the sequencing reaction; the fifth, sixth, seventh and eighth images correspond respectively to the field of view during the A, T/U, G and C base extension reactions; the field of view during a base extension reaction contains a plurality of nucleic acid molecules with optically detectable labels, and at least some of the nucleic acid molecules appear as bright spots on the images. Constructing the reference image further includes: performing coarse registration on the seventh image based on the fifth image, wherein the coarse registration comprises determining the offset between the seventh image and the fifth image and moving the seventh image based on the offset to obtain a coarsely registered seventh image; performing coarse registration on the eighth image based on the fifth image, wherein the coarse registration comprises determining the offset between the eighth image and the fifth image and moving the eighth image based on the offset to obtain a coarsely registered eighth image; and merging the fifth image with the coarsely registered sixth image, the coarsely registered seventh image and the coarsely registered eighth image to obtain the reference image.
Embodiments of the present invention do not limit how the first registration is implemented. For example, the first offset may be determined by frequency-domain registration using the Fourier transform. Specifically, the first offset, the offset between the sixth image and the fifth image, the offset between the seventh image and the fifth image, and/or the offset between the eighth image and the fifth image may be determined by two-dimensional discrete Fourier transform, for example with reference to the phase-only correlation function described in Kenji TAKITA et al., IEICE Trans. The first registration/coarse registration can reach an accuracy of 1 pixel. In this way, the first offset can be determined quickly and accurately, and/or a reference image that favors accurate rectification can be constructed.
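The frequency-domain coarse registration described above can be sketched as follows. This is a minimal illustration of phase-only correlation at integer-pixel accuracy, not the implementation of the cited reference; the function name `phase_correlation_offset`, the (row, column) ordering of the offset and the use of a circular shift (`np.roll`) to move the image are assumptions made for the example.

```python
import numpy as np

def phase_correlation_offset(reference, moving):
    """Integer (d_row, d_col) shift that aligns `moving` to `reference`.

    Hypothetical helper: applying np.roll(moving, shift, axis=(0, 1))
    with the returned shift reproduces `reference` for a pure circular shift.
    """
    f_ref = np.fft.fft2(np.asarray(reference, dtype=np.float64))
    f_mov = np.fft.fft2(np.asarray(moving, dtype=np.float64))
    cross = f_ref * np.conj(f_mov)
    # Keep only the phase of the cross-power spectrum.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return int(shift[0]), int(shift[1])
```

The peak of the inverse transform gives the offset directly, which is why this step is limited to whole pixels; a sub-pixel refinement has to come from a separate step.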
In some embodiments, the reference image and the image to be registered are binarized images. Therefore, the method is favorable for reducing the calculation amount and quickly rectifying the deviation.
In one example, both the image to be rectified and the reference image are binarized images, that is, each pixel in the image is either a or b, for example a is 1 and b is 0, where a pixel marked 1 is brighter, or has greater intensity, than a pixel marked 0. The reference image is constructed from images repeat1, repeat2, repeat3 and repeat4 of the four base extension reactions of one round of sequencing reactions, and the fifth image and the sixth image are selected from any one, two or three of images repeat1-4.
In one example, the fifth image is image repeat1 and images repeat2, repeat3 and repeat4 are sixth images; images repeat2-4 are coarsely registered in sequence based on image repeat1 to obtain coarsely registered images repeat2-4, and image repeat1 and the coarsely registered images repeat2-4 are merged to obtain the reference image. Merging the images here means merging the coincident bright spots in the images. In one example, two bright spots on two images that are no more than 1.5 pixels apart are set as coincident bright spots, based mainly on the size of the bright spots of the corresponding nucleic acid molecules and the resolution of the imaging system. Using the central region of the image synthesized from the four repeats as the reference image gives the reference image a sufficient number of bright spots, which favors subsequent registration; and because bright spots in the central region of an image are detected and located relatively more accurately, it also favors accurate registration.
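The merging of coincident bright spots can be sketched as below. This is only an illustration under stated assumptions: spots are represented as (N, 2) coordinate arrays rather than as binarized images, the function name `merge_spots` is hypothetical, and a coincident spot is represented by the coordinate already present in the base set.

```python
import numpy as np

def merge_spots(base_points, new_points, max_dist=1.5):
    """Merge two (N, 2) coordinate arrays, keeping coincident spots once.

    A spot in `new_points` lying within `max_dist` pixels of any spot in
    `base_points` is treated as coincident and is not added again.
    """
    base = np.asarray(base_points, dtype=np.float64).reshape(-1, 2)
    new = np.asarray(new_points, dtype=np.float64).reshape(-1, 2)
    if len(base) == 0:
        return new.copy()
    if len(new) == 0:
        return base.copy()
    # Pairwise distances between every new spot and every base spot.
    dist = np.linalg.norm(new[:, None, :] - base[None, :, :], axis=2)
    is_new = dist.min(axis=1) > max_dist
    return np.vstack([base, new[is_new]])
```

Applying the function successively to the spot lists of the coarsely registered repeat2, repeat3 and repeat4 accumulates the spots of all four repeats into one reference set.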
In one example, an image is rectified by the following steps: 1) coarse rectification is performed on an image repeat5 of a field of view of one base extension reaction acquired from another round of sequencing reaction, where repeat5 is a binarized image: the central 512 × 512 region of the image and the central image synthesized from repeat1-4 (the central 512 × 512 region of the corresponding reference image) are subjected to two-dimensional discrete Fourier transform, and frequency-domain registration yields the offset (x0, y0), i.e., coarse image registration, where x0 and y0 can reach an accuracy of 1 pixel; 2) the coarsely registered image and the reference image are merged based on the bright spots on the images, which includes calculating the offset (x1, y1) of the coincident bright spots in the central region of the repeat5 image and the corresponding region of the reference image, where the offset is the coordinate position of a bright spot in the image to be rectified minus the coordinate position of the corresponding bright spot in the reference image, which can be expressed as offset (x1, y1) = curRepeatPoints - basePoints; averaging the offsets of all coincident bright spots gives a fine offset in the range [0, 0] to [1, 1]. In one example, two bright spots on two images that are no more than 1.5 pixels apart are set as coincident bright spots; 3) in summary, the offset (x0, y0) - (x1, y1) of a field-of-view image (fov) across different cycles is obtained, and the rectified bright spot coordinates can be expressed as curRepeatPoints + (x0, y0) - (x1, y1), where curRepeatPoints represents the original coordinates of the bright spots, i.e., the coordinates in the image before rectification. The rectification result obtained in this way has high accuracy, with a rectification precision less than or equal to 0.1 pixel. Fig. 3 illustrates the rectification process and result: in Fig. 3, image C is rectified based on image A, the circles in image A and image C represent bright spots, bright spots marked with the same number are coincident bright spots, and image C->A represents the rectification result, i.e., the result of aligning image C to image A.
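Steps 2) and 3) above can be sketched as follows, assuming the coarse offset (x0, y0) is already known. The names `fine_offset` and `rectify` are hypothetical, and the nearest-neighbour matching of spots is an assumption about how coincident bright spots are paired; the text only specifies the 1.5-pixel criterion and the averaging.

```python
import numpy as np

def fine_offset(cur_points, base_points, max_dist=1.5):
    """Mean of (cur - base) over coincident spot pairs (within max_dist)."""
    cur = np.asarray(cur_points, dtype=np.float64)
    base = np.asarray(base_points, dtype=np.float64)
    diffs = cur[:, None, :] - base[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    nearest = dist.argmin(axis=1)          # closest base spot for each spot
    rows = np.arange(len(cur))
    matched = dist[rows, nearest] <= max_dist
    if not matched.any():
        return np.zeros(2)
    return diffs[rows[matched], nearest[matched]].mean(axis=0)

def rectify(cur_points, coarse, base_points):
    """Return curRepeatPoints + (x0, y0) - (x1, y1) and the fine offset."""
    cur = np.asarray(cur_points, dtype=np.float64)
    coarse = np.asarray(coarse, dtype=np.float64)
    # The fine offset is measured after the coarse shift has been applied.
    fine = fine_offset(cur + coarse, base_points)
    return cur + coarse - fine, fine
```

Averaging over many coincident spots is what lets the residual offset be estimated to a fraction of a pixel even though each spot is located on an integer pixel grid.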
The embodiments of the present invention do not limit how bright spots on an image are recognized and detected. In some embodiments, performing image registration further includes identifying bright spots, which includes performing bright spot detection on the image using a k1 × k2 matrix, determining that a matrix whose center pixel value is not less than any non-center pixel value of the matrix corresponds to a candidate bright spot, and determining whether the candidate bright spot is a bright spot, where k1 and k2 are both odd numbers greater than 1 and the k1 × k2 matrix contains k1 × k2 pixels. The image is selected from at least one of the image to be registered and the images used to construct the reference image. This method detects bright spots (spots or peaks) on an image quickly and effectively, particularly on images acquired from nucleic acid sequencing reactions. The method places no special limitation on the image to be detected, i.e., the original input data; it is suitable for processing and analyzing images generated by any platform that performs nucleic acid sequencing based on optical detection, including but not limited to second-generation and third-generation sequencing; it is accurate and efficient and can extract more sequence-representing information from the image. It is particularly suitable for random images and for signal recognition with high accuracy requirements.
In some embodiments, the image comes from a nucleic acid sequencing reaction in which the nucleic acid molecules carry an optically detectable label, such as a fluorescent label. The fluorescent molecules emit fluorescence when excited by a laser of a specific wavelength, and the image is acquired by an imaging system. The acquired image includes light spots/bright spots that may correspond to the positions of fluorescent molecules. Understandably, when the image is acquired at the focal plane, the bright spots corresponding to the positions of the fluorescent molecules are small and bright; when it is acquired away from the focal plane, the bright spots are larger and dimmer. In addition, the field of view may contain other substances or information that are not targets or are difficult to use later, such as impurities; further, when a single-molecule field of view is imaged, large numbers of aggregated molecules (clusters) and the like may also interfere with acquiring the target single-molecule information. Here, a single molecule refers to one or a few molecules, for example no more than 10 molecules, such as one, two, three, four, five, six, eight or ten molecules.
In some examples, a center pixel value of the matrix is greater than a first preset value, any pixel value not in the center of the matrix is greater than a second preset value, and the first preset value and the second preset value are related to an average pixel value of the image.
In some embodiments, the image may be traversed with a k1 × k2 matrix for detection, with the first and/or second preset value set in relation to the average pixel value of the image. For a grayscale image, the pixel value is the same as the grayscale value. In the k1 × k2 matrix, k1 and k2 may be equal or unequal. In one example, the relevant imaging system parameters are: the objective lens is 60×, the size of the electronic sensor is 6.5 μm, and the minimum size of the image formed by the microscope is 0.1 μm; the acquired or input image may be a 16-bit grayscale or color image of 512 × 512, 1024 × 1024 or 2048 × 2048, and k1 and k2 each take a value greater than 1 and less than 10. In one example, k1 = k2 = 3; in another example, k1 = k2 = 5. If the image is a color image, each pixel has three pixel values; the color image can be converted into a grayscale image before bright spot detection, which reduces the computation and complexity of the detection process. A non-grayscale image may be converted to a grayscale image using, for example but not limited to, a floating-point algorithm, an integer method, a shift method, or an average value method.
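Two of the grayscale conversions named above can be sketched as follows. The text does not give the weights of the floating-point method; the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) used here are an assumption, as are the function names and the R, G, B channel order.

```python
import numpy as np

def to_gray_average(rgb):
    """Average value method: mean of the three channel values per pixel."""
    return np.asarray(rgb, dtype=np.float64).mean(axis=-1)

def to_gray_weighted(rgb, weights=(0.299, 0.587, 0.114)):
    """Floating-point method: weighted sum of R, G, B (weights assumed)."""
    return np.asarray(rgb, dtype=np.float64) @ np.asarray(weights)
```

Either function maps an (H, W, 3) array to an (H, W) array that can be passed to the bright spot detection.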
In one example, based on statistics from processing a large number of images, the inventors take the first preset value as 1.4 times the average pixel value of the image and the second preset value as 1.1 times the average pixel value of the image; this eliminates interference and yields bright spot detection results that originate from the optically detectable labels.
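The candidate detection with a k1 × k2 matrix and the two preset values can be sketched as below. This is a minimal illustration, not the patented implementation: the function name `candidate_spots` is hypothetical, and pixels closer to the image border than half the matrix size are simply not examined, which is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def candidate_spots(image, k1=3, k2=3, center_factor=1.4, edge_factor=1.1):
    """Return (row, col) centers of k1 x k2 matrices that pass all checks."""
    img = np.asarray(image, dtype=np.float64)
    mean = img.mean()
    # windows[i, j] is the k1 x k2 matrix whose top-left pixel is (i, j).
    windows = sliding_window_view(img, (k1, k2))
    center = windows[:, :, k1 // 2, k2 // 2]
    # Center pixel not less than any other pixel of the matrix.
    is_peak = center >= windows.max(axis=(2, 3))
    # Center pixel above the first preset value (1.4 x image mean).
    center_ok = center > center_factor * mean
    # All non-center pixels above the second preset value (1.1 x mean);
    # for a peak, the window minimum is the minimum non-center pixel.
    edges_ok = windows.min(axis=(2, 3)) > edge_factor * mean
    rows, cols = np.nonzero(is_peak & center_ok & edges_ok)
    return np.column_stack([rows + k1 // 2, cols + k2 // 2])
```

The returned positions are only candidates; the size and score screening described in the following paragraphs decides which of them are kept.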
The candidate bright spots can be further screened by size, intensity, and/or degree of similarity to an ideal bright spot. In some embodiments, the size of a candidate bright spot on the image is quantified and compared through the size of the connected domain corresponding to the candidate bright spot, so as to screen and judge whether the candidate bright spot is a wanted bright spot.
In one example, determining whether the candidate bright spot is a bright spot comprises: calculating the size Area = A × B of the connected domain corresponding to a candidate bright spot, and judging a candidate bright spot whose connected domain is larger than a third preset value to be a bright spot, where A represents the size of the connected pixels in the row containing the center of the matrix corresponding to the candidate bright spot, B represents the size of the connected pixels in the column containing the center of that matrix, and the connected pixels larger than the average pixel value in a k1 × k2 matrix are defined as the connected domain corresponding to the candidate bright spot. In this way, bright spots that correspond to the labeled molecules and are suitable for subsequent sequence identification can be obtained effectively, yielding nucleic acid sequence information.
In one example, with the average pixel value of the image as the reference, two or more adjacent pixels that are not smaller than the average pixel value are called connected pixels (pixel connectivity). As shown in Fig. 4, the bold, enlarged entry indicates the center of the matrix corresponding to the candidate bright spot, the bold frame indicates the 3 × 3 matrix corresponding to the candidate bright spot, pixels marked 1 are not smaller than the average pixel value of the image, and pixels marked 0 are smaller than the average pixel value; here A = 3, B = 6, and the size of the connected domain corresponding to the candidate bright spot is A × B = 3 × 6.
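The connected-domain size A × B can be sketched as follows. Since the Fig. 4 example gives B = 6 for a 3 × 3 matrix, this sketch assumes that A and B are the lengths of the unbroken runs of connected pixels through the center pixel along its row and column, measured in the whole image; that reading, and the names `run_length` and `connected_area`, are assumptions.

```python
import numpy as np

def run_length(mask_line, index):
    """Length of the run of True values in a 1-D mask that contains index."""
    if not mask_line[index]:
        return 0
    left = index
    while left > 0 and mask_line[left - 1]:
        left -= 1
    right = index
    while right < len(mask_line) - 1 and mask_line[right + 1]:
        right += 1
    return right - left + 1

def connected_area(image, row, col):
    """Area = A * B for the candidate bright spot centered at (row, col)."""
    img = np.asarray(image, dtype=np.float64)
    mask = img >= img.mean()             # pixels not smaller than the mean
    a = run_length(mask[row, :], col)    # connected pixels along the row
    b = run_length(mask[:, col], row)    # connected pixels along the column
    return a * b
```

A candidate is then kept when its Area exceeds the third preset value.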
The third preset value can be determined according to the information of the sizes of the connected components corresponding to all the candidate bright spots on the image. For example, the size of the connected domain corresponding to each candidate bright spot on the image is calculated, and the average value of the sizes of the connected domains of the bright spots is taken as a third preset value to represent one characteristic of the image; for another example, the sizes of the connected components corresponding to the candidate bright spots on the image may be sorted from small to large, and the size of the 50 th, 60 th, 70 th, 80 th, or 90 th percentile connected component may be taken as the third preset value. Therefore, the speckle information can be effectively obtained, and the subsequent identification of the nucleic acid sequence is facilitated.
In some examples, candidate bright spots are screened by setting a statistical parameter that quantitatively reflects and compares their intensity characteristics. In one example, determining whether the candidate bright spot is a bright spot comprises: calculating the score of a candidate bright spot, Score = ((k1 × k2 - 1) × CV - EV)/((CV + EV)/(k1 × k2)), and judging a candidate bright spot whose score is greater than a fourth preset value to be a bright spot, where CV represents the center pixel value of the matrix corresponding to the candidate bright spot and EV represents the sum of the non-center pixel values of the matrix corresponding to the candidate bright spot. In this way, bright spots that correspond to the labeled molecules and are suitable for subsequent sequence identification can be obtained effectively, yielding nucleic acid sequence information.
The fourth preset value may be determined from the scores of all candidate bright spots on the image. For example, when the number of candidate bright spots on the image is large enough to satisfy statistical requirements, for example greater than 30, the Score values of all candidate bright spots of the image can be calculated and sorted in ascending order, and the fourth preset value can be set to the Score value at the 50th, 60th, 70th, 80th or 90th percentile, so that candidate bright spots with a Score below that percentile are excluded; this helps obtain the target bright spots effectively and favors accurate identification of the subsequent base sequence. The basis for this processing or screening is that, in general, bright spots that are concentrated and show a large difference between central and edge intensity/pixel values are considered to correspond to the positions of the molecules to be detected. Typically, the number of candidate bright spots on the image is greater than 50, greater than 100, or greater than 1000.
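The score and the percentile-based fourth preset value can be sketched as follows. The formula is the one given in the text; the function names and the use of `np.percentile` with its default linear interpolation are assumptions made for illustration.

```python
import numpy as np

def spot_score(window):
    """Score = ((k1*k2 - 1)*CV - EV) / ((CV + EV) / (k1*k2))."""
    w = np.asarray(window, dtype=np.float64)
    k1, k2 = w.shape
    cv = w[k1 // 2, k2 // 2]             # center pixel value
    ev = w.sum() - cv                    # sum of non-center pixel values
    return ((k1 * k2 - 1) * cv - ev) / ((cv + ev) / (k1 * k2))

def filter_by_score(scores, percentile=50):
    """Keep candidates whose score exceeds the chosen percentile."""
    scores = np.asarray(scores, dtype=np.float64)
    threshold = np.percentile(scores, percentile)
    return scores > threshold, threshold
```

The numerator compares the center with the mean of its neighbours and the denominator normalizes by the mean intensity of the matrix, so a flat matrix scores 0 and a sharply peaked one scores high regardless of overall brightness.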
In some examples, candidate bright spots are screened by combining morphology and intensity/brightness. In one example, determining whether the candidate bright spot is a bright spot comprises: calculating the size Area = A × B of the connected domain corresponding to a candidate bright spot, and calculating the score of the candidate bright spot, Score = ((k1 × k2 - 1) × CV - EV)/((CV + EV)/(k1 × k2)), where A represents the size of the connected pixels in the row containing the center of the matrix corresponding to the candidate bright spot, B represents the size of the connected pixels in the column containing the center of that matrix, the connected pixels larger than the average pixel value in a k1 × k2 matrix are defined as the connected domain corresponding to the candidate bright spot, CV represents the center pixel value of the matrix corresponding to the candidate bright spot, and EV represents the sum of the non-center pixel values of that matrix; and judging a candidate bright spot whose connected domain is larger than the third preset value and whose score is larger than the fourth preset value to be a bright spot. In this way, bright spot information that corresponds to the nucleic acid molecules and favors subsequent sequence identification can be obtained effectively. The third preset value and/or the fourth preset value may be set with reference to the previous embodiments.
In some embodiments, the image registration method further comprises bright spot identification and detection, comprising: preprocessing an image to obtain a preprocessed image, the image being selected from at least one of the first image, the second image, the third image, the fourth image, the fifth image, the sixth image, the seventh image and the eighth image; determining a critical value to simplify the preprocessed image, which includes assigning a first preset value to pixels of the preprocessed image whose pixel values are smaller than the critical value and a second preset value to pixels whose pixel values are not smaller than the critical value, to obtain a simplified image; determining a first bright spot detection threshold c1 based on the preprocessed image; identifying candidate bright spots on the image based on the preprocessed image and the simplified image, including judging a pixel matrix that satisfies at least two of conditions a) to c) to be a candidate bright spot: a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the maximum, where the pixel matrix can be expressed as r1 × r2, r1 and r2 are both odd numbers greater than 1, and the r1 × r2 pixel matrix contains r1 × r2 pixels; b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value, and the connected pixels of the pixel matrix are greater than the quantity given in the formula image (Figure GDA0003036720500000101); and c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is greater than a third preset value and satisfies g1 × g2 > c1, where g1 is the correlation coefficient with a two-dimensional Gaussian distribution over an m1 × m2 range centered on the central pixel of the pixel matrix, g2 is the pixel in the m1 × m2 range, m1 and m2 are both odd numbers greater than 1, and the m1 × m2 range contains m1 × m2 pixels; and determining whether the candidate bright spot is a bright spot. This bright spot detection method uses judgment conditions, or combinations of judgment conditions, that the inventors determined by training on a large amount of data, and it detects bright spots on an image quickly and effectively, particularly on images acquired from nucleic acid sequencing reactions. The method places no special limitation on the image to be detected, i.e., the original input data; it is suitable for processing and analyzing images generated by any platform that performs nucleic acid sequencing based on optical detection, including but not limited to second-generation and third-generation sequencing; it is accurate and efficient and can extract more sequence-representing information from the image. It is particularly suitable for random images and for signal recognition with high accuracy requirements.
For a grayscale image, the pixel values are the same as the grayscale values. If the image is a color image, one pixel point of the color image has three pixel values, the color image can be converted into a gray image, and then bright spot detection is carried out, so that the calculated amount and the complexity of the image detection process are reduced. The non-grayscale image may be optionally, but not limited to, converted to a grayscale image using a floating-point algorithm, an integer method, a shift method, or an average value method, etc.
In some embodiments, preprocessing the image comprises: determining the background of the image using an opening operation; converting the image into a first image using a top-hat operation based on the background; performing Gaussian blur processing on the first image to obtain a second image; and sharpening the second image to obtain the preprocessed image. This effectively reduces image noise or improves the signal-to-noise ratio of the image, which favors accurate detection of the bright spots.
The opening operation is a morphological operation, namely erosion followed by dilation: erosion makes the foreground (the part of interest) smaller, and dilation makes the foreground larger. The opening operation can be used to eliminate small objects, separate objects at thin connections, and smooth the boundaries of larger objects without significantly changing their area. This embodiment does not particularly limit the size of the structuring element p1 × p2 (the basic template used to process the image) for the opening operation; p1 and p2 are odd numbers. In one example, the structuring element p1 × p2 may be 15 × 15, 31 × 31, and so on, any of which ultimately yields a preprocessed image that favors subsequent processing and analysis.
Top-hat operations are often used to separate patches (bright points/bright spots) that are brighter than nearby points; when an image has a large background and the small objects are fairly regular, the top-hat operation can be used for background extraction. In one example, top-hat transforming the image includes performing an opening operation on the image and subtracting the opening result from the original image to obtain the first image, i.e., the top-hat transformed image. The top-hat transform can be written as dst = tophat(src, element) = src - open(src, element). The inventors consider that the opening operation enlarges cracks or local low-brightness regions, so subtracting the opened image from the original image highlights regions that are brighter than the surroundings of their outline. The operation depends on the size of the selected kernel, which can be regarded as related to the expected size of the bright points/bright spots. If the bright spots are not of the expected size, the processed image shows many small bumps; in out-of-focus images in particular, the bright points/bright spots smear into blobs. In one example, the expected size of the bright spot, i.e., the size of the selected kernel, is 3 × 3, and the resulting top-hat transformed image favors subsequent denoising.
Gaussian blur, also known as Gaussian filtering, is a linear smoothing filter that is suitable for removing Gaussian noise and is widely used for noise reduction in image processing. Generally speaking, Gaussian filtering is a weighted averaging of the whole image: the value of each pixel is obtained as a weighted average of its own value and the other pixel values in its neighborhood. Specifically, each pixel in the image is scanned with a template (also called a convolution kernel or mask), and the value of the pixel at the center of the template is replaced by the weighted average gray value of the pixels in the neighborhood defined by the template. In one example, the first image is Gaussian-blurred using the GaussianBlur function in OpenCV, with the Gaussian distribution parameter sigma set to 0.9 and a 3 × 3 two-dimensional filter matrix (convolution kernel); viewed as an image, the small bumps on the first image are smoothed out after the Gaussian blur and the image edges become smooth. Further, the second image, i.e., the Gaussian-filtered image, is sharpened, for example by two-dimensional Laplacian sharpening; viewed as an image, the edges are sharpened after this processing and the Gaussian-blurred image is restored.
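The four preprocessing steps (opening, top-hat, Gaussian blur, sharpening) can be sketched in plain NumPy as below. The text performs the blur with OpenCV; this sketch re-implements the steps only so that it is self-contained, and it is not the OpenCV code path. The flat square structuring element, edge-replicating padding, the 4-neighbour Laplacian kernel and all function names are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, p1, p2):
    # Edge-replicating padding keeps the output the same size as the input.
    padded = np.pad(img, ((p1 // 2, p1 // 2), (p2 // 2, p2 // 2)), mode="edge")
    return sliding_window_view(padded, (p1, p2))

def opening(img, p1, p2):
    """Erosion (local minimum) followed by dilation (local maximum)."""
    eroded = _windows(img, p1, p2).min(axis=(2, 3))
    return _windows(eroded, p1, p2).max(axis=(2, 3))

def filter3(img, kernel):
    """3 x 3 filtering; the kernels used here are symmetric."""
    return (_windows(img, 3, 3) * kernel).sum(axis=(2, 3))

def gaussian_kernel3(sigma=0.9):
    g = np.exp(-np.array([-1.0, 0.0, 1.0]) ** 2 / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def preprocess(image, p=15, sigma=0.9):
    img = np.asarray(image, dtype=np.float64)
    background = opening(img, p, p)          # background by opening
    first = img - background                 # top-hat: src - open(src)
    second = filter3(first, gaussian_kernel3(sigma))   # Gaussian blur
    laplacian = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])
    sharpened = second - filter3(second, laplacian)    # Laplacian sharpening
    return background, sharpened
```

Returning the background together with the sharpened image is a convenience for any later step that needs both, such as a threshold derived from their ratio.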
In some embodiments, simplifying the pre-processed image comprises: determining a critical value based on the background and the preprocessed image; and comparing the pixel value of the pixel point on the preprocessed image with the critical value, assigning the pixel value of the pixel point on the preprocessed image smaller than the critical value as a first preset value, and assigning the pixel value of the pixel point on the preprocessed image not smaller than the critical value as a second preset value to obtain the simplified image. Therefore, according to the critical value determining mode and the determined critical value summarized by a large amount of test data of the inventor, the preprocessed image is simplified, such as binaryzation, so that the method is beneficial to accurate detection of subsequent bright spots, accurate identification of subsequent bases, acquisition of high-quality data and the like.
Specifically, in some examples, obtaining the simplified image includes: dividing the sharpened result obtained from preprocessing by the opening operation result to obtain a set of values corresponding to the image pixels; and determining from this set of values the critical value for binarizing the preprocessed image. For example, the values may be sorted in ascending order, and the value at the 20th, 30th or 40th percentile of the set may be used as the binarization critical value/threshold. The binarized image obtained in this way favors accurate detection and identification of the subsequent bright spots.
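The percentile-based binarization can be sketched as follows. The text does not state explicitly which values are compared with the critical value; this sketch assumes the ratio values themselves are compared, and also assumes preset values of 0 and 1, a small floor on the background to avoid division by zero, and the hypothetical name `simplify`.

```python
import numpy as np

def simplify(preprocessed, background, percentile=30, low=0, high=1):
    """Binarize using a percentile of (preprocessed / opening result)."""
    pre = np.asarray(preprocessed, dtype=np.float64)
    bg = np.maximum(np.asarray(background, dtype=np.float64), 1e-9)
    ratio = pre / bg
    critical = np.percentile(ratio, percentile)
    # Below the critical value: first preset value; otherwise: second.
    binary = np.where(ratio < critical, low, high).astype(np.uint8)
    return binary, critical
```

A region-wise variant would apply the same function separately to each block of the image instead of to the image as a whole.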
In one example, the structuring element of the opening operation used in image preprocessing is p1 × p2. Dividing the preprocessed image (the sharpened result) by the opening operation result yields a set of arrays/matrices of the same size p1 × p2 as the structuring element. Within each array, the p1 × p2 values are sorted in ascending order, and the value at the thirtieth percentile is taken as the binarization critical value/threshold of that region (value matrix), so that each region of the image is binarized separately with its own threshold. The resulting binarization emphasizes the required information while removing noise, which favors accurate detection of the subsequent bright spots.
In some examples, the first bright spot detection threshold is determined using Otsu's method. Otsu's method (the OTSU algorithm), also called the maximum between-class variance method, segments an image by maximizing the between-class variance, which means that the probability of misclassification is small and the accuracy is high. Assume that the threshold separating the foreground and the background of the preprocessed image is T (c1), that the pixels belonging to the foreground account for a proportion $\omega_0$ of the whole image and have average gray level $\mu_0$, and that the pixels belonging to the background account for a proportion $\omega_1$ of the whole image and have average gray level $\mu_1$. Denoting the overall average gray level of the image to be processed as $\mu$ and the between-class variance as var, then:
$\mu = \omega_0 \mu_0 + \omega_1 \mu_1$; $\mathrm{var} = \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2$. Combining the two expressions gives the equivalent formula $\mathrm{var} = \omega_0 \omega_1 (\mu_1 - \mu_0)^2$. The segmentation threshold T that maximizes the between-class variance is found by traversal, which gives the first bright spot detection threshold c1.
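The traversal described above can be sketched in Python as follows. This is a minimal illustration of Otsu's method on a flat list of integer gray values; the function name, the convention that values not greater than T count as background, and the default of 256 gray levels are assumptions (a 16-bit image would need 65536 levels).

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold T that maximizes the between-class variance.

    pixels: iterable of integer gray values in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_bg, sum_bg = 0, 0
    for t in range(levels):
        # background: values <= t, foreground: values > t (assumed convention)
        count_bg += hist[t]
        sum_bg += t * hist[t]
        count_fg = total - count_bg
        if count_bg == 0 or count_fg == 0:
            continue
        w1 = count_bg / total                     # background proportion
        w0 = count_fg / total                     # foreground proportion
        mu1 = sum_bg / count_bg                   # background mean
        mu0 = (total_sum - sum_bg) / count_fg     # foreground mean
        var = w0 * w1 * (mu1 - mu0) ** 2          # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

When several thresholds give the same variance, this sketch returns the smallest of them.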
In some embodiments, identifying candidate bright spots on the image based on the preprocessed image and the simplified image includes determining a pixel matrix that satisfies all of the conditions a) to c) to be a candidate bright spot. This effectively improves the accuracy of subsequent nucleic acid sequence determination based on bright spot information and the quality of the off-line data.
Specifically, in one example, the conditions to be satisfied for determining a candidate bright spot include a), where k1 and k2 may be equal or unequal. In one example, the relevant parameters of the imaging system are: the objective lens is 60×, the size of the electronic sensor is 6.5 μm, and the minimum size of the image formed by the microscope is 0.1 μm; the obtained or input image may be a 16-bit grayscale or color image of 512 × 512, 1024 × 1024 or 2048 × 2048, and k1 and k2 are both greater than 1 and less than 10. In one example, k1 = k2 = 3 is set in the preprocessed image according to the expected size of the bright spots; in another example, k1 = k2 = 5 is set.
In one example, the conditions to be satisfied for determining a candidate bright spot include b): in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value, and the number of connected pixels of the pixel matrix is greater than $\frac{2}{3} \times k1 \times k2$. That is, the pixel value of the central pixel is not smaller than the critical value and the connected pixels make up more than two-thirds of the matrix. Here, two or more adjacent pixels whose pixel values are all the second preset value are called connected pixels (pixel connectivity). For example, the simplified image is a binarized image in which the first preset value is 0 and the second preset value is 1. As shown in Fig. 4, the bold, enlarged value marks the center of the pixel matrix and the thick frame marks a 3 × 3 pixel matrix, that is, k1 = k2 = 3. The pixel value of the central pixel of the matrix is 1 and the number of connected pixels is 4, which is smaller than $\frac{2}{3} \times k1 \times k2 = 6$, so this pixel matrix does not satisfy condition b) and is not a candidate bright spot.
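One possible reading of condition b) is sketched below in Python. The text does not state how connected pixels are counted; this sketch assumes that they are the pixels of value 1 that are 8-connected to the central pixel within the k1 × k2 matrix, and the function names are hypothetical.

```python
def connected_count(matrix):
    """Count pixels equal to 1 that are 8-connected to the central pixel."""
    rows, cols = len(matrix), len(matrix[0])
    cr, cc = rows // 2, cols // 2
    if matrix[cr][cc] != 1:
        return 0
    seen = {(cr, cc)}
    stack = [(cr, cc)]
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seen and matrix[nr][nc] == 1):
                    seen.add((nr, nc))
                    stack.append((nr, nc))
    return len(seen)

def satisfies_condition_b(matrix):
    """Central pixel is 1 and connected pixels exceed two-thirds of k1 * k2."""
    k1, k2 = len(matrix), len(matrix[0])
    return matrix[k1 // 2][k2 // 2] == 1 and connected_count(matrix) > 2 * k1 * k2 / 3
```

With a 3 × 3 matrix containing 4 connected pixels, the count is below 6 and the matrix is rejected, in line with the example above.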
In one example, the conditions to be satisfied for determining a candidate bright spot include c), where, in the preprocessed image, g2 is the corrected pixel sum of the m1 × m2 range. In one example, the correction is made according to the proportion of pixels whose value is the second preset value within the corresponding m1 × m2 range of the simplified image. For example, as shown in Fig. 5, m1 = m2 = 5 and the proportion of pixels whose value is the second preset value within the corresponding m1 × m2 range of the simplified image is 13/25 (13 pixels of value 1), so the corrected g2 is 13/25 of the uncorrected g2. This facilitates more accurate detection and identification of bright spots and the subsequent analysis and reading of bright spot information.
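The correction of g2 can be illustrated with the short Python sketch below. It assumes that the uncorrected g2 is the plain sum of the m1 × m2 window of the preprocessed image and that the correction is a multiplication by the proportion of second-preset-value pixels; the function name is hypothetical.

```python
def corrected_g2(pre_window, simplified_window, second_value=1):
    """Pixel sum of the preprocessed window, scaled by the share of
    pixels equal to `second_value` in the matching simplified window."""
    total = sum(sum(row) for row in pre_window)
    count = sum(len(row) for row in simplified_window)
    hits = sum(1 for row in simplified_window for v in row if v == second_value)
    return total * hits / count
```

For a 5 × 5 window in which 13 simplified pixels equal 1, the pixel sum is scaled by 13/25.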
In some examples, determining whether a candidate bright spot is a bright spot further comprises: determining a second bright spot detection threshold based on the preprocessed image, and judging candidate bright spots whose pixel value is not less than the second bright spot detection threshold to be bright spots. In a specific example, the pixel value of the pixel at the coordinates of the candidate bright spot is taken as the pixel value of the candidate bright spot. Further screening the candidate bright spots with the second bright spot detection threshold determined from the preprocessed image excludes at least some candidates that have the brightness (intensity) and/or shape of a bright spot but are more likely to be image background, which facilitates accurate sequence identification based on the bright spots and improves the quality of the off-line data.
In one example, the coordinates of the candidate bright spots, including sub-pixel coordinates, may be obtained using the centroid (center-of-gravity) method. The gray value at the coordinate position of a candidate bright spot is then calculated by bilinear interpolation.
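A minimal Python sketch of these two steps is given below, assuming an intensity-weighted centroid over a small window around the candidate and standard bilinear interpolation at the resulting sub-pixel position. The function names, the (row, column) coordinate order and the clamping at the image border are assumptions of this example.

```python
import math

def centroid(window, top, left):
    """Intensity-weighted centroid (row, col) of a window whose top-left
    pixel sits at image position (top, left)."""
    total = sum(sum(row) for row in window)
    if total == 0:
        # no signal: fall back to the geometric center of the window
        return top + (len(window) - 1) / 2, left + (len(window[0]) - 1) / 2
    r = sum(i * v for i, row in enumerate(window) for v in row) / total
    c = sum(j * v for row in window for j, v in enumerate(row)) / total
    return top + r, left + c

def bilinear(image, y, x):
    """Gray value at sub-pixel position (y, x), with 0 <= y, x inside the image."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(image) - 1)      # clamp at the lower border
    x1 = min(x0 + 1, len(image[0]) - 1)   # clamp at the right border
    dy, dx = y - y0, x - x0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy
```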
In some specific examples, determining whether a candidate bright spot is a bright spot includes: dividing the preprocessed image into a group of regions (blocks) of a predetermined size, and sorting the pixel values of the pixels in each region to determine a second bright spot detection threshold corresponding to that region; and, for the candidate bright spots located in a region, judging those whose pixel value is not less than the second bright spot detection threshold corresponding to the region to be bright spots. This distinguishes differences between regions of the image, such as an overall drop in light intensity, and performs the further detection and identification of bright spots separately for each region, which facilitates accurate identification of bright spots and yields more bright spots.
When the preprocessed image is divided into a set of regions (blocks) of a predetermined size, the blocks may or may not overlap. In one example, the blocks do not overlap. In some embodiments, the size of the preprocessed image is not less than 512 × 512, such as 512 × 512, 1024 × 1024, 1800 × 1800 or 2056 × 2056, and the region of the predetermined size may be set to 200 × 200. This facilitates rapid calculation, judgment and identification of bright spots.
In some embodiments, when determining the second bright spot detection threshold corresponding to a region, the pixel values of the pixels in each block are sorted in ascending order, and p10 + (p10 − p1) × 4.1 is taken as the second bright spot detection threshold corresponding to the block, that is, the background of the block, where p1 denotes the pixel value at the first percentile and p10 denotes the pixel value at the tenth percentile. This threshold is a stable threshold obtained by the inventors through training and testing on a large amount of data, and it eliminates a large number of bright spots that lie on the background. It will be appreciated that the threshold may need to be adjusted appropriately when the optical system is adjusted and the overall pixel distribution of the image changes. Fig. 6 is a schematic comparison of the bright spot detection results before and after this processing, that is, before and after the background of each region is eliminated: the upper half of Fig. 6 shows the detection result after processing, the lower half shows the detection result without processing, and the cross marks are candidate bright spots or bright spots.
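The block-wise screening can be sketched in Python as follows. Here p1 is taken as the first percentile and p10 as the tenth percentile, which is an interpretation of the names; the nearest-rank percentile convention, the non-overlapping block layout and all function names are likewise assumptions of this example.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]

def block_thresholds(image, block=200, factor=4.1):
    """Map (block_row, block_col) to p10 + (p10 - p1) * factor."""
    thresholds = {}
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            values = [v for row in image[top:top + block]
                      for v in row[left:left + block]]
            p1, p10 = percentile(values, 1), percentile(values, 10)
            thresholds[(top // block, left // block)] = p10 + (p10 - p1) * factor
    return thresholds

def keep_bright_spots(candidates, thresholds, block=200):
    """candidates: list of (row, col, pixel_value); keep those whose value
    is not less than the threshold of the block they fall in."""
    return [(r, c, v) for r, c, v in candidates
            if v >= thresholds[(int(r) // block, int(c) // block)]]
```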
The embodiment of the invention also provides a base recognition method, which comprises the steps of matching the bright spots on the image obtained from the base extension reaction to the set of the bright spots corresponding to the sequencing template, carrying out base recognition according to the matched bright spots, wherein a plurality of nucleic acid molecules with optically detectable labels exist in the visual field corresponding to the image obtained from the base extension reaction, at least one part of the nucleic acid molecules are expressed as the bright spots on the image obtained from the base extension reaction, and the set of the bright spots corresponding to the sequencing template is obtained by the method for constructing the sequencing template based on the image in any one of the embodiments.
The above description of the technical features and advantages of the method for constructing a sequencing template based on an image in any embodiment is also applicable to the method for base recognition in this embodiment of the present invention, and will not be repeated herein.
Specifically, the bright spots on the image obtained from the base extension reaction can be matched against the constructed bright spot set by traversal. In some embodiments, if the distance between a bright spot on the image obtained from the base extension reaction and any bright spot in the bright spot set corresponding to the sequencing template is smaller than a third predetermined pixel, that bright spot on the image is determined to match the bright spot set corresponding to the sequencing template. In one example, the third predetermined pixel is 2. In this way, accurate base identification can be achieved, and a partial base sequence (read) of the template can be obtained.
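The traversal matching can be illustrated by the Python sketch below. It assumes that bright spots are given as (x, y) coordinate pairs, that distance is Euclidean, and that one list of bright spots is available per base type for a given round; the function names and the dictionary layout are hypothetical.

```python
import math

def is_matched(template_spot, image_spots, max_dist=2.0):
    """True if some image spot lies closer than max_dist to the template spot."""
    tx, ty = template_spot
    return any(math.hypot(tx - x, ty - y) < max_dist for x, y in image_spots)

def call_bases(template_spots, spots_by_base, max_dist=2.0):
    """For every template spot, list the bases whose image has a matching spot."""
    return [[base for base, spots in spots_by_base.items()
             if is_matched(t, spots, max_dist)]
            for t in template_spots]
```

A brute-force traversal like this costs time proportional to the product of the two list sizes; a spatial index could be substituted for large fields of view.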
The logic and/or steps represented in the flowcharts or otherwise described herein, such as a sequence listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
The embodiment of the present invention also provides an apparatus 100 for constructing a sequencing template based on an image, as shown in Fig. 7, for implementing the method for constructing a sequencing template based on an image according to any of the above embodiments of the present invention. The image includes a first image, a second image, a third image and a fourth image corresponding to the same field of view during the four types of base extension reactions A, T/U, G and C, respectively; the field of view contains a plurality of nucleic acid molecules with optically detectable labels during the base extension reactions, and at least a part of the nucleic acid molecules appear as bright spots on the image. Performing the four types of base extension reactions once, sequentially or simultaneously, is defined as one round of sequencing reaction. The first image includes an image M1 and an image M2, the second image includes an image N1 and an image N2, the third image includes an image P1 and an image P2, and the fourth image includes an image Q1 and an image Q2; image M1 and image M2 are from two rounds of sequencing reactions, as are image N1 and image N2, image P1 and image P2, and image Q1 and image Q2, respectively. The apparatus comprises: a combining unit 110 for combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 for bright spot matching, such that the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 each participate in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots on a combined image whose distance is smaller than a first predetermined pixel being one first coincident bright spot; and a merging unit 130 for merging the first coincident bright spots on the plurality of combined images from the combining unit to obtain a set of bright spots corresponding to the sequencing template.
The above description of the technical features and advantages of the method for constructing a sequencing template based on an image in any embodiment of the present invention is also applicable to the apparatus 100 in this embodiment of the present invention, and will not be described herein again.
For example, in the merging unit 130, merging the first coincident bright spots on the multiple combined images includes matching the first coincident bright spots in different combined images one or more times to obtain the bright spot set corresponding to the sequencing template.
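As a rough illustration of the combining and merging units, the Python sketch below finds coincident bright spots between two spot lists and then merges several such lists into one set. Representing a coincident bright spot by the midpoint of the matched pair, and treating spots closer than the distance limit as duplicates during merging, are assumptions of this example and are not stated in the embodiments.

```python
import math

def coincident_spots(spots_a, spots_b, max_dist):
    """Spots of A that have a spot of B closer than max_dist, each
    represented by the midpoint with its nearest partner (assumption)."""
    result = []
    for ax, ay in spots_a:
        near = [(bx, by) for bx, by in spots_b
                if math.hypot(ax - bx, ay - by) < max_dist]
        if near:
            bx, by = min(near, key=lambda p: math.hypot(ax - p[0], ay - p[1]))
            result.append(((ax + bx) / 2, (ay + by) / 2))
    return result

def merge_spot_sets(spot_sets, max_dist):
    """Union of several spot lists, skipping spots that lie closer than
    max_dist to a spot already in the merged set."""
    merged = []
    for spots in spot_sets:
        for x, y in spots:
            if not any(math.hypot(x - mx, y - my) < max_dist for mx, my in merged):
                merged.append((x, y))
    return merged
```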
In some examples, image M1, image N1, image P1 and image Q1 are obtained sequentially, image M2, image N2, image P2 and image Q2 are obtained sequentially, and the combining unit 110 is configured to: combine the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 pairwise at intervals of S images to obtain K combined images, match the bright spots on the combined images, and discard the non-coincident bright spots on the combined images, where S is an integer, 0 ≤ S ≤ Smax, Smax = (total number of images participating in the combination) − 4, and K = [(total number of images participating in the combination − S − 1) + 1] × (total number of images participating in the combination − S − 1) / 2.
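One reading of the pairwise combination that is consistent with the stated value of K is that two images are combined when more than S images' worth of positions separate them in acquisition order, that is, when their position indices differ by more than S. The Python sketch below enumerates the pairs under that assumption; the function names and the ordering of the images in the list are hypothetical.

```python
from itertools import combinations

def combine_pairs(images, s):
    """Pairs of images whose positions in the list differ by more than s."""
    n = len(images)
    if not 0 <= s <= n - 4:
        raise ValueError("S must satisfy 0 <= S <= Smax = n - 4")
    return [(images[i], images[j])
            for i, j in combinations(range(n), 2) if j - i > s]

def expected_k(n, s):
    """K = [(n - S - 1) + 1] * (n - S - 1) / 2, as given in the text."""
    return ((n - s - 1) + 1) * (n - s - 1) // 2
```

With eight images, S = 0 gives all 28 pairs and S = Smax = 4 gives 6 pairs.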
In some examples, the image is a registered image.
Specifically, the apparatus 100 further includes a registration unit 108, where the registration unit is configured to perform image registration, and the registration unit includes a first registration module and a second registration module, where the first registration module is configured to perform first registration on the image to be registered based on the reference image, where the reference image and the image to be registered correspond to the same field of view, and includes determining a first offset between a predetermined region on the image to be registered and a corresponding predetermined region on the reference image, and moving all bright spots on the image to be registered based on the first offset to obtain a first registered image; the second registration module is used for carrying out second registration on the first registered image to be registered based on the reference image, and comprises the steps of merging the first registered image to be registered and the reference image to obtain a merged image, calculating the offset of all second merged bright spots of a preset area on the merged image to determine a second offset, taking two or more bright spots with the distance smaller than a second preset pixel on the merged image as one second merged bright spot, and moving all the bright spots on the first registered image to be registered based on the second offset to realize the registration of the image to be registered.
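The second (fine) registration can be sketched in Python as follows. The text says only that the second offset is determined from the offsets of the second merged bright spots in the predetermined region; this example assumes that each bright spot of the image to be registered is paired with its nearest reference spot within the distance limit and that the second offset is the mean of the pairwise offsets. Function names are hypothetical.

```python
import math

def fine_offset(ref_spots, moving_spots, max_dist):
    """Mean (dx, dy) from moving spots to their nearest reference spot
    within max_dist; (0.0, 0.0) if no pair is found."""
    dxs, dys = [], []
    for mx, my in moving_spots:
        near = [(rx, ry) for rx, ry in ref_spots
                if math.hypot(rx - mx, ry - my) < max_dist]
        if near:
            rx, ry = min(near, key=lambda p: math.hypot(p[0] - mx, p[1] - my))
            dxs.append(rx - mx)
            dys.append(ry - my)
    if not dxs:
        return 0.0, 0.0
    return sum(dxs) / len(dxs), sum(dys) / len(dys)

def shift_spots(spots, offset):
    """Move every spot by the given (dx, dy) offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in spots]
```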
In some examples, the reference image is obtained by construction, the registration unit 108 further comprises a reference image construction module for: acquiring a fifth image and a sixth image, wherein the fifth image and the sixth image correspond to the same visual field as the image to be registered; performing coarse registration on the sixth image based on the fifth image, wherein the coarse registration comprises determining the offset of the sixth image and the fifth image, and moving the sixth image based on the offset to obtain a coarsely registered sixth image; and combining the fifth image and the coarsely registered sixth image to obtain a reference image.
In some examples, constructing the reference image using the reference image construction module further comprises using a seventh image and an eighth image, the seventh image and the eighth image being from the same field of view of the sequencing reaction as the image to be registered, and the fifth image, the sixth image, the seventh image and the eighth image corresponding to the field of view during the four types of base extension reactions A, T/U, G and C, respectively. Constructing the reference image further comprises: performing coarse registration on the seventh image based on the fifth image, wherein the coarse registration comprises determining the offset of the seventh image and the fifth image, and moving the seventh image based on the offset to obtain a coarsely registered seventh image; performing coarse registration on the eighth image based on the fifth image, wherein the coarse registration comprises determining the offset of the eighth image and the fifth image, and moving the eighth image based on the offset to obtain a coarsely registered eighth image; and merging the fifth image with the coarsely registered sixth image, the coarsely registered seventh image and the coarsely registered eighth image to obtain a reference image.
In some examples, the reference image and the image to be registered are binarized images.
In some examples, the first offset, the offset of the sixth and fifth images, the offset of the seventh and fifth images, and/or the offset of the eighth and fifth images are determined using a two-dimensional discrete fourier transform.
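The text states only that a two-dimensional discrete Fourier transform is used. One common way to obtain an offset from it is circular cross-correlation computed in the frequency domain, sketched below in pure Python. The naive transform is for illustration on small regions only (a real implementation would use an FFT library), and the function names are hypothetical.

```python
import cmath

def dft(seq, inverse=False):
    """Naive one-dimensional DFT (or inverse DFT) of a sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(seq):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def dft2(image, inverse=False):
    """Two-dimensional DFT: transform the rows, then the columns."""
    rows = [dft(row, inverse) for row in image]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def estimate_offset(reference, moving):
    """Return (dy, dx), the circular shift that aligns `moving` with `reference`."""
    f = dft2(reference)
    g = dft2(moving)
    # cross-power spectrum F * conj(G); its inverse DFT is the cross-correlation
    cross = [[a * b.conjugate() for a, b in zip(frow, grow)]
             for frow, grow in zip(f, g)]
    corr = dft2(cross, inverse=True)
    h, w = len(corr), len(corr[0])
    _, py, px = max((corr[y][x].real, y, x) for y in range(h) for x in range(w))
    # peaks past the midpoint correspond to negative shifts
    dy = py - h if py > h // 2 else py
    dx = px - w if px > w // 2 else px
    return dy, dx
```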
In some examples, the apparatus 100 further comprises a bright spot detection unit 106, the bright spot detection unit 106 being configured to: preprocess the image to obtain a preprocessed image; determine a critical value to simplify the preprocessed image, including assigning a first preset value to pixels of the preprocessed image whose pixel value is smaller than the critical value and a second preset value to pixels whose pixel value is not smaller than the critical value, to obtain a simplified image; determine a first bright spot detection threshold c1 based on the preprocessed image; and identify candidate bright spots on the image based on the preprocessed image and the simplified image, including judging a pixel matrix that satisfies at least two of the conditions a) to c) to be a candidate bright spot: a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the maximum, where the pixel matrix can be expressed as k1 × k2, k1 and k2 are both odd numbers greater than 1, and the k1 × k2 pixel matrix contains k1 × k2 pixels; b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value, and the number of connected pixels of the pixel matrix is greater than $\frac{2}{3} \times k1 \times k2$; and c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is greater than a third preset value and g1 × g2 > c1 is satisfied, where g1 is the correlation coefficient with a two-dimensional Gaussian distribution within the m1 × m2 range centered on the central pixel of the pixel matrix, g2 is the pixel sum of the m1 × m2 range, m1 and m2 are both odd numbers greater than 1, and the m1 × m2 range contains m1 × m2 pixels.
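A possible way to evaluate condition c) is sketched below in Python. It assumes that g1 is the Pearson correlation coefficient between the m1 × m2 window and a centered two-dimensional Gaussian template; the standard deviation `sigma`, the zero result for a flat window and all function names are assumptions of this example.

```python
import math

def gaussian_template(m1, m2, sigma=1.0):
    """m1 x m2 samples of an isotropic 2D Gaussian centered on the window."""
    cr, cc = m1 // 2, m2 // 2
    return [[math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
             for c in range(m2)] for r in range(m1)]

def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def g1(window, sigma=1.0):
    """Correlation of the window with a Gaussian template of the same size."""
    template = gaussian_template(len(window), len(window[0]), sigma)
    return pearson([v for row in window for v in row],
                   [v for row in template for v in row])

def satisfies_condition_c(window, g2, c1, third_value, sigma=1.0):
    """Central pixel above the third preset value and g1 * g2 > c1."""
    center = window[len(window) // 2][len(window[0]) // 2]
    return center > third_value and g1(window, sigma) * g2 > c1
```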
In some examples, the bright spot detection unit 106 further includes a processor configured to determine whether a candidate bright spot is a bright spot, including: determining a second bright spot detection threshold based on the preprocessed image, and judging candidate bright spots whose pixel value is not less than the second bright spot detection threshold to be bright spots.
In some examples, the pixel value of a candidate bright spot is the pixel value of the pixel at the coordinates of the candidate bright spot.
In some examples, determining whether the candidate bright spot is a bright spot in the bright spot detection unit 106 includes: dividing the preprocessed image into a group of regions with preset sizes, sequencing pixel values of pixel points in the regions to determine second bright spot detection thresholds corresponding to the regions, and judging candidate bright spots with pixel values not smaller than the second bright spot detection thresholds corresponding to the regions as the bright spots for the candidate bright spots in the regions.
In some examples, the image is pre-processed in the bright spot detection unit 106, including: determining the background of the image by using an opening operation, converting the image into a first image by using a top hat operation based on the background, performing Gaussian blur processing on the first image to obtain a second image, and sharpening the second image to obtain a preprocessed image.
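The first two preprocessing steps, background estimation by an opening operation and the top hat operation, can be sketched in pure Python as below. The square k × k structuring element, the truncation of the window at the image border and the function names are assumptions; Gaussian blurring and sharpening are omitted from this sketch.

```python
def _window_filter(image, k, reducer):
    """Apply `reducer` (min or max) over a k x k window around each pixel,
    truncating the window at the image border."""
    h, w = len(image), len(image[0])
    r = k // 2
    return [[reducer(image[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def erode(image, k):
    return _window_filter(image, k, min)

def dilate(image, k):
    return _window_filter(image, k, max)

def opening(image, k):
    """Erosion followed by dilation; used here as the background estimate."""
    return dilate(erode(image, k), k)

def top_hat(image, k):
    """Image minus its opening, i.e. the image with the background removed."""
    background = opening(image, k)
    return [[p - b for p, b in zip(prow, brow)]
            for prow, brow in zip(image, background)]
```

A single bright pixel on a flat background is smaller than the structuring element, so it is removed by the opening and survives in the top hat result.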
In some examples, determining a critical value in the bright spot detection unit 106 to simplify the pre-processed image, obtaining a simplified image, includes: and determining a critical value based on the background and the preprocessed image, and comparing the pixel value of the pixel point on the preprocessed image with the critical value to obtain a simplified image.
In some examples, g2 is the corrected pixels in the range of m1 m2, and the correction is performed according to the proportion of the pixels with the pixel values of the second preset value in the corresponding range of m1 m2 of the simplified image.
An embodiment of the present invention further provides a base recognition apparatus 1000 for implementing the base recognition method according to any one of the above embodiments of the present invention, wherein the apparatus 1000 is configured to match the bright spots on the image obtained from the base extension reaction to the set of bright spots corresponding to the sequencing template, and perform base recognition according to the matched bright spots, a plurality of nucleic acid molecules with optically detectable labels are present in the field of view corresponding to the image obtained from the base extension reaction, at least a part of the nucleic acid molecules appear as bright spots on the image obtained from the base extension reaction, and the set of bright spots corresponding to the sequencing template is constructed by the method for constructing the sequencing template based on the image and/or the apparatus for constructing the sequencing template based on the image in any one of the above embodiments.
Specifically, in the base recognition apparatus 1000, if the distance between a bright spot on the image obtained from the base extension reaction and any bright spot in the bright spot set corresponding to the sequencing template is smaller than a third predetermined pixel, that bright spot on the image obtained from the base extension reaction is determined to match the bright spot set corresponding to the sequencing template.
There is also provided, in accordance with an embodiment of the present invention, a computer product including instructions for implementing image-based construction of a sequencing template, the instructions, when executed by a computer, cause the computer to perform the method for image-based construction of a sequencing template according to any one of the embodiments of the present invention described above.
According to an embodiment of the present invention, there is provided another computer product including instructions for performing base recognition, the instructions, when executed by a computer, causing the computer to perform the base recognition method according to any one of the above embodiments of the present invention.
Those skilled in the art will appreciate that, in addition to implementing the controller/processor purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Thus, such a controller/processor may be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. Indeed, the means for performing the functions may be regarded both as software modules for performing the method and as structures within the hardware component.
In the description of the present specification, a description referring to one embodiment, some embodiments, one or some specific embodiments, one or some examples, etc. means that a particular feature, structure or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, etc. described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (38)

1. A method for constructing a sequencing template based on images, wherein the images comprise a first image, a second image, a third image and a fourth image corresponding to the same field of view during the four types of base extension reactions A, T/U, G and C, respectively, a plurality of nucleic acid molecules with optically detectable labels are present in the field of view during the base extension reactions, at least a part of the nucleic acid molecules appear as bright spots on the images, and performing the four types of base extension reactions once, sequentially or simultaneously, is defined as one round of sequencing reaction,
the first image comprising image M1 and image M2, the second image comprising image N1 and image N2, the third image comprising image P1 and image P2, the fourth image comprising image Q1 and image Q2,
image M1 and image M2 are from two rounds of sequencing reactions, image N1 and image N2 are from two rounds of sequencing reactions, image P1 and image P2 are from two rounds of sequencing reactions, and image Q1 and image Q2 are from two rounds of sequencing reactions, respectively, the method comprising:
combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 for bright spot matching, and causing the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 to each participate in the combining at least once, obtaining a plurality of combined images containing first coincident bright spots, two or more bright spots having a distance smaller than a first predetermined pixel on a combined image being one first coincident bright spot;
and combining the first overlapped bright spots on the plurality of combined images to obtain a bright spot set corresponding to the sequencing template.
2. The method of claim 1, wherein merging the first coincident bright spots on the plurality of combined images comprises matching the first coincident bright spots in different combined images one or more times to obtain a set of bright spots corresponding to the sequencing template.
3. The method of claim 1, wherein image M1, image N1, image P1, and image Q1 are obtained sequentially, image M2, image N2, image P2, and image Q2 are obtained sequentially,
combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 for bright spot matching, and causing the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 to each participate in the combining at least once, obtaining a plurality of combined images containing first coincident bright spots, including:
combining the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 pairwise at intervals of S images to obtain K combined images, matching the bright spots on the combined images, and discarding the non-coincident bright spots on the combined images, wherein S is an integer, 0 ≤ S ≤ Smax, and Smax = (total number of images participating in the combination) − 4.
4. The method of claim 1, wherein the image is a registered image.
5. The method of claim 4, wherein registering the images comprises:
performing first registration on an image to be registered based on a reference image, wherein the reference image and the image to be registered correspond to the same visual field, and the first registration comprises the steps of,
determining a first offset of a preset area on the image to be registered and a corresponding preset area on the reference image, and moving all bright spots on the image to be registered based on the first offset to obtain a first registered image to be registered;
and performing second registration on the image to be registered after the first registration based on the reference image, including,
merging the first registered image to be registered and the reference image to obtain a merged image,
calculating the offset of all second composite bright spots of the predetermined area on the combined image to determine a second offset, two or more bright spots on the combined image having a distance smaller than a second predetermined pixel being one second composite bright spot,
and moving all the bright spots on the image to be registered after the first registration based on the second offset so as to realize the registration of the image to be registered.
6. The method of claim 5, wherein the reference image is obtained by a construction comprising:
acquiring a fifth image and a sixth image, wherein the fifth image and the sixth image correspond to the same visual field as the image to be registered;
performing coarse registration on the sixth image based on the fifth image, wherein the coarse registration comprises determining the offset of the sixth image and the fifth image, and moving the sixth image based on the offset to obtain a coarsely registered sixth image;
and combining the fifth image and the coarsely registered sixth image to obtain a reference image.
7. The method of claim 6, wherein constructing the reference image further comprises using a seventh image and an eighth image, the seventh image and the eighth image being from the same field of view of the sequencing reaction as the image to be registered, the fifth image, the sixth image, the seventh image and the eighth image corresponding to the field of view during the A, T/U, G and C base extension reactions, respectively, and wherein constructing the reference image further comprises:
performing coarse registration on the seventh image based on the fifth image, wherein the coarse registration comprises determining the offset of the seventh image and the fifth image, and moving the seventh image based on the offset to obtain a coarsely registered seventh image;
performing coarse registration on the eighth image based on the fifth image, wherein the coarse registration comprises determining the offset of the eighth image and the fifth image, and moving the eighth image based on the offset to obtain a coarsely registered eighth image;
and merging the fifth image and the coarsely registered sixth image, the coarsely registered seventh image and the coarsely registered eighth image to obtain a reference image.
8. The method according to claim 5, characterized in that the reference image and the image to be registered are binarized images.
9. The method according to any of claims 5 to 8, wherein the first offset, the offset of the sixth image and the fifth image, the offset of the seventh image and the fifth image and/or the offset of the eighth image and the fifth image is determined using a two-dimensional discrete Fourier transform.
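By way of illustration only, one common way to obtain an offset with a two-dimensional discrete Fourier transform is to locate the peak of the circular cross-correlation computed in the frequency domain. The claim does not say how the transform is used, so the cross-correlation approach, the function names and the naive transform below are assumptions of this sketch; a practical implementation would use an FFT library rather than this O(n²) transform.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a list of rows."""
    sign = 1 if inverse else -1

    def dft1(seq):
        n = len(seq)
        out = [sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                   for t, v in enumerate(seq)) for k in range(n)]
        return [v / n for v in out] if inverse else out

    rows = [dft1(row) for row in img]               # transform every row
    cols = [dft1(list(col)) for col in zip(*rows)]  # then every column
    return [list(r) for r in zip(*cols)]            # back to row-major order

def dft_offset(reference, moving):
    """Return (dy, dx): the circular shift of `moving` that best matches `reference`."""
    f_ref = dft2(reference)
    f_mov = dft2(moving)
    cross = [[a * b.conjugate() for a, b in zip(ra, rb)]
             for ra, rb in zip(f_ref, f_mov)]
    corr = dft2(cross, inverse=True)                # circular cross-correlation
    h, w = len(corr), len(corr[0])
    _, py, px = max((corr[y][x].real, y, x) for y in range(h) for x in range(w))
    dy = py if py <= h // 2 else py - h             # map wrapped indices to signed shifts
    dx = px if px <= w // 2 else px - w
    return dy, dx
```

Applied to binarized images such as those of claim 8, the returned pair is the whole-pixel shift to apply to the image to be registered.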
10. The method of any one of claims 1-8, further comprising detecting bright spots on the image, including:
preprocessing the image to obtain a preprocessed image;
determining a critical value to simplify the preprocessed image, comprising assigning a first preset value to pixels of the preprocessed image whose values are smaller than the critical value and a second preset value to pixels whose values are not smaller than the critical value, to obtain a simplified image;
determining a first bright spot detection threshold c1 based on the preprocessed image;
identifying candidate bright spots on the image based on the preprocessed image and the simplified image, including determining a pixel matrix satisfying at least two of the following conditions a)-c) to be a candidate bright spot,
a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the maximum, the pixel matrix being expressible as r1 × r2, where r1 and r2 are both odd numbers larger than 1 and the r1 × r2 pixel matrix comprises r1 × r2 pixels,
b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value, and the number of connected pixels of the pixel matrix is larger than the value given by the formula in Figure FDA0003057254880000021 (formula image not reproduced in this text), and
c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is larger than a third preset value and satisfies g1 × g2 > c1, where g1 is the correlation coefficient with a two-dimensional Gaussian distribution over the m1 × m2 range centered on the central pixel of the pixel matrix, g2 is the pixels within the m1 × m2 range, m1 and m2 are both odd numbers larger than 1, and the m1 × m2 range comprises m1 × m2 pixels.
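By way of illustration only, the simplification step and condition a) of claim 10 might be sketched as follows, together with the "at least two of three" rule. The function names and default values are assumptions of this sketch. Conditions b) and c) are passed in as already-evaluated booleans, because the connected-pixel bound of condition b) is given only in a formula image and the exact form of g2 is not spelled out in the claim.

```python
def simplify(image, critical, first_preset=0, second_preset=1):
    """Binarize: pixels below `critical` get first_preset, all others second_preset."""
    return [[first_preset if v < critical else second_preset for v in row]
            for row in image]

def center_is_max(image, y, x, r1=3, r2=3):
    """Condition a): the pixel at (y, x) is the largest value in its r1 x r2 window."""
    hy, hx = r1 // 2, r2 // 2
    if y < hy or x < hx or y + hy >= len(image) or x + hx >= len(image[0]):
        return False  # the window would extend past the image border
    window = [image[j][i]
              for j in range(y - hy, y + hy + 1)
              for i in range(x - hx, x + hx + 1)]
    return image[y][x] == max(window)

def is_candidate(cond_a, cond_b, cond_c):
    """A pixel matrix is a candidate bright spot when at least two conditions hold."""
    return sum([cond_a, cond_b, cond_c]) >= 2
```

Note that this sketch treats a tie for the maximum as satisfying condition a); a stricter reading would require the central pixel to be the unique maximum.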
11. The method of claim 10, further comprising determining whether the candidate bright spot is a bright spot, comprising:
determining a second bright spot detection threshold based on the preprocessed image, and
judging candidate bright spots whose pixel values are not less than the second bright spot detection threshold to be bright spots.
12. The method of claim 11, wherein the pixel value of a candidate bright spot is the pixel value of the pixel point at which the coordinates of the candidate bright spot are located.
13. The method of claim 11 or 12, wherein determining whether the candidate bright spot is a bright spot comprises:
dividing the preprocessed image into a set of regions of a predetermined size,
sorting the pixel values of the pixel points in each region to determine the second bright spot detection threshold corresponding to that region,
and judging candidate bright spots in a region whose pixel values are not less than the second bright spot detection threshold corresponding to that region to be bright spots.
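By way of illustration only, the region-wise thresholding of claim 13 might be sketched as follows. The claim states only that the pixel values in each region are sorted to determine the threshold; taking the value at a fixed fraction of the sorted list is an assumption of this sketch, as are the function names and the `fraction` parameter.

```python
def region_thresholds(image, region, fraction=0.9):
    """Map each region's (row, col) index to a threshold read from its sorted pixels."""
    thresholds = {}
    for top in range(0, len(image), region):
        for left in range(0, len(image[0]), region):
            values = sorted(v for row in image[top:top + region]
                            for v in row[left:left + region])
            idx = min(int(fraction * len(values)), len(values) - 1)
            thresholds[(top // region, left // region)] = values[idx]
    return thresholds

def filter_candidates(image, candidates, region, fraction=0.9):
    """Keep candidates (y, x) whose pixel value is not less than their region's threshold."""
    th = region_thresholds(image, region, fraction)
    return [(y, x) for y, x in candidates
            if image[y][x] >= th[(y // region, x // region)]]
```

Computing one threshold per region rather than one per image lets the test adapt to uneven illumination across the field of view.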
14. The method of claim 10, wherein pre-processing the image comprises:
determining the background of the image using an opening operation,
based on the background, converting the image into a first image using a top-hat operation,
performing Gaussian blur processing on the first image to obtain a second image,
and sharpening the second image to obtain a preprocessed image.
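By way of illustration only, the first two preprocessing steps of claim 14 (background estimation by morphological opening, then the top-hat operation) might be sketched as follows. The square k × k window, the border handling and the function names are assumptions of this sketch; the Gaussian blur and sharpening steps are omitted.

```python
def _window_op(image, op, k=3):
    """Apply min or max over a k x k neighbourhood, clipping the window at the borders."""
    h, w, r = len(image), len(image[0]), k // 2
    return [[op(image[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]

def opening(image, k=3):
    """Grey-scale opening: erosion (local minimum) followed by dilation (local maximum)."""
    return _window_op(_window_op(image, min, k), max, k)

def top_hat(image, k=3):
    """White top-hat: the image minus its opening, which removes the smooth background."""
    background = opening(image, k)
    return [[v - b for v, b in zip(row, brow)]
            for row, brow in zip(image, background)]
```

Features smaller than the window, such as isolated bright spots, are erased by the opening and therefore survive the subtraction, while the slowly varying background cancels out.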
15. The method of claim 10, wherein determining a threshold value to simplify the pre-processed image to obtain a simplified image comprises:
determining the critical value based on the background and the preprocessed image,
and comparing the pixel values of the pixel points on the preprocessed image with the critical value to obtain the simplified image.
16. The method of claim 15, wherein g2 is the corrected pixels within the m1 × m2 range, the correction being made according to the proportion of pixels having the second preset value in the corresponding m1 × m2 range of the simplified image.
17. A base recognition method comprising matching bright spots on an image obtained from a base extension reaction to a set of bright spots corresponding to a sequencing template, and performing base recognition based on the matched bright spots, wherein a plurality of nucleic acid molecules having an optically detectable label are present in the field of view corresponding to the image obtained from the base extension reaction, at least a part of the nucleic acid molecules appear as bright spots on that image, and the set of bright spots corresponding to the sequencing template is constructed by the method according to any one of claims 1 to 16.
18. The method of claim 17, wherein a bright spot on the image obtained from the base extension reaction is determined to match the set of bright spots corresponding to the sequencing template when its distance to any bright spot in that set is smaller than a third predetermined pixel distance.
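By way of illustration only, the matching of claims 17 and 18 might be sketched as follows. The function names and data layout are assumptions of this sketch, and `call_bases` in particular is a hypothetical helper: the claims state only that base recognition is performed from the matched bright spots, not how the per-base results are assembled.

```python
from math import hypot

def match_to_template(template_spots, image_spots, max_dist):
    """Return indices of template spots with an image spot closer than max_dist."""
    matched = []
    for idx, (tx, ty) in enumerate(template_spots):
        if any(hypot(tx - x, ty - y) < max_dist for x, y in image_spots):
            matched.append(idx)
    return matched

def call_bases(template_spots, spots_by_base, max_dist):
    """For one round, list for each template spot the bases whose image matched it."""
    calls = {i: [] for i in range(len(template_spots))}
    for base, spots in spots_by_base.items():
        for idx in match_to_template(template_spots, spots, max_dist):
            calls[idx].append(base)
    return calls

template = [(5.0, 5.0), (20.0, 20.0)]
round_spots = {"A": [(5.5, 5.2)], "C": [(20.3, 19.8)], "G": [], "T": [(40.0, 40.0)]}
calls = call_bases(template, round_spots, max_dist=2.0)
```

Here `max_dist` stands for the third predetermined pixel distance; bright spots on an image that lie near no template position, such as the one at (40.0, 40.0), are simply ignored.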
19. An apparatus for constructing a sequencing template based on images, wherein the images comprise a first image, a second image, a third image and a fourth image corresponding to the same field of view during the A, T/U, G and C base extension reactions, respectively, a plurality of nucleic acid molecules with optically detectable labels are present in the field of view during the base extension reactions, at least a part of the nucleic acid molecules appear as bright spots on the images, and performing the four types of base extension reactions once, sequentially or simultaneously, is defined as one round of sequencing reaction,
the first image comprising image M1 and image M2, the second image comprising image N1 and image N2, the third image comprising image P1 and image P2, the fourth image comprising image Q1 and image Q2,
image M1 and image M2 being respectively from two rounds of sequencing reactions, image N1 and image N2 being respectively from two rounds of sequencing reactions, image P1 and image P2 being respectively from two rounds of sequencing reactions, and image Q1 and image Q2 being respectively from two rounds of sequencing reactions, the apparatus comprising:
a combination unit for combining any two of the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 for bright spot matching, such that the image M1, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 each participate in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots on a combined image separated by less than a first predetermined pixel distance being counted as one first coincident bright spot;
and a merging unit for merging the first coincident bright spots on the plurality of combined images from the combination unit to obtain a set of bright spots corresponding to the sequencing template.
20. The apparatus according to claim 19, wherein, in the merging unit, merging the first coincident bright spots on the plurality of combined images comprises performing one or more rounds of matching on the first coincident bright spots in different combined images to obtain the set of bright spots corresponding to the sequencing template.
21. The apparatus of claim 19, wherein the image M1, the image N1, the image P1 and the image Q1 are obtained sequentially, the image M2, the image N2, the image P2 and the image Q2 are obtained sequentially, and the combination unit is configured to:
combine the image M1, the image M2, the image N1, the image N2, the image P1, the image P2, the image Q1 and the image Q2 in pairs at intervals of S to obtain K combined images, match the bright spots on the combined images, and discard the non-coincident bright spots on the combined images, wherein S is an integer, 0 ≤ S ≤ Smax, and Smax = (total number of images participating in the combination) − 4.
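By way of illustration only, the pairwise combination at intervals of S might be enumerated as follows. Two points are assumptions of this sketch rather than statements of the claim: that the images are ordered by acquisition as M1, N1, P1, Q1, M2, N2, P2, Q2, and that "at intervals of S" means that S images lie between the two members of each pair. The function name is likewise hypothetical.

```python
def pairs_at_interval(images, s):
    """Pair each image with the one that has exactly s images between it and itself."""
    s_max = len(images) - 4          # Smax as stated in the claim
    if not 0 <= s <= s_max:
        raise ValueError("S must satisfy 0 <= S <= Smax")
    step = s + 1
    return [(images[i], images[i + step]) for i in range(len(images) - step)]

order = ["M1", "N1", "P1", "Q1", "M2", "N2", "P2", "Q2"]
same_base_pairs = pairs_at_interval(order, 3)   # pairs each image with its second-round twin
adjacent_pairs = pairs_at_interval(order, 0)    # pairs each image with its immediate successor
```

Under these assumptions S = 3 yields K = 4 combined images, each pairing the two rounds of one base type, and S = 0 yields K = 7.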
22. The apparatus of claim 19, wherein the image is a registered image.
23. The apparatus of claim 22, further comprising a registration unit for image registration, the registration unit comprising a first registration module and a second registration module,
the first registration module being configured to perform a first registration on an image to be registered based on a reference image, wherein the reference image and the image to be registered correspond to the same field of view, the first registration comprising determining a first offset between a predetermined area on the image to be registered and the corresponding predetermined area on the reference image, and moving all bright spots on the image to be registered based on the first offset to obtain a first-registered image;
the second registration module being configured to perform a second registration on the first-registered image based on the reference image, the second registration comprising merging the first-registered image and the reference image to obtain a merged image, calculating the offsets of all second coincident bright spots in the predetermined area of the merged image to determine a second offset, two or more bright spots on the merged image separated by less than a second predetermined pixel distance being counted as one second coincident bright spot, and moving all bright spots on the first-registered image based on the second offset, thereby registering the image to be registered.
24. The apparatus according to claim 23, wherein the reference image is obtained by construction, the registration unit further comprising a reference image construction module configured to:
acquiring a fifth image and a sixth image, wherein the fifth image and the sixth image correspond to the same visual field as the image to be registered;
performing coarse registration on the sixth image based on the fifth image, wherein the coarse registration comprises determining the offset of the sixth image and the fifth image, and moving the sixth image based on the offset to obtain a coarsely registered sixth image;
and combining the fifth image and the coarsely registered sixth image to obtain a reference image.
25. The apparatus of claim 24, wherein constructing the reference image with the reference image construction module further comprises using a seventh image and an eighth image, the seventh image and the eighth image being from the same field of view of the sequencing reaction as the image to be registered, the fifth image, the sixth image, the seventh image and the eighth image corresponding to the field of view during the A, T/U, G and C base extension reactions, respectively, and constructing the reference image further comprises:
performing coarse registration on the seventh image based on the fifth image, wherein the coarse registration comprises determining the offset of the seventh image and the fifth image, and moving the seventh image based on the offset to obtain a coarsely registered seventh image;
performing coarse registration on the eighth image based on the fifth image, wherein the coarse registration comprises determining the offset of the eighth image and the fifth image, and moving the eighth image based on the offset to obtain a coarsely registered eighth image;
and merging the fifth image and the coarsely registered sixth image, the coarsely registered seventh image and the coarsely registered eighth image to obtain a reference image.
26. The apparatus according to claim 23, wherein the reference image and the image to be registered are binarized images.
27. The apparatus according to any of claims 23-26, wherein the first offset, the offset of the sixth and fifth images, the offset of the seventh and fifth images and/or the offset of the eighth and fifth images is determined using a two-dimensional discrete Fourier transform.
28. The apparatus according to any one of claims 19-26, further comprising a bright spot detection unit configured to:
preprocessing the image to obtain a preprocessed image;
determining a critical value to simplify the preprocessed image, comprising assigning a first preset value to pixels of the preprocessed image whose values are smaller than the critical value and a second preset value to pixels whose values are not smaller than the critical value, to obtain a simplified image;
determining a first bright spot detection threshold c1 based on the preprocessed image;
identifying candidate bright spots on the image based on the preprocessed image and the simplified image, including determining a pixel matrix satisfying at least two of the following conditions a)-c) to be a candidate bright spot,
a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the maximum, the pixel matrix being expressible as r1 × r2, where r1 and r2 are both odd numbers larger than 1 and the r1 × r2 pixel matrix comprises r1 × r2 pixels,
b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value, and the number of connected pixels of the pixel matrix is larger than the value given by the formula in Figure FDA0003057254880000051 (formula image not reproduced in this text), and
c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is larger than a third preset value and satisfies g1 × g2 > c1, where g1 is the correlation coefficient with a two-dimensional Gaussian distribution over the m1 × m2 range centered on the central pixel of the pixel matrix, g2 is the pixels within the m1 × m2 range, m1 and m2 are both odd numbers larger than 1, and the m1 × m2 range comprises m1 × m2 pixels.
29. The apparatus of claim 28, wherein the bright spot detection unit further comprises a module for determining whether the candidate bright spot is a bright spot, configured for:
determining a second bright spot detection threshold based on the preprocessed image, and
judging candidate bright spots whose pixel values are not less than the second bright spot detection threshold to be bright spots.
30. The apparatus of claim 29, wherein the pixel value of a candidate bright spot is the pixel value of the pixel point at which the coordinates of the candidate bright spot are located.
31. The apparatus according to claim 29 or 30, wherein determining in the bright spot detection unit whether the candidate bright spot is a bright spot comprises:
dividing the preprocessed image into a set of regions of a predetermined size,
sorting the pixel values of the pixel points in each region to determine the second bright spot detection threshold corresponding to that region,
and judging candidate bright spots in a region whose pixel values are not less than the second bright spot detection threshold corresponding to that region to be bright spots.
32. The apparatus of claim 28, wherein preprocessing the image in the bright spot detection unit comprises:
determining the background of the image using an opening operation,
based on the background, converting the image into a first image using a top-hat operation,
performing Gaussian blur processing on the first image to obtain a second image,
and sharpening the second image to obtain a preprocessed image.
33. The apparatus of claim 28, wherein determining a critical value in the bright spot detection unit to simplify the preprocessed image to obtain a simplified image comprises:
determining the critical value based on the background and the preprocessed image,
and comparing the pixel values of the pixel points on the preprocessed image with the critical value to obtain the simplified image.
34. The apparatus of claim 33, wherein g2 is the corrected pixels within the m1 × m2 range, the correction being made according to the proportion of pixels having the second preset value in the corresponding m1 × m2 range of the simplified image.
35. A base recognition apparatus for matching bright spots on an image obtained from a base extension reaction to a set of bright spots corresponding to a sequencing template and performing base recognition based on the matched bright spots, wherein a plurality of nucleic acid molecules having an optically detectable label are present in the field of view corresponding to the image obtained from the base extension reaction, at least a part of the nucleic acid molecules appear as bright spots on that image, and the set of bright spots corresponding to the sequencing template is constructed by the apparatus according to any one of claims 19 to 34.
36. The apparatus of claim 35, wherein a bright spot on the image obtained from the base extension reaction is determined to match the set of bright spots corresponding to the sequencing template when its distance to any bright spot in that set is smaller than a third predetermined pixel distance.
37. A terminal comprising instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1 to 16.
38. A terminal comprising instructions which, when executed by a computer, cause the computer to perform the method of claim 17 or 18.
CN201810961277.9A 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, base identification method and device Active CN112288783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810961277.9A CN112288783B (en) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, base identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810961277.9A CN112288783B (en) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, base identification method and device

Publications (2)

Publication Number Publication Date
CN112288783A CN112288783A (en) 2021-01-29
CN112288783B true CN112288783B (en) 2021-06-29

Family

ID=74418958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810961277.9A Active CN112288783B (en) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, base identification method and device

Country Status (1)

Country Link
CN (1) CN112288783B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657947A (en) * 2015-02-06 2015-05-27 哈尔滨工业大学深圳研究生院 Noise reducing method for basic group image
CN105046665A (en) * 2015-07-22 2015-11-11 哈尔滨工业大学深圳研究生院 Wavelet denoising method for high-throughput gene sequencing image
WO2017212055A1 (en) * 2016-06-10 2017-12-14 F. Hoffmann-La Roche Ag System for bright field image simulation
CN107918931A (en) * 2016-10-10 2018-04-17 深圳市瀚海基因生物科技有限公司 Image processing method and system
CN112289381A (en) * 2018-08-22 2021-01-29 深圳市真迈生物科技有限公司 Method, apparatus and computer program product for image-based construction of sequencing templates

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8189892B2 (en) * 2006-03-10 2012-05-29 Koninklijke Philips Electronics N.V. Methods and systems for identification of DNA patterns through spectral analysis
JP6033301B2 (en) * 2011-07-28 2016-11-30 メディテクト アーベー Method for providing an image of a tissue section
EP3252452A1 (en) * 2016-05-25 2017-12-06 The Board of Trustees of the Leland Stanford Junior University Method for imaging and analysis of a biological specimen
CN106295124B (en) * 2016-07-27 2018-11-27 广州麦仑信息科技有限公司 The method of a variety of image detecting technique comprehensive analysis gene subgraph likelihood probability amounts
CN108229098A (en) * 2016-12-09 2018-06-29 深圳市瀚海基因生物科技有限公司 Monomolecular identification, method of counting and device
CN108345085A (en) * 2017-01-25 2018-07-31 广州康昕瑞基因健康科技有限公司 Focus method and focusing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Denoising time-resolved microscopy image sequences with singular value thresholding; Tom Furnival et al.; 《http://dx.doi.org/10.1016/j.ultramic.2016.05.005》; 20160510; 112-124 *

Also Published As

Publication number Publication date
CN112288783A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112823352B (en) Base recognition method, system and sequencing system
CN103543277B (en) A kind of blood group result recognizer based on gray analysis and category identification
CN107945150B (en) Image processing method and system for gene sequencing and computer readable storage medium
US8073233B2 (en) Image processor, microscope system, and area specifying program
WO2020037572A1 (en) Method and device for detecting bright spot on image, and image registration method and device
US11847766B2 (en) Method and device for detecting bright spots on image, and computer program product
WO2014087689A1 (en) Image processing device, image processing system, and program
CN113012757B (en) Method and system for identifying bases in nucleic acids
CN112204615A (en) Fluorescent image registration method, gene sequencer system and storage medium
CN113158895B (en) Bill identification method and device, electronic equipment and storage medium
US12008775B2 (en) Method and device for image registration, and computer program product
CN112289377B (en) Method, apparatus and computer program product for detecting bright spots on an image
CN112289381B (en) Method, device and computer product for constructing sequencing template based on image
CN112288781A (en) Image registration method, apparatus and computer program product
WO2020037571A1 (en) Method and apparatus for building sequencing template on basis of images, and computer program product
CN107274349B (en) Method and device for determining inclination angle of fluorescence image of biochip
CN112288783B (en) Method for constructing sequencing template based on image, base identification method and device
US11170506B2 (en) Method for constructing sequencing template based on image, and base recognition method and device
CN112285070B (en) Method and device for detecting bright spots on image and image registration method and device
JP2000048120A (en) Method for extracting character area of gray level image and recording medium having recorded the program thereon
CN115546145A (en) Defect detection method and device based on machine vision and electronic equipment
CN113128500A (en) Mask-RCNN-based non-motor vehicle license plate recognition method and system
Yang et al. A novel binarization approach for license plate
CN108369735B (en) Method for determining the position of a plurality of objects in a digital image
CN110120118A (en) A kind of identifying system for splicing coin

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40035911; Country of ref document: HK)

GR01 Patent grant
CP03 Change of name, title or address
Address after: 518000 podium 502A and 502B, podium 602, Luohu investment holding building, No. 112, Qingshuihe 1st Road, Qingshuihe community, Qingshuihe street, Luohu District, Shenzhen, Guangdong
Patentee after: Shenzhen Zhenmai Biotechnology Co.,Ltd.
Country or region after: China
Address before: 518000 5th and 6th floors, block 2, Shenye Jinyuan Building, No.116, Qingshuihe 1st Road, Qingshuihe street, Luohu District, Shenzhen City, Guangdong Province
Patentee before: Shenzhen Zhenmai Biotechnology Co.,Ltd.
Country or region before: China