CN116559142B

CN116559142B - Colony Raman spectrum rapid selection method based on "HCA+"

Info

Publication number: CN116559142B
Application number: CN202310523753.XA
Authority: CN
Inventors: 李新立; 洪喜; 赵银苹; 刘闯; 张凯凯
Original assignee: Changguang Chenying Hangzhou Scientific Instrument Co ltd
Current assignee: Changguang Chenying Hangzhou Scientific Instrument Co ltd
Priority date: 2023-05-11
Filing date: 2023-05-11
Publication date: 2023-11-21
Anticipated expiration: 2043-05-11
Also published as: CN116559142A

Abstract

The invention provides a colony Raman spectrum rapid selection method based on "HCA+". Using Raman spectroscopy and unsupervised clustering, the method exploits the complementary suitability of different clustering methods for colony Raman spectra, so that colony picking for more mixed samples is achieved with fewer picking operations. Compared with the traditional morphological screening method, or methods derived from it, the invention performs fingerprint detection on colonies based on their Raman spectra and thereby identifies colonies more accurately. Compared with traditional unbiased colony picking, it can significantly shorten picking time and improve picking efficiency while keeping the picking comprehensive and accurate. The invention thus provides a new rapid selection method for colony detection; the method is scientific and concise in design and suitable for popularization.

Description

Colony Raman spectrum rapid selection method based on "HCA+"

Technical Field

The invention relates to the technical field of colony identification, in particular to a colony Raman spectrum rapid selection method based on HCA+.

Background

Colony detection and identification are an important part of bacterial research. Conventional bacterial identification covers morphological, physiological, biochemical and genetic characteristics, and its object is usually an isolated, pure-culture monoclonal colony. Researchers pick colonies of interest by characterizing colony morphology (e.g., size, shape, color, transparency, density, and edges), and then carry out identification or other studies.

In practice, microorganisms from different sources may have similar colony morphology, so the task of picking microbial colonies without species bias often cannot be performed efficiently; the problem is particularly pronounced when the sampling amount is large.

The colony Raman spectrum is a fingerprint spectrum of the colony. It contains abundant phenotypic information about the colony in a specific physiological state and has the advantages of being label-free and non-destructive to the sample. Picking colonies based on their Raman spectra therefore offers a new selection method that lets researchers quickly pick colonies and even determine their specific species.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a colony Raman spectrum rapid selection method based on "HCA+", which greatly improves the selection efficiency and sampling specificity of microbial colonies and reduces the time needed to obtain specific microorganisms.

The technical scheme adopted for solving the technical problems is as follows: a colony Raman spectrum rapid selection method based on "HCA+", comprising:

step one: inputting colony Raman spectrum information;

inputting Raman spectrum information of the colony into a spectrum projection module and an HCA+ bacteria picking scheme module;

as an illustration, the colony raman spectral information includes: colony culture dish information, colony information, and the like.

As an illustration, the colony culture dish information includes: number of culture dishes, number of culture dish colonies, coordinates of culture dish colonies, and the like.

As an illustration, the colony information further comprises: colony numbering, colony extraction position coordinates, the number of Raman spectra collected at colony picking positions, and the like.
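As an illustration, the input information listed above could be held in a small record structure such as the following sketch. The class and field names (`DishRecord`, `ColonyRecord`, `dish_xy`, `pick_xy`) and the sample values are hypothetical; the original text specifies only which items of information are recorded, not how they are stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ColonyRecord:
    """One colony on a culture dish (all field names are hypothetical)."""
    colony_no: str                     # colony number, e.g. "No.1"
    dish_xy: Tuple[float, float]       # coordinates of the colony on the dish
    pick_xy: Tuple[float, float]       # colony extraction position coordinates
    spectra: List[List[float]] = field(default_factory=list)  # spectra collected at the picking position

    @property
    def n_spectra(self) -> int:
        """Number of Raman spectra collected for this colony."""
        return len(self.spectra)


@dataclass
class DishRecord:
    """One culture dish and the colonies recorded on it."""
    dish_no: str                       # culture dish number, e.g. "A2-1"
    colonies: List[ColonyRecord] = field(default_factory=list)

    @property
    def n_colonies(self) -> int:
        """Number of colonies on this dish."""
        return len(self.colonies)

    def pick_coordinates(self) -> Dict[str, Tuple[float, float]]:
        """Map each colony number to its extraction position."""
        return {c.colony_no: c.pick_xy for c in self.colonies}


# Made-up example: two colonies, nine spectra each.
dish = DishRecord("A2-1", [
    ColonyRecord("No.1", (10.0, 12.5), (10.2, 12.4), [[0.1, 0.4, 0.2]] * 9),
    ColonyRecord("No.2", (31.0, 8.0), (31.1, 8.3), [[0.3, 0.2, 0.6]] * 9),
])
```

Keeping the dish number and coordinates next to the spectra means that step four can look up the picking position directly from a colony number.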

Step two: spectral projection and colony distribution visualization;

carrying out spectrum projection on the input colony Raman spectrum information;

Further, before the spectrum projection, spectrum preprocessing and displacement calibration are performed, including the following operations:

Step1: spectral preprocessing;

first, cosmic rays are removed from the Raman spectrum of each colony;

secondly, adaptive baseline correction is performed on the Raman spectrum of each colony;

finally, the Raman spectrum of each colony is normalized so that the spectral intensity becomes dimensionless;

As an illustration, the removal of cosmic rays may employ: a median fitting algorithm and/or a second-order difference (derivative) method.

As an illustration, the adaptive baseline correction may employ: an adaptive iteratively reweighted penalized least squares algorithm (airPLS) and/or an alternating least squares (ALS) algorithm.

As an example, the normalization process includes: a normalization method and/or a standardization method; normalization projects the spectra onto the interval [0, 1].
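As an illustration, the first and last preprocessing operations can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: spike removal uses a simple running-median comparison as a stand-in for the "median fitting algorithm", the window size and threshold are arbitrary, the test spectrum is made up, and baseline correction (airPLS or ALS) is omitted.

```python
from statistics import median


def remove_spikes(spectrum, window=5, threshold=5.0):
    """Replace cosmic-ray-like spikes by the local median.

    A point is treated as a spike when it deviates from the running median
    by more than `threshold` times the median absolute residual.
    (Window and threshold are assumptions, not values from the source.)
    """
    half = window // 2
    n = len(spectrum)
    local_med = [median(spectrum[max(0, i - half):min(n, i + half + 1)])
                 for i in range(n)]
    residuals = [abs(s - m) for s, m in zip(spectrum, local_med)]
    scale = median(residuals) or 1e-12      # robust noise scale, guarded against zero
    return [m if r > threshold * scale else s
            for s, m, r in zip(spectrum, local_med, residuals)]


def min_max(spectrum):
    """Min-Max normalization: project intensities onto [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0] * len(spectrum)
    return [(s - lo) / (hi - lo) for s in spectrum]


# Made-up spectrum with one spike at index 4.
raw = [10.0, 12.0, 11.0, 13.0, 500.0, 12.0, 14.0, 11.0, 13.0, 12.0, 10.0]
clean = remove_spikes(raw)
scaled = min_max(clean)
```

In this example only the spike at index 4 is replaced; the remaining intensities pass through unchanged and are then rescaled to [0, 1].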

Step2: performing displacement calibration;

first, the Raman shift region shared by all colony Raman spectra (the intersection region) is determined and cropped;

then, the spectra in the intersection region are fitted and interpolated, so that all input colony Raman spectra share the same Raman shift axis and the same spectral resolution;

As an illustration, the fitting and interpolation operation includes: polynomial fitting interpolation and/or spline interpolation.
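As an illustration, displacement calibration can be sketched as cropping to the shared Raman shift range and resampling every spectrum onto one common axis. The sketch below uses piecewise-linear interpolation for brevity, which is a simplification of the polynomial or spline interpolation named above; the function names, the step size and the sample axes are assumptions.

```python
def common_axis(axes, step):
    """Build a Raman-shift grid covering the intersection of all input axes."""
    lo = max(ax[0] for ax in axes)     # largest starting shift
    hi = min(ax[-1] for ax in axes)    # smallest ending shift
    if lo >= hi:
        raise ValueError("spectra share no common Raman-shift region")
    n = int((hi - lo) / step + 1e-9) + 1
    return [lo + i * step for i in range(n)]


def interp_linear(x_new, x, y):
    """Piecewise-linear interpolation of (x, y) at ascending points x_new.

    Assumes x is strictly ascending and x_new lies within [x[0], x[-1]].
    """
    out, j = [], 0
    for xn in x_new:
        while j < len(x) - 2 and x[j + 1] < xn:
            j += 1
        t = (xn - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return out


# Two made-up spectra recorded on different shift axes.
axis_a, spec_a = [400.0, 402.0, 404.0, 406.0, 408.0], [1.0, 2.0, 3.0, 2.0, 1.0]
axis_b, spec_b = [401.0, 404.0, 407.0, 410.0], [0.0, 3.0, 6.0, 9.0]

grid = common_axis([axis_a, axis_b], step=1.0)
aligned_a = interp_linear(grid, axis_a, spec_a)
aligned_b = interp_linear(grid, axis_b, spec_b)
```

After this step both spectra are vectors of equal length on the same axis, which is what the distance computations in the later steps require.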

Step3: spectral projection;

Nonlinear dimension reduction is performed on the colony spectrum data after spectrum preprocessing and displacement calibration, comprising the following steps:

first, the spatial distance between the Raman spectrum data of any two colonies in the high-dimensional space is converted into a similarity probability;

then, the conditional probability of stochastic neighbor embedding is replaced with the probability between each high-dimensional data point and its corresponding simulated data point in the low-dimensional space, and a distribution strategy is used in the low-dimensional space to alleviate the crowding of data points there;

As an illustration, the spatial distance includes: one or a combination of Euclidean distance, Manhattan distance, Minkowski distance, or information entropy.

As an illustration, the distribution strategy is: t distribution and/or normal distribution.
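As an illustration, when the Euclidean distance and the t distribution are chosen, the two operations above correspond to the standard t-SNE formulation, reproduced here for reference. The symbols $\sigma_i$ (the Gaussian bandwidth of point $i$) and $y_i$ (the low-dimensional image of spectrum $x_i$) are not defined in the original text and are introduced here as assumptions; $n$ is the number of spectra.

```latex
% Distance -> similarity probability in the high-dimensional space
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}

% Heavy-tailed Student-t similarity in the low-dimensional space
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% Objective minimized over the low-dimensional points y_i
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The heavy tails of the t distribution let moderately dissimilar spectra sit farther apart in the projection, which is how the crowding of data points is relieved.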

Step three: "HCA+" picking protocol;

an "HCA" moiety;

Input:

(1) Colony Raman spectrum data set $D = \{x_{11}, x_{12}, \ldots, x_{ab}\}$, where a represents the number of colonies, b represents the number of Raman spectra collected for each colony, and the size of the data set is $n = a \times b$;

(2) The number of clusters $K_1$, with $K_1 \le a$;

(3) Colony selection threshold $N_1$, with $N_1 \le b$;

(4) Colony Raman spectral space distance measure dist(p, q),

where p and q are the spectral spatial positions of two colonies;

Output:

Picked colony number set: $C_t = \{C_{NO.1}, C_{NO.2}, \ldots, C_{NO.t}\}$;

Clustering: initialize the number of clusters count = n, where count controls the number of iterations of the clustering loop; that is, each Raman spectrum of each colony is initially set as its own cluster $C_i$, where $i = 1, 2, \ldots, n$;

Step1: merge the two clusters with the smallest distance among the current clusters;

Step2: renumber the clusters after merging, and calculate the distances between the newly numbered cluster and the other clusters;

Step3: set count = count - 1 and repeat Step1-2 until count = $K_1$, at which point the iteration ends and the current clustering result $C = \{C_1, C_2, \ldots, C_{K_1}\}$ is output;

According to the current clustering result, adjust the $K_1$ value and re-execute Step1-3;

Step4: for each cluster $C_i$, count the number of spectra belonging to each colony, sort the counts from largest to smallest, and judge whether the spectrum count of the largest colony is greater than or equal to $N_1$; if so, store the number $C_{NO.i}$ of that colony in the set $C_t$; otherwise, move on to the next cluster, until all $K_1$ clusters have been traversed;

According to the current $C_t$ statistics, adjust the $N_1$ value and re-execute Step4 to determine the final $C_t$ of the "HCA" part.
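As an illustration, Step1 to Step4 of the "HCA" part can be sketched as follows. The sketch rests on assumptions that the original text leaves open: Euclidean distance for dist(p, q), average linkage between clusters, and a tiny made-up two-dimensional data set standing in for projected spectra. The function names are hypothetical, and the interactive adjustment of $K_1$ and $N_1$ is not shown.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def hca(points, k):
    """Agglomerative clustering down to k clusters (average linkage assumed)."""
    clusters = [[i] for i in range(len(points))]   # every spectrum starts as its own cluster

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:                       # one merge per pass: count = count - 1
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]    # Step1: merge the closest pair
        del clusters[j]                            # Step2: renumber the remaining clusters
    return clusters


def pick_dominant(clusters, colony_of, n1):
    """Step4: in each cluster, pick the colony with the most spectra if its count >= n1."""
    picked = []
    for members in clusters:
        colony, count = Counter(colony_of[i] for i in members).most_common(1)[0]
        if count >= n1 and colony not in picked:
            picked.append(colony)
    return picked


# Made-up data: 3 colonies (a = 3) with 3 spectra each (b = 3), so n = 9.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (0.5, 0.5), (10.5, 10.5), (20, 20)]
colony_of = ["NO.1"] * 3 + ["NO.2"] * 3 + ["NO.3"] * 3

clusters = hca(points, k=3)                        # K1 = 3
c_t = pick_dominant(clusters, colony_of, n1=3)     # N1 = 3
```

Here the spectra of colony NO.3 are scattered over all three clusters and never reach the threshold, so only NO.1 and NO.2 enter $C_t$ at this stage. Colonies missed in this way are the ones a second clustering method can still recover.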

The "+" section, comprising the steps of:

Input:

(1) The colony Raman spectrum data set D;

(2) The number of clusters $K_2$, with $K_2 \le a$;

(3) Colony selection threshold $N_2$, with $N_2 \le b$;

(4) The picked colony number set $C_t$ output by the "HCA" part;

Output: picked colony number set $C_t = \{C_{NO.1}, C_{NO.2}, \ldots, C_{NO.t}\}$;

Clustering:

S1: select a clustering method other than HCA, with distance measure dist(p, q), perform clustering statistics on the colony Raman spectrum data set, adjust the $K_2$ value according to the statistical result, and update the clustering result accordingly;

As an illustration, the clustering statistics method includes: partition-based clustering, hierarchy-based clustering, density-based clustering, and/or grid-based clustering.

S2: for each cluster, count the number of spectra belonging to each colony, sort the counts from largest to smallest, and judge whether the spectrum count of the largest colony is greater than or equal to $N_2$; if so, judge whether the number $C_{NO.i}$ of that colony already belongs to the set $C_t$, and if it does not, store $C_{NO.i}$ in the set $C_t$;

According to the final $C_t$ statistics of the two parts of "HCA+", adjust the $N_2$ value, re-execute step S2, and output the final picked colony number set $C_t$;
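As an illustration, step S2 amounts to a union of the "HCA" result with the dominant colonies found by the second clustering. The sketch below assumes the second clustering has already produced one label per spectrum; the function name `plus_step`, the labels and the colony numbers are hypothetical.

```python
from collections import Counter


def plus_step(labels, colony_of, picked, n2):
    """S2: add each cluster's dominant colony to C_t if it reaches n2 and is not yet picked."""
    result = list(picked)                          # start from the C_t output by the "HCA" part
    for cluster in sorted(set(labels)):
        counts = Counter(c for c, l in zip(colony_of, labels) if l == cluster)
        colony, count = counts.most_common(1)[0]   # colony with the most spectra in this cluster
        if count >= n2 and colony not in result:
            result.append(colony)
    return result


# Made-up example: 3 colonies with 3 spectra each.
colony_of = ["NO.1"] * 3 + ["NO.2"] * 3 + ["NO.3"] * 3
labels = [0, 0, 0, 1, 1, 1, 2, 2, 0]               # hypothetical output of the second clustering
c_t = plus_step(labels, colony_of, picked=["NO.1", "NO.2"], n2=2)
```

In this example the second clustering groups two spectra of NO.3 together, so NO.3 is appended to the set, while NO.1 and NO.2 are not duplicated.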

Step four: picking colony information;

According to the picked colony number set $C_t$ returned by the "HCA+" picking scheme of step three, the colony culture dish information and the colony coordinate information corresponding to $C_t$ are sent to the colony picking equipment for colony picking.

As an illustration, to further verify the picking accuracy, the picked colonies were placed in PCR tubes and identified by 16S PCR sequencing.
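As an illustration, step four is a lookup from colony numbers to dish and coordinate information. The sketch below is hypothetical throughout: the table layout, the dictionary keys of a picking job, and the sample values are assumptions, and the interface of the actual picking equipment is not described in the original text.

```python
def picking_jobs(c_t, colony_table):
    """Look up the dish number and coordinates for every picked colony number."""
    jobs = []
    for colony_no in c_t:
        dish_no, (x, y) = colony_table[colony_no]
        jobs.append({"dish": dish_no, "colony": colony_no, "x": x, "y": y})
    return jobs


# Hypothetical lookup table built from the information entered in step one.
colony_table = {
    "NO.1": ("A2-1", (10.2, 12.4)),
    "NO.2": ("A2-1", (31.1, 8.3)),
    "NO.3": ("A2-2", (5.0, 7.5)),
}
jobs = picking_jobs(["NO.1", "NO.3"], colony_table)
```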

The invention has the beneficial effects that:

The invention uses Raman spectroscopy and unsupervised clustering, and exploits the complementary suitability of different clustering methods for colony Raman spectra, so that colony picking for more mixed samples is achieved with fewer picking operations.

Compared with the traditional morphological screening method, or methods derived from it, the invention performs fingerprint detection on colonies based on their Raman spectra and thereby identifies colonies more accurately. Compared with traditional unbiased colony picking, it can significantly shorten picking time and improve picking efficiency while keeping the picking comprehensive and accurate.

The invention provides a novel quick selection method for colony detection so as to improve colony selection efficiency, and the method is scientific and concise in design and suitable for popularization.

Drawings

FIG. 1 is a flow block diagram of the colony Raman spectrum rapid selection method based on "HCA+" of the present invention.

FIG. 2 shows a record of colony Raman spectrum information used by the colony Raman spectrum rapid selection method based on "HCA+" of the present invention.

FIG. 3 shows the visualization result of the spectral projection after nonlinear dimension reduction of the colony spectrum data.

FIG. 4 shows the final clustering result of the "HCA" part of the method.

FIG. 5 shows the set of spectra picked by the "HCA" part of the method.

FIG. 6 shows the Kmeans clustering result of the method.

FIG. 7 shows the set of picked colony numbers produced by the method.

Detailed Description

The preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Referring to FIG. 1 to FIG. 7, a colony Raman spectrum rapid selection method based on "HCA+" comprises:

step one: inputting colony Raman spectrum information;

as one illustration, the colony raman spectral information includes, but is not limited to: colony culture dish information, colony information, and the like.

As an illustration, as shown in fig. 2, the colony culture dish information includes, but is not limited to: number of culture dish (such as A2-1, A2-2, C2 …), number of colony of culture dish, colony coordinates (colony) of culture dish, etc.;

the colony information includes, but is not limited to: colony numbers (No. 1, no.2, no.3 and …), colony extraction position coordinates (loc 3), the number of Raman spectra collected at colony picking positions (9), and the like.

Step two: spectral projection and colony distribution visualization;

step1: spectral pretreatment;

firstly, removing cosmic rays from the Raman spectrum of each colony, namely removing background signals of the Raman spectrum;

secondly, performing self-adaptive baseline correction;

finally, the normalization treatment is carried out to realize the dimensionless treatment of the spectrum intensity.

Further, the method of removing cosmic rays may use: a median fitting algorithm, a second-order difference (derivative) method, and the like;

As an illustration, the airPLS algorithm is applied to remove the background signal of the Raman spectra, with lambda = 100 and a maximum number of iterations itermax_airPLS = 15.

As an example, the normalization process includes: a normalization method and/or a standardization method.

As an example, the Raman spectrum (SCRS) data of each colony were normalized using Min-Max normalization, projecting each spectrum onto the interval [0, 1].

Step2: performing displacement calibration;

Step3: spectral projection;

As an illustration, with the t distribution as the distribution strategy, t-SNE (t-distributed stochastic neighbor embedding) is used to perform nonlinear dimension reduction on the colony spectrum data after the preprocessing and displacement calibration; the visualization result after spectral projection is shown in FIG. 3.

Step three: "HCA+" picking protocol;

an "HCA" moiety;

input:

(1) Colony Raman spectrum data set $D = \{x_{11}, x_{12}, \ldots, x_{ab}\}$, where a represents the number of colonies (here a = 7), b represents the number of Raman spectra collected for each colony (here b = 9), and the size of the data set is $n = a \times b = 63$;

(2) The number of clusters $K_1 = 3$, with $K_1 \le a$;

(3) Colony selection threshold $N_1 = 7$, with $N_1 \le b$;

(4) Colony Raman spectral space distance measure dist(p, q),

where p and q are the spectral spatial positions of two colonies;

Output:

Picked colony number set: $C_t = \{C_{NO.1}, C_{NO.2}, \ldots, C_{NO.t}\}$;

According to the current clustering result, the $K_1$ value is adjusted and Step1-3 are re-executed; the final clustering result is shown in FIG. 4, where the 3 different point shapes represent the 3 clusters.

Step4: for each cluster $C_i$, count the number of spectra belonging to each colony, sort the counts from largest to smallest, and judge whether the spectrum count of the largest colony is greater than or equal to $N_1$; if so, store the number $C_{NO.i}$ of that colony in the set $C_t$; otherwise, move on to the next cluster, until all $K_1$ clusters have been traversed;

According to the current $C_t$ statistics, the $N_1$ value is adjusted and Step4 is re-executed. FIG. 5 shows the set of spectra picked by the "HCA" part; the colony numbers corresponding to this set are taken as the output and fed back to the colony picking equipment for picking.

The "+" section, comprising the steps of:

Input:

(1) The colony Raman spectrum data set D;

(2) The number of clusters $K_2$, with $K_2 \le a$;

(3) Colony selection threshold $N_2$, with $N_2 \le b$;

(4) The picked colony number set $C_t$ output by the "HCA" part;

Clustering:

S1: select the Kmeans clustering method as the clustering method other than HCA, with distance measure dist(p, q), perform clustering statistics on the colony Raman spectrum data set, adjust the $K_2$ value according to the statistical result (here $K_2 = 4$), and update the clustering result accordingly; the Kmeans clustering result is shown in FIG. 6;

S2: for each cluster, count the number of spectra belonging to each colony, sort the counts from largest to smallest, and judge whether the spectrum count of the largest colony is greater than or equal to $N_2$ (here $N_2 = 5$); if so, judge whether the number $C_{NO.i}$ of that colony already belongs to the set $C_t$, and if it does not, store $C_{NO.i}$ in the set $C_t$;

According to the final $C_t$ statistics of the two parts of "HCA+", the $N_2$ value is adjusted, step S2 is re-executed, and the final picked colony number set $C_t$ is output; the final picking result is shown in FIG. 7.
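As an illustration, the Kmeans clustering used in S1 can be sketched with Lloyd's algorithm. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the farthest-point initialisation is chosen only to make the example deterministic, the distance is Euclidean, and the twelve two-dimensional points are made up; only $K_2 = 4$ is taken from the text above.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break                                  # assignments stable: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


# Made-up data: four well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (0, 10), (1, 10), (0, 11),
          (10, 10), (11, 10), (10, 11)]
labels = kmeans(points, k=4)                       # K2 = 4 as in the embodiment
```

The resulting labels, one per spectrum, are the input that the counting rule of S2 operates on.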

Step four: picking colony information;

According to the picked colony number set $C_t$ returned by the "HCA+" picking scheme of step three, as shown in FIG. 7, the colony culture dish information and the colony coordinate information corresponding to $C_t$ are sent to the colony picking equipment for colony picking.

The foregoing description of the preferred embodiments is presented only to facilitate understanding of the principles of the invention and its core concepts, and is not intended to limit the scope of the invention in any way; any modifications, equivalents, and the like that fall within the spirit and principles of the invention should be construed as being included in the scope of the invention.

Claims

1. A colony Raman spectrum rapid selection method based on "HCA+", comprising:

step one: inputting colony Raman spectrum information;

step two: spectral projection and colony distribution visualization;

step three: "HCA+" picking protocol;

an "HCA" moiety;

input:

(2) the number of clusters $K_1$, with $K_1 \le a$;

(3) Colony selection threshold $N_1$, with $N_1 \le b$;

(4) Colony Raman spectral space distance measure dist(p, q),

where p and q are the spectral spatial positions of two colonies;

Output:

Picked colony number set: $C_t = \{C_{NO.1}, C_{NO.2}, \ldots, C_{NO.t}\}$;

According to the current $C_t$ statistics, adjust the $N_1$ value and re-execute Step4 to determine the final $C_t$ of the "HCA" part;

The "+" section includes the steps of:

input:

(1) the colony Raman spectrum data set D;

(2) the number of clusters $K_2$, with $K_2 \le a$;

(3) Colony selection threshold $N_2$, with $N_2 \le b$;

(4) The picked colony number set $C_t$ output by the "HCA" part;

Clustering:

Step four: picking colony information;

According to the picked colony number set $C_t$ returned by the "HCA+" picking scheme of step three, the colony culture dish information and the colony coordinate information corresponding to $C_t$ are sent to the colony picking equipment for colony picking.

2. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 1, wherein spectrum preprocessing and displacement calibration are performed before the spectrum projection, comprising the following operations:

step1: spectral pretreatment;

first, removing cosmic rays for raman spectra of each colony;

step2: performing displacement calibration;

step3: spectral projection;

then, the conditional probability of stochastic neighbor embedding is replaced with the probability between each high-dimensional data point and its corresponding simulated data point in the low-dimensional space, and a distribution strategy is used in the low-dimensional space to alleviate the crowding of data points there.

3. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the colony Raman spectrum information comprises: colony culture dish information and colony information;

the colony culture dish information includes: the culture dish number, the number of colonies on the culture dish, and the coordinates of the colonies on the culture dish;

the colony information includes: the colony number, the colony extraction position coordinates, and the number of Raman spectra collected at the colony picking position.

4. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the removal of cosmic rays employs: a median fitting algorithm and/or a second-order difference (derivative) method.

5. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the adaptive baseline correction employs: an adaptive iteratively reweighted penalized least squares algorithm and/or an alternating least squares (ALS) algorithm.

6. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the normalization process comprises: a normalization method and/or a standardization method; normalization projects the spectra onto the interval [0, 1].

7. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the fitting and interpolation operation comprises: polynomial fitting interpolation and/or spline interpolation.

8. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the spatial distance comprises: one of Euclidean distance, Manhattan distance, Minkowski distance, or information entropy.

9. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 2, wherein the distribution strategy is: t distribution and/or normal distribution.

10. The colony Raman spectrum rapid selection method based on "HCA+" according to claim 1, wherein the clustering statistics method comprises: partition-based clustering, hierarchy-based clustering, density-based clustering, and/or grid-based clustering.