CN114993891A - Raman particulate matter detection method based on cosine similarity - Google Patents
Raman particulate matter detection method based on cosine similarity
- Publication number
- CN114993891A (application number CN202210829708.2A)
- Authority
- CN
- China
- Prior art keywords
- spectrum
- characteristic peak
- vector
- raman
- matching
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N15/00—Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/62—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
- G01N21/63—Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
- G01N21/65—Raman scattering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Biochemistry (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Analytical Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Dispersion Chemistry (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)
Abstract
The invention provides a cosine-similarity-based Raman detection method for particulate matter that combines Raman spectroscopy with characteristic peak position extraction and similarity evaluation. The method removes the restriction that cosine similarity can only be calculated for spectra with the same number of pixels, calibrates spectral characteristic peak positions obtained at different resolutions or with different correction methods through a peak position correction scheme, and improves the accuracy of characteristic-peak-based spectral similarity matching.
Description
Technical Field
The invention relates to the technical field of Raman detection, in particular to a Raman detection method for particles based on cosine similarity.
Background
Insoluble particles (also called particulate matter) are an important test item in pharmacopoeia specifications, and the medication safety problems caused by insoluble particles in injections have drawn the attention of many researchers. Owing to the limitations of traditional detection methods, the accuracy with which insoluble particles in injections are detected and identified is low, which adversely affects the quality control of injections.
The Raman spectrum of particles in an injection is the fingerprint spectrum of those particles. It contains molecular information on both known and unknown components and is information-rich and highly characteristic. The chemical information (relative peak positions) reflected by the particle Raman spectrum is highly specific, which makes it an effective basis for particle identification, safety evaluation and quality control. In the prior art, a commonly used Raman spectral characteristic peak detection method is Python's scipy.signal.find_peaks function. This function is particularly sensitive to noisy spectral signals, especially sharp peaks in noisy data: if the wavelet window or the relative intensity threshold is set improperly, sharp noise may shift the position of a local maximum, and the function adapts poorly to spectral peaks of different shapes. In addition, the function cannot effectively identify the spectral boundary, which is always included in the extracted peak position sequence, so the scipy.signal.find_peaks function needs to be optimized.
Cosine-similarity-based Raman detection of particles measures the similarity between particles by comparing the cosine of the included angle, in the inner product space, between two particle spectrum column vectors. When the cosine of the included angle equals 1, the two spectra coincide completely; when it is close to 1, the two spectra are similar; the smaller the cosine, the more dissimilar the two spectra. In the prior art, Raman detection based on cosine similarity requires the two spectrum column vectors being matched to have equal length (that is, the same number of pixel points), which strictly limits its applicability and makes it unsuitable for wide application and popularization.
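The conventional comparison described above can be sketched in a few lines. This is a minimal illustration, not code from the patent: the function name `cosine_similarity` and the sample intensity values are hypothetical, and the explicit length check mirrors the equal-pixel-count restriction the text criticizes.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(x) != len(y):
        # The conventional formula is only defined for equal pixel counts.
        raise ValueError("spectra must have the same number of pixel points")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero spectrum")
    return dot / (norm_x * norm_y)

# Hypothetical intensity vectors (assumption, for illustration only).
spectrum_a = [0.0, 1.0, 4.0, 1.0, 0.0]
spectrum_b = [0.0, 2.0, 8.0, 2.0, 0.0]  # same shape, doubled intensity
spectrum_c = [4.0, 1.0, 0.0, 0.0, 0.0]  # different shape

print(cosine_similarity(spectrum_a, spectrum_b))
print(cosine_similarity(spectrum_a, spectrum_c))
```

Because the measure depends only on the angle between the vectors, scaling a spectrum's overall intensity leaves the score unchanged, while a spectrum with a different number of pixel points cannot be compared at all.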
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a particle Raman detection method based on cosine similarity. Through a two-part architecture consisting of a particle Raman spectrum characteristic peak extraction method and a characteristic-peak-based cosine similarity calculation method, Raman spectrum detection is combined with characteristic peak extraction and similarity evaluation to provide a particle detection and identification method that advances injection quality testing.
A particle Raman detection method based on cosine similarity comprises a particle Raman spectrum characteristic peak extraction method and a cosine similarity calculation method based on the characteristic peak, wherein:
the first method is the particle Raman spectrum characteristic peak extraction method: the method can effectively identify sharp spectral noise and spectral boundaries, detect characteristic peak position information at different scales and amplitudes, improve the identification rate of spectral characteristic peaks, and improve the matching precision of the cosine similarity calculation method based on the characteristic peaks, and comprises the following specific steps:
step 1.1, calculating the Raman spectrum resolution of the particles, and setting a wavelet window width threshold according to the Raman spectrum resolution of the particles;
further, the Raman spectrum resolution of the particulate matter can be represented by the Raman shift difference between two adjacent pixel points of the spectrum;
further, the wavelet window width threshold is inversely proportional to the raman spectral resolution of the particulate matter, and the higher the raman spectral resolution of the particulate matter is, the smaller the wavelet window width threshold is set;
further, the wavelet window width threshold is in direct proportion to the width of the raman spectrum characteristic peak of the particulate matter (i.e. the raman shift difference of the characteristic peak), and the wider the raman spectrum characteristic peak of the particulate matter is, the larger the wavelet window width threshold is set;
further, the wavelet window width threshold is inversely proportional to the density of the particle Raman spectrum characteristic peaks; the denser the characteristic peaks of the Raman spectrum of the particulate matter are, the smaller the wavelet window width threshold is set;
as an illustration, the wavelet window width threshold may vary over a span of 5 to 30 pixel points;
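Step 1.1 can be sketched as follows. This is a hedged illustration: the helper names `spectral_resolution` and `window_width_threshold` and the sample Raman shift axis are assumptions, and the default of 16 pixel points is taken from the embodiment described later in the document (16 × PN), not from this step.

```python
from statistics import median

def spectral_resolution(raman_shift):
    """Raman shift difference between adjacent pixel points (cm^-1 per pixel).

    The median is used so that a slightly non-uniform axis still yields
    one representative value (an assumption of this sketch).
    """
    steps = [b - a for a, b in zip(raman_shift, raman_shift[1:])]
    return median(steps)

def window_width_threshold(raman_shift, n_pixels=16):
    """Window width threshold in cm^-1, expressed as n_pixels * PN."""
    if not 5 <= n_pixels <= 30:
        raise ValueError("the text suggests a span of 5 to 30 pixel points")
    return n_pixels * spectral_resolution(raman_shift)

# Hypothetical uniformly sampled axis: 200, 202, ..., 218 cm^-1.
axis = [200 + 2 * i for i in range(10)]
print(spectral_resolution(axis))
print(window_width_threshold(axis))
```

Expressing the threshold as a fixed pixel count times PN means a finer axis (smaller PN) automatically gives a narrower window in wavenumber units, which matches the stated relationship between resolution and window width.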
step 1.2, setting a wavelet window height threshold according to the relative intensity of the Raman spectrum of the particulate matter;
as an illustration, the relative intensity of the raman spectrum refers to the relative height between characteristic peaks after the raman spectrum normalization process;
as an illustration, the normalization process includes: min-max normalization, z-score normalization, or decimal scaling normalization, etc.;
as an illustration, the wavelet window height threshold may be set between 5% -20% of the maximum peak height.
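A minimal sketch of step 1.2 under stated assumptions: the function names and sample intensities are hypothetical, only two of the listed normalizations are shown, and the default fraction of 0.08 is borrowed from the embodiment later in the document.

```python
from statistics import mean, pstdev

def min_max(intensity):
    """Min-max normalization: rescale intensities to the range [0, 1]."""
    lo, hi = min(intensity), max(intensity)
    if hi == lo:
        raise ValueError("a flat spectrum cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in intensity]

def z_score(intensity):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(intensity), pstdev(intensity)
    if sigma == 0:
        raise ValueError("a flat spectrum cannot be z-score normalized")
    return [(v - mu) / sigma for v in intensity]

def height_threshold(normalized, fraction=0.08):
    """Height threshold as a fraction (5%-20%) of the maximum peak height."""
    if not 0.05 <= fraction <= 0.20:
        raise ValueError("the text suggests 5% to 20% of the maximum peak height")
    return fraction * max(normalized)

# Hypothetical raw intensities (assumption, for illustration only).
raw = [10.0, 20.0, 60.0, 35.0, 10.0]
normalized = min_max(raw)
print(normalized)
print(height_threshold(normalized))
```

After min-max normalization the tallest peak has height 1, so the fraction and the absolute threshold coincide, which is convenient when the same threshold is applied to spectra recorded at different intensities.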
Step 1.3, according to the wavelet window width threshold and the wavelet window height threshold set in the step 1.1 and the step 1.2, carrying out characteristic peak position detection on the particle matching spectrum and the database spectrum by using continuous wavelet transform, wherein characteristic peak positions of the matching spectrum and the database spectrum detection are respectively stored in a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2;
as an illustration, the characteristic peak position refers to a spectral characteristic peak index, that is, the Raman shift pixel point index sequence corresponding to the elements in P1 and P2; the Raman shift may be expressed as a pixel index sequence, a wavenumber (cm⁻¹), a wavelength (nm), etc.;
by way of illustration, the elements in the matching spectrum characteristic peak vector P1 represent the characteristic peak positions of the matching spectrum, and the length of P1, that is, the number of elements the vector contains, represents the number of characteristic peaks of the matching spectrum;
by way of illustration, the elements in the database spectrum characteristic peak vector P2 represent the characteristic peak positions of the database spectrum, and the length of P2, that is, the number of elements the vector contains, represents the number of characteristic peaks of the database spectrum;
as an example, the method optimizes characteristic peak position detection on the basis of the scipy.signal.find_peaks function;
further, in view of the influence of sharp noise peaks and spectrum boundaries on the scipy.signal.find_peaks function, the invention applies a secondary correction to the characteristic peak vector by fitting the local maxima of all characteristic peak positions of the spectrum and by specifying the relative intensity and width of the spectrum, as detailed in step 1.4.
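The peak search of step 1.3 can be illustrated with the real `scipy.signal.find_peaks` function. This is a sketch under assumptions: the synthetic spectrum is invented for the example, and mapping the wavelet window height and width thresholds onto the `height` and `width` keyword arguments is this sketch's interpretation, not something the patent states. SciPy also offers `scipy.signal.find_peaks_cwt` for detection based on the continuous wavelet transform.

```python
import numpy as np
from scipy.signal import find_peaks

pixels = np.arange(400)

def gaussian(center, sigma, amplitude):
    """Synthetic Gaussian band on the pixel axis (illustrative only)."""
    return amplitude * np.exp(-0.5 * ((pixels - center) / sigma) ** 2)

# Hypothetical normalized spectrum: two real bands and one band that
# stays below the height threshold.
spectrum = gaussian(100, 4, 1.0) + gaussian(250, 6, 0.5) + gaussian(330, 4, 0.03)

# height: minimum peak height; width: minimum peak width in samples,
# measured at half prominence by default.
peaks, properties = find_peaks(spectrum, height=0.08, width=5)

p1 = peaks.tolist()  # characteristic peak vector of pixel indices
print(p1)
print(properties["peak_heights"])
```

The returned indices are strictly increasing, which is the property step 1.4 relies on when it compares consecutive elements.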
Step 1.4, performing secondary calibration on the extracted matching spectrum characteristic peak vector P1 and database spectrum characteristic peak vector P2, eliminating the interference from spectral spike noise encountered in step 1.3, solving the problem that the scipy.signal.find_peaks function cannot effectively identify the spectrum boundary, and determining the final characteristic peak vectors;
directly removing the first and last elements of the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 extracted by the scipy.signal.find_peaks function, and updating P1 and P2 accordingly;
because the elements of the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 detected in step 1.3 form a monotonically increasing sequence, the difference between two consecutive elements is required, according to the Raman spectrum characteristics of the particulate matter, to be greater than 5 pixel points;
further, traversing all elements of P1 and P2, comparing the intensities of any two adjacent peak positions that are less than 5 pixel points apart, and keeping the peak position with the larger intensity as the vector element; or fitting the adjacent peak positions and taking the extreme point as the vector element; and updating the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2;
as an illustration, the fitting method may be a lorentzian fit, a gaussian fit, or a polynomial fit;
as an example, the method for extracting the characteristic peak of the raman spectrum of the particulate matter mainly includes the operations of peak searching (step 1.3) and correction (step 1.4) of the raman spectrum after the normalization processing in the step 1.2, decomposing the raman spectrum of the particulate matter according to the width and the height of a window, determining the position of the characteristic peak of the spectrum based on wavelet transformation, and setting a threshold value to perform secondary calibration on the position of the characteristic peak so as to improve the recognition rate of the characteristic peak of the spectrum.
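The secondary calibration of step 1.4 can be sketched as below. The function name `secondary_calibration` and the sample data are hypothetical, and only the "keep the more intense peak" variant is shown; the fitting variant (Lorentzian, Gaussian or polynomial) is not implemented here.

```python
def secondary_calibration(peaks, intensity, min_gap=5):
    """Trim boundary elements and merge peaks closer than min_gap pixels.

    peaks     : increasing list of pixel indices from the peak search
    intensity : spectrum intensities, indexable by pixel index
    """
    # Remove the first and last elements, which the text treats as
    # spectrum boundary artifacts.
    trimmed = list(peaks[1:-1])

    calibrated = []
    for p in trimmed:
        if calibrated and p - calibrated[-1] < min_gap:
            # Two peaks less than min_gap apart: keep the more intense one.
            if intensity[p] > intensity[calibrated[-1]]:
                calibrated[-1] = p
        else:
            calibrated.append(p)
    return calibrated

# Hypothetical spectrum with two close pairs of candidate peaks.
intensity = [0.0] * 60
intensity[10], intensity[12] = 0.4, 0.9
intensity[30], intensity[33] = 0.7, 0.2
candidates = [0, 10, 12, 30, 33, 59]

print(secondary_calibration(candidates, intensity))
```

Note that the trimming is unconditional, as in the text, so a genuine peak that happens to be the first or last detected element would also be discarded.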
The second method is a cosine similarity calculation method based on characteristic peaks: a cosine similarity matching method is developed that remains applicable when the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 have different characteristic peak positions, and it is used to calculate the similarity between P1 and P2, and comprises the following specific steps:
step 2.1, converting the characteristic peak index into a wavenumber, wherein the detection of the characteristic peak position in step 1.3 is carried out on the pixel points of the spectrum column vector, so the characteristic peak pixel index sequence corresponding to the characteristic peak positions needs to be converted into wavenumber values;
further, the pixel index sequence is used as an increasing sequence starting from 1, and the index sequence can be directly and uniquely mapped to increasing wave number values;
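A minimal sketch of step 2.1, assuming a uniformly sampled axis. The function name, the starting wavenumber of 200 cm⁻¹ and the peak indices are hypothetical; the resolution of 2 cm⁻¹ matches the embodiment described later.

```python
def index_to_wavenumber(peak_indices, start_wavenumber, resolution):
    """Map 1-based pixel indices to wavenumbers on a uniform axis.

    Index 1 corresponds to start_wavenumber, and each further pixel
    adds one resolution step (cm^-1 per pixel).
    """
    return [start_wavenumber + (i - 1) * resolution for i in peak_indices]

# Hypothetical peak indices on an axis starting at 200 cm^-1.
print(index_to_wavenumber([1, 51, 126], start_wavenumber=200, resolution=2))
```

Because the mapping is strictly increasing, an increasing index sequence yields an increasing wavenumber sequence, so the ordering of P1 and P2 is preserved. A non-uniform axis would need a lookup into the instrument's own Raman shift array instead.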
step 2.2, calibrating the characteristic peak position in the matching spectrum characteristic peak vector P1 in the step 1.3 and the database spectrum characteristic peak vector P2;
further, sequentially traversing all characteristic peak positions (elements) of the P1 and the P2, and when the difference between the peak positions of the P1 and the P2 is smaller than a certain threshold value, calibrating the P2 by using a temporarily matched forced conversion strategy by taking the P1 as a standard, and updating the P2;
the forced conversion of the temporary matching forced conversion strategy to the database spectrum only acts on the matching of the current matching spectrum and the database spectrum, and does not change the original characteristic peak position and the characteristic peak quantity of the database;
as an illustration, the certain threshold is set according to the spectral resolution;
as an example, the threshold is 5 spectral pixels, or 5 × PN, where PN is the wavenumber span between two adjacent pixels of the Raman spectrum;
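The temporary forced-conversion calibration of step 2.2 might look like the sketch below. The function name and peak values are hypothetical, and pairing each P2 peak with its nearest P1 peak is an assumption of this sketch; the text only says the peaks are traversed sequentially.

```python
def calibrate_to_query(p1, p2, threshold):
    """Return a calibrated copy of p2, snapped to p1 where peaks nearly agree.

    A database peak within `threshold` of a matching-spectrum peak is
    replaced by that matching-spectrum peak. The input p2 is never
    modified, so the database keeps its original peak positions.
    """
    calibrated = []
    for b in p2:
        nearest = min(p1, key=lambda a: abs(a - b)) if p1 else None
        if nearest is not None and abs(nearest - b) < threshold:
            calibrated.append(nearest)
        else:
            calibrated.append(b)
    return calibrated

# Hypothetical peak positions in cm^-1; threshold = 5 * PN with PN = 2 cm^-1.
p1 = [520.0, 1001.0, 1450.0]
p2 = [516.0, 1003.0, 1300.0, 1452.0]
p2_calibrated = calibrate_to_query(p1, p2, threshold=10.0)

print(p2_calibrated)
print(p2)
```

Returning a new list rather than editing in place is what keeps the conversion temporary: it applies to the current comparison only, and the next matching spectrum starts from the unmodified database vector.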
step 2.3, based on the cosine similarity calculation of the characteristic peak, developing a cosine similarity matching method with applicability for P1 and P2 with different vector lengths after the step 2.2;
furthermore, the similarity calculation method based on the characteristic peak can use vector included angle cosine, total quantity statistical matrix similarity, hypothesis test method, Euclidean distance, Mahalanobis distance and other methods;
furthermore, the invention uses the cosine of the included angle between vectors to calculate characteristic peak similarity and develops a cosine similarity calculation method based on the characteristic peak; the conventional cosine similarity calculation formula is:
$$\cos(X, Y) = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\,\sqrt{\sum_{i=1}^{N} y_i^{2}}} \qquad (1)$$
here the characteristic peak refers to a column vector of the Raman spectrum, that is, the intensity values corresponding to the Raman shift pixel points; in the matching spectrum column vector $X = (x_1, x_2, \dots, x_N)$, $x_i$ denotes the i-th element, and in the database spectrum column vector $Y = (y_1, y_2, \dots, y_N)$, $y_i$ denotes the i-th element; X and Y are both required to have length N;
however, under the existing conditions, due to the fact that calibration standards, pixel resolutions and the like of different spectrometer manufacturers are different, the lengths of column vectors of a matched spectrum and a database spectrum cannot be guaranteed to be consistent, and the applicability of the cosine similarity calculation formula (1) based on the characteristic peak is limited;
furthermore, the vectors X and Y in the formula (1) are replaced by the characteristic peak vectors P1 and P2, so that the condition that the formula (1) cannot be used due to different lengths of the column vectors of the matched spectrum and the database spectrum can be effectively avoided, and the cosine similarity calculation formula based on the characteristic peaks is replaced by:
wherein A and B are the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 respectively, $a_i$ and $b_i$ are the i-th elements of P1 and P2 respectively, M represents the number of identical elements contained in P1 and P2 after the peak position calibration in step 2.2, and the cosine similarity of the characteristic peaks of the matching spectrum and the database spectrum is calculated according to formula (2).
As an example, the cosine similarity calculation method based on the characteristic peak mainly develops a cosine similarity matching method with applicability for two characteristic vectors with different lengths, and is used for evaluating the similarity between a detected matching spectrum and a database spectrum.
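Formula (2) is not reproduced in this text, so the following is only one possible reading, stated as an assumption: the numerator sums the products of the M elements shared by P1 and the calibrated P2, and each norm in the denominator runs over the full length of its own vector. The function name and peak values are hypothetical.

```python
import math

def peak_cosine_similarity(p1, p2_calibrated):
    """Cosine-style similarity between two peak-position vectors.

    Assumed reading of formula (2): shared peaks contribute a_i * b_i
    (with a_i == b_i after calibration) to the numerator, while the
    norms use every element of each vector. Peak positions are assumed
    to be unique within each vector.
    """
    shared = set(p1) & set(p2_calibrated)  # the M identical elements
    numerator = sum(v * v for v in shared)
    norm_a = math.sqrt(sum(v * v for v in p1))
    norm_b = math.sqrt(sum(v * v for v in p2_calibrated))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("similarity is undefined for an empty or all-zero peak vector")
    return numerator / (norm_a * norm_b)

# Hypothetical vectors of different lengths, after step 2.2 calibration.
p1 = [520.0, 1001.0, 1450.0]
p2_calibrated = [520.0, 1001.0, 1300.0, 1450.0]

print(peak_cosine_similarity(p1, p2_calibrated))
```

Under this reading the score is 1 when both vectors contain exactly the same peaks, 0 when they share none, and falls between the two when one spectrum has peaks the other lacks, regardless of how many pixel points the underlying spectra had.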
The invention has the beneficial effects that:
the invention provides a Raman detection method for particulate matter based on cosine similarity. First, by optimizing the scipy.signal.find_peaks peak position detection function, it provides a particle Raman spectrum characteristic peak position extraction method that identifies sharp spectral noise and spectrum boundaries, detects characteristic peak position information at different scales and amplitudes, and improves the identification rate of spectral characteristic peaks. Second, within the cosine similarity calculation method based on the characteristic peak, it provides a vector cosine similarity calculation that handles different numbers of characteristic peaks and different peak positions, which removes the restriction on cosine similarity calculation for spectra with different pixel lengths; spectral characteristic peak positions obtained at different resolutions or with different correction methods are calibrated through a peak position correction scheme, and the precision of spectral similarity matching based on the characteristic peak is improved.
The invention provides a particle detection and identification method by combining a Raman spectrum technology with a feature extraction and similarity evaluation method, and is an effective method for realizing particle identification, safety evaluation and quality control.
Drawings
FIG. 1 is a schematic diagram of the overall process design of the particle Raman detection method based on cosine similarity according to the present invention.
FIG. 2 is a schematic diagram of the matching spectrum characteristic peak positions in the particle Raman detection method based on cosine similarity according to the present invention.
FIG. 3 is a schematic diagram of the database spectrum characteristic peak positions in the particle Raman detection method based on cosine similarity according to the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail with reference to fig. 1 to 3.
A particle Raman detection method based on cosine similarity comprises a particle Raman spectrum characteristic peak extraction method 100 and a cosine similarity calculation method 200 based on the characteristic peak, wherein:
the first method is a particle Raman spectrum characteristic peak extraction method 100: the method can effectively identify sharp spectral noise and spectral boundaries, detect characteristic peak position information from different scales and amplitudes, improve the identification rate of spectral characteristic peaks, and improve the matching precision of a cosine similarity calculation method based on the characteristic peaks, and comprises the following specific steps of:
step 1.1, calculating the Raman spectrum resolution of the particles, and setting a wavelet window width threshold 101 according to the Raman spectrum resolution of the particles;
FIG. 2 shows the matching spectrum S1 and FIG. 3 shows the database spectrum S2; the spectral resolutions of S1 and S2 are uniform, and both are 2 cm⁻¹.
The wavelet window width threshold is set as: 16 × PN, PN being the spectral resolution of S1 or S2;
further, the Raman spectrum resolution of the particulate matter can be represented by the Raman shift difference between two adjacent pixel points of the spectrum;
further, the wavelet window width threshold is inversely proportional to the particle raman spectrum resolution 105, and the wavelet window width threshold is set to be smaller when the particle raman spectrum resolution is higher;
further, the wavelet window width threshold is directly proportional to the particulate matter raman spectrum characteristic peak width 106 (i.e. the raman shift difference of the characteristic peak), and the wider the particulate matter raman spectrum characteristic peak is, the larger the wavelet window width threshold is set;
further, the wavelet window width threshold is inversely proportional to the density 107 of the particle Raman spectrum characteristic peaks; the denser the characteristic peaks of the Raman spectrum of the particulate matter are, the smaller the wavelet window width threshold is set;
as an illustration, the wavelet window width threshold may vary over a span of 5 to 30 pixel points;
step 1.2, setting a wavelet window height threshold 102 according to the relative intensity of the Raman spectrum of the particulate matter; here the wavelet window height threshold is set to 0.08, so that the smaller characteristic peaks in the database spectrum S2 and the matching spectrum S1 can also be detected;
as an illustration, the relative intensity of the raman spectrum refers to the relative height between characteristic peaks after the raman spectrum normalization process;
as an illustration, the normalization process includes: min-max normalization, z-score normalization, or decimal scaling normalization, etc.;
as an illustration, the wavelet window height threshold may be set between 5% -20% of the maximum peak height.
Step 1.3, according to the wavelet window width threshold and the wavelet window height threshold set in the step 1.1 and the step 1.2, using continuous wavelet transform to perform characteristic peak position detection 103 on the particle matching spectrum and the database spectrum, wherein characteristic peak positions detected by the matching spectrum S1 and the database spectrum S2 are respectively stored in a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2;
as an illustration, the characteristic peak position refers to a spectral characteristic peak index, that is, the Raman shift pixel point index sequence corresponding to the elements in P1 and P2; the Raman shift may be expressed as a pixel index sequence, a wavenumber (cm⁻¹), a wavelength (nm), etc.;
by way of illustration, the elements in the matching spectrum characteristic peak vector P1 represent the characteristic peak positions of the matching spectrum, and the length of P1, that is, the number of elements the vector contains, represents the number of characteristic peaks of the matching spectrum;
by way of illustration, the elements in the database spectrum characteristic peak vector P2 represent the characteristic peak positions of the database spectrum, and the length of P2, that is, the number of elements the vector contains, represents the number of characteristic peaks of the database spectrum;
as an example, the method optimizes characteristic peak position detection on the basis of the scipy.signal.find_peaks function;
further, in view of the influence of sharp noise peaks and spectrum boundaries on the scipy.signal.find_peaks function, the invention applies a secondary correction to the characteristic peak vector by fitting the local maxima of all characteristic peak positions of the spectrum and by specifying the relative intensity and width of the spectrum, as detailed in step 1.4.
Step 1.4, performing secondary calibration on the extracted matching spectrum characteristic peak vector P1 and database spectrum characteristic peak vector P2, eliminating the interference from spectral spike noise encountered in step 1.3, solving the problem that the scipy.signal.find_peaks function cannot effectively identify the spectrum boundary, and determining the final characteristic peak vectors 104;
directly removing the first and last elements of the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 extracted by the scipy.signal.find_peaks function, and updating P1 and P2 accordingly;
because the elements of the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 detected in step 1.3 form a monotonically increasing sequence, the difference between two consecutive elements is required, according to the Raman spectrum characteristics of the particulate matter, to be greater than 5 pixel points;
furthermore, traversing all elements of P1 and P2, comparing the intensities of any two adjacent peak positions that are less than 5 pixel points apart, and keeping the peak position with the larger intensity as the vector element; or fitting the adjacent peak positions and taking the extreme point as the vector element; and updating the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2;
as an illustration, the fitting method may be a lorentzian fit, a gaussian fit, or a polynomial fit;
as an example, the method for extracting a characteristic peak of a raman spectrum of a particulate matter mainly includes operations of peak searching (step 1.3) and correction (step 1.4) on the raman spectrum after the normalization processing in step 1.2, decomposing the raman spectrum of the particulate matter according to the window width and height, determining a characteristic peak position of the spectrum based on wavelet transformation, and setting a threshold value to perform secondary calibration on the characteristic peak position so as to improve the identification rate of the characteristic peak of the spectrum.
The second method 200 is a cosine similarity calculation method based on characteristic peaks, and develops an applicable cosine similarity matching method aiming at matching spectrum characteristic peak vectors P1 and database spectrum characteristic peak vectors P2 of different characteristic peak positions, wherein the cosine similarity matching method is used for calculating the similarity of the matching spectrum characteristic peak vectors P1 and the database spectrum characteristic peak vectors P2, and comprises the following specific steps:
step 2.1, converting the characteristic peak index into a wavenumber 201, wherein the detection of the characteristic peak position in step 1.3 is performed on the pixel points of the spectrum column vector, so the characteristic peak pixel index sequence corresponding to the characteristic peak positions needs to be converted into wavenumber values;
further, the pixel index sequence is used as an increasing sequence from 1, and the index sequence can be directly and uniquely mapped to increasing wave number values;
step 2.2, calibrating 202 the characteristic peak position in the matching spectrum characteristic peak vector P1 in the step 1.3 and the database spectrum characteristic peak vector P2;
further, sequentially traversing all characteristic peak positions (elements) of the P1 and the P2, searching for a difference between the P1 and the P2 which is smaller than a certain threshold, calibrating the P2 by using a temporary matching forced conversion strategy by taking the P1 as a standard, and updating the P2;
the forced conversion of the temporary matching forced conversion strategy to the database spectrum only acts on the matching of the current matching spectrum and the database spectrum, and does not change the original characteristic peak position and the characteristic peak quantity of the database;
as an illustration, the certain threshold is set according to the spectral resolution;
as an example, the certain threshold is 5 spectral pixels or 5 × PN, and PN is the span wave number length of two pixels of the raman spectrum;
step 2.3, calculating 203 based on the cosine similarity of the characteristic peak, and developing a cosine similarity matching method with applicability according to P1 and P2 with different vector lengths after the step 2.2;
furthermore, the similarity calculation method based on the characteristic peak can use vector included angle cosine, total quantity statistical matrix similarity, hypothesis test method, Euclidean distance, Mahalanobis distance and other methods;
furthermore, the invention uses the cosine of the angle between vectors to calculate the similarity of characteristic peaks and develops a cosine similarity calculation method based on characteristic peaks; the conventional cosine similarity calculation formula for characteristic peaks is as follows:
$$\cos\theta = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^2}\,\sqrt{\sum_{i=1}^{N} y_i^2}} \tag{1}$$
here the characteristic peak refers to a column vector of the Raman spectrum, that is, the intensity values at the Raman shift pixel points; in the matching-spectrum column vector $X = (x_1, x_2, \ldots, x_N)$, $x_i$ denotes the i-th element, and in the database-spectrum column vector $Y = (y_1, y_2, \ldots, y_N)$, $y_i$ denotes the i-th element; X and Y are both required to have length N;
however, in practice the calibration standards, pixel resolutions and other properties differ between spectrometer manufacturers, so the column vectors of a matching spectrum and a database spectrum cannot be guaranteed to have the same length, which limits the applicability of the cosine similarity calculation formula (1);
furthermore, replacing the vectors X and Y in formula (1) with the characteristic peak vectors P1 and P2 avoids the case in which formula (1) cannot be used because the column vectors of the matching spectrum and the database spectrum differ in length; the cosine similarity calculation formula based on characteristic peaks then becomes formula (2):
where A and B are the matching-spectrum characteristic peak vector P1 and the database-spectrum characteristic peak position vector P2, respectively; $a_i$ and $b_i$ are the i-th elements of P1 and P2, respectively; M is the number of identical elements contained in P1 and P2 after the peak position calibration of step 2.2; and the cosine similarity of the characteristic peaks of the matching spectrum and the database spectrum is calculated according to formula (2).
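Steps 2.1 and 2.2 can be illustrated with the short sketch below. It assumes that a wavenumber axis holding one value per pixel is available for each spectrum. The function names `index_to_wavenumber` and `calibrate_to_reference` are hypothetical, and snapping each P2 peak to its nearest P1 peak is only one plausible reading of the temporary-matching forced-conversion strategy.

```python
import numpy as np


def index_to_wavenumber(peak_idx, wavenumber_axis):
    """Map 1-based pixel indices to the wavenumber at each pixel."""
    axis = np.asarray(wavenumber_axis, dtype=float)
    if np.any(np.diff(axis) <= 0):
        raise ValueError("wavenumber axis must be strictly increasing")
    idx = np.asarray(peak_idx, dtype=int)
    if np.any(idx < 1) or np.any(idx > axis.size):
        raise IndexError("pixel indices must lie in 1..len(axis)")
    return axis[idx - 1]  # pixel 1 is the first element of the axis


def calibrate_to_reference(p1, p2, threshold):
    """Return a temporary copy of p2 aligned to p1.

    A p2 peak that lies closer than `threshold` to its nearest p1 peak
    is replaced by that p1 peak. The p2 argument itself is not modified,
    so the stored database peaks stay as they were.
    """
    p1 = np.asarray(p1, dtype=float)
    original = np.asarray(p2, dtype=float)
    p2_cal = original.copy()
    if p1.size == 0:
        return p2_cal
    for i, b in enumerate(original):
        j = int(np.argmin(np.abs(p1 - b)))  # nearest matching-spectrum peak
        if abs(p1[j] - b) < threshold:
            p2_cal[i] = p1[j]
    return p2_cal
```

With a threshold of 5 × PN and PN = 1 cm⁻¹, for example, a database peak at 503 cm⁻¹ would be moved onto a matching-spectrum peak at 500 cm⁻¹, while a database peak at 1010 cm⁻¹ would stay where it is when the nearest matching-spectrum peak is at 1000 cm⁻¹. This sketch does not prevent two database peaks from being moved onto the same matching-spectrum peak; a fuller implementation would need a rule for that case.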
As an example, the cosine similarity calculation method based on characteristic peaks provides a cosine similarity matching method applicable to two feature vectors of different lengths, and is used to evaluate the similarity between a measured matching spectrum and a database spectrum.
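One simple way to obtain a cosine similarity for two peak vectors of different lengths is sketched below. It is a stand-in under stated assumptions and is not claimed to reproduce formula (2): each spectrum is represented as a binary indicator vector over the union of calibrated peak positions, so the dot product equals the number of shared peaks M and each norm is the square root of the respective peak count. The function name `peak_cosine_similarity` is hypothetical.

```python
import math


def peak_cosine_similarity(p1, p2_calibrated):
    """Cosine similarity of two peak-position vectors of any lengths.

    Assumption: peaks are compared as binary presence indicators, so
    the result is M / sqrt(len(P1) * len(P2)), where M is the number of
    peak positions that are identical after calibration.
    """
    a = {float(v) for v in p1}
    b = {float(v) for v in p2_calibrated}
    if not a or not b:
        return 0.0  # no peaks on one side: treat as no similarity
    m = len(a & b)  # shared peak positions after calibration
    return m / math.sqrt(len(a) * len(b))
```

For a matching spectrum with three peaks and a calibrated database spectrum with four peaks, two of which coincide, this gives 2 / sqrt(12), roughly 0.58. Identical peak sets give 1 and disjoint sets give 0. Exact equality of positions is meaningful here only because the calibration step has already moved nearby database peaks onto the matching-spectrum positions.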
The invention aims to provide a Raman detection method for particulate matter based on cosine similarity. First, by optimizing the scipy.signal.find_peaks peak position detection function, it provides a method for extracting characteristic peak positions from the Raman spectrum of particulate matter that identifies sharp spectral noise and spectral boundaries, detects characteristic peak position information at different scales and amplitudes, and improves the identification rate of spectral characteristic peaks. Second, in the cosine similarity calculation method based on characteristic peaks, it provides a vector cosine similarity calculation for differing numbers of characteristic peaks and differing peak positions, which resolves the problem that cosine similarity cannot be computed directly for spectra of different pixel lengths; the characteristic peak positions of spectra with different resolutions or different correction methods are calibrated by a peak position correction system, improving the precision of spectral similarity matching based on characteristic peaks.
By combining Raman spectroscopy with feature extraction and similarity evaluation, the invention provides a method for detecting and identifying particulate matter, and is an effective means of particle identification, safety evaluation and quality control.
The above embodiments are only preferred embodiments of the present invention, and it should be understood that the above embodiments are only for assisting understanding of the method and the core idea of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalents and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A Raman particulate matter detection method based on cosine similarity, characterized in that it comprises a method for extracting characteristic peaks from the Raman spectrum of particulate matter and a cosine similarity calculation method based on the characteristic peaks, wherein:
the first method is the method for extracting characteristic peaks from the Raman spectrum of particulate matter: it can effectively identify sharp spectral noise and spectral boundaries, detects characteristic peak position information at different scales and amplitudes, improves the identification rate of spectral characteristic peaks, and improves the matching precision of the cosine similarity calculation method based on the characteristic peaks, and comprises the following specific steps:
step 1.1, calculating the Raman spectrum resolution of the particles, and setting a wavelet window width threshold according to the Raman spectrum resolution of the particles;
the Raman spectrum resolution of the particulate matter can be represented by the Raman shift difference between two adjacent pixel points of the spectrum;
step 1.2, setting a wavelet window height threshold according to the relative intensity of the Raman spectrum of the particulate matter; the relative intensity of the Raman spectrum refers to the relative height between characteristic peaks after the Raman spectrum is subjected to standardization treatment;
step 1.3, according to the wavelet window width threshold and the wavelet window height threshold set in the step 1.1 and the step 1.2, carrying out characteristic peak position detection on the particle matching spectrum and the database spectrum by using continuous wavelet transform, wherein characteristic peak positions of the matching spectrum and the database spectrum detection are respectively stored in a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2;
the elements of the matching-spectrum characteristic peak vector P1 represent the characteristic peak positions of the matching spectrum, and the length of P1, that is, the number of elements in the vector, represents the number of characteristic peaks of the matching spectrum;
the elements of the database-spectrum characteristic peak vector P2 represent the characteristic peak positions of the database spectrum, and the length of P2, that is, the number of elements in the vector, represents the number of characteristic peaks of the database spectrum;
in view of the influence of sharp noise peaks and spectral boundaries on the scipy.signal.find_peaks function, the characteristic peak vectors corresponding to the characteristic peak positions are corrected a second time by fitting the local maxima at all characteristic peak positions of the spectrum and by specifying the relative intensity and width of the spectrum;
step 1.4, performing a secondary calibration on the extracted matching-spectrum characteristic peak vector P1 and database-spectrum characteristic peak vector P2, eliminating interference from spectral burr noise, addressing the inability of the scipy.signal.find_peaks function to identify spectral boundaries effectively, and determining the final characteristic peak vectors; the first and last elements of the vectors P1 and P2 extracted by the scipy.signal.find_peaks function are removed directly, and P1 and P2 are updated;
because the elements of P1 and P2 detected in step 1.3 form monotonically increasing sequences, and given the characteristics of the Raman spectrum of particulate matter, the difference between two consecutive elements is required to be more than 5 pixel points;
all elements of P1 and P2 are traversed; where two adjacent peak positions are less than 5 pixel points apart, their intensities are compared and the position with the larger intensity is kept as the vector element, or the adjacent peak positions are fitted and the extreme point is taken as the vector element; P1 and P2 are then updated;
the second method is a cosine similarity calculation method based on characteristic peaks, which provides a cosine similarity matching method applicable to a matching-spectrum characteristic peak vector P1 and a database-spectrum characteristic peak vector P2 whose characteristic peak positions differ, and is used to calculate the similarity between P1 and P2, and comprises the following specific steps:
step 2.1, converting the characteristic peak index into wavenumber: the characteristic peak positions in step 1.3 are detected on the pixel points of the spectral column vector, so the pixel index sequence of the characteristic peaks must be converted into wavenumber values;
the pixel index sequence is an increasing sequence starting from 1, so it can be mapped directly and uniquely onto increasing wavenumber values;
step 2.2, calibrating the characteristic peak positions in the matching-spectrum characteristic peak vector P1 and the database-spectrum characteristic peak vector P2 obtained in step 1.3;
all characteristic peak positions of P1 and P2 are traversed in sequence to find pairs of peak positions whose difference is smaller than a certain threshold; taking P1 as the standard, P2 is calibrated using a temporary-matching forced-conversion strategy, and P2 is updated;
the forced conversion that this strategy applies to the database spectrum affects only the current comparison between the matching spectrum and the database spectrum; it does not change the original characteristic peak positions or the number of characteristic peaks stored in the database;
step 2.3, calculating the cosine similarity based on characteristic peaks: a cosine similarity matching method is developed that is applicable to the vectors P1 and P2, which may differ in length after step 2.2;
the similarity calculation based on characteristic peaks may use the cosine of the angle between vectors, total quantity statistical matrix similarity, hypothesis testing, Euclidean distance, Mahalanobis distance or other methods;
the invention uses the cosine of the angle between vectors to calculate the similarity of characteristic peaks and develops a cosine similarity calculation method based on characteristic peaks; the conventional cosine similarity calculation formula for characteristic peaks is as follows:
$$\cos\theta = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^2}\,\sqrt{\sum_{i=1}^{N} y_i^2}} \tag{1}$$
here the characteristic peak refers to a column vector of the Raman spectrum, that is, the intensity values at the Raman shift pixel points; in the matching-spectrum column vector $X = (x_1, x_2, \ldots, x_N)$, $x_i$ denotes the i-th element, and in the database-spectrum column vector $Y = (y_1, y_2, \ldots, y_N)$, $y_i$ denotes the i-th element; X and Y are both required to have length N;
however, in practice the calibration standards, pixel resolutions and other properties differ between spectrometer manufacturers, so the column vectors of a matching spectrum and a database spectrum cannot be guaranteed to have the same length, which limits the applicability of the cosine similarity calculation formula (1);
replacing the vectors X and Y in formula (1) with the characteristic peak vectors P1 and P2 avoids the case in which formula (1) cannot be used because the column vectors of the matching spectrum and the database spectrum differ in length; the cosine similarity calculation formula based on characteristic peaks then becomes formula (2):
where A and B are the matching-spectrum characteristic peak vector P1 and the database-spectrum characteristic peak position vector P2, respectively; $a_i$ and $b_i$ are the i-th elements of P1 and P2, respectively; M is the number of identical elements contained in P1 and P2 after the peak position calibration of step 2.2; and the cosine similarity of the characteristic peaks of the matching spectrum and the database spectrum is calculated according to formula (2).
2. The Raman particulate detection method based on cosine similarity as claimed in claim 1, wherein the wavelet window width threshold is inversely proportional to the Raman spectral resolution of the particulate matter, and the higher the Raman spectral resolution of the particulate matter is, the smaller the wavelet window width threshold is set.
3. The cosine similarity-based Raman particulate detection method according to claim 1, wherein the wavelet window width threshold is proportional to the Raman spectral characteristic peak width of the particulate, and the wider the Raman spectral characteristic peak of the particulate, the larger the wavelet window width threshold is set.
4. The Raman particulate detection method based on cosine similarity as claimed in claim 1, wherein the wavelet window width threshold is inversely proportional to the density of the Raman spectral characteristic peaks of the particulate matter; the denser the characteristic peaks of the Raman spectrum of the particulate matter, the smaller the wavelet window width threshold is set.
5. The Raman detection method of particulate matter based on cosine similarity according to claim 1, wherein the normalization process comprises: min-max normalization, z-score normalization, or decimal scaling normalization.
6. The Raman detection method for particulate matter based on cosine similarity according to claim 1, wherein the wavelet window height threshold can be set to between 5% and 20% of the maximum peak height; the wavelet window width threshold can vary over a span of 5 to 30 pixels.
7. The Raman detection method for particulate matter based on cosine similarity as claimed in claim 1, wherein the characteristic peak position refers to a spectral characteristic peak index, that is, the Raman shift pixel point index sequence corresponding to the elements in P1 and P2, and the Raman shift can be expressed as a pixel index sequence, as a wavenumber in cm⁻¹, or as a wavelength in nm.
8. The Raman detection method for particulate matter based on cosine similarity as claimed in claim 1, wherein the detection of the characteristic peak position is optimized based on a scipy.
9. The Raman detection method of particulate matter based on cosine similarity according to claim 1, wherein the fitting method can be Lorentzian fitting, Gaussian fitting or polynomial fitting.
10. The Raman detection method for particulate matter based on cosine similarity according to claim 1, wherein the certain threshold is set according to the spectral resolution; the certain threshold is 5 spectral pixel points, or 5 × PN, where PN is the wavenumber span between two pixel points of the Raman spectrum.
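The normalization options named in claim 5 can be sketched as follows. The function names are hypothetical, and the third function reflects an assumption about what the third option means: every intensity is divided by the smallest power of ten that brings the largest absolute value below 1.

```python
import numpy as np


def min_max(y):
    """Rescale intensities linearly to the range [0, 1]."""
    y = np.asarray(y, dtype=float)
    span = y.max() - y.min()
    return (y - y.min()) / span if span > 0 else np.zeros_like(y)


def z_score(y):
    """Centre on the mean and divide by the population standard deviation."""
    y = np.asarray(y, dtype=float)
    std = y.std()
    return (y - y.mean()) / std if std > 0 else np.zeros_like(y)


def decimal_scaling(y):
    """Divide by 10**j, with j the smallest integer giving max|y| < 1."""
    y = np.asarray(y, dtype=float)
    m = np.max(np.abs(y))
    if m == 0:
        return y.copy()
    j = int(np.floor(np.log10(m))) + 1
    return y / (10.0 ** j)
```

Of the three, min-max scaling is the one that makes a height threshold expressed as a percentage of the maximum peak height directly usable, because the strongest peak is mapped to 1.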
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210829708.2A CN114993891B (en) | 2022-07-14 | 2022-07-14 | Particle Raman detection method based on cosine similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114993891A (en) | 2022-09-02 |
CN114993891B (en) | 2024-04-19 |
Family
ID=83022103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210829708.2A Active CN114993891B (en) | 2022-07-14 | 2022-07-14 | Particle Raman detection method based on cosine similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114993891B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104777143A (en) * | 2014-01-15 | 2015-07-15 | 中国人民解放军第二军医大学 | Method for similarity identification of expired drugs based on Raman spectroscopy |
CN108918499A (en) * | 2018-06-28 | 2018-11-30 | 华南师范大学 | The method of Raman baseline drift is removed in Raman map |
CN110243806A (en) * | 2019-07-30 | 2019-09-17 | 江南大学 | Component of mixture recognition methods under Raman spectrum based on similarity |
US20200397353A1 (en) * | 2019-06-18 | 2020-12-24 | Samsung Electronics Co., Ltd. | Apparatus and method for measuring raman spectrum |
US20210364441A1 (en) * | 2020-05-19 | 2021-11-25 | Jiangnan University | Method for improving identification accuracy of mixture components by using known mixture raman spectrum |
CN114330411A (en) * | 2021-11-16 | 2022-04-12 | 安徽中科赛飞尔科技有限公司 | Self-adaptive windowed Raman spectrum identification method based on similarity |
Non-Patent Citations (1)
Title |
---|
史永刚; 王国民; 李华峰; 刘毅; 梅林: "Similarity measure methods for laser Raman spectra" (激光拉曼光谱相似性测度方法), Modern Scientific Instruments (现代科学仪器), no. 04, 15 August 2011 (2011-08-15) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116216943A (en) * | 2023-02-14 | 2023-06-06 | 江苏博凌环境科技有限公司 | Biochemical ecological integration circulation flow-making platform equipment control system |
CN116216943B (en) * | 2023-02-14 | 2023-10-27 | 江苏博凌环境科技有限公司 | Biochemical ecological integration circulation flow-making platform equipment control system |
CN116713892A (en) * | 2023-08-10 | 2023-09-08 | 北京特思迪半导体设备有限公司 | Endpoint detection method and apparatus for wafer film grinding |
CN116713892B (en) * | 2023-08-10 | 2023-11-10 | 北京特思迪半导体设备有限公司 | Endpoint detection method and apparatus for wafer film grinding |
Also Published As
Publication number | Publication date |
---|---|
CN114993891B (en) | 2024-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114993891B (en) | Particle Raman detection method based on cosine similarity | |
CN110243806B (en) | Mixture component identification method based on similarity under Raman spectrum | |
US20220383979A1 (en) | Nucleic acid mass spectrum numerical processing method | |
KR101538843B1 (en) | Yield management system and method for root cause analysis using manufacturing sensor data | |
CN112557332B (en) | Spectrum segmentation and spectrum comparison method based on spectrum peak-splitting fitting | |
US20230243744A1 (en) | Method and system for automatically detecting and reconstructing spectrum peaks in near infrared spectrum analysis of tea | |
Yang et al. | Spectral feature extraction based on continuous wavelet transform and image segmentation for peak detection | |
CN113109317A (en) | Raman spectrum quantitative analysis method and system based on background subtraction extraction peak area | |
US20240219299A1 (en) | Ir spectra matching systems and methods | |
CN114155200B (en) | Remote sensing image change detection method based on convolutional neural network | |
CN105528580A (en) | Hyperspectral curve matching method based on absorption peak characteristic | |
CN105718723B (en) | Spectrum peak position detection method in a kind of mass spectrometric data processing | |
CN114609319B (en) | Spectral peak identification method and system based on noise estimation | |
Barburiceanu et al. | An improved feature extraction method for texture classification with increased noise robustness | |
CN115420726A (en) | Method for rapidly identifying target object by using reconstructed SERS spectrum | |
CN109283153B (en) | Method for establishing quantitative analysis model of soy sauce | |
CN108764097B (en) | High-spectrum remote sensing image target identification method based on segmented sparse representation | |
CN114330411A (en) | Self-adaptive windowed Raman spectrum identification method based on similarity | |
CN111292346B (en) | Method for detecting contour of casting box body in noise environment | |
CN115078281B (en) | Water body substance component detection and calculation method based on picture spectral similarity | |
CN114170145B (en) | Heterogeneous remote sensing image change detection method based on multi-scale self-coding | |
CN109359678B (en) | High-precision classification recognition algorithm for liquor atlas | |
El_Tokhy | Rapid and robust radioisotopes identification algorithms of X-Ray and gamma spectra | |
CN108007913B (en) | Spectrum processing device, method and medicine authenticity judging system | |
CN118010649B (en) | Pollution detection method for food |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||