CN114993891A - Raman particulate matter detection method based on cosine similarity - Google Patents

Info

Publication number
CN114993891A
Authority
CN
China
Prior art keywords
spectrum
characteristic peak
vector
raman
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210829708.2A
Other languages
Chinese (zh)
Other versions
CN114993891B (en)
Inventor
李新立
刘闯
赵银苹
洪喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changguang Chenying Hangzhou Scientific Instrument Co ltd
Original Assignee
Changguang Chenying Hangzhou Scientific Instrument Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changguang Chenying Hangzhou Scientific Instrument Co ltd
Priority to CN202210829708.2A
Publication of CN114993891A
Application granted
Publication of CN114993891B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light, optically excited
    • G01N21/65 Raman scattering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Biochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Dispersion Chemistry (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

The invention provides a Raman detection method for particulate matter based on cosine similarity, combining Raman spectroscopy with characteristic peak position extraction and similarity evaluation. It removes the restriction that cosine similarity can only be calculated between spectra with the same number of pixels, calibrates the characteristic peak positions of spectra acquired at different resolutions or with different correction methods through a peak position correction step, and improves the accuracy of characteristic-peak-based spectral similarity matching.

Description

Raman particulate matter detection method based on cosine similarity
Technical Field
The invention relates to the technical field of Raman detection, in particular to a Raman detection method for particles based on cosine similarity.
Background
Insoluble particles (also called particulate matter) are an important test item in pharmacopoeia specifications, and the medication safety problems caused by insoluble particles in injections have drawn the attention of many researchers; owing to the limitations of traditional detection methods, the detection rate of insoluble particles in injections is low, which adversely affects injection quality control.
The Raman spectrum of the particles in an injection is the fingerprint spectrum of those particles; it contains molecular information on both known and unknown components and is information-rich and highly characteristic. The chemical information reflected by the particle Raman spectrum (the relative peak positions) is highly specific, making it an effective basis for particle identification, safety evaluation and quality control. In the prior art, a commonly used method for detecting Raman spectral characteristic peaks is Python's scipy.signal.find_peaks function, which is particularly sensitive to noisy spectral signals, especially sharp spikes in noisy data; if the wavelet window or the relative intensity threshold is set improperly, sharp noise may shift the position of a local maximum, and the function adapts poorly to spectral peaks of different shapes. In addition, the function cannot effectively identify the spectral boundary, which is always included in the extracted peak position sequence, so the output of Python's scipy.signal.find_peaks function needs further correction.
Cosine-similarity-based Raman detection of particulate matter measures the similarity between particles by the cosine of the angle, in the inner product space, between the column vectors of two particle spectra: when the cosine of the angle between the two vectors equals 1, the two spectra coincide completely; when it is close to 1, the two spectra are similar; and the smaller the cosine, the less similar the two spectra. In the prior art, Raman detection based on cosine similarity requires the two spectrum column vectors being matched to have the same length (that is, the same number of pixel points), which strictly limits the applicability of the method and makes it unsuitable for wide application.
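As a minimal sketch of the conventional calculation described above, the following Python function computes the cosine of the angle between two intensity vectors and rejects inputs of unequal length. The function name `cosine_similarity` and the sample intensity values are illustrative assumptions and do not come from the source.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length spectrum vectors."""
    if len(x) != len(y):
        # The conventional formula is only defined for vectors of equal length.
        raise ValueError("spectra must have the same number of pixel points")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero spectrum")
    return dot / (norm_x * norm_y)


# Hypothetical intensity vectors, for illustration only.
spectrum_a = [0.1, 0.9, 0.3, 0.0]
spectrum_b = [0.2, 1.8, 0.6, 0.0]  # same shape as spectrum_a, scaled by 2
spectrum_c = [0.9, 0.1, 0.0, 0.3]  # same values in a different order

same_shape = cosine_similarity(spectrum_a, spectrum_b)
different_shape = cosine_similarity(spectrum_a, spectrum_c)
```

Because the result depends only on the angle between the vectors, rescaling a spectrum leaves the similarity unchanged, while the length check makes the equal-pixel-count limitation explicit.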
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a Raman detection method for particulate matter based on cosine similarity. Through a two-part design, consisting of a method for extracting characteristic peaks from the Raman spectrum of the particulate matter and a characteristic-peak-based cosine similarity calculation method, it combines Raman spectroscopy with characteristic peak extraction and similarity evaluation to provide a method for detecting and identifying particulate matter, advancing injection quality testing.
A Raman detection method for particulate matter based on cosine similarity comprises two parts: a particulate matter Raman spectrum characteristic peak extraction method and a characteristic-peak-based cosine similarity calculation method, wherein:
the first method is the particulate matter Raman spectrum characteristic peak extraction method, which can effectively identify sharp spectral noise and spectral boundaries, detect characteristic peak positions at different scales and amplitudes, improve the recognition rate of spectral characteristic peaks, and thereby improve the matching accuracy of the characteristic-peak-based cosine similarity calculation method; its specific steps are as follows:
step 1.1, calculating the Raman spectrum resolution of the particles, and setting a wavelet window width threshold according to the Raman spectrum resolution of the particles;
further, the Raman spectral resolution of the particulate matter can be represented by the Raman shift difference between two adjacent pixel points of the spectrum;
further, the wavelet window width threshold is inversely proportional to the Raman spectral resolution of the particulate matter: the higher the resolution, the smaller the wavelet window width threshold is set;
further, the wavelet window width threshold is directly proportional to the width of the Raman characteristic peaks of the particulate matter (i.e. the Raman shift span of a characteristic peak): the wider the characteristic peaks, the larger the wavelet window width threshold is set;
further, the wavelet window width threshold is inversely proportional to the density of the Raman characteristic peaks of the particulate matter: the denser the characteristic peaks, the smaller the wavelet window width threshold is set;
as an illustration, the wavelet window width threshold can vary over a span of 5-30 pixel points;
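Step 1.1 can be sketched as follows, assuming the resolution PN is taken as the mean Raman shift difference between adjacent pixels and the width threshold is expressed as a pixel count multiplied by PN. The helper names, the default of 16 pixels and the synthetic Raman shift axis are assumptions made for illustration; the only constraint taken from the text is the 5-30 pixel span.

```python
def spectral_resolution(raman_shift):
    """Mean Raman shift difference between adjacent pixel points (cm^-1 per pixel)."""
    if len(raman_shift) < 2:
        raise ValueError("at least two pixel points are required")
    diffs = [b - a for a, b in zip(raman_shift, raman_shift[1:])]
    return sum(diffs) / len(diffs)


def window_width_threshold(raman_shift, n_pixels=16, lower=5, upper=30):
    """Width threshold in cm^-1: a pixel span clamped to [lower, upper] times PN."""
    span = min(max(n_pixels, lower), upper)
    return span * spectral_resolution(raman_shift)


# Hypothetical uniformly sampled axis: 500 pixels, 2 cm^-1 apart, starting at 200 cm^-1.
raman_shift = [200 + 2 * i for i in range(500)]

pn = spectral_resolution(raman_shift)                            # 2.0 cm^-1 per pixel
width = window_width_threshold(raman_shift)                      # 16 * 2.0 = 32.0 cm^-1
too_wide = window_width_threshold(raman_shift, n_pixels=50)      # clamped to 30 pixels
```

Expressing the threshold as a multiple of PN means a finer pixel spacing automatically yields a smaller threshold in wavenumber units.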
step 1.2, setting a wavelet window height threshold according to the relative intensity of the Raman spectrum of the particulate matter;
as an illustration, the relative intensity of the Raman spectrum refers to the relative height of the characteristic peaks after the Raman spectrum has been normalized;
as an illustration, the normalization may be min-max normalization, z-score normalization, decimal scaling normalization, etc.;
as an illustration, the wavelet window height threshold may be set between 5% and 20% of the maximum peak height.
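The three normalization options and the height threshold of step 1.2 can be sketched in plain Python as below. The function names, the use of the population standard deviation for the z-score, and the raw intensity values are assumptions; the 5%-20% range is the one stated in the text.

```python
import math


def min_max_normalize(intensity):
    """Rescale intensities linearly onto [0, 1]."""
    lo, hi = min(intensity), max(intensity)
    if hi == lo:
        raise ValueError("a flat spectrum cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in intensity]


def z_score_normalize(intensity):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(intensity) / len(intensity)
    std = math.sqrt(sum((v - mean) ** 2 for v in intensity) / len(intensity))
    if std == 0.0:
        raise ValueError("a flat spectrum cannot be z-score normalized")
    return [(v - mean) / std for v in intensity]


def decimal_scaling_normalize(intensity):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    peak = max(abs(v) for v in intensity)
    power = 0
    while peak / (10 ** power) >= 1:
        power += 1
    return [v / (10 ** power) for v in intensity]


def height_threshold(normalized, fraction=0.08):
    """Height threshold as a fraction (5%-20%) of the maximum peak height."""
    if not 0.05 <= fraction <= 0.20:
        raise ValueError("fraction should lie between 0.05 and 0.20")
    return fraction * max(normalized)


# Hypothetical raw detector counts, for illustration only.
raw = [120.0, 150.0, 900.0, 300.0, 130.0]
normalized = min_max_normalize(raw)
threshold = height_threshold(normalized)   # 8% of the tallest normalized peak
```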
Step 1.3, according to the wavelet window width threshold and the wavelet window height threshold set in step 1.1 and step 1.2, performing characteristic peak position detection on the particulate matter matching spectrum and the database spectrum using continuous wavelet transform, wherein the characteristic peak positions detected in the matching spectrum and the database spectrum are stored in a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2, respectively;
as an illustration, the characteristic peak position refers to the spectral characteristic peak index, that is, the elements of P1 and P2 are Raman shift pixel point indices; the Raman shift may be expressed as a pixel index, a wavenumber (cm⁻¹), a wavelength (nm), etc.;
by way of illustration, the elements of the matching spectrum characteristic peak vector P1 represent the characteristic peak positions of the matching spectrum, and the length of P1, that is, the number of elements in the vector, represents the number of characteristic peaks of the matching spectrum;
by way of illustration, the elements of the database spectrum characteristic peak vector P2 represent the characteristic peak positions of the database spectrum, and the length of P2, that is, the number of elements in the vector, represents the number of characteristic peaks of the database spectrum;
as an example, the method optimizes characteristic peak position detection on the basis of the scipy.signal.find_peaks function;
further, given the influence of sharp noise peaks and spectral boundaries on the scipy.signal.find_peaks function, the invention performs a secondary correction of the characteristic peak vector by fitting the local maxima of all characteristic peak positions of the spectrum and specifying the relative intensity and width of the spectrum, as detailed in step 1.4.
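The text names both continuous wavelet transform and scipy.signal.find_peaks but gives no code, so the following is only a sketch of how a first-pass peak search might look with `scipy.signal.find_peaks`. Mapping the height threshold to the `height` argument and a minimum pixel separation to the `distance` argument is an assumption, as is the synthetic spectrum built from two Gaussian peaks and one weak spike.

```python
import math

from scipy.signal import find_peaks


def gaussian(i, center, sigma, amplitude):
    """Value of a Gaussian peak at pixel index i."""
    return amplitude * math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))


# Hypothetical normalized spectrum: two Gaussian peaks plus one weak single-pixel spike.
n_pixels = 200
spectrum = [
    gaussian(i, 50, 4.0, 1.0) + gaussian(i, 120, 4.0, 0.5)
    for i in range(n_pixels)
]
spectrum[90] += 0.05  # spike below the height threshold

height_threshold = 0.08 * max(spectrum)  # 8% of the maximum peak height
min_separation = 5                       # minimum spacing between peaks, in pixels

peak_indices, properties = find_peaks(
    spectrum, height=height_threshold, distance=min_separation
)
P1 = [int(i) for i in peak_indices]      # characteristic peak vector as pixel indices
```

SciPy also offers `scipy.signal.find_peaks_cwt` for wavelet-based detection; which of the two routines the method actually builds on in practice is not something this sketch settles.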
Step 1.4, performing secondary calibration on the extracted matching spectrum characteristic peak vector P1 and database spectrum characteristic peak vector P2, eliminating the interference from spectral spike noise encountered in step 1.3, solving the problem that the scipy.signal.find_peaks function cannot effectively identify the spectral boundary, and determining the final characteristic peak vectors;
directly removing the first and last elements of the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 extracted by the scipy.signal.find_peaks function, and updating P1 and P2;
because the elements of P1 and P2 detected in step 1.3 form a monotonically increasing sequence, and according to the Raman spectral characteristics of particulate matter, the difference between two consecutive elements is required to be greater than 5 pixel points;
further, traversing all elements of P1 and P2, comparing the intensities of any two adjacent peak positions that are less than 5 pixel points apart and keeping the position with the larger intensity as the vector element, or fitting the adjacent peak positions and taking the extremum as the vector element, and updating P1 and P2;
as an illustration, the fitting method may be a Lorentzian fit, a Gaussian fit, or a polynomial fit;
as an example, the particulate matter Raman spectrum characteristic peak extraction method mainly consists of peak searching (step 1.3) and correction (step 1.4) on the Raman spectrum normalized in step 1.2: the Raman spectrum is decomposed according to the window width and height, the characteristic peak positions are determined by wavelet transform, and thresholds are set for a secondary calibration of the characteristic peak positions, so as to improve the recognition rate of spectral characteristic peaks.
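The secondary correction of step 1.4 can be sketched as below, assuming the intensity-comparison variant rather than the fitting variant. The function name `secondary_correction` and the sample data are hypothetical; the unconditional removal of the first and last elements and the 5-pixel spacing follow the description above.

```python
def secondary_correction(peaks, intensity, min_gap=5):
    """Drop the first and last detected positions, then merge peaks closer than min_gap."""
    trimmed = sorted(peaks)[1:-1]  # boundary candidates are removed unconditionally
    corrected = []
    for position in trimmed:
        if corrected and position - corrected[-1] < min_gap:
            # Two candidates lie within min_gap pixels: keep the more intense one.
            if intensity[position] > intensity[corrected[-1]]:
                corrected[-1] = position
        else:
            corrected.append(position)
    return corrected


# Hypothetical 60-pixel spectrum with a few non-zero intensities.
intensity = [0.0] * 60
for index, value in [(0, 0.2), (10, 0.6), (12, 0.9), (30, 0.5), (33, 0.4), (59, 0.3)]:
    intensity[index] = value

detected = [0, 10, 12, 30, 33, 59]      # raw output of the peak search, boundaries included
final_peaks = secondary_correction(detected, intensity)
```

Here pixels 10 and 12 collapse onto the stronger peak at 12, and pixels 30 and 33 collapse onto the stronger peak at 30. Replacing the comparison with a Lorentzian, Gaussian or polynomial fit over the neighbouring points would give the fitting variant mentioned in the text.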
The second method is the characteristic-peak-based cosine similarity calculation method: a cosine similarity matching method is developed that is applicable to a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2 with different characteristic peak positions, and is used to calculate the similarity between P1 and P2; its specific steps are as follows:
step 2.1, converting the characteristic peak indices into wavenumbers: the characteristic peak position detection in step 1.3 operates on the pixel points of the spectrum column vector, so the pixel index sequence of the characteristic peaks must be converted into wavenumber values;
further, since the pixel index sequence is an increasing sequence starting from 1, it can be mapped directly and uniquely onto increasing wavenumber values;
step 2.2, calibrating the characteristic peak positions in the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 obtained from step 1.3;
further, sequentially traversing all characteristic peak positions (elements) of P1 and P2, and when the difference between a peak position in P1 and a peak position in P2 is smaller than a certain threshold, calibrating P2 against P1 as the standard using a temporary-matching forced conversion strategy, and updating P2;
the forced conversion applied to the database spectrum by the temporary-matching forced conversion strategy acts only on the current match between the matching spectrum and the database spectrum, and does not change the original characteristic peak positions or the number of characteristic peaks stored in the database;
as an illustration, the certain threshold is set according to the spectral resolution;
as an example, the certain threshold is 5 spectral pixels, or 5 × PN, where PN is the wavenumber span between two adjacent pixels of the Raman spectrum;
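Steps 2.1 and 2.2 can be sketched together as follows. The helper names, the nearest-peak matching rule and the synthetic Raman shift axis are assumptions; the text only states that P2 peaks within the threshold of a P1 peak are temporarily converted to the P1 value without altering the database. Note that Python indices are 0-based, whereas the text counts pixels from 1.

```python
def indices_to_wavenumbers(peak_indices, raman_shift):
    """Map 0-based pixel indices to wavenumbers using the spectrum's Raman shift axis."""
    return [raman_shift[i] for i in peak_indices]


def calibrate_to_reference(p1, p2, threshold):
    """Return a temporary copy of p2 with peaks near a p1 peak snapped onto it.

    The stored database vector p2 itself is left unchanged.
    """
    calibrated = []
    for peak in p2:
        nearest = min(p1, key=lambda ref: abs(ref - peak)) if p1 else None
        if nearest is not None and abs(nearest - peak) < threshold:
            calibrated.append(nearest)   # forced conversion for this match only
        else:
            calibrated.append(peak)      # no P1 peak close enough: keep as detected
    return calibrated


# Hypothetical axis: 500 pixels, PN = 2 cm^-1 per pixel, starting at 200 cm^-1.
raman_shift = [200.0 + 2.0 * i for i in range(500)]
pn = 2.0
threshold = 5 * pn                       # 5 x PN, as in the example above

p1 = indices_to_wavenumbers([50, 150, 300], raman_shift)       # matching spectrum peaks
p2 = indices_to_wavenumbers([52, 149, 240, 310], raman_shift)  # database spectrum peaks

p2_calibrated = calibrate_to_reference(p1, p2, threshold)
shared = len(set(p1) & set(p2_calibrated))  # number of identical peak positions
```

In this example the database peaks at 304 and 498 cm⁻¹ are converted to 300 and 500 cm⁻¹, while those at 680 and 820 cm⁻¹ are left as detected because no P1 peak lies within 10 cm⁻¹. The sketch does not handle the case where two database peaks fall within the threshold of the same P1 peak.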
step 2.3, characteristic-peak-based cosine similarity calculation: developing a cosine similarity matching method applicable to the vectors P1 and P2 of different lengths obtained after step 2.2;
furthermore, characteristic-peak-based similarity may be calculated with the cosine of the angle between vectors, total quantity statistical matrix similarity, hypothesis testing, Euclidean distance, Mahalanobis distance and other methods;
furthermore, the invention uses the cosine of the angle between vectors to calculate characteristic peak similarity and develops a characteristic-peak-based cosine similarity calculation method; the conventional cosine similarity calculation formula is:
$$\cos(X, Y) = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\,\sqrt{\sum_{i=1}^{N} y_i^{2}}} \tag{1}$$
here the characteristic peak refers to the column vector of the Raman spectrum, that is, the intensity values at the Raman shift pixel points; in the matching spectrum column vector $X = (x_1, x_2, \dots, x_N)$, $x_i$ denotes the i-th element, and in the database spectrum column vector $Y = (y_1, y_2, \dots, y_N)$, $y_i$ denotes the i-th element, where X and Y are both required to have length N;
however, under existing conditions, because calibration standards, pixel resolutions and the like differ between spectrometer manufacturers, the column vectors of a matching spectrum and a database spectrum cannot be guaranteed to have the same length, which limits the applicability of the cosine similarity calculation formula (1);
furthermore, replacing the vectors X and Y in formula (1) with the characteristic peak vectors P1 and P2 effectively avoids the situation in which formula (1) cannot be used because the column vectors of the matching spectrum and the database spectrum differ in length, and the characteristic-peak-based cosine similarity calculation formula becomes:
$$\cos(A, B) = \frac{\sum_{i=1}^{M} a_i b_i}{\sqrt{\sum_{i=1}^{M} a_i^{2}}\,\sqrt{\sum_{i=1}^{M} b_i^{2}}} \tag{2}$$
wherein A and B are the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak position vector P2, respectively, $a_i$ and $b_i$ are the i-th elements of P1 and P2, respectively, and M is the number of identical elements contained in P1 and P2 after the peak position calibration of step 2.2; the characteristic peak cosine similarity between the matching spectrum and the database spectrum is calculated according to formula (2).
As an example, the characteristic-peak-based cosine similarity calculation method mainly provides a cosine similarity matching method applicable to two feature vectors of different lengths, and is used to evaluate the similarity between a detected matching spectrum and a database spectrum.
The invention has the beneficial effects that:
the invention provides a Raman detection method for particulate matter based on cosine similarity. First, by optimizing the scipy.signal.find_peaks peak position detection function, it provides a method for extracting characteristic peak positions from the Raman spectrum of particulate matter that identifies sharp spectral noise and spectral boundaries, detects characteristic peak positions at different scales and amplitudes, and improves the recognition rate of spectral characteristic peaks; second, in the characteristic-peak-based cosine similarity calculation method, it provides a vector cosine similarity calculation for differing numbers and positions of characteristic peaks, which removes the restriction that cosine similarity can only be calculated between spectra with the same number of pixels, calibrates the characteristic peak positions of spectra acquired at different resolutions or with different correction methods through a peak position correction step, and improves the accuracy of characteristic-peak-based spectral similarity matching.
By combining Raman spectroscopy with feature extraction and similarity evaluation, the invention provides a method for detecting and identifying particulate matter, which is an effective means of particle identification, safety evaluation and quality control.
Drawings
FIG. 1 is a schematic diagram of the overall process design of the Raman particulate matter detection method based on cosine similarity according to the invention.
FIG. 2 is a schematic diagram of the characteristic peak positions of the matching spectrum in the Raman particulate matter detection method based on cosine similarity according to the invention.
FIG. 3 is a schematic diagram of the characteristic peak positions of the database spectrum in the Raman particulate matter detection method based on cosine similarity according to the invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail with reference to fig. 1 to 3.
A Raman detection method for particulate matter based on cosine similarity comprises a particulate matter Raman spectrum characteristic peak extraction method 100 and a characteristic-peak-based cosine similarity calculation method 200, wherein:
the first method is the particulate matter Raman spectrum characteristic peak extraction method 100, which can effectively identify sharp spectral noise and spectral boundaries, detect characteristic peak positions at different scales and amplitudes, improve the recognition rate of spectral characteristic peaks, and thereby improve the matching accuracy of the characteristic-peak-based cosine similarity calculation method; its specific steps are as follows:
step 1.1, calculating the Raman spectrum resolution of the particles, and setting a wavelet window width threshold 101 according to the Raman spectrum resolution of the particles;
FIG. 2 shows the matching spectrum S1 and FIG. 3 shows the database spectrum S2; the spectral resolution of both S1 and S2 is uniform across the spectrum and equals 2 cm⁻¹.
The wavelet window width threshold is set to 16 × PN, where PN is the spectral resolution of S1 or S2;
further, the Raman spectral resolution of the particulate matter can be represented by the Raman shift difference between two adjacent pixel points of the spectrum;
further, the wavelet window width threshold is inversely proportional to the Raman spectral resolution of the particulate matter 105: the higher the resolution, the smaller the wavelet window width threshold is set;
further, the wavelet window width threshold is directly proportional to the width of the Raman characteristic peaks of the particulate matter 106 (i.e. the Raman shift span of a characteristic peak): the wider the characteristic peaks, the larger the wavelet window width threshold is set;
further, the wavelet window width threshold is inversely proportional to the density of the Raman characteristic peaks of the particulate matter 107: the denser the characteristic peaks, the smaller the wavelet window width threshold is set;
as an illustration, the wavelet window width threshold can vary over a span of 5-30 pixel points;
step 1.2, setting a wavelet window height threshold 102 according to the relative intensity of the Raman spectrum of the particulate matter; here the wavelet window height threshold is set to 0.08, so that the smaller characteristic peaks in the database spectrum S2 and the matching spectrum S1 can be detected;
as an illustration, the relative intensity of the Raman spectrum refers to the relative height of the characteristic peaks after the Raman spectrum has been normalized;
as an illustration, the normalization may be min-max normalization, z-score normalization, decimal scaling normalization, etc.;
as an illustration, the wavelet window height threshold may be set between 5% and 20% of the maximum peak height.
Step 1.3, according to the wavelet window width threshold and the wavelet window height threshold set in step 1.1 and step 1.2, performing characteristic peak position detection 103 on the particulate matter matching spectrum and the database spectrum using continuous wavelet transform, wherein the characteristic peak positions detected in the matching spectrum S1 and the database spectrum S2 are stored in a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2, respectively;
as an illustration, the characteristic peak position refers to the spectral characteristic peak index, that is, the elements of P1 and P2 are Raman shift pixel point indices; the Raman shift may be expressed as a pixel index, a wavenumber (cm⁻¹), a wavelength (nm), etc.;
by way of illustration, the elements of the matching spectrum characteristic peak vector P1 represent the characteristic peak positions of the matching spectrum, and the length of P1, that is, the number of elements in the vector, represents the number of characteristic peaks of the matching spectrum;
by way of illustration, the elements of the database spectrum characteristic peak vector P2 represent the characteristic peak positions of the database spectrum, and the length of P2, that is, the number of elements in the vector, represents the number of characteristic peaks of the database spectrum;
as an example, the method optimizes characteristic peak position detection on the basis of the scipy.signal.find_peaks function;
further, given the influence of sharp noise peaks and spectral boundaries on the scipy.signal.find_peaks function, the invention performs a secondary correction of the characteristic peak vector by fitting the local maxima of all characteristic peak positions of the spectrum and specifying the relative intensity and width of the spectrum, as detailed in step 1.4.
Step 1.4, performing secondary calibration on the extracted matching spectrum characteristic peak vector P1 and database spectrum characteristic peak vector P2, eliminating the interference from spectral spike noise encountered in step 1.3, solving the problem that the scipy.signal.find_peaks function cannot effectively identify the spectral boundary, and determining the final characteristic peak vectors 104;
directly removing the first and last elements of the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 extracted by the scipy.signal.find_peaks function, and updating P1 and P2;
because the elements of P1 and P2 detected in step 1.3 form a monotonically increasing sequence, and according to the Raman spectral characteristics of particulate matter, the difference between two consecutive elements is required to be greater than 5 pixel points;
furthermore, traversing all elements of P1 and P2, comparing the intensities of any two adjacent peak positions that are less than 5 pixel points apart and keeping the position with the larger intensity as the vector element, or fitting the adjacent peak positions and taking the extremum as the vector element, and updating P1 and P2;
as an illustration, the fitting method may be a Lorentzian fit, a Gaussian fit, or a polynomial fit;
as an example, the particulate matter Raman spectrum characteristic peak extraction method mainly consists of peak searching (step 1.3) and correction (step 1.4) on the Raman spectrum normalized in step 1.2: the Raman spectrum is decomposed according to the window width and height, the characteristic peak positions are determined by wavelet transform, and thresholds are set for a secondary calibration of the characteristic peak positions, so as to improve the recognition rate of spectral characteristic peaks.
The second method 200 is the characteristic-peak-based cosine similarity calculation method: a cosine similarity matching method is developed that is applicable to a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2 with different characteristic peak positions, and is used to calculate the similarity between P1 and P2; its specific steps are as follows:
step 2.1, converting the characteristic peak indices into wavenumbers 201: the characteristic peak position detection in step 1.3 operates on the pixel points of the spectrum column vector, so the pixel index sequence of the characteristic peaks must be converted into wavenumber values;
further, since the pixel index sequence is an increasing sequence starting from 1, it can be mapped directly and uniquely onto increasing wavenumber values;
step 2.2, calibrating 202 the characteristic peak positions in the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 obtained from step 1.3;
further, sequentially traversing all characteristic peak positions (elements) of P1 and P2, finding the peak positions in P1 and P2 whose difference is smaller than a certain threshold, calibrating P2 against P1 as the standard using a temporary-matching forced conversion strategy, and updating P2;
the forced conversion applied to the database spectrum by the temporary-matching forced conversion strategy acts only on the current match between the matching spectrum and the database spectrum, and does not change the original characteristic peak positions or the number of characteristic peaks stored in the database;
as an illustration, the certain threshold is set according to the spectral resolution;
as an example, the certain threshold is 5 spectral pixels, or 5 × PN, where PN is the wavenumber span between two adjacent pixels of the Raman spectrum;
step 2.3, characteristic-peak-based cosine similarity calculation 203: developing a cosine similarity matching method applicable to the vectors P1 and P2 of different lengths obtained after step 2.2;
furthermore, characteristic-peak-based similarity may be calculated with the cosine of the angle between vectors, total quantity statistical matrix similarity, hypothesis testing, Euclidean distance, Mahalanobis distance and other methods;
furthermore, the invention uses the cosine of the angle between vectors to calculate characteristic peak similarity and develops a characteristic-peak-based cosine similarity calculation method; the conventional cosine similarity calculation formula is:
$$\cos(X, Y) = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\,\sqrt{\sum_{i=1}^{N} y_i^{2}}} \tag{1}$$
here the characteristic peak refers to a column vector of the Raman spectrum, that is, the intensity values corresponding to the Raman shift pixel points; in the matching spectrum column vector X = (x_1, x_2, …, x_N), x_i denotes the ith element; in the database spectrum column vector Y = (y_1, y_2, …, y_N), y_i denotes the ith element; X and Y are both required to have length N;
however, in practice the calibration standards, pixel resolutions and the like differ between spectrometer manufacturers, so the column vectors of a matching spectrum and a database spectrum cannot be guaranteed to have the same length, which limits the applicability of the cosine similarity calculation formula (1);
furthermore, replacing the vectors X and Y in formula (1) with the characteristic peak vectors P1 and P2 avoids the situation in which formula (1) cannot be used because the column vectors of the matching spectrum and the database spectrum differ in length; the cosine similarity calculation formula based on characteristic peaks becomes:
[Formula (2): cosine similarity of the characteristic peak vectors A and B; equation image not reproduced]
wherein A and B are the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2, respectively; a_i and b_i are the ith elements of P1 and P2, respectively; M is the number of identical elements contained in P1 and P2 after the peak position calibration of step 2.2; the cosine similarity of the characteristic peaks of the matching spectrum and the database spectrum is calculated according to formula (2).
As an example, the cosine similarity calculation method based on characteristic peaks provides an applicable cosine similarity matching method for two characteristic vectors of different lengths, and is used to evaluate the similarity between a measured matching spectrum and a database spectrum.
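The sketch below is one possible reading of the characteristic-peak cosine similarity, not the patent's exact formula (2). It assumes that the numerator runs over the M peak positions that are identical in P1 and the calibrated P2, while each norm in the denominator runs over the full vector, so that unmatched peaks lower the score. The function name `peak_cosine_similarity` and the example vectors are hypothetical.

```python
import numpy as np

def peak_cosine_similarity(p1, p2_calibrated):
    """Cosine-style similarity of two peak-position vectors of unequal length.

    Assumption: shared positions contribute a_i * b_i (with a_i == b_i after
    calibration); the denominator uses the norms of the complete vectors.
    """
    a = np.asarray(p1, dtype=float)
    b = np.asarray(p2_calibrated, dtype=float)
    if a.size == 0 or b.size == 0:
        return 0.0
    shared = np.intersect1d(a, b)        # the M identical elements
    numerator = float(np.sum(shared * shared))
    denominator = float(np.linalg.norm(a) * np.linalg.norm(b))
    return numerator / denominator if denominator > 0 else 0.0

# Illustrative vectors: two of the peaks coincide after calibration.
p1 = [520.0, 1001.0, 1602.0]
p2_cal = [520.0, 1030.0, 1602.0, 2900.0]
score = peak_cosine_similarity(p1, p2_cal)
```

Under this assumption identical peak vectors score 1 and vectors with no shared position score 0. Because the elements are peak positions rather than intensities, peaks at large wavenumbers weigh more heavily in the result.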
The invention aims to provide a Raman detection method for particulate matter based on cosine similarity. First, the scipy.signal.find_peaks peak detection function is optimized to provide a method for extracting the characteristic peak positions of the Raman spectrum of particulate matter, which identifies sharp spectral noise and spectral boundaries, detects characteristic peak positions at different scales and amplitudes, and improves the recognition rate of spectral characteristic peaks. Second, the cosine similarity calculation method based on characteristic peaks provides a vector cosine similarity calculation for differing numbers and positions of characteristic peaks, which solves the problem of computing cosine similarity for spectra of different pixel lengths; a peak position correction scheme calibrates the characteristic peak positions of spectra obtained at different resolutions or with different correction methods, improving the accuracy of spectral similarity matching based on characteristic peaks.
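The peak extraction stage summarized above can be sketched with `scipy.signal.find_peaks`. This is a simplified illustration, not the patented procedure: the helper names, the default thresholds (10% height, 5 pixel width, 5 pixel minimum gap), the `edge` margin used to discard boundary peaks, and the synthetic spectrum are all assumptions chosen for the example.

```python
import numpy as np
from scipy.signal import find_peaks

def merge_close_peaks(indices, intensity, min_gap=5):
    """Where two peaks are fewer than min_gap pixels apart, keep the stronger."""
    merged = []
    for i in indices:
        if merged and i - merged[-1] < min_gap:
            if intensity[i] > intensity[merged[-1]]:
                merged[-1] = int(i)
        else:
            merged.append(int(i))
    return merged

def extract_peaks(intensity, height_frac=0.10, min_width=5, edge=5, min_gap=5):
    """Detect peak indices (0-based) in a spectrum after min-max normalization."""
    y = np.asarray(intensity, dtype=float)
    span = y.max() - y.min()
    if span == 0:
        return []
    y = (y - y.min()) / span
    # height rejects weak peaks; width rejects sharp single-pixel noise spikes.
    idx, _ = find_peaks(y, height=height_frac, width=min_width)
    # Assumed boundary rule: discard peaks within `edge` pixels of either end.
    idx = idx[(idx >= edge) & (idx < y.size - edge)]
    return merge_close_peaks(idx, y, min_gap)

# Synthetic spectrum: two Gaussian bands plus a one-pixel spike at index 300.
x = np.arange(600)
spectrum = (np.exp(-(x - 150) ** 2 / (2 * 6.0 ** 2))
            + 0.6 * np.exp(-(x - 400) ** 2 / (2 * 8.0 ** 2)))
spectrum[300] += 0.9
peaks = extract_peaks(spectrum)
```

The spike passes the height test but fails the width test, so only the two broad bands are kept. Note that the indices returned here are 0-based, whereas the text counts pixels from 1.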
By combining Raman spectroscopy with feature extraction and similarity evaluation, the invention provides a method for detecting and identifying particulate matter, and is an effective means of particle identification, safety evaluation and quality control.
The above embodiments are only preferred embodiments of the present invention, intended to assist understanding of the method and its core idea, and are not intended to limit the scope of the invention; any modifications, equivalents and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A Raman particulate matter detection method based on cosine similarity, characterized in that it comprises a method for extracting characteristic peaks of the particle Raman spectrum and a cosine similarity calculation method based on the characteristic peaks, wherein:
the first method is the extraction of characteristic peaks of the particle Raman spectrum: it can effectively identify sharp spectral noise and spectral boundaries, detect characteristic peak positions at different scales and amplitudes, improve the recognition rate of spectral characteristic peaks, and improve the matching accuracy of the cosine similarity calculation method based on characteristic peaks; the specific steps are as follows:
step 1.1, calculating the Raman spectrum resolution of the particles, and setting a wavelet window width threshold according to the Raman spectrum resolution of the particles;
the Raman spectrum resolution of the particulate matter can be represented by the Raman shift difference between two adjacent pixel points of the spectrum;
step 1.2, setting a wavelet window height threshold according to the relative intensity of the Raman spectrum of the particulate matter; the relative intensity of the Raman spectrum refers to the relative height between characteristic peaks after the Raman spectrum is subjected to standardization treatment;
step 1.3, according to the wavelet window width threshold and the wavelet window height threshold set in step 1.1 and step 1.2, detecting the characteristic peak positions of the particle matching spectrum and the database spectrum by continuous wavelet transform, wherein the characteristic peak positions detected in the matching spectrum and the database spectrum are stored in a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2, respectively;
the elements of the matching spectrum characteristic peak vector P1 represent the characteristic peak positions of the matching spectrum, and the length of P1, that is, the number of elements it contains, represents the number of characteristic peaks of the matching spectrum;
the elements of the database spectrum characteristic peak vector P2 represent the characteristic peak positions of the database spectrum, and the length of P2, that is, the number of elements it contains, represents the number of characteristic peaks of the database spectrum;
because sharp noise peaks and spectrum boundaries affect the scipy.signal.find_peaks function, the characteristic peak vectors are corrected a second time by fitting the local maxima at all characteristic peak positions of the spectrum and specifying the relative intensity and width of the spectrum;
step 1.4, performing secondary calibration on the extracted matching spectrum characteristic peak vector P1 and database spectrum characteristic peak vector P2 to eliminate interference from spectral burr noise and to address the inability of the scipy.signal.find_peaks function to identify spectrum boundaries effectively, and determining the final characteristic peak vectors; the first and last elements of the vectors P1 and P2 extracted by the scipy.signal.find_peaks function are removed directly, and P1 and P2 are updated;
because the elements of P1 and P2 detected in step 1.3 form monotonically increasing sequences, and according to the Raman spectral characteristics of particulate matter, the difference between two consecutive elements is required to be greater than 5 pixels;
traversing all elements of P1 and P2, comparing the intensities of any two adjacent peak positions separated by fewer than 5 pixels and selecting the peak position with the larger intensity as the vector element; or fitting the adjacent peak positions and taking the extreme point as the vector element; and updating the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2;
the second method is the cosine similarity calculation based on characteristic peaks: for a matching spectrum characteristic peak vector P1 and a database spectrum characteristic peak vector P2 with different characteristic peak positions, an applicable cosine similarity matching method is developed to calculate the similarity between P1 and P2; the specific steps are as follows:
step 2.1, converting characteristic peak indices into wavenumbers: the characteristic peak position detection in step 1.3 operates on the pixel points of the spectrum column vector, so the pixel index sequence of the characteristic peak positions needs to be converted into wavenumber values;
the pixel index sequence is an increasing sequence starting from 1, and can be mapped directly and uniquely to increasing wavenumber values;
step 2.2, calibrating the characteristic peak positions in the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2 obtained in step 1.3;
traversing all characteristic peak positions of P1 and P2 in sequence to find pairs of peak positions whose difference is smaller than a certain threshold, calibrating P2 with a temporary-matching forced-conversion strategy taking P1 as the standard, and updating P2;
the forced conversion applied to the database spectrum by this strategy acts only on the matching of the current matching spectrum against the database spectrum, and does not change the original characteristic peak positions or the number of characteristic peaks stored in the database;
step 2.3, calculating the cosine similarity based on the characteristic peaks: for P1 and P2, which may have different vector lengths after step 2.2, an applicable cosine similarity matching method is developed;
similarity based on characteristic peaks can be calculated with the cosine of the vector included angle, total quantity statistical matrix similarity, hypothesis testing, Euclidean distance, Mahalanobis distance, and other methods;
the invention uses the cosine of the included angle between vectors to calculate the similarity of characteristic peaks and develops a cosine similarity calculation method based on characteristic peaks; the conventional cosine similarity calculation formula is:
$$\cos(X, Y) = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\,\sqrt{\sum_{i=1}^{N} y_i^{2}}} \tag{1}$$
here the characteristic peak refers to a column vector of the Raman spectrum, that is, the intensity values corresponding to the Raman shift pixel points; in the matching spectrum column vector X = (x_1, x_2, …, x_N), x_i denotes the ith element; in the database spectrum column vector Y = (y_1, y_2, …, y_N), y_i denotes the ith element; X and Y are both required to have length N;
however, in practice the calibration standards, pixel resolutions and the like differ between spectrometer manufacturers, so the column vectors of a matching spectrum and a database spectrum cannot be guaranteed to have the same length, which limits the applicability of the cosine similarity calculation formula (1);
replacing the vectors X and Y in formula (1) with the characteristic peak vectors P1 and P2 avoids the situation in which formula (1) cannot be used because the column vectors of the matching spectrum and the database spectrum differ in length; the cosine similarity calculation formula based on characteristic peaks becomes:
[Formula (2): cosine similarity of the characteristic peak vectors A and B; equation image not reproduced]
wherein A and B are the matching spectrum characteristic peak vector P1 and the database spectrum characteristic peak vector P2, respectively; a_i and b_i are the ith elements of P1 and P2, respectively; M is the number of identical elements contained in P1 and P2 after the peak position calibration of step 2.2; the cosine similarity of the characteristic peaks of the matching spectrum and the database spectrum is calculated according to formula (2).
2. The Raman particulate detection method based on cosine similarity as claimed in claim 1, wherein the wavelet window width threshold is inversely proportional to the Raman spectral resolution of the particulate matter, and the higher the Raman spectral resolution of the particulate matter is, the smaller the wavelet window width threshold is set.
3. The cosine similarity-based Raman particulate detection method according to claim 1, wherein the wavelet window width threshold is proportional to the Raman spectral characteristic peak width of the particulate, and the wider the Raman spectral characteristic peak of the particulate, the larger the wavelet window width threshold is set.
4. The Raman particulate detection method based on cosine similarity as claimed in claim 1, wherein the wavelet window width threshold is inversely proportional to the density of the Raman spectral characteristic peaks of the particulate matter; the denser the characteristic peaks of the Raman spectrum of the particulate matter, the smaller the wavelet window width threshold is set.
5. The Raman detection method of particulate matter based on cosine similarity according to claim 1, wherein the normalization process comprises: min-max normalization, z-score normalization, or decimal scaling normalization.
6. The Raman detection method for particulate matter based on cosine similarity according to claim 1, wherein the wavelet window height threshold can be set to between 5% and 20% of the maximum peak height, and the wavelet window width threshold can range from 5 to 30 pixels.
7. The Raman detection method for particulate matter based on cosine similarity as claimed in claim 1, wherein the characteristic peak position refers to a spectral characteristic peak index, that is, the Raman shift pixel point index sequence corresponding to the elements of P1 and P2, and the Raman shift can be expressed as a pixel index, a wavenumber (cm⁻¹) or a wavelength (nm).
8. The Raman detection method for particulate matter based on cosine similarity as claimed in claim 1, wherein the detection of the characteristic peak position is optimized based on a scipy.
9. The Raman detection method of particulate matter based on cosine similarity according to claim 1, wherein the fitting method can be Lorentzian fitting, Gaussian fitting or polynomial fitting.
10. The Raman detection method for particulate matter based on cosine similarity according to claim 1, wherein the certain threshold is set according to the spectral resolution; the certain threshold is 5 spectral pixels or 5 × PN, where PN is the wavenumber span between two pixels of the Raman spectrum.
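Two of the options named in the dependent claims, spectrum normalization and fitting of a peak's extreme point, can be sketched as follows. This is an illustration under assumptions rather than the claimed implementation: the function names are hypothetical, only min-max and z-score normalization are shown, and a quadratic polynomial over a five-point window stands in for the polynomial fitting option.

```python
import numpy as np

def min_max(y):
    """Min-max normalization to the range [0, 1]."""
    y = np.asarray(y, dtype=float)
    span = y.max() - y.min()
    return (y - y.min()) / span if span > 0 else np.zeros_like(y)

def z_score(y):
    """Z-score normalization to zero mean and unit standard deviation."""
    y = np.asarray(y, dtype=float)
    std = y.std()
    return (y - y.mean()) / std if std > 0 else np.zeros_like(y)

def refine_peak(x, y, i, half_window=2):
    """Estimate the extreme point near sample i with a quadratic fit."""
    lo, hi = max(i - half_window, 0), min(i + half_window + 1, len(y))
    a, b, _ = np.polyfit(x[lo:hi], y[lo:hi], 2)
    if a >= 0:
        return float(x[i])  # fit is not concave; keep the sampled position
    return float(-b / (2 * a))  # vertex of the fitted parabola

# After min-max normalization the tallest peak has height 1, so a height
# threshold of 5% to 20% of the maximum corresponds to 0.05 to 0.20.
normalized = min_max([2.0, 4.0, 6.0])
standardized = z_score([2.0, 4.0, 6.0])

# A parabola whose true maximum lies between samples, at x = 10.3.
x = np.arange(21, dtype=float)
y = 5.0 - (x - 10.3) ** 2
refined = refine_peak(x, y, i=10)
```

The quadratic fit recovers a peak position between two pixels, which is one way to obtain the extreme point when adjacent detected peaks are merged.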
CN202210829708.2A 2022-07-14 2022-07-14 Particle Raman detection method based on cosine similarity Active CN114993891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210829708.2A CN114993891B (en) 2022-07-14 2022-07-14 Particle Raman detection method based on cosine similarity

Publications (2)

Publication Number Publication Date
CN114993891A (en) 2022-09-02
CN114993891B (en) 2024-04-19

Family

ID=83022103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210829708.2A Active CN114993891B (en) 2022-07-14 2022-07-14 Particle Raman detection method based on cosine similarity

Country Status (1)

Country Link
CN (1) CN114993891B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant