CN113610017A - Antler cap type identification method based on mid-infrared spectrum and SVM - Google Patents
- Publication number
- CN113610017A (application CN202110918614.8A)
- Authority
- CN
- China
- Prior art keywords
- cap
- antler
- spectrum
- svm
- deer antler
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
- G06F2218/16—Classification; Matching by matching signal segments
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/3563—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Chemical & Material Sciences (AREA)
- Immunology (AREA)
- Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Analytical Chemistry (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Investigating Or Analysing Materials By Optical Means (AREA)
Abstract
The invention discloses an antler cap type identification method based on a mid-infrared spectrum and an SVM (support vector machine), and relates to the technical field of antler cap identification. The method comprises drying, crushing and tableting the samples, collecting diffuse reflection spectra, applying several preprocessing methods to the collected raw spectra, and dividing the samples into training and test sets with the K-S test method; then compressing the high-dimensional spectral data and extracting the main feature components by normalization and principal component analysis (PCA) dimensionality reduction; and finally, taking as input variables, respectively, the PCA-reduced full-band MSC spectrum and the PCA-reduced MSC spectrum restricted to the selected significant-difference wave bands, establishing SVM, ELM and RF models for identifying sika deer antler caps and red deer antler caps. By building models with mid-infrared spectroscopy, a support vector machine (SVM), a random forest (RF) algorithm and an extreme learning machine (ELM), the method achieves efficient, accurate and nondestructive identification of sika deer and red deer antler caps.
Description
Technical Field
The invention belongs to the technical field of antler cap identification, and particularly relates to an antler cap type identification method based on a mid-infrared spectrum and an SVM.
Background
The antler cap is the disc-shaped plate of horn left on the head of a male sika deer or red deer after its velvet antler has been harvested. The plate gradually ossifies, and when the antler is shed, the pedicle and the plate separate and break off as a cast antler. The antler cap has considerable medicinal value: it is used to treat sores, swelling and toxins, pain caused by blood stasis, and internal injury caused by deficiency and consumptive disease, and it is markedly effective against various glandular inflammations. With the development of deer-derived medicinal materials in China, antler caps have become popular on the market. Although both sika deer and red deer antler caps are deer medicinal materials regulated by the state, the sika deer antler cap is more expensive and of higher medicinal value than the red deer antler cap, while the two are very similar in appearance and internal molecular structure. Mislabeling of antler cap types has therefore become a common problem on the market, and finding an efficient, online-capable and low-cost method for identifying antler cap types is of primary importance.
At present, researchers in China and abroad have mainly studied the detection and identification of antler; although many have obtained good results, studies on detecting and identifying medicinal antler caps are very few. Infrared spectroscopy is efficient, rapid, low-cost and nondestructive, and has good prospects for the detection and identification of agricultural and forestry products. Existing research on antler cap detection and identification is quite limited: a test sample can be identified as a genuine antler cap only if it matches a standard spectrum; otherwise other comparative tests are required, and the type of antler cap cannot be distinguished. Therefore, by combining the advantages and wide applicability of mid-infrared spectroscopy with mathematical modeling, an efficient, rapid and accurate method for identifying antler cap types is sought.
Disclosure of Invention
The invention aims to provide an antler cap type identification method based on a mid-infrared spectrum and an SVM (support vector machine), solving the prior-art problem that sika deer antler caps and red deer antler caps cannot be identified efficiently, accurately and nondestructively. By combining mid-infrared spectroscopy with a support vector machine (SVM), a random forest (RF) algorithm and an extreme learning machine (ELM), it provides a new approach and method for detecting antler cap type and quality.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The antler cap type identification method of the invention, based on a mid-infrared spectrum and an SVM (support vector machine), comprises the following steps:
step 1: crush the sika deer antler caps and red deer antler caps to be tested separately and pass them through a 200-mesh sieve to obtain sika deer and red deer antler cap powders; dry the antler cap powders and the potassium bromide in a constant-temperature drying oven at 60 ℃ for 8-12 h;
step 2: accurately weigh 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mix and grind them uniformly, and press the ground powder into a tablet in an infrared pellet die to obtain a sika deer antler cap tablet; prepare red deer antler cap tablets in the same way; place the sika deer and red deer antler cap tablets in a mid-infrared spectrometer and collect the diffuse reflection spectrum of each tablet;
step 3: apply several preprocessing methods to the acquired raw spectra using The Unscrambler X 10.4 software to obtain preprocessed spectra, and compare each preprocessed spectrum with the corresponding raw spectrum; the preprocessing includes multivariate scatter correction (MSC);
both the full-band spectral data and a subset containing only the wave bands with large differences are carried through the subsequent data analysis and compared, in order to select the optimal wave band;
step 4: using the K-S test method with a training-set to test-set sample ratio of 5:2, divide all samples into a training set and a test set, with sika deer and red deer antler caps in a 1:1 ratio within both the training set and the test set;
step 5: compress the high-dimensional spectral data and extract the main feature components by normalization followed by principal component analysis (PCA) dimensionality reduction, which mainly comprises the following steps:
step 51: normalize the MSC spectral data using the mapminmax function in MATLAB R2014b, with the data mapping range set to 0-1;
step 52: perform principal component analysis on the normalized MSC antler cap spectral data using Python 3.7, and draw scatter plots of the first two principal components for the full-band MSC spectrum and for the MSC spectrum restricted to the selected significant-difference wave bands;
step 6: build models with three methods, a support vector machine (SVM), a random forest (RF) and an extreme learning machine (ELM), taking as input variables, respectively, the PCA-reduced full-band MSC spectrum and the PCA-reduced MSC spectrum restricted to the selected significant-difference wave bands, to establish SVM, RF and ELM models that identify sika deer antler caps and red deer antler caps.
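The sample split in step 4 can be sketched in Python, the language the description already uses for PCA. This is a minimal sketch under stated assumptions, not the patented procedure: `stratified_split` and `ks_statistic` are hypothetical helpers, random assignment within each type is an assumption (the text does not say how individual samples are allocated), and the K-S check is read as the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical cumulative distribution functions of some per-sample quantity in the two sets. In chemometrics "K-S" also commonly denotes the Kennard-Stone selection algorithm; the text describes a cumulative-distribution test, so the sketch follows that reading.

```python
import random

def stratified_split(labels, n_train_per_class, seed=0):
    """Return sorted (train_idx, test_idx) with n_train_per_class training samples per class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # assumption: random assignment within each class
        train.extend(idx[:n_train_per_class])
        test.extend(idx[n_train_per_class:])
    return sorted(train), sorted(test)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Embodiment numbers: 42 samples of each type, 30 per type for training (5:2 overall).
labels = ["sika"] * 42 + ["red"] * 42
train_idx, test_idx = stratified_split(labels, n_train_per_class=30)
print(len(train_idx), len(test_idx))  # 60 24
```

With 42 samples per type and 30 training samples per type, this reproduces the 60/24 division and the 12 + 12 test composition given for the embodiment; a small `ks_statistic` between the values of a chosen per-sample quantity in the two sets would indicate similar distributions.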
Further, the diffuse reflection spectra in step 2 are acquired over a wavenumber range of 4000-400 cm⁻¹ at a resolution of 4 cm⁻¹ with 16 scans; each sample is scanned 3 times and the average spectrum is taken;
during spectrum collection, the indoor temperature is set to 25 ℃, and the humidity is set to 35%.
Further, the preprocessing in step 3 also includes standard normal variate (SNV) transformation, Savitzky-Golay (SG) smoothing, and first and second derivatives.
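The preprocessing methods named for step 3 can be sketched in plain Python. This is an illustrative sketch, not The Unscrambler's implementation: MSC regresses each spectrum on a reference (the mean spectrum by default) and removes the fitted offset and slope; SNV centres and scales each spectrum by its own mean and standard deviation. The 5-point quadratic SG window and the simple forward-difference derivative are assumptions, since the text gives no smoothing or derivative settings.

```python
from statistics import mean, pstdev

def msc(spectra, reference=None):
    """Scatter correction (MSC): fit x = a + b * reference, return (x - a) / b per spectrum."""
    if reference is None:  # default reference: the mean spectrum
        reference = [sum(col) / len(spectra) for col in zip(*spectra)]
    r_mean = mean(reference)
    sxx = sum((r - r_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        b = sum((r - r_mean) * (v - x_mean) for r, v in zip(reference, x)) / sxx  # slope
        a = x_mean - b * r_mean  # offset
        corrected.append([(v - a) / b for v in x])
    return corrected


def snv(spectrum):
    """Standard normal variate: subtract the spectrum's mean, divide by its standard deviation."""
    m, s = mean(spectrum), pstdev(spectrum)
    return [(v - m) / s for v in spectrum]


SG5_QUADRATIC = (-3, 12, 17, 12, -3)  # Savitzky-Golay weights, window 5, order 2 (sum = 35)


def sg_smooth(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are kept as-is."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_QUADRATIC, window)) / 35
    return out


def first_derivative(spectrum):
    """Forward difference between neighbouring points (one value shorter than the input)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]
```

A second derivative in this simple form is `first_derivative(first_derivative(x))`; in practice derivatives are usually taken with SG derivative filters to limit noise amplification.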
Further, the number of principal components in step 5 is selected by combining method one and method two;
in method one, the cumulative contribution rate of the retained principal components is at least 85%, and in method two, the eigenvalue of each retained principal component is at least 1.
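The two selection criteria can be expressed directly on the PCA eigenvalues. How the two methods are "combined" is not specified, so `n_components_combined` below takes the larger of the two counts as one possible reading; all function names are hypothetical. The eigenvalue-at-least-1 rule (the Kaiser criterion) is conventionally applied to standardized data.

```python
def n_components_by_cumulative(eigenvalues, threshold=0.85):
    """Method one: smallest k whose first k eigenvalues explain >= threshold of total variance."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)


def n_components_by_kaiser(eigenvalues, cutoff=1.0):
    """Method two: number of components whose eigenvalue is at least `cutoff`."""
    return sum(1 for lam in eigenvalues if lam >= cutoff)


def n_components_combined(eigenvalues, threshold=0.85, cutoff=1.0):
    """Assumed combination: keep enough components to satisfy both criteria."""
    return max(n_components_by_cumulative(eigenvalues, threshold),
               n_components_by_kaiser(eigenvalues, cutoff))


# Illustrative eigenvalues (not data from this document).
print(n_components_combined([4.0, 1.0, 0.9, 0.1]))  # 3
```

Here method one needs three components (98.3% cumulative) while method two keeps two, so the combined count is three.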
Further, the support vector machine SVM in step 6 is implemented as follows: first, K-fold cross-validation (K-CV) is applied to the training set to determine the optimal penalty factor c, the kernel function parameter g and the optimal kernel function of the SVM;
a grid search is used, with both the penalty factor c and the kernel function parameter g searched over the range 2⁻¹⁵ to 2¹⁵ with an exponent step of 0.1, and the radial basis function (RBF) kernel is used as the optimal kernel function.
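A minimal sketch of the grid search follows, assuming the 0.1 step applies to the base-2 exponent (so each axis has 301 values). `cv_score` is a hypothetical callback that would train an RBF-kernel SVM (for example with LIBSVM, not shown) on each K-fold training split and return the mean validation accuracy; `rbf_kernel` shows the kernel form exp(-g * ||x - y||^2) that the parameter g controls, and `kfold_indices` shows a contiguous fold split.

```python
import math

def log2_grid(lo=-15.0, hi=15.0, step=0.1):
    """Values 2**e for e = lo, lo + step, ..., hi (indexed by integers to avoid float drift)."""
    n = int(round((hi - lo) / step))
    return [2.0 ** (lo + i * step) for i in range(n + 1)]


def rbf_kernel(x, y, g):
    """Radial basis function kernel exp(-g * ||x - y||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))


def kfold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds covering range(n)."""
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, val
        start += size


def grid_search(cv_score, c_grid, g_grid):
    """Return (best_score, c, g) maximising cv_score(c, g); the first pair wins ties."""
    best = (-math.inf, None, None)
    for c in c_grid:
        for g in g_grid:
            score = cv_score(c, g)
            if score > best[0]:
                best = (score, c, g)
    return best
```

A full 301 × 301 grid means 90,601 cross-validated fits, so a coarse pass followed by a finer local search is often used in practice; the sketch keeps the exhaustive form to mirror the text.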
Further, the random forest RF in step 6 is implemented as follows: a genetic algorithm searches for the optimal number of trees and other influential parameters, with 2 variables to optimize, a population of 20 individuals, a maximum of 200 generations, 10 binary digits per variable, a generation gap of 0.95, a crossover probability of 0.7, and a mutation probability of 0.01.
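The genetic-algorithm settings can be illustrated with a simple binary-coded GA. This is a sketch, not the toolbox the inventors used: tournament selection, single-point crossover and keeping the best parents to fill the generation gap are assumptions chosen for brevity, while population size, generations, bits per variable, generation gap, crossover and mutation probabilities default to the values above. In a real run, the hypothetical `fitness` would decode the two variables (for example the number of trees and a second RF parameter), train a random forest, and return its cross-validated accuracy.

```python
import random

def decode(bits, lo, hi):
    """Map a list of 0/1 bits to a real value in [lo, hi] (standard binary decoding)."""
    value = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)


def genetic_search(fitness, bounds, n_ind=20, max_gen=200, n_bits=10,
                   ggap=0.95, p_cross=0.7, p_mut=0.01, seed=0):
    """Maximise fitness(x) over the box `bounds` with a simple binary-coded GA."""
    rng = random.Random(seed)
    length = len(bounds) * n_bits

    def phenotype(chrom):
        return [decode(chrom[i * n_bits:(i + 1) * n_bits], lo, hi)
                for i, (lo, hi) in enumerate(bounds)]

    def evaluate(population):
        # (fitness, chromosome) pairs sorted from best to worst.
        return sorted(((fitness(phenotype(c)), c) for c in population),
                      key=lambda t: t[0], reverse=True)

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return list(a[1] if a[0] >= b[0] else b[1])

    n_off = int(round(ggap * n_ind))  # generation gap: 0.95 * 20 -> 19 offspring
    scored = evaluate([[rng.randint(0, 1) for _ in range(length)] for _ in range(n_ind)])
    for _ in range(max_gen):
        offspring = []
        while len(offspring) < n_off:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randint(1, length - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):  # bit-flip mutation
                offspring.append([1 - g if rng.random() < p_mut else g for g in child])
        elites = [c for _, c in scored[:n_ind - n_off]]  # best parents survive unchanged
        scored = evaluate(elites + offspring[:n_off])
    best_f, best_c = scored[0]
    return phenotype(best_c), best_f
```

Integer parameters such as the number of trees would be obtained by rounding the decoded value inside the fitness function.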
Further, the number of hidden nodes of the extreme learning machine ELM in step 6 is varied from 40 to 100, and the optimal number of hidden nodes is determined by comparison.
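A minimal ELM sketch in plain Python follows. It draws hidden-layer input weights and biases at random, applies a sigmoid hidden layer, and solves only for the output weights. One substitution should be noted: the canonical ELM uses the Moore-Penrose pseudoinverse of the hidden-layer matrix, whereas this sketch solves ridge-regularised normal equations (`ridge` is an assumed small constant) to stay numerically stable without a linear-algebra library. Coding the two antler cap types as +1/-1 is also an assumption.

```python
import math
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def sigmoid(z):
    """Numerically safe logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ELM:
    """Single-hidden-layer ELM: random hidden weights, sigmoid activation, solved output weights."""

    def __init__(self, n_hidden, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # assumed regularisation constant (not in the source)
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid of (w . x + b) for every hidden node.
        return [sigmoid(sum(w * v for w, v in zip(ws, x)) + b) for ws, b in zip(self.w, self.b)]

    def fit(self, X, y):
        """X: list of feature vectors (e.g. PCA scores); y: labels coded +1 / -1."""
        d, L = len(X[0]), self.n_hidden
        # Hidden-layer weights and biases are drawn once and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Output weights from (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, X):
        return [1 if sum(bt * h for bt, h in zip(self.beta, self._hidden(x))) >= 0 else -1
                for x in X]
```

Fitting one model per `n_hidden` in `range(40, 101)` and comparing test-set accuracy would mirror the hidden-node comparison described above.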
The invention has the following beneficial effects:
1. By building models with mid-infrared spectroscopy, a support vector machine (SVM), a random forest (RF) algorithm and an extreme learning machine (ELM), the invention achieves efficient, accurate and nondestructive identification of sika deer and red deer antler caps, and provides a new approach and method for detecting antler cap type and quality.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows the average MSC-preprocessed spectra of the antler caps.
FIG. 2 shows the normalized average MSC spectral data.
Table 1 lists the eigenvalues and cumulative contribution rates of the first 10 principal components.
FIG. 3 is a scatter plot of the first 2 principal components for the full-band spectral test set.
FIG. 4 is a scatter plot of the first 2 principal components for the difference-band spectral test set.
FIG. 5 shows the fitness curve of the grid-search parameter optimization for the full-band spectrum.
FIG. 6 shows the fitting results for the full-band spectral test set.
FIG. 7 shows the fitness curve of the grid-search parameter optimization for the difference-band spectrum.
FIG. 8 shows the fitting results for the difference-band spectral test set.
FIG. 9 shows the iterative error curve for the full-band spectrum.
FIG. 10 shows the fitting results for the full-band spectral training set.
FIG. 11 shows the fitting results for the full-band spectral test set.
FIG. 12 shows the iterative error curve for the difference-band spectrum.
FIG. 13 shows the fitting results for the difference-band spectral training set.
FIG. 14 shows the fitting results for the difference-band spectral test set.
Table 2 compares the prediction results of the ELM algorithm.
FIG. 15 shows the fitting results for the full-band spectral training set.
FIG. 16 shows the fitting results for the full-band spectral test set.
FIG. 17 shows the fitting results for the difference-band spectral training set.
FIG. 18 shows the fitting results for the difference-band spectral test set.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-18 and Tables 1-2, the antler cap type identification method based on mid-infrared spectroscopy and an SVM according to the invention comprises the following steps:
step 1: crush the sika deer antler caps and red deer antler caps to be tested separately and pass them through a 200-mesh sieve to obtain sika deer and red deer antler cap powders; dry the antler cap powders and the potassium bromide in a constant-temperature drying oven at 60 ℃ for 8-12 h; 42 samples each of sika deer antler caps and red deer antler caps are collected, 84 samples in total;
step 2: accurately weigh 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mix and grind them uniformly, and press the ground powder into a tablet in an infrared pellet die to obtain a sika deer antler cap tablet; prepare red deer antler cap tablets in the same way; place the sika deer and red deer antler cap tablets in a mid-infrared spectrometer and collect the diffuse reflection spectrum of each tablet;
step 3: the acquired raw spectra are preprocessed in several ways using The Unscrambler X 10.4 software, and each preprocessed spectrum is compared with the corresponding raw spectrum; the preprocessing includes multivariate scatter correction (MSC), standard normal variate (SNV) transformation, Savitzky-Golay (SG) smoothing, and first and second derivatives. Comparing the spectra after the various preprocessing methods shows that the differences are most pronounced after MSC, as shown in FIG. 1. As FIG. 1 shows, the sika deer antler cap and the red deer antler cap differ markedly in the wave bands 740-;
characteristic peaks in the spectral data are the main basis for judging spectral differences; therefore both the full-band spectral data and a subset containing only the wave bands with large differences are carried through the subsequent data analysis and compared, in order to select the optimal wave band;
step 4: the K-S test is a method, based on the cumulative distribution function, for rapidly checking the division into training and test sets. Using the K-S test with a training-set to test-set sample ratio of 5:2, the 84 samples are divided into a training set of 60 and a test set of 24, with sika deer and red deer antler caps in a 1:1 ratio within each set: 30 of each type are used for training and 12 of each type for testing;
step 5: the mid-infrared spectrum covers 4000-400 cm⁻¹ and is characterized by many wave bands, a large data volume and strong redundancy, so normalization followed by principal component analysis (PCA) dimensionality reduction is used to compress the high-dimensional spectral data and extract the main feature components, which mainly comprises the following steps:
step 51: normalize the MSC spectral data using the mapminmax function in MATLAB R2014b, with the data mapping range set to 0-1, as shown in FIG. 2;
step 52: principal component analysis is performed on the normalized MSC antler cap spectral data using Python 3.7. The eigenvalues and cumulative contribution rates of the first 10 principal components for the full-band spectrum and for the spectrum restricted to the selected significant-difference bands are shown in Table 1. In the full-band MSC spectrum, PC1 has the largest contribution rate, 55.62909%, and PC2 contributes 20.63217%; the cumulative contribution rate of the first 3 PCs is 84.15974% and that of the first 8 PCs is 97.16765%; each subsequent PC contributes less than 1%, and the cumulative contribution rate grows progressively more slowly. In the MSC spectrum restricted to the significant-difference bands, PC1 has the largest contribution rate, 59.52195%, and PC2 contributes 29.89027%; the cumulative contribution rate of the first 3 PCs is 92.68794% and that of the first 6 PCs is 97.92108%; each subsequent PC contributes less than 1%, and the cumulative contribution rate grows progressively more slowly;
scatter plots of the first two principal components are drawn for the full-band MSC spectrum and for the MSC spectrum restricted to the selected significant-difference bands, as shown in FIGS. 3-4;
step 6: build models with three methods, a support vector machine (SVM), a random forest (RF) and an extreme learning machine (ELM), taking as input variables, respectively, the PCA-reduced full-band MSC spectrum and the PCA-reduced MSC spectrum restricted to the selected significant-difference wave bands, to establish SVM, RF and ELM models that identify sika deer antler caps and red deer antler caps.
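Steps 51 and 52 can be sketched as follows. `minmax_scale` applies the formula documented for MATLAB's `mapminmax`, y = (ymax - ymin)(x - xmin)/(xmax - xmin) + ymin; `mapminmax` applies it to each row of its input matrix, so which axis is scaled depends on how the data matrix is oriented, which the text does not state. Mapping a constant row to ymin is a choice of this sketch. The PCA uses power iteration with deflation on the covariance matrix only to stay within the standard library; for real spectra with thousands of wavenumbers, a numerical library's SVD would be far faster.

```python
import math
import random

def minmax_scale(row, ymin=0.0, ymax=1.0):
    """mapminmax formula: y = (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin."""
    lo, hi = min(row), max(row)
    if hi == lo:  # constant row: mapped to ymin (a choice of this sketch)
        return [ymin for _ in row]
    return [(ymax - ymin) * (x - lo) / (hi - lo) + ymin for x in row]


def pca(data, n_components, iters=500, seed=0):
    """PCA via power iteration with deflation; returns (scores, explained-variance ratios)."""
    n, p = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    X = [[v - m for v, m in zip(row, means)] for row in data]  # column-centred data
    C = [[sum(X[k][i] * X[k][j] for k in range(n)) / (n - 1) for j in range(p)]
         for i in range(p)]  # sample covariance matrix
    total = sum(C[i][i] for i in range(p))  # total variance
    rng = random.Random(seed)
    comps, ratios = [], []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(p)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))  # eigenvalue
        comps.append(v)
        ratios.append(lam / total)  # contribution rate of this component
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in X]
    return scores, ratios
```

Applied to the normalized spectra, `ratios` corresponds to the per-component contribution rates whose running totals Table 1 reports, and the first two columns of `scores` are what the scatter plots in FIGS. 3-4 display.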
Preferably, the diffuse reflection spectra in step 2 are acquired over a wavenumber range of 4000-400 cm⁻¹ at a resolution of 4 cm⁻¹ with 16 scans; each sample is scanned 3 times and the average spectrum is taken;
during spectrum collection, the indoor temperature is set to 25 ℃, and the humidity is set to 35%.
Preferably, the number of principal components in step 5 is selected by combining method one and method two;
in method one, the cumulative contribution rate of the retained principal components is at least 85%, and in method two, the eigenvalue of each retained principal component is at least 1; the first 8 principal components of the full-band spectral data are selected to form the dimension-reduced full-band data, and the first 6 principal components of the significant-difference-band spectral data are selected to form the dimension-reduced significant-difference-band data.
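The contribution rates reported in step 52 can be cross-checked. Because a cumulative contribution rate is the running sum of the individual rates, the third component's individual contribution follows by subtraction:

$$
\begin{aligned}
\text{Full band: } & 84.15974\% - (55.62909\% + 20.63217\%) = 84.15974\% - 76.26126\% = 7.89848\% \\
\text{Significant-difference bands: } & 92.68794\% - (59.52195\% + 29.89027\%) = 92.68794\% - 89.41222\% = 3.27572\%
\end{aligned}
$$

On the full-band data the first three components reach 84.15974%, just below the 85% threshold of method one, so at least four components are needed there; the eight retained components (97.16765%) satisfy it comfortably.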
Preferably, the support vector machine (SVM) is one of the best supervised learning algorithms: based on a kernel function, it builds decision boundaries for both linear and nonlinear problems, can solve classification and regression problems, and is relatively resistant to overfitting, especially with small samples. The SVM in step 6 is implemented as follows: first, K-fold cross-validation (K-CV) is applied to the training set to determine the optimal penalty factor c, the kernel function parameter g and the optimal kernel function of the SVM;
a grid search is used, with both the penalty factor c and the kernel function parameter g searched over the range 2⁻¹⁵ to 2¹⁵ with an exponent step of 0.1, and the radial basis function (RBF) kernel is used as the optimal kernel function;
SVM modeling comparison: the training-set and test-set recognition results, together with the selected c and g, for the models built on the PCA-reduced full-band MSC spectrum and on the PCA-reduced MSC spectrum restricted to the selected significant-difference bands are shown in FIGS. 5-8. As FIGS. 5-8 show, both models achieve a 100% recognition rate on the training set and on the prediction set, indicating that the SVM identifies antler cap types well.
Preferably, random forest (RF) is a very flexible and practical method with excellent accuracy; it can evaluate the importance of each feature for the classification problem and also gives good results when values are missing. In the RF model, the number of trees affects the quality of the final result. The random forest RF in step 6 is implemented as follows: a genetic algorithm searches for the optimal number of trees and other influential parameters, with 2 variables to optimize, a population of 20 individuals, a maximum of 200 generations, 10 binary digits per variable, a generation gap of 0.95, a crossover probability of 0.7, and a mutation probability of 0.01;
a random forest model is established, with the number of trees searched by the genetic algorithm: 500 trees for the full-band MSC spectrum and 600 trees for the PCA-reduced MSC spectrum restricted to the selected significant-difference bands; the modeling results are shown in FIGS. 9-14. As FIGS. 9-14 show, in full-band MSC spectral modeling the training-set recognition rate is 100% and the test-set recognition rate is 95.8333%, with 1 sika deer antler cap misidentified. In the significant-difference-band modeling, the training-set recognition rate is 100% and the test-set recognition rate is 87.5%, with 3 sika deer antler caps misidentified, indicating that this model is overfitted; the full-band model therefore achieves a higher recognition rate than the significant-difference-band model.
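The recognition rates follow directly from the 24-sample test set defined in step 4 (12 samples of each type). A short check, with an illustrative function name:

```python
def recognition_rate(n_correct, n_total):
    """Percentage of test samples whose antler cap type is identified correctly."""
    return 100.0 * n_correct / n_total


TEST_SET_SIZE = 24  # 12 sika deer + 12 red deer antler caps
print(round(recognition_rate(TEST_SET_SIZE - 1, TEST_SET_SIZE), 4))  # 95.8333 (1 error)
print(recognition_rate(TEST_SET_SIZE - 3, TEST_SET_SIZE))            # 87.5 (3 errors)
```

Each misidentified test sample therefore costs about 4.17 percentage points.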
Preferably, the extreme learning machine (ELM) is a fast learning algorithm whose hidden-layer parameters need no tuning: the weights of the hidden nodes of the ELM network are generated randomly or set manually, and the learning process only computes the output weights. The method therefore has few training parameters, learns quickly, and generalizes well. The number of hidden nodes in the ELM model directly affects the accuracy on the training set and the test set, and increasing it also lengthens the computation time. According to the relevant literature, once the number of hidden nodes exceeds the number of training samples, accuracy no longer increases noticeably and instead fluctuates. The number of hidden nodes of the extreme learning machine ELM in step 6 is therefore set to 40-100, and the optimal number of hidden nodes is determined by comparison;
In the ELM model, the sigmoid function is used as the activation function, and the number of hidden nodes is varied from 40 to 100 in steps of 1 for comparison. The results are shown in Table 2 (to keep the table readable, only results from 40 hidden nodes upward in steps of 10 are listed) and Figs. 15-18. Table 2 shows that for full-range MSC spectral modeling with 60 hidden nodes, the recognition rate is 100% on the training set and 95% on the test set, with one sika deer antler cap misidentified. For the MSC spectrum restricted to the significantly different bands after principal component analysis, with 50 hidden nodes, the recognition rate is 100% on the training set and 95.833% on the test set, again with one sika deer antler cap misidentified. Overall, the ELM algorithm gives stable recognition in both modeling cases.
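The ELM procedure described above (random, untrained hidden layer with sigmoid activation; output weights obtained in a single least-squares step; hidden-node count swept and compared) can be sketched in plain Python. This is an illustrative sketch, not the patent's code: it assumes two classes coded as +1/-1 and input weights drawn uniformly from [-1, 1], and it adds a small ridge term to the normal equations for numerical stability, whereas the canonical ELM computes the output weights with the Moore-Penrose pseudoinverse. The sweep defaults to steps of 10 to match the Table 2 summary; passing `range(40, 101)` gives the 1-node step described in the text.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid, with the input clipped to avoid math.exp overflow."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    """Extreme learning machine for two classes coded +1 / -1 (illustrative sketch)."""

    def __init__(self, n_hidden, ridge=1e-2, seed=0):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activation of every hidden node for one input vector.
        return [sigmoid(sum(w * v for w, v in zip(ws, x)) + b) for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d, n_h = len(xs[0]), self.n_hidden
        # Hidden-layer input weights and biases are drawn at random and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_h)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(n_h)]
        h = [self._hidden(x) for x in xs]
        # Only the output weights are learned: (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(n_h)] for i in range(n_h)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_h)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, x):
        score = sum(bt * hv for bt, hv in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1


def accuracy(model, xs, ys):
    """Recognition rate in percent."""
    return 100.0 * sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(ys)


def sweep(train, test, sizes=range(40, 101, 10)):
    """Map each hidden-node count to (training %, test %); train/test are (xs, ys) pairs."""
    results = {}
    for n_h in sizes:
        model = ELM(n_h).fit(*train)
        results[n_h] = (accuracy(model, *train), accuracy(model, *test))
    return results
```

Because training reduces to one linear solve, sweeping many hidden-node counts stays cheap, which is what makes the exhaustive 40-100 comparison practical.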
The RF results show that, for spectral feature extraction, full-range spectral modeling gives a clearly higher prediction-set recognition rate than difference-band modeling (100% vs. 87.5%). The ELM results likewise show that the prediction-set recognition rate of full-range spectral modeling (100%) is slightly higher than that of difference-band modeling (95.8333%). Full-range spectral modeling is therefore generally superior to difference-band modeling. Comparing modeling methods, the average recognition rate is 100% for the SVM model, 93.75% for the RF model, and 97.91665% for the ELM model. The SVM thus identifies antler cap types well with both the full-range spectrum and the difference-band spectrum, and is well suited to analyzing infrared spectral data from small samples. Combining mid-infrared full-range spectral data with SVM modeling achieves the best identification, providing a new approach to detecting antler cap type and quality.
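The average recognition rates quoted above match the mean of each model's two prediction-set rates as stated in this paragraph and in the SVM comparison (SVM: 100% and 100%; RF: 100% and 87.5%; ELM: 100% and 95.8333%). A quick check:

```python
from statistics import mean

# Prediction-set recognition rates (%): (full-range modeling, difference-band modeling),
# as quoted in the text.
rates = {
    "SVM": (100.0, 100.0),
    "RF": (100.0, 87.5),
    "ELM": (100.0, 95.8333),
}

for model, pair in rates.items():
    print(model, round(mean(pair), 5))
# SVM 100.0
# RF 93.75
# ELM 97.91665
```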
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (7)
1. An antler cap type identification method based on mid-infrared spectrum and SVM, characterized in that the method comprises the following steps:
step 1: crushing the sika deer antler caps and the red deer antler caps to be tested separately, passing them through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and drying the sika deer antler cap powder and potassium bromide in a constant-temperature drying oven at 60 ℃ for 8-12 h;
step 2: accurately weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding them uniformly, and pressing the ground powder into a tablet in an infrared tableting die to obtain a sika deer antler cap tablet; obtaining red deer antler cap tablets in the same way; and placing the sika deer antler cap tablets and the red deer antler cap tablets in a mid-infrared spectrometer and collecting the diffuse reflection spectrum of each tablet;
step 3: applying several preprocessing methods to the acquired raw spectra with The Unscrambler X 10.4 software to obtain preprocessed spectra, and comparing each preprocessed spectrum with the corresponding raw spectrum; the preprocessing comprises multivariate scatter correction (MSC);
performing the following data analysis and comparison both on the full-band spectral data and on only those bands in which the spectral data differ markedly, so as to select the optimal bands;
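Multivariate (multiplicative) scatter correction, named in step 3, is usually computed by regressing each spectrum on a reference spectrum and removing the fitted offset and slope. The sketch below is a plain-Python illustration of that standard definition, using the mean spectrum of the sample set as the reference; that choice is an assumption, and The Unscrambler's own options may differ.

```python
from statistics import fmean


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is fitted as x ~ a + b * ref by ordinary least squares and
    replaced with (x - a) / b. The reference defaults to the mean spectrum.
    """
    if reference is None:
        reference = [fmean(col) for col in zip(*spectra)]
    ref_mean = fmean(reference)
    ref_var = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = fmean(x)
        # Least-squares slope and intercept of x against the reference.
        b = sum((r - ref_mean) * (xi - x_mean) for r, xi in zip(reference, x)) / ref_var
        a = x_mean - b * ref_mean
        corrected.append([(xi - a) / b for xi in x])
    return corrected
```

Spectra that differ from one another only by an additive offset and a multiplicative scaling (typical scatter effects from particle size and packing) all collapse onto the reference after this correction.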
step 4: using the K-S test method with a training-set to test-set sample ratio of 5:2, dividing the samples into a training set and a test set, wherein sika deer antler caps and red deer antler caps are in a 1:1 ratio in both the training set and the test set;
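Assuming "K-S" here refers to the Kennard-Stone algorithm, which is widely used in chemometrics to split spectral samples into training and test sets, a per-class 5:2 split could look like the sketch below. Euclidean distance, running the selection separately for each species to preserve the 1:1 class balance, and rounding the training count are assumptions for illustration, not details given in the text.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone algorithm.

    Start with the two most distant samples, then repeatedly add the sample whose
    distance to its nearest already-selected sample is largest.
    """
    n = len(samples)
    dist = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select and remaining:
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected[:n_select]


def split_per_class(samples_by_class, train_ratio=5 / 7):
    """Kennard-Stone split within each class so that both sets keep the class balance."""
    train, test = {}, {}
    for label, samples in samples_by_class.items():
        n_train = round(len(samples) * train_ratio)
        chosen = set(kennard_stone(samples, n_train))
        train[label] = [s for k, s in enumerate(samples) if k in chosen]
        test[label] = [s for k, s in enumerate(samples) if k not in chosen]
    return train, test
```

Kennard-Stone places the most "extreme" samples in the training set, so the model is trained across the full spread of the spectral space and the test set falls inside it.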
step 5: compressing the high-dimensional spectral data and extracting its main feature components by normalization followed by principal component analysis (PCA) dimensionality reduction, which mainly comprises the following steps:
step 51: normalizing the MSC spectral data with the mapminmax function in MATLAB R2014b, with the mapping range set to 0-1;
step 52: performing principal component analysis on the normalized MSC antler cap spectral data with Python 3.7, and plotting scatter diagrams of the first two principal components for the full-range MSC spectrum and for the MSC spectrum restricted to the significantly different bands;
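Step 51 relies on MATLAB's `mapminmax`, which rescales each row of a matrix linearly, y = (ymax - ymin)(x - xmin)/(xmax - xmin) + ymin, so that the row's minimum and maximum land on the target range. A plain-Python equivalent for the 0-1 range is sketched below. Whether rows hold samples or wavenumbers depends on how the matrix is oriented before the call, which the text does not state; mapping constant rows to the lower bound is an arbitrary choice made for this sketch.

```python
def map_min_max(rows, y_min=0.0, y_max=1.0):
    """Row-wise linear rescaling: y = (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min."""
    out = []
    for row in rows:
        lo, hi = min(row), max(row)
        if hi == lo:  # constant row: mapped to y_min (choice made for this sketch)
            out.append([y_min] * len(row))
            continue
        scale = (y_max - y_min) / (hi - lo)
        out.append([y_min + (x - lo) * scale for x in row])
    return out


print(map_min_max([[2.0, 4.0, 6.0]]))  # [[0.0, 0.5, 1.0]]
```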
step 6: modeling with three methods, a support vector machine (SVM), a random forest (RF), and an extreme learning machine (ELM), taking as input variables the full-range MSC spectrum after principal component analysis and the MSC spectrum restricted to the significantly different bands after principal component analysis, respectively, to establish SVM, RF, and ELM identification models for sika deer antler caps and red deer antler caps.
2. The antler cap type identification method based on mid-infrared spectrum and SVM according to claim 1, wherein in step 2 the diffuse reflection spectra are collected over a wavenumber range of 4000-400 cm⁻¹ at a resolution of 4 cm⁻¹ with 16 scans, and each sample is scanned 3 times and the average spectrum is taken;
during spectrum collection, the indoor temperature is set to 25 ℃, and the humidity is set to 35%.
3. The antler cap type identification method based on mid-infrared spectrum and SVM according to claim 1, wherein the preprocessing in step 3 further comprises standard normal variate (SNV) transformation, Savitzky-Golay (SG) smoothing, first-derivative processing, and second-derivative processing.
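The additional preprocessing options in claim 3 have standard definitions. The sketch below shows SNV (centre and scale each spectrum by its own mean and sample standard deviation), a 5-point quadratic Savitzky-Golay smoothing filter, and a simple forward-difference first derivative. The window size, polynomial order, and leaving the two edge points at each end unsmoothed are assumptions for illustration; The Unscrambler's implementations and settings may differ.

```python
from statistics import fmean, stdev

SG5_QUADRATIC = (-3, 12, 17, 12, -3)  # 5-point, 2nd-order Savitzky-Golay weights (sum = 35)


def snv(spectrum):
    """Standard normal variate: subtract the spectrum's mean, divide by its sample standard deviation."""
    mu, sd = fmean(spectrum), stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def sg_smooth5(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; the two points at each edge are kept as-is."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * x for c, x in zip(SG5_QUADRATIC, window)) / 35
    return out


def first_derivative(spectrum, step=1.0):
    """First derivative by forward differences (one point shorter than the input)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]
```

A quadratic Savitzky-Golay filter reproduces any locally quadratic signal exactly, so it suppresses noise while preserving peak shapes better than a plain moving average; a second derivative can be obtained by applying `first_derivative` twice.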
4. The antler cap type identification method based on mid-infrared spectrum and SVM according to claim 1, wherein the number of principal components in step 5 is selected by combining a first criterion and a second criterion;
under the first criterion, the cumulative contribution rate of the principal components is at least 85%, and under the second criterion, the eigenvalue of each principal component is at least 1.
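Claim 4 combines two common rules for choosing the number of principal components: a cumulative contribution rate (share of total variance) of at least 85%, and an eigenvalue of at least 1 for each retained component (the latter is normally applied to standardized data). How the rules are reconciled when they disagree is not specified, so the sketch below simply reports the count implied by each rule. The eigenvalues would come from the PCA in step 52; the values in the example are made up.

```python
from itertools import accumulate


def pc_counts(eigenvalues, threshold=0.85, min_eigenvalue=1.0):
    """Return (components needed to reach the cumulative contribution threshold,
    components whose eigenvalue is at least min_eigenvalue)."""
    eig = sorted(eigenvalues, reverse=True)
    total = sum(eig)
    n_cumulative = next(k for k, c in enumerate(accumulate(eig), start=1) if c / total >= threshold)
    n_kaiser = sum(1 for e in eig if e >= min_eigenvalue)
    return n_cumulative, n_kaiser


# Made-up eigenvalues for illustration only.
print(pc_counts([6.0, 2.0, 1.2, 0.5, 0.3]))  # (3, 3)
print(pc_counts([3.0, 0.8, 0.2]))            # (2, 1): the two rules disagree
```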
5. The antler cap type identification method based on mid-infrared spectrum and SVM according to claim 1, wherein the specific method of the support vector machine SVM in step 6 is as follows: K-fold cross-validation (K-CV) is first applied to the training set to determine the optimal penalty factor c, the kernel function parameter g, and the optimal kernel function of the SVM;
a grid search method is adopted, with the penalty factor c ranging from 2⁻¹⁵ to 2¹⁵ and the kernel function parameter g ranging from 2⁻¹⁵ to 2¹⁵, a step size of 0.1, and the radial basis function as the optimal kernel function.
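A minimal plain-Python sketch of the grid search in claim 5 follows. It assumes, as is common for LIBSVM-style grid searches (LIBSVM names these parameters `-c` and `-g`), that the 0.1 step applies to the base-2 exponents of c and g, giving 301 × 301 candidate pairs; it also shows the RBF kernel K(x, y) = exp(-g‖x - y‖²) that g parametrizes. Training the SVM itself is left to a library, so `evaluate` is a hypothetical callback that trains on one fold split with the given (c, g) and returns validation accuracy; K = 5 and contiguous (unshuffled) folds are likewise assumptions.

```python
import math


def rbf_kernel(x, y, g):
    """Radial basis function kernel K(x, y) = exp(-g * ||x - y||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))


def log2_grid(lo=-15.0, hi=15.0, step=0.1):
    """Exponents lo, lo + step, ..., hi, computed from integer counts to avoid float drift."""
    n = round((hi - lo) / step)
    return [lo + i * step for i in range(n + 1)]


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size (shuffle samples first in practice)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(n_samples, evaluate, k=5, exponents=None):
    """Return (best mean CV accuracy, c, g) over c = 2**ec and g = 2**eg.

    evaluate(train_idx, val_idx, c, g) -> accuracy is a hypothetical callback that
    trains an RBF-kernel SVM on train_idx and scores it on val_idx.
    """
    exponents = log2_grid() if exponents is None else exponents
    folds = k_fold_indices(n_samples, k)
    best = (-1.0, None, None)
    for ec in exponents:
        for eg in exponents:
            c, g = 2.0 ** ec, 2.0 ** eg
            scores = []
            for val in folds:
                val_set = set(val)
                train = [i for i in range(n_samples) if i not in val_set]
                scores.append(evaluate(train, val, c, g))
            mean_acc = sum(scores) / len(scores)
            if mean_acc > best[0]:  # ties keep the first pair found (smallest c, then smallest g)
                best = (mean_acc, c, g)
    return best
```

The full default grid means 301 × 301 × K model fits, which is why coarse-to-fine searches (a coarse exponent step first, then 0.1 around the best pair) are often used in practice.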
6. The antler cap type identification method based on mid-infrared spectrum and SVM according to claim 1, wherein the specific method of the random forest RF in step 6 is as follows: a genetic algorithm is used to search for the optimal number of trees and the other influential parameters, with 2 variables to be optimized, 20 individuals, a maximum of 200 generations, a 10-bit binary encoding for each variable, a generation gap of 0.95, a crossover probability of 0.7, and a mutation probability of 0.01.
7. The antler cap type identification method based on mid-infrared spectrum and SVM according to claim 1, wherein the number of hidden nodes of the extreme learning machine ELM in step 6 is set to 40-100, and the optimal number of hidden nodes is obtained by comparison.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110918614.8A CN113610017B (en) | 2021-08-11 | 2021-08-11 | Deer horn cap type identification method based on mid-infrared spectrum and SVM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113610017A true CN113610017A (en) | 2021-11-05 |
CN113610017B CN113610017B (en) | 2024-02-02 |
Family
ID=78340265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110918614.8A Active CN113610017B (en) | 2021-08-11 | 2021-08-11 | Deer horn cap type identification method based on mid-infrared spectrum and SVM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113610017B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2478823A1 (en) * | 2002-03-11 | 2003-09-18 | Allan L. Schaefer | Method for the evaluation of velvet antler |
US20050153359A1 (en) * | 2002-03-11 | 2005-07-14 | Schaefer Allan L. | Method for the evaluation of velvet antler |
CN109034261A (en) * | 2018-08-10 | 2018-12-18 | 武汉工程大学 | A kind of Near Infrared Spectroscopy Data Analysis based on support vector machines |
CN109374573A (en) * | 2018-10-12 | 2019-02-22 | 乐山师范学院 | Cucumber epidermis pesticide residue recognition methods based on near-infrared spectrum analysis |
CN110765962A (en) * | 2019-10-29 | 2020-02-07 | 刘秀萍 | Plant identification and classification method based on three-dimensional point cloud contour dimension values |
CN111024645A (en) * | 2020-01-08 | 2020-04-17 | 山东金璋隆祥智能科技有限责任公司 | Quantitative research method for Dong' a donkey-hide gelatin, Fujiao and antler gelatin |
CN111458306A (en) * | 2019-01-22 | 2020-07-28 | 吉林农业大学 | Method for identifying different cartialgenous flowers by combining infrared spectrum with PLS-DA |
Non-Patent Citations (3)
Title |
---|
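冯国红, 朱玉杰, 李耀翔: "Identification method for imported timber species based on mid-infrared spectroscopy", Spectroscopy and Spectral Analysis (光谱学与光谱分析), no. 07 * |
刘瑶, 谭克竹, 陈月华, 王志朋, 谢红, 王立国: "Soybean variety identification based on piecewise principal component analysis and hyperspectral technology", Soybean Science (大豆科学), no. 04 * |
王璞, 何明霞, 李萌, 曲秋红, 刘锐, 陈永德: "Application of terahertz spectroscopy to the detection of bioactive peptides", Spectroscopy and Spectral Analysis (光谱学与光谱分析), no. 09 * |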
Also Published As
Publication number | Publication date |
---|---|
CN113610017B (en) | 2024-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020073737A1 (en) | Quantitative spectroscopic data analysis and processing method based on deep learning | |
CN102305772A (en) | Method for screening characteristic wavelength of near infrared spectrum features based on heredity kernel partial least square method | |
CN101738373A (en) | Method for distinguishing varieties of crop seeds | |
CN109870421A (en) | It is a kind of based on visible light/near-infrared spectrum analysis incrementally timber varieties of trees classifying identification method | |
Puttipipatkajorn et al. | Development of calibration models for rapid determination of moisture content in rubber sheets using portable near-infrared spectrometers | |
CN108344701A (en) | Paraffin grade qualitative classification based on hyperspectral technique and quantitative homing method | |
CN103743705A (en) | Rapid detection method for sorghum halepense and similar species | |
Wu et al. | Variety identification of Chinese cabbage seeds using visible and near-infrared spectroscopy | |
CN113610017A (en) | Antler cap type identification method based on mid-infrared spectrum and SVM | |
CN113310943A (en) | Lotus root starch adulteration identification method based on machine learning | |
Huang et al. | Optimal wavelength selection for hyperspectral scattering prediction of apple firmness and soluble solids content | |
Song et al. | Rapid identification of adulterated rice based on data fusion of near-infrared spectroscopy and machine vision | |
Liu et al. | Research on the online rapid sensing method of moisture content in famous green tea spreading | |
Xu et al. | Detection of apple varieties by near‐infrared reflectance spectroscopy coupled with SPSO‐PFCM | |
CN114062306B (en) | Near infrared spectrum data segmentation preprocessing method | |
CN113063754B (en) | Leaf phosphorus content detection method based on weighted environment variable clustering | |
CN114971259A (en) | Method for analyzing quality consistency of formula product by using near infrared spectrum | |
CN111595802A (en) | Construction method and application of Clinacanthus nutans seed source place classification model based on NIR (near infrared spectroscopy) | |
CN113049526A (en) | Corn seed moisture content determination method based on terahertz attenuated total reflection | |
Bin et al. | On-line detection of Cerasus humilis fruit based on VIS/NIR spectroscopy combined with variable selection methods and GA-BP model. | |
Yong et al. | A novel interval integer genetic algorithm used for simultaneously selecting wavelengths and pre-processing methods | |
Chen et al. | Classification of wheat grain varieties using terahertz spectroscopy and convolutional neural network | |
CN111487219A (en) | Method for rapidly detecting content of bergamot pear lignin based on near infrared spectrum technology | |
Zhang et al. | Apple identity recognition based on SVM model parameter optimization and near infrared hyperspectral. | |
He et al. | Research on the upgrade and maintenance method of apple soluble solids content models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |