CN113610017A - Antler cap type identification method based on mid-infrared spectrum and SVM - Google Patents

Publication number
CN113610017A
Authority
CN
China
Prior art keywords
cap
antler
spectrum
svm
deer antler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110918614.8A
Other languages
Chinese (zh)
Other versions
CN113610017B (en)
Inventor
武海巍
杨承恩
胡俊海
袁月明
付辰琦
邵海龙
刘浩
周建宇
玉苏甫·阿布拉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Agricultural University
Original Assignee
Jilin Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Agricultural University
Priority to CN202110918614.8A
Publication of CN113610017A
Application granted
Publication of CN113610017B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • G06F2218/16 Classification; Matching by matching signal segments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses an antler cap type identification method based on mid-infrared spectroscopy and a support vector machine (SVM), and relates to the technical field of antler cap identification. Samples are dried, crushed, and pressed into tablets, and their diffuse reflection spectra are collected. The original spectra undergo multiple data processing treatments, and the samples are divided into sets using the K-S test method. Normalization and principal component analysis dimensionality reduction are then used to compress the high-dimensional spectral data and extract its main feature components. Finally, the full-band MSC spectrum after principal component analysis and the MSC spectrum of the selected bands with obvious differences after principal component analysis are each used as input variables to establish SVM, ELM, and RF identification models for sika deer antler caps and red deer antler caps. By combining mid-infrared spectroscopy with support vector machine (SVM), random forest (RF), and extreme learning machine (ELM) modeling, the method achieves efficient, accurate, and nondestructive identification of sika deer and red deer antler caps.

Description

Antler cap type identification method based on mid-infrared spectrum and SVM
Technical Field
The invention belongs to the technical field of antler cap identification, and particularly relates to an antler cap type identification method based on mid-infrared spectroscopy and an SVM.
Background
The antler cap is the platform-shaped horn disc left on the head of a male sika deer or red deer after the velvet antler has been harvested. The disc gradually ossifies, and when the antlers are replaced it separates from the pedicle and falls off as a cast antler. The antler cap has considerable medicinal value: it is used to treat sores and swelling, pain caused by blood stasis, and internal injury from consumptive deficiency, and it has a marked effect on various glandular inflammations. With the development of deer-derived medicinal materials in China, antler caps have become popular on the market. Although both sika deer and red deer antler caps are state-regulated deer medicinal materials, the sika deer antler cap has a higher price and higher medicinal value than the red deer antler cap, while the two are alike in appearance and internal molecular structure. Mislabeling of antler cap types has therefore become a common problem on the market, and an efficient, online, low-cost method for identifying antler cap type is urgently needed.
At present, researchers in China and abroad have focused mainly on the detection and identification of velvet antler. Although many of these studies have produced good results, very little work addresses the detection and identification of medicinal antler caps. Infrared spectroscopy is efficient, fast, inexpensive, and nondestructive, and it shows good prospects for the detection and identification of agricultural and forestry products. Existing research on antler cap detection and identification is severely limited: a test sample is accepted as a qualified antler cap only when its spectrum matches a standard spectrum, otherwise other detection and comparison are carried out, and the type of antler cap cannot be distinguished. The invention therefore builds on the advantages and wide application of mid-infrared spectroscopy, combined with mathematical modeling, to provide an efficient, rapid, and accurate method for identifying antler cap type.
Disclosure of Invention
The invention aims to provide an antler cap type identification method based on mid-infrared spectroscopy and an SVM (support vector machine). It addresses the problem that the prior art cannot identify sika deer antler caps and red deer antler caps efficiently, accurately, and nondestructively, and it uses mid-infrared spectroscopy together with support vector machine (SVM), random forest (RF), and extreme learning machine (ELM) models to provide a new approach to antler cap type and quality detection.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The invention relates to an antler cap type identification method based on mid-infrared spectroscopy and an SVM (support vector machine), which comprises the following steps:
Step 1: crushing the sika deer antler caps and the red deer antler caps to be tested separately, passing them through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and drying the sika deer antler cap powder and potassium bromide in a constant-temperature drying oven at 60 ℃ for 8-12 h;
Step 2: accurately weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding them uniformly, and pressing the ground powder into tablets in an infrared tabletting mold to obtain sika deer antler cap tablets; obtaining red deer antler cap tablets in the same way; placing the sika deer antler cap tablets and the red deer antler cap tablets on a mid-infrared spectrometer and collecting the diffuse reflection spectrum of each tablet;
Step 3: applying multiple data processing treatments to the acquired original spectra using The Unscrambler X 10.4 software to obtain preprocessed spectra, and comparing each preprocessed spectrum with the corresponding original spectrum; the multiple data processing includes multiplicative scatter correction (MSC);
using both the full-band spectral data and only the bands extracted from the spectral data that show larger differences for the subsequent data analysis and comparison, so as to select the optimal band;
Step 4: using the K-S test method with a training-set to test-set sample ratio of 5:2, dividing all samples into training samples and test samples, wherein the ratio of sika deer antler cap samples to red deer antler cap samples in both the training set and the test set is 1:1;
Step 5: performing high-dimensional data compression and main feature component extraction on the spectral data using normalization and principal component analysis dimensionality reduction, which mainly comprises the following steps:
Step 51: normalizing the MSC spectral data using the mapminmax function in MATLAB 2014b, with the data mapping range set to 0-1;
Step 52: performing principal component analysis on the normalized MSC antler cap spectral data using Python 3.7, and drawing scatter plots of the first two principal components for the full-band MSC spectrum and for the MSC spectrum of the selected bands with obvious differences;
Step 6: modeling with three methods, namely support vector machine (SVM), random forest (RF), and extreme learning machine (ELM), and establishing SVM, ELM, and RF identification models for sika deer antler caps and red deer antler caps, with the full-band MSC spectrum after principal component analysis and the MSC spectrum of the selected bands with obvious differences after principal component analysis each used as input variables.
Further, in step 2 the diffuse reflection spectra are acquired over a wavenumber range of 4000-400 cm⁻¹ at a resolution of 4 cm⁻¹ with 16 scans; each sample is scanned 3 times and the average spectrum is taken;
during spectrum acquisition, the room temperature is set to 25 ℃ and the humidity to 35%.
Further, the multiple data processing in step 3 further includes standard normal variate (SNV) transformation, Savitzky-Golay (SG) smoothing, the first derivative, and the second derivative.
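The additional pretreatments can be sketched in a few lines. This is an illustrative sketch only: the 5-point quadratic Savitzky-Golay window and the central-difference derivative are assumptions, since the source does not state the window size, polynomial order, or derivative scheme, and the function names are hypothetical. The second derivative is omitted for brevity.

```python
from statistics import fmean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    mu = fmean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1)
    return [(v - mu) / sd for v in spectrum]

# Savitzky-Golay smoothing weights for a 5-point window and a quadratic polynomial.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def sg_smooth5(spectrum):
    """Smooth interior points; the two points at each edge are left unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(w * spectrum[i + k - 2] for k, w in enumerate(SG5))
    return out

def first_derivative(spectrum, step=1.0):
    """Central-difference first derivative on interior points."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
            for i in range(1, len(spectrum) - 1)]

toy = [0.12, 0.35, 0.80, 0.95, 0.60, 0.30, 0.15]  # made-up absorbance values
print([round(v, 3) for v in snv(toy)])
print([round(v, 3) for v in sg_smooth5(toy)])
print([round(v, 3) for v in first_derivative(toy)])
```

A quadratic SG filter reproduces any quadratic curve exactly, so smoothing only removes structure that a local second-order polynomial cannot follow.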
Further, the number of principal components in step 5 is selected by combining method one and method two;
in method one, the cumulative contribution rate of the selected principal components is at least 85%, and in method two, the eigenvalue of each selected principal component is greater than or equal to 1.
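The two selection criteria can be written down compactly. How the source combines them is not stated, so the sketch below assumes that enough components are kept to satisfy both (the larger of the two counts). The eigenvalues are hypothetical and are not taken from Table 1.

```python
from itertools import accumulate

def count_by_cumulative(eigenvalues, threshold=0.85):
    """Method one: smallest k whose cumulative contribution rate reaches the threshold."""
    total = sum(eigenvalues)
    for k, running in enumerate(accumulate(eigenvalues), start=1):
        if running / total >= threshold:
            return k
    return len(eigenvalues)

def count_by_eigenvalue(eigenvalues, minimum=1.0):
    """Method two: number of components whose eigenvalue is at least `minimum`."""
    return sum(1 for ev in eigenvalues if ev >= minimum)

def select_component_count(eigenvalues, threshold=0.85, minimum=1.0):
    # Assumed combination rule: keep enough components to satisfy both criteria.
    return max(count_by_cumulative(eigenvalues, threshold),
               count_by_eigenvalue(eigenvalues, minimum))

eigs = [5.2, 2.1, 1.3, 0.7, 0.4, 0.2, 0.1]  # hypothetical, sorted in descending order
print(count_by_cumulative(eigs), count_by_eigenvalue(eigs), select_component_count(eigs))
```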
Further, the specific method of the support vector machine (SVM) in step 6 is: first, K-fold cross-validation (K-CV) is applied to the training set with the SVM to determine the optimal penalty factor c, the kernel function parameter g, and the optimal kernel function;
a grid search is used, with the range of the penalty factor c set to 2^-15 to 2^15, the range of the kernel function parameter g set to 2^-15 to 2^15, and a step size of 0.1, and the radial basis function is used as the optimal kernel function.
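The search procedure can be sketched as a scaffold that is independent of any particular SVM library. Two assumptions are made here: the step of 0.1 is applied to the exponent of 2, which is a common convention but is not stated in the source, and the scoring function is supplied by the caller. The names `exponent_grid`, `k_fold_indices`, and `grid_search` are hypothetical, and the toy score stands in for a cross-validated SVM accuracy.

```python
import itertools
import math

def exponent_grid(low=-15.0, high=15.0, step=0.1):
    """Exponents low, low + step, ..., high; each candidate value is 2 ** exponent."""
    n = int(round((high - low) / step))
    return [low + i * step for i in range(n + 1)]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

def grid_search(score_fn, n_samples, k=5, step=1.0):
    """Return (best_mean_score, c, g); score_fn(c, g, train_idx, val_idx) comes from the caller."""
    best = (float("-inf"), None, None)
    exps = exponent_grid(step=step)
    for ec, eg in itertools.product(exps, exps):
        c, g = 2.0 ** ec, 2.0 ** eg
        scores = [score_fn(c, g, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean = sum(scores) / len(scores)
        if mean > best[0]:
            best = (mean, c, g)
    return best

# Toy score with a known optimum at c = 2**3 and g = 2**-2 (stand-in for CV accuracy).
def toy_score(c, g, train_idx, val_idx):
    return -(math.log2(c) - 3) ** 2 - (math.log2(g) + 2) ** 2

best = grid_search(toy_score, n_samples=60, k=5, step=1.0)
print(len(exponent_grid()), best[1], best[2])
```

Under the exponent-step assumption, a step of 0.1 gives 301 candidates per parameter, so a coarser step is used in the toy run to keep it fast.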
Further, the specific method of the random forest (RF) in step 6 is: a genetic algorithm is used to search for the optimal number of trees and other influential parameters; in the genetic algorithm, the number of variables to be optimized is set to 2, the population size to 20, the maximum number of generations to 200, the number of binary digits per variable to 10, the generation gap to 0.95, the crossover probability to 0.7, and the mutation probability to 0.01.
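The genetic algorithm settings can be illustrated with a simplified sketch. The search ranges, the truncation-style parent selection, the one-point crossover, and the toy fitness are all assumptions made for the example, because the source gives only the parameter values. In practice the fitness would be the cross-validated accuracy of a random forest built with the decoded parameters.

```python
import random

GA = {"n_vars": 2, "bits": 10, "pop_size": 20, "max_gen": 200,
      "gap": 0.95, "p_cross": 0.7, "p_mut": 0.01}

# Hypothetical search ranges; the source does not state them.
RANGES = [(1, 1000), (1, 10)]  # (number of trees, a second tuning parameter)

def decode(chrom):
    """Map a bit list of n_vars * bits genes to integers inside RANGES."""
    values = []
    for v, (lo, hi) in enumerate(RANGES):
        genes = chrom[v * GA["bits"]:(v + 1) * GA["bits"]]
        as_int = int("".join(map(str, genes)), 2)
        frac = as_int / (2 ** GA["bits"] - 1)
        values.append(lo + round(frac * (hi - lo)))
    return values

def evolve(fitness, seed=0):
    rng = random.Random(seed)
    n_genes = GA["n_vars"] * GA["bits"]
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(GA["pop_size"])]
    n_children = round(GA["gap"] * GA["pop_size"])  # generation gap: 19 of 20 replaced
    for _ in range(GA["max_gen"]):
        pop.sort(key=lambda c: fitness(decode(c)), reverse=True)
        children = []
        while len(children) < n_children:
            a, b = rng.sample(pop[:10], 2)  # parents drawn from the fitter half (assumed)
            if rng.random() < GA["p_cross"]:
                cut = rng.randrange(1, n_genes)
                a = a[:cut] + b[cut:]  # one-point crossover
            children.append([g ^ 1 if rng.random() < GA["p_mut"] else g for g in a])
        pop = pop[:GA["pop_size"] - n_children] + children  # elite survivor + offspring
    return decode(max(pop, key=lambda c: fitness(decode(c))))

# Toy fitness (made up): prefers about 500 trees and a second parameter near 5.
best = evolve(lambda p: -abs(p[0] - 500) - abs(p[1] - 5))
print(best)
```

With 2 variables of 10 bits each, every individual is a 20-bit chromosome, and a generation gap of 0.95 on 20 individuals means 19 are replaced per generation.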
Further, the number of hidden nodes in the extreme learning machine ELM in step 6 is set to 40-100, and the optimal number of hidden nodes is obtained through comparison.
The invention has the following beneficial effects:
1. The invention uses mid-infrared spectroscopy together with support vector machine (SVM), random forest (RF), and extreme learning machine (ELM) modeling to achieve efficient, accurate, and nondestructive identification of sika deer antler caps and red deer antler caps, and provides a new approach to antler cap type and quality detection.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a graph of the average spectra of MSC antler caps.
Fig. 2 is a graph of normalized mean MSC spectral data.
Table 1 shows the top 10 principal component eigenvalues and cumulative contribution rates.
FIG. 3 is a scatter plot of the first 2 principal components of the full-segment spectral test set.
FIG. 4 is a scatter plot of the first 2 principal components of the difference spectrum test set.
FIG. 5 is the fitness curve of the grid-search parameter optimization for the full-range spectrum.
FIG. 6 shows the fitting results of the full-range spectral test set.
FIG. 7 is the fitness curve of the grid-search parameter optimization for the difference spectrum.
FIG. 8 shows the fitting results of the difference spectrum test set.
FIG. 9 is a full-segment spectrum iterative error variation curve.
FIG. 10 shows the fitting results of the full-range spectral training set.
FIG. 11 shows the results of a full-range spectral test set fit.
Fig. 12 is a variation curve of the iterative error of the difference spectrum.
FIG. 13 shows the fitting results of the difference spectrum training set.
FIG. 14 shows the fitting results of the difference spectrum test set.
Table 2 compares the predicted results of the ELM algorithm.
FIG. 15 shows the results of a full-range spectral training set fitting.
FIG. 16 shows the results of a full-range spectral test set fitting.
FIG. 17 shows the difference spectrum training set fitting results.
FIG. 18 shows the fitting results of the difference spectrum test set.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-18 and tables 1-2, the present invention is a method for identifying antler cap types based on mid-infrared spectroscopy and SVM, comprising the steps of:
Step 1: crushing the sika deer antler caps and the red deer antler caps to be tested separately, passing them through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and drying the sika deer antler cap powder and potassium bromide in a constant-temperature drying oven at 60 ℃ for 8-12 h; 42 samples each of sika deer antler cap and red deer antler cap are collected, for a total of 84 samples;
Step 2: accurately weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding them uniformly, and pressing the ground powder into tablets in an infrared tabletting mold to obtain sika deer antler cap tablets; obtaining red deer antler cap tablets in the same way; placing the sika deer antler cap tablets and the red deer antler cap tablets on a mid-infrared spectrometer and collecting the diffuse reflection spectrum of each tablet;
Step 3: The Unscrambler X 10.4 software is used to apply multiple data processing treatments to the acquired original spectra to obtain preprocessed spectra, and each preprocessed spectrum is then compared with the corresponding original spectrum; the treatments include multiplicative scatter correction (MSC), standard normal variate (SNV) transformation, Savitzky-Golay (SG) smoothing, the first derivative, and the second derivative. Comparing the spectra after the various pretreatments shows that the differences between spectra are more obvious after MSC, as shown in FIG. 1. As can be seen from FIG. 1, the sika deer antler cap and the red deer antler cap differ obviously in the wave bands 740-;
the characteristic peaks in the spectral data are the main basis for judging spectral differences; both the full-band spectral data and only the bands with larger differences extracted from the spectral data are used for the subsequent data analysis and comparison, so as to select the optimal band;
Step 4: the K-S test is a method for quickly dividing samples into a training set and a test set based on the cumulative distribution function. Using the K-S test method with a training-set to test-set sample ratio of 5:2, the 84 samples are divided into 60 training samples and 24 test samples, wherein the ratio of sika deer antler cap samples to red deer antler cap samples in both sets is 1:1; that is, 30 sika deer and 30 red deer antler cap samples are used for training, and 12 of each are used for testing;
Step 5: the mid-infrared spectral range is 4000-400 cm⁻¹, and the data have many bands, a large volume, and strong redundancy; normalization and principal component analysis dimensionality reduction are therefore used to perform high-dimensional data compression and main feature component extraction on the spectral data, which mainly comprises the following steps:
Step 51: normalizing the MSC spectral data using the mapminmax function in MATLAB 2014b, with the data mapping range set to 0-1, as shown in FIG. 2;
Step 52: principal component analysis is performed on the normalized MSC antler cap spectral data using Python 3.7. The eigenvalues and cumulative contribution rates of the first 10 principal components (PCs), for the full-band spectrum and for the spectrum of the selected bands with obvious differences, are shown in Table 1. In the full-band MSC spectrum, PC1 has the largest contribution rate, 55.62909%, and PC2 contributes 20.63217%; the cumulative contribution rate of the first 3 PCs is 84.15974% and that of the first 8 PCs is 97.16765%; each subsequent PC contributes less than 1%, and the cumulative contribution rate grows more and more slowly. In the MSC spectrum of the selected bands with obvious differences, PC1 has the largest contribution rate, 59.52195%, and PC2 contributes 29.89027%; the cumulative contribution rate of the first 3 PCs is 92.68794% and that of the first 6 PCs is 97.92108%; each subsequent PC contributes less than 1%, and the cumulative contribution rate grows more and more slowly;
scatter plots of the first two principal components are drawn for the full-band MSC spectrum and for the MSC spectrum of the selected bands with obvious differences, as shown in FIGS. 3-4;
Step 6: modeling with three methods, namely support vector machine (SVM), random forest (RF), and extreme learning machine (ELM), and establishing SVM, ELM, and RF identification models for sika deer antler caps and red deer antler caps, with the full-band MSC spectrum after principal component analysis and the MSC spectrum of the selected bands with obvious differences after principal component analysis each used as input variables.
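The sample counts in step 4 follow directly from the 5:2 ratio applied within each class. The sketch below reproduces only that arithmetic with a random stratified split. It is a stand-in and not the K-S method of the source, and the function name `stratified_split` and the labels are hypothetical.

```python
import random

def stratified_split(labels, train_parts=5, test_parts=2, seed=0):
    """Split sample indices per class in a train:test ratio of train_parts:test_parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # random assignment, used here in place of the K-S selection
        n_train = len(idx) * train_parts // (train_parts + test_parts)
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)

labels = ["sika"] * 42 + ["red"] * 42  # 42 samples per class, 84 in total
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 60 24
print(sum(labels[i] == "sika" for i in train_idx),
      sum(labels[i] == "sika" for i in test_idx))  # 30 12
```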
Preferably, in step 2 the diffuse reflection spectra are acquired over a wavenumber range of 4000-400 cm⁻¹ at a resolution of 4 cm⁻¹ with 16 scans; each sample is scanned 3 times and the average spectrum is taken;
during spectrum acquisition, the room temperature is set to 25 ℃ and the humidity to 35%.
Preferably, the number of principal components in step 5 is selected by combining method one and method two;
in method one, the cumulative contribution rate of the selected principal components is at least 85%, and in method two, the eigenvalue of each selected principal component is greater than or equal to 1; the first 8 principal components are selected from the full-band spectral data to form the dimension-reduced spectral data, and the first 6 principal components are selected from the spectral data of the bands with obvious differences to form the dimension-reduced spectral data.
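The normalization and dimensionality reduction can be sketched with NumPy. This is an assumption-laden illustration: the data are synthetic, the row-wise scaling mirrors the default behaviour of mapminmax (whether rows held samples or wavenumbers in the original workflow is not stated), and PCA is computed by singular value decomposition of the centred data, which is one standard way of obtaining it.

```python
import numpy as np

def minmax(X, axis=1, lo=0.0, hi=1.0):
    """Scale values to [lo, hi] along the given axis (axis=1 scales each row)."""
    X = np.asarray(X, dtype=float)
    mn = X.min(axis=axis, keepdims=True)
    mx = X.max(axis=axis, keepdims=True)
    return lo + (hi - lo) * (X - mn) / (mx - mn)

def pca(X, n_components):
    """Return (scores, eigenvalues, contribution rates) from the covariance of X."""
    Xc = X - X.mean(axis=0)                       # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)           # covariance eigenvalues, descending
    ratio = eigvals / eigvals.sum()               # contribution rate of each PC
    scores = Xc @ Vt[:n_components].T             # projection onto the leading PCs
    return scores, eigvals, ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(84, 50)).cumsum(axis=1)      # synthetic stand-in for 84 spectra
Xn = minmax(X)
scores, eigvals, ratio = pca(Xn, n_components=8)
print(scores.shape)
print(ratio[:3].round(3), ratio.cumsum()[7].round(3))
```

The cumulative sum of `ratio` is the cumulative contribution rate used in method one, and `eigvals` is the quantity compared against 1 in method two.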
Preferably, the support vector machine (SVM) is one of the best supervised learning algorithms. It is kernel-based, can model both linear and nonlinear decision boundaries, can solve classification and regression problems, and is resistant to overfitting, especially with small samples. The specific method of the support vector machine (SVM) in step 6 is: first, K-fold cross-validation (K-CV) is applied to the training set with the SVM to determine the optimal penalty factor c, the kernel function parameter g, and the optimal kernel function;
a grid search is used, with the range of the penalty factor c set to 2^-15 to 2^15, the range of the kernel function parameter g set to 2^-15 to 2^15, and a step size of 0.1, and the radial basis function is used as the optimal kernel function;
For the SVM models built on the full-band MSC spectrum and on the MSC spectrum of the selected bands with obvious differences after principal component analysis, the training-set and test-set recognition results and the determined c and g are shown in FIGS. 5-8. As FIGS. 5-8 show, the recognition rates of the training set and the prediction set are 100% for both models, which indicates that the SVM is effective for identifying antler cap type.
Preferably, the random forest (RF) is a flexible and practical method with excellent accuracy. It can evaluate the importance of each feature for the classification problem and also gives good results when values are missing. In the RF model, the number of trees affects the quality of the final result. The specific method of the random forest (RF) in step 6 is: a genetic algorithm is used to search for the optimal number of trees and other influential parameters; in the genetic algorithm, the number of variables to be optimized is set to 2, the population size to 20, the maximum number of generations to 200, the number of binary digits per variable to 10, the generation gap to 0.95, the crossover probability to 0.7, and the mutation probability to 0.01;
a random forest model is established and the number of trees is searched with the genetic algorithm: the number of trees is 500 for the full-band MSC spectrum and 600 for the MSC spectrum of the selected bands with obvious differences after principal component analysis. The modeling results are shown in FIGS. 9-14. As can be seen from FIGS. 9-14, in full-band MSC spectral modeling the training-set recognition rate is 100% and the test-set recognition rate is 95.8333%, with 1 sika deer antler cap misidentified. In MSC spectral modeling of the selected bands with obvious differences after principal component analysis, the training-set recognition rate is 100% and the test-set recognition rate is 87.5%, with 3 sika deer antler caps misidentified, which indicates that the established model is overfitted; the recognition rate of full-band MSC spectral modeling is higher than that of MSC spectral modeling of the selected bands with obvious differences after principal component analysis.
Preferably, the extreme learning machine (ELM) is a fast learning algorithm in which the hidden-layer nodes need no tuning: the weights of the hidden-layer nodes of the ELM network are randomly generated or manually defined, and learning only requires computing the output weights. It has few training parameters, learns quickly, and generalizes well. The number of hidden nodes in the ELM model directly affects the accuracy on the training set and the test set, and increasing the number of hidden nodes also lengthens the running time of the algorithm. According to the relevant literature, once the number of hidden nodes exceeds the number of training samples, the accuracy no longer increases noticeably and instead fluctuates. The number of hidden nodes in the extreme learning machine (ELM) in step 6 is therefore set to 40-100, and the optimal number of hidden nodes is obtained through comparison;
in the ELM model, the sigmoid function is selected as the activation function, and the number of hidden nodes is varied from 40 to 100 in steps of 1 for comparison, as shown in Table 2 (to keep the data from becoming too dense, only the modeling results for hidden-node counts starting at 40 in steps of 10 are listed) and FIGS. 15-18. As Table 2 shows, with 60 hidden nodes in full-band MSC spectral modeling, the training-set recognition rate is 100% and the test-set recognition rate is 95%, with one sika deer antler cap misidentified; with 50 hidden nodes in MSC spectral modeling of the selected bands with obvious differences after principal component analysis, the training-set recognition rate is 100% and the test-set recognition rate is 95.833%, with one sika deer antler cap misidentified. Overall, the ELM algorithm gives stable recognition in both modeling cases.
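The ELM training rule described above (random hidden weights, output weights solved in one step) can be sketched with NumPy. This is a minimal illustration on synthetic two-class data, not the model or data of the source. The weight range, the one-hot targets, and the use of the Moore-Penrose pseudo-inverse are assumptions, and the loop steps the hidden-node count by 10 to keep the output short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden, rng):
    """Random hidden layer; output weights by least squares via the pseudo-inverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # fixed random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # fixed random biases
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix
    T = np.eye(int(y.max()) + 1)[y]                          # one-hot targets
    beta = np.linalg.pinv(H) @ T                             # only trained parameters
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)

rng = np.random.default_rng(0)

def make(n):
    """Synthetic stand-in: n samples, 8 features, two balanced classes."""
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, 8)) + 1.5 * y[:, None]
    return X, y

Xtr, ytr = make(60)
Xte, yte = make(24)
for n_hidden in range(40, 101, 10):
    model = elm_fit(Xtr, ytr, n_hidden, rng)
    acc_tr = (elm_predict(model, Xtr) == ytr).mean()
    acc_te = (elm_predict(model, Xte) == yte).mean()
    print(n_hidden, round(float(acc_tr), 3), round(float(acc_te), 3))
```

Once the number of hidden nodes reaches the number of training samples, the hidden-layer matrix can generally interpolate the training targets, which is consistent with the observation that adding further nodes no longer improves accuracy.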
The RF results show that, for spectral feature extraction, the prediction-set recognition rate of full-band spectral modeling (100%) is clearly higher than the test-set recognition rate of difference-spectrum modeling (87.5%). The ELM results likewise show that the prediction-set recognition rate of full-band spectral modeling (100%) is slightly higher than that of difference-spectrum modeling (95.8333%). The full-band spectrum therefore generally models better than the difference spectrum. Comparing the modeling methods, the average recognition rate is 100% for the SVM model, 93.75% for the RF model, and 97.91665% for the ELM model. The SVM thus recognizes well on both the full-band spectrum and the difference spectrum and is well suited to analyzing infrared spectral data from small samples. Combining mid-infrared full-band spectral data with SVM modeling gives the best identification result and provides a new approach to antler cap type and quality detection.
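The rates and averages quoted in this comparison can be checked with simple arithmetic. The sketch below assumes a test set of 24 samples and takes each model's average as the mean of the two rates cited for it in the paragraph above; the helper name `recognition_rate` is hypothetical.

```python
def recognition_rate(n_correct, n_total):
    """Recognition rate in percent."""
    return 100.0 * n_correct / n_total

test_size = 24
one_error = recognition_rate(test_size - 1, test_size)     # about 95.8333
three_errors = recognition_rate(test_size - 3, test_size)  # 87.5

# Mean of the full-band and difference-spectrum rates cited for each model.
averages = {
    "SVM": (100.0 + 100.0) / 2,
    "RF": (100.0 + 87.5) / 2,
    "ELM": (100.0 + 95.8333) / 2,
}
print(round(one_error, 4), three_errors)
for name, value in averages.items():
    print(name, round(value, 5))
```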
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (7)

1. An antler cap type identification method based on mid-infrared spectrum and SVM, characterized in that the method comprises the following steps:
step 1: crushing the sika deer antler cap and the red deer antler cap to be detected separately, sieving through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and drying the sika deer antler cap powder and potassium bromide in a constant-temperature drying oven at 60 ℃ for 8-12 h;
step 2: accurately weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding uniformly, placing the ground powder in an infrared tabletting mold and pressing it into a tablet to obtain a sika deer antler cap tablet, obtaining a red deer antler cap tablet in the same way, placing the sika deer antler cap tablet and the red deer antler cap tablet on a mid-infrared spectrometer respectively, and collecting the diffuse reflection spectrum of each tablet;
step 3: performing multiple kinds of data processing on the acquired original spectrum using The Unscrambler X 10.4 software to obtain preprocessed spectra, and comparing each preprocessed spectrum with the corresponding original spectrum; the multiple kinds of data processing include multiplicative scatter correction (MSC);
carrying out the following data analysis and comparison both on the full-band spectral data and on the spectral data restricted to the bands showing larger differences, so as to select the optimal band;
step 4: adopting a K-S test method, setting the ratio of the number of samples in the training set to the number in the test set to 5:2, and dividing the samples into training sets and test sets, wherein the ratio of sika deer antler cap samples to red deer antler cap samples is 1:1 in both the training sets and the test sets;
step 5: performing high-dimensional data compression and principal feature component extraction on the spectral data by normalization followed by principal component analysis dimension reduction, which mainly comprises the following steps:
step 51: normalizing the MSC spectral data using the mapminmax function in matlab2014b, with the data mapping range set to 0-1;
step 52: carrying out principal component analysis on the normalized MSC antler cap spectral data using Python 3.7, and drawing scatter diagrams of the first two principal components for the full-range MSC spectrum and for the MSC spectrum of the selected bands with obvious differences, respectively;
step 6: modeling with three methods, namely support vector machine (SVM), random forest (RF) and extreme learning machine (ELM), and establishing SVM, ELM and RF identification models for the sika deer antler cap and the red deer antler cap, taking as input variables the full-range MSC spectrum after principal component analysis and the MSC spectrum of the selected bands with obvious differences after principal component analysis, respectively.
2. The antler cap type identification method based on mid-infrared spectrum and SVM as claimed in claim 1, wherein, for the diffuse reflection spectrum collection in step 2, the wavenumber range is 4000-400 cm⁻¹, the resolution is 4 cm⁻¹, the number of scans is 16, each sample is scanned 3 times, and the average spectrum is taken;
during spectrum collection, the indoor temperature is set to 25 ℃, and the humidity is set to 35%.
3. The antler cap type identification method based on mid-infrared spectrum and SVM as claimed in claim 1, wherein the multiple kinds of data processing in step 3 further comprise standard normal variate transformation (SNV), Savitzky-Golay (SG) smoothing, first-order derivative and second-order derivative.
4. The antler cap type identification method based on mid-infrared spectrum and SVM as claimed in claim 1, wherein the number of principal components in step 5 is selected by combining a first method and a second method;
in the first method, the cumulative contribution rate of the selected principal components is greater than or equal to 85%, and in the second method, the eigenvalue of each selected principal component is greater than or equal to 1.
5. The antler cap type identification method based on mid-infrared spectrum and SVM as claimed in claim 1, wherein the specific method of the support vector machine SVM in step 6 is as follows: first, K-CV cross validation is applied to the training set to determine the optimal penalty factor c, the kernel function parameter g and the optimal kernel function of the SVM;
a grid search method is adopted, the range of the penalty factor c is set to 2^-15 to 2^15, the range of the kernel function parameter g is set to 2^-15 to 2^15, the step size is 0.1, and the radial basis kernel function is used as the optimal kernel function.
6. The antler cap type identification method based on mid-infrared spectrum and SVM as claimed in claim 1, wherein the specific method of the random forest RF in step 6 is as follows: the optimal number of trees and other influencing parameters are searched by a genetic algorithm, in which the number of variables to be optimized is set to 2, the number of individuals to 20, the maximum number of generations to 200, the number of binary digits per variable to 10, the generation gap to 0.95, the crossover probability to 0.7, and the mutation probability to 0.01.
7. The antler cap type identification method based on mid-infrared spectrum and SVM as claimed in claim 1, wherein the number of hidden nodes in the extreme learning machine ELM in step 6 is set to 40-100, and the optimal number of hidden nodes is obtained by comparison.
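The data path of claims 1 and 5 (MSC, 5:2 sample division with a 1:1 class ratio, normalization to 0-1, PCA, and an RBF-kernel SVM tuned by grid search with cross validation) can be sketched as follows. This is a hedged illustration using NumPy and scikit-learn, not the MATLAB and Unscrambler workflow named in the claims. The assumptions are: "K-S" is taken to mean the Kennard-Stone algorithm; `MinMaxScaler` (per-feature scaling) stands in for `mapminmax`; only the 85% cumulative contribution criterion is applied when choosing the number of components; K is set to 5; the exponent step is 3 rather than 0.1 to keep the run short; and the spectra are synthetic. The helper names `msc`, `kennard_stone` and `fake_spectra` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def msc(spectra, reference):
    """Multiplicative scatter correction: regress each spectrum on the reference."""
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, 1)  # row ~ slope * reference + intercept
        corrected[i] = (row - intercept) / slope
    return corrected


def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices chosen by the Kennard-Stone rule."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    first, second = np.unravel_index(np.argmax(dist), dist.shape)  # two farthest samples
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest already-selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


# Synthetic stand-in spectra (NOT the patent's data): 42 samples per class.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000, 400, 200)


def fake_spectra(center, n):
    band = np.exp(-((wavenumbers - center) / 150.0) ** 2)
    gain = rng.uniform(0.8, 1.2, size=(n, 1))     # multiplicative scatter
    offset = rng.uniform(-0.1, 0.1, size=(n, 1))  # additive baseline shift
    return gain * band + offset + rng.normal(0.0, 0.01, size=(n, wavenumbers.size))


X = np.vstack([fake_spectra(1650, 42), fake_spectra(1540, 42)])
y = np.repeat([0, 1], 42)
X_msc = msc(X, X.mean(axis=0))

# Split each class 30:12 (5:2) so both sets keep a 1:1 class ratio.
train_idx, test_idx = [], []
for label in (0, 1):
    members = np.flatnonzero(y == label)
    chosen, left = kennard_stone(X_msc[members], 30)
    train_idx.extend(members[chosen].tolist())
    test_idx.extend(members[left].tolist())

# Scale to [0, 1], keep components reaching 85% cumulative variance, then RBF SVM.
pipeline = make_pipeline(MinMaxScaler(feature_range=(0, 1)),
                         PCA(n_components=0.85, svd_solver="full"),
                         SVC(kernel="rbf"))
exponents = np.arange(-15, 16, 3)  # coarse step for speed; the claim uses a step of 0.1
param_grid = {"svc__C": 2.0 ** exponents, "svc__gamma": 2.0 ** exponents}
search = GridSearchCV(pipeline, param_grid,
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_msc[train_idx], y[train_idx])
test_accuracy = search.score(X_msc[test_idx], y[test_idx])
print(search.best_params_, test_accuracy)
```

Placing the scaler and PCA inside the pipeline means they are refit on each cross-validation fold, so the validation folds do not influence the normalization range or the principal axes. With the claimed step of 0.1 the grid has 301 values per parameter, which is far more expensive to evaluate than this coarse version.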
CN202110918614.8A 2021-08-11 2021-08-11 Deer horn cap type identification method based on mid-infrared spectrum and SVM Active CN113610017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110918614.8A CN113610017B (en) 2021-08-11 2021-08-11 Deer horn cap type identification method based on mid-infrared spectrum and SVM

Publications (2)

Publication Number Publication Date
CN113610017A true CN113610017A (en) 2021-11-05
CN113610017B CN113610017B (en) 2024-02-02

Family

ID=78340265

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2478823A1 (en) * 2002-03-11 2003-09-18 Allan L. Schaefer Method for the evaluation of velvet antler
US20050153359A1 (en) * 2002-03-11 2005-07-14 Schaefer Allan L. Method for the evaluation of velvet antler
CN109034261A (en) * 2018-08-10 2018-12-18 武汉工程大学 A kind of Near Infrared Spectroscopy Data Analysis based on support vector machines
CN109374573A (en) * 2018-10-12 2019-02-22 乐山师范学院 Cucumber epidermis pesticide residue recognition methods based on near-infrared spectrum analysis
CN110765962A (en) * 2019-10-29 2020-02-07 刘秀萍 Plant identification and classification method based on three-dimensional point cloud contour dimension values
CN111024645A (en) * 2020-01-08 2020-04-17 山东金璋隆祥智能科技有限责任公司 Quantitative research method for Dong' a donkey-hide gelatin, Fujiao and antler gelatin
CN111458306A (en) * 2019-01-22 2020-07-28 吉林农业大学 Method for identifying different cartialgenous flowers by combining infrared spectrum with PLS-DA

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
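冯国红; 朱玉杰; 李耀翔: "Method for identifying imported wood species by mid-infrared spectroscopy", Spectroscopy and Spectral Analysis (光谱学与光谱分析), no. 07 *
刘瑶; 谭克竹; 陈月华; 王志朋; 谢红; 王立国: "Soybean variety identification based on segmented principal component analysis and hyperspectral technology", Soybean Science (大豆科学), no. 04 *
王璞; 何明霞; 李萌; 曲秋红; 刘锐; 陈永德: "Application of terahertz spectroscopy in the detection of bioactive peptides", Spectroscopy and Spectral Analysis (光谱学与光谱分析), no. 09 *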

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant