CN113610017B - Deer horn cap type identification method based on mid-infrared spectrum and SVM - Google Patents

Publication number
CN113610017B
Authority
CN
China
Prior art keywords
spectrum
caps
deer antler
svm
cap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110918614.8A
Other languages
Chinese (zh)
Other versions
CN113610017A (en
Inventor
武海巍
杨承恩
胡俊海
袁月明
付辰琦
邵海龙
刘浩
周建宇
玉苏甫·阿布拉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Agricultural University
Original Assignee
Jilin Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Agricultural University
Priority to CN202110918614.8A
Publication of CN113610017A
Application granted
Publication of CN113610017B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • G06F2218/16 Classification; Matching by matching signal segments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a deer horn cap type identification method based on mid-infrared spectrum and SVM, and relates to the technical field of deer horn cap identification. In the method, samples are first dried, crushed and pressed into tablets, and their diffuse reflection spectra are collected; the collected original spectra are preprocessed with several data processing methods, and the samples are divided into training and test sets by the K-S test method. Normalization and principal component analysis are then used to compress the high-dimensional spectral data and extract the principal characteristic components. Finally, the full-band MSC spectrum after principal component analysis and the MSC spectrum with distinct difference bands selected after principal component analysis are each taken as input variables to establish SVM, ELM and RF identification models for sika deer antler caps and red deer antler caps. By combining mid-infrared spectroscopy with models built using a support vector machine (SVM), a random forest (RF) algorithm and an extreme learning machine (ELM), the invention achieves efficient, accurate and nondestructive identification of sika deer antler caps and red deer antler caps.

Description

Deer horn cap type identification method based on mid-infrared spectrum and SVM
Technical Field
The invention belongs to the technical field of identification of deer horn caps, and particularly relates to a deer horn cap type identification method based on mid-infrared spectrum and SVM.
Background
The deer horn cap is the platform-shaped antler base left on the head of a male sika deer or red deer after the antler has been harvested. It gradually ossifies, and when the antlers are renewed it separates from the head and falls off. The shed cap has considerable medicinal value: it can be used to treat sores and swelling, pain caused by blood stasis, internal injury caused by consumptive deficiency and the like, and it has a marked effect on various glandular inflammations. With the development of deer-derived medicinal materials in China, deer horn caps have become increasingly popular in the market. Although sika deer antler caps and red deer antler caps are both deer medicinal materials regulated by the state, sika deer antler caps are higher than red deer antler caps in both price and medicinal value, while the two are almost identical in appearance and internal molecular structure. As a result, deer horn caps sold on the market are often not of the declared type, and finding an efficient, online and low-cost method for identifying the type of deer horn cap is important.
At present, scholars at home and abroad mainly study the detection and identification of the antler; although many have obtained good results there, very little work addresses the detection and identification of medicinal deer horn caps. Infrared spectroscopy is effective, fast, low-cost and nondestructive, and has good prospects for the detection and identification of agricultural and forestry products. Existing detection and identification of deer horn caps is quite limited: a test sample can be identified as a genuine deer horn cap only if its spectrum matches the standard pattern, otherwise further tests and comparisons are needed, and the type of deer horn cap cannot be distinguished. Therefore, building on the advantages and wide application of mid-infrared spectroscopy and combining it with mathematical modeling, the invention seeks an efficient, rapid and accurate method for identifying the type of deer horn cap.
Disclosure of Invention
The invention aims to provide a deer horn cap type identification method based on mid-infrared spectrum and SVM. By combining mid-infrared spectroscopy with models built using a support vector machine (SVM), a random forest (RF) algorithm and an extreme learning machine (ELM), it solves the problem that sika deer antler caps and red deer antler caps cannot be identified efficiently, accurately and nondestructively in the prior art, and provides a new approach to detecting the type and quality of deer horn caps.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The invention discloses a deer horn cap type identification method based on mid-infrared spectrum and SVM, which comprises the following steps:
step 1: respectively crushing the sika deer antler caps to be detected and the red deer antler caps to be detected, sieving the crushed material through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and then placing the antler cap powders and potassium bromide into a constant-temperature drying oven at 60 ℃ for drying for 8-12 hours;
step 2: precisely weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding them uniformly, putting the ground powder into an infrared tabletting mold, and tabletting to obtain sika deer antler cap tablets; preparing red deer antler cap tablets in the same way; and placing the sika deer antler cap tablets and the red deer antler cap tablets in turn on a mid-infrared spectrometer to acquire the diffuse reflection spectrum of each tablet;
step 3: using The Unscrambler X 10.4.4 software to apply multiple data processing methods to the collected original spectra to obtain preprocessed spectra, and comparing each preprocessed spectrum with the corresponding original spectrum; the multiple data processing comprises multiplicative scatter correction (MSC);
both the full-band spectral data and the data restricted to the bands with larger differences are used for subsequent data analysis and comparison, so as to select the optimal band;
step 4: using the K-S test method to divide the total samples into a training set and a test set with a sample ratio of 5:2, wherein the ratio of sika deer antler caps to red deer antler caps in both the training set and the test set is 1:1;
step 5: using normalization and principal component analysis dimension reduction to compress the high-dimensional spectral data and extract the principal characteristic components, mainly comprising the following steps:
step 51: normalizing the MSC spectrum data with the mapminmax function in MATLAB 2014b, with the data mapping range set to 0-1;
step 52: performing principal component analysis on the normalized MSC deer horn cap spectrum data using Python 3.7, and drawing scatter plots of the first two principal components for the full-band MSC spectrum and for the MSC spectrum with distinct difference bands selected, respectively;
step 6: modeling with three methods, the support vector machine (SVM), random forest (RF) and extreme learning machine (ELM): the full-band MSC spectrum after principal component analysis and the MSC spectrum with distinct difference bands selected after principal component analysis are each taken as input variables to establish SVM, ELM and RF identification models for sika deer antler caps and red deer antler caps.
Further, the wavenumber range of the diffuse reflection spectrum acquisition in the step 2 is 4000-400 cm⁻¹, the resolution is 4 cm⁻¹, the number of scans is 16, and each sample is scanned 3 times and the average spectrum is taken;
during spectrum acquisition, the indoor temperature is set to 25 ℃, and the humidity is set to 35%.
Further, the multiple data processing in the step 3 further includes standard normal variate transformation (SNV), Savitzky-Golay (SG) smoothing, the first derivative and the second derivative.
Further, the number of principal components in the step 5 is selected by combining method one and method two;
method one requires the cumulative contribution rate of the selected principal components to be at least 85%, and method two requires the eigenvalue of a principal component to be greater than or equal to 1.
Further, the specific method of the support vector machine SVM in the step 6 is as follows: the training set is validated by K-fold cross-validation (K-CV), and the SVM requires an optimal penalty factor c, a kernel function parameter g and an optimal kernel function to be determined;
a grid search method is adopted, the range of the penalty factor c is set to 2^-15 to 2^15, the range of the kernel function parameter g is set to 2^-15 to 2^15, the step size is set to 0.1, and a radial basis kernel function is used as the optimal kernel function.
Further, the specific method of the random forest RF in the step 6 is as follows: a genetic algorithm is used to search for the optimal number of trees and other influencing parameters, wherein the number of variables to be optimized in the genetic algorithm is set to 2, the population size to 20, the maximum number of generations to 200, the number of binary digits per variable to 10, the generation gap to 0.95, the crossover probability to 0.7, and the mutation probability to 0.01.
Further, the number of hidden nodes in the extreme learning machine ELM in the step 6 is set within the range 40-100, and the optimal number of hidden nodes is obtained by comparison.
The invention has the following beneficial effects:
1. The invention combines mid-infrared spectroscopy with models built using a support vector machine (SVM), a random forest (RF) algorithm and an extreme learning machine (ELM) to achieve efficient, accurate and nondestructive identification of deer horn cap types, and provides a new approach to detecting the type and quality of deer horn caps.
Of course, it is not necessary for any one product to practice the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an average spectrum of MSC deer horn caps.
Fig. 2 is a graph of normalized mean MSC spectrum data.
Table 1 shows the first 10 principal component eigenvalues and cumulative contribution rates.
Fig. 3 is a plot of the first 2 principal components of the full-segment spectral test set.
Fig. 4 is a plot of the first 2 principal components of the difference spectrum test set.
FIG. 5 is a fitness graph of the grid search parameter optimization for the full-segment spectrum.
Fig. 6 is a graph showing the fit of the full-segment spectral test set.
FIG. 7 is a fitness graph of the grid search parameter optimization for the difference spectrum.
Fig. 8 is a difference spectrum test set fitting result.
Fig. 9 is a graph of the full-band spectral iteration error variation.
Fig. 10 is a graph of the fit of a full-segment spectral training set.
Fig. 11 is a graph showing the fit of the full-segment spectral test set.
Fig. 12 is a plot of the variance of the iterative error of the difference spectrum.
Fig. 13 is a difference spectrum training set fitting result.
Fig. 14 is a graph showing the result of the difference spectrum test set fitting.
Table 2 is a comparison of ELM algorithm predictions.
Fig. 15 is a graph showing the fit of the full-segment spectral training set.
Fig. 16 is a graph showing the fit of the full-segment spectral test set.
Fig. 17 is a difference spectrum training set fitting result.
Fig. 18 is a graph showing the result of the difference spectrum test set fitting.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to figs. 1-18 and tables 1-2, the invention discloses a deer horn cap type identification method based on mid-infrared spectrum and SVM, comprising the following steps:
step 1: respectively crushing the sika deer antler caps to be detected and the red deer antler caps to be detected, sieving the crushed material through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and then placing the antler cap powders and potassium bromide into a constant-temperature drying oven at 60 ℃ for drying for 8-12 hours; 42 samples each of sika deer antler caps and red deer antler caps are collected, giving 84 samples in total;
step 2: precisely weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding them uniformly, putting the ground powder into an infrared tabletting mold, and tabletting to obtain sika deer antler cap tablets; preparing red deer antler cap tablets in the same way; and placing the sika deer antler cap tablets and the red deer antler cap tablets in turn on a mid-infrared spectrometer to acquire the diffuse reflection spectrum of each tablet;
step 3: spectral information is easily affected by high-frequency random noise, baseline drift, the sample itself, light scattering and the like, so the original spectra must be preprocessed to reduce the interference of these factors; the multiple data processing comprises multiplicative scatter correction (MSC), standard normal variate transformation (SNV), Savitzky-Golay (SG) smoothing, the first derivative and the second derivative; comparing the spectra after the various pretreatments shows that the spectral differences are most evident after MSC, as shown in fig. 1. Fig. 1 shows that sika deer antler caps and red deer antler caps differ significantly in the bands 740-840, 1260-1360, 1420-1540, 1900-3320 and 3700-4000 cm⁻¹;
the characteristic peaks in the spectral data are the main basis for distinguishing spectra; both the full-band spectral data and the data restricted to all bands with large differences are used for subsequent data analysis and comparison, so as to select the optimal band;
step 4: the K-S test is a method, based on the cumulative distribution function, for rapidly dividing samples into a training set and a test set; using the K-S test method with a training-to-test sample ratio of 5:2, the 84 samples are divided into 60 training samples and 24 test samples, with a 1:1 ratio of sika deer antler caps to red deer antler caps in both sets, i.e. 30 sika deer and 30 red deer antler caps for training and 12 of each for testing;
step 5: the mid-infrared spectrum covers the band range 4000-400 cm⁻¹ and is characterized by many bands, a large data volume and strong redundancy; normalization and principal component analysis dimension reduction are used to compress the high-dimensional spectral data and extract the principal characteristic components, mainly comprising the following steps:
step 51: the MSC spectrum data are normalized with the mapminmax function in MATLAB 2014b, with the data mapping range set to 0-1, as shown in fig. 2;
step 52: principal component analysis is performed on the normalized MSC deer horn cap spectrum data using Python 3.7; the eigenvalues and cumulative contribution rates of the first 10 principal components for the full-band spectrum and for the spectrum with distinct difference bands selected are shown in table 1. In the full-band MSC spectrum, PC1 has the largest contribution rate, 55.62909%, the contribution rate of PC2 is 20.63217%, the cumulative contribution rate of the first 3 PCs is 84.15974%, and the cumulative contribution rate of the first 8 PCs reaches 97.16765%, after which each PC contributes less than 1% and the cumulative contribution rate grows ever more slowly. In the MSC spectrum with distinct difference bands selected, PC1 has the largest contribution rate, 59.52195%, the contribution rate of PC2 is 29.89027%, the cumulative contribution rate of the first 3 PCs is 92.68794%, and the cumulative contribution rate of the first 6 PCs reaches 97.92108%, after which each PC contributes less than 1% and the cumulative contribution rate grows ever more slowly;
scatter plots of the first two principal components are drawn for the full-band MSC spectrum and for the MSC spectrum with distinct difference bands selected, respectively, as shown in figs. 3-4;
step 6: modeling with three methods, the support vector machine (SVM), random forest (RF) and extreme learning machine (ELM): the full-band MSC spectrum after principal component analysis and the MSC spectrum with distinct difference bands selected after principal component analysis are each taken as input variables to establish SVM, ELM and RF identification models for sika deer antler caps and red deer antler caps.
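The preprocessing named in steps 3 and 51 can be illustrated with a short sketch. The patent performs MSC in The Unscrambler and the 0-1 scaling with MATLAB's mapminmax; the Python below is only an illustration under assumptions: the function names `msc` and `minmax_01` are hypothetical, the MSC reference is taken to be the mean spectrum of all samples, the scaling is applied to one sequence of values at a time (the source does not say along which axis), and the toy spectra are invented.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra: list of equal-length lists of spectral intensities.
    """
    n_points = len(spectra[0])
    # Reference spectrum (assumption): point-wise mean over all samples.
    reference = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of the sample: s ~ intercept + slope * reference.
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        # Remove the additive offset and the multiplicative scatter term.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def minmax_01(values):
    """Linearly map a sequence of values onto the range 0-1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Invented toy spectra: the second is a scaled and offset copy of the first.
    raw = [[1.0, 2.0, 4.0, 3.0], [3.0, 5.0, 9.0, 7.0]]
    for spectrum in msc(raw):
        print(minmax_01(spectrum))
```

Because the two toy spectra differ only by a scale factor and an offset, MSC maps both onto the same corrected curve, which is the effect the correction is meant to have on scatter-induced differences.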
Preferably, the wavenumber range of the diffuse reflection spectrum acquisition in the step 2 is 4000-400 cm⁻¹, the resolution is 4 cm⁻¹, the number of scans is 16, and each sample is scanned 3 times and the average spectrum is taken;
during spectrum acquisition, the indoor temperature is set to 25 ℃, and the humidity is set to 35%.
Preferably, the number of principal components in the step 5 is selected by combining method one and method two;
method one requires the cumulative contribution rate of the selected principal components to be at least 85%, and method two requires the eigenvalue of a principal component to be greater than or equal to 1; for the full-band spectral data, the first 8 principal components are selected to form the dimension-reduced spectral data, and for the spectral data with distinct difference bands selected, the first 6 principal components are selected to form the dimension-reduced spectral data.
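The two selection criteria can be written down as a small helper. This is a sketch under assumptions: the function name `component_counts` is hypothetical, the eigenvalues in the demo are invented (Table 1 is not reproduced in the text), and since the source does not state how the two criteria are combined, the function simply reports both counts side by side.

```python
def component_counts(eigenvalues, min_cumulative=85.0, min_eigenvalue=1.0):
    """Return (k_cumulative, k_eigenvalue) for eigenvalues sorted descending.

    k_cumulative: fewest leading components whose cumulative contribution
                  rate (in percent) reaches min_cumulative.
    k_eigenvalue: number of components with eigenvalue >= min_eigenvalue.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    k_cumulative = len(eigenvalues)
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += 100.0 * value / total  # contribution rate of this PC
        if cumulative >= min_cumulative:
            k_cumulative = k
            break
    k_eigenvalue = sum(1 for value in eigenvalues if value >= min_eigenvalue)
    return k_cumulative, k_eigenvalue


if __name__ == "__main__":
    # Invented eigenvalues, for illustration only.
    print(component_counts([6.0, 2.0, 0.9, 0.6, 0.5]))
```

With the full-band figures quoted in step 52, the first 3 PCs reach only 84.15974%, so the 85% criterion alone would already require at least 4 components there.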
Preferably, the support vector machine SVM is one of the best supervised learning algorithms; it builds models with kernel functions for both linear and nonlinear problems, can solve classification and regression problems, and copes well with overfitting, especially with small samples. The specific method of the support vector machine SVM in the step 6 is as follows: the training set is validated by K-fold cross-validation (K-CV), and the SVM requires an optimal penalty factor c, a kernel function parameter g and an optimal kernel function to be determined;
a grid search method is adopted, the range of the penalty factor c is set to 2^-15 to 2^15, the range of the kernel function parameter g is set to 2^-15 to 2^15, the step size is 0.1, and a radial basis kernel function is used as the optimal kernel function;
SVM modeling comparison: for the model based on the full-band MSC spectrum and the model based on the MSC spectrum with distinct difference bands selected, each after principal component analysis, the recognition results on the training and test sets and the determined c and g are shown in figs. 5-8. Figs. 5-8 show that the recognition rates of the training set and the prediction set are 100% for both models, which shows that the SVM identifies the type of deer horn cap well.
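The grid search over c and g described above might be organized as in the following sketch. Several points are assumptions rather than statements of the patent's implementation: the step of 0.1 is taken to apply to the base-2 exponent, the names `log2_grid`, `k_fold_indices` and `grid_search` are hypothetical, and `cv_accuracy` stands for a caller-supplied function that trains an RBF-kernel SVM on K-1 folds and returns the mean accuracy on the held-out folds.

```python
import itertools


def log2_grid(low=-15, high=15, steps_per_unit=10):
    """Return 2**e for e = low, low + 1/steps_per_unit, ..., high."""
    n = (high - low) * steps_per_unit
    return [2.0 ** (low + i / steps_per_unit) for i in range(n + 1)]


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k interleaved validation folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def grid_search(cv_accuracy, c_values, g_values):
    """Return (best_accuracy, c, g); ties keep the first pair visited."""
    best = None
    for c, g in itertools.product(c_values, g_values):
        accuracy = cv_accuracy(c, g)
        if best is None or accuracy > best[0]:
            best = (accuracy, c, g)
    return best


if __name__ == "__main__":
    grid = log2_grid()
    print(len(grid), grid[0], grid[-1])
    # Toy stand-in for cross-validated SVM accuracy, peaked at c=2, g=0.25.
    toy = lambda c, g: -abs(c - 2.0) - abs(g - 0.25)
    print(grid_search(toy, log2_grid(steps_per_unit=1), log2_grid(steps_per_unit=1)))
```

Under the exponent-step assumption the full grid has 301 values per parameter, so an exhaustive search evaluates 301 × 301 = 90,601 (c, g) pairs, each with K model fits; a coarser first pass followed by a finer one is a common way to reduce that cost.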
Preferably, the random forest RF is a very flexible and practical method with excellent accuracy; it can evaluate the importance of each feature for the classification problem and gives good results on problems with missing values. In the RF model, the number of trees set affects the quality of the final result. The specific method of the random forest RF in the step 6 is as follows: a genetic algorithm is used to search for the optimal number of trees and other influencing parameters, wherein the number of variables to be optimized in the genetic algorithm is set to 2, the population size to 20, the maximum number of generations to 200, the number of binary digits per variable to 10, the generation gap to 0.95, the crossover probability to 0.7, and the mutation probability to 0.01;
a random forest model is established and the number of trees is searched with the genetic algorithm; taking the number of trees of the full-band MSC spectrum model as the basis, the number of trees of the model on the MSC spectrum with distinct difference bands selected after principal component analysis is 600, and the modeling results are shown in figs. 9-14. Figs. 9-14 show that in full-band MSC spectrum modeling the training set recognition rate is 100% and the test set recognition rate is 95.8333%, with 1 sika deer antler cap misidentified. In modeling on the MSC spectrum with distinct difference bands selected after principal component analysis, the training set recognition rate is 100% and the test set recognition rate is 87.5%, with 3 sika deer antler caps misidentified, which indicates that the built model is overfitted; the recognition rate of full-band MSC spectrum modeling is higher than that of modeling on the MSC spectrum with distinct difference bands selected after principal component analysis.
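The genetic-algorithm settings listed above can be collected in one place, together with the usual way a binary-coded gene is turned into a parameter value. This is a sketch under assumptions: the 0.95 setting is read here as the generation gap, the linear decoding in `decode` is the standard binary-GA mapping rather than something the source specifies, and the search bounds in the demo (1 to 1000 for the number of trees, 1 to 10 for a second parameter) are invented because the patent does not give them.

```python
# Settings as listed in the description; key names are hypothetical.
GA_SETTINGS = {
    "n_variables": 2,
    "population_size": 20,
    "max_generations": 200,
    "bits_per_variable": 10,
    "generation_gap": 0.95,        # assumption: the 0.95 setting
    "crossover_probability": 0.7,
    "mutation_probability": 0.01,
}


def decode(bits, low, high):
    """Map a list of 0/1 genes linearly onto the interval [low, high]."""
    as_int = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)


def decode_individual(chromosome, bounds, bits_per_variable=10):
    """Split a chromosome into equal-width genes and decode each one."""
    return [
        decode(chromosome[i * bits_per_variable:(i + 1) * bits_per_variable], lo, hi)
        for i, (lo, hi) in enumerate(bounds)
    ]


if __name__ == "__main__":
    # Invented bounds: number of trees in [1, 1000], second parameter in [1, 10].
    bounds = [(1, 1000), (1, 10)]
    chromosome = [0] * 10 + [1] * 10
    print(decode_individual(chromosome, bounds))
```

With 2 variables of 10 bits each, every individual is a 20-bit chromosome, and each variable can take 1024 distinct values across its range; a tree count decoded this way would be rounded to an integer before training the forest.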
Preferably, the extreme learning machine ELM is a novel fast learning algorithm in which the hidden layer nodes do not need to be adjusted during learning; that is, the weights of the hidden layer nodes of the ELM network are randomly generated or manually defined, and the learning process only needs to compute the output weights, so the ELM has the advantages of few training parameters, high learning speed and strong generalization capability. The number of hidden nodes in the ELM model is directly related to the accuracy on the training set and the test set, and increasing the number of hidden nodes lengthens the running time of the algorithm; according to the related literature, once the number of hidden nodes exceeds the number of training samples, the accuracy no longer improves noticeably and fluctuates. The number of hidden nodes in the extreme learning machine ELM in the step 6 is therefore set within the range 40-100, and the optimal number of hidden nodes is obtained by comparison;
in the ELM model, a sigmoid function is selected as the activation function, and the number of hidden nodes is varied from 40 to 100 in steps of 1 and the results are compared, giving table 2 (only the modeling results at intervals of 10 starting from 40 are shown, to avoid overly dense data) and figs. 15-18. Table 2 shows that with 60 hidden nodes in full-band MSC spectrum modeling, the training set recognition rate is 100% and the test set recognition rate is 95%, with deer horn caps misidentified; with 50 hidden nodes in modeling on the MSC spectrum with distinct difference bands selected after principal component analysis, the training set recognition rate is 100% and the test set recognition rate is 95.833%, with sika deer antler caps misidentified. Overall, the ELM algorithm gives a stable recognition effect in both modeling situations.
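The ELM training scheme described above (random, fixed hidden-layer weights; only the output weights are computed) can be sketched with NumPy. This is not the patent's implementation: the function names are hypothetical, the output weights are obtained with the Moore-Penrose pseudo-inverse (a common choice for ELM that the source does not spell out), the weight range is arbitrary, and the data are synthetic stand-ins for the 60 training samples with 8 principal components.

```python
import numpy as np


def sigmoid_hidden(X, W, b):
    """Hidden-layer output with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, y, n_hidden, rng):
    """Fit a single-hidden-layer ELM for binary labels y in {0, 1}."""
    # Hidden weights and biases are drawn at random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid_hidden(X, W, b)
    # Only the output weights are learned, as a least-squares solution.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta


def predict_elm(model, X):
    """Predict 0/1 labels by thresholding the network output at 0.5."""
    W, b, beta = model
    return (sigmoid_hidden(X, W, b) @ beta >= 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic two-class data, for illustration only.
    y = np.repeat([0, 1], 30)
    X = rng.normal(size=(60, 8)) + 2.0 * y[:, None]
    # Compare hidden-node counts from 40 to 100, here in steps of 10.
    for n_hidden in range(40, 101, 10):
        model = train_elm(X, y, n_hidden, rng)
        print(n_hidden, np.mean(predict_elm(model, X) == y))
```

The loop mirrors the comparison of hidden-node counts in the text; in practice the accuracy on a held-out test set, not the training set, would drive the choice.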
From the RF modeling results, in terms of spectral feature extraction, the prediction set recognition rate with full-band spectrum modeling is 100%, clearly higher than the 87.5% test set recognition rate with difference spectrum modeling. The ELM model likewise shows a prediction set recognition rate of 100% with full-band spectrum modeling, slightly higher than the 95.8333% prediction set recognition rate with difference spectrum modeling. Thus, full-band spectrum modeling is generally better than difference spectrum modeling. Comparing the modeling methods, the average recognition rate of the SVM model is 100%, that of the RF model is 93.75%, and that of the ELM model is 97.91665%. The SVM identifies well on both the full-band spectrum and the difference spectrum, and is very suitable for analyzing infrared spectral data from small samples. Mid-infrared full-band spectral data combined with SVM modeling gives the best recognition effect, and provides a new approach to detecting the type and quality of deer horn caps.
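As a worked check of the figures quoted in this section, the arithmetic below assumes that each test set recognition rate is the fraction of the 24 test samples identified correctly, and that each average recognition rate is the unweighted mean of the full-band and difference-spectrum rates stated in the paragraph above.

```latex
\begin{align*}
\text{1 error in 24 test samples:}\quad & \frac{24-1}{24} = 95.8333\% \\
\text{3 errors in 24 test samples:}\quad & \frac{24-3}{24} = 87.5\% \\
\text{SVM average:}\quad & \frac{100\% + 100\%}{2} = 100\% \\
\text{RF average:}\quad & \frac{100\% + 87.5\%}{2} = 93.75\% \\
\text{ELM average:}\quad & \frac{100\% + 95.8333\%}{2} = 97.91665\%
\end{align*}
```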
In this specification, descriptions referring to the terms "one embodiment," "example," "specific example," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (1)

1. A deer horn cap type identification method based on mid-infrared spectrum and SVM, characterized in that the method comprises the following steps:
step 1: respectively crushing the sika deer antler caps to be detected and the red deer antler caps to be detected, sieving the crushed material through a 200-mesh sieve to obtain sika deer antler cap powder and red deer antler cap powder, and then placing the antler cap powders and potassium bromide in a constant-temperature drying oven at 60 ℃ for drying for 8-12 hours;
step 2: precisely weighing 1.8 mg of dried sika deer antler cap powder and 190 mg of potassium bromide, mixing and grinding them uniformly, putting the ground powder into an infrared tabletting mold and tabletting to obtain sika deer antler cap tablets, and placing the sika deer antler cap tablets and the red deer antler cap tablets, each in the same way, on a mid-infrared spectrometer to acquire the diffuse reflection spectra of the respective tablets;
step 3: performing multiple data processing operations on the collected original spectra with The Unscrambler X 10.4 software to obtain preprocessed spectra, and comparing each preprocessed spectrum with the corresponding original spectrum; the multiple data processing operations comprise multiplicative scatter correction (MSC);
performing the subsequent data analysis and comparison both with the full-band spectral data and with only the bands of the spectral data that show larger differences, so as to select the optimal band;
step 4: dividing the total samples into training sets and test sets by the K-S method, with a ratio of training-set samples to test-set samples of 5:2, wherein the ratio of sika deer antler cap samples to red deer antler cap samples in both the training sets and the test sets is 1:1;
step 5: performing high-dimensional data compression and extraction of the main characteristic components on the spectral data by normalization followed by principal component analysis for dimension reduction, mainly comprising the following steps:
step 51: normalizing the MSC spectrum data with the mapminmax function in MATLAB 2014b, with the data mapping range set to 0-1;
step 52: performing principal component analysis on the normalized MSC deer antler cap spectrum data with Python 3.7, and drawing scatter plots of the first two principal components for the full-range MSC spectrum and for the MSC spectrum of the selected distinct-difference bands, respectively;
step 6: modeling with three methods, namely the support vector machine (SVM), random forest (RF) and extreme learning machine (ELM), taking the full-range MSC spectrum after principal component analysis and the MSC spectrum of the selected distinct-difference bands after principal component analysis as the respective input variables, and establishing SVM, ELM and RF identification models for sika deer antler caps and red deer antler caps;
the wave number range of the diffuse reflection spectrum acquisition software in step 2 is 4000-400 cm⁻¹, the resolution is 4 cm⁻¹, the number of scans is 16, and each sample is scanned 3 times repeatedly to obtain an average spectrum;
in the spectrum acquisition process, the indoor temperature is set to 25 ℃, and the humidity is set to 35%;
the multiple data processing operations in step 3 further comprise standard normal variate transformation (SNV), Savitzky-Golay (SG) smoothing, the first derivative and the second derivative;
the number of principal components in step 5 is selected by combining a first method and a second method;
in the first method, the cumulative contribution rate of the principal components is at least 85%, and in the second method, the eigenvalue of a principal component is greater than or equal to 1;
the specific method of the support vector machine (SVM) in step 6 is as follows: the training set is first subjected to K-CV cross validation, and at the same time the optimal penalty factor c, the kernel function parameter g and the optimal kernel function of the SVM are determined;
using a grid search method, the range of the penalty factor c is set to 2⁻¹⁵ to 2¹⁵, the range of the kernel function parameter g is set to 2⁻¹⁵ to 2¹⁵, the step size is 0.1, and a radial basis function is used as the optimal kernel function;
the specific method of the random forest (RF) in step 6 is as follows: searching for the optimal number of trees and other influencing parameters with a genetic algorithm, wherein, in the genetic algorithm, the number of variables to be optimized is set to 2, the number of individuals to 20, the maximum number of generations to 200, the number of binary digits per variable to 10, the generation gap to 0.95, the crossover probability to 0.7, and the mutation probability to 0.01;
the number of hidden nodes in the extreme learning machine (ELM) in step 6 is set to 40-100, and the optimal number of hidden nodes is obtained by comparison.
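Several numerical settings recited in the claim can be illustrated with a short sketch. It rests on assumptions that the claim does not spell out: MSC uses the mean spectrum as its reference, the 0-1 mapping is applied to each row separately (as MATLAB's mapminmax does for each row of its input), the two principal-component rules are reported separately because the way they are combined is not stated, and the grid step of 0.1 is taken to apply to the exponent of 2. The function names and toy inputs are hypothetical and are not data from the patent.

```python
def msc(spectra):
    """Multiplicative scatter correction using the mean spectrum as reference."""
    n_points = len(spectra[0])
    ref = [sum(s[i] for s in spectra) / len(spectra) for i in range(n_points)]
    ref_mean = sum(ref) / n_points
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        # Least-squares fit s = intercept + slope * ref, then invert the fit.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def minmax_rows(rows, lo=0.0, hi=1.0):
    """Map each row linearly onto [lo, hi]; rows are assumed to be non-constant."""
    scaled = []
    for row in rows:
        r_min, r_max = min(row), max(row)
        scaled.append([lo + (hi - lo) * (v - r_min) / (r_max - r_min) for v in row])
    return scaled


def component_counts(eigenvalues, min_cumulative=0.85, min_eigenvalue=1.0):
    """Counts suggested by the cumulative-contribution rule and the eigenvalue rule."""
    total = sum(eigenvalues)
    running, k_cumulative = 0.0, len(eigenvalues)
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running / total >= min_cumulative:
            k_cumulative = k
            break
    k_eigen = sum(1 for value in eigenvalues if value >= min_eigenvalue)
    return k_cumulative, k_eigen


# Hypothetical toy inputs, not data from the patent.
spectra = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.5, 2.5, 3.5, 4.5]]
normalized = minmax_rows(msc(spectra))
counts = component_counts([5.0, 2.5, 1.2, 0.8, 0.5])

# Candidate values from 2^-15 to 2^15 with the exponent stepped by 0.1
# (assumed reading of the step size); the same grid serves for c and g.
grid = [2.0 ** ((i - 150) / 10) for i in range(301)]
print(normalized[0], counts, len(grid))
```

Under this reading the search covers 301 candidate values for each of c and g, so an exhaustive grid evaluates 301 × 301 parameter pairs per cross-validation round.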

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110918614.8A CN113610017B (en) 2021-08-11 2021-08-11 Deer horn cap type identification method based on mid-infrared spectrum and SVM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110918614.8A CN113610017B (en) 2021-08-11 2021-08-11 Deer horn cap type identification method based on mid-infrared spectrum and SVM

Publications (2)

Publication Number Publication Date
CN113610017A (en) 2021-11-05
CN113610017B (en) 2024-02-02

Family

ID=78340265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110918614.8A Active CN113610017B (en) 2021-08-11 2021-08-11 Deer horn cap type identification method based on mid-infrared spectrum and SVM

Country Status (1)

Country Link
CN (1) CN113610017B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2478823A1 (en) * 2002-03-11 2003-09-18 Allan L. Schaefer Method for the evaluation of velvet antler
CN109034261A (en) * 2018-08-10 2018-12-18 武汉工程大学 A kind of Near Infrared Spectroscopy Data Analysis based on support vector machines
CN109374573A (en) * 2018-10-12 2019-02-22 乐山师范学院 Cucumber epidermis pesticide residue recognition methods based on near-infrared spectrum analysis
CN110765962A (en) * 2019-10-29 2020-02-07 刘秀萍 Plant identification and classification method based on three-dimensional point cloud contour dimension values
CN111024645A (en) * 2020-01-08 2020-04-17 山东金璋隆祥智能科技有限责任公司 Quantitative research method for Dong' a donkey-hide gelatin, Fujiao and antler gelatin
CN111458306A (en) * 2019-01-22 2020-07-28 吉林农业大学 Method for identifying different cartialgenous flowers by combining infrared spectrum with PLS-DA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050153359A1 (en) * 2002-03-11 2005-07-14 Schaefer Allan L. Method for the evaluation of velvet antler

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
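Identification method for imported wood species based on mid-infrared spectroscopy; Feng Guohong; Zhu Yujie; Li Yaoxiang; Spectroscopy and Spectral Analysis (07); full text *
Soybean variety identification based on segmented principal component analysis and hyperspectral technology; Liu Yao; Tan Kezhu; Chen Yuehua; Wang Zhipeng; Xie Hong; Wang Liguo; Soybean Science (04); full text *
Application of terahertz spectroscopy in the detection of bioactive peptides; Wang Pu; He Mingxia; Li Meng; Qu Qiuhong; Liu Rui; Chen Yongde; Spectroscopy and Spectral Analysis (09); full text *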

Also Published As

Publication number Publication date
CN113610017A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
Feng et al. Investigation on data fusion of multisource spectral data for rice leaf diseases identification using machine learning methods
Bai et al. Accurate prediction of soluble solid content of apples from multiple geographical regions by combining deep learning with spectral fingerprint features
Dong et al. Discrimination of “Hayward” kiwifruits treated with forchlorfenuron at different concentrations using hyperspectral imaging technology
CN107271372A (en) A kind of Apple Leaves chlorophyll remote sensing estimation method
Liu et al. Discrimination of producing areas of Auricularia auricula using visible/near infrared spectroscopy
CN112098358A (en) Near infrared spectrum parallel fusion quantitative modeling method based on quaternion convolution neural network
Wu et al. Deep convolution neural network with weighted loss to detect rice seeds vigor based on hyperspectral imaging under the sample-imbalanced condition
CN103743705A (en) Rapid detection method for sorghum halepense and similar species
CN113610017B (en) Deer horn cap type identification method based on mid-infrared spectrum and SVM
CN116735527B (en) Near infrared spectrum optimization method, device and system and storage medium
Liu et al. Research on the online rapid sensing method of moisture content in famous green tea spreading
CN114062306B (en) Near infrared spectrum data segmentation preprocessing method
Li et al. HSI combined with CNN model detection of heavy metal Cu stress levels in apple rootstocks
CN112881333B (en) Near infrared spectrum wavelength screening method based on improved immune genetic algorithm
CN113049526B (en) Corn seed moisture content determination method based on terahertz attenuated total reflection
CN112782148B (en) Method for rapidly identifying Arabica and Robertia coffee beans
CN111595802A (en) Construction method and application of Clinacanthus nutans seed source place classification model based on NIR (near infrared spectroscopy)
CN112763448A (en) ATR-FTIR technology-based method for rapidly detecting content of polysaccharides in rice bran
Chen et al. Classification of wheat grain varieties using terahertz spectroscopy and convolutional neural network
CN105651727A (en) Method for discriminating shelf life of apple through near infrared spectroscopy based on JADE and ELM
Zhang et al. Apple identity recognition based on SVM model parameter optimization and near infrared hyperspectral.
CN108982408A (en) A method of organic rice and non-organic rice are distinguished using near-infrared spectrum technique
CN116823576B (en) Evaluation method and system for plant suitable area of original drug
CN110308110B (en) Nondestructive prediction method for long-distance safe yellow tea yellow-smoldering time based on least square support vector machine
Liu et al. ATR‐FTIR Spectroscopy Preprocessing Technique Selection for Identification of Geographical Origins of Gastrodia elata Blume

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant