CN114878509A - Standard sample-free transfer method of tobacco near-infrared quantitative analysis model - Google Patents

Standard sample-free transfer method of tobacco near-infrared quantitative analysis model

Info

Publication number
CN114878509A
Authority
CN
China
Prior art keywords
training set
samples
quantitative analysis
sample
tobacco
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210491870.8A
Other languages
Chinese (zh)
Inventor
刘雪松
沈欢超
王钧
倪鸿飞
耿莹蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202210491870.8A
Publication of CN114878509A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2201/00 Features of devices classified in G01N21/00
    • G01N2201/12 Circuits of general importance; Signal processing
    • G01N2201/129 Using chemometrical methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Manufacture Of Tobacco Products (AREA)

Abstract

The invention discloses a standard sample-free transfer method for a tobacco near-infrared quantitative analysis model, comprising: collecting spectral data of tobacco under different conditions; taking part of the samples in the sample set as a training set, with m samples of the training set forming a source domain training set and n samples forming a target domain training set; setting the initial sample weights of the source domain and target domain training sets in turn, setting the iteration condition for training, and establishing a near-infrared quantitative analysis model using a weighted extreme learning machine; inputting the first L columns of the corresponding principal component matrix S into the current near-infrared quantitative analysis model to obtain preliminary predicted values; calculating the prediction error according to a formula; and updating the weight set according to a formula so as to correct the near-infrared quantitative analysis model. The method realizes standard sample-free transfer of the tobacco near-infrared quantitative analysis model under different conditions and avoids scanning a large number of samples for model recalibration.

Description

Standard sample-free transfer method of tobacco near-infrared quantitative analysis model
Technical Field
The invention relates to the technical field of near-infrared quantitative analysis, in particular to a standard sample-free transfer method of a tobacco near-infrared quantitative analysis model.
Background
Tobacco is a plant of the genus Nicotiana in the family Solanaceae and has a long history of use in China. It is the raw material of the tobacco industry; the whole plant can be used as a pesticide, and it can also be used medicinally as an anesthetic, diaphoretic, sedative and emetic. Near-infrared spectroscopy is rapid, non-destructive and simple to operate; it is a chemical analysis technique with great potential and is widely applied in agriculture, petrochemicals, tobacco, pharmaceuticals and other fields. In practical applications, because measurement environments vary (e.g., changes in ambient temperature and humidity) and instruments differ (even when they come from the same manufacturer), a calibration model is often unsuitable for new samples or becomes outdated. Recalibration can solve this problem, but it requires scanning a large number of samples, which is time-consuming and expensive. Calibration transfer without standard samples is a sensible way to reduce recalibration costs; however, no standard sample-free calibration transfer method currently exists for tobacco spectra.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a standard sample-free transfer method of a tobacco near-infrared quantitative analysis model, which adopts the following technical scheme:
In one aspect, the invention provides a standard sample-free transfer method of a tobacco near-infrared quantitative analysis model, which comprises the following steps:
S1, collecting spectral data of tobacco under different conditions, preprocessing the spectral data, reducing the dimension of the data by principal component analysis, determining the number L of principal components, and recording the dimension-reduced data as X_a, which together with the corresponding measured tobacco property values Y_a forms a sample set {X_a, Y_a};
S2, taking part of the samples in the sample set as a training set {X_k, Y_k}, k = 1, …, m+n; taking m samples in the training set as a source domain training set, denoted {X_i^S, Y_i^S}, i = 1, …, m; and taking n samples in the training set as a target domain training set, denoted {X_i^T, Y_i^T}, i = 1, …, n;
S3, setting the initial sample weights of the source domain training set in turn, denoted {w_i^S}, i = 1, …, m; setting the initial sample weights of the target domain training set in turn, denoted {w_j^T}, j = m+1, …, m+n, to obtain the overall training set weights {w_k} = {w_i^S; w_j^T}; setting the iteration condition for training; and establishing a near-infrared quantitative analysis model using a weighted extreme learning machine;
S4, inputting the first L columns of the principal component matrix S corresponding to {X_k} into the current near-infrared quantitative analysis model to obtain preliminary predicted values {P_k}, k = 1, …, m+n;
S5, calculating the prediction errors for k = 1, …, m+n according to the following formula:
Figure BDA0003631378620000021
S6, updating the weight set according to the following formula to correct the near infrared quantitative analysis model;
Figure BDA0003631378620000022
where w_k is the weight corresponding to sample X_k;
Figure BDA0003631378620000023
Figure BDA0003631378620000024
Figure BDA0003631378620000025
S7, repeating steps S4-S6 until the training iteration condition is no longer met, and then outputting the prediction model.
Further, the method also comprises the following steps:
S8, repeating steps S2-S7 multiple times, wherein in each execution the sample data in the source domain training set and/or the target domain training set are not completely the same, so as to obtain a plurality of prediction models;
S9, processing the spectral data of the tobacco to be tested, inputting the processed spectral data into each of the plurality of prediction models to obtain a plurality of predicted values, and taking the average of the predicted values as the final predicted value.
Further, the remaining samples in the sample set are taken as a test set {X_i, Y_i};
Step S8 further includes:
inputting the first L columns of the principal component matrix S corresponding to {X_i} into the respective prediction models to obtain predicted values, and evaluating the model effect from the predicted values and the corresponding Y_i using the coefficient of determination and/or the root mean square error; if the preset evaluation requirement is met, step S9 is executed.
Further, if the preset evaluation requirement is not met, the sample data of the training set are increased, training is performed again and the evaluation is repeated until the preset evaluation requirement is met; the spectral data of the tobacco to be tested are then processed and input into each of the latest prediction models to obtain a plurality of predicted values, and the average of the predicted values is taken as the final predicted value.
Further, in step S6, if ε ≥ 0.5 is found, the value of m is decreased and the value of n is increased while keeping the sum of m and n unchanged, and the sample data in the source domain training set and the target domain training set are adjusted accordingly for the next round of correction.
Further, the initial weights of the samples of the source domain training set are all set to be 1/m, and the initial weights of the samples of the target domain training set are all set to be 1/n.
Further, the model iteration condition is that a maximum iteration number I is set, and if the actual iteration number is larger than I, the prediction model is output.
Further, in step S3, the weighted extreme learning machine is used as the base learner, the activation function is set to sigmoid, the number of hidden layer nodes is 30, the initial weight is β, and the calculation formula is as follows:
Figure BDA0003631378620000031
wherein I is the maximum iteration number.
Further, the preprocessing applies the standard normal variate (SNV) transformation to the spectral data.
Further, in step S1, collecting under different conditions includes: collecting near-infrared spectra of tobaccos with different properties in different experimental environments.
The technical scheme provided by the invention has the following beneficial effects:
a. the method avoids the need of scanning a large number of samples for model recalibration, and saves manpower and material resources;
b. the method can be directly used for measuring the tobacco under different conditions, and is convenient and quick;
c. and the sample data in different states are utilized to perform transfer learning in the early stage, so that the model accuracy is high.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of the basic mechanism of the extreme learning machine in the standard sample-free transfer method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the mechanism of the TrAdaBoost algorithm in the standard sample-free transfer method provided by the embodiment of the invention;
FIG. 3 is a schematic flow chart of a transfer method without a standard sample according to an embodiment of the present invention;
FIG. 4 is a graph showing the relationship between absorbance and wavenumber without pretreatment measured by different instruments in the method for transfer without standard sample according to the embodiment of the present invention;
FIG. 5 is a graph showing the relationship between absorbance and wavenumber measured by different instruments and pre-processed in the method for transfer without a standard sample according to the embodiment of the present invention;
FIG. 6 is a schematic diagram of the distribution of the first 3 principal components after dimensionality reduction of PCA in the no-standard sample transfer method provided by the embodiment of the present invention;
FIG. 7 is a schematic diagram of principal component contribution rate and cumulative contribution rate in the method for transfer without standard sample according to the embodiment of the present invention;
FIG. 8 is a schematic diagram showing the variation of the coefficients of determination of the migration of different instrument models with the number of samples in the non-standard sample transfer method according to the embodiment of the present invention;
FIG. 9 is a schematic diagram of the variation of the root mean square error of model migration between different instruments with the number of samples in the method for transfer without standard sample provided by the embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
In one embodiment of the invention, a method for transferring a tobacco near-infrared quantitative analysis model without a standard sample is provided, wherein a weighted extreme learning machine (wELM) and a TrAdaBoost algorithm are combined to construct a wELM-TrAdaBoost algorithm so as to realize the transfer of the model without the standard sample and adapt to the transfer under different conditions.
In this embodiment, a weighted extreme learning machine (wELM) is used as the base learner. Its basic structure is shown in FIG. 1: it is a single-hidden-layer feedforward neural network that learns quickly and has few adjustable parameters, namely the number of hidden neurons and the activation function. More precisely, given N training samples (x_i, t_i), its mathematical model is Hβ = T, ‖Hβ − T‖ = 0, where β is the output weight, T is the target vector, and H is the hidden-layer output matrix H = [h(x_1); h(x_2); …; h(x_N)]. The hidden node functions h_i(x), i = 1, …, Q, also called feature mapping functions, map the sample data x from the original data space to a hidden-layer space of dimension Q (i.e., the number of hidden-layer nodes is Q) and output the row vector h(x) = [h_1(x), …, h_Q(x)]. The weighted extreme learning method is used to build the calibration transfer model while taking into account the different importance of the samples in the training set. A weighted extreme learning machine can process data with unbalanced class distributions while retaining the advantages of the original extreme learning machine. Each training sample is assigned an additional weight: mathematically, an N × N diagonal matrix W is defined whose entry W_ii is associated with training sample x_i. The weight matrix W = diag(W_ii), i = 1, …, N, plays an important role in the weighted extreme learning machine; the weighting scheme W_ii = 1/N may be used later.
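The weighted extreme learning machine described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the closed-form solution β = (I/C + HᵀWH)⁻¹HᵀWT is the commonly published regularized weighted ELM solution and is assumed here, and the class name `WeightedELM`, the ridge constant `reg` (C) and the random initialization are hypothetical choices. The sigmoid activation and the 30 hidden nodes follow the text.

```python
import numpy as np

class WeightedELM:
    """Single-hidden-layer weighted ELM regressor (illustrative sketch)."""

    def __init__(self, n_hidden=30, reg=1e3, seed=0):
        self.n_hidden = n_hidden  # Q, the number of hidden-layer nodes
        self.reg = reg            # hypothetical ridge constant C (not in the source)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid feature map h(x) applied to every row, giving H of shape (N, Q)
        return 1.0 / (1.0 + np.exp(-(X @ self.A + self.b)))

    def fit(self, X, t, w):
        X, t, w = np.asarray(X, float), np.asarray(t, float), np.asarray(w, float)
        # Input weights and biases are random and stay fixed; only beta is solved for
        self.A = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        HtW = H.T * w  # same as H^T W with W = diag(w)
        # Assumed closed form: beta = (I/C + H^T W H)^(-1) H^T W t
        lhs = np.eye(self.n_hidden) / self.reg + HtW @ H
        self.beta = np.linalg.solve(lhs, HtW @ t)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, float)) @ self.beta
```

Passing the per-sample weights as a vector avoids building the N × N diagonal matrix W explicitly; with `w = np.full(N, 1 / N)` the sketch corresponds to the weighting scheme W_ii = 1/N mentioned above.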
The TrAdaBoost algorithm used in combination in the embodiment is an inductive ensemble learning method based on the Boosting algorithm, and integrates the advantages of ensemble learning and transfer learning. It is possible that some information inherent in the source domain may be valuable, while the rest may be useless or even harmful, when establishing a calibration model of the target domain. TrAdaBoost allows the use of a small amount of newly labeled data in combination with old data to build high quality models for new data, enabling efficient transfer of knowledge from old data to new data even if the new data is insufficient to train the models alone. Thus, the TrAdaBoost algorithm attempts to update the weights of each sample in the source and target domain datasets, depending on whether its contribution is positive or negative in each iteration. Referring to fig. 2, for those samples in the target domain, the weight update strategy is the same as the AdaBoost algorithm, while for the weights of the samples in the source domain, a different strategy is employed.
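The two weight-update strategies can be illustrated with the update rule from the published TrAdaBoost algorithm. The formulas of this patent are embedded as images and are not reproduced in the text, so everything below is an assumption: the function name `tradaboost_update`, the normalization of absolute errors into [0, 1], the constant β = 1/(1 + sqrt(2 ln m / I)) and the handling of ε ≥ 0.5 come from the general literature, not from this document.

```python
import numpy as np

def tradaboost_update(w, y_true, y_pred, m, max_iter):
    """One weight update; the first m entries are source-domain samples."""
    w = np.asarray(w, float)
    abs_err = np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))
    # Assumed normalization: scale the absolute errors into [0, 1]
    e = abs_err / abs_err.max() if abs_err.max() > 0 else abs_err
    # Weighted error computed on the target-domain samples only
    eps = np.sum(w[m:] * e[m:]) / np.sum(w[m:])
    if eps >= 0.5:
        raise ValueError("eps >= 0.5: adjust m and n before the next round")
    eps = max(eps, 1e-12)  # guard against a zero error
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(m) / max_iter))
    new_w = w.copy()
    new_w[:m] *= beta ** e[:m]       # source: poorly predicted samples lose weight
    new_w[m:] *= beta_t ** (-e[m:])  # target: poorly predicted samples gain weight
    return new_w, eps
```

Under these assumptions, source samples that the current model predicts badly are treated as dissimilar to the target domain and are down-weighted, while badly predicted target samples are emphasized, as in AdaBoost.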
In this embodiment, the two algorithms are combined to obtain the wELM-TrAdaBoost algorithm. As shown in FIG. 3, after sample preprocessing and data dimension reduction, a series of calibration transfer quantitative analysis models is established with the wELM-TrAdaBoost algorithm. Each sample in the target domain test set is input into every model, and the corresponding predicted value is calculated with a weighted average strategy as the final result.
Specifically, the no-standard sample transfer method comprises the following steps:
S1, collecting spectral data of tobacco under different conditions, preprocessing the spectral data, reducing the dimension of the data by principal component analysis (PCA), calculating the principal component score matrix S, determining the number L of principal components according to the cumulative contribution percentage of the principal components, and recording the dimension-reduced data as X_a, which together with the corresponding measured tobacco property values Y_a forms a sample set {X_a, Y_a}, a = 1, …, c;
The preprocessing applies the standard normal variate (SNV) transformation to the spectral data. Collecting under different conditions includes collecting near-infrared spectra of tobaccos with different properties in different experimental environments, where the experimental environment comprises the measurement environment and the instrument environment. For example, near-infrared spectra of tobacco are collected with different instruments under different temperature and humidity conditions in order to analyze the nicotine content of the tobacco.
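The preprocessing step can be sketched as follows. This is a minimal illustration that assumes the spectra are stored one per row in a NumPy array; the function name `snv` and the use of the sample standard deviation (`ddof=1`) are choices made for this example, not details given in the source.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # ddof=1 is an assumed convention
    return (spectra - mean) / std
```

Because each spectrum is standardized independently, a constant offset or a positive multiplicative factor applied to a whole spectrum leaves the transformed result unchanged, which is why the transformation is suited to removing scatter and path-length effects.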
S2, taking part of the samples in the sample set as a training set {X_k, Y_k}, k = 1, …, m+n, and taking the remaining samples in the sample set as a test set {X_i, Y_i}, i = 1, …, d;
Taking m samples in the training set as a source domain training set, denoted {X_i^S, Y_i^S}, i = 1, …, m, and taking n samples in the training set as a target domain training set, denoted {X_i^T, Y_i^T}, i = 1, …, n;
S3, setting the initial sample weights of the source domain training set in turn, denoted {w_i^S}, i = 1, …, m; setting the initial sample weights of the target domain training set in turn, denoted {w_j^T}, j = m+1, …, m+n, to obtain the overall training set weights {w_k} = {w_i^S; w_j^T}; setting the iteration condition for training; and establishing a near-infrared quantitative analysis model using a weighted extreme learning machine;
Specifically, the initial weights of the samples in the source domain training set are all set to 1/m, and those of the samples in the target domain training set are all set to 1/n. The iteration condition is a maximum number of iterations I, preferably I = 200; once the actual number of iterations exceeds I, the prediction model is output. The weighted extreme learning machine serves as the base learner, with sigmoid as the activation function and 30 hidden-layer nodes, although the activation function may be changed according to actual conditions. The initial weight of the weighted extreme learning machine is β, calculated as follows:
Figure BDA0003631378620000061
S4, inputting the first L columns of the principal component matrix S corresponding to {X_k} into the current near-infrared quantitative analysis model to obtain preliminary predicted values {P_k}, k = 1, …, m+n;
S5, calculating the prediction errors for k = 1, …, m+n according to the following formula:
Figure BDA0003631378620000071
S6, updating the weight set according to the following formula to correct the near infrared quantitative analysis model;
Figure BDA0003631378620000072
where w_k is the weight corresponding to sample X_k;
Figure BDA0003631378620000073
Figure BDA0003631378620000074
Figure BDA0003631378620000075
S7, repeating steps S4-S6 until the training iteration condition is no longer met, and then outputting the prediction model.
The method further comprises the following steps:
S8, repeating steps S2-S7 multiple times, wherein in each execution the sample data in the source domain training set and/or the target domain training set are not completely the same, so as to obtain a plurality of prediction models;
S9, processing the spectral data of the tobacco to be tested, inputting the processed spectral data into each of the plurality of prediction models to obtain a plurality of predicted values, and taking the average of the predicted values as the final predicted value.
Note that a, c, d, m, n, i, j in the above embodiments are integers, and m + n + d = c.
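Steps S8 and S9 amount to resampling the training subsets and averaging the resulting models. The sketch below shows only that outer logic under stated assumptions: `draw_training_indices`, `ensemble_predict` and the stand-in class `ConstantModel` are hypothetical names, sampling without replacement is one possible way of making the subsets differ between runs, and any object with a `predict` method can play the role of a trained prediction model.

```python
import numpy as np

def draw_training_indices(rng, source_pool, target_pool, m, n):
    """Draw m source and n target sample indices without replacement (step S2)."""
    src = rng.choice(source_pool, size=m, replace=False)
    tgt = rng.choice(target_pool, size=n, replace=False)
    return src, tgt

def ensemble_predict(models, X):
    """Average the predictions of several fitted models (step S9)."""
    preds = np.stack([model.predict(X) for model in models])
    return preds.mean(axis=0)

class ConstantModel:
    """Hypothetical stand-in for a trained prediction model."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)
```

Averaging over models trained on different subsets reduces the dependence of the final prediction on any single random split.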
In order to further evaluate the accuracy of model migration, step S8 further includes: inputting the first L columns of the principal component matrix S corresponding to {X_i} into the respective prediction models to obtain predicted values P_i, and evaluating the model effect from P_i and the corresponding Y_i using the coefficient of determination R^2 and the root mean square error of prediction RMSEP, calculated as follows:
Figure BDA0003631378620000076
Figure BDA0003631378620000077
where Y_i is the true value, P_i is the predicted value, and d is the number of samples in the test set. Generally, the smaller the RMSEP, the closer the predicted values are to the measured values and the stronger the predictive power of the model. R^2 reflects the degree of correlation between the variables; the closer R^2 is to 1, the better the model performs.
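The two evaluation metrics can be computed as in the sketch below. The formulas in the source are embedded as images, so the commonly used definitions are assumed here: RMSEP as the square root of the mean squared difference between Y_i and P_i over the d test samples, and R^2 as one minus the ratio of the residual sum of squares to the total sum of squares. The function names are illustrative.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over the d test samples."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed form: 1 - SS_res / SS_tot)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For example, with true values 1, 2, 3, 4 and predictions 1, 2, 3, 5, the single error of 1 gives RMSEP = sqrt(1/4) = 0.5 and R^2 = 1 - 1/5 = 0.8.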
If the preset evaluation requirement is met, step S9 is executed. If it is not met, sample data of the training set are added or replaced to enrich the training data, training is performed again and the evaluation is repeated until the preset evaluation requirement is met, after which step S9 is executed. In step S6, if ε ≥ 0.5 is found, the value of m is decreased and the value of n is increased while keeping the sum of m and n unchanged, and the sample data in the source domain training set and the target domain training set are adjusted accordingly for the next round of correction.
In one embodiment of the present invention, a random experiment on calibration transfer from an AA spectrometer to a BB spectrometer serves as the experimental protocol. The tobacco spectral data are preprocessed with the standard normal variate transformation, which eliminates the influence of solid particle size, surface scattering and optical path variation on the spectrum; the preprocessing effect is shown in FIG. 4 and FIG. 5. Principal component analysis is used for dimensionality reduction: it projects data from a high-dimensional space to a low-dimensional space while preserving as much of the information in the original data as possible. Because the number of samples in a tobacco spectral data set is often much smaller than the number of wavelengths (variables), this reduces the complexity of the data. For the PCA score analysis, since the raw spectrum contains 1609 wavelength points (variables), PCA dimensionality reduction is performed on the combination of 30 samples of the source domain training set, 15 samples of the target domain test set and 40 samples of the target domain training set in order to reduce the complexity of the model and of the calculations. Referring to FIG. 6, in the three-dimensional principal component score space there is a significant difference between the source domain samples and the target domain samples, which further illustrates the necessity of calibration transfer. The number of principal components was also selected using the above experimental protocol; see FIG. 7, where the contribution rate is 63.46% for the first principal component PC1, 24.15% for the second principal component PC2, and 8.39% for the third principal component PC3.
In order to retain as much of the useful information in the original spectral data as possible, the number of principal components is set to 20, giving a cumulative contribution rate of 99.99%; the preset requirement is that the principal components retain a contribution rate of 99.9% or more, and the three components with the largest influence may be selected. The processed sample data are used to train the model, yielding a plurality of sub-models, and the test set is input into the sub-models to evaluate their accuracy as a whole. Referring to FIG. 8 and FIG. 9, the coefficient of determination R^2 is above 97% and the root mean square error RMSEP is below 3%, which meets the preset evaluation requirement and shows that the migrated model is highly accurate.
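Choosing the number of principal components from the cumulative contribution rate can be sketched as follows. This is an illustrative implementation based on the singular value decomposition of the mean-centered data; the function name `pca_scores` and the default threshold of 0.999 (mirroring the 99.9% requirement quoted above) are assumptions for the example.

```python
import numpy as np

def pca_scores(X, threshold=0.999):
    """Return the first L score columns, the contribution rates, and L."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center every wavelength (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # contribution rate of each component
    cumulative = np.cumsum(ratio)
    # Smallest L whose cumulative contribution reaches the threshold
    L = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    scores = U * s  # principal component score matrix S
    return scores[:, :L], ratio, L
```

As a quick arithmetic check on the figures quoted above, the first three contribution rates sum to 63.46% + 24.15% + 8.39% = 96.00%, so a 99.9% threshold would retain more than three components.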
The standard sample-free transfer method of the tobacco near-infrared quantitative analysis model provided by the invention realizes standard sample-free transfer of the model under different conditions. It avoids the time-consuming and expensive recalibration that requires scanning a large number of samples, and, because no standard samples are needed, the method is also more flexible.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A standard sample-free transfer method for a tobacco near-infrared quantitative analysis model, characterized by comprising:
S1, collecting spectral data of tobacco under different conditions, preprocessing the spectral data, reducing the dimension of the data by principal component analysis, determining the number L of principal components, and recording the dimension-reduced data as X_a, which together with the corresponding measured tobacco property values Y_a forms a sample set {X_a, Y_a};
S2, taking part of the samples in the sample set as a training set {X_k, Y_k}, k = 1, …, m+n; taking m samples in the training set as a source domain training set, denoted {X_i^S, Y_i^S}, i = 1, …, m; and taking n samples in the training set as a target domain training set, denoted {X_i^T, Y_i^T}, i = 1, …, n;
S3, setting the initial sample weights of the source domain training set in turn, denoted {w_i^S}, i = 1, …, m; setting the initial sample weights of the target domain training set in turn, denoted {w_j^T}, j = m+1, …, m+n, to obtain the overall training set weights {w_k} = {w_i^S; w_j^T}; setting the iteration condition for training; and establishing a near-infrared quantitative analysis model using a weighted extreme learning machine;
S4, inputting the first L columns of the principal component matrix S corresponding to {X_k} into the current near-infrared quantitative analysis model to obtain preliminary predicted values {P_k}, k = 1, …, m+n;
S5, calculating the prediction errors for k = 1, …, m+n according to the following formula:
Figure FDA0003631378610000011
S6, updating the weight set according to the following formula to correct the near infrared quantitative analysis model;
Figure FDA0003631378610000012
where w_k is the weight corresponding to sample X_k;
Figure FDA0003631378610000013
Figure FDA0003631378610000014
Figure FDA0003631378610000015
S7, repeating steps S4-S6 until the training iteration condition is no longer met, and then outputting the prediction model.
2. The transfer method without a standard sample according to claim 1, further comprising the steps of:
S8, repeating steps S2 to S7 multiple times, wherein in each execution the sample data in the source domain training set and/or the target domain training set are not completely the same, so as to obtain a plurality of prediction models;
S9, processing the spectral data of the tobacco to be tested, inputting the processed spectral data into each of the plurality of prediction models to obtain a plurality of predicted values, and taking the average of the predicted values as the final predicted value.
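The averaging in step S9 can be illustrated with a short sketch. The function name `ensemble_predict` and the assumption that every model exposes a `predict(X)` method returning a one-dimensional array are hypothetical and serve only this example.

```python
import numpy as np


def ensemble_predict(models, X):
    """Average the predictions of several fitted models, as in step S9."""
    # Each model is assumed to expose predict(X) returning a 1-D array.
    preds = np.stack([np.asarray(m.predict(X), dtype=float) for m in models])
    # Mean over the model axis gives one final value per sample.
    return preds.mean(axis=0)
```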
3. The method of claim 2, wherein the remaining samples in the sample set are taken as a test set {X_i, Y_i};
Step S8 further includes:
inputting the first L columns of the principal component matrix S corresponding to {X_i} into each prediction model to obtain predicted values, comparing the predicted values with the corresponding Y_i, and evaluating the model performance using the coefficient of determination and/or the root mean square error; if the preset evaluation requirement is met, step S9 is executed.
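The two evaluation metrics named in this claim have standard definitions, sketched below. The function names `rmse` and `r2` are illustrative, and the preset evaluation requirement itself is not specified in the claim, so no threshold is shown.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For measured values `[1, 2, 3, 4]` and predictions `[1.1, 1.9, 3.2, 3.8]`, the residual sum of squares is 0.10 and the total sum of squares is 5.0, so `r2` gives 0.98 and `rmse` gives about 0.158.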
4. The method according to claim 3, wherein, if the preset evaluation requirement is not met, sample data are added to or replaced in the training set, and training and evaluation are repeated until the preset evaluation requirement is met; the processed spectral data of the tobacco to be tested are then input into each of the latest prediction models to obtain a plurality of predicted values, and the average of the predicted values is taken as the final predicted value.
5. The method according to claim 1, wherein in step S6, if ∈ 0.5, the value m is decreased and the value n is increased without changing the sum of m and n, and the sample data in the source domain training set and the target domain training set is adjusted accordingly for the next round of correction.
6. The method according to claim 1, wherein the initial weights of the samples in the source domain training set are all 1/m, and the initial weights of the samples in the target domain training set are all 1/n.
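The initial weighting described in this claim can be written out in a few lines. The function name `initial_weights` is hypothetical. Note that the resulting weights sum to 1 within each domain, so 2 in total; whether any further normalisation is applied is not stated in the claim and is not assumed here.

```python
import numpy as np


def initial_weights(m, n):
    """Return {w_k}: 1/m for the m source samples, then 1/n for the n target samples."""
    source = np.full(m, 1.0 / m)  # w_i^S, i = 1, ..., m
    target = np.full(n, 1.0 / n)  # w_j^T, j = m+1, ..., m+n
    return np.concatenate([source, target])
```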
7. The method according to claim 6, wherein the model iteration condition is that a maximum iteration number I is set, and if the actual iteration number is greater than I, the prediction model is output.
8. The method according to claim 7, wherein in step S3, the weighted extreme learning machine is used as the base learner, the activation function is set to sigmoid, the initial weight is β, and the calculation formula is as follows:
Figure FDA0003631378610000031
wherein I is the maximum iteration number.
9. The method of claim 1, wherein the pre-processing is processing using a standard normal variate transformation.
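A minimal sketch of the standard normal variate preprocessing named here, followed by the principal component reduction of step S1, is given below. The function names `snv` and `pca_scores`, the use of the sample standard deviation (`ddof=1`), and the SVD-based PCA are assumptions made for illustration. The claims do not specify these details.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    # ddof=1 (sample standard deviation) is an assumed convention.
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def pca_scores(X, n_components):
    """Project column-centred data onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Thin SVD: the scores are the left singular vectors scaled by singular values.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]
```

Calling `pca_scores(snv(raw_spectra), L)` on an array `raw_spectra` with one spectrum per row would yield the first L score columns used as model input in step S4.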
10. The transfer method without a standard sample according to claim 1, wherein in step S1, the collecting under different conditions comprises: collecting near-infrared spectra of tobaccos with different properties in different experimental environments.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210491870.8A CN114878509A (en) 2022-05-07 2022-05-07 Standard sample-free transfer method of tobacco near-infrared quantitative analysis model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210491870.8A CN114878509A (en) 2022-05-07 2022-05-07 Standard sample-free transfer method of tobacco near-infrared quantitative analysis model

Publications (1)

Publication Number Publication Date
CN114878509A true CN114878509A (en) 2022-08-09

Family

ID=82673030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210491870.8A Pending CN114878509A (en) 2022-05-07 2022-05-07 Standard sample-free transfer method of tobacco near-infrared quantitative analysis model

Country Status (1)

Country Link
CN (1) CN114878509A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115993344A (en) * 2023-03-23 2023-04-21 苏州斌智科技有限公司 Quality monitoring and analyzing system and method for near infrared spectrum analyzer
CN117809766A (en) * 2023-12-12 2024-04-02 山东临沂烟草有限公司 Tobacco leaf near infrared spectrum chemical component model optimization method based on transfer learning

Similar Documents

Publication Publication Date Title
CN114878509A (en) Standard sample-free transfer method of tobacco near-infrared quantitative analysis model
CN109030388B (en) A kind of iron ore all iron content detection method based on spectroscopic data
JP2000517473A (en) System for monitoring and analyzing manufacturing processes using statistical simulation with single-step feedback
CN110736707B (en) Spectrum detection optimization method for transferring spectrum model from master instrument to slave instrument
UA86820C2 (en) Method for development of independent multi-dimensional calibration models
CN108960193B (en) Cross-component infrared spectrum model transplanting method based on transfer learning
CN104483292B (en) A kind of method that use multiline ratio method improves laser microprobe analysis accuracy
CN110726694A (en) Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm
Chen et al. Quantitative analysis of soil nutrition based on FT-NIR spectroscopy integrated with BP neural deep learning
CN109616161B (en) Fermentation process soft measurement method based on twin support vector regression machine
Mishra et al. Deep chemometrics: Validation and transfer of a global deep near‐infrared fruit model to use it on a new portable instrument
CN114626304B (en) Online prediction soft measurement modeling method for ore pulp copper grade
CN105092519A (en) Sample composition determination method based on increment partial least square method
CN109992861A (en) A kind of near infrared spectrum modeling method
CN116026795A (en) Rice grain quality character nondestructive prediction method based on reflection and transmission spectrum
CN112651173B (en) Agricultural product quality nondestructive testing method based on cross-domain spectral information and generalizable system
CN111999258A (en) Spectral baseline correction-oriented weighting modeling local optimization method
Shao et al. A new approach to discriminate varieties of tobacco using vis/near infrared spectra
CN111838744B (en) Continuous real-time prediction method for moisture in tobacco shred production process based on LSTM (long short-term memory) environment temperature and humidity
CN112599194A (en) Method and device for processing methylation sequencing data
CN108120694B (en) Multi-element correction method and system for chemical component analysis of sun-cured red tobacco
Xie et al. Calibration transfer via filter learning
CN112782115A (en) Method for detecting consistency of sensory characteristics of cigarettes based on near infrared spectrum
CN111125629A (en) Domain-adaptive PLS regression model modeling method
CN111597878A (en) BSA-IA-BP-based colony total number prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination