CN112287601A - Method and medium for constructing tobacco leaf quality prediction model by using R language and application - Google Patents

Info

Publication number
CN112287601A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011141976.2A
Other languages
Chinese (zh)
Other versions
CN112287601B (en)
Inventor
李伟
王攀磊
鲁耀
张静
刘浩
董石飞
杨应明
王超
耿川雄
陈拾华
杨景华
王建新
聂鑫
朱海滨
林昆
杨义
段宗颜
张忠武
严君
邹炳礼
周敏
周绍松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hongyun Honghe Tobacco Group Co Ltd
Institute of Agricultural Environment and Resources of Yunnan Academy of Agricultural Sciences
Original Assignee
Hongyun Honghe Tobacco Group Co Ltd
Institute of Agricultural Environment and Resources of Yunnan Academy of Agricultural Sciences

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the technical field of tobacco leaf quality prediction and discloses a method, a medium and an application for constructing a tobacco leaf quality prediction model using the R language. Data transformation and screening are performed on the predictor variables; a predictor variable set and a result variable set are created, and the data are segmented and resampled; a plurality of regression methods are selected for modeling; the root mean square error (RMSE) and the coefficient of determination (R²) are used to evaluate the prediction performance of the different models, and the optimal model is selected from the candidate models according to model performance. The resulting ecological factor model for predicting tobacco leaf quality can predict the quality fluctuation of single-grade tobacco leaves in different areas in the current year from that year's ecological and climatic changes, so that the grade, quantity and proportion of purchased tobacco leaves can be adjusted proactively and in a targeted manner, ensuring a stable supply of tobacco leaf quality.

Description

Method and medium for constructing tobacco leaf quality prediction model by using R language and application
Technical Field
The invention belongs to the technical field of tobacco leaf quality prediction, and particularly relates to a method, a medium and application for constructing a tobacco leaf quality prediction model by using an R language.
Background
Tobacco leaf quality results from the combined action of genetic factors, the ecological environment and cultivation techniques. Numerous studies show that ecological factors such as climate, soil and topography strongly influence the agronomic traits, physical characteristics, chemical composition, disease rate, aroma-substance content and smoking quality of tobacco leaves. Because tobacco leaf quality is multifactorial, variable and hard to quantify, the influence of the ecological environment is especially prominent: changes in light, temperature, water and air conditions cause large differences in leaf quality between planting areas and between years. Constructing an ecological factor model that predicts changes in tobacco leaf quality from ecological factors such as climate, soil and cultivation management is therefore very important for improving tobacco leaf quality.
From the above analysis, the problems and shortcomings of the prior art are as follows. Existing prediction models mostly predict the sensory quality of tobacco leaves from their internal chemical components. Research on the relationship between ecological factors and tobacco leaf quality mostly uses methods such as principal component regression analysis and grey correlation analysis to study the influence and contribution of ecological factors, identify the key ecological factors, and guide tobacco production by regulating them. However, no prediction model is available that predicts tobacco leaf quality from the external ecological factors of flue-cured tobacco growth.
The difficulty in solving these problems is twofold. On the one hand, constructing a prediction model requires a large amount of complete tobacco leaf quality data together with the corresponding ecological factor data. On the other hand, the data types involved in the invention are complex, including both continuous type variables and dependent type variables, and the prediction model built by any single regression method carries uncertainty.
The significance of solving these problems is as follows. The invention constructs the prediction model in the R language, which provides a wide variety of regression methods. R is open-source software for mathematical and statistical computing; it offers a large number of models, supports the construction of relatively complex prediction models on large data sets, and allows model uncertainty to be explored through rigorous training and testing so that the optimal model can be selected. This reduces the workload and cost of tobacco leaf testing and addresses the raw-material supply and blending problems caused by the lag in tobacco leaf quality testing. Evaluating and predicting tobacco leaf quality with the prediction model from the current year's ecological and climatic conditions helps ensure a stable supply, in grade and quantity, of tobacco leaf raw materials for the cigarette formula module, and thus stable cigarette product quality.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method, a medium and application for constructing a tobacco leaf quality prediction model by using an R language.
The invention is realized as follows. The tobacco leaf quality prediction method based on the ecological factor model comprises the following steps:
step one, performing data transformation and predictor variable screening on the predictor variables used in tobacco leaf quality prediction;
step two, creating a predictor variable set and a result variable set for tobacco leaf quality prediction, and segmenting and resampling the data;
step three, selecting a plurality of regression methods to model the data, thereby obtaining different tobacco leaf quality prediction models;
step four, evaluating the prediction performance of the different prediction models using the root mean square error (RMSE) and the coefficient of determination (R²), and selecting the optimal model from the prediction models according to prediction performance.
Further, in step one, the data transformation of the predictor variables includes centering, scaling (standardization) and skewness transformation. Centering subtracts the mean from each variable, so that the transformed variable has mean 0; scaling divides each variable by its own standard deviation, which forces the standard deviation to be 1; the skewness transformation removes distributional skew, turning a right-skewed or left-skewed distribution into an approximately symmetric one.
Further, in step one, the method for transforming the predictor variables includes:
(I) constructing a transformation object trans using the preProcess function in the caret package, which applies centering (center), scaling (scale) and skewness transformation (BoxCox) to the data together;
(II) after trans is constructed, transforming the original data using the predict function.
Further, in step one, the method for screening the predictive variable includes:
(1) removing zero-variance variables: the near-zero-variance variables to be filtered are detected using the nearZeroVar function in the caret package; if the output shows that the data set contains zero-variance variables, those variables are removed;
(2) removing multicollinear variables.
Further, in step (2), the method for removing multiple collinearity variables includes:
1) calculating the correlation coefficient matrix of the predictor variables using the cor function;
2) using the findCorrelation function to find the pair of predictor variables with the largest absolute correlation coefficient, denoted predictor variables A and B;
3) calculating the mean correlation coefficient between A and the other predictor variables, doing the same for B, and listing the variable columns with high correlation coefficients using the head function;
4) if the mean correlation coefficient of A is larger, removing A; otherwise, removing B;
5) repeating steps 2) to 4) until the absolute values of all correlation coefficients are below the set threshold.
Further, in step two, the method for creating a set of predictor variables and a set of result variables includes:
(I) establishing the predictor variable set predictors from the first n predictor variable columns (columns 1 to n) of the data set;
(II) establishing the result variable set result from the result variable column, column n + 1 of the data set.
Further, in step two, the method for performing segmentation processing on data includes:
(1) randomly selecting samples using the createDataPartition function in the caret package to construct the training set;
(2) after the training rows are obtained, creating the predictor variable training set TrainPredictors and the result variable training set TrainResult from the training rows;
(3) creating the predictor variable test set TestPredictors and the result variable test set TestResult from the remaining samples.
Further, in step two, the method for resampling the data includes: implementing K-fold cross-validation resampling using the trainControl function in the caret package.
Further, the K-fold cross-validation method comprises:
1) randomly dividing the samples into k subsets of roughly equal size, and first fitting the model on all samples except the first subset;
2) predicting the held-out first fold with this model, and evaluating the model on these predictions;
3) returning the first subset to the training set, holding out the second subset for model evaluation, and so on;
4) calculating the mean and standard deviation of the k model evaluation results, and on that basis determining the relationship between the tuning parameters and model performance.
Further, in step four, the candidate models comprise linear regression models, nonlinear regression models and regression tree models; the linear regression models comprise a generalized linear model, a stepwise regression linear model and a partial least squares regression model; the nonlinear regression models comprise a support vector machine (SVM) model and a K-nearest-neighbor model; the regression tree models comprise simple regression trees, regression model trees, random forests and Cubist models.
Further, in step four, the models are trained and evaluated using the train function in the caret package; the prediction performance of each model is evaluated using the resamples function in the caret package, and the model results are viewed using summary(resample).
Further, when comparing the models, the preferred model is chosen by the RMSE and R² of each model: the smaller the RMSE, the higher the prediction accuracy, and the larger the R², the better the fit.
Another object of the present invention is to provide a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the method for predicting tobacco quality based on an ecological factor model.
Another object of the present invention is to provide a computer terminal, comprising:
the transformation and screening module, used for performing data transformation and predictor variable screening on the predictor variables used in tobacco leaf quality prediction;
the segmentation and resampling module, used for creating a predictor variable set and a result variable set for tobacco leaf quality prediction, and for segmenting and resampling the data;
the prediction model acquisition module, used for selecting a plurality of regression methods to model the data, thereby obtaining different tobacco leaf quality prediction models;
the optimal model screening module, used for evaluating the prediction performance of the different prediction models using the root mean square error (RMSE) and the coefficient of determination (R²), and for selecting the optimal model from the prediction models according to prediction performance.
The invention also aims to provide application of the tobacco quality prediction method based on the ecological factor model in the quality detection of the tobacco in agronomic characters, physical characteristics, chemical components, disease rate, aroma substance content, smoking quality, different planting areas and different years.
By combining all the technical schemes, the invention has the advantages and positive effects that: the tobacco quality prediction method based on the ecological factor model provided by the invention constructs an ecological factor optimal model for predicting the tobacco quality by utilizing an R language. The R language is open source software for mathematics and statistical calculation, relatively complex prediction model construction can be carried out by utilizing mass data, each prediction model has uncertainty, the R language can provide as many models as possible, the uncertainty of the models is explored through strict training tests, and the optimal models are selected. The model provided by the invention can predict the quality fluctuation conditions of single-grade tobacco leaves in different areas in the current year according to the ecological climate change condition in the current year, realize targeted adjustment of the purchasing grade and quantity of the tobacco leaves, actively adjust the purchasing grade quantity and proportion of the tobacco leaves, and ensure stable quality supply of the tobacco leaves.
Drawings
FIG. 1 is a flow chart of a tobacco leaf quality prediction method based on an ecological factor model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a tobacco leaf quality prediction method based on an ecological factor model, and the invention is described in detail below with reference to the accompanying drawings.
The tobacco leaf quality prediction method based on the ecological factor model provided by the embodiment of the invention comprises the following steps: performing data transformation and predictor variable screening on the predictor variables used in tobacco leaf quality prediction;
creating a predictor variable set and a result variable set for tobacco leaf quality prediction, and segmenting and resampling the data;
selecting a plurality of regression methods to model the data, thereby obtaining different tobacco leaf quality prediction models;
evaluating the prediction performance of the different prediction models using the root mean square error (RMSE) and the coefficient of determination (R²), and selecting the optimal model from the prediction models according to prediction performance.
Specifically, as shown in fig. 1, the tobacco leaf quality prediction method based on the ecological factor model provided by the embodiment of the present invention includes the following steps:
S101, data preprocessing: performing data transformation and predictor variable screening on the predictor variables.
S102, data division: creating a predictor variable set and a result variable set, and segmenting and resampling the data.
S103, data modeling: selecting a plurality of regression methods to model the data.
S104, model optimization: evaluating the prediction performance of the different models using the root mean square error (RMSE) and the coefficient of determination (R²), and selecting the optimal model from the candidate models according to model performance.
The present invention will be further described with reference to the following examples.
Example 1
1. Data pre-processing
The data preprocessing technology generally refers to addition, deletion or transformation of training set data, and the data are transformed to reduce the influence of data skewness and outliers, so that the performance of the model can be remarkably improved.
1.1 Data transformation
Predictive models require the predictor variables to share the same dimension or scale, so the variables are transformed by centering, scaling (standardization) and skewness transformation. Centering subtracts the mean from each variable, so that the transformed variable has mean 0. Scaling divides each variable by its own standard deviation, which forces the standard deviation to be 1. The skewness transformation removes distributional skew, turning a right-skewed or left-skewed distribution into an approximately symmetric one.
The method uses the preProcess function in the caret package to construct a transformation object, trans, that applies centering (center), scaling (scale) and skewness transformation (BoxCox) to the data together. The trans object is constructed as follows:
library(caret)
trans <- preProcess(tobacco.numeric,
                    method = c("BoxCox", "center", "scale"))
After trans is constructed, the predict function is used to transform the original data, where data is the original data and transformed.data is the transformed data:
transformed.data <- predict(trans, data)
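To make the three transformations concrete, here is a minimal base R sketch of what they do to a single variable. It is an illustration only, not the caret implementation: box_cox is a hypothetical helper, the rain vector is made-up data, and lambda is fixed at 0 by hand rather than estimated from the data.

```r
# Box-Cox transform for a fixed lambda (hypothetical helper; x must be > 0).
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

set.seed(1)
rain <- rexp(50, rate = 0.1)          # made-up right-skewed predictor

# Skewness transformation: lambda = 0 is the log transform.
rain_bc <- box_cox(rain, lambda = 0)

# Centering (subtract the mean) and scaling (divide by the standard deviation).
rain_std <- (rain_bc - mean(rain_bc)) / sd(rain_bc)

mean(rain_std)  # approximately 0
sd(rain_std)    # 1 up to floating-point error
```

The sketch fixes lambda only to keep the arithmetic visible; a data-driven choice of lambda is what the BoxCox option is for.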
1.2 Predictor variable screening
Some of the predictor variables need to be removed prior to modeling to improve model performance and stability. Predicting with fewer variables reduces computational complexity, and removing redundant predictor variables makes it easier to obtain a simpler, more interpretable model.
1.2.1 Removing zero-variance variables
A zero-variance variable is a predictor variable that takes only one value. Such a variable contributes almost nothing to the model, so it must be identified and removed. A variable is treated as a near-zero-variance variable if the ratio of the number of distinct values to the sample size is low (for example, 10%) and the ratio of the frequency of the most common value to that of the second most common value is high.
The method uses the nearZeroVar function in the caret package to detect the near-zero-variance variables to be filtered:
nearZeroVar(data)
If the output shows that the data set contains zero-variance variables, those variables must be removed.
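The two-ratio rule described above can be sketched in base R as follows. The helper near_zero_var and its argument names are hypothetical; the default cutoffs (a frequency ratio of 95/5 and 10% distinct values) are assumptions chosen to mirror what I understand to be the caret defaults, and the three test vectors are made up.

```r
# Flag a variable as (near-)zero-variance using the two ratios from the text.
near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- as.vector(sort(table(x), decreasing = TRUE))
  if (length(counts) < 2) return(TRUE)             # a single value: zero variance
  freq_ratio     <- counts[1] / counts[2]          # most common / second most common
  percent_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && percent_unique <= unique_cut
}

constant <- rep(3, 100)               # one value only
rare     <- c(rep(0, 98), 1, 2)       # dominated by a single value
varied   <- seq_len(100)              # all values distinct

flags <- sapply(list(constant = constant, rare = rare, varied = varied),
                near_zero_var)
flags
```

Here constant and rare are flagged, while varied is kept.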
1.2.2 Removing multicollinear variables
Collinearity refers to a strong correlation between a pair of predictors, and collinearity between multiple predictors is called multicollinearity. Because redundant predictors generally increase the complexity of the model rather than the information content, and the use of highly correlated predictors in linear regression models may give a very unstable model, the predictors should avoid highly correlated variables in the data. The specific algorithm is as follows:
1. calculating a correlation coefficient matrix of the predictive variable;
2. finding out the pair of predictive variables (marked as predictive variables A and B) with the maximum absolute value of the correlation coefficient;
3. calculating the mean value of the correlation coefficients of the A and other prediction variables, and performing the same calculation on the B;
4. if the average correlation coefficient of A is larger, removing A; otherwise, removing B;
5. repeating steps 2 to 4 until the absolute values of all correlation coefficients are below the set threshold.
In the following command, data is the data set and correlations is the matrix of pairwise correlation coefficients between the predictor variables in the data set:
correlations <- cor(data)
After the correlation coefficients are calculated, the findCorrelation function is used to find the predictor variables with high correlation coefficients. Here correlations is the correlation coefficient matrix, cutoff is the threshold set for screening the correlation coefficients, and highcorr.correlations holds the column indices of the predictor variables whose correlation coefficient exceeds 0.75:
highcorr.correlations <- findCorrelation(correlations, cutoff = 0.75)
The head function lists the variable columns with high correlation coefficients:
head(highcorr.correlations)
These variable columns are then removed. In the following command, data.filtered is the data set after the multicollinear variables have been removed; the length check keeps all columns when no variable exceeds the threshold:
data.filtered <- if (length(highcorr.correlations) > 0) data[, -highcorr.correlations] else data
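The five-step algorithm listed above can also be written out directly in base R, which makes each step visible. This is a sketch under stated assumptions, not the findCorrelation source: drop_collinear is a hypothetical helper, the mean in step 3 is taken over absolute correlations, and the env data frame is simulated.

```r
# Greedy removal of collinear predictors, following steps 1 to 5 in the text.
drop_collinear <- function(cor_mat, cutoff = 0.75) {
  keep <- colnames(cor_mat)
  repeat {
    m <- abs(cor_mat[keep, keep, drop = FALSE])
    diag(m) <- 0                                   # ignore self-correlation
    if (length(keep) < 2 || max(m) < cutoff) break # step 5: stop below threshold
    pair <- which(m == max(m), arr.ind = TRUE)[1, ]
    a <- keep[pair[["row"]]]                       # step 2: most correlated pair
    b <- keep[pair[["col"]]]
    mean_a <- mean(m[a, setdiff(keep, a)])         # step 3: mean correlation of A
    mean_b <- mean(m[b, setdiff(keep, b)])         #         and of B
    keep <- setdiff(keep, if (mean_a > mean_b) a else b)  # step 4
  }
  keep
}

set.seed(42)
n    <- 200
temp <- rnorm(n)
env  <- data.frame(temp = temp,
                   sun  = temp + rnorm(n, sd = 0.1),  # nearly a copy of temp
                   rain = rnorm(n))                   # unrelated predictor

kept <- drop_collinear(cor(env))   # step 1: correlation matrix
kept
```

One of temp and sun is dropped because the pair is almost perfectly correlated, while rain is retained.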
2. Data partitioning
2.1 Creating the predictor variable set and the result variable set
When a prediction model is constructed, the data structure comprises a plurality of predictor variables and one result variable, and separate data sets must be established for the predictor variables and for the result variable.
The following command builds the predictor variable set predictors from the first n columns (columns 1 to n) of the data set data:
predictors <- data[, 1:n]
The following command builds the result variable set result from column n + 1 of the data set, which holds the result variable:
result <- data[, n + 1]
In this way the predictor variable set predictors and the result variable set result are established separately.
2.2 Data segmentation
Some models learn, in addition to the general patterns in the data, noise that is specific to individual samples; this is called overfitting. An overfitted model often fails to predict new samples accurately. Inappropriate tuning parameters can cause overfitting, so the model parameters must be tuned on the data to obtain the best-fitting predictions. An unbiased estimate of model performance requires that the data used to evaluate the model are not used to build or tune it. When the prediction model is established, part of the samples is therefore used to construct the model and the rest is reserved for model evaluation. The set of samples used for modeling is called the "training set", and the set of samples used for verifying model performance is called the "test set".
The method uses the createDataPartition function in the caret package to randomly select the samples of the training set. createDataPartition expects the outcome vector as its first argument, so in the following command result is the result variable, trainningrows holds the row indices of the randomly drawn samples assigned to the training set, and p = 0.8 means that 80% of the samples are drawn for the training set:
trainningrows <- createDataPartition(result,
                                     p = 0.8,
                                     list = FALSE)
After the training rows are obtained, the predictor variable training set TrainPredictors and the result variable training set TrainResult are created from them:
TrainPredictors <- predictors[trainningrows, ]
TrainResult <- result[trainningrows]
The predictor variable test set TestPredictors and the result variable test set TestResult are created from the remaining samples:
TestPredictors <- predictors[-trainningrows, ]
TestResult <- result[-trainningrows]
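For readers without caret at hand, the same 80/20 split can be approximated in base R. This is a simplified sketch on made-up data: it draws a plain random sample with sample(), whereas createDataPartition additionally balances the split on the outcome, and the name train_rows is an assumption.

```r
set.seed(123)
n <- 50
predictors <- data.frame(rain = runif(n), temp = runif(n))  # made-up predictors
result     <- runif(n)                                      # made-up outcome

# Draw 80% of the row indices at random for the training set.
train_rows <- sample(seq_len(n), size = floor(0.8 * n))

TrainPredictors <- predictors[train_rows, ]
TrainResult     <- result[train_rows]
TestPredictors  <- predictors[-train_rows, ]   # all rows not drawn above
TestResult      <- result[-train_rows]

c(train = nrow(TrainPredictors), test = nrow(TestPredictors))
```

Negative indexing guarantees that the training and test sets are disjoint and together cover every sample.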
2.3 Resampling
Resampling fits a model on a subsample of the training set, evaluates the model on the remaining training-set samples, repeats the process many times and then summarizes the results. Resampling gives a reasonable estimate of how the model will perform on future samples. Several resampling methods are available.
The method uses K-fold cross-validation. The samples are randomly divided into k subsets of roughly equal size. A model is fitted on all samples except the first subset (the first fold), the held-out first fold is predicted with this model, and the predictions are used to evaluate the model. The first subset is then returned to the training set, the second subset is held out for evaluation, and so on. The k evaluation results are summarized (typically by their mean and standard deviation), and the relationship between the tuning parameters and model performance is determined on that basis.
K-fold cross-validation resampling is implemented with the trainControl function in the caret package. In the following command, method = "cv" specifies K-fold cross-validation and number = 10 specifies 10 folds:
trainControl(method = "cv", number = 10)
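The fold-by-fold procedure described above can be sketched by hand in base R. The example is illustrative only: the data are simulated, an ordinary lm fit stands in for whichever model is being evaluated, and the names dat, fold and fold_rmse are assumptions.

```r
set.seed(7)
n   <- 100
dat <- data.frame(x = runif(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.2)          # made-up linear relationship

k    <- 10
fold <- sample(rep(seq_len(k), length.out = n))  # random fold label per sample

# For each fold: fit on the other k - 1 folds, predict the held-out fold.
fold_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(y ~ x, data = dat[fold != i, ])
  pred <- predict(fit, newdata = dat[fold == i, ])
  sqrt(mean((dat$y[fold == i] - pred)^2))
})

# Summarize the k evaluation results by their mean and standard deviation.
c(mean = mean(fold_rmse), sd = sd(fold_rmse))
```

Every sample is held out exactly once, so the summary reflects performance on data the model did not see during fitting.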
3. Data modeling
The method selects a plurality of regression methods to model the data, and selects an optimal model from the test models according to the model effect. The method selects a linear regression model, a non-linear regression model and a regression tree model. The linear regression model comprises a generalized linear model, a stepwise regression linear model and a partial least square regression model; the nonlinear regression model comprises a Support Vector Machine (SVM) model and a K nearest neighbor model; regression tree models include simple regression trees, regression model trees, random forests, and cubist models.
All of the above models are fitted and evaluated with the train function in the caret package. The general command is as follows, where fit is the fitted model, "x" is a placeholder for the method string of the chosen regression method (the method strings used by the different models are listed below), and trControl specifies the resampling method, here 10-fold cross-validation.
fit<-train(x=TrainPredictors,y=TrainScore,
method="x",
trControl=trainControl(method="cv",number=10))
Figure BDA0002738544510000071
4. Model optimization
The prediction performance of the different models is evaluated using the root mean square error (RMSE) and the coefficient of determination (R2). RMSE is a function of the model residuals, i.e., the observed values minus the model predictions, and can be read as the average distance between the observed and predicted values. The coefficient of determination (R2) is interpreted as the proportion of the information in the data that is explained by the model.
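The two metrics can be written out in a few lines of base R. The helper names and the toy vectors below are hypothetical. Two common definitions of R2 are shown because they can differ; which one a given library function reports should be checked in its documentation.

```r
# Base-R sketch of the two metrics (helper names are hypothetical).
rmse_manual <- function(pred, obs) sqrt(mean((obs - pred)^2))

# R2 as the squared Pearson correlation between predictions and observations
r2_corr <- function(pred, obs) cor(pred, obs)^2

# R2 in the "traditional" form: 1 - SS_residual / SS_total
r2_trad <- function(pred, obs) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Toy values, not data from this example
obs  <- c(70, 72, 75, 78, 80)
pred <- c(71, 71, 76, 77, 81)

rmse_manual(pred, obs)
r2_corr(pred, obs)
r2_trad(pred, obs)
```

In the toy vectors every residual is +1 or -1, so the RMSE is exactly 1, while the two R2 definitions give slightly different values.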
The prediction performance of each model is compared with the resamples function in the caret package. In the following commands, resamp holds the resampling results of the models, summary(resamp) displays them, and fit1, fit2 and fit3 represent different models.
resamp<-resamples(list(fit1,fit2,fit3))
summary(resamp)
In the model comparison results, the preferred model is chosen according to the RMSE and R2 of each model: the smaller the RMSE, the higher the prediction accuracy, and the larger the R2, the better the fit.
5. Model validation
(1) Model prediction
Prediction models are built on the training set above, and the better-performing models are selected according to RMSE and R2. This part tests the prediction performance of each selected model using the predict function and the test-set data. In the following command, predict is the prediction function, fit is the model under test, and TestPredictors holds the predictor variables of the test set.
PredictedResult<-predict(fit,TestPredictors)
(2) Model validation
The predicted values PredictedResult obtained from the model are compared with the observed values TestResult of the test set to measure the prediction performance. Model quality is assessed with the following two visualization methods and with RMSE and R2.
1) Scatter plot of observed versus predicted values, to judge the model fit. The plot function draws the scatter plot of observed and predicted values. For an ideal model the points lie along a line with slope 1, and the closer they are to that line, the better the prediction.
plot(TestResult,PredictedResult)
2) Scatter plot of residuals versus predicted values, to reveal systematic patterns in the predictions
The difference between the observed value and the predicted value is the model residual, calculated with the following command:
residualvalues<-TestResult-PredictedResult
For a model without systematic error the residuals should be evenly distributed around 0. The scatter plot of residuals against predicted values is drawn with the plot function.
plot(PredictedResult,residualvalues)
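The two diagnostic plots can be tried on simulated values before real predictions are available. In this sketch the vectors observed and predicted are simulated and hypothetical; the reference lines added with abline are an illustrative addition, not part of the commands above.

```r
# Base-R sketch of the two diagnostic plots on simulated data (illustrative only).
set.seed(222)
observed  <- rnorm(100, mean = 75, sd = 3)     # simulated observed scores
predicted <- observed + rnorm(100, sd = 1)     # hypothetical model predictions
residual_values <- observed - predicted        # residual = observed - predicted

# Observed vs. predicted: points should follow the line with slope 1
plot(observed, predicted)
abline(0, 1)

# Residuals vs. predicted: points should scatter evenly around 0
plot(predicted, residual_values)
abline(h = 0)

mean(residual_values)                          # close to 0 when there is no systematic bias
```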
3) Calculating RMSE and R2 between observed and predicted values
The RMSE and R2 functions measure the agreement between observed and predicted values. The commands are as follows:
R2(PredictedResult,TestResult)
RMSE(PredictedResult,TestResult)
As before, a larger R2 indicates a better fit between the observed and predicted values, and a smaller RMSE indicates that the predicted values are closer to the observed values; both mean better prediction performance.
Example 2
1. Data pre-processing
Firstly, data required by the model is preprocessed, and the data is transformed to reduce the influence of data skewness and outliers, so that the model performance can be obviously improved.
1.1 data transformation
1.1.1 importing data
library(readxl)
Load the data-reading package "readxl" (an R package is a collection of R functions, code and sample data).
tobacco<-read_excel("tobacco.xlsx",col_names=TRUE)
The data are imported and named "tobacco".
1.1.2 data structures and transformations
(1) Viewing data structures
str(tobacco)
The example data contain 595 samples and 51 variables: 50 predictor variables and 1 result variable. The predictor variables comprise 6 character variables and 44 continuous variables.
Character variables among the predictors need to be converted to factors. In this example the 6 character variables Area, Cultivar, Position, soiltype, landform and transplant are converted to factors.
tobacco$Area<-factor(tobacco$Area)
tobacco$Cultivar<-factor(tobacco$Cultivar)
tobacco$Position<-factor(tobacco$Position)
tobacco$soiltype<-factor(tobacco$soiltype)
tobacco$landform<-factor(tobacco$landform)
tobacco$transplant<-factor(tobacco$transplant,levels=c("early","middle","late"),ordered=TRUE)
The continuous variables are centered, scaled and skewness-transformed using the preProcess function of the caret package. In this example the following 44 continuous variables are transformed: TN (total nitrogen), Ni (nicotine), TS (total sugar), RS (reducing sugar), K (potassium), Cl (chlorine), PE (petroleum ether extract), St (starch), N/Ni (nitrogen-to-nicotine ratio), RS/Ni (sugar-to-nicotine ratio), DS (sugar-to-sugar ratio), K/Cl (potassium-to-chlorine ratio), soil particle size, elevation, ph, som (soil organic matter), an (soil available nitrogen), ap (soil available phosphorus), ak (soil available potassium), scl (soil chlorine), B (soil boron), growth period, leaf number, fertilization (nitrogen application amount), mayrainfall (May rainfall), junerainfall (June rainfall), julyrainfall (July rainfall), augustrainfall (August rainfall), growthrainfall (growing-period rainfall), maytem (May temperature), junetem (June temperature), julytem (July temperature), augusttem (August temperature), growthtem (growing-period temperature), maysun (May sunshine), junesun (June sunshine), julysun (July sunshine), augustsun (August sunshine), growthsun (growing-period sunshine), mayhumidity (May humidity), junehumidity (June humidity), julyhumidity (July humidity), augusthumidity (August humidity) and growthhumidity (growing-period humidity).
library(caret)
# load the caret package
tobacco.numeric<-as.data.frame(tobacco[,c(5:34,38:69)])
# select the numeric columns and create a data set
trans<-preProcess(tobacco.numeric,
method=c("BoxCox","center","scale"))
# use the preProcess function to combine the skewness transformation, centering and scaling into the transformation object trans
tobacco.transformed.numeric<-predict(trans,tobacco.numeric)
# transform the continuous variables with trans
tobacco.factor<-tobacco[,c(1:4,35:37)]
tobacco.transformed<-cbind(tobacco.factor,tobacco.transformed.numeric)
# combine the factor predictors and the continuous predictors
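What the three transformations do to a single variable can be sketched in base R. The sketch makes simplifying assumptions: the variable is simulated, the helper names box_cox and skewness are hypothetical, and the Box-Cox parameter lambda is fixed by hand, whereas in practice it would be estimated from the data.

```r
# Base-R sketch of skewness transformation, centering and scaling.
# Assumption: lambda is fixed by hand here rather than estimated from the data.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))                      # Box-Cox needs strictly positive values
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Simple moment-based skewness measure (hypothetical helper)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

set.seed(222)
x_raw <- rlnorm(500, meanlog = 1, sdlog = 0.8)   # simulated right-skewed variable
x_bc  <- box_cox(x_raw, lambda = 0)              # lambda = 0 is the log transform
x_std <- (x_bc - mean(x_bc)) / sd(x_bc)          # center, then scale

c(raw = skewness(x_raw), transformed = skewness(x_std))
c(mean = mean(x_std), sd = sd(x_std))
```

After the three steps the variable has mean 0 and standard deviation 1, and its skewness is much closer to 0 than that of the raw values.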
1.2 predictive variable screening
1.2.1 removing zero variance variables
The near-zero-variance variables to be filtered are detected using the nearZeroVar function in the caret package.
nearZeroVar(tobacco.transformed.numeric)
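The idea behind a near-zero-variance check can be sketched in base R. The helper near_zero_var and its two cut-offs (a frequency ratio of 95/5 and 10% unique values) are assumptions chosen for illustration; they are not taken from this example and may differ from what a library implementation uses.

```r
# Base-R sketch of a near-zero-variance check (hypothetical helper).
# Assumption: the cut-offs 95/5 and 10 % are illustrative choices.
near_zero_var <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- vapply(df, function(v) {
    counts <- sort(table(v), decreasing = TRUE)
    if (length(counts) < 2) return(TRUE)          # a single value: zero variance
    freq_ratio <- counts[[1]] / counts[[2]]       # most common vs. second most common
    pct_unique <- 100 * length(counts) / length(v)
    freq_ratio > freq_cut && pct_unique <= unique_cut
  }, logical(1))
  which(flagged)                                  # positions of columns to drop
}

set.seed(222)
toy <- data.frame(
  constant = rep(1, 100),             # zero variance
  rare     = c(rep(0, 98), 1, 1),     # almost constant
  normal   = rnorm(100)               # ordinary continuous variable
)
near_zero_var(toy)
```

Only the constant column and the almost-constant column are flagged; the ordinary continuous column is kept.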
1.2.2 removal of multiple collinearity variables
Removing highly multicollinear variables among the tobacco-leaf chemical components
library(corrplot)
# load the correlation-coefficient package
tobacco.chemical<-tobacco.transformed[,26:37] # extract the tobacco-leaf chemical-component predictors
correlations.chemical<-cor(tobacco.chemical) # compute the correlation coefficients
highcorr.chemical<-findCorrelation(correlations.chemical,cutoff=0.75) # find variables with correlation coefficients above 0.75
head(highcorr.chemical) # list the columns with high correlation coefficients; in this example RS/Ni, TS and Cl are the multicollinear variables
## [1] 10  3  6
tobacco.chemical.filtered<-tobacco.chemical[,-highcorr.chemical] # remove the multicollinear variables
Removing highly multicollinear variables among the ecological factors
Extract the ecological-factor predictor variables from the transformed tobacco data as tobacco.ecological.
correlations.ecological<-cor(tobacco.ecological) # compute the correlation coefficients
highcorr.ecological<-findCorrelation(correlations.ecological,cutoff=0.75)
# find variables with correlation coefficients above 0.75
head(highcorr.ecological) # list the columns with high correlation coefficients
## [1] 32 28 17 29 15 30
In this example May humidity (mayhumidity), June humidity (junehumidity), July humidity (julyhumidity), growing-period humidity (growthhumidity), July rainfall (julyrainfall) and growing-period rainfall (growthrainfall) are the multicollinear variables.
tobacco.ecological.filtered<-tobacco.ecological[,-highcorr.ecological] # remove the multicollinear variables
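The pairwise removal rule behind this kind of correlation filter (find the most correlated pair, drop the member with the larger mean absolute correlation, repeat until no pair exceeds the cut-off) can be sketched in base R. The helper drop_high_corr and the toy data are hypothetical, and the sketch is a simplified illustration that may differ in detail from a library implementation.

```r
# Base-R sketch of a pairwise correlation filter (hypothetical helper).
drop_high_corr <- function(x, cutoff = 0.75) {
  cm <- abs(cor(x))
  diag(cm) <- 0                                   # ignore self-correlation
  dropped <- character(0)
  while (ncol(cm) > 1 && max(cm) > cutoff) {
    # Pair (A, B) with the largest absolute correlation
    idx <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    a <- idx[1]
    b <- idx[2]
    # Remove whichever of the two has the larger mean absolute correlation
    worst <- if (mean(cm[a, ]) >= mean(cm[b, ])) a else b
    dropped <- c(dropped, colnames(cm)[worst])
    cm <- cm[-worst, -worst, drop = FALSE]
  }
  dropped
}

set.seed(222)
base <- rnorm(300)
toy <- data.frame(
  a = base,
  b = base + rnorm(300, sd = 0.1),   # nearly a copy of a
  c = rnorm(300),                    # independent
  d = rnorm(300)                     # independent
)
removed <- drop_high_corr(toy, cutoff = 0.75)
kept    <- setdiff(names(toy), removed)
print(removed)
print(kept)
```

Exactly one of the two nearly identical columns is removed, and the independent columns are retained.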
Combine the transformed and screened variables into the predictor data set.
tobacco.filtered<-cbind(tobacco.factor,tobacco.chemical.filtered,tobacco.ecological.filtered)
Export the data set for later use.
write.csv(tobacco.filtered,"tobacco.filtered.csv",row.names=FALSE)
2. Data set construction
2.1 creating sets of predictor variables and result variables
Importing preprocessed data
tobacco.filtered<-read.csv("tobacco.filtered.csv")
tobacco<-read_excel("tobacco.xlsx",col_names=TRUE)
Creating a set of predicted variables
(1) Creating a predictor variable set for conventional prediction models, such as linear regression models
predictors<-tobacco.filtered[,-c(4,8:25)]
(2) Creating a predictor variable set suitable for models such as support vector machines and K nearest neighbors
ind.Area<-nnet::class.ind(predictors$Area)
ind.Cultivar<-nnet::class.ind(predictors$Cultivar)
ind.Position<-nnet::class.ind(predictors$Position)
ind.soiltype<-nnet::class.ind(predictors$soiltype)
ind.landform<-nnet::class.ind(predictors$landform)
ind.transplant<-nnet::class.ind(predictors$transplant)
ind<-cbind(ind.Area,ind.Cultivar,ind.Position,ind.soiltype,ind.landform,ind.transplant)
trans.1<-preProcess(ind,method=c("BoxCox","center","scale"))
ind.transformed<-predict(trans.1,ind)
predictors.ind<-cbind(ind.transformed,predictors[,-c(1:6)])
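The conversion of a factor into 0/1 indicator columns can be reproduced in base R with model.matrix. The factor values below are toy data, not values from this example.

```r
# Base-R sketch of converting a factor to 0/1 indicator columns (toy data).
position <- factor(c("upper", "middle", "lower", "middle", "upper"))

# One column per factor level; "- 1" drops the intercept so no level is omitted
ind_position <- model.matrix(~ position - 1)
colnames(ind_position) <- levels(position)
ind_position
```

Each row contains exactly one 1, in the column of the level that the sample belongs to, so distance-based models such as K nearest neighbors can use the factor as numeric input.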
Create the result variable set. The result variable is the sensory quality evaluation score of the tobacco leaves; in this example it is the total sensory quality evaluation score.
score<-tobacco$SCORE
2.2 data partitioning
2.2.1 data partitioning
Training sets and test sets of predictor variables and outcome variables are created, respectively.
set.seed(222) # set the random seed so that the results are reproducible
trainningrows<-createDataPartition(score,
p=0.8,
list=FALSE)
In this example 80% of the samples are randomly selected as training rows; trainningrows holds the row numbers of the samples assigned to the training set.
TrainPredictors<-predictors[trainningrows,]
TrainPredictors.ind<-predictors.ind[trainningrows,] # select predictor variable samples for the training set
TrainScore<-score[trainningrows] # select result variable samples for the training set
TestPredictors<-predictors[-trainningrows,]
TestPredictors.ind<-predictors.ind[-trainningrows,] # select predictor variable samples for the test set
TestScore<-score[-trainningrows] # select result variable samples for the test set
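The row-index logic of the split can be sketched in base R. This sketch uses simple random sampling on a simulated outcome vector, which is an assumption for illustration; a partitioning function that balances the outcome distribution across the two sets may select a slightly different number of rows.

```r
# Base-R sketch of an 80/20 split by row index (simple random sampling).
set.seed(222)
n <- 595                                     # sample count stated in this example
score_toy <- rnorm(n, mean = 75, sd = 3)     # simulated outcome, not real data

train_rows  <- sample(seq_len(n), size = round(0.8 * n))
train_score <- score_toy[train_rows]         # rows selected for training
test_score  <- score_toy[-train_rows]        # all remaining rows form the test set

c(train = length(train_score), test = length(test_score))
```

Negative indexing with the same row vector guarantees that the training and test sets do not overlap and together cover every sample.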
2.2.2 resampling
In this example 10-fold cross-validation is selected as the resampling method. The corresponding argument in the train function is as follows:
trControl=trainControl(method="cv",number=10)
3. Data modeling
3.1 Linear regression model
3.1.1 Generalized linear model
Input code:
set.seed(222)
glm1<-train(x=TrainPredictors,y=TrainScore,
method="glm",
trControl=trainControl(method="cv",number=10))
glm1
Output:
## Generalized Linear Model
##
## 477 samples     # the model was fitted on 477 samples
##  30 predictor   # the model uses 30 predictor variables
##
## Resampling: Cross-Validated (10 fold)   # resampling method: 10-fold cross-validation
## Summary of sample sizes: 429, 430, 430, 430, 429, 429, ...
## Resampling results:
##
##   RMSE      Rsquared    MAE
##   3.001024  0.03957612  2.327293
3.1.2 Stepwise regression linear model
Input code:
set.seed(222)
glmstep1<-train(x=TrainPredictors,y=TrainScore,
method="glmStepAIC",
trControl=trainControl(method="cv",number=10))
Output:
Figure BDA0002738544510000121
Figure BDA0002738544510000131
3.1.3 General linear regression
Input code:
Figure BDA0002738544510000132
Figure BDA0002738544510000141
Output:
Figure BDA0002738544510000142
3.1.4 Partial least squares regression (plsr)
Input code:
Figure BDA0002738544510000143
Output:
Figure BDA0002738544510000144
3.2 Nonlinear regression model
3.2.1 Support vector machine (SVM)
Input code:
Figure BDA0002738544510000151
Output:
Figure BDA0002738544510000152
3.2.2 K nearest neighbors
Input code:
Figure BDA0002738544510000153
Figure BDA0002738544510000161
Output:
Figure BDA0002738544510000162
3.3 Regression tree model
3.3.1 Simple regression tree (single tree)
Input code:
Figure BDA0002738544510000163
Output:
Figure BDA0002738544510000164
Figure BDA0002738544510000171
3.3.2 Regression model tree
Input code:
Figure BDA0002738544510000172
Output:
Figure BDA0002738544510000173
Figure BDA0002738544510000181
3.3.3 Random forest
Input code:
Figure BDA0002738544510000182
Output:
Figure BDA0002738544510000183
3.3.4 Cubist
Input code:
Figure BDA0002738544510000184
Output:
Figure BDA0002738544510000185
Figure BDA0002738544510000191
The invention is further described below with reference to specific examples and experimental data.
4. Effect of model fitting
The fit of each model is evaluated by comparing MAE, RMSE and R2.
Input code:
resamp<-resamples(list(glm=glm1,lm=lm1,plsr=plsr1,SVM=SVM1,knnTune=knnTune,rpartTune=rpartTune1,M5Tune=M5Tune1,cubist=cubist1,randomforest=randomforest1))
# compare the models using the resamples function
Output:
Figure BDA0002738544510000192
Figure BDA0002738544510000201
Note: MAE (mean absolute error) is the mean of the absolute errors of the model. RMSE (root mean squared error) is the square root of the mean squared difference between the predicted and observed values; it measures the model residuals (observed value minus predicted value) and can be read as the average distance between the observed and predicted values. The coefficient of determination (R2) is interpreted as the proportion of the information in the data that is explained by the model.
A model R2 above 0.26 is considered good, between 0.13 and 0.26 medium, and between 0.02 and 0.13 poor (Cohen et al, 1988). In the model comparison, the random forest model has the lowest MAE and RMSE and the highest R2, close to 0.26, so its prediction performance is the best; the SVM and cubist models come next, and the remaining models fit poorly.
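The grading thresholds quoted above can be applied mechanically with a small base-R helper. The helper name r2_grade is hypothetical, and the handling of values that fall exactly on a boundary is an assumption, since the text does not specify it; values below 0.02 are left ungraded (NA).

```r
# Base-R sketch that grades R2 values with the thresholds quoted above.
# Assumption: intervals are closed on the left, e.g. 0.13 counts as "medium".
r2_grade <- function(r2) {
  cut(r2,
      breaks = c(0.02, 0.13, 0.26, Inf),
      labels = c("poor", "medium", "good"),
      right  = FALSE)
}

# 0.0396 and 0.256 are R2 values reported in this example; 0.20 and 0.30 are made up
r2_grade(c(0.0396, 0.20, 0.256, 0.30))
```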
5. Model predicted effect
Taking a random forest as an example, the prediction and evaluation process is as follows:
(1) prediction
The test samples are predicted with the predict function using the random forest model:
PredictedScore.randomforest<-predict(randomforest1,TestPredictors) # randomforest1 is the random forest model, TestPredictors is the test sample, and PredictedScore.randomforest is the predicted score
(2) Calculating RMSE and R2 between the predicted and observed values
R2(PredictedScore.randomforest,TestScore)
##[1]0.256195
RMSE(PredictedScore.randomforest,TestScore)
##[1]2.330102
Similarly, 10 models were predicted and evaluated, and the results are shown in the following table:
Figure BDA0002738544510000211
Among the models, the random forest model has the smallest mean absolute error (MAE) and root mean square error (RMSE) between the predicted and observed values and the largest coefficient of determination (R2), so its prediction result is the best.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a specific embodiment of the present invention and is not to be construed as limiting its scope; any modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within the scope defined by the appended claims.

Claims (10)

1. A tobacco leaf quality prediction method based on an ecological factor model, characterized by comprising the following steps:
performing data transformation and screening on the prediction variables used in tobacco leaf quality prediction;
creating a prediction variable set and a result variable set for tobacco leaf quality prediction, and partitioning and resampling the data;
selecting a plurality of regression methods to model the data, thereby obtaining different prediction models of tobacco leaf quality;
evaluating the prediction performance of the different prediction models using the root mean square error RMSE and the coefficient of determination R2, and selecting an optimal model from the prediction models according to the prediction performance.
2. The ecological factor model-based tobacco leaf quality prediction method of claim 1, wherein the data transformation of the prediction variables includes centering, scaling and skewness transformation; centering subtracts the mean from every value of a variable, so that the transformed variable has mean 0; scaling divides each variable by its own standard deviation, forcing the standard deviation of the variable to be 1; the skewness transformation removes distributional skewness, turning a right-skewed or left-skewed distribution into an unskewed one so that the variable is approximately symmetrically distributed.
3. The ecological factor model-based tobacco leaf quality prediction method of claim 1, wherein the method of data transformation of the prediction variables comprises:
(I) constructing a transformation object trans using the preProcess function in the caret package, which applies centering (center), scaling (scale) and skewness transformation (BoxCox) to the data together;
(II) after trans is constructed, transforming the original data using the predict function.
4. The ecological factor model-based tobacco leaf quality prediction method of claim 1, wherein the predictive variable screening method comprises:
(1) removing zero-variance variables: the near-zero-variance variables to be filtered are detected using the nearZeroVar function in the caret package; if the data set is shown to contain zero-variance variables, those variables are removed;
(2) removing multiple collinearity variables;
in the step (2), the method for removing multiple collinearity variables comprises the following steps:
1) calculating the correlation coefficient matrix of the prediction variables using the cor function in the corrplot package;
2) finding the pair of prediction variables with the largest absolute correlation coefficient using the findCorrelation function, denoted prediction variables A and B;
3) calculating the mean correlation coefficient between A and the other prediction variables, performing the same calculation for B, and listing the variable columns with high correlation coefficients using the head function;
4) if the mean correlation coefficient of A is larger, removing A; otherwise, removing B;
5) repeating steps 2) to 4) until the absolute values of all correlation coefficients are below the set threshold.
5. The ecological factor model-based tobacco leaf quality prediction method of claim 1, wherein the method of creating a set of predictor variables and a set of outcome variables comprises:
(I) establishing predictor sets for the first 1 to n predictor variable columns in the data set;
(II) establishing a result variable set result for the result variable column of the (n + 1) th column in the data set;
the method for segmenting data comprises the following steps:
(1) randomly selecting training rows from the samples using the createDataPartition function in the caret package;
(2) after the training rows are obtained, creating a prediction variable training set TrainPredictors and a result variable training set TrainResult containing the training rows;
(3) creating a prediction variable test set TestPredictors and a result variable test set TestResult from the remaining samples.
6. The ecological factor model-based tobacco leaf quality prediction method according to claim 1, wherein the method for resampling the data comprises: resampling by K-fold cross-validation, implemented with the trainControl function in the caret package;
the K-fold cross-validation method comprises the following steps:
1) randomly dividing the samples into K subsets of roughly equal size, and first fitting the model with all samples except the first subset;
2) predicting the held-out first subset with the model, and evaluating the model using the prediction result;
3) returning the first subset to the training set, holding out the second subset for model evaluation, and so on;
4) calculating the mean and standard deviation of the K model evaluation results obtained, and determining on that basis the relationship between the tuning parameters and model performance.
7. The ecological factor model-based tobacco leaf quality prediction method of claim 1, wherein the test model selects a linear regression model, a non-linear regression model and a regression tree model; the linear regression model comprises a generalized linear model, a stepwise regression linear model and a partial least square regression model; the nonlinear regression model comprises a Support Vector Machine (SVM) model and a K nearest neighbor model; the regression tree model comprises a simple regression tree, a regression model tree, a random forest and a cubist model;
the models are fitted and evaluated using the train function in the caret package; the prediction performance of each model is compared using the resamples function in the caret package;
in the model comparison results, the preferred model is chosen according to the RMSE and R2 of each model: the smaller the RMSE, the higher the prediction accuracy, and the larger the R2, the better the fit.
8. A computer terminal, characterized in that the computer terminal comprises:
the transformation and screening module is used for respectively carrying out data transformation and predictive variable screening processing on predictive variables in the tobacco quality prediction;
the segmentation resampling module is used for creating a prediction variable set and a result variable set in tobacco quality prediction and respectively segmenting and resampling data;
the prediction model acquisition module is used for selecting a plurality of regression methods to model the data, thereby obtaining different prediction models of tobacco leaf quality;
an optimal model screening module for evaluating the prediction performance of the different prediction models using the root mean square error RMSE and the coefficient of determination R2, and selecting an optimal model from the prediction models according to the prediction performance.
9. A computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the method of ecological factor model-based tobacco leaf quality prediction according to any one of claims 1 to 7.
10. Use of the ecological factor model-based tobacco leaf quality prediction method according to any one of claims 1 to 7 in the detection, evaluation and prediction of tobacco leaf production quality, such as economic traits, disease rate, appearance quality, physical characteristics, chemical components, aroma substances and sensory evaluation, of tobacco leaves from different planting areas and different years.
CN202011141976.2A 2020-10-23 2020-10-23 Method, medium and application for constructing tobacco leaf quality prediction model by using R language Active CN112287601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141976.2A CN112287601B (en) 2020-10-23 2020-10-23 Method, medium and application for constructing tobacco leaf quality prediction model by using R language

Publications (2)

Publication Number Publication Date
CN112287601A true CN112287601A (en) 2021-01-29
CN112287601B CN112287601B (en) 2023-08-01

Family

ID=74424144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141976.2A Active CN112287601B (en) 2020-10-23 2020-10-23 Method, medium and application for constructing tobacco leaf quality prediction model by using R language

Country Status (1)

Country Link
CN (1) CN112287601B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107991969A (en) * 2017-12-25 2018-05-04 云南五佳生物科技有限公司 A kind of wisdom tobacco planting management system based on Internet of Things
US20190302069A1 (en) * 2016-07-04 2019-10-03 British American Tobacco (Investments) Limited Apparatus and method for classifying a tobacco sample into one of a predefined set of taste categories
CN110751335A (en) * 2019-10-21 2020-02-04 中国气象局沈阳大气环境研究所 Regional ecological quality annual scene prediction evaluation method and device
CN110990784A (en) * 2019-11-19 2020-04-10 湖北中烟工业有限责任公司 Cigarette ventilation rate prediction method based on gradient lifting regression tree

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
常安然等: "基于烤烟理化指标构建烤烟感官质量预测模型", 《江西农业学报》, vol. 29, no. 01, 15 January 2017 (2017-01-15), pages 75 - 79 *
张光辉等: "融合粗糙集和灰色***的烟叶感官质量预测", 《计算机与应用化学》, vol. 34, no. 02, 28 February 2017 (2017-02-28), pages 163 - 166 *
杜国伟: ""重庆植烟区生态条件与烤烟产质量关系的研究"", 《中国优秀博硕士学位论文全文数据库(硕士)农业科技辑》, no. 10, 15 October 2014 (2014-10-15), pages 2 *
胡建军: ""烟叶质量评价方法优选与实证研究"", 《中国博士学位论文全文数据库 农业科技辑》, no. 8, 19 August 2009 (2009-08-19), pages 1 *
许安定等: "基于CART模型的烤烟评吸质量影响因子研究", 《西南农业学报》, vol. 26, no. 04, 28 August 2013 (2013-08-28), pages 1356 *
高若楠等: "基于随机森林模型的天然林立地生产力预测研究", 《中南林业科技大学学报》, no. 04, 10 January 2019 (2019-01-10), pages 45 - 52 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256021A (en) * 2021-06-16 2021-08-13 北京德风新征程科技有限公司 Product quality alarm method and device based on ensemble learning
CN113488113A (en) * 2021-07-12 2021-10-08 浙江中烟工业有限责任公司 Industrial use value identification method of redried strip tobacco
CN113488113B (en) * 2021-07-12 2024-02-23 浙江中烟工业有限责任公司 Industrial use value identification method for redried strip tobacco
CN115481750A (en) * 2022-09-20 2022-12-16 云南省农业科学院农业环境资源研究所 On-line prediction method and system for nitrate nitrogen in underground water based on machine learning

Also Published As

Publication number Publication date
CN112287601B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant