CN116703460B - Method and device for establishing commodity data prediction model, electronic equipment and storage medium

Info

Publication number
CN116703460B
Authority
CN
China
Prior art keywords
feature
commodity
features
prediction model
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310488609.7A
Other languages
Chinese (zh)
Other versions
CN116703460A (en
Inventor
谢方敏
周峰
周勇
伍世志
刘洁莹
黄庆林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fangzhou Information Technology Co ltd
Original Assignee
Guangzhou Fangzhou Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fangzhou Information Technology Co ltd filed Critical Guangzhou Fangzhou Information Technology Co ltd
Priority to CN202310488609.7A priority Critical patent/CN116703460B/en
Publication of CN116703460A publication Critical patent/CN116703460A/en
Application granted granted Critical
Publication of CN116703460B publication Critical patent/CN116703460B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/27Regression, e.g. linear or logistic regression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282Rating or review of business operators or products
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for establishing a commodity data prediction model, which comprises the following steps: acquiring original commodity features, and performing specific feature derivation on the original commodity features to obtain specific derivative features; performing cross feature derivation on the original commodity features and the specific derivative features to obtain cross derivative features; performing feature screening according to the original commodity features and the cross derivative features to obtain target commodity features; and constructing a linear regression commodity data prediction model from the target commodity features; wherein the specific feature derivation comprises at least one of: inversion feature derivation, quadratic feature derivation, and abnormal feature derivation. Compared with the prior art, the method can ensure the timeliness of the prediction model while improving prediction accuracy.

Description

Method and device for establishing commodity data prediction model, electronic equipment and storage medium
Technical Field
The present invention relates to the field of commodity data prediction technologies, and in particular, to a method, an apparatus, an electronic device, and a computer readable storage medium for commodity data prediction and for creating a commodity data prediction model.
Background
In the e-commerce industry, predicting future commodity data, such as commodity sales and future pricing, from commodity features, such as commodity price, commodity inventory and commodity performance parameters, can provide a reference for operators' business decisions.
Currently, deep learning models are commonly used to predict commodity data. However, the number of commodity features affecting commodity data is very large, so a deep learning model has a correspondingly large number of parameters to tune; tuning that many parameters requires a long training time, which affects the timeliness of the data prediction model.
To ensure the timeliness of the model, a linear regression model, which has a short training time, can be adopted to predict commodity data. At present, however, the accuracy of commodity data prediction by linear regression models is low.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art, and provides a commodity data prediction method which can ensure the timeliness of a prediction model and improve the prediction accuracy.
The invention is realized by the following technical scheme: a method of building a commodity data predictive model, comprising the steps of:
Acquiring original commodity characteristics, and carrying out specific characteristic derivation on the original commodity characteristics to obtain specific derived characteristics;
performing cross feature derivation on the original commodity features and the specific derivative features to obtain cross derivative features;
performing feature screening according to the original commodity features and the cross derivative features to obtain target commodity features;
constructing a commodity data prediction model of linear regression through the target commodity characteristics;
Wherein the particular feature derivation comprises at least one of: inversion feature derivation, quadratic feature derivation, and abnormal feature derivation;
The inversion feature derivation includes the steps of: inverting the original commodity characteristics;
The quadratic characteristic derivation includes the steps of: performing a power calculation on the original commodity characteristics;
The abnormal feature derivation includes the steps of: acquiring the original commodity characteristic data of an evaluation period as an evaluation value; acquiring the same original commodity characteristic data of a period before the evaluation period as a reference value; comparing the evaluation value with the reference value, and obtaining the specific derivative feature according to the comparison result.
Compared with the prior art, the invention uses inversion feature derivation so that original commodity features that are inversely proportional to the predicted commodity data become better suited to a linear regression commodity data prediction model; uses quadratic feature derivation to amplify the variation of original commodity features whose variation amplitude is small, thereby increasing the sensitivity of the linear regression commodity data prediction model to those features; and uses abnormal feature derivation to add abnormal features that mark abnormal data, helping the commodity data prediction model distinguish abnormal data. Together these improve the prediction accuracy of the commodity data prediction model. Meanwhile, cross feature derivation allows features that contribute little to commodity data prediction on their own but still have some influence to be retained, further improving the prediction accuracy of the commodity data prediction model.
Further, feature screening is performed according to the original commodity features and the cross derivative features to obtain target commodity features, and the method comprises the following steps:
taking all the characteristics corresponding to the original commodity characteristics and the cross derivative characteristics as first characteristic combinations, and constructing a first prediction model through the first characteristic combinations;
training the first prediction model;
Performing linear regression processing on a first evaluation set through the trained first prediction model to obtain a first prediction result, and evaluating the first prediction result to obtain a first evaluation value;
for each feature in the original commodity features and the cross derivative features, constructing a missing feature combination that omits that feature, and constructing a second prediction model from each missing feature combination;
training each second prediction model;
performing linear regression processing on a second evaluation set through the trained second prediction model to obtain a plurality of second prediction results, and evaluating each second prediction result to obtain a plurality of second evaluation values;
and comparing each second evaluation value with the first evaluation value, and determining the target commodity characteristics according to the comparison result.
Further, training the first prediction model includes the steps of:
acquiring a plurality of groups of random start times and random time lengths, and, taking each random start time as a starting point, acquiring the historical feature data of the first feature combination over the corresponding random time length, so as to obtain a plurality of groups of first historical feature data;
Dividing each group of the first historical characteristic data into a first training set, a first test set and a first evaluation set in time sequence;
fitting through the first training set to obtain characteristic parameters of the first prediction model;
verifying the prediction error of the first prediction model through the first test set, and adjusting the hyperparameters of the first prediction model according to the prediction error;
and alternately executing the steps until the optimal first prediction model characteristic parameters are obtained.
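As a non-limiting illustration, the alternation between fitting on the training set and adjusting hyperparameters against the test set might be sketched as follows. The text does not say which hyperparameters the linear regression model has; the ridge penalty `alpha`, the candidate grid and the helper names `fit_ridge`, `mse` and `train` are all assumptions made for this sketch.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge fit; the intercept column is not penalised."""
    design = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(design.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(design.T @ design + penalty, design.T @ y)

def mse(w, X, y):
    """Mean squared prediction error of parameters w on (X, y)."""
    return float(np.mean((y - (w[0] + X @ w[1:])) ** 2))

def train(X_train, y_train, X_test, y_test, alphas=(0.0, 0.1, 1.0, 10.0)):
    """Try each assumed hyperparameter value and keep the lowest test error."""
    best = None
    for alpha in alphas:
        w = fit_ridge(X_train, y_train, alpha)   # fit the feature parameters
        err = mse(w, X_test, y_test)             # verify the prediction error
        if best is None or err < best[0]:
            best = (err, alpha, w)
    return best

# Hypothetical, noise-free data in time order: the first 7 rows train, the last 3 test.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 1.0 + 2.0 * X[:, 0]
best_err, best_alpha, best_w = train(X[:7], y[:7], X[7:], y[7:])
```

Because the example data are exactly linear, the unpenalised fit wins; with real, noisy commodity data a non-zero penalty may be selected instead.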
Further, training each of the second prediction models includes the steps of:
acquiring a plurality of groups of random start times and random time lengths, and, taking each random start time as a starting point, acquiring the historical feature data of the missing feature combination over the corresponding random time length, so as to obtain a plurality of groups of second historical feature data;
dividing each group of the second historical characteristic data into a second training set, a second testing set and a second evaluating set in time sequence;
fitting through the second training set to obtain characteristic parameters of the second prediction model;
verifying the prediction error of the second prediction model through the second test set, and adjusting the hyperparameters of the second prediction model according to the prediction error;
And alternately executing the steps until the optimal second prediction model characteristic parameters are obtained.
Further, the first evaluation value is a variance of the first predicted result, and the second evaluation value is a variance of the second predicted result;
comparing each second evaluation value with the first evaluation value, and determining the target commodity features according to the comparison result, comprises the following step: if a second evaluation value is larger than the first evaluation value, determining that the feature omitted from the missing feature combination corresponding to that second evaluation value is a target commodity feature.
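The screening steps above can be sketched, under stated assumptions, as a leave-one-feature-out comparison. Here the evaluation value is assumed to be the mean squared prediction error on the evaluation set (one reading of "variance of the predicted result"), and the small tolerance `tol` is an addition of this sketch to absorb floating-point noise; the function names are hypothetical.

```python
import numpy as np

def fit(X, y):
    """Ordinary least-squares fit with an intercept; returns [w0, w1, ..., wn]."""
    design = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return w

def error_variance(w, X, y):
    """Mean squared prediction error, used here as the assumed evaluation value."""
    pred = w[0] + X @ w[1:]
    return float(np.mean((y - pred) ** 2))

def screen_features(X_train, y_train, X_eval, y_eval, names, tol=1e-9):
    """Keep every feature whose omission makes the evaluation value larger."""
    first_value = error_variance(fit(X_train, y_train), X_eval, y_eval)
    kept = []
    for j, name in enumerate(names):
        cols = [k for k in range(X_train.shape[1]) if k != j]  # omit feature j
        second_value = error_variance(
            fit(X_train[:, cols], y_train), X_eval[:, cols], y_eval)
        if second_value > first_value + tol:
            kept.append(name)
    return kept

# Hypothetical data: the target depends on the first two columns only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]
kept = screen_features(X[:40], y[:40], X[40:], y[40:],
                       ["price", "inventory", "unrelated"])
```

Each feature is judged against the same full model, so the outcome does not depend on the order in which features are examined.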
Further, the dividing ratio of the first training set, the first test set and the first evaluation set is a:b:c, wherein a is not less than 50% of each group of the first historical characteristic data, and b is not less than c;
the dividing ratio of the second training set, the second testing set and the second evaluation set is a:b:c, wherein a is not less than 50% of each group of the second historical characteristic data, and b is not less than c.
Further, before performing cross feature derivation on the original commodity features and the specific derivative features, the method comprises the step of:
transforming the zero values of the multi-zero features in the original commodity features to near-zero values.
Based on the same inventive concept, the invention also provides a device for establishing a commodity data prediction model, which comprises:
the specific feature deriving module is used for acquiring the original commodity features, and performing specific feature derivation on the original commodity features to obtain specific derived features;
the cross feature deriving module is used for performing cross feature derivation on the original commodity features and the specific derived features to obtain cross derived features;
The feature screening module is used for carrying out feature screening according to the original commodity features and the cross derivative features to obtain target commodity features;
And the model construction module is used for constructing a commodity data prediction model of linear regression through the target commodity characteristics.
Wherein the specific feature derivation module comprises at least one of: an inverted feature derivation sub-module, a quadratic feature derivation sub-module, and an abnormal feature derivation sub-module;
the inversion characteristic deriving submodule is used for carrying out inversion treatment on the original commodity characteristics;
the quadratic feature derivation sub-module is used for performing a power calculation on the original commodity features;
The abnormal characteristic deriving sub-module is used for acquiring the original commodity characteristic data of the evaluation period as an evaluation value; acquiring the same original commodity characteristic data of a period before the evaluation period as a reference value; comparing the evaluation value with the reference value, and obtaining the specific derivative feature according to the comparison result.
Further, the feature screening module includes:
the first construction submodule is used for constructing a first prediction model through a first feature combination by taking all features corresponding to the original commodity features and the cross derivative features as the first feature combination;
The first training submodule is used for training the first prediction model;
the first evaluation sub-module is used for carrying out linear regression processing on a first evaluation set through the trained first prediction model to obtain a first prediction result, and evaluating the first prediction result to obtain a first evaluation value;
the second construction submodule is used for constructing, for each feature in the original commodity features and the cross derivative features, a missing feature combination that omits that feature, and constructing a second prediction model from each missing feature combination;
the second training submodule is used for training each second prediction model;
the second evaluation sub-module is used for carrying out linear regression processing on a second evaluation set through the trained second prediction model to obtain a plurality of second prediction results, and evaluating each second prediction result to obtain a plurality of second evaluation values;
and the evaluation analysis sub-module is used for comparing each second evaluation value with the first evaluation value and determining the target commodity characteristics according to the comparison result.
Based on the same inventive concept, the invention also provides a commodity data prediction model, which is used for performing linear regression processing on commodity feature data to predict commodity data, the commodity data prediction model being established by the above method.
Based on the same inventive concept, the present invention also provides an electronic device, including:
A processor;
A memory for storing a computer program for execution by the processor;
wherein the processor, when executing the computer program, implements the steps of the above method.
Based on the same inventive concept, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the steps of the above-described method.
For a better understanding and implementation, the present invention is described in detail below with reference to the drawings.
Drawings
Fig. 1 is a schematic structural diagram of a device for establishing a commodity data prediction model according to an embodiment;
FIG. 2 is a flow chart of a method for establishing a merchandise data prediction model corresponding to the apparatus of FIG. 1;
FIG. 3 is a schematic diagram of the first training sub-module 32a of the apparatus of FIG. 1;
FIG. 4 is a flow chart of step S32a in the method of FIG. 2;
FIG. 5 is a schematic diagram of a second training sub-module 32b of the apparatus of FIG. 1;
FIG. 6 is a flow chart of step S32b in the method of FIG. 2;
FIG. 7 is a schematic diagram of a model training module 50 of the apparatus of FIG. 1;
fig. 8 is a schematic flow chart of step S5 in the method of fig. 2.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.
Take a simple linear regression model, $y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$, as an example. When the linear regression model is used for commodity data prediction, $y$ represents the commodity data, $(x_1, x_2, \dots, x_n)$ represent the commodity features, and $(w_0, w_1, w_2, \dots, w_n)$ represent the feature parameters. To predict commodity data with a linear regression model, reasonable commodity features must first be obtained as target commodity features and used to construct the linear regression commodity data prediction model; the feature parameters in the commodity data prediction model are then optimized. At prediction time, feature data corresponding to the target commodity features (for example, values of the commodity price feature such as 5 yuan or 10 yuan) are input into the commodity data prediction model to obtain the predicted commodity data.
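For illustration only, fitting the feature parameters and then predicting with the linear form above might look like the following sketch. The data values, the two example features (price in yuan and inventory) and the helper names are hypothetical and are not taken from the embodiment.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit of y = w0 + w1*x1 + ... + wn*xn; returns [w0, w1, ..., wn]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of 1s carries w0
    w, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return w

def predict(w, X):
    """Apply the fitted feature parameters to new feature data."""
    X = np.asarray(X, dtype=float)
    return w[0] + X @ w[1:]

# Hypothetical history: columns are price (yuan) and inventory; y is sales.
X_hist = np.array([[5.0, 100.0], [10.0, 80.0], [5.0, 60.0], [10.0, 120.0]])
y_hist = 200.0 - 8.0 * X_hist[:, 0] + 0.5 * X_hist[:, 1]

w = fit_linear_regression(X_hist, y_hist)
forecast = predict(w, np.array([[8.0, 90.0]]))
```

Since the hypothetical sales were generated from an exactly linear rule, the fit recovers the parameters (200, -8, 0.5) and the forecast for a price of 8 yuan with 90 units in stock is 181.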
In studying the application of linear regression models to commodity data prediction, the inventors found that the commodity features of e-commerce commodities and the commodity data to be predicted are not in a purely linear relationship, which is why the accuracy of commodity data prediction by linear regression models is low. The invention therefore focuses on how to select more accurately the target commodity features that have a reasonable linear relationship with the commodity data to be predicted, and how to optimize the feature parameters of those target commodity features, so that a highly accurate linear regression model is formed from the selected target commodity features and their feature parameters.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic structural diagram of a device for establishing a merchandise data prediction model according to the present embodiment; fig. 2 is a flow chart of a method for establishing a commodity data prediction model corresponding to the apparatus of this embodiment. The means for building a commodity data predictive model includes a specific feature derivation module 10, a cross feature derivation module 20, a feature screening module 30, a model construction module 40, and a model training module 50.
Specifically, the specific feature derivation module 10 is configured to perform step S1: obtaining the original commodity features, and performing specific feature derivation on the original commodity features to obtain specific derivative features.
The original commodity features are commodity features which can be directly obtained through data calling, such as commodity price, commodity inventory, commodity performance parameters and the like. Further, the original merchandise features also include periodic features such as holiday features, weekly features, etc.
Performing specific feature derivation on the original commodity features means processing the original commodity features in a specific manner to obtain new features, namely the specific derivative features. Optionally, the specific feature derivation module 10 of the present embodiment includes an inversion feature derivation sub-module, a quadratic feature derivation sub-module, and an abnormal feature derivation sub-module.
The inversion feature derivation sub-module is used for executing the step of: inverting the original commodity features to obtain inversion derivative features. An inversion derivative feature is the reciprocal of an original commodity feature. In a specific implementation, inversion feature derivation can be performed on all original commodity features, or, according to actual requirements, only on specific original commodity features that are inversely proportional to the predicted commodity data. For example, if the predicted commodity data is commodity sales and commodity price is inversely proportional to commodity sales, commodity price can be taken as a specific original commodity feature requiring inversion feature derivation.
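A minimal sketch of why the reciprocal helps a linear model, assuming hypothetical sales that are exactly inversely proportional to price. The zero guard `eps` and the helper `r_squared` are assumptions added for the sketch, not part of the embodiment.

```python
import numpy as np

def inversion_feature(values, eps=1e-4):
    """Return 1/x; zeros are replaced by eps (an assumed guard against division by zero)."""
    x = np.asarray(values, dtype=float)
    safe = np.where(x == 0.0, eps, x)
    return 1.0 / safe

def r_squared(x, y):
    """R^2 of a one-feature least-squares line fitted to (x, y)."""
    design = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ w
    return 1.0 - resid.dot(resid) / ((y - y.mean()) ** 2).sum()

# Hypothetical data: sales inversely proportional to price.
price = np.array([2.0, 4.0, 5.0, 8.0, 10.0, 20.0])
sales = 400.0 / price

r2_raw = r_squared(price, sales)                     # straight line on raw price
r2_inv = r_squared(inversion_feature(price), sales)  # straight line on 1/price
```

On this data the line fitted to the raw price explains only part of the variation, while the line fitted to the inversion derivative feature fits essentially exactly.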
The quadratic feature derivation sub-module is used for executing the step of: performing a power calculation on the original commodity features to obtain quadratic derivative features. The exponent of the power calculation can be set according to actual requirements. In a specific implementation, quadratic feature derivation can be performed on all original commodity features, or, according to actual requirements, only on specific original commodity features whose variation amplitude is small, so that the power calculation highlights their variation. For example, if the commodity price varies little in most cases, rising or falling by only about 1%, commodity price can be taken as a specific original commodity feature requiring quadratic feature derivation.
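As a small worked illustration of how a power calculation amplifies a small variation, consider a hypothetical price that rises by 1%. The exponent of 2 and the function name `power_feature` are choices made for this sketch.

```python
import numpy as np

def power_feature(values, exponent=2.0):
    """Raise each feature value to the given exponent."""
    x = np.asarray(values, dtype=float)
    return np.power(x, exponent)

# Hypothetical price that rises by 1%.
price = np.array([100.0, 101.0])
rel_change_raw = (price[1] - price[0]) / price[0]

squared = power_feature(price, 2.0)
rel_change_sq = (squared[1] - squared[0]) / squared[0]
```

The relative change grows from 1% on the raw feature to about 2.01% on the squared feature, since (1 + r)^2 - 1 = 2r + r^2; a larger exponent amplifies the change further.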
The abnormal feature derivation sub-module is used for executing the steps of: acquiring the original commodity feature data of an evaluation period as an evaluation value; acquiring the same original commodity feature data of the period preceding the evaluation period as a reference value; and comparing the evaluation value with the reference value and obtaining the abnormal derivative feature according to the comparison result. The abnormal derivative feature indicates whether the feature data within the evaluation period is an abnormal value. The evaluation period may be any period for which it is necessary to evaluate whether the data is abnormal, and may be of any length, such as several hours, one day, or several days. The preceding period is a period of any length immediately before the evaluation period; if the evaluation period and the preceding period differ in length, the evaluation value and the reference value need to be compared as averages over a uniform unit of time.
In an alternative embodiment, the ratio or the difference between the evaluation value and the reference value is taken as the comparison result; that is, the ratio or the difference is the feature value of the abnormal derivative feature.
In another alternative embodiment, the ratio or the difference between the evaluation value and the reference value is taken as the comparison result: if the difference between the evaluation value and the reference value is greater than a preset value, the abnormal derivative feature indicates that the feature data in the evaluation period is an abnormal value; if the difference is smaller than the preset value, the abnormal derivative feature indicates that the feature data in the evaluation period is a normal value.
Specifically, the abnormal feature derivation step can be implemented with a jumping sliding window, so as to perform abnormal feature derivation on the original commodity features.
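One possible reading of the abnormal feature derivation, sketched under assumptions: each evaluation window is compared with the window immediately before it using per-unit-time means, and the window start jumps forward by `step` positions. The parameter names, the threshold of 5.0 and the daily sales values are hypothetical.

```python
import numpy as np

def abnormal_features(series, eval_len, ref_len, step, threshold):
    """Compare each evaluation window with the window immediately before it.

    Means are used so that windows of different lengths are comparable.
    Returns (start_index, ratio, is_abnormal) tuples.
    """
    x = np.asarray(series, dtype=float)
    out = []
    for start in range(ref_len, len(x) - eval_len + 1, step):
        reference = x[start - ref_len:start].mean()    # preceding period
        evaluation = x[start:start + eval_len].mean()  # evaluation period
        ratio = evaluation / reference if reference != 0 else float("inf")
        is_abnormal = bool(abs(evaluation - reference) > threshold)
        out.append((start, ratio, is_abnormal))
    return out

# Hypothetical daily sales with a spike on the last two days.
daily_sales = [10, 10, 10, 10, 11, 9, 30, 32]
rows = abnormal_features(daily_sales, eval_len=2, ref_len=4, step=2, threshold=5.0)
```

The ratio corresponds to the first alternative embodiment (the comparison result itself is the feature value) and the boolean flag to the second (a preset value decides abnormal versus normal).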
Preferably, before performing specific feature derivation on the original commodity features, the method further comprises the step of: cleaning the data of the original commodity features. The cleaning uses data processing such as de-duplication, removal of missing data, correction of invalid data, and splitting of data information, so as to extract valid original commodity features.
The cross feature derivation module 20 is configured to perform step S2: performing cross feature derivation on the original commodity features and the specific derivative features to obtain cross derivative features.
As an example of cross feature derivation, if the original commodity features and the specific derivative features are A, B and C, cross feature derivation yields the cross derivative features AB, AC, BC and ABC.
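The A, B, C example can be sketched as follows, assuming that crossing means the element-wise product of the feature values; the text does not fix the cross operation, so the product and the function name `cross_features` are assumptions.

```python
from itertools import combinations

import numpy as np

def cross_features(columns):
    """columns: dict mapping feature name -> 1-D sequence of values.

    Returns a dict with one element-wise product for every combination
    of two or more features.
    """
    names = list(columns)
    crossed = {}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            value = np.ones(len(columns[combo[0]]), dtype=float)
            for name in combo:
                value = value * np.asarray(columns[name], dtype=float)
            crossed["".join(combo)] = value
    return crossed

cols = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}
crossed = cross_features(cols)
```

For three input features this produces AB, AC, BC and ABC. The number of combinations grows exponentially with the number of features, which is one reason feature screening follows.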
In a preferred embodiment, before performing cross feature derivation on the original commodity features and the specific derivative features, the method further comprises the step of: transforming the zero values of the multi-zero features in the original commodity features to near-zero values.
A multi-zero feature is a feature whose feature data is mostly zero, such as an out-of-stock feature or a holiday feature. Because their values are so often zero, these features are easily judged during feature screening to contribute nothing to commodity data prediction, and are removed. In practice, however, these features have a relatively large effect on some commodity data. For example, being out of stock affects commodity sales: when a commodity is out of stock, its sales drop. Likewise, holidays affect commodity sales: during a holiday, consumers' shopping demand increases, which raises sales. Such multi-zero features therefore need to be retained to further improve the accuracy of commodity data prediction.
The near-zero value is a non-zero value close to zero, such as 10 to the power of -4 (10^-4), and can be set according to the actual implementation; this embodiment does not limit it. By replacing the zero values of the multi-zero features with near-zero values, the multi-zero features are retained in the cross derivative features after cross feature derivation and can therefore be retained in the commodity data prediction model, improving the accuracy of commodity data prediction.
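A brief sketch of the near-zero replacement, again assuming a product-style cross feature. The out-of-stock flag and price values are hypothetical, and 1e-4 is simply the example value mentioned above.

```python
import numpy as np

def replace_zeros(values, near_zero=1e-4):
    """Replace exact zeros with a small non-zero value."""
    x = np.asarray(values, dtype=float)
    return np.where(x == 0.0, near_zero, x)

# Hypothetical multi-zero feature (1.0 = out of stock) crossed with price.
out_of_stock = np.array([0.0, 0.0, 1.0, 0.0])
price = np.array([5.0, 10.0, 5.0, 10.0])

cross_raw = out_of_stock * price                   # price is wiped out wherever the flag is 0
cross_fixed = replace_zeros(out_of_stock) * price  # price still shows through, scaled down
```

Without the replacement the crossed column is zero in three of the four rows; with it, every row keeps a non-zero value that still varies with the other feature.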
The feature screening module 30 is configured to perform step S3: performing feature screening according to the original commodity features and the cross derivative features to obtain target commodity features.
Feature screening deletes redundant or useless features from the original commodity features and the cross derivative features, and retains the feature combination that contributes to commodity data prediction for constructing the commodity data prediction model. Optional feature screening methods include filter methods, wrapper methods, embedded methods, and the like.
The conventional feature screening method removes features progressively, one feature at a time: if the feature currently removed is judged not to contribute to commodity data prediction, it is dropped from the model. However, removing features in a different order leads to different target commodity features in the final model; that is, the removal order affects the judgment of each feature's contribution, which makes it difficult to guarantee the accuracy of the commodity data prediction model. In a preferred embodiment, to further improve the accuracy of the commodity data prediction model, the feature screening module 30 includes a first construction sub-module 31a, a first training sub-module 32a, a first evaluation sub-module 33a, a second construction sub-module 31b, a second training sub-module 32b, a second evaluation sub-module 33b, and an evaluation analysis sub-module 34.
Wherein, the first constructing sub-module 31a is configured to execute step S31a: taking the corresponding features of all original commodity features and cross derived features as first feature combinations, and constructing a first prediction model through the first feature combinations;
The first training sub-module 32a is configured to perform step S32a: training a first prediction model;
In a specific implementation, the first prediction model is a linear regression model. To increase the long-term validity of the commodity data prediction model, please refer to fig. 3 and fig. 4 together, wherein fig. 3 is a schematic structural diagram of the first training sub-module 32a of the present embodiment, and fig. 4 is a schematic flow chart of step S32a in the method for establishing the commodity data prediction model of the present embodiment. The first training sub-module 32a includes a first data acquisition sub-module 32a1, a first data dividing sub-module 32a2, a first fitting sub-module 32a3, a first verification sub-module 32a4, and a first convergence module 32a5.
The first data acquisition sub-module 32a1 is configured to perform step S32a1: acquiring a plurality of groups of random starting times and random time lengths, and, taking each random starting time as a starting point, acquiring the historical feature data of the first feature combination over the corresponding random time length, to obtain a plurality of groups of first historical feature data;
The random starting time is any day within the valid dates of the historical feature data. The random time length can be set according to actual implementation requirements; to ensure that the training set has a sufficient amount of data, the random time length is not smaller than a fixed value, such as 60 days. The historical feature data is the data corresponding to the features in the commodity feature combination.
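Drawing the random windows can be sketched as below. The function name `sample_windows`, its parameters, and the requirement that each window end inside the available history are assumptions made for illustration; the text only requires a random start within the valid dates and a length not smaller than a fixed value such as 60 days.

```python
import random
from datetime import date, timedelta


def sample_windows(first_day, last_day, count, min_days=60, seed=None):
    """Draw `count` (start_date, length_in_days) pairs inside [first_day, last_day]."""
    rng = random.Random(seed)
    total_days = (last_day - first_day).days + 1
    if total_days < min_days:
        raise ValueError("history is shorter than the minimum window length")
    windows = []
    for _ in range(count):
        # Latest possible start still leaves room for a window of min_days.
        offset = rng.randint(0, total_days - min_days)
        # Length is at least min_days and never runs past last_day (assumption).
        length = rng.randint(min_days, total_days - offset)
        windows.append((first_day + timedelta(days=offset), length))
    return windows


windows = sample_windows(date(2022, 1, 1), date(2022, 12, 31), count=5, seed=0)
```

Each returned pair identifies one group of historical feature data to be fetched for the feature combination under training.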
The first data dividing sub-module 32a2 is configured to execute step S32a2: each group of first historical characteristic data is divided into a first training set, a first testing set and a first evaluating set in time sequence.
In a preferred embodiment, when each set of the first historical feature data is divided into the first training set, the first test set and the first evaluation set in time sequence, the division ratio of the first training set, the first test set and the first evaluation set is a:b:c, wherein a is not less than 50% of each set of the historical feature data, and b is not less than c.
Illustratively, for a random starting time of January 1, 2022, a random time length of 120 days, and a division ratio a:b:c of 2:1:1, the first training set is the first historical feature data from January 1, 2022 to March 1, 2022, the first test set is the first historical feature data from March 2, 2022 to April 1, 2022, and the first evaluation set is the first historical feature data from April 2, 2022 to May 1, 2022.
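A chronological a:b:c division can be sketched as follows. The helper `split_chronologically` is hypothetical, and assigning any integer-division remainder to the evaluation set is an assumption; the checks on a and b mirror the preferred embodiment (a not less than 50%, b not less than c).

```python
def split_chronologically(records, ratio=(2, 1, 1)):
    """Split time-ordered records into training, test and evaluation sets by a:b:c."""
    a, b, c = ratio
    total = a + b + c
    if a * 2 < total or b < c:
        raise ValueError("expected a to be at least 50% of the data and b >= c")
    n = len(records)
    train_end = n * a // total
    test_end = train_end + n * b // total
    # Any remainder left by integer division goes to the evaluation set.
    return records[:train_end], records[train_end:test_end], records[test_end:]


days = list(range(120))  # stand-in for 120 consecutive daily records
train, test, evaluation = split_chronologically(days)
```

Because the records are sliced in order rather than shuffled, every training record precedes every test record, and every test record precedes every evaluation record.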
The first fitting sub-module 32a3 is configured to perform step S32a3: fitting on the first training set to obtain the feature parameters of the first prediction model.
The first verification sub-module 32a4 is configured to perform step S32a4: verifying the prediction error of the first prediction model on the first test set, and adjusting the hyperparameters of the first prediction model according to the prediction error.
The first convergence module 32a5 is configured to perform step S32a5: running the first fitting sub-module 32a3 and the first verification sub-module 32a4 alternately until the optimal feature parameters of the first prediction model are obtained.
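The alternation of fitting and verification can be illustrated with a deliberately small sketch. The text does not say which hyperparameters are tuned or how; the example assumes a single-feature linear model without intercept whose only hyperparameter is a ridge penalty, and it assumes a plain search over candidate values. All function names are hypothetical.

```python
def fit_slope(xs, ys, penalty):
    """Closed-form ridge fit of y = w * x (single feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def mean_squared_error(xs, ys, slope):
    """Prediction error of the fitted slope on a held-out set."""
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def tune(train, test, penalties):
    """Alternate fitting and verification; keep the setting with the lowest test error."""
    best = None
    for penalty in penalties:
        slope = fit_slope(*train, penalty)        # fit on the training set
        error = mean_squared_error(*test, slope)  # verify on the test set
        if best is None or error < best[2]:
            best = (penalty, slope, error)
    return best


train = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
test = ([4.0], [8.0])
best_penalty, best_slope, best_error = tune(train, test, [0.0, 1.0, 10.0])
```

The evaluation set is not touched inside this loop, so it remains available for the separate evaluation step.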
The first evaluation sub-module 33a is configured to perform step S33a: performing linear regression processing on the first evaluation set through the trained first prediction model to obtain a first prediction result, and evaluating the first prediction result to obtain a first evaluation value.
The second construction sub-module 31b is configured to perform step S31b: for each feature in the original commodity features and the cross derivative features, constructing a missing feature combination that lacks that feature, and constructing a second prediction model from each missing feature combination.
The second training sub-module 32b is configured to perform step S32b: training each second prediction model.
Referring to fig. 5 and fig. 6, fig. 5 is a schematic structural diagram of the second training sub-module 32b of the present embodiment, and fig. 6 is a schematic flow chart of step S32b in the method for establishing the commodity data prediction model of the present embodiment. The second training sub-module 32b includes a second data acquisition sub-module 32b1, a second data dividing sub-module 32b2, a second fitting sub-module 32b3, a second verification sub-module 32b4, and a second convergence module 32b5.
The second data acquisition sub-module 32b1 is configured to perform step S32b1: acquiring a plurality of groups of random starting times and random time lengths, and, taking each random starting time as a starting point, acquiring the historical feature data of the missing feature combination over the corresponding random time length, to obtain a plurality of groups of second historical feature data.
The second data dividing sub-module 32b2 is configured to perform step S32b2: dividing each group of second historical feature data into a second training set, a second test set and a second evaluation set in time sequence.
In a preferred embodiment, when each group of second historical feature data is divided into a second training set, a second test set and a second evaluation set in time sequence, the division ratio of the second training set, the second test set and the second evaluation set is a:b:c, wherein a is not less than 50% of each group of historical feature data, and b is not less than c.
The second fitting sub-module 32b3 is configured to perform step S32b3: fitting on the second training set to obtain the feature parameters of the second prediction model.
The second verification sub-module 32b4 is configured to perform step S32b4: verifying the prediction error of the second prediction model on the second test set, and adjusting the hyperparameters of the second prediction model according to the prediction error.
The second convergence module 32b5 is configured to perform step S32b5: running the second fitting sub-module 32b3 and the second verification sub-module 32b4 alternately until the optimal feature parameters of the second prediction model are obtained.
The second evaluation sub-module 33b is configured to perform step S33b: performing linear regression processing on the second evaluation set through the corresponding trained second prediction model to obtain a plurality of second prediction results, and evaluating each second prediction result to obtain a plurality of second evaluation values.
In the first evaluation sub-module 33a and the second evaluation sub-module 33b, evaluating the first prediction result and the second prediction results means evaluating the rationality of the prediction results, which can generally be done by calculating the variance of the prediction results, a learning curve, and the like. The first evaluation value and the second evaluation values represent the prediction capability of the first prediction model and of the corresponding second prediction models, respectively.
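When the variance of the prediction result is used as the evaluation value, the computation is a single call. The sketch below assumes the population variance; the text does not say whether population or sample variance is intended, and the prediction values are made up for illustration.

```python
from statistics import pvariance


def evaluation_value(predictions):
    """Evaluation value taken as the population variance of the predictions."""
    return pvariance(predictions)


first_value = evaluation_value([10.0, 12.0, 11.0, 13.0])  # illustrative predictions
```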
The evaluation analysis sub-module 34 is configured to perform step S34: comparing each second evaluation value with the first evaluation value, and determining the target commodity features according to the comparison result.
The step of comparing each second evaluation value with the first evaluation value and determining the target commodity features according to the comparison result includes: if the prediction capability represented by a second evaluation value is greater than the prediction capability represented by the first evaluation value, determining that the missing feature of the missing feature combination corresponding to that second evaluation value is a target commodity feature.
Illustratively, the larger the variance of the prediction result, the stronger the prediction capability of the corresponding prediction model. When evaluating by calculating the variance of the prediction result, the step of determining the target commodity features according to the comparison result includes: if a second evaluation value is larger than the first evaluation value, determining that the missing feature of the missing feature combination corresponding to that second evaluation value is a target commodity feature.
Therefore, the contribution of each removed feature to commodity data prediction is judged from a plurality of feature combinations that each lack only one feature, so the contribution of each feature can be judged accurately and the accuracy of the commodity data prediction model can be improved. Moreover, a plurality of second prediction models can be trained and evaluated at the same time, which improves feature screening efficiency.
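The screening flow of steps S31a to S34 can be sketched as a leave-one-feature-out comparison. In this sketch `evaluate` is a hypothetical callable standing in for constructing, training and evaluating a model on a feature combination; the selection rule follows the comparison exactly as worded above, and the use of a thread pool is only one assumed way to evaluate the second models concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def screen_features(features, evaluate):
    """Compare each leave-one-feature-out combination against the full combination.

    `evaluate(feature_names)` is a hypothetical callable that builds, trains and
    evaluates a model on the given features and returns its evaluation value.
    """
    first_value = evaluate(tuple(features))
    missing_combinations = {
        dropped: tuple(f for f in features if f != dropped) for dropped in features
    }
    # The second models are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        second_values = dict(
            zip(missing_combinations, pool.map(evaluate, missing_combinations.values()))
        )
    # Rule as stated in the text: the missing feature is selected when the
    # second evaluation value exceeds the first evaluation value.
    return [f for f in features if second_values[f] > first_value]


# Toy evaluation values keyed by feature combination (made up for illustration).
toy_scores = {
    ("f1", "f2", "f3"): 1.0,
    ("f2", "f3"): 1.4,
    ("f1", "f3"): 0.7,
    ("f1", "f2"): 1.0,
}
selected = screen_features(["f1", "f2", "f3"], toy_scores.__getitem__)
```

Every missing feature combination is compared against the same first evaluation value, so the result does not depend on the order in which features are considered.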
The model construction module 40 is configured to perform step S4: constructing a commodity data prediction model from the target commodity features.
The model training module 50 is configured to perform step S5: training the commodity data prediction model.
Further, please refer to fig. 7 and fig. 8, wherein fig. 7 is a schematic structural diagram of the model training module 50 of the present embodiment, and fig. 8 is a schematic flow chart of step S5 in the method for establishing the commodity data prediction model of the present embodiment. The model training module 50 includes a third data acquisition sub-module 51, a third data dividing sub-module 52, a third fitting sub-module 53, a third verification sub-module 54, and a third convergence module 55.
The third data acquisition sub-module 51 is configured to perform step S51: acquiring a plurality of groups of random starting times and random time lengths, and, taking each random starting time as a starting point, acquiring the historical feature data of the target commodity features over the corresponding random time length, to obtain a plurality of groups of historical feature data.
The third data dividing sub-module 52 is configured to perform step S52: dividing each group of historical feature data into a third training set and a third test set in time sequence.
The third fitting sub-module 53 is configured to perform step S53: fitting on the third training set to obtain the feature parameters of the commodity data prediction model.
The third verification sub-module 54 is configured to perform step S54: verifying the prediction error of the commodity data prediction model on the third test set, and adjusting the hyperparameters of the commodity data prediction model according to the prediction error.
The third convergence module 55 is configured to perform step S55: running the third fitting sub-module 53 and the third verification sub-module 54 alternately until the optimal feature parameters of the commodity data prediction model are obtained.
Compared with the prior art, the invention uses inversion feature derivation so that original commodity features that are inversely proportional to the predicted commodity data become better suited to a linear regression commodity data prediction model; uses quadratic feature derivation to amplify the variation of original commodity features whose variation amplitude is small, thereby increasing the sensitivity of the linear regression commodity data prediction model to those features; and uses abnormal feature derivation to add abnormal features that mark abnormal data, helping the commodity data prediction model distinguish abnormal data and thereby improving its prediction accuracy. Meanwhile, cross feature derivation allows features that contribute little to commodity data prediction but still have some influence to be retained, which further improves the prediction accuracy of the commodity data prediction model.
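The three specific derivations summarized above can be sketched as follows. The reciprocal and the square follow the description of inversion and quadratic derivation; the abnormality rule (a relative deviation from the reference value above a `tolerance` of 0.5) is purely an assumption, since the text only states that the evaluation value is compared with the reference value. Returning `None` for a zero input is also an illustrative choice.

```python
def inverted(values):
    """Inversion derivation: reciprocal of each value (zeros are mapped to None here)."""
    return [1.0 / v if v != 0 else None for v in values]


def squared(values):
    """Quadratic derivation: second power of each value."""
    return [v ** 2 for v in values]


def abnormal_flag(evaluation_value, reference_value, tolerance=0.5):
    """Assumed rule: flag 1 when the relative deviation from the reference exceeds tolerance."""
    if reference_value == 0:
        return 1 if evaluation_value != 0 else 0
    deviation = abs(evaluation_value - reference_value) / abs(reference_value)
    return 1 if deviation > tolerance else 0


price = [2.0, 4.0, 5.0]                  # hypothetical original commodity feature
inverse_price = inverted(price)          # better suited to an inverse relationship
squared_price = squared(price)           # amplifies small variations
spike = abnormal_flag(300.0, 100.0)      # evaluation period vs. preceding period
steady = abnormal_flag(110.0, 100.0)
```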
Based on the same inventive concept, the present application also provides an electronic device, which may be a terminal device such as a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet computer, a netbook, etc.). The device includes one or more processors and a memory, wherein the processor is configured to execute a program to implement the commodity data prediction method of the method embodiments, and the memory is configured to store a computer program executable by the processor.
Based on the same inventive concept, the present application also provides a computer-readable storage medium, corresponding to the foregoing embodiments of the commodity data prediction method, having stored thereon a computer program that, when executed by a processor, implements the steps of the commodity data prediction method described in any of the foregoing embodiments.
The present application may take the form of a computer program product embodied on one or more storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that may be used to store information accessible by a computing device.
The above examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention, and the invention is intended to encompass such modifications and improvements.

Claims (9)

1. A method of building a commodity data predictive model comprising the steps of:
Acquiring original commodity characteristics, and carrying out specific characteristic derivation on the original commodity characteristics to obtain specific derived characteristics;
cross feature derivatization is carried out on the original commodity features and the specific derivative features to obtain cross derivative features;
performing feature screening according to the original commodity features and the cross derivative features to obtain target commodity features;
constructing a commodity data prediction model of linear regression through the target commodity characteristics;
Wherein the particular feature derivation comprises at least one of: inversion feature derivation, quadratic feature derivation, and abnormal feature derivation;
The inversion feature derivation includes the steps of: inverting the original commodity characteristics;
The quadratic characteristic derivation includes the steps of: performing a power calculation on the original commodity characteristics;
The abnormal feature derivation includes the steps of: acquiring the original commodity characteristic data of an evaluation period as an evaluation value; acquiring the same original commodity characteristic data of a period before the evaluation period as a reference value; comparing the evaluation value with the reference value, and obtaining the specific derivative characteristic according to a comparison result;
the feature screening is carried out according to the original commodity features and the cross derivative features to obtain target commodity features, and the method comprises the following steps:
taking all the characteristics corresponding to the original commodity characteristics and the cross derivative characteristics as first characteristic combinations, and constructing a first prediction model through the first characteristic combinations;
training the first prediction model;
Performing linear regression processing on a first evaluation set through the trained first prediction model to obtain a first prediction result, and evaluating the first prediction result to obtain a first evaluation value;
Constructing a missing feature combination missing the feature aiming at each feature in the original commodity feature and the cross derivative feature, and constructing a second prediction model through each missing feature combination;
training each second prediction model;
performing linear regression processing on a second evaluation set through the trained second prediction model to obtain a plurality of second prediction results, and evaluating each second prediction result to obtain a plurality of second evaluation values;
and comparing each second evaluation value with the first evaluation value, and determining the target commodity characteristics according to the comparison result.
2. The method according to claim 1, characterized in that: training the first prediction model, comprising the steps of:
Acquiring a plurality of groups of random starting time and random time length, and acquiring historical characteristic data corresponding to the first characteristic combination of the corresponding random time length by taking each group of random starting time as a starting point to acquire a plurality of groups of first historical characteristic data;
Dividing each group of the first historical characteristic data into a first training set, a first test set and a first evaluation set in time sequence;
fitting through the first training set to obtain characteristic parameters of the first prediction model;
Verifying the prediction error of the first prediction model through the first test set, and adjusting the hyperparameters of the first prediction model according to the prediction error;
and alternately executing the steps until the optimal first prediction model characteristic parameters are obtained.
3. The method of claim 1, wherein training each of the second predictive models comprises the steps of:
Acquiring a plurality of groups of random starting time and random time length, and acquiring historical characteristic data corresponding to missing characteristic combinations corresponding to the random time length by taking each group of random starting time as a starting point to acquire a plurality of groups of second historical characteristic data;
dividing each group of the second historical characteristic data into a second training set, a second testing set and a second evaluating set in time sequence;
fitting through the second training set to obtain characteristic parameters of the second prediction model;
Verifying the prediction error of the second prediction model through the second test set, and adjusting the hyperparameters of the second prediction model according to the prediction error;
And alternately executing the steps until the optimal second prediction model characteristic parameters are obtained.
4. A method according to claim 2 or 3, characterized in that: the first evaluation value is the variance of the first prediction result, and the second evaluation value is the variance of the second prediction result;
comparing each second evaluation value with the first evaluation value, and determining the target commodity characteristics according to the comparison result comprises the following steps: if the second evaluation value is larger than the first evaluation value, determining that the missing feature of the missing feature combination corresponding to the second evaluation value is the target commodity feature.
5. The method according to claim 4, wherein: the dividing ratio of the first training set, the first test set and the first evaluation set is a:b:c, wherein a is not less than 50% of each group of the first historical characteristic data, and b is not less than c;
the dividing ratio of the second training set, the second test set and the second evaluation set is a:b:c, wherein a is not less than 50% of each group of the second historical characteristic data, and b is not less than c.
6. A method according to any one of claims 1-3, characterized in that before cross-feature deriving the original commodity feature and the specific derived feature, it comprises the steps of:
The zero values of the multiple zero features in the original commodity feature are transformed to near-zero values.
7. An apparatus for creating a predictive model of merchandise data, comprising:
the specific feature deriving module is used for acquiring the original commodity features, and performing specific feature derivation on the original commodity features to obtain specific derived features;
the cross feature deriving module is used for performing cross feature derivation on the original commodity features and the specific derived features to obtain cross derived features;
The feature screening module is used for carrying out feature screening according to the original commodity features and the cross derivative features to obtain target commodity features;
The model construction module is used for constructing a commodity data prediction model of linear regression through the target commodity characteristics;
Wherein the specific feature derivation module comprises at least one of: an inverted feature derivation sub-module, a quadratic feature derivation sub-module, and an abnormal feature derivation sub-module;
the inversion characteristic deriving submodule is used for carrying out inversion treatment on the original commodity characteristics;
the quadratic feature derivation sub-module is used for performing a power calculation on the original commodity features;
The abnormal characteristic deriving sub-module is used for acquiring the original commodity characteristic data of the evaluation period as an evaluation value; acquiring the same original commodity characteristic data of a period before the evaluation period as a reference value; comparing the evaluation value with the reference value, and obtaining the specific derivative characteristic according to a comparison result;
wherein, the feature screening module includes:
the first construction submodule is used for constructing a first prediction model through a first feature combination by taking all features corresponding to the original commodity features and the cross derivative features as the first feature combination;
The first training submodule is used for training the first prediction model;
the first evaluation sub-module is used for carrying out linear regression processing on a first evaluation set through the trained first prediction model to obtain a first prediction result, and evaluating the first prediction result to obtain a first evaluation value;
the second construction submodule is used for constructing a missing feature combination missing the feature aiming at each feature in the original commodity feature and the cross derivative feature, and constructing a second prediction model through each missing feature combination;
the second training submodule is used for training each second prediction model;
the second evaluation sub-module is used for carrying out linear regression processing on a second evaluation set through the trained second prediction model to obtain a plurality of second prediction results, and evaluating each second prediction result to obtain a plurality of second evaluation values;
and the evaluation analysis sub-module is used for comparing each second evaluation value with the first evaluation value and determining the target commodity characteristics according to the comparison result.
8. An electronic device, comprising:
A processor;
A memory for storing a computer program for execution by the processor;
Wherein the processor, when executing the computer program, implements the steps of the method of any of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method of any of claims 1-6.
CN202310488609.7A 2023-04-28 2023-04-28 Method and device for establishing commodity data prediction model, electronic equipment and storage medium Active CN116703460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310488609.7A CN116703460B (en) 2023-04-28 2023-04-28 Method and device for establishing commodity data prediction model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116703460A CN116703460A (en) 2023-09-05
CN116703460B true CN116703460B (en) 2024-04-16

Family

ID=87822908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310488609.7A Active CN116703460B (en) 2023-04-28 2023-04-28 Method and device for establishing commodity data prediction model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116703460B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297185A (en) * 2020-02-24 2021-08-24 ***通信有限公司研究院 Feature derivation method and device
CN113553540A (en) * 2020-04-24 2021-10-26 株式会社日立制作所 Commodity sales prediction method
CN114661750A (en) * 2022-03-21 2022-06-24 中国工商银行股份有限公司 Feature derivation method and device, nonvolatile storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
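User behavior prediction based on quadratic-combination feature engineering and the XGBoost model; Yang Lihong et al.; Science Technology and Engineering (No. 14); pp. 191-194 *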

Also Published As

Publication number Publication date
CN116703460A (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant