NL2027964B1 - Data resource valuation method for network platform - Google Patents
- Publication number
- NL2027964B1
- Authority
- NL
- Netherlands
- Prior art keywords
- data
- model
- index
- evaluation
- network platform
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Entrepreneurship & Innovation (AREA)
- Finance (AREA)
- Economics (AREA)
- Physics & Mathematics (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Educational Administration (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Complex Calculations (AREA)
Abstract
The present disclosure relates to a data resource valuation method for a network platform. The method includes: 1: constructing a data resource valuation index system under a network platform-based trading environment; 2: determining an evaluation index weight according to an entropy correction G1 method; 3: prescreening, according to a grey correlation analysis (GCA) method, traded data resources on a network platform, and obtaining traded data resources whose degrees of correlation with pre-evaluated data resources are greater than or equal to a threshold, to constitute a model sample set T; and 4: selecting a random forest regression (RFR) model as a basic data resource valuation model of the network platform, and using the model sample set T to construct a data resource valuation model. Evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model, and an average value of output values of all regression trees is used as a result of data resource valuation performed based on the data resource valuation model. The present disclosure not only can remarkably improve accuracy of data resource value prediction, but also can reduce an amount of calculation of the RFR model and improve training efficiency of the RFR model.
Description
TECHNICAL FIELD The present disclosure relates to the field of resource valuation, and in particular, to a grey correlation analysis (GCA)-random forest regression (RFR)-based data resource valuation method for a network platform.
BACKGROUND In the era of data explosion, data functions as records and files for future use. Moreover, multi-source and cross-domain data correlation analysis provides more complete knowledge and enables deeper intelligence, thereby greatly enhancing predictive capability. The openness and circulation of data resources as tradable commodities have increasingly become a common understanding and an objective demand. The Report on Development of Big Data in China in 2018, issued by the State Information Center, predicted that the scale of China's big data trading market would reach 73.1 billion yuan in 2020. Under the "Internet Plus" strategy, the network platform has become an important trading channel and medium. Many data trading platforms such as Factual, BDEX, Data Plaza, and Global Big Data Exchange (GBDEx) have emerged one after another. Data resources are growing but are not standardized. Their value is uncertain to both a data resource provider and a data resource demander, because the provider has only a limited history of market trades for reference and the demander cannot gain the kind of direct experience that tangible commodities offer. This results in mismatched supply and demand, and reduces the data trading success ratio and the data value revitalization ratio. Therefore, accurate data resource valuation plays a key role in the transformation from disordered data resource trading to standardized data resource trading. With the continuous development of data trading, some data trading platforms have realized the importance of data resource valuation and carried out beneficial exploration. However, on existing data trading markets in China, the various network trading platforms still rely on subjective evaluation by experts, and perform one evaluation for one case. This results in low reliability and low transparency of data resource evaluation, and makes it difficult to provide a valid value reference for data resource trading parties, so that ideal data trading effects are not achieved. Existing theoretical research shows that data resource valuation methods include an asset evaluation method, a multi-attribute comprehensive evaluation method, and an economics method. However, these methods are discussed from the perspective of data owners, and are not applicable to network platform-based trading. Some scholars have also put forward the research idea of performing artificial intelligence (AI)-based evaluation. At present, using a neural network to construct a data resource valuation model has become a research trend in this field. However, this method lacks sufficient research and empirical tests.
SUMMARY To overcome the disadvantages in the prior art, the present disclosure aims to provide a GCA-RFR-based data resource valuation method for a network platform. In the method, a GCA-RFR-based data resource valuation model is constructed. To achieve the above objective, the present disclosure adopts the following technical solutions: A data resource valuation method for a network platform includes the following steps: step 1: constructing a data resource valuation index system under a network platform-based trading environment; step 2: determining an evaluation index weight according to an entropy correction G1 method; step 3: prescreening, according to a GCA method, traded data resources on a network platform, and obtaining traded data resources whose degrees of correlation with pre-evaluated data resources are greater than or equal to a threshold, to constitute a model sample set T; and step 4: selecting an RFR model as a basic data resource valuation model of the network platform, and using the model sample set T to construct a data resource valuation model. Evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model, and an average value of output values of all regression trees is used as a result of data resource valuation performed based on the data resource valuation model.
The step 1 specifically includes: based on influence factors of data resource value, selecting seven factors as evaluation indexes from perspectives of resources and assets by comprehensively considering a use frequency of the influence factors and availability of selected variables, to construct the data resource valuation index system under the network platform-based trading environment; and the evaluation indexes include an efficiency index, a cost index, and a standardization index, where the efficiency index includes a data scale, a market attention degree, and a data application level, the cost index includes data freshness, the standardization index includes data activity, data exclusiveness, and a data ownership confirmation level, and the data activity, the data exclusiveness, and the data ownership confirmation level are standardized data.
The step 2 specifically includes: sorting, by an expert, the evaluation indexes in the data resource valuation index system based on importance of the evaluation indexes; calculating a sum of information entropy of each evaluation index according to an entropy method; calculating a ratio of importance of adjacent evaluation indexes; and calculating a weight of each evaluation index.
A calculation formula of the sum of the information entropy is as follows:
$$h_j = -\sum_{i=1}^{m} f_{ij} \ln f_{ij} \tag{1}$$
In the foregoing formula, $f_{ij} = y_{ij} / \sum_{i=1}^{m} y_{ij}$, where $1 \le i \le m$ and $1 \le j \le 7$; when $f_{ij} = 0$, $f_{ij} \ln f_{ij} = 0$; $m$ represents the number of traded data resources, and $x_{ij}$ represents the $j$-th evaluation index of the $i$-th traded data resource; $h_j$ represents a sum of information entropy of the $j$-th evaluation index; and $y_{ij}$ represents data standardization, with $y_{ij} = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$.
A calculation formula of the ratio of the importance of the adjacent evaluation indexes is as follows:
$$r_j = \begin{cases} h_{j-1} / h_j, & \text{when } h_{j-1} \ge h_j \\ 1, & \text{when } h_{j-1} < h_j \end{cases} \tag{2}$$
In the foregoing formula, $r_j$ represents a ratio of importance of adjacent evaluation indexes $x_{j-1}$ and $x_j$; $h_{j-1}$ represents a sum of information entropy of the $(j-1)$-th evaluation index, and $h_j$ represents the sum of the information entropy of the $j$-th evaluation index.
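Formulas (1) and (2) can be illustrated with a minimal Python sketch. The function names `min_max`, `entropy_sum`, and `importance_ratios` are hypothetical, and the handling of a constant column and of a zero entropy in the denominator are assumptions that the disclosure does not address.

```python
import math

def min_max(column):
    """Min-max standardization y_ij of one evaluation index over all m resources."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]

def entropy_sum(column):
    """Formula (1): h_j = -sum_i f_ij * ln(f_ij), with 0 * ln(0) taken as 0."""
    y = min_max(column)
    total = sum(y)
    if total == 0:
        return 0.0
    h = 0.0
    for value in y:
        f = value / total
        if f > 0:
            h -= f * math.log(f)
    return h

def importance_ratios(h):
    """Formula (2): r_j for j = 2..n, with h listed in the expert's order."""
    ratios = []
    for prev, cur in zip(h, h[1:]):
        # Assumption: a zero entropy in the denominator falls back to ratio 1.
        ratios.append(prev / cur if cur > 0 and prev >= cur else 1.0)
    return ratios
```

Each call to `entropy_sum` takes the raw values of one evaluation index across all traded data resources; `importance_ratios` expects the resulting entropies in the order in which the expert ranked the indexes.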
A calculation formula of the evaluation index weight is as follows:
$$w_j = \left(1 + \sum_{n=2}^{j} \prod_{i=n}^{j} r_i\right)^{-1}, \quad n = 2, 3, \ldots, j-1, j \tag{3}$$
The step 3 specifically includes: performing data standardization on efficiency indexes and cost indexes of the pre-evaluated data resources and the traded data resources; calculating an absolute difference between an evaluation index of a pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; calculating a two-level minimum difference and a two-level maximum difference; calculating a correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; and calculating degrees of correlation between the pre-evaluated data resources and the traded data resources, and selecting traded data resources with degrees of correlation $\gamma_i \ge 0.8$ to constitute the model sample set T.
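Returning to formula (3), the weight computation can be sketched as follows. This is a sketch under stated assumptions: formula (3) is read as giving the weight of the last-ranked index, and the remaining weights are obtained with the back-substitution $w_{n-1} = r_n w_n$ of the conventional G1 (order relation analysis) method, which the text above does not spell out. The function name `g1_weights` is hypothetical.

```python
def g1_weights(ratios):
    """Weights w_1..w_n from ratios r_2..r_n (ratios[0] is r_2).

    Assumes the conventional G1 recursion w_{n-1} = r_n * w_n, which is
    not stated explicitly in the source text.
    """
    n = len(ratios) + 1
    total = 0.0
    product = 1.0
    # Accumulate prod_{i=k}^{n} r_i for k = n, n-1, ..., 2.
    for r in reversed(ratios):
        product *= r
        total += product
    weights = [0.0] * n
    weights[-1] = 1.0 / (1.0 + total)  # weight of the last-ranked index
    # Back-substitute towards the most important index.
    for idx in range(n - 1, 0, -1):
        weights[idx - 1] = ratios[idx - 1] * weights[idx]
    return weights
```

With this recursion the weights always sum to 1, which gives a quick sanity check on any set of ratios.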
Data standardization formulas of the efficiency index $y1_{ij}$ and the cost index $y2_{ij}$ are as follows:
$$y1_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{4}$$
$$y2_{ij} = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{5}$$
The standardization index, an efficiency index obtained after data standardization, or a cost index obtained after data standardization is denoted as $D$. A calculation formula of the absolute difference $Z$ between the evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$Z = |D_0(j) - D_i(j)|, \quad j = 1, 2, \ldots, 7 \tag{6}$$
Calculation formulas of the two-level minimum difference and the two-level maximum difference are as follows:
$$Z1 = \min_{1 \le i \le m} \min_{1 \le j \le 7} |D_0(j) - D_i(j)| \tag{7}$$
$$Z2 = \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)| \tag{8}$$
A calculation formula of the correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$s_{ij} = \frac{\min_i \min_j |D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|}{|D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|} \tag{9}$$
In the foregoing formula, $\rho$ represents a discrimination coefficient and its value is 0.5.
A calculation formula of the degrees of correlation between the pre-evaluated data resources and the traded data resources is as follows:
$$\gamma_i = \sum_{j=1}^{7} w_j s_{ij} \tag{10}$$
In the foregoing formula, $w_j$ represents the evaluation index weight.
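The GCA prescreening of formulas (6) to (10) can be sketched in Python as follows. The function names `grey_correlation` and `prescreen` are hypothetical, the inputs are assumed to be already standardized index vectors, and returning a coefficient of 1 when every difference is zero is an assumption for a case the text does not cover.

```python
def grey_correlation(d0, samples, weights, rho=0.5):
    """Degree of correlation gamma_i between d0 and each row of samples.

    d0: standardized indexes of the pre-evaluated resource.
    samples: standardized indexes of the traded resources, one row each.
    """
    # Formula (6): absolute differences per resource and per index.
    diffs = [[abs(a - b) for a, b in zip(d0, row)] for row in samples]
    z_min = min(min(row) for row in diffs)  # formula (7)
    z_max = max(max(row) for row in diffs)  # formula (8)
    degrees = []
    for row in diffs:
        if z_max == 0:
            # Assumption: all resources identical to d0, so full correlation.
            coeffs = [1.0] * len(row)
        else:
            # Formula (9): correlation coefficient of each index.
            coeffs = [(z_min + rho * z_max) / (z + rho * z_max) for z in row]
        # Formula (10): weighted sum of the coefficients.
        degrees.append(sum(w * s for w, s in zip(weights, coeffs)))
    return degrees

def prescreen(d0, samples, weights, threshold=0.8):
    """Indices of traded resources whose degree of correlation >= threshold."""
    degrees = grey_correlation(d0, samples, weights)
    return [i for i, g in enumerate(degrees) if g >= threshold]
```

The rows selected by `prescreen` correspond to the model sample set T used to train the regression model in step 4.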
The step 4 specifically includes: setting the number $K$ of regression trees; randomly extracting $K$ training sample sets from the model sample set T according to a Bootstrap resampling method, where a sample set that is not extracted is referred to as out of bag (OOB) data; randomly selecting $A$ ($1 \le A \le 7$) evaluation indexes and performing training to generate the RFR model; using the OOB data as a test sample to estimate an error of the RFR model; adjusting a value of the parameter $K$, establishing multiple RFR models and calculating a generalization error of each RFR model, and selecting an RFR model with a minimum generalization error as the final data resource valuation model; and inputting the evaluation indexes of the pre-evaluated data resources into the data resource valuation model, and using the average value of the output values of the regression trees as the result of data resource valuation performed based on the final data resource valuation model.
A calculation formula of the average value of the output values of the regression trees is as follows:
$$F(X) = \frac{1}{K} \sum_{k=1}^{K} f_k(X) \tag{12}$$
In the foregoing formula, $f_k$ represents an output value of each regression tree, and $K$ represents the number of regression trees.
Beneficial effects of the present disclosure:
1. The present disclosure uses Octopus data collection software to crawl, on a big data trading platform, trading data of ten types of data resources on a website, and uses real quantifiable data for empirical tests. This effectively ensures validity and practicability of the evaluation model.
2. The present disclosure demonstrates a relationship between the influence factors of the data resource value and the data resource value. In addition, the selected influence factors of the data resource value are quantifiable indexes. This breaks a dilemma that the data resource valuation index is subjective and difficult to measure.
3. The present disclosure provides an intelligent data resource valuation method based on history market trading conditions. The method is strongly objective, and is more applicable to characteristics of the data resources on the network platform, for example, the number of data resources is huge and the data resource demander is unknown.
4. According to the method in the present disclosure, only the number of regression trees needs to be set for the RFR model, which is different from an intelligent valuation method based on a parameter model such as a neural network or a support vector machine. In the method in the present disclosure, only a few parameters need to be adjusted. In addition, when there are a large number of regression trees, the generalization error of the RFR model converges, and no overfitting will occur. Moreover, the samples are randomly selected, and the feature indexes are random. This reduces correlation between the regression trees, and provides good generalization performance.
5. Compared with a method in which only the RFR model is used, the present disclosure uses the GCA-RFR model. In the GCA-RFR model, the GCA method is first used to preprocess the traded data resources on the network platform, and screening is performed to obtain data resources whose index sequences are highly similar to index sequences of the pre-evaluated data resources. These data resources constitute the sample data sets used to train the RFR model, which makes full use of the advantage that the RFR model requires less sample data. This not only can remarkably improve accuracy of data resource value prediction, but also can reduce an amount of calculation of the RFR model and improve training efficiency of the RFR model.
BRIEF DESCRIPTION OF THE DRAWINGS The present disclosure has the following drawings: FIG. 1 is a flowchart of a method according to the present disclosure; and FIG. 2 is a flowchart of RFR-based data resource value prediction.
DETAILED DESCRIPTION The following further describes the present disclosure with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, a data resource valuation method for a network platform in the present disclosure includes the following steps: step 1: constructing a data resource valuation index system under a network platform-based trading environment; step 2: determining an evaluation index weight according to an entropy correction G1 method; step 3: prescreening, according to a GCA method, traded data resources on a network platform, and obtaining traded data resources whose degrees of correlation with pre-evaluated data resources are greater than or equal to a threshold, to constitute a model sample set T; and step 4: selecting an RFR model as a basic data resource valuation model of the network platform, and using the model sample set T to construct a data resource valuation model. Evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model, and an average value of output values of all regression trees is used as a result of data resource valuation performed based on the data resource valuation model.
The step 1 specifically includes: based on influence factors of data resource value, selecting seven factors as evaluation indexes from perspectives of resources and assets by comprehensively considering a use frequency of the influence factors and availability of selected variables, to construct the data resource valuation index system under the network platform-based trading environment.
The evaluation indexes include data activity, a data scale, data freshness, data exclusiveness, a data ownership confirmation level, a market attention degree, and a data application level.
An index that is in positive correlation with an evaluation result is referred to as an efficiency index (for example, the data scale, the market attention degree, and the data application level). An index that is in negative correlation with the evaluation result is referred to as a cost index (for example, the data freshness). The data activity, the data exclusiveness, and the data ownership confirmation level are standardized data (0/1 variable).
The step 2 specifically includes: sorting, by an expert, the evaluation indexes $x_j$ in the data resource valuation index system based on importance of the evaluation indexes; calculating a sum of information entropy of each evaluation index according to an entropy method; calculating a ratio of importance of adjacent evaluation indexes; and calculating a weight of each evaluation index.
A calculation formula of the sum of the information entropy is as follows:
$$h_j = -\sum_{i=1}^{m} f_{ij} \ln f_{ij} \tag{1}$$
In the foregoing formula, $f_{ij} = y_{ij} / \sum_{i=1}^{m} y_{ij}$, where $1 \le i \le m$ and $1 \le j \le 7$; when $f_{ij} = 0$, $f_{ij} \ln f_{ij} = 0$; $m$ represents the number of traded data resources, and $x_{ij}$ represents the $j$-th evaluation index of the $i$-th traded data resource; $h_j$ represents a sum of information entropy of the $j$-th evaluation index; and $y_{ij}$ represents data standardization, with $y_{ij} = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$.
A calculation formula of the ratio of the importance of the adjacent evaluation indexes is as follows:
$$r_j = \begin{cases} h_{j-1} / h_j, & \text{when } h_{j-1} \ge h_j \\ 1, & \text{when } h_{j-1} < h_j \end{cases} \tag{2}$$
In the foregoing formula, $r_j$ represents a ratio of importance of adjacent evaluation indexes $x_{j-1}$ and $x_j$; $h_{j-1}$ represents a sum of information entropy of the $(j-1)$-th evaluation index, and $h_j$ represents the sum of the information entropy of the $j$-th evaluation index.
A calculation formula of the evaluation index weight is as follows:
$$w_j = \left(1 + \sum_{n=2}^{j} \prod_{i=n}^{j} r_i\right)^{-1}, \quad n = 2, 3, \ldots, j-1, j \tag{3}$$
The step 3 specifically includes: performing data standardization on efficiency indexes and cost indexes of the pre-evaluated data resources and the traded data resources; calculating an absolute difference between an evaluation index of a pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; calculating a two-level minimum difference and a two-level maximum difference; calculating a correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; and calculating degrees of correlation between the pre-evaluated data resources and the traded data resources, and selecting traded data resources with degrees of correlation $\gamma_i \ge$ the threshold $r$ (a value of $r$ is 0.8 in this specification) to constitute the model sample set T.
Data standardization formulas of the efficiency index $y1_{ij}$ and the cost index $y2_{ij}$ are as follows:
$$y1_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{4}$$
$$y2_{ij} = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \tag{5}$$
The data activity, the data exclusiveness, the data ownership confirmation level, an efficiency index obtained after data standardization, or a cost index obtained after data standardization is denoted as $D$.
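The two standardization formulas differ only in the numerator, which a short Python sketch makes explicit. The function names are hypothetical, and mapping a constant column to zeros is an assumption, since the formulas are undefined when the maximum equals the minimum.

```python
def standardize_efficiency(column):
    """Formula (4): larger raw values map closer to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # assumption: undefined case mapped to 0
    return [(x - lo) / (hi - lo) for x in column]

def standardize_cost(column):
    """Formula (5): larger raw values map closer to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # assumption: undefined case mapped to 0
    return [(hi - x) / (hi - lo) for x in column]
```

Here `column` holds the raw values of one evaluation index across the pre-evaluated and traded data resources; the 0/1 indexes (data activity, data exclusiveness, data ownership confirmation level) are used as they are.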
A calculation formula of the absolute difference $Z$ between the evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$Z = |D_0(j) - D_i(j)|, \quad j = 1, 2, \ldots, 7 \tag{6}$$
Calculation formulas of the two-level minimum difference and the two-level maximum difference are as follows:
$$Z1 = \min_{1 \le i \le m} \min_{1 \le j \le 7} |D_0(j) - D_i(j)| \tag{7}$$
$$Z2 = \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)| \tag{8}$$
A calculation formula of the correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$s_{ij} = \frac{\min_i \min_j |D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|}{|D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|} \tag{9}$$
In the foregoing formula, $\rho$ represents a discrimination coefficient; its value ranges from 0 to 1 and is 0.5 in the present disclosure.
A calculation formula of the degrees of correlation between the pre-evaluated data resources and the traded data resources is as follows:
$$\gamma_i = \sum_{j=1}^{7} w_j s_{ij} \tag{10}$$
In the foregoing formula, $w_j$ represents the evaluation index weight.
The step 4 specifically includes: setting the number $K$ of regression trees; randomly extracting $K$ training sample sets $t_1, t_2, \ldots, t_K$ from the model sample set T according to a Bootstrap resampling method, where a sample set that is not extracted is referred to as out of bag (OOB) data; randomly selecting $A$ ($1 \le A \le 7$) evaluation indexes and performing training to generate the RFR model; using the OOB data as a test sample to estimate an error of the RFR model; adjusting a value of the parameter $K$, establishing multiple RFR models and calculating a generalization error of each RFR model, and selecting an RFR model with a minimum generalization error as the final data resource valuation model; and inputting the evaluation indexes of the pre-evaluated data resources into the data resource valuation model, and using the average value of the output values of the regression trees as the result of data resource valuation performed based on the final data resource valuation model.
When the error of the RFR model is estimated, the OOB data is used as the test sample, and neither cross-validation nor another independent test sample set is required.
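The Bootstrap resampling and OOB error estimate described above can be sketched in Python. The sketch does not train regression trees: `tree_predictions` is a hypothetical table of per-tree predictions standing in for trained trees, the function names are hypothetical, and the use of mean squared error as the error measure is an assumption.

```python
import random

def bootstrap_with_oob(n_samples, n_trees, seed=0):
    """Draw n_trees bootstrap sets; return (in_bag, oob) index lists per tree."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_trees):
        # Sample n_samples indices with replacement from the model sample set.
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(in_bag)
        # Samples never drawn for this tree form its out-of-bag data.
        oob = [i for i in range(n_samples) if i not in drawn]
        result.append((in_bag, oob))
    return result

def oob_mse(targets, tree_predictions, oob_sets):
    """Mean squared OOB error (assumed error measure).

    tree_predictions[k][i] is tree k's prediction for sample i; each sample
    is scored only by the trees for which it was out of bag.
    """
    squared_errors = []
    for i, y in enumerate(targets):
        preds = [tree_predictions[k][i]
                 for k, oob in enumerate(oob_sets) if i in oob]
        if preds:
            squared_errors.append((sum(preds) / len(preds) - y) ** 2)
    if not squared_errors:
        return float("nan")
    return sum(squared_errors) / len(squared_errors)
```

Repeating this estimate for several values of K and keeping the model with the smallest error mirrors the selection described in step 4.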
The evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model. The average value of the output values of the regression trees is calculated according to the following formula, and is used as the result of data resource valuation performed based on the data resource valuation model:
$$F(X) = \frac{1}{K} \sum_{k=1}^{K} f_k(X) \tag{12}$$
In the foregoing formula, $f_k$ represents an output value of each regression tree, and $K$ represents the number of regression trees.
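Formula (12) amounts to a plain average over the trees, as the following minimal Python sketch shows. The name `forest_predict` is hypothetical, and the trees are represented by arbitrary callables rather than trained regression trees.

```python
def forest_predict(trees, x):
    """Formula (12): average of the K regression tree outputs f_k(x)."""
    if not trees:
        raise ValueError("at least one regression tree is required")
    return sum(tree(x) for tree in trees) / len(trees)

# Hypothetical stand-ins for trained regression trees.
example_trees = [lambda x: 10.0, lambda x: 20.0, lambda x: float(sum(x))]
valuation = forest_predict(example_trees, [1, 2])
```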
The above merely describes preferred examples of the present disclosure, but is not intended to limit the present disclosure. Any modifications, equivalent replacements or improvements made within the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.
The content not described in detail in this specification belongs to existing technologies known to those skilled in the art.
Claims (10)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010298734.8A CN111681022A (en) | 2020-04-16 | 2020-04-16 | Network platform data resource value evaluation method |
Publications (2)
Publication Number | Publication Date |
---|---|
NL2027964A NL2027964A (en) | 2021-10-25 |
NL2027964B1 (en) | 2022-06-21 |
Family
ID=72433321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL2027964A NL2027964B1 (en) | 2020-04-16 | 2021-04-14 | Data resource valuation method for network platform |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111681022A (en) |
NL (1) | NL2027964B1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668851B (en) * | 2020-12-21 | 2021-11-02 | 浙江弄潮儿智慧科技有限公司 | Method and system for determining biodiversity protection key area |
CN112686530B (en) * | 2020-12-28 | 2022-07-26 | 贵州电网有限责任公司 | Relay protection operation reliability evaluation method |
CN113128907A (en) * | 2021-05-12 | 2021-07-16 | 北京大学 | Patent value online evaluation method and system |
CN113128911A (en) * | 2021-05-12 | 2021-07-16 | 北京大学 | Online evaluation method and device for data resource value |
CN113128621A (en) * | 2021-05-12 | 2021-07-16 | 北京大学 | Data resource value evaluation report generation method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20010035256A (en) * | 2001-01-29 | 2001-05-07 | 이귀영 | Method for appraisal of technology value by using internet web appraisal model |
CN108074115A (en) * | 2016-11-11 | 2018-05-25 | 上海文化广播影视集团有限公司 | A kind of TV programme copyright valve estimating system and its appraisal procedure |
CN108805422A (en) * | 2018-05-24 | 2018-11-13 | 国信优易数据有限公司 | A kind of data assessment model training systems, data assessment platform and method |
-
2020
- 2020-04-16 CN CN202010298734.8A patent/CN111681022A/en active Pending
-
2021
- 2021-04-14 NL NL2027964A patent/NL2027964B1/en active
Also Published As
Publication number | Publication date |
---|---|
CN111681022A (en) | 2020-09-18 |
NL2027964A (en) | 2021-10-25 |