NL2027964B1 - Data resource valuation method for network platform - Google Patents

Info

Publication number
NL2027964B1
Authority
NL
Netherlands
Prior art keywords
data
model
index
evaluation
network platform
Prior art date
Application number
NL2027964A
Other languages
Dutch (nl)
Other versions
NL2027964A (en
Inventor
Gao Xia
Li Zifeng
Zhang Jian
Ni Yuan
Gao Yudong
Cai Gongshan
Yang Lu
Original Assignee
Univ Beijing Inf Sci & Tech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Beijing Inf Sci & Tech filed Critical Univ Beijing Inf Sci & Tech
Publication of NL2027964A publication Critical patent/NL2027964A/en
Application granted granted Critical
Publication of NL2027964B1 publication Critical patent/NL2027964B1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282Rating or review of business operators or products

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The present disclosure relates to a data resource valuation method for a network platform. The method includes: step 1: constructing a data resource valuation index system under a network platform-based trading environment; step 2: determining an evaluation index weight according to an entropy correction G1 method; step 3: prescreening, according to a grey correlation analysis (GCA) method, traded data resources on a network platform, and obtaining traded data resources whose degrees of correlation with pre-evaluated data resources are greater than or equal to a threshold, to constitute a model sample set T; and step 4: selecting a random forest regression (RFR) model as a basic data resource valuation model of the network platform, and using the model sample set T to construct a data resource valuation model. Evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model, and an average value of output values of all regression trees is used as a result of data resource valuation performed based on the data resource valuation model. The present disclosure not only can remarkably improve accuracy of data resource value prediction, but also can reduce an amount of calculation of the RFR model and improve training efficiency of the RFR model.

Description

DATA RESOURCE VALUATION METHOD FOR NETWORK PLATFORM
TECHNICAL FIELD
The present disclosure relates to the field of resource valuation, and in particular, to a grey correlation analysis (GCA)-random forest regression (RFR)-based data resource valuation method for a network platform.
BACKGROUND
In the era of data explosion, data serves as records and files for future use. Moreover, multi-source and cross-domain data correlation analysis provides more complete knowledge and enables deeper intelligence, thereby greatly enhancing predictive capability. The openness and circulation of data resources as tradable commodities have increasingly become a common understanding and an objective demand. It is predicted, based on the Report on Development of Big Data in China in 2018 issued by the State Information Center, that the scale of China's big data trading market will reach 73.1 billion yuan in 2020. Under the "Internet Plus" strategy, the network platform has become an important trading channel and medium, and many data trading platforms such as Factual, BDEX, Data Plaza, and Global Big Data Exchange (GBDEx) have emerged one after another. Data resource trading is growing but is not standardized. The value of a data resource is uncertain to both the data resource provider and the data resource demander, because the provider has only limited accumulated market trading records for reference and the demander cannot gain the kind of direct experience that tangible commodities offer. This results in mismatched supply and demand, and reduces the data trading success ratio and the data value revitalization ratio. Therefore, accurate data resource valuation plays a key role in the transformation from disordered data resource trading to standardized data resource trading. With the continuous development of data trading, some data trading platforms have realized the importance of data resource valuation and carried out useful exploration. However, on existing data trading markets in China, network trading platforms still rely on subjective evaluation by experts and perform a separate evaluation for each case. This results in low reliability and low transparency of data resource evaluation, and makes it difficult to provide a valid value reference for data resource trading parties, so that ideal data trading effects are not achieved. Existing theoretical research shows that data resource valuation methods include an asset evaluation method, a multi-attribute comprehensive evaluation method, and an economics method. However, these methods are discussed from the perspective of data owners, and are not applicable to network platform-based trading. Some scholars have also put forward the research idea of performing artificial intelligence (AI)-based evaluation. At present, using a neural network to construct a data resource valuation model has become a research trend in this field. However, this method lacks sufficient research and empirical tests.
SUMMARY
To overcome the disadvantages in the prior art, the present disclosure aims to provide a GCA-RFR-based data resource valuation method for a network platform. In the method, a GCA-RFR-based data resource valuation model is constructed.
To achieve the above objective, the present disclosure adopts the following technical solutions:
A data resource valuation method for a network platform includes the following steps:
step 1: constructing a data resource valuation index system under a network platform-based trading environment;
step 2: determining an evaluation index weight according to an entropy correction G1 method;
step 3: prescreening, according to a GCA method, traded data resources on a network platform, and obtaining traded data resources whose degrees of correlation with pre-evaluated data resources are greater than or equal to a threshold, to constitute a model sample set T; and
step 4: selecting an RFR model as a basic data resource valuation model of the network platform, and using the model sample set T to construct a data resource valuation model. Evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model, and an average value of output values of all regression trees is used as a result of data resource valuation performed based on the data resource valuation model.
The step 1 specifically includes: based on influence factors of data resource value, selecting seven factors as evaluation indexes from perspectives of resources and assets by comprehensively considering a use frequency of the influence factors and availability of selected variables, to construct the data resource valuation index system under the network platform-based trading environment; and the evaluation indexes include an efficiency index, a cost index, and a standardization index, where the efficiency index includes a data scale, a market attention degree, and a data application level, the cost index includes data freshness, the standardization index includes data activity, data exclusiveness, and a data ownership confirmation level, and the data activity, the data exclusiveness, and the data ownership confirmation level are standardized data.
The step 2 specifically includes: sorting, by an expert, the evaluation indexes in the data resource valuation index system based on importance of the evaluation indexes; calculating a sum of information entropy of each evaluation index according to an entropy method; calculating a ratio of importance of adjacent evaluation indexes; and calculating a weight of each evaluation index.
A calculation formula of the sum of the information entropy is as follows:
$$h_j = -\frac{1}{\ln m} \sum_{i=1}^{m} f_{ij} \ln f_{ij} \qquad (1)$$
In the foregoing formula, $f_{ij} = y_{ij} / \sum_{i=1}^{m} y_{ij}$, where $1 \le i \le m$ and $1 \le j \le 7$; when $f_{ij} = 0$, $f_{ij} \ln f_{ij} = 0$; $m$ represents the number of traded data resources, and $x_{ij}$ represents the $j$-th evaluation index of the $i$-th traded data resource; $h_j$ represents a sum of information entropy of the $j$-th evaluation index; and $y_{ij}$ represents data standardization, and
$$y_{ij} = \frac{x_{ij} - \min_{1 \le i \le m} x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}}$$
A calculation formula of the ratio of the importance of the adjacent evaluation indexes is as follows:
$$r_j = \begin{cases} h_{j-1} / h_j, & \text{when } h_{j-1} > h_j \\ 1, & \text{when } h_{j-1} \le h_j \end{cases} \qquad (2)$$
In the foregoing formula, $r_j$ represents a ratio of importance of adjacent evaluation indexes $x_{j-1}$ and $x_j$; $h_{j-1}$ represents a sum of information entropy of the $(j-1)$-th evaluation index, and $h_j$ represents the sum of the information entropy of the $j$-th evaluation index.
A calculation formula of the evaluation index weight is as follows:
$$w_j = \left(1 + \sum_{n=2}^{j} \prod_{i=n}^{j} r_i\right)^{-1}, \qquad n = 2, 3, \ldots, j-1, j \qquad (3)$$
The step 3 specifically includes: performing data standardization on efficiency indexes and cost indexes of the pre-evaluated data resources and the traded data resources; calculating an absolute difference between an evaluation index of a pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; calculating a two-level minimum difference and a two-level maximum difference; calculating a correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; and calculating degrees of correlation between the pre-evaluated data resources and the traded data resources, and selecting traded data resources with degrees of correlation $\gamma_i \ge 0.8$ to constitute the model sample set T.
Data standardization formulas of the efficiency index $y1_{ij}$ and the cost index $y2_{ij}$ are as follows:
$$y1_{ij} = \frac{x_{ij} - \min_{1 \le i \le m} x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}} \qquad (4)$$
$$y2_{ij} = \frac{\max_{1 \le i \le m} x_{ij} - x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}} \qquad (5)$$
The standardization index, an efficiency index obtained after data standardization, or a cost index obtained after data standardization is denoted as D.
A calculation formula of the absolute difference Z between the evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$Z = |D_0(j) - D_i(j)|, \qquad j = 1, 2, \ldots, 7 \qquad (6)$$
Calculation formulas of the two-level minimum difference and the two-level maximum difference are as follows:
$$Z1 = \min_{1 \le i \le m} \min_{1 \le j \le 7} |D_0(j) - D_i(j)| \qquad (7)$$
$$Z2 = \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)| \qquad (8)$$
A calculation formula of the correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$\xi_{ij} = \frac{\min_i \min_j |D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|}{|D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|} \qquad (9)$$
In the foregoing formula, $\rho$ represents a discrimination coefficient and its value is 0.5.
A calculation formula of the degrees of correlation between the pre-evaluated data resources and the traded data resources is as follows:
$$\gamma_i = \sum_{j=1}^{7} w_j \xi_{ij} \qquad (10)$$
In the foregoing formula, $w_j$ represents the evaluation index weight.
The step 4 specifically includes: setting the number K of regression trees; randomly extracting K training sample sets from the model sample set T according to a Bootstrap resampling method, where a sample set that is not extracted is referred to as out of bag (OOB) data; randomly selecting A ($1 \le A \le 7$) evaluation indexes and performing training to generate the RFR model; using the OOB data as a test sample to estimate an error of the RFR model; adjusting a value of the parameter K, establishing multiple RFR models and calculating a generalization error of each RFR model, and selecting an RFR model with a minimum generalization error as the final data resource valuation model; and inputting the evaluation indexes of the pre-evaluated data resources into the data resource valuation model, and using the average value of the output values of the regression trees as the result of data resource valuation performed based on the final data resource valuation model.
A calculation formula of the average value of the output values of the regression trees is as follows:
$$F(X) = \frac{1}{K} \sum_{k=1}^{K} f_k(X) \qquad (12)$$
In the foregoing formula, $f_k(X)$ represents the output value of the $k$-th regression tree, and K represents the number of regression trees.
Beneficial effects of the present disclosure:
1. The present disclosure uses Octopus data collection software to crawl, on a big data trading platform, trading data of ten types of data resources on a website, and uses real quantifiable data for empirical tests. This effectively ensures validity and practicability of the evaluation model.
2. The present disclosure demonstrates a relationship between the influence factors of the data resource value and the data resource value. In addition, the selected influence factors of the data resource value are quantifiable indexes. This breaks a dilemma that the data resource valuation index is subjective and difficult to measure.
3. The present disclosure provides an intelligent data resource valuation method based on historical market trading conditions. The method is strongly objective, and is better suited to the characteristics of data resources on the network platform, for example, that the number of data resources is huge and the data resource demander is unknown.
4. According to the method in the present disclosure, only the number of regression trees needs to be set for the RFR model, which is different from an intelligent valuation method based on a parameter model such as a neural network or a support vector machine, so only a few parameters need to be adjusted. In addition, when there are a large number of regression trees, the generalization error of the RFR model converges, and no overfitting occurs. Moreover, the samples are randomly selected and the feature indexes are randomly selected. This reduces correlation between the regression trees, and provides good generalization performance.
5. Compared with a method in which only the RFR model is used, the present disclosure uses the GCA-RFR model. In the GCA-RFR model, the GCA method is first used to preprocess the traded data resources on the network platform, and screening is performed to obtain data resources whose index sequences are highly similar to index sequences of the pre-evaluated data resources. These data resources constitute the sample data sets used to train the RFR model, which makes full use of the advantage that the RFR model requires less sample data. This not only can remarkably improve accuracy of data resource value prediction, but also can reduce an amount of calculation of the RFR model and improve training efficiency of the RFR model.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure has the following drawings:
FIG. 1 is a flowchart of a method according to the present disclosure; and
FIG. 2 is a flowchart of RFR-based data resource value prediction.
DETAILED DESCRIPTION
The following further describes the present disclosure with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, a data resource valuation method for a network platform in the present disclosure includes the following steps:
step 1: constructing a data resource valuation index system under a network platform-based trading environment;
step 2: determining an evaluation index weight according to an entropy correction G1 method;
step 3: prescreening, according to a GCA method, traded data resources on a network platform, and obtaining traded data resources whose degrees of correlation with pre-evaluated data resources are greater than or equal to a threshold, to constitute a model sample set T; and
step 4: selecting an RFR model as a basic data resource valuation model of the network platform, and using the model sample set T to construct a data resource valuation model. Evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model, and an average value of output values of all regression trees is used as a result of data resource valuation performed based on the data resource valuation model.
The step 1 specifically includes: based on influence factors of data resource value, selecting seven factors as evaluation indexes from perspectives of resources and assets by comprehensively considering a use frequency of the influence factors and availability of selected variables, to construct the data resource valuation index system under the network platform-based trading environment.
The evaluation indexes include data activity, a data scale, data freshness, data exclusiveness, a data ownership confirmation level, a market attention degree, and a data application level.
An index that is in positive correlation with an evaluation result is referred to as an efficiency index (for example, the data scale, the market attention degree, and the data application level). An index that is in negative correlation with the evaluation result is referred to as a cost index (for example, the data freshness). The data activity, the data exclusiveness, and the data ownership confirmation level are standardized data (0/1 variables).
The step 2 specifically includes: sorting, by an expert, the evaluation indexes $x_j$ in the data resource valuation index system based on importance of the evaluation indexes; calculating a sum of information entropy of each evaluation index according to an entropy method; calculating a ratio of importance of adjacent evaluation indexes; and calculating a weight of each evaluation index.
A calculation formula of the sum of the information entropy is as follows:
$$h_j = -\frac{1}{\ln m} \sum_{i=1}^{m} f_{ij} \ln f_{ij} \qquad (1)$$
In the foregoing formula, $f_{ij} = y_{ij} / \sum_{i=1}^{m} y_{ij}$, where $1 \le i \le m$ and $1 \le j \le 7$; when $f_{ij} = 0$, $f_{ij} \ln f_{ij} = 0$; $m$ represents the number of traded data resources, and $x_{ij}$ represents the $j$-th evaluation index of the $i$-th traded data resource; $h_j$ represents a sum of information entropy of the $j$-th evaluation index; and $y_{ij}$ represents data standardization, and
$$y_{ij} = \frac{x_{ij} - \min_{1 \le i \le m} x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}}$$
A calculation formula of the ratio of the importance of the adjacent evaluation indexes is as follows:
$$r_j = \begin{cases} h_{j-1} / h_j, & \text{when } h_{j-1} > h_j \\ 1, & \text{when } h_{j-1} \le h_j \end{cases} \qquad (2)$$
In the foregoing formula, $r_j$ represents a ratio of importance of adjacent evaluation indexes $x_{j-1}$ and $x_j$; $h_{j-1}$ represents a sum of information entropy of the $(j-1)$-th evaluation index, and $h_j$ represents the sum of the information entropy of the $j$-th evaluation index.
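Step 2 as a whole can be sketched in a few lines of Python. The sketch below computes the entropy of formula (1), the adjacent-importance ratio of formula (2), and then the weights using formula (3), which is stated in the next paragraph. Several points are assumptions rather than part of the disclosure: the function names are hypothetical, the entropy list is assumed to be already in the expert's importance order, the remaining weights are filled in with the standard G1 recursion $w_{n-1} = r_n w_n$, and the fallbacks for a constant column and for a zero entropy are illustrative choices.

```python
import math

def standardize(column):
    """Min-max scale one index column to [0, 1] (the y_ij of the text)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # assumption: a constant column is mapped to all zeros
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def entropy(column):
    """Formula (1): h_j = -(1 / ln m) * sum_i f_ij ln f_ij, with 0 ln 0 = 0."""
    y = standardize(column)
    total, m = sum(y), len(y)
    if total == 0 or m < 2:
        return 0.0
    h = 0.0
    for value in y:
        f = value / total
        if f > 0:
            h -= f * math.log(f)
    return h / math.log(m)

def importance_ratios(h):
    """Formula (2): r_j for j = 2..n, with h given in the expert's order."""
    # Assumption: a zero entropy h_j falls back to r_j = 1 to avoid dividing by 0.
    return [h[j - 1] / h[j] if h[j - 1] > h[j] > 0 else 1.0
            for j in range(1, len(h))]

def g1_weights(r):
    """Formula (3) for the last index, then w_{n-1} = r_n * w_n (assumed)."""
    n = len(r) + 1
    acc, prod = 0.0, 1.0
    for ratio in reversed(r):  # builds prod_{i=k}^{n} r_i for k = n, ..., 2
        prod *= ratio
        acc += prod
    w = [0.0] * n
    w[-1] = 1.0 / (1.0 + acc)
    for k in range(n - 1, 0, -1):
        w[k - 1] = r[k - 1] * w[k]
    return w

# Hypothetical entropies for four indexes, already sorted by the expert.
h = [0.75, 0.5, 0.5, 0.25]
print(g1_weights(importance_ratios(h)))  # [0.375, 0.25, 0.25, 0.125]
```

Because every ratio is at least 1, the weights never increase along the expert's ordering, and the recursion makes them sum to 1.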
A calculation formula of the evaluation index weight is as follows:
$$w_j = \left(1 + \sum_{n=2}^{j} \prod_{i=n}^{j} r_i\right)^{-1}, \qquad n = 2, 3, \ldots, j-1, j \qquad (3)$$
The step 3 specifically includes: performing data standardization on efficiency indexes and cost indexes of the pre-evaluated data resources and the traded data resources; calculating an absolute difference between an evaluation index of a pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; calculating a two-level minimum difference and a two-level maximum difference; calculating a correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; and calculating degrees of correlation between the pre-evaluated data resources and the traded data resources, and selecting traded data resources with degrees of correlation $\gamma_i \ge$ the threshold r (a value of r is 0.8 in this specification) to constitute the model sample set T.
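Data standardization formulas of the efficiency index $y1_{ij}$ and the cost index $y2_{ij}$ are as follows:
$$y1_{ij} = \frac{x_{ij} - \min_{1 \le i \le m} x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}} \qquad (4)$$
$$y2_{ij} = \frac{\max_{1 \le i \le m} x_{ij} - x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}} \qquad (5)$$
The data activity, the data exclusiveness, the data ownership confirmation level, an efficiency index obtained after data standardization, or a cost index obtained after data standardization is denoted as D.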
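The standardization of formulas (4) and (5) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the dictionary keys are hypothetical names for the seven evaluation indexes, the 0/1 indexes are passed through unchanged, and a constant column is mapped to zeros as an assumed convention. It is also assumed that the pre-evaluated resource is included among the rows so that all resources share one scale.

```python
# Hypothetical index names mapped to the index type described in the text.
KINDS = {
    "data_scale": "efficiency",
    "market_attention": "efficiency",
    "application_level": "efficiency",
    "data_freshness": "cost",
    "data_activity": "binary",
    "data_exclusiveness": "binary",
    "ownership_confirmation": "binary",
}

def standardize_column(values, kind):
    """Apply formula (4) to efficiency indexes and formula (5) to cost indexes."""
    if kind == "binary":  # 0/1 variables are already standardized
        return [float(v) for v in values]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # assumption: a constant column becomes all zeros
        return [0.0] * len(values)
    if kind == "efficiency":
        return [(v - lo) / span for v in values]
    if kind == "cost":
        return [(hi - v) / span for v in values]
    raise ValueError(f"unknown index kind: {kind}")

def standardize_table(rows):
    """rows: list of dicts keyed by index name; returns the D values per row."""
    cols = {name: standardize_column([row[name] for row in rows], kind)
            for name, kind in KINDS.items()}
    return [{name: cols[name][i] for name in KINDS} for i in range(len(rows))]
```

A larger raw value gives a larger D for an efficiency index and a smaller D for a cost index, so after this step every column points in the same direction.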
A calculation formula of the absolute difference Z between the evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$Z = |D_0(j) - D_i(j)|, \qquad j = 1, 2, \ldots, 7 \qquad (6)$$
Calculation formulas of the two-level minimum difference and the two-level maximum difference are as follows:
$$Z1 = \min_{1 \le i \le m} \min_{1 \le j \le 7} |D_0(j) - D_i(j)| \qquad (7)$$
$$Z2 = \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)| \qquad (8)$$
A calculation formula of the correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:
$$\xi_{ij} = \frac{\min_i \min_j |D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|}{|D_0(j) - D_i(j)| + \rho \max_i \max_j |D_0(j) - D_i(j)|} \qquad (9)$$
In the foregoing formula, $\rho$ represents a discrimination coefficient, and its value ranges from 0 to 1 and is 0.5 in the present disclosure.
A calculation formula of the degrees of correlation between the pre-evaluated data resources and the traded data resources is as follows:
$$\gamma_i = \sum_{j=1}^{7} w_j \xi_{ij} \qquad (10)$$
In the foregoing formula, $w_j$ represents the evaluation index weight.
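The GCA prescreening of formulas (6) to (10) can be sketched as below. The function names and the toy inputs are hypothetical, and the handling of the degenerate case in which every difference is zero is an assumption; the discrimination coefficient 0.5 and the threshold 0.8 are the values given in the text.

```python
def grey_relational_degrees(d0, traded, weights, rho=0.5):
    """Return the degree of correlation gamma_i for each traded resource.

    d0      -- standardized index vector D_0 of the pre-evaluated resource
    traded  -- list of standardized index vectors D_i of traded resources
    weights -- evaluation index weights w_j (same length as d0)
    """
    # Formula (6): absolute differences per resource and per index.
    diffs = [[abs(a - b) for a, b in zip(d0, di)] for di in traded]
    z1 = min(min(row) for row in diffs)  # formula (7)
    z2 = max(max(row) for row in diffs)  # formula (8)
    degrees = []
    for row in diffs:
        if z2 == 0:  # assumption: all resources identical to D_0
            coeffs = [1.0] * len(row)
        else:        # formula (9)
            coeffs = [(z1 + rho * z2) / (z + rho * z2) for z in row]
        # Formula (10): weighted sum of the correlation coefficients.
        degrees.append(sum(w * c for w, c in zip(weights, coeffs)))
    return degrees

def select_samples(d0, traded, weights, threshold=0.8, rho=0.5):
    """Indices of traded resources whose degree reaches the threshold."""
    degrees = grey_relational_degrees(d0, traded, weights, rho)
    return [i for i, g in enumerate(degrees) if g >= threshold]

# Toy example with two indexes instead of seven.
d0 = [0.5, 0.5]
traded = [[0.5, 0.5], [1.0, 0.0], [0.5, 0.75]]
print(select_samples(d0, traded, weights=[0.5, 0.5]))  # [0]
```

In the toy example the three degrees are 1.0, about 0.333, and 0.75, so only the first traded resource enters the model sample set T.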
The step 4 specifically includes: setting the number K of regression trees; randomly extracting K training sample sets $t_1, t_2, \ldots, t_K$ from the model sample set T according to a Bootstrap resampling method, where a sample set that is not extracted is referred to as out of bag (OOB) data; randomly selecting A ($1 \le A \le 7$) evaluation indexes and performing training to generate the RFR model; using the OOB data as a test sample to estimate an error of the RFR model; adjusting a value of the parameter K, establishing multiple RFR models and calculating a generalization error of each RFR model, and selecting an RFR model with a minimum generalization error as the final data resource valuation model; and inputting the evaluation indexes of the pre-evaluated data resources into the data resource valuation model, and using the average value of the output values of the regression trees as the result of data resource valuation performed based on the final data resource valuation model.
When the error of the RFR model is estimated, the OOB data is used as the test sample, and neither cross-validation nor a separate independent test sample set is required.
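Step 4 can be illustrated with a small pure-Python forest. This is a sketch under stated assumptions, not the disclosed implementation: all function names are hypothetical, the random subset of evaluation indexes is drawn afresh at every split (the text does not say whether it is drawn per split or per tree), the tree depth and leaf size limits are illustrative, and the OOB mean squared error stands in for the generalization error used to choose K.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)

def fit_tree(X, y, n_features, rng, depth=0, max_depth=6, min_leaf=2):
    """Grow one regression tree; a leaf is a float, a node is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf or min(y) == max(y):
        return sum(y) / len(y)
    best = None
    # Assumption: a fresh random subset of indexes is tried at every split.
    for j in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return sum(y) / len(y)
    _, j, t, left, right = best
    grow = lambda rows: fit_tree([X[i] for i in rows], [y[i] for i in rows],
                                 n_features, rng, depth + 1, max_depth, min_leaf)
    return (j, t, grow(left), grow(right))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def fit_forest(X, y, n_trees, n_features, seed=0):
    """Train n_trees bootstrap trees and return (trees, OOB mean squared error)."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_sum, oob_cnt = [], [0.0] * n, [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # Bootstrap resampling
        tree = fit_tree([X[i] for i in idx], [y[i] for i in idx], n_features, rng)
        trees.append(tree)
        for i in set(range(n)) - set(idx):          # rows not drawn are OOB
            oob_sum[i] += predict_tree(tree, X[i])
            oob_cnt[i] += 1
    errs = [(oob_sum[i] / oob_cnt[i] - y[i]) ** 2 for i in range(n) if oob_cnt[i]]
    return trees, (sum(errs) / len(errs) if errs else float("inf"))

def predict_forest(trees, x):
    """Average of the outputs of all regression trees."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)

def choose_forest(X, y, candidates, n_features):
    """Try each candidate K and keep the forest with the smallest OOB error."""
    return min((fit_forest(X, y, k, n_features) for k in candidates),
               key=lambda result: result[1])
```

A call such as `choose_forest(X, y, candidates=[50, 100, 200], n_features=3)` returns the selected trees together with their OOB error, and `predict_forest(trees, x)` then gives the valuation of a pre-evaluated resource with standardized index vector `x`.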
The evaluation indexes of the pre-evaluated data resources are input into the data resource valuation model. The average value of the output values of the regression trees is calculated according to the following formula, and is used as the result of data resource valuation performed based on the data resource valuation model:
$$F(X) = \frac{1}{K} \sum_{k=1}^{K} f_k(X) \qquad (12)$$
In the foregoing formula, $f_k(X)$ represents the output value of the $k$-th regression tree, and K represents the number of regression trees.
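As a worked instance of formula (12), suppose a forest of K = 4 regression trees returns the outputs below for one pre-evaluated data resource. The four values are hypothetical and serve only to show the arithmetic.

```latex
% Hypothetical tree outputs: f_1 = 10, f_2 = 12, f_3 = 11, f_4 = 15
F(X) = \frac{1}{4}\sum_{k=1}^{4} f_k(X)
     = \frac{10 + 12 + 11 + 15}{4}
     = \frac{48}{4}
     = 12
```

The valuation result is therefore 12, in whatever price unit the traded samples in T were recorded.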
The above merely describes preferred examples of the present disclosure, but is not intended to limit the present disclosure. Any modifications, equivalent replacements or improvements made within the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.
The content not described in detail in this specification belongs to existing technologies known to those skilled in the art.

Claims (10)

ConclusiesConclusions 1. Werkwijze voor beoordeling van gegevensbronnen voor een netwerkplatform, die de volgende stappen omvat: stap 1: het construeren van een gegevensbronnenbeoordelingsindexsysteem in een op een netwerkplatform gebaseerde handelsomgeving; stap 2: het bepalen van een evaluatie-indexweging volgens een G1-entropiecorrectiewerkwijze; stap 3: het vooraf screenen, volgens een werkwijze met grijze-correlatie-analyse (GCA-werkwijze), van verhandelde gegevensbronnen op een netwerkplatform; en het verkrijgen van verhandelde gegevensbronnen waarvan de maten van correlatie met vooraf geëvalueerde gegevensbronnen groter zijn dan of gelijk zijn aan een drempelwaarde, om een modelmonsterreeks T te vormen; en stap 4: het selecteren van een model met willekeurig-bos- regressie (RFR-model) als basis-gegevensbronnenbeoordelingsmodel van het netwerkplatform, het gebruiken van de modelmonsterreeks T voor het construeren van een gegevensbronnenbeoordelingsmodel, het invoeren van evaluatie-indices van de vooraf geëvalueerde gegevensbronnen in het gegevensbronnenbeoordelingsmodel, en het gebruiken van een gemiddelde waarde van uitgangswaarden van alle regressiebomen als resultaat van gegevensbronnenbeoordeling die uitgevoerd is op basis van het gegevensbronnenbeoordelingsmodel.A data source rating method for a network platform comprising the steps of: step 1: constructing a data source rating index system in a network platform based trading environment; step 2: determining an evaluation index weighting according to a G1 entropy correction method; step 3: pre-screening, according to a gray correlation analysis (GCA method), traded data sources on a network platform; and obtaining traded data sources whose degrees of correlation with pre-evaluated data sources are greater than or equal to a threshold value to form a model sample set T; and step 4: selecting a random forest regression (RFR) model as the base data source assessment model of the network platform, using the 
model sample set T to construct a data source rating model, entering evaluation indices of the predefined evaluated data sources in the data source assessment model, and using an average value of baseline values of all regression trees as a result of data source assessment performed based on the data source assessment model. 2. De werkwijze voor beoordeling van gegevensbronnen voor een netwerkplatform volgens conclusie 1, waarbij stap 1 specifiek omvat: het selecteren, op basis van invloedfactoren van gegevensbronwaarde, van zeven factoren als evaluatie-indices vanuit het perspectief van middelen en activa door het omvattend in acht nemen van een gebruiksfrequentie van de invloedfactoren en beschikbaarheid van geselecteerde variabelen, voor het construeren van het gegevensbronnenbeoordelingsindexsysteem in de op een netwerkplatform gebaseerde handelsomgeving; en waarbij de evaluatie-indices een efficiéntie-index, een kostenindex, en een standaardiseringsindex omvatten, waarbij de efficiëntie-index een gegevensschaal, een mate van marktaandacht, en een gegevenstoepassingsniveau omvat, de kostenindex versheid van gegevens omvat, de standaardiseringsindex gegevensactiviteit, gegevensexclusiviteit, en een gegevenseigendombevestigingsniveau omvat, en waarbij de gegevensactiviteit, de gegevensexclusiviteit, en het gegevenseigendombevestigingsniveau gestandaardiseerde gegevens zijn.The data source assessment method for a network platform according to claim 1, wherein step 1 specifically comprises: selecting, based on data source value influencing factors, seven factors as evaluation indices from the perspective of resources and assets by including taking a frequency of use of the influencing factors and availability of selected variables, to construct the data source rating index system in the network platform based trading environment; and wherein the evaluation indices include an efficiency index, a cost index, and a standardization index, wherein the efficiency index 
includes a data scale, a degree of market attention, and a data application level, the cost index includes data freshness, the standardization index includes data activity, data exclusivity, and a data ownership assertion level, and wherein the data activity, data exclusivity, and data ownership assertion level are standardized data. 3. De werkwijze voor beoordeling van gegevensbronnen voor een netwerkplatform volgens conclusie 2, waarbij stap 2 specifiek omvat: het sorteren, door een expert, van de evaluatie-indices in het gegevensbronnenbeoordelingsindexsysteem op basis van belangrijkheid van de evaluatie-indices; het berekenen van een som van informatie- entropie van iedere evaluatie-index volgens een entropiewerkwijze; het berekenen van een verhouding van belangrijkheid van aangrenzende evaluatie-indices; en het berekenen van een gewicht van iedere evaluatie- index.The data resource rating method for a network platform according to claim 2, wherein step 2 specifically comprises: sorting, by an expert, the rating indices in the data resource rating index system based on importance of the rating indices; calculating a sum of information entropy of each evaluation index according to an entropy method; calculating a ratio of importance of adjacent evaluation indices; and calculating a weight of each evaluation index. 4. 
De werkwijze voor beoordeling van gegevensbronnen voor een netwerkplatform volgens conclusie 3, waarbij een formule voor het berekenen van de som van de informatie-entropie als volgt is: hy = — == 3, fj Inf; (1) waarbij fj; = Se waarbij 1Sism, en 1<j<7; wanneer fj; = 0, fy; In f;; = 0; m staat voor het aantal verhandelde gegevensbronnen, en x;; staat voor de j% evaluatie-index van de i% verhandelde gegevensbron; hy staat voor een som van informatie-entropie van de j% evaluatie-index; en oo Xjj— Min Xj; yi; staat voor standaardisering van gegevens, en y; = a je in en 1s1sm 1sism een formule voor het berekenen van de verhouding van de belangrijkheid van de aangrenzende evaluatie-indices als volgt is: Bim wanneer h;_; > h; =|n (2) 1 wanneer h; ; < hj waarbij r; staat voor een verhouding van belangrijkheid van aangrenzende evaluatie-indices x;_, en x;, h;_, staat voor een som van informatie-entropie van de (j-1)% evaluatie-index, en h; staat voor de som van de informatie-entropie van de j% evaluatie-index.The data resource assessment method for a network platform according to claim 3, wherein a formula for calculating the sum of the information entropy is as follows: hy = — == 3, fj Inf; (1) where fj; = Se where 1Sism, and 1<j<7; when fj; = 0, fy; in f;; = 0; m represents the number of data sources traded, and x;; represents the j% evaluation index of the i% traded data source; hy represents a sum of information entropy of the j% evaluation index; and oo Xjj— Min Xj; yi; stands for standardization of data, and y; = a je in and 1s1sm 1sism a formula for calculating the importance ratio of the adjacent evaluation indices is as follows: Bim when h;_; > h; =|n (2) 1 when h; † < hj where r; represents a ratio of importance of adjacent evaluation indices x;_, and x;, h;_, represents a sum of information entropy of the (j-1)% evaluation index, and h; represents the sum of the information entropy of the j% evaluation index. 5. 
The data resource valuation method for a network platform according to claim 4, wherein a formula for calculating the evaluation index weight is as follows:

$$w_j = \left(1 + \sum_{n=2}^{j} \prod_{i=n}^{j} r_i\right)^{-1}, \quad n = 2, 3, \ldots, j-1, j \qquad (3)$$

6. The data resource valuation method for a network platform according to claim 5, wherein step 3 specifically comprises: performing data standardization on the efficiency indices and cost indices of the pre-evaluated data resources and the traded data resources; calculating an absolute difference between an evaluation index of a pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; calculating a minimum two-level difference and a maximum two-level difference; calculating a correlation coefficient of each evaluation index of the pre-evaluated data
resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$; and calculating measures of correlation between the pre-evaluated data resources and the traded data resources, and selecting traded data resources with measures of correlation $\gamma_i \ge 0.8$ to form the model sample set $T$.

7. The data resource valuation method for a network platform according to claim 6, wherein formulas for data standardization of the efficiency index $y1_{ij}$ and the cost index $y2_{ij}$ are as follows:

$$y1_{ij} = \frac{x_{ij} - \min_{1 \le i \le m} x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}} \qquad (4)$$

$$y2_{ij} = \frac{\max_{1 \le i \le m} x_{ij} - x_{ij}}{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}} \qquad (5)$$

wherein the standardization index, an efficiency index obtained after data standardization, or a cost index obtained after standardization of
data is denoted as $D$, and a formula for calculating the absolute difference $Z$ between the evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:

$$Z = |D_0(j) - D_i(j)|, \quad j = 1, 2, \ldots, 7 \qquad (6)$$

wherein formulas for calculating the minimum two-level difference and the maximum two-level difference are as follows:

$$Z1 = \min_{1 \le i \le m} \min_{1 \le j \le 7} |D_0(j) - D_i(j)| \qquad (7)$$

$$Z2 = \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)| \qquad (8)$$

and wherein a formula for calculating the correlation coefficient of each evaluation index of the pre-evaluated data resource $Z_0$ and the corresponding evaluation index of each traded data resource $Z_i$ is as follows:

$$\xi_i(j) = \frac{\min_{1 \le i \le m} \min_{1 \le j \le 7} |D_0(j) - D_i(j)| + \rho \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)|}{|D_0(j) - D_i(j)| + \rho \max_{1 \le i \le m} \max_{1 \le j \le 7} |D_0(j) - D_i(j)|} \qquad (9)$$

where $\rho$ represents a distinguishing coefficient whose value is 0.5.

8. The data resource valuation method for a network platform according to claim 7, wherein a formula for calculating the measures of correlation between the pre-evaluated data resources and the traded data resources is as follows:

$$\gamma_i = \sum_{j=1}^{7} w_j \, \xi_i(j) \qquad (10)$$

where $w_j$ represents the evaluation index weight.

9.
The data resource valuation method for a network platform according to claim 1, wherein step 4 specifically comprises: setting the number of regression trees $K$; randomly extracting $K$ training sample sets from the model sample set $T$ according to a bootstrap sampling method, wherein the samples that are not extracted are referred to as out-of-bag (OOB) data; randomly selecting $A$ ($1 \le A \le 7$) evaluation indices and performing training to generate the RFR model; using the OOB data as a test sample to estimate an error of the RFR model; adjusting the value of the parameter $K$, building a plurality of RFR models and calculating a generalization error of each RFR model, and selecting the RFR model with the smallest generalization error as the final data
resource valuation model; and inputting the evaluation indices of the pre-evaluated data resources into the data resource valuation model, and using the average of the output values of the regression trees as the data resource valuation result produced by the final data resource valuation model.

10. The data resource valuation method for a network platform according to claim 9, wherein a formula for calculating the average of the output values of the regression trees is as follows:

$$F(x) = \frac{1}{K} \sum_{k=1}^{K} f_k(x) \qquad (12)$$

where $f_k$ represents the output value of each regression tree, and $K$ represents the number of regression trees.
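The weighting procedure of claims 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The function names are hypothetical, the handling of constant columns and zero entropies is an assumption, and the weights are computed with the usual order-relation recursion (a closed form for the lowest-ranked index, then $w_{k-1} = r_k w_k$), which the claims do not spell out.

```python
import math


def min_max(column):
    """Min-max standardization of one evaluation index over all traded resources."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def entropy(column):
    """Equation (1): h_j = -(1 / ln m) * sum_i f_ij * ln f_ij."""
    y = min_max(column)
    total, m = sum(y), len(y)
    if total == 0 or m < 2:
        return 0.0  # assumption: degenerate column yields zero entropy
    h = 0.0
    for value in y:
        f = value / total
        if f > 0:  # by the stated convention, f * ln f = 0 when f = 0
            h -= f * math.log(f)
    return h / math.log(m)


def importance_ratios(entropies):
    """Equation (2): r_j = h_{j-1} / h_j if h_{j-1} >= h_j, else 1 (j = 2..n)."""
    ratios = []
    for prev, cur in zip(entropies, entropies[1:]):
        # Assumption: a zero entropy would divide by zero, so fall back to 1.
        ratios.append(prev / cur if prev >= cur and cur > 0 else 1.0)
    return ratios


def weights_from_ratios(ratios):
    """Order-relation weights: closed form for w_n, then w_{k-1} = r_k * w_k."""
    total, product = 1.0, 1.0
    for r in reversed(ratios):
        product *= r          # running product r_k * ... * r_n
        total += product      # accumulates 1 + sum of those products
    weights = [1.0 / total]   # weight of the lowest-ranked index
    for r in reversed(ratios):
        weights.append(r * weights[-1])
    return weights[::-1]      # ordered from most to least important
```

With ratios `[1.2, 1.0, 1.5]` for four expert-ranked indices, the weights sum to 1 and decrease with rank, which is the property the ranking step relies on.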
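The sample selection of claims 6 to 8 follows the pattern of grey relational analysis, and a short Python sketch may make the steps concrete. The function names, the input layout (one list of standardized index values per resource), and the guard for the case where all differences are zero are assumptions made for illustration.

```python
def grey_relational_grades(target, traded, weights, rho=0.5):
    """Weighted correlation measure of each traded resource against the target.

    target:  standardized indices D_0(j) of the pre-evaluated resource
    traded:  list of rows, each holding the standardized indices D_i(j)
    weights: evaluation index weights w_j
    rho:     distinguishing coefficient (0.5 in the claims)
    """
    # Equation (6): absolute differences |D_0(j) - D_i(j)|.
    diffs = [[abs(t - d) for t, d in zip(target, row)] for row in traded]
    # Equations (7) and (8): two-level minimum and maximum differences.
    z1 = min(min(row) for row in diffs)
    z2 = max(max(row) for row in diffs)
    if z2 == 0:
        # Assumption: every traded resource equals the target, so each
        # coefficient is taken as 1 instead of dividing by zero.
        return [sum(weights) for _ in traded]
    grades = []
    for row in diffs:
        # Equation (9): correlation coefficient per evaluation index.
        coeffs = [(z1 + rho * z2) / (d + rho * z2) for d in row]
        # Equation (10): weighted sum over the evaluation indices.
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades


def select_samples(grades, threshold=0.8):
    """Indices of traded resources whose correlation measure meets the threshold."""
    return [i for i, g in enumerate(grades) if g >= threshold]
```

For a target `[0.5, 0.5]` and traded rows `[[0.5, 0.5], [1.0, 0.0]]` with equal weights, the first row scores 1 and the second scores 1/3, so only the first row enters the model sample set.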
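The bootstrap, out-of-bag, and averaging steps of claims 9 and 10 can be illustrated with a small pure-Python sketch. Several simplifications are assumed here and are not taken from the claims: each "tree" is a one-split regression stump standing in for a full regression tree, the OOB mean squared error is used as the generalization error, and all function names are hypothetical.

```python
import random


def fit_stump(X, y, feature_ids):
    """Fit a one-split regression tree using only the given evaluation indices."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:
        mean_all = sum(y) / len(y)  # no valid split: predict the sample mean
        return lambda row: mean_all
    _, f, threshold, lm, rm = best
    return lambda row: lm if row[f] <= threshold else rm


def fit_forest(X, y, n_trees, n_features, seed=0):
    """Train K stumps on bootstrap samples; record each tree's out-of-bag rows."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_sets = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)    # A random indices
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
        oob_sets.append(set(range(n)) - set(idx))           # rows never drawn
    return trees, oob_sets


def predict(trees, row):
    """Equation (12): average of the outputs of the K regression trees."""
    return sum(tree(row) for tree in trees) / len(trees)


def oob_mse(trees, oob_sets, X, y):
    """Mean squared error using, for each row, only trees that did not see it."""
    errors = []
    for i, (row, target) in enumerate(zip(X, y)):
        votes = [tree(row) for tree, oob in zip(trees, oob_sets) if i in oob]
        if votes:
            errors.append((sum(votes) / len(votes) - target) ** 2)
    return sum(errors) / len(errors) if errors else float("inf")


def select_forest(X, y, candidate_k, n_features, seed=0):
    """Try several values of K and keep the forest with the smallest OOB error."""
    best = None
    for k in candidate_k:
        trees, oob_sets = fit_forest(X, y, k, n_features, seed)
        err = oob_mse(trees, oob_sets, X, y)
        if best is None or err < best[0]:
            best = (err, k, trees)
    return best[1], best[2]
```

In practice a library implementation of random forest regression would replace the stumps; the sketch only shows how the bootstrap sets, the OOB test rows, and the averaged output fit together.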
NL2027964A 2020-04-16 2021-04-14 Data resource valuation method for network platform NL2027964B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010298734.8A CN111681022A (en) 2020-04-16 2020-04-16 Network platform data resource value evaluation method

Publications (2)

Publication Number Publication Date
NL2027964A NL2027964A (en) 2021-10-25
NL2027964B1 NL2027964B1 (en) 2022-06-21

Family

ID=72433321

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2027964A NL2027964B1 (en) 2020-04-16 2021-04-14 Data resource valuation method for network platform

Country Status (2)

Country Link
CN (1) CN111681022A (en)
NL (1) NL2027964B1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668851B (en) * 2020-12-21 2021-11-02 浙江弄潮儿智慧科技有限公司 Method and system for determining biodiversity protection key area
CN112686530B (en) * 2020-12-28 2022-07-26 贵州电网有限责任公司 Relay protection operation reliability evaluation method
CN113128907A (en) * 2021-05-12 2021-07-16 北京大学 Patent value online evaluation method and system
CN113128911A (en) * 2021-05-12 2021-07-16 北京大学 Online evaluation method and device for data resource value
CN113128621A (en) * 2021-05-12 2021-07-16 北京大学 Data resource value evaluation report generation method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010035256A (en) * 2001-01-29 2001-05-07 이귀영 Method for appraisal of technology value by using internet web appraisal model
CN108074115A (en) * 2016-11-11 2018-05-25 上海文化广播影视集团有限公司 A kind of TV programme copyright valve estimating system and its appraisal procedure
CN108805422A (en) * 2018-05-24 2018-11-13 国信优易数据有限公司 A kind of data assessment model training systems, data assessment platform and method

Also Published As

Publication number Publication date
CN111681022A (en) 2020-09-18
NL2027964A (en) 2021-10-25

Similar Documents

Publication Publication Date Title
NL2027964B1 (en) Data resource valuation method for network platform
Cao et al. Exploration of stock index change prediction model based on the combination of principal component analysis and artificial neural network
CN111785329B (en) Single-cell RNA sequencing clustering method based on countermeasure automatic encoder
Liu et al. Investment decision making along the B&R using critic approach in probabilistic hesitant fuzzy environment
Askari et al. An integrated method for ranking of risk in BOT projects
Gross et al. Systemic test and evaluation of a hard+ soft information fusion framework: Challenges and current approaches
Endres et al. Synthetic data generation: A comparative study
Wang et al. Clustering multiple time series with structural breaks
Tanamal et al. House price prediction model using random forest in surabaya city
Waibel et al. Clustering and ranking based methods for selecting tuned search heuristic parameters
CN113742495B (en) Rating feature weight determining method and device based on prediction model and electronic equipment
CN114529063A (en) Financial field data prediction method, device and medium based on machine learning
Omran et al. Intelligent decision support system for the Egyptian food security
Luo et al. Towards business interestingness in actionable knowledge discovery
Al Habesyah et al. Sentiment Analysis of TikTok Shop Closure in Indonesia on Twitter Using Supervised Machine Learning
Zong-you et al. The application of cloud matter—Element in information security risk assessment
Song et al. Design of Improved Algorithm and Model for Multi-constrained Fuzzy Predictive Analysis.
Sun et al. Learning local instance correlations for multi-target regression
Huang et al. Prediction of Heart Disease based on Enhanced Random Forest
Kars Predicting Neighborhood Prices: Machine Learning and Hedonic Pricing In The Dutch Housing Market
Shen et al. Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation
CN115310720A (en) Method, device and equipment for predicting use intention of old people on intelligent product
Rola et al. ARIMA Prognostic Application to Bull Services for Resource Usage Optimization
Bullah et al. A Learnheuristic Approach to A Constrained Multi-Objective Portfolio Optimisation Problem
Gu Risk prediction of enterprise credit financing using machine learning