CN115983877A - Patent value evaluation method based on depth map and semantic learning

Publication number: CN115983877A
Authority: CN (China)
Prior art keywords: value, index, indexes, sea, evaluation
Legal status: Pending (as listed by Google Patents; not a legal conclusion)
Application number: CN202310027211.3A
Other languages: Chinese (zh)
Inventors: 孙玉涛 (Sun Yutao), 刘嘉莹 (Liu Jiaying), 杨祥君 (Yang Xiangjun)
Current Assignee: Dalian University of Technology
Original Assignee: Dalian University of Technology
Application filed by: Dalian University of Technology
Priority to: CN202310027211.3A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of patent evaluation and provides a patent value evaluation method based on deep graph learning and semantic learning. First, index screening ties the construction of the patent value evaluation index system to patent assignment, which gives feature selection an objective, fair, and readily operable basis. Second, the novelty of each patent is calculated through text semantic learning, so that patent value is also measured from the semantic perspective. Third, deep graph learning maximizes the mutual information between local and global representations to integrate information into the node feature representations, and the patent value is evaluated with the XGBoost algorithm. The method addresses the shortcomings of traditional approaches to patent value evaluation and introduces the novelty of the patent text as a measure of value. Experimental results indicate that the method has higher accuracy and reliability. The invention provides a new method for evaluating patent value and a new approach for research on patent value.

Description

Patent value evaluation method based on depth map and semantic learning
Technical Field
The invention belongs to the technical field of patent evaluation, and particularly relates to a patent value evaluation method based on a depth map and semantic learning.
Background
"High-value patent" is a term that draws close attention in industry. Cultivating high-value patents has become a consensus of the era for innovation-driven, high-quality development, and the national intellectual property authority treats the cultivation of high-value patents and the improvement of patent quality as one of its key tasks. How to evaluate patent value and identify high-value patents is therefore an urgent problem. As intellectual property strategies have been advanced and implemented, the number of patents in China has grown greatly, and conventional patent value evaluation methods can no longer cope with the large number of patents to be evaluated. Constructing a patent value evaluation model suited to a big data setting, one that quickly and effectively identifies high-value patents among a large number of patents, has become a key problem for improving the quality of innovation-driven development.
Current research on patent value either explores the factors influencing patent value through a single index, such as "Hall B. Market value and patent citations [J]. The RAND Journal of Economics, 2005, 36(1): 16-38.", "Lerner J. The importance of patent scope: an empirical analysis [J]. The RAND Journal of Economics, 1994, 25: 319-333.", "Harhoff D, Scherer F M, Vopel K. Citations, family size, opposition and the value of patent rights [J]. Research Policy, 2003, 32(8)." and "Lanjouw J O, Schankerman M. Patent quality and research productivity: measuring innovation with multiple indicators [J]. Economic Journal, 2004, 114(495): 441-465.", or evaluates patent value through multiple indexes, such as "Wan Xiaoli, Zhu Xuezhong. Evaluation index system of patent value and fuzzy comprehensive evaluation [J]. Science Research Management, 2008(02): 185-191.", "Song Hefa, Mu Rongping. Research on patent quality, its measurement methods and measurement index system [J]. Science of Science and Management of S.&T., 2010, 31(04): 21-27." and "Guo Lei, Cai Hong. Situation analysis of industrial core patents under the patent strategy situation, 2016, 34(11): 1663-1671+1757.".
For example, Hall et al. first proposed using the citation frequency of a patent to reflect its value, and Lerner found that the technical scope covered by a patent has a significant influence on its value; however, these single-index methods struggle to reflect the economic value of a patent objectively. Many existing studies instead evaluate patent value with sets of patent indexes, such as citations received and patent litigation. Wan Xiaoli et al. built an index system of 17 indexes, including degree of innovation and technical content, using the analytic hierarchy process and fuzzy comprehensive evaluation, offering a new way to evaluate patent value by combining qualitative and quantitative analysis. Guo Lei et al. found a significant positive relationship between patent value and the scope of rights, the technical scope, and self-citation behavior. In all of these studies, however, the indexes are characteristic information of the patents, the indexes and index weights differ from model to model, and the academic community has not reached agreement on index selection. In addition, the text of a patent is an important reflection of its novelty, yet prior research has not considered semantic novelty. A patent value evaluation method is therefore needed that effectively fuses multiple indexes and also measures patent value from the semantic perspective.
Disclosure of Invention
The present invention addresses the deficiencies of the prior art by providing a patent value evaluation method that combines the characteristics of the patent. The patent features are screened first, and the semantic novelty of each patent is then evaluated through deep semantic learning. To fuse the external indexes and the semantic information effectively, node representations are learned by maximizing mutual information, which preserves both the local information of the nodes and the global information of the network. Finally, the value of the patent is estimated with the XGBoost algorithm. The invention is the first to use semantic learning and deep graph learning in a big-data-oriented method for evaluating the economic value of patents.
The technical scheme of the invention is as follows: a comprehensive value evaluation model that effectively integrates multiple indexes and semantic novelty is built from an existing patent data set, and the model is then applied to the patent data set to be evaluated in order to predict patent value. The method comprises the following steps:
step 1, acquiring the attribute features of the patents and the citation relationships between patents, and constructing a patent citation network;
step 2, taking assigned (transferred) patents as the standard for patents of high economic value, determining the candidate indexes for patent value evaluation and the criterion layer to which each index belongs;
the criterion layers of the candidate indexes for patent value evaluation comprise: technical indexes, citation indexes, IPC indexes, internationalization indexes, time indexes, rights indexes, and patentee indexes; the candidate indexes are listed in Table 1;
Table 1. Criterion layers and candidate index system
[Table 1 appears only as an image (BDA0004045100600000031) in the original publication.]
step 3, screening the candidate indexes for patent value evaluation with the Kolmogorov-Smirnov (K-S) test and constructing the index system for patent value evaluation;
step 3.1, standardizing the candidate index data;
the sample data of each candidate index are standardized by max-min normalization, which eliminates the influence of differing dimensions (units and scales);
step 3.2, calculating the D value of each single index;
for each candidate index, calculating the maximum difference between the cumulative frequencies of assigned patents and of unassigned patents in the existing patent data set, which gives the K-S test statistic D value of that candidate index;
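Steps 3.1 and 3.2 can be sketched in a few lines. The following is a minimal illustration, not the patent's implementation: the function names `min_max_normalize` and `ks_d_value` and the sample data are hypothetical, and the cumulative frequencies are accumulated over distinct index values in descending order, as the detailed description does.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; a constant column maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ks_d_value(index_values, assigned_flags):
    """K-S statistic D: largest gap between the cumulative frequencies of
    assigned and unassigned patents, walking index values from high to low."""
    n_assigned = sum(1 for f in assigned_flags if f)
    n_unassigned = len(assigned_flags) - n_assigned
    if n_assigned == 0 or n_unassigned == 0:
        raise ValueError("both assigned and unassigned patents are required")

    cum_assigned = cum_unassigned = 0
    d_max = 0.0
    for value in sorted(set(index_values), reverse=True):
        # Add the whole patent group that shares this index value.
        for v, flag in zip(index_values, assigned_flags):
            if v == value:
                if flag:
                    cum_assigned += 1
                else:
                    cum_unassigned += 1
        gap = abs(cum_assigned / n_assigned - cum_unassigned / n_unassigned)
        d_max = max(d_max, gap)
    return d_max


# Hypothetical data: page counts of six patents, three of them assigned.
pages = [30, 30, 20, 20, 10, 10]
assigned = [True, True, True, False, False, False]
d = ks_d_value(min_max_normalize(pages), assigned)
```

A D value near 1 means the index separates assigned from unassigned patents almost perfectly; a value near 0 means the two groups are distributed alike on that index.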
step 3.3, calculating the correlation coefficients between indexes in the same criterion layer;
calculating the correlation coefficient between every two indexes in the same criterion layer to identify index pairs that carry duplicated information; for each pair whose correlation coefficient exceeds 0.7, deleting the index with the smaller D value, which completes the first screening of the candidate indexes; the remaining K candidate indexes form an index system;
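The first screening can be sketched as follows. This is an illustration under stated assumptions: the names `pearson` and `first_screening` are hypothetical, the Pearson coefficient is used as in the detailed description, and pairs are visited greedily from the highest D value downward, an ordering the source does not specify.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant index is treated as uncorrelated
    return cov / (sx * sy)


def first_screening(columns, d_values, layers, threshold=0.7):
    """Drop the lower-D index of every same-layer pair with r > threshold.

    columns  -- index name -> list of normalized values (one per patent)
    d_values -- index name -> single-index K-S D value
    layers   -- index name -> criterion layer name
    """
    names = sorted(columns, key=lambda n: d_values[n], reverse=True)
    kept = set(names)
    for i, strong in enumerate(names):
        if strong not in kept:
            continue
        for weak in names[i + 1:]:
            if weak not in kept or layers[strong] != layers[weak]:
                continue
            if pearson(columns[strong], columns[weak]) > threshold:
                kept.discard(weak)  # 'weak' has the smaller (or equal) D value
    return [n for n in names if n in kept]


# Hypothetical indexes 'a', 'b', 'c' in one criterion layer.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 1, 3, 2]}
d_values = {"a": 0.3, "b": 0.5, "c": 0.2}
same_layer = {"a": "citation", "b": "citation", "c": "citation"}
remaining = first_screening(columns, d_values, same_layer)
```

Here 'a' and 'b' are almost perfectly correlated, so 'a' (the smaller D value) is dropped, while 'c' survives because it is not positively correlated with 'b'.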
step 3.4, calculating the economic value score of the patent;
weighting the remaining candidate indexes according to their K-S test statistic D values, so that an index with a larger D value receives a larger weight, and calculating the economic value score of the patent by linear weighting; the weight of each candidate index is calculated by formula (1):
$$w_j = \frac{D_j}{\sum_{k=1}^{K} D_k} \tag{1}$$
and the patent economic value score is calculated by formula (2):
$$Z = \sum_{j=1}^{K} w_j x_j \tag{2}$$
where $w_j$ is the weight of the $j$-th candidate index; $D_j$ is the K-S test statistic D value of the $j$-th index; $k = 1, 2, \ldots, K$ runs over the candidate indexes to be weighted; $K$ is the number of candidate indexes remaining after the first screening; $Z$ is the economic value score of the patent; and $x_j$ is the normalized value of the $j$-th candidate index of the patent to be evaluated;
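As a small worked illustration of formulas (1) and (2), the sketch below normalizes D values into weights and forms the weighted score. The function names and the numbers are hypothetical.

```python
def index_weights(d_values):
    """Formula (1): each weight is the index's D value divided by the sum of D values."""
    total = sum(d_values)
    return [d / total for d in d_values]


def value_score(weights, normalized_values):
    """Formula (2): linear weighted sum of the normalized index values."""
    return sum(w * x for w, x in zip(weights, normalized_values))


# Hypothetical D values for three remaining indexes and one patent's
# normalized index values.
weights = index_weights([0.2, 0.3, 0.5])
score = value_score(weights, [1.0, 0.0, 0.5])
```

Because the weights sum to 1 and every normalized index value lies in [0, 1], the score Z also lies in [0, 1].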
step 3.5, calculating the K-S test statistic D value of the index system;
the D value of the index system is the K-S test statistic D value of the patent economic value scores produced by that index system, calculated in the same way as the D value of a single candidate index;
step 3.6, after calculating the D value of the index system formed by the K candidate indexes remaining after the first screening, deleting each candidate index in turn, calculating the D value of each resulting combination of K-1 candidate indexes, and taking the maximum of these D values; comparing the D value before and after deletion, and if the maximum D value of the remaining combinations is larger than the D value before deletion, deleting the corresponding candidate index;
step 3.7, repeating step 3.6 until, whichever candidate index is deleted, the D value of the remaining combination is smaller than the D value before deletion; deletion then stops and the second screening of the candidate indexes is complete; the candidate indexes that remain form the optimal candidate index combination for patent value evaluation;
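Steps 3.5 to 3.7 amount to a backward elimination driven by the system-level D value. The sketch below is one possible reading, with hypothetical names (`ks_d`, `system_d`, `second_screening`) and hypothetical data; it stops on ties as well as on decreases, which the source leaves unspecified.

```python
def ks_d(scores, assigned):
    """Largest gap between cumulative frequencies of assigned and unassigned patents."""
    n_a = sum(assigned)
    n_u = len(assigned) - n_a
    cum_a = cum_u = 0
    d_max = 0.0
    for value in sorted(set(scores), reverse=True):
        for s, flag in zip(scores, assigned):
            if s == value:
                cum_a += flag
                cum_u += not flag
        d_max = max(d_max, abs(cum_a / n_a - cum_u / n_u))
    return d_max


def system_d(columns, d_values, assigned, names):
    """D value of the weighted score built from the indexes in `names`."""
    total = sum(d_values[n] for n in names)
    scores = [
        sum(d_values[n] / total * columns[n][i] for n in names)
        for i in range(len(assigned))
    ]
    return ks_d(scores, assigned)


def second_screening(columns, d_values, assigned):
    """Delete one index at a time while doing so raises the system D value."""
    names = list(columns)
    current = system_d(columns, d_values, assigned, names)
    while len(names) > 1:
        trials = [
            (system_d(columns, d_values, assigned, [m for m in names if m != n]), n)
            for n in names
        ]
        best_d, drop = max(trials, key=lambda t: t[0])
        if best_d <= current:  # no deletion improves D: stop
            break
        names.remove(drop)
        current = best_d
    return names, current


# Hypothetical data: 'good' separates the groups, 'bad' mostly blurs them.
assigned = [True, True, True, False, False, False]
columns = {
    "good": [1.0, 0.9, 0.5, 0.4, 0.2, 0.0],
    "bad": [0.0, 1.0, 0.1, 0.7, 0.2, 0.8],
}
d_values = {"good": 1.0, "bad": 2 / 3}
kept, d_final = second_screening(columns, d_values, assigned)
```

In this example the two-index system has D of about 0.67, deleting 'bad' raises it to 1.0, and the loop then stops because only one index is left.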
step 4, calculating the semantic novelty of the patent, comprising the following steps:
step 4.1, establishing a corpus $T = \{t_1, t_2, \ldots, t_i\}$ from the invention titles and abstracts of the patents, where $t_i$ is the text information of patent $i$, namely the text consisting of its invention title and the abstract of its specification; the text paragraph of each patent is mapped to a unique column vector of the paragraph vector matrix $V$, and each word in the text paragraphs is mapped to a unique column vector of the word vector matrix $W$;
step 4.2, predicting the probability of the next word in text paragraph $t_i$ from the average of the paragraph vector and the word vectors, which yields the text paragraph representations and the word representations; given a training word sequence $w_1, w_2, \ldots, w_{|T|}$ and a paragraph vector $v_i$, the following objective is maximized under a fixed-length window $win$:
$$\frac{1}{M}\sum_{t=|win|}^{M-|win|} \log p\left(w_t \mid w_{t-|win|}, \ldots, w_{t+|win|}, v_i\right) \tag{3}$$
where $M$ is the number of all training words and $v_i$ is the text paragraph representation vector that covers the context words of the current window; the prediction task is performed by hierarchical softmax:
$$p\left(w_t \mid w_{t-|win|}, \ldots, w_{t+|win|}, v_i\right) = \frac{e^{Pr_{w_t}}}{\sum_{n=1}^{N_w} e^{Pr_n}} \tag{4}$$
where $N_w$ is the total number of words in the training word sequence and $Pr$ is the output log probability, calculated as:
$$Pr = U\,a\left(w_{t-|win|}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+|win|}, v_i; W, V\right) + b \tag{5}$$
where $U$ and $b$ are the softmax parameters and $a$ averages the word vectors $w_t$ and the paragraph vector $v_i$; the PV-DM model thus represents the text paragraph of each patent as a vector in the latent space $\mathbb{R}^k$, giving the final patent text representation matrix $V$;
step 4.3, calculating the Euclidean distance between the text paragraph representation vector $v_i$ of a patent and the text paragraph representation vector $v_j$ of a patent that it cites:
$$d(v_i, v_j) = \lVert v_i - v_j \rVert_2 = \sqrt{\sum_{m=1}^{k} \left(v_{im} - v_{jm}\right)^2} \tag{6}$$
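The distance computation of step 4.3 over a set of citation pairs can be sketched as below. The function name `citation_distances`, the patent identifiers, and the two-dimensional vectors are hypothetical; real paragraph vectors would come from a trained PV-DM model. The final ranking line only orders the pairs by distance and does not attempt to reproduce the patent's novelty score.

```python
import math


def citation_distances(vectors, citations):
    """Euclidean distance between each citing patent and the patent it cites.

    vectors   -- patent id -> paragraph vector (list of floats)
    citations -- list of (citing id, cited id) pairs
    """
    return {(i, j): math.dist(vectors[i], vectors[j]) for i, j in citations}


# Hypothetical two-dimensional paragraph vectors and citation pairs.
vectors = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [0.0, 1.0]}
distances = citation_distances(vectors, [("B", "A"), ("C", "A")])

# Citation pairs ordered from the closest to the most distant text.
ranked = sorted(distances, key=distances.get)
```

A larger distance to the cited patents suggests that the citing patent's text departs further from the prior art it builds on.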
step 4.4, collecting the Euclidean distances of all $|R|$ patent citation pairs in the patent citation network, ranking them, and calculating the semantic novelty $S_i$ of each patent:
[Equation (7), which defines $S_i$, appears only as an image (BDA0004045100600000062) in the original publication.]
step 5, generating a node feature matrix $X$ with $n_1 = |V|$ rows, one per patent node, from the optimal candidate index combination obtained in step 3 and the semantic novelty calculated in step 4; establishing the patent citation adjacency matrix $A \in \mathbb{R}^{n_1 \times n_1}$, which stores the citation information between nodes; and using an encoder $\varepsilon$ to obtain the final node feature representations, comprising the following steps:
step 5.1, inputting the node feature matrix $X$ and integrating the neighborhood information of each target node through the graph convolution network $\varepsilon$ to obtain the local representations of the nodes in the positive sample; the information integration process is:
$$H^{l+1} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{l} W^{l}\right) \tag{8}$$
where $\hat{D}$ is the degree matrix of $\hat{A}$ (in the standard graph convolution, $\hat{A} = A + I$ is the adjacency matrix with self-loops added); $H^{l}$ is the feature representation learned at layer $l$; $W^{l}$ is the learnable parameter matrix of layer $l$ of the graph convolution network; for the input layer $l = 0$, $H^{0} = X$; and $\sigma$ is a nonlinear activation function;
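One propagation step of the information integration in step 5.1 can be sketched in plain Python. The sketch makes several assumptions that the source does not state: self-loops are added to the adjacency matrix, the citation graph is treated as undirected (symmetric adjacency), and ReLU is used as the activation. The names `matmul` and `gcn_layer` are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, h, w):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Two patents that cite each other, one feature each, identity weight.
adj = [[0.0, 1.0], [1.0, 0.0]]
h0 = [[1.0], [3.0]]
h1 = gcn_layer(adj, h0, [[1.0]])
```

With these inputs each node ends up with the average of its own feature and its neighbor's, which is the smoothing effect the normalized adjacency is meant to produce.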
step 5.2, using a corruption function $\mathcal{C}$ to modify the nodes of the network and obtain a negative sample, and applying the same information integration method as in step 5.1 to generate the local node representations $\tilde{h}_j$ of the negative sample;
step 5.3, passing the local node representations $h_i$ of the positive sample through a readout function $\mathcal{R}$ to compute the global representation of the network:
$$s = \mathcal{R}(H) = \sigma\left(\frac{1}{N}\sum_{i=1}^{N} h_i\right) \tag{9}$$
where $N$ is the number of positive samples;
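step 5.4, using a discriminator $\mathcal{D}$ to distinguish the local representations of positive and negative samples:
$$\mathcal{D}(h_i, s) = \sigma\left(h_i^{\top} W_{\mathcal{D}}\, s\right) \tag{10}$$
where $W_{\mathcal{D}}$ is a learnable scoring matrix and $s$ is the global representation of the network;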
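step 5.5, minimizing the final loss function $L_n$ and updating the final representation $h_i$ of each patent node in the positive sample:
$$L_n = -\frac{1}{N + N_n}\left(\sum_{i=1}^{N} \mathbb{E}_{(X, A)}\left[\log \mathcal{D}(h_i, s)\right] + \sum_{j=1}^{N_n} \mathbb{E}_{(\tilde{X}, \tilde{A})}\left[\log\left(1 - \mathcal{D}(\tilde{h}_j, s)\right)\right]\right) \tag{11}$$
where $N$ is the number of positive samples; $N_n$ is the number of negative samples; $\tilde{h}_j$ is a negative sample representation; $s$ is the global representation of the network; $\mathbb{E}_{(\cdot)}[\cdot]$ denotes the expected value of the function in brackets; and $\log \mathcal{D}(h_i, s)$ is the logarithm of the discriminator output of equation (10);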
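Steps 5.3 to 5.5 follow the pattern of mutual-information maximization between local and global representations. The sketch below evaluates such a loss for given representations; it assumes a sigmoid readout over the mean node representation and a bilinear discriminator, which the source text does not confirm, and it does not perform any gradient update. All names (`readout`, `discriminator`, `contrastive_loss`) and values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def readout(h):
    """Global summary: sigmoid of the mean node representation."""
    n = len(h)
    return [sigmoid(sum(row[d] for row in h) / n) for d in range(len(h[0]))]


def discriminator(h_i, w, s):
    """Bilinear score sigmoid(h_i^T W s) that h_i belongs to the summarized graph."""
    ws = [sum(w[r][c] * s[c] for c in range(len(s))) for r in range(len(w))]
    return sigmoid(sum(a * b for a, b in zip(h_i, ws)))


def contrastive_loss(h_pos, h_neg, w):
    """Binary cross-entropy over positive and negative local representations."""
    s = readout(h_pos)
    pos = sum(math.log(discriminator(h, w, s)) for h in h_pos)
    neg = sum(math.log(1.0 - discriminator(h, w, s)) for h in h_neg)
    return -(pos + neg) / (len(h_pos) + len(h_neg))


# Hypothetical representations: one positive node, one corrupted node.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
loss_untrained = contrastive_loss([[2.0, 2.0]], [[-2.0, -2.0]], zeros)
loss_separating = contrastive_loss([[2.0, 2.0]], [[-2.0, -2.0]], identity)
```

With an all-zero scoring matrix the discriminator outputs 0.5 for every node and the loss equals log 2; a scoring matrix that separates the positive from the corrupted node gives a smaller loss, which is the direction training pushes in.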
step 6, predicting the patent value; the final patent node representations are input into the XGBoost machine learning model to predict the value of each patent, giving the score prediction result $\hat{y}_i$; for a patent sample $i$, the final representation $h_i$ of its patent node is input and the prediction is calculated as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(h_i) \tag{12}$$
where $f_k$ is the $k$-th decision tree in the XGBoost model, $K$ is the number of trees in the model, and $f_k(h_i)$ is the predicted value of patent sample $i$ on the $k$-th tree.
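The additive form of equation (12) can be illustrated without any library by modelling each tree as a plain function. This is only a sketch of how the per-tree outputs are summed, not of how XGBoost trains or stores its trees; `stump` and `predict` are hypothetical helpers and the split values are made up.

```python
def stump(feature, threshold, left, right):
    """Return a one-split regression tree f(h) as a plain function."""
    def f(h):
        return left if h[feature] < threshold else right
    return f


def predict(trees, h_i):
    """Equation (12): the prediction is the sum of the outputs of all trees."""
    return sum(tree(h_i) for tree in trees)


# Two hypothetical trees over a two-dimensional node representation.
trees = [stump(0, 0.5, -0.2, 0.4), stump(1, 1.0, 0.1, 0.3)]
y_hat = predict(trees, [0.7, 0.2])
```

For the representation [0.7, 0.2] the first tree returns 0.4 and the second 0.1, so the predicted score is their sum.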
The beneficial effects of the invention are as follows. The invention provides a patent value evaluation method based on deep graph learning and semantic learning. First, index screening ties the construction of the patent value evaluation index system to patent assignment, which gives feature selection an objective, fair, and readily operable basis. Second, the novelty of each patent is calculated through text semantic learning, so that patent value is also measured from the semantic perspective. Third, deep graph learning maximizes the mutual information between local and global representations to integrate information into the node feature representations, which are then used to evaluate the patent value. The method addresses the shortcomings of traditional approaches to patent value evaluation and introduces the novelty of the patent text as a measure of value. Experimental results indicate that the method has higher accuracy and reliability. The invention provides a new method for evaluating patent value and a new approach for research on patent value.
Drawings
FIG. 1 is a flow chart of a patent value evaluation method based on depth map and semantic learning according to the present invention.
FIG. 2 is a flowchart of index screening.
Detailed Description
The following further describes the specific embodiments of the present invention with reference to the drawings and technical solutions.
In this embodiment, 2209 patents in the biopharmaceutical field that were published more than 5 years ago are taken as the example, and the indexes and criterion layers for patents published more than 5 years ago are used to construct the patent value evaluation model and verify its validity. 1473 patent samples are used to construct the value evaluation model, and the other 736 patent samples are used to evaluate patent value and verify the effectiveness of the model. The technical scheme of the invention is implemented in the following steps:
1. Construct the patent citation network from the actual patent publication and citation information.
2. Select the candidate indexes and construct the criterion layers according to the characteristics of the different patent indexes for the given publication time.
3. Standardize the index data of the patent samples by max-min normalization to eliminate the influence of differing dimensions.
4. Calculate the K-S test statistic D value of each single index.
The D value of a candidate index measures how well the index distinguishes the assignment status of patents: the larger the D value, the greater the difference between assigned and unassigned patents on that index, and the better the index identifies whether a patent has been assigned. The calculation of a single-index D value is described below using the index "number of specification pages" as an example. For ease of understanding, the standardized values of "number of specification pages" are assumed to be 1, 0.5, and 0.
(4.1) Each value of the index "number of specification pages" corresponds to one or more patents. Patents with the same index value form a patent group, and the patent groups are arranged in descending order of index value. The patent group numbers are listed in row 1 of Table 2 and the index values in row 2.
(4.2) Count the assigned patents and the unassigned patents in each patent group; the counts are listed in rows 3 and 4 of Table 2, respectively.
(4.3) Calculate the number of assigned patents and the number of unassigned patents in each cumulative patent group.
The patent group with the highest index value is the first cumulative patent group. Each subsequent cumulative group adds the patent group with the next lower index value, so the first two patent groups form the second cumulative patent group, the first three patent groups form the third, and so on. The numbers of assigned and unassigned patents in each cumulative patent group are listed in rows 5 and 6 of Table 2, respectively.
(4.4) Calculate the cumulative frequency of assigned patents and the cumulative frequency of unassigned patents in each cumulative patent group.
The cumulative frequency of assigned patents is the cumulative number of assigned patents in row 5 of Table 2 divided by the total number of assigned patents (the last column of row 5), and is listed in row 7 of Table 2. Similarly, the cumulative frequency of unassigned patents is the cumulative number of unassigned patents divided by the total number of unassigned patents, and is listed in row 8 of Table 2.
(4.5) Calculate, for each cumulative patent group, the difference $d$ between the two cumulative frequencies, $d = |\text{cumulative frequency of assigned patents} - \text{cumulative frequency of unassigned patents}|$; the differences are listed in row 9 of Table 2.
(4.6) Determine the K-S test statistic D value of the single index.
The K-S test statistic D value is the maximum of the differences $d$ between the cumulative frequency of assigned patents and the cumulative frequency of unassigned patents, i.e., $D = \max(d)$; the resulting D value is listed in row 10 of Table 2.
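The tabulation in steps (4.1) to (4.6) can be reproduced from grouped counts. In the sketch below the function name `ks_table` and all counts are hypothetical and are not the values of Table 2; each dictionary corresponds to one column of such a table.

```python
def ks_table(group_values, assigned_counts, unassigned_counts):
    """Build the per-group rows of a K-S tabulation and return them with D."""
    total_a, total_u = sum(assigned_counts), sum(unassigned_counts)
    # Patent groups in descending order of index value.
    order = sorted(range(len(group_values)), key=lambda g: group_values[g], reverse=True)
    rows = []
    cum_a = cum_u = 0
    for number, g in enumerate(order, start=1):
        cum_a += assigned_counts[g]
        cum_u += unassigned_counts[g]
        freq_a, freq_u = cum_a / total_a, cum_u / total_u
        rows.append({
            "group": number,
            "index_value": group_values[g],
            "assigned": assigned_counts[g],
            "unassigned": unassigned_counts[g],
            "cum_assigned": cum_a,
            "cum_unassigned": cum_u,
            "freq_assigned": freq_a,
            "freq_unassigned": freq_u,
            "diff": abs(freq_a - freq_u),
        })
    return rows, max(r["diff"] for r in rows)


# Hypothetical counts for the three standardized values 1, 0.5 and 0.
rows, d_value = ks_table([1.0, 0.5, 0.0], [30, 15, 5], [10, 15, 25])
```

For these made-up counts the cumulative frequencies are 0.6 versus 0.2 after the first group and 0.9 versus 0.5 after the second, so D is about 0.4.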
Table 2. Calculation of the K-S test statistic D value
[Table 2 appears only as an image (BDA0004045100600000091) in the original publication.]
5. Delete the indexes that carry duplicated information, which is the first screening of the indexes.
Calculate the correlation coefficient between every two indexes in the same criterion layer, and in each index pair whose correlation coefficient exceeds 0.7 delete the index with the smaller D value. This avoids information redundancy in the index system while also avoiding the mistaken deletion of indexes with a strong ability to distinguish assignment. The correlation coefficient between index $q$ and index $j$ is calculated as:
$$r_{qj} = \frac{\sum_{i=1}^{n}\left(x_{iq} - \bar{x}_q\right)\left(x_{ij} - \bar{x}_j\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{iq} - \bar{x}_q\right)^2 \sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}}$$
where $r_{qj}$ is the correlation coefficient between the $q$-th index and the $j$-th index; $n$ is the number of patents; $x_{iq}$ is the value of the $q$-th index for the $i$-th patent; $\bar{x}_q$ is the mean of the $q$-th index; $x_{ij}$ is the value of the $j$-th index for the $i$-th patent; and $\bar{x}_j$ is the mean of the $j$-th index.
Through correlation analysis, a total of 9 indexes, such as "number of domestic patents cited" and "number of foreign patents cited", are deleted from the index system for patents published more than 5 years ago, leaving 20 indexes.
6. Weight the indexes based on their D values.
Each index is weighted on the principle that the larger its K-S test statistic D value, that is, its ability to distinguish assignment, the larger its weight. The weighting formula is:
$$w_j = \frac{D_j}{\sum_{k=1}^{20} D_k}$$
where $w_j$ is the weight of the $j$-th index; $D_j$ is the K-S test statistic D value of the $j$-th index, which represents its ability to distinguish assignment; and $k = 1, 2, \ldots, 20$ runs over the indexes to be weighted.
7. Calculate the patent value score.
The economic value score of a patent is calculated by linear weighting:
$$Z = \sum_{j=1}^{20} w_j x_j$$
where $Z$ is the patent value score; $w_j$ is the weight of the $j$-th index; the sum runs over the 20 weighted indexes; and $x_j$ is the normalized value of the $j$-th index of the patent to be evaluated.
8. Calculate the D value of the patent value score and carry out the second screening of the index system.
(8.1) Calculate $D_{20}$ for the evaluation index system formed by the 20 indexes remaining after the first screening.
Following the method for calculating the D value of a single index, calculate the value $D_{20}$ of the patent value scores given by the 20-index system. The calculation of $D_{20}$ is the same as that of a single-index D value, except that the standardized value of the single index is replaced by the patent value score.
(8.2) Determine the maximum value $D_{19}^{\max}$.
After obtaining $D_{20}$ for the 20 indexes, remove one index at a time and calculate the value $D_{19}$ of the system formed by the remaining 19 indexes. The 20 indexes give 20 such combinations and therefore 20 values of $D_{19}$; select the maximum among them, $D_{19}^{\max}$.
(8.3) Retain the index system with the stronger ability to distinguish patent assignment, as measured by the D value.
When $D_{19}^{\max} > D_{20}$, the index system formed by the 19 indexes left after removing one index from the 20 distinguishes assigned from unassigned patents better than before. The 19-index evaluation system is therefore retained.
(8.4) Repeat steps (8.2) and (8.3), continuing to delete indexes until $D_{k-1}^{\max} < D_{k}$, at which point index screening stops.
$D_{k-1}^{\max} < D_{k}$ means that, whichever one of the $k$ indexes is removed, the index system formed by the remaining $k-1$ indexes distinguishes patent assignment less well; the index system of $k$ indexes is then retained and index screening terminates.
After the second screening, 9 indexes, such as the number of IPC subclasses and the number of figures, are deleted from the index system for patents published more than 5 years ago, leaving 11 indexes. The index system formed by these remaining indexes is the one with the strongest ability to distinguish patent assignment.
9. Calculate the semantic novelty of the patent.
(9.1) Establish a corpus $T = \{t_1, t_2, \ldots, t_i\}$ from the invention titles and abstracts of the patents, where $t_i$ is the text information of patent $i$. Each text paragraph is mapped to a unique column vector of the matrix $V$, and each word in the text is mapped to a unique column vector of the matrix $W$. The following objective is maximized under a fixed-length window $win$:
$$\frac{1}{M}\sum_{t=|win|}^{M-|win|} \log p\left(w_t \mid w_{t-|win|}, \ldots, w_{t+|win|}, v_i\right)$$
where $M$ is the number of all training words and $v_i$ is the document representation vector that covers the context words of the current window. The probability of the next word in the document is predicted using hierarchical softmax:
$$p\left(w_t \mid w_{t-|win|}, \ldots, w_{t+|win|}, v_i\right) = \frac{e^{Pr_{w_t}}}{\sum_{n=1}^{N_w} e^{Pr_n}}$$
The output log probability of each word is calculated as:
$$Pr = U\,a\left(w_{t-|win|}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+|win|}, v_i; W, V\right) + b$$
where $U$ and $b$ are the softmax parameters and $a$ averages the word vectors $w_t$ and the paragraph vector $v_i$. The PV-DM model represents each patent text as a vector in the latent space $\mathbb{R}^k$, giving the patent text representation matrix $V$.
(9.2) Calculate the distance between the vector $v_i$ of a patent and the vector $v_j$ of a patent that it cites:
$$d(v_i, v_j) = \lVert v_i - v_j \rVert_2 = \sqrt{\sum_{m=1}^{k} \left(v_{im} - v_{jm}\right)^2}$$
(9.3) Collect and rank the distances of all citation pairs, and calculate the semantic novelty score $S_i$ of each patent:
[The equation defining $S_i$ appears only as an image (BDA0004045100600000123) in the original publication.]
10. Generating a node feature matrix based on semantic novelty of screening indexes and calculation
Figure BDA0004045100600000124
Wherein n is1= V |, establishing a matrix>
Figure BDA0004045100600000125
Saving reference information between nodes using an encoder>
Figure BDA0004045100600000126
Acquiring the final node characteristic representation, comprising the following steps:
(10.1) inputting a feature matrix X, and integrating neighborhood information of a target node through a graph convolution network epsilon to obtain the node representation in the positive sample:
Figure BDA0004045100600000127
wherein
Figure BDA0004045100600000128
Is->
Figure BDA0004045100600000129
Degree matrix of (H)lIs a feature representation learned for each layer.
(10.2) use function
Figure BDA00040451006000001210
Modifying a node in the network to obtain a negative sample, generating a representation ^ for the negative sample in the same way as in step (10.1)>
Figure BDA00040451006000001211
(10.3) passing the transfer function
Figure BDA00040451006000001214
Passing a local node representation, computing a network global representation:
Figure BDA00040451006000001212
where N represents the number of positive samples.
(10.4) use of the discriminator
Figure BDA00040451006000001213
Expressed by distinguishing local positive and negative samples: />
Figure BDA0004045100600000131
(10.5) calculating the final loss function:
Figure BDA0004045100600000132
where N_n is the number of negative samples.
(10.6) minimizing the loss function to generate the representation h_i of each patent node.
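Steps (10.2) to (10.6) resemble the Deep Graph Infomax style of contrastive training, but their formulas are images that are not reproduced here. The sketch below is therefore built on assumptions: negative samples come from shuffling feature rows, the global representation is the sigmoid of the mean node representation, the discriminator is a bilinear score, and the loss is the negated average log-likelihood. All function names are hypothetical.

```python
import random
from math import exp, log


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def corrupt(features, seed=0):
    """Assumed corruption: shuffle feature rows across nodes."""
    rows = [row[:] for row in features]
    random.Random(seed).shuffle(rows)
    return rows


def readout(h):
    """Assumed global summary: sigmoid of the mean node representation."""
    n = len(h)
    return [sigmoid(sum(col) / n) for col in zip(*h)]


def discriminate(h_i, w, s):
    """Assumed bilinear discriminator: sigmoid(h_i^T W s), a value in (0, 1)."""
    ws = [sum(a * b for a, b in zip(row, s)) for row in w]
    return sigmoid(sum(a * b for a, b in zip(h_i, ws)))


def contrastive_loss(h_pos, h_neg, w):
    """Negated average log-likelihood over N positive and N_n negative nodes."""
    s = readout(h_pos)
    pos = sum(log(discriminate(h, w, s)) for h in h_pos)
    neg = sum(log(1.0 - discriminate(h, w, s)) for h in h_neg)
    return -(pos + neg) / (len(h_pos) + len(h_neg))


identity = [[1.0, 0.0], [0.0, 1.0]]
h_pos = [[2.0, 2.0], [3.0, 3.0]]
h_neg = [[-2.0, -2.0], [-3.0, -3.0]]
loss = contrastive_loss(h_pos, h_neg, identity)
```

Minimizing such a loss pushes the discriminator score toward 1 for real node representations and toward 0 for corrupted ones, which is the behavior step (10.4) describes.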
11. Predicting the patent value. The patent node representations are input into the value prediction model XGBoost to obtain a graded prediction result. For a sample i, its feature representation h_i is input to obtain the prediction result, which is calculated as follows:
Figure BDA0004045100600000133
where f_k is the k-th decision tree.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any insubstantial changes and substitutions made by those skilled in the art based on the present invention are included in the scope of the present invention claimed in the claims.

Claims (1)

1. A patent value evaluation method based on depth map and semantic learning is characterized by comprising the following steps:
step 1, acquiring the attribute features of the patents and the citation relationships between patents, and constructing a patent citation network;
step 2, taking transferred patents as the standard for patents of high economic value, determining the candidate ("sea-election") indexes for patent value evaluation and the criterion layer to which each index belongs;
the criterion layers of the candidate indexes for patent value evaluation comprise: technical indexes, cited indexes, IPC indexes, internationalization indexes, time indexes, right indexes and patentee indexes; the candidate indexes are listed in Table 1;
Table 1. Criterion layers and candidate index system
Figure FDA0004045100590000011
step 3, screening the candidate indexes for patent value evaluation based on the Kolmogorov-Smirnov (K-S) test and constructing an index system for patent value evaluation;
step 3.1, standardizing the candidate index data;
the sample data of the candidate indexes are processed by min-max normalization to eliminate the influence of differing dimensions (units);
step 3.2, calculating the D value of each single index;
for each candidate index, calculating the maximum difference between the cumulative frequencies of the transferred patents and the non-transferred patents in the existing patent data set, which gives the K-S test statistic D of that candidate index;
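Steps 3.1 and 3.2 can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library; the function names `min_max_normalize` and `ks_d_value` and the toy values are hypothetical and do not come from the patent.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ks_d_value(transferred, non_transferred):
    """Two-sample K-S statistic: the largest gap between the two
    empirical cumulative frequency curves."""
    thresholds = sorted(set(transferred) | set(non_transferred))
    d = 0.0
    for t in thresholds:
        cdf_a = sum(v <= t for v in transferred) / len(transferred)
        cdf_b = sum(v <= t for v in non_transferred) / len(non_transferred)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy index values for transferred and non-transferred patents (made up).
normalized = min_max_normalize([2, 4, 6])
d_separated = ks_d_value([0.8, 0.9, 1.0], [0.0, 0.1, 0.2])
```

A D value close to 1 means the index separates transferred from non-transferred patents almost completely, while a D value close to 0 means the two groups are distributed alike on that index.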
step 3.3, calculating the correlation coefficients of indexes within the same criterion layer;
calculating the correlation coefficient between every two indexes in the same criterion layer to identify index pairs that carry redundant information, and, for each pair whose correlation coefficient is greater than 0.7, deleting the index with the smaller D value, which completes the first screening of the candidate indexes; the remaining K candidate indexes form an index system;
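The first screening in step 3.3 can be sketched as follows. The text does not name the correlation coefficient, so Pearson correlation is assumed here; `pearson`, `first_screening` and the toy columns are hypothetical, and ties in D value are broken arbitrarily.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (assumed choice of coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def first_screening(columns, d_values, threshold=0.7):
    """columns: {index name: values}, all from ONE criterion layer.
    d_values: {index name: K-S D value}. Returns the surviving names."""
    kept = set(columns)
    for a, b in combinations(sorted(columns), 2):
        # Skip pairs in which one index has already been deleted.
        if a in kept and b in kept and pearson(columns[a], columns[b]) > threshold:
            # Drop the index with the smaller D value (b on a tie).
            kept.discard(a if d_values[a] < d_values[b] else b)
    return sorted(kept)


columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 1, 3, 2]}
d_values = {"a": 0.3, "b": 0.5, "c": 0.2}
survivors = first_screening(columns, d_values)
```

Because pairs are examined one after another, the result can depend on the order of comparison when several indexes are mutually correlated; the text does not specify an order, so alphabetical order is used here.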
step 3.4, calculating the economic value score of the patent;
weighting the candidate indexes remaining after the first screening according to their K-S test statistic D values, so that indexes with larger D values receive larger weights; calculating the economic value score of the patent by linear weighting; calculating the weight of each candidate index by using the formula (1):
Figure FDA0004045100590000021
calculating the patent economic value score by using the formula (2):
Figure FDA0004045100590000022
where w_j is the weight of the j-th candidate index for patent value evaluation; D_j is the K-S test statistic D value of the j-th index; k indexes the candidate indexes to be weighted, k = 1, 2, …, K; K is the number of candidate indexes remaining after the first screening; Z is the economic value score of the patent; x_j is the normalized value of the j-th candidate index of the patent to be evaluated;
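Formulas (1) and (2) are images that are not reproduced in this text. The sketch below assumes the simplest form consistent with the surrounding definitions: each weight is the index's D value divided by the sum of all D values, and the score is the weighted sum of the normalized index values. The function names are hypothetical.

```python
def index_weights(d_values):
    """Assumed formula (1): w_j = D_j / sum of D_k over the K remaining indexes."""
    total = sum(d_values)
    return [d / total for d in d_values]


def economic_value_score(weights, normalized_values):
    """Assumed formula (2): Z = sum of w_j * x_j (linear weighting)."""
    return sum(w * x for w, x in zip(weights, normalized_values))


# Three remaining indexes with made-up D values and normalized index values.
weights = index_weights([0.2, 0.3, 0.5])
z = economic_value_score(weights, [1.0, 0.0, 0.5])
```

Under this assumption the weights sum to one, so Z stays in [0, 1] whenever the index values have been min-max normalized.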
step 3.5, calculating the K-S test statistic D value of the index system;
by analogy with the D value of a single candidate index, calculating the K-S test statistic D value of the patent economic value scores produced by the index system;
step 3.6, after calculating the D value of the index system formed by the K candidate indexes remaining after the first screening, deleting each candidate index in turn, calculating the D value of each resulting combination of K-1 candidate indexes and taking the maximum, and comparing the D values before and after deletion; when the D value of the remaining index combination after deleting a candidate index is larger than the D value before deletion, deleting that candidate index;
step 3.7, repeating step 3.6 until, after deleting any one candidate index, the D value of the remaining index combination is smaller than the D value before deletion; at this point the deletion of candidate indexes stops and the second screening is complete; the remaining candidate indexes form the optimal candidate index combination for patent value evaluation;
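The loop in steps 3.5 to 3.7 is a backward elimination driven by the D value of the whole index system. The sketch below is a minimal illustration with hypothetical names and toy data. It assumes that weights are recomputed as D_j divided by the sum of D values over the indexes still in play, and that an unchanged D value does not trigger a deletion, since the text leaves both points open.

```python
def ks_d(a, b):
    """Largest gap between the cumulative frequency curves of a and b."""
    points = sorted(set(a) | set(b))
    return max(
        abs(sum(v <= p for v in a) / len(a) - sum(v <= p for v in b) / len(b))
        for p in points
    )


def system_d(rows, labels, names, d_single):
    """D value of the economic value score built from the indexes in names."""
    total = sum(d_single[n] for n in names)
    scores = [sum(d_single[n] / total * row[n] for n in names) for row in rows]
    pos = [s for s, y in zip(scores, labels) if y == 1]  # transferred patents
    neg = [s for s, y in zip(scores, labels) if y == 0]  # non-transferred patents
    return ks_d(pos, neg)


def second_screening(rows, labels, names, d_single):
    names = list(names)
    current = system_d(rows, labels, names, d_single)
    while len(names) > 1:
        # Try deleting each index in turn and keep the best resulting D value.
        trials = [
            (system_d(rows, labels, [n for n in names if n != drop], d_single), drop)
            for drop in names
        ]
        best_d, drop = max(trials)
        if best_d <= current:  # no deletion improves the D value: stop
            break
        names.remove(drop)
        current = best_d
    return names, current


rows = [
    {"good": 1.0, "noise": 0.0},
    {"good": 0.9, "noise": 1.0},
    {"good": 0.1, "noise": 0.0},
    {"good": 0.0, "noise": 1.0},
]
labels = [1, 1, 0, 0]
kept, final_d = second_screening(rows, labels, ["good", "noise"], {"good": 0.5, "noise": 0.5})
```

In the toy data the "noise" index blurs the separation between the two groups, so deleting it raises the system D value and the loop keeps only "good".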
step 4, calculating the semantic novelty of the patent, comprising the following steps:
step 4.1, building a corpus T = {t_1, t_2, …, t_i} from the invention titles and abstracts of the patents, where t_i is the text information of patent i, namely the text consisting of the invention title and the abstract of the patent specification; the text paragraph of each patent is represented by a unique column vector of the paragraph vector matrix V, and each word in the text paragraphs is represented by a unique column vector of the word vector matrix W;
step 4.2, using the unique column vectors of the paragraph vector matrix and the word vector matrix, namely the average of the paragraph vector and the word vectors, predicting the probability of the next word occurring in text paragraph t_i, thereby obtaining the text paragraph representation and the word representations; given the training word sequence w_1, w_2, …, w_{|T|} and the paragraph v_i, the following objective is maximized under a fixed-length window win:
Figure FDA0004045100590000031
where M is the number of all training words, and v_i is the representation vector of the text paragraph containing the context words of the current window; the prediction task is performed by hierarchical softmax:
Figure FDA0004045100590000041
where N_w is the total number of words in the training word sequence, and Pr is the output log probability, calculated as:
$\mathrm{Pr} = U\,a(w_{t-|win|}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+|win|}, v_i; W, V) + b$ (5)
where U and b are the softmax parameters, and a is obtained by averaging w_t and v_i; using the PV-DM model, the text paragraph of each patent is vectorized in the latent space R^k to obtain the final text characterization matrix V of the patents;
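Formula (5) can be illustrated with a toy forward pass. The sketch below averages the context word vectors with the paragraph vector, applies U and b, and then normalizes with a plain softmax. The plain softmax is a simplification for illustration and is not the hierarchical softmax named in step 4.2; all names and numbers are hypothetical.

```python
from math import exp


def average_vectors(vectors):
    """Element-wise mean of equally sized vectors (the function a)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def output_scores(U, b, context_word_vecs, paragraph_vec):
    """Pr = U a(...) + b, giving one score per vocabulary word."""
    a = average_vectors(context_word_vecs + [paragraph_vec])
    return [sum(u * x for u, x in zip(row, a)) + bias for row, bias in zip(U, b)]


def softmax(scores):
    """Plain softmax (simplification of the hierarchical softmax in the text)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Vocabulary of three words, embedding dimension k = 2 (toy values).
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.5]
scores = output_scores(U, b, [[1.0, 0.0], [0.0, 1.0]], [2.0, 2.0])
probs = softmax(scores)
```

During training, the word vectors in W, the paragraph vectors in V, and the parameters U and b would all be adjusted so that the true center word receives a high probability; the sketch only shows the scoring step.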
step 4.3, calculating the Euclidean distance between the text paragraph characterization vector of the patent and the text paragraph characterization vector of each patent it cites:
Figure FDA0004045100590000042
step 4.4, aggregating and ranking the Euclidean distances between all |R| patent citation pairs in the patent citation network, and calculating the semantic novelty S_i of the patent:
Figure FDA0004045100590000043
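The distance and ranking part of steps 4.3 and 4.4 can be sketched as below. The scoring formula for S_i is an image that is not reproduced here, so the example stops at the ranked citation pairs and does not attempt to compute S_i. The function names and the toy embeddings are hypothetical.

```python
from math import dist


def citation_distances(vectors, citations):
    """vectors: {patent id: embedding}; citations: list of (citing, cited)."""
    return {(i, j): dist(vectors[i], vectors[j]) for i, j in citations}


def rank_pairs(distances):
    """Rank citation pairs from smallest to largest distance (rank 1 = closest)."""
    ordered = sorted(distances, key=distances.get)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


vectors = {"p1": [0.0, 0.0], "p2": [3.0, 4.0], "p3": [1.0, 0.0]}
citations = [("p1", "p2"), ("p1", "p3")]
distances = citation_distances(vectors, citations)
ranks = rank_pairs(distances)
```

A large distance between a patent and the patents it cites suggests that its text departs from its prior art, which is the intuition behind using ranked distances as a novelty signal.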
step 5, generating a node feature matrix [Figure FDA0004045100590000044] based on the optimal candidate index combination obtained in step 3 and the semantic novelty calculated in step 4, where n_1 = |V|; establishing a patent citation adjacency matrix [Figure FDA0004045100590000045] that stores the citation information between nodes; and using an encoder [Figure FDA0004045100590000046] [Figure FDA0004045100590000047] to obtain the final node feature representation, comprising the following steps:
step 5.1, inputting the node feature matrix X, and integrating the neighborhood information of each target node through the graph convolutional network ε to obtain the local representations of the nodes in the positive sample; the information integration process is:
Figure FDA0004045100590000048
where [Figure FDA0004045100590000049]; [Figure FDA00040451005900000410] is the degree matrix of [Figure FDA00040451005900000411]; H^l is the feature representation learned at layer l; W^l is the learnable parameter of the l-th layer in the convolutional neural network; for the input layer l = 0, H^0 = X; σ is a nonlinear activation function;
step 5.2, using the function [Figure FDA0004045100590000051] to modify the nodes in the convolutional neural network to obtain negative samples, and generating the local node representations [Figure FDA0004045100590000052] of the negative samples with the same information integration method as in step 5.1;
step 5.3, passing the local node representations h_i in the positive sample through the transfer function R to compute the network global representation:
Figure FDA0004045100590000053
wherein N represents the number of positive samples;
step 5.4, using the discriminator [Figure FDA0004045100590000054] to distinguish the local representations of positive and negative samples:
Figure FDA0004045100590000055
step 5.5, minimizing the final loss function L_n to update and generate the final representation h_i of each patent node in the positive sample:
Figure FDA0004045100590000056
where N_n is the number of negative samples; [Figure FDA0004045100590000057] is the negative sample representation; s is the network global representation; E_(·)[·] denotes the expected value of the function [·]; [Figure FDA0004045100590000058] denotes the logarithm of formula (10);
step 6, predicting the patent value; the final patent node representations are input into the machine learning model XGBoost to predict the value of the patents and obtain the graded prediction result [Figure FDA0004045100590000059]; for a patent sample i, the final representation h_i of the patent node is input to obtain the prediction result, which is calculated as follows:
Figure FDA00040451005900000510
where f_k is the k-th decision tree in the XGBoost model, K is the number of trees in the model, and f_k(h_i) is the predicted value of patent sample i on the k-th tree.
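The prediction formula in step 6 is an image that is not reproduced here, but the definitions of f_k, K and f_k(h_i) point to an additive tree ensemble. The sketch below assumes the prediction is the sum of the K tree outputs and uses hand-written depth-1 trees in place of a trained XGBoost model. The names `stump` and `predict` are hypothetical and are not part of the XGBoost library.

```python
def stump(feature, threshold, left, right):
    """A depth-1 regression tree: returns left if h[feature] < threshold, else right."""
    return lambda h: left if h[feature] < threshold else right


def predict(trees, h_i):
    """Assumed additive ensemble: the sum of every tree's output for h_i."""
    return sum(tree(h_i) for tree in trees)


# Two toy trees (K = 2) applied to a two-dimensional node representation.
trees = [stump(0, 0.5, -1.0, 1.0), stump(1, 0.0, 0.25, 0.75)]
score = predict(trees, [0.7, -0.2])
```

A trained model would learn the split features, thresholds and leaf values from labeled patents and may apply further steps, such as an initial base score or a mapping from the raw score to value grades; those details are omitted here.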
CN202310027211.3A 2023-01-09 2023-01-09 Patent value evaluation method based on depth map and semantic learning Pending CN115983877A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310027211.3A CN115983877A (en) 2023-01-09 2023-01-09 Patent value evaluation method based on depth map and semantic learning

Publications (1)

Publication Number Publication Date
CN115983877A true CN115983877A (en) 2023-04-18

Family

ID=85963010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310027211.3A Pending CN115983877A (en) 2023-01-09 2023-01-09 Patent value evaluation method based on depth map and semantic learning

Country Status (1)

Country Link
CN (1) CN115983877A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116776868A (en) * 2023-08-25 2023-09-19 北京知呱呱科技有限公司 Evaluation method of model generation text and computer equipment
CN116776868B (en) * 2023-08-25 2023-11-03 北京知呱呱科技有限公司 Evaluation method of model generation text and computer equipment

Similar Documents

Publication Publication Date Title
TWI689871B (en) Gradient lifting decision tree (GBDT) model feature interpretation method and device
CN112948541B (en) Financial news text emotional tendency analysis method based on graph convolution network
CN111080117A (en) Method and device for constructing equipment risk label, electronic equipment and storage medium
CN115983877A (en) Patent value evaluation method based on depth map and semantic learning
CN114169869A (en) Attention mechanism-based post recommendation method and device
CN112527769B (en) Automatic quality assurance framework for software change log generation method
Tiruneh et al. Feature selection for construction organizational competencies impacting performance
Wang et al. Evaluation of the survival of Yangtze finless porpoise under probabilistic hesitant fuzzy environment
CN111291189B (en) Text processing method and device and computer readable storage medium
CN111105041B (en) Machine learning method and device for intelligent data collision
CN113516189B (en) Website malicious user prediction method based on two-stage random forest algorithm
CN115345248A (en) Deep learning-oriented data depolarization method and device
Okagbue et al. Predicting access mode of multidisciplinary and library and information sciences journals using machine learning
Lv et al. An empirical study of factors influencing entrepreneurship using fuzzy logic: based on provincial panel data
Dubois et al. Measuring the expertise of workers for crowdsourcing applications
CN113112166A (en) Equipment state variable selection method and equipment based on gray fuzzy hierarchical analysis
Syafiandini et al. Classification of Indonesian Government Budget Appropriations or Outlays for Research and Development (GBAORD) using decision tree and naive bayes
CN115470332B (en) Intelligent question-answering system for content matching based on matching degree
CN117573814B (en) Public opinion situation assessment method, device and system and storage medium
Omondiagbe et al. Evaluating simple and complex models’ performance when predicting accepted answers on stack overflow
CN108376261B (en) Tobacco classification method based on density and online semi-supervised learning
Anastasopoulos et al. Computational text analysis for public management research
CN116956027A (en) Employee portrait updating method, device, equipment and storage medium
Rahmawati et al. Classification and Regression Trees (CART) Algorithm for Employee Selection
CN117556118A (en) Visual recommendation system and method based on scientific research big data prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination