CN115983877A

CN115983877A - Patent value evaluation method based on deep graph and semantic learning

Info

Publication number: CN115983877A
Application number: CN202310027211.3A
Authority: CN
Inventors: 孙玉涛; 刘嘉莹; 杨祥君
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2023-01-09
Filing date: 2023-01-09
Publication date: 2023-04-18

Abstract

The invention belongs to the technical field of patent evaluation and provides a patent value evaluation method based on deep graph learning and semantic learning. First, in the indicator screening process, patent transfer status is combined with the construction of a patent value evaluation indicator system, which yields an objective, fair, and highly operable method for feature selection. Second, the novelty of a patent is calculated through text semantic learning, so that patent value is measured from a semantic perspective. Third, deep graph learning is used to learn node feature representations by maximizing the mutual information between local and global representations, and patent value is then evaluated with the XGBoost algorithm. The method overcomes shortcomings of traditional approaches to patent value evaluation and introduces the novelty of the patent text as a measure of patent value. Experimental results show that the method achieves higher accuracy and reliability. The invention thus provides a new method for evaluating patent value and a new approach for research on patent value.

Description

Patent value evaluation method based on deep graph and semantic learning

Technical Field

The invention belongs to the technical field of patent evaluation, and in particular relates to a patent value evaluation method based on deep graph learning and semantic learning.

Background

High-value patents have become a focus of attention in industry. Cultivating high-value patents is now widely regarded as essential to innovation-driven, high-quality development, and the national intellectual property authority has made cultivating high-value patents and improving patent quality one of its key tasks. How to evaluate patent value and identify high-value patents has therefore become a key problem in urgent need of a solution. However, with the continued implementation of intellectual property strategies, the number of patents in China has grown substantially, and conventional patent value evaluation methods can no longer meet the demand for evaluating large numbers of patents. Constructing a patent value evaluation model suited to big data, one that quickly and effectively identifies high-value patents among a large number of patents, has thus become a key problem for improving the quality of innovative development.

Current research on patent value either explores the factors influencing patent value through a single indicator, for example "Hall B. Market value and patent citations [J]. The Rand Journal of Economics, 2005, 36(1): 16-38", "Lerner J. The importance of patent scope: an empirical analysis [J]. The Rand Journal of Economics, 1994, 25: 319-333", "Harhoff D, Scherer F M, Vopel K. Citations, family size, opposition and the value of patent rights [J]. Research Policy, 2003, 32(8)" and "Lanjouw J O, Schankerman M. Patent quality and research productivity: measuring innovation with multiple indicators [J]. Economic Journal, 2004, 114(495): 441-465", or evaluates patent value through multiple indicators, for example "Wan Xiaoli, Zhu Xuezhong. Evaluation indicator system of patent value and fuzzy comprehensive evaluation [J]. Science Research Management, 2008(02): 185-191", "Song Hefa, Mu Rongping. Research on patent quality, its measurement methods and measurement indicator system [J]. Science of Science and Management of S.& T., 2010, 31(04): 21-27" and "Guo Lei, Cai Hong. Situation analysis of industrial core patents under the situation of patent strategy, 2016, 34(11): 1663-1671+1757".

For example, Hall et al. first proposed using the number of citations a patent receives to reflect its value, and Lerner found that the technical scope covered by a patent has a significant influence on its value; however, such single-indicator methods can hardly reflect the economic value of a patent objectively. Many existing studies instead evaluate patent value with a set of patent indicators, such as citations received and patent litigation. Wan Xiaoli et al. used the analytic hierarchy process and fuzzy comprehensive evaluation to establish an indicator system of 17 indicators, including degree of innovation and technical content, offering a new approach that combines qualitative and quantitative evaluation of patent value. Guo Lei et al. found a significant positive relationship between patent value and the breadth of rights, the technical scope, and self-citation behavior. Nevertheless, all indicators in these studies are attribute features of the patents themselves, the indicators and indicator weights differ from model to model, and the academic community has not reached agreement on how indicators should be selected. Moreover, the text of a patent is an important factor reflecting its novelty, yet prior research has not considered semantic novelty. A patent value evaluation method is therefore needed that effectively fuses multiple indicators and also measures patent value from a semantic perspective.

Disclosure of Invention

The present invention addresses the deficiencies of the prior art and provides a patent value evaluation method that draws on the characteristics of patents. The patent features are first screened, and the semantic novelty of each patent is then evaluated through deep semantic learning. To fuse the external indicators with the semantic information effectively, node representations are learned by maximizing mutual information, which preserves both the local information of each node and the global information of the network. Finally, the value of the patent is evaluated with the XGBoost algorithm. The invention is the first to apply semantic learning and deep graph learning to a big-data-oriented method for evaluating the economic value of patents.

The technical scheme of the invention is as follows: a comprehensive patent value evaluation model that effectively integrates multiple indicators and semantic novelty is built from an existing patent data set, and the model is then applied to the patent data set to be evaluated in order to predict patent value. The method comprises the following steps:

Step 1, acquiring the attribute features of the patents and the citation relations between patents, and constructing a patent citation network;

Step 2, taking transferred patents as the standard for patents of high economic value, determining the candidate indicators for patent value evaluation and the criterion layer to which each indicator belongs;

the criterion layers of the candidate indicators for patent value evaluation comprise: technical indicators, citation indicators, IPC indicators, internationalization indicators, time indicators, rights indicators, and patentee indicators; the candidate indicators are shown in Table 1;

Table 1. Criterion layers and candidate indicator system

Step 3, screening the candidate indicators for patent value evaluation based on the Kolmogorov-Smirnov (K-S) test and constructing the indicator system for patent value evaluation;

Step 3.1, standardizing the candidate indicator data for patent value evaluation;

the sample data of the candidate indicators are processed with max-min standardization to eliminate the influence of dimension (units of measurement);
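The max-min standardization of step 3.1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `min_max_normalize`, the treatment of constant indicators, and the sample page counts are assumptions introduced here.

```python
def min_max_normalize(values):
    """Scale the raw values of one candidate indicator into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant indicator is mapped to 0.0 for every patent.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw "number of specification pages" for three patents.
pages = [30, 21, 12]
normalized = min_max_normalize(pages)
print(normalized)
```

After this step every indicator lies on the same scale, so indicators measured in different units can be combined in the weighted score of step 3.4.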

Step 3.2, calculating the D value of each single indicator;

for each candidate indicator, the maximum difference between the cumulative frequency of transferred patents and the cumulative frequency of non-transferred patents in the existing patent data set is calculated, which gives the K-S test statistic D value of that candidate indicator;
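The D value of step 3.2 can be illustrated with a short sketch. It assumes that patents are grouped by indicator value and accumulated from the highest value downward; the function name `ks_statistic` and the six sample patents are hypothetical.

```python
def ks_statistic(values, transferred):
    """K-S D value of one indicator.

    values      -- standardized indicator value of each patent
    transferred -- True if the patent was transferred, else False
    """
    n_pos = sum(1 for t in transferred if t)
    n_neg = len(transferred) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both transferred and non-transferred patents")

    # Count transferred / non-transferred patents per indicator value.
    groups = {}
    for v, t in zip(values, transferred):
        pos, neg = groups.get(v, (0, 0))
        groups[v] = (pos + 1, neg) if t else (pos, neg + 1)

    # Accumulate groups from the highest indicator value downward and
    # track the largest gap between the two cumulative frequencies.
    cum_pos = cum_neg = 0
    d_max = 0.0
    for v in sorted(groups, reverse=True):
        pos, neg = groups[v]
        cum_pos += pos
        cum_neg += neg
        d_max = max(d_max, abs(cum_pos / n_pos - cum_neg / n_neg))
    return d_max


# Hypothetical sample: three transferred and three non-transferred patents.
d_value = ks_statistic(
    [1, 1, 0.5, 0.5, 0, 0],
    [True, True, True, False, False, False],
)
print(round(d_value, 4))
```

A D value close to 1 means the indicator separates transferred from non-transferred patents almost perfectly, while a value close to 0 means the two groups are distributed alike on that indicator.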

Step 3.3, calculating the correlation coefficients between indicators within the same criterion layer;

the correlation coefficient between every two indicators in the same criterion layer is calculated to identify indicator pairs that carry redundant information; in each pair whose correlation coefficient exceeds 0.7, the indicator with the smaller D value is deleted, which completes the first screening of the candidate indicators; the remaining k candidate indicators form an indicator system;
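A possible reading of the first screening in step 3.3 is sketched below. It assumes the correlation coefficient is the Pearson coefficient and applies the 0.7 threshold to the signed value, as the text states. The function names, indicator names, layer labels, and D values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # Assumption: a constant indicator correlates with nothing.
    return cov / (sx * sy)


def first_screening(columns, d_values, layers, threshold=0.7):
    """Drop the lower-D indicator of every highly correlated same-layer pair."""
    removed = set()
    for a, b in combinations(columns, 2):
        if layers[a] != layers[b]:
            continue  # Only indicators in the same criterion layer are compared.
        if pearson(columns[a], columns[b]) > threshold:
            removed.add(a if d_values[a] < d_values[b] else b)
    return [name for name in columns if name not in removed]


# Hypothetical indicator data for four patents.
columns = {
    "claims": [1, 2, 3, 4],
    "independent_claims": [2, 4, 6, 9],
    "ipc_subclasses": [4, 1, 3, 2],
}
layers = {"claims": "rights", "independent_claims": "rights", "ipc_subclasses": "IPC"}
d_values = {"claims": 0.30, "independent_claims": 0.20, "ipc_subclasses": 0.10}

kept = first_screening(columns, d_values, layers)
print(kept)
```

Keeping the higher-D member of each correlated pair removes redundancy without discarding the indicator that better discriminates transfer status.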

Step 3.4, calculating the economic value score of the patent;

the remaining candidate indicators are weighted according to their K-S test statistic D values, so that indicators with larger D values receive larger weights, and the economic value score of the patent is calculated by linear weighting; the candidate indicator weights are calculated by formula (1):

w_j = \frac{D_j}{\sum_{m=1}^{k} D_m} \qquad (1)

the patent economic value score is calculated by formula (2):

Z = \sum_{j=1}^{k} w_j x_j \qquad (2)

where w_j is the weight of the j-th candidate indicator; D_j is the K-S test statistic D value of the j-th indicator; k is the number of candidate indicators to be weighted, that is, the number of candidate indicators remaining after the first screening, with j = 1, 2, …, k; Z is the economic value score of the patent; and x_j is the standardized value of the j-th candidate indicator of the patent to be evaluated;
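The weighting and scoring of step 3.4 can be sketched as follows, assuming that each weight is the indicator's D value divided by the sum of all D values and that the score is the weighted sum of standardized indicator values. The function names and the numbers are hypothetical.

```python
def d_weights(d_values):
    """Weights proportional to the K-S D values; they sum to 1."""
    total = sum(d_values)
    return [d / total for d in d_values]


def value_score(weights, x):
    """Linear weighted economic value score Z of one patent."""
    return sum(w * xj for w, xj in zip(weights, x))


# Hypothetical D values of three indicators that survived the first screening.
d_values = [0.30, 0.10, 0.20]
weights = d_weights(d_values)

# Hypothetical standardized indicator values of one patent.
score = value_score(weights, [1.0, 0.5, 0.0])
print(weights, score)
```

Because the weights are normalized and each standardized value lies in [0, 1], the score Z also lies in [0, 1].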

Step 3.5, calculating the K-S test statistic D value of the indicator system;

by analogy with the D value of a single candidate indicator, the K-S test statistic D value of the patent economic value score produced by the indicator system is calculated;

Step 3.6, after the D value of the indicator system formed by the k candidate indicators remaining after the first screening has been calculated, each candidate indicator is deleted in turn and the D value of each combination of the remaining k-1 candidate indicators is calculated; the maximum of these D values is compared with the D value before deletion, and if the D value of the remaining combination is larger than the D value before deletion, the corresponding candidate indicator is deleted;

Step 3.7, step 3.6 is repeated until, whichever candidate indicator is deleted, the D value of the remaining combination is smaller than the D value before deletion; deletion then stops and the second screening of the candidate indicators is complete; the remaining candidate indicators form the optimal combination of candidate indicators for patent value evaluation;
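Steps 3.5 to 3.7 amount to a backward elimination loop, sketched below under stated assumptions: weights are re-normalized over the indicators still kept, elimination stops when no deletion strictly increases the system D value, and all names and data are hypothetical.

```python
def ks_d(scores, transferred):
    """K-S D value between transferred and non-transferred patents."""
    n_pos = sum(1 for t in transferred if t)
    n_neg = len(transferred) - n_pos
    pairs = sorted(zip(scores, transferred), key=lambda p: -p[0])
    d_max, cum_pos, cum_neg = 0.0, 0, 0
    for i, (s, t) in enumerate(pairs):
        if t:
            cum_pos += 1
        else:
            cum_neg += 1
        # Compare cumulative frequencies only at the end of a group of ties.
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            d_max = max(d_max, abs(cum_pos / n_pos - cum_neg / n_neg))
    return d_max


def system_d(rows, single_d, keep, transferred):
    """D value of the weighted score built from the indicators in `keep`."""
    total = sum(single_d[j] for j in keep)
    scores = [sum(single_d[j] / total * row[j] for j in keep) for row in rows]
    return ks_d(scores, transferred)


def second_screening(rows, single_d, transferred):
    keep = list(range(len(single_d)))
    current = system_d(rows, single_d, keep, transferred)
    while len(keep) > 1:
        # Delete each indicator in turn; keep the best resulting D value.
        best_d, best_j = max(
            (system_d(rows, single_d, [i for i in keep if i != j], transferred), j)
            for j in keep
        )
        if best_d <= current:  # Assumption: ties also stop the elimination.
            break
        keep.remove(best_j)
        current = best_d
    return keep, current


# Hypothetical data: indicator 0 separates the groups, indicator 1 is noise.
rows = [[0.60, 0.0], [0.55, 1.0], [0.50, 0.0],
        [0.45, 1.0], [0.40, 0.0], [0.35, 1.0]]
transferred = [True, True, True, False, False, False]
single_d = [ks_d([r[j] for r in rows], transferred) for j in range(2)]

kept, final_d = second_screening(rows, single_d, transferred)
print(kept, final_d)
```

In this toy data the noisy indicator lowers the system D value, so deleting it raises D and only indicator 0 is retained.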

Step 4, calculating the semantic novelty of the patent, comprising the following steps:

Step 4.1, establishing a corpus T = {t_1, t_2, …, t_i} from the titles and abstracts of the patents, where t_i is the text of patent i, that is, the text consisting of its title and the abstract of its specification; each column vector of the paragraph vector matrix V uniquely represents the text paragraph of one patent, and each column vector of the word vector matrix W uniquely represents one word in the patent text paragraphs;

Step 4.2, using the average of the paragraph vector and the word vectors, taken from the corresponding columns of the paragraph vector matrix and the word vector matrix, to predict the probability of the next word in text paragraph t_i, thereby obtaining the paragraph representations and word representations; given a training word sequence w_1, w_2, …, w_{|T|} and paragraph vector v_i, the following objective is maximized under a fixed-length window win:

where M is the number of all training words and v_i is the text paragraph representation vector that contains the context words of the current window; the prediction task is performed by hierarchical softmax:

where N_w is the total number of words in the training word sequence and Pr is the output log-probability, calculated as:

Pr = U a(w_{t-|win|}, …, w_{t-1}, w_{t+1}, …, w_{t+|win|}, v_i; W, V) + b \qquad (5)

where U and b are the softmax parameters and a is obtained by averaging the word vectors w_t and the paragraph vector v_i; the PV-DM model is used to vectorize the text paragraph of each patent in the latent space R^k, yielding the final patent text representation matrix V;
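The displayed objective and softmax formulas of step 4.2 are not reproduced in this text. Assuming the method follows the standard PV-DM formulation of the paragraph vector model, they would plausibly take the following form, written with the symbols M, N_w, win, v_i, and Pr defined above. The exact summation limits and indexing are assumptions.

```latex
% Assumed objective: average log-probability of each word given its
% context window and the paragraph vector v_i
\frac{1}{M} \sum_{t=\mathrm{win}}^{M-\mathrm{win}}
  \log p\left(w_t \mid w_{t-\mathrm{win}}, \ldots, w_{t+\mathrm{win}}, v_i\right)

% Assumed softmax over the unnormalized log-probabilities Pr
p\left(w_t \mid w_{t-\mathrm{win}}, \ldots, w_{t+\mathrm{win}}, v_i\right)
  = \frac{e^{\mathrm{Pr}_{w_t}}}{\sum_{n=1}^{N_w} e^{\mathrm{Pr}_{n}}}
```

Under this reading, hierarchical softmax is an efficient way of evaluating the second expression without summing over every word.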

Step 4.3, calculating the Euclidean distance between the text paragraph representation vector of a patent and the text paragraph representation vector of each patent it cites:

Step 4.4, collecting the Euclidean distances of all |R| patent citation pairs in the patent citation network, ranking them, and calculating the semantic novelty S_i of the patent:
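Steps 4.3 and 4.4 can be illustrated with the sketch below. The Euclidean distance follows the text directly. The novelty formula itself is not reproduced here, so the score used in the sketch (the mean percentile rank of a patent's citation-pair distances among all |R| pairs) is only an assumption, as are the function names, patent labels, and vectors.

```python
import math
from bisect import bisect_right


def euclidean(u, v):
    """Euclidean distance between two text representation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def semantic_novelty(vectors, citations):
    """vectors: patent -> embedding; citations: list of (citing, cited)."""
    distances = [euclidean(vectors[i], vectors[j]) for i, j in citations]
    ordered = sorted(distances)
    per_patent = {}
    for (citing, _), d in zip(citations, distances):
        # ASSUMPTION: rank a pair by the share of all pairs that are at
        # most as far apart; the patent's own formula may differ.
        pct = bisect_right(ordered, d) / len(ordered)
        per_patent.setdefault(citing, []).append(pct)
    # ASSUMPTION: a patent's novelty is the mean rank of its citation pairs.
    return {p: sum(r) / len(r) for p, r in per_patent.items()}


# Hypothetical two-dimensional embeddings and citation pairs.
vectors = {"A": [0, 0], "B": [3, 4], "C": [0, 1], "D": [6, 8]}
citations = [("A", "B"), ("A", "C"), ("D", "B")]

novelty = semantic_novelty(vectors, citations)
print(novelty)
```

The intuition carried by either formulation is the same: a patent whose text lies far from the patents it cites is treated as semantically more novel.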

Step 5, generating the node feature matrix X from the optimal candidate indicator combination obtained in step 3 and the semantic novelty calculated in step 4, where n_1 = |V|; establishing the patent citation adjacency matrix A, which stores the citation information between nodes; and using an encoder ε to obtain the final node feature representations, comprising the following steps:

Step 5.1, inputting the node feature matrix X and obtaining the local representations of the nodes in the positive sample by integrating the neighborhood information of each target node through the graph convolutional network ε; the information integration process is:

where \hat{D} is the degree matrix of \hat{A}; H_l is the feature representation learned at layer l; W_l is the learnable parameter of the l-th layer of the graph convolutional network; for the input layer l = 0, H_0 = X; and σ is a nonlinear activation function;
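One layer of the information integration in step 5.1 can be sketched as below. The propagation rule is not reproduced in this text, so the sketch assumes the widely used graph convolution rule H_{l+1} = σ(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H_l W_l) with \hat{A} = A + I, a symmetric adjacency matrix, and ReLU as σ. The function names and the two-node graph are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, h, w):
    """One assumed graph convolution step with ReLU activation."""
    n = len(adj)
    # ASSUMPTION: self-loops are added, i.e. A_hat = A + I.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D_hat^(-1/2) A_hat D_hat^(-1/2).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU as sigma


# Hypothetical network: two patents linked by one citation (symmetrized).
adj = [[0.0, 1.0], [1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0]]   # H_0 = X
w0 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only

h1 = gcn_layer(adj, x, w0)
print(h1)
```

Each node's new representation mixes its own features with those of its citation neighbors, which is what gives the representation its local character.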

Step 5.2, using a corruption function \mathcal{C} to modify the nodes in the network to obtain negative samples, and generating the local node representations \tilde{h}_j of the negative samples with the same information integration method as in step 5.1;

Step 5.3, computing the global representation of the network from the local node representations h_i of the positive sample through a readout function \mathcal{R}:

where N is the number of positive samples;

Step 5.4, using a discriminator \mathcal{D} to distinguish the local representations of positive and negative samples:

Step 5.5, minimizing the final loss function L_n and updating the final representation h_i of each patent node in the positive sample:

where N_n is the number of negative samples; \tilde{h}_j is a negative sample representation; s is the global representation of the network; E_(·)[·] denotes the expected value of the function [·]; and \log \mathcal{D}(·) denotes the logarithm of the value given by equation (10);
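Steps 5.2 to 5.5 can be sketched as follows. The formulas are not reproduced in this text, so the sketch assumes common choices from deep graph infomax style training: row shuffling as the corruption function, a sigmoid of the mean node representation as the readout, a bilinear discriminator sigmoid(h^T W s), and a binary cross-entropy loss averaged over N + N_n samples. All function names and numbers are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def corrupt(x, seed=0):
    """Assumed corruption: shuffle the rows of the node feature matrix."""
    rows = [row[:] for row in x]
    random.Random(seed).shuffle(rows)
    return rows


def readout(h_pos):
    """Assumed readout: sigmoid of the mean positive node representation."""
    n = len(h_pos)
    return [sigmoid(sum(row[k] for row in h_pos) / n) for k in range(len(h_pos[0]))]


def discriminator(h, w, s):
    """Assumed bilinear score sigmoid(h^T W s) in (0, 1)."""
    ws = [sum(w[r][c] * s[c] for c in range(len(s))) for r in range(len(w))]
    return sigmoid(sum(a * b for a, b in zip(h, ws)))


def contrastive_loss(h_pos, h_neg, w):
    """Binary cross-entropy: positives should score 1, negatives 0."""
    s = readout(h_pos)
    total = sum(math.log(discriminator(h, w, s)) for h in h_pos)
    total += sum(math.log(1.0 - discriminator(h, w, s)) for h in h_neg)
    return -total / (len(h_pos) + len(h_neg))


# Hypothetical local representations of positive and negative samples.
h_pos = [[1.0, 1.0], [1.0, 1.0]]
h_neg = [[-1.0, -1.0], [-1.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

loss_informative = contrastive_loss(h_pos, h_neg, identity)
loss_uninformative = contrastive_loss(h_pos, h_neg, zeros)
print(loss_informative, loss_uninformative)
```

Minimizing this loss pushes the discriminator to score real node and summary pairs high and corrupted pairs low, which is how mutual information between local and global representations is maximized in this family of methods.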

Step 6, predicting the patent value: the final patent node representations are input into the XGBoost machine learning model to predict the value of each patent and obtain the score prediction \hat{y}_i;

for a patent sample i, its final node representation h_i is input to obtain the prediction, calculated as:

\hat{y}_i = \sum_{k=1}^{K} f_k(h_i)

where f_k is the k-th decision tree in the XGBoost model, K is the number of trees in the model, and f_k(h_i) is the predicted value of patent sample i on the k-th tree.

The beneficial effects of the invention are as follows. The invention provides a patent value evaluation method based on deep graph learning and semantic learning. First, in the indicator screening process, patent transfer status is combined with the construction of a patent value evaluation indicator system, which yields an objective, fair, and highly operable method for feature selection. Second, the novelty of a patent is calculated through text semantic learning, so that patent value is measured from a semantic perspective. Third, deep graph learning is used to learn node feature representations by maximizing the mutual information between local and global representations, and patent value is then evaluated. The method overcomes shortcomings of traditional approaches to patent value evaluation and introduces the novelty of the patent text as a measure of patent value. Experimental results show that the method achieves higher accuracy and reliability. The invention thus provides a new method for evaluating patent value and a new approach for research on patent value.

Drawings

FIG. 1 is a flow chart of the patent value evaluation method based on deep graph and semantic learning according to the present invention.

FIG. 2 is a flow chart of indicator screening.

Detailed Description

The specific embodiments of the present invention are further described below with reference to the drawings and the technical scheme.

In this embodiment, 2209 patents in the biopharmaceutical field that have been published for more than 5 years are taken as an example, and the indicators and criterion layers for patents published for more than 5 years are used to construct the patent value evaluation model and verify its validity. Of these, 1473 patent samples are used to construct the value evaluation model, and 736 patent samples are used to evaluate patent value and verify the effectiveness of the model. The technical scheme of the invention is implemented through the following steps:

1. Constructing the patent citation network from real patent publication and citation information.

2. Selecting the candidate indicators and constructing the criterion layers according to the characteristics of the patent indicators for the given publication age.

3. Standardizing the indicator data of the patent samples by max-min standardization to eliminate the influence of dimension.

4. Calculating the K-S test statistic D value of each single indicator.

The ability of an indicator to discriminate the transfer status of patents is measured by the D value of the candidate indicator: the larger the D value, the greater the difference between transferred and non-transferred patents on that indicator, and the better the indicator identifies whether a patent has been transferred. The calculation of the D value of a single indicator is described below, taking the indicator "number of specification pages" as an example. For ease of understanding, the standardized values of "number of specification pages" are assumed to be 1, 0.5, and 0.

(4.1) Each value of "number of specification pages" corresponds to one or more patents; patents with the same indicator value form a patent group, and the patent groups are arranged in descending order of indicator value. The indicator values are listed in row 2 of Table 2, and the patent group numbers in row 1 of Table 2.

(4.2) The number of transferred patents and the number of non-transferred patents in each patent group are counted and listed in rows 3 and 4 of Table 2, respectively.

(4.3) The number of transferred patents and the number of non-transferred patents in each cumulative patent group are calculated.

The patent group with the highest indicator value is taken as the first cumulative patent group, and the group with the next lower indicator value is added each time; that is, the first two patent groups form the second cumulative patent group, and the first three patent groups form the third cumulative patent group. The numbers of transferred and non-transferred patents in each cumulative patent group are listed in rows 5 and 6 of Table 2, respectively.

(4.4) The cumulative frequency of transferred patents and the cumulative frequency of non-transferred patents in each cumulative patent group are calculated.

The cumulative frequency of transferred patents is obtained by dividing the cumulative number of transferred patents in row 5 of Table 2 by the total number of transferred patents (the last column of row 5 of Table 2), and is listed in row 7 of Table 2. Similarly, the cumulative frequency of non-transferred patents is obtained by dividing the cumulative number of non-transferred patents by the total number of non-transferred patents, and is listed in row 8 of Table 2.

(4.5) The difference d between the two cumulative frequencies in each cumulative patent group is calculated as d = |cumulative frequency of transferred patents - cumulative frequency of non-transferred patents| and listed in row 9 of Table 2.

(4.6) The K-S test statistic D value of the single indicator is determined.

The K-S test statistic D value is the maximum of the differences d between the cumulative frequency of transferred patents and the cumulative frequency of non-transferred patents, that is, D = max(d); the resulting D value is listed in row 10 of Table 2.

Table 2. Calculation of the K-S test statistic D value

5. Deleting indicators that carry redundant information (first screening of indicators).

The correlation coefficient between every two indicators in the same criterion layer is calculated, and in each pair whose correlation coefficient exceeds 0.7 the indicator with the smaller D value is deleted. This avoids information redundancy in the indicator system and also avoids mistakenly deleting indicators with a strong ability to discriminate transfer status. The correlation coefficient between indicator q and indicator j is calculated as:

r_{qj} = \frac{\sum_{i} (x_{iq} - \bar{x}_q)(x_{ij} - \bar{x}_j)}{\sqrt{\sum_{i} (x_{iq} - \bar{x}_q)^2 \sum_{i} (x_{ij} - \bar{x}_j)^2}}

where r_{qj} is the correlation coefficient between the q-th indicator and the j-th indicator; x_{iq} is the value of the q-th indicator of the i-th patent; \bar{x}_q is the mean of the q-th indicator; x_{ij} is the value of the j-th indicator of the i-th patent; and \bar{x}_j is the mean of the j-th indicator.

Through correlation analysis, 9 indicators in total, such as "number of domestic patents cited" and "number of foreign patents cited", are deleted from the indicator system for patents published for more than 5 years, leaving 20 indicators.

6. Weighting the indicators based on their D values.

Weights are assigned on the principle that the larger the K-S test statistic D value of an indicator, that is, the stronger its ability to discriminate transfer status, the larger its weight. The weighting formula is:

w_j = \frac{D_j}{\sum_{m=1}^{k} D_m}

where w_j is the weight of the j-th indicator; D_j is the K-S test statistic D value of the j-th indicator, which represents the indicator's ability to discriminate transfer status; and k is the number of indicators to be weighted, here k = 20, with j = 1, 2, …, 20.

7. Calculating the patent value score.

The economic value score of the patent is calculated by linear weighting:

Z = \sum_{j=1}^{k} w_j x_j

where Z is the patent value score; w_j is the weight of the j-th indicator; k is the number of weighted indicators, here k = 20, with j = 1, 2, …, 20; and x_j is the standardized value of the j-th indicator of the patent to be evaluated.

8. Calculating the D value of the patent value score and performing the second screening of the indicator system.

(8.1) Calculating D^{20} of the evaluation indicator system formed by the 20 indicators remaining after the first screening.

Following the method for the D value of a single indicator, the value D^{20} of the patent value scores produced by the 20-indicator system is calculated. The calculation of D^{20} parallels that of the single-indicator D value, except that the standardized value of the single indicator is replaced by the patent value score.

(8.2) Determining the maximum value D^{19}_{max}.

After D^{20} of the 20 indicators is obtained, one indicator is removed at a time and the value D^{19} of the system formed by the remaining 19 indicators is calculated. There are 20 such combinations of 19 indicators, and the maximum value D^{19}_{max} is selected from the 20 resulting D^{19} values.

(8.3) Selecting the indicator system with the stronger ability to discriminate patent transfer, that is, the larger D value.

When D^{19}_{max} > D^{20}, the indicator system formed by the 19 indicators left after one indicator is removed from the 20 indicators discriminates transferred patents from non-transferred patents better. The 19-indicator system is therefore retained.

(8.4) Repeating steps (8.2) and (8.3) and continuing to delete indicators until D^{k-1}_{max} < D^{k}, at which point the screening of indicators stops.

That is, once removing any one of the k indicators weakens the ability of the system formed by the remaining k-1 indicators to discriminate patent transfer, the k-indicator system is retained and indicator screening terminates.

After the second screening, 9 indicators, such as the number of IPC subclasses and the number of figures, are deleted from the indicator system for patents published for more than 5 years, leaving 11 indicators; the system formed by these remaining indicators is the indicator system with a strong ability to discriminate patent transfer.

9. Calculating the semantic novelty of the patent.

(9.1) A corpus T = {t_1, t_2, …, t_i} is established from the titles and abstracts of the patents, where t_i is the text of patent i. Each column vector of matrix V uniquely represents one text paragraph, and each column vector of matrix W uniquely represents one word in the paragraph. The following objective is maximized under a fixed-length window win:

where M is the number of all training words and v_i is the document representation vector that contains the context words of the current window. The probability of the next word in the document is predicted using hierarchical softmax:

The output log-probability of each word is calculated as:

Pr = U a(w_{t-|win|}, …, w_{t-1}, w_{t+1}, …, w_{t+|win|}, v_i; W, V) + b

where U and b are the softmax parameters and a is obtained by averaging the word vectors w_t and the paragraph vector v_i; the PV-DM model is used to vectorize each patent text in the latent space R^k, yielding the patent text representation matrix V.

(9.2) The distance between the vector of a patent and the vector of each patent it cites is calculated:

(9.3) The distances of all citation pairs are collected and ranked, and the semantic novelty score S_i of the patent is calculated:

10. Generating the node feature matrix X from the screened indicators and the calculated semantic novelty, where n_1 = |V|; establishing the adjacency matrix A, which stores the citation information between nodes; and using an encoder ε to obtain the final node feature representations, comprising the following steps:

(10.1) The feature matrix X is input, and the node representations in the positive sample are obtained by integrating the neighborhood information of each target node through the graph convolutional network ε:

where \hat{D} is the degree matrix of \hat{A}, and H_l is the feature representation learned at layer l.

(10.2) A corruption function \mathcal{C} is used to modify the nodes in the network to obtain negative samples, and the representations \tilde{h}_j of the negative samples are generated in the same way as in step (10.1).

(10.3) The global representation of the network is computed from the local node representations through a readout function \mathcal{R}:

where N is the number of positive samples.

(10.4) A discriminator \mathcal{D} is used to distinguish the local representations of positive and negative samples:

(10.5) The final loss function is calculated:

where N_n is the number of negative samples.

(10.6) The loss function is minimized to generate the representation h_i of each patent node.

11. Predicting the patent value. The patent node representations are input into the value prediction model XGBoost to obtain the score prediction. For a sample i, its feature representation h_i is input to obtain the prediction, calculated as:

\hat{y}_i = \sum_{k=1}^{K} f_k(h_i)

where f_k is the k-th decision tree and K is the number of trees in the model.

The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any insubstantial changes and substitutions made by those skilled in the art based on the present invention are included in the scope of the present invention claimed in the claims.

Claims

1. A patent value evaluation method based on deep graph and semantic learning, characterized by comprising the following steps:

Table 1. Criterion layers and candidate indicator system

Step 3.1, standardizing the candidate indicator data for patent value evaluation;

Step 3.2, calculating the D value of each single indicator;

for each candidate indicator, the maximum difference between the cumulative frequency of transferred patents and the cumulative frequency of non-transferred patents in the existing patent data set is calculated, which gives the K-S test statistic D value of that candidate indicator;

Step 3.3, calculating the correlation coefficients between indicators within the same criterion layer;

the correlation coefficient between every two indicators in the same criterion layer is calculated to identify indicator pairs that carry redundant information; in each pair whose correlation coefficient exceeds 0.7, the indicator with the smaller D value is deleted, which completes the first screening of the candidate indicators; the remaining k candidate indicators form an indicator system;

Step 3.4, calculating the economic value score of the patent;

the patent economic value score is calculated by formula (2):

Z = \sum_{j=1}^{k} w_j x_j \qquad (2)

where w_j is the weight of the j-th candidate indicator; D_j is the K-S test statistic D value of the j-th indicator; k is the number of candidate indicators to be weighted, that is, the number of candidate indicators remaining after the first screening, with j = 1, 2, …, k; Z is the economic value score of the patent; and x_j is the standardized value of the j-th candidate indicator of the patent to be evaluated;

Step 3.5, calculating the K-S test statistic D value of the indicator system;

by analogy with the D value of a single candidate indicator, the K-S test statistic D value of the patent economic value score produced by the indicator system is calculated;

Step 4.1, establishing a corpus T = {t_1, t_2, …, t_i} from the titles and abstracts of the patents, where t_i is the text of patent i, that is, the text consisting of its title and the abstract of its specification; each column vector of the paragraph vector matrix V uniquely represents the text paragraph of one patent, and each column vector of the word vector matrix W uniquely represents one word in the patent text paragraphs;

Step 4.2, using the average of the paragraph vector and the word vectors, taken from the corresponding columns of the paragraph vector matrix and the word vector matrix, to predict the probability of the next word in text paragraph t_i, thereby obtaining the paragraph representations and word representations; given a training word sequence w_1, w_2, …, w_{|T|} and paragraph vector v_i, the following objective is maximized under a fixed-length window win:

where N_w is the total number of words in the training word sequence and Pr is the output log-probability, calculated as:

Pr = U a(w_{t-|win|}, …, w_{t-1}, w_{t+1}, …, w_{t+|win|}, v_i; W, V) + b \qquad (5)

where U and b are the softmax parameters and a is obtained by averaging the word vectors w_t and the paragraph vector v_i; the PV-DM model is used to vectorize the text paragraph of each patent in the latent space R^k, yielding the final patent text representation matrix V;