CN101625680A - Document retrieval method in patent field - Google Patents


Info

Publication number
CN101625680A
Authority
CN
China
Prior art keywords
text
similarity
classification
rightarrow
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810012248A
Other languages
Chinese (zh)
Other versions
CN101625680B (en
Inventor
朱靖波
王会珍
曹菲菲
肖桐
李天宁
宋国龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN200810012248A priority Critical patent/CN101625680B/en
Publication of CN101625680A publication Critical patent/CN101625680A/en
Application granted granted Critical
Publication of CN101625680B publication Critical patent/CN101625680B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a document retrieval method for the patent field, comprising the following steps: preprocessing the query text and the patent texts; retrieving the patent texts relevant to the query text, computing several similarity values with different similarity calculation methods, combining these values into a new similarity, and ranking the patent texts by the new similarity value; applying several different decision methods to map the similarity ranking of the patent texts into different relevance rankings of patent classes; integrating the resulting patent class relevance rankings and re-ranking them to obtain a new patent class relevance ranking; and selecting from the new ranking the patent class most relevant to the query text. The method uses multiple similarity calculation methods as the final measure of relevance between the query text and the patent texts, exploits feature information from multiple angles, and combines multiple systems so that they complement one another, which improves system performance.

Description

Document retrieval method for the patent field
Technical field
The present invention relates to a data retrieval method, in particular a document retrieval method for the patent field.
Background art
With the rapid development of science and technology, the volume of documents recording scientific and technological achievements has grown dramatically, and patents receive more and more attention as one of the most important means of protecting intellectual property. Patent texts record the technical solutions of the newest inventions, but scientific and technological achievements are also recorded in non-patent texts such as research papers and technical reports. Patents and non-patent texts are related; for example, studying the relation between research papers and patents makes it possible to predict technology trends. Studying patent documents together with non-patent research literature makes it possible to follow the latest technology in each field, and thereby avoid duplicated development and infringement, and even to analyze the development of a whole technology industry; to analyze competitors' research status and strategy; and to carry out invalidity searches against patents. Retrieval across patent and non-patent literature is a relatively new problem in patent research.
Patent texts usually cite related patents or research papers, but studying the relation between non-patent literature and patent texts through these citation links alone is very limited. Moreover, patent databases hold millions of patent documents, and handling them purely by hand is time-consuming and laborious. How to retrieve relevant patents from a huge patent database and obtain useful patent information is a difficult problem in patent research.
There are currently two kinds of patent retrieval and classification techniques: retrieval based on an already-classified patent database, and retrieval based on natural language processing techniques.
Most early patent retrieval methods are database-based. For example, the patent with publication number CN1996290A mainly uses the structured text information of patents to extract patent citation relations and build a patent association graph. Then, given a patent query condition such as application number, patent number, filing date, announcement date, inventor, or patentee, it retrieves the matching patents and the patents linked to them in the association graph. This method depends on the fixed structured text of the patent itself; it is not intelligent enough and does not analyze the patent content.
Methods based on natural language processing analyze the content of the patent text: from the title, abstract, description, claims, and other text of the patent they obtain useful features that characterize the patent, assign weights to the features, and retrieve the relevant patent texts. For example, the article "Some Issues in the Automatic Classification of U.S. Patents" (by Leah S. Larkey, presented at the AAAI-98 text classification workshop) describes a method that applies natural language processing techniques to patent classification. The article "POSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval" (by In-Su Kang, Seung-Hoon Na, Jun-Ki Kim, and Jong-Hyeok Lee, published in Proceedings of NTCIR-5 Workshop Meeting, December 6-9, 2005, Tokyo, Japan) applies natural language processing techniques to patent retrieval.
However, existing methods are limited to keyword retrieval and only address retrieval among patent texts. They do not consider the relations between non-patent texts and patent texts or between non-patent texts and patent classes, and therefore cannot provide intelligent full-text retrieval across non-patent and patent texts.
Summary of the invention
Existing document retrieval for the patent field does not consider the relations between non-patent texts and patent texts or between non-patent texts and patent classes, and cannot provide intelligent full-text retrieval across non-patent and patent texts. To address this shortcoming, the technical problem to be solved by the present invention is to provide a patent retrieval method that represents patent texts as feature vectors, calculates the similarity between a non-patent text and the relevant patent texts, and retrieves the most relevant patent texts.
To solve the above technical problem, the patent retrieval method of the present invention, which is based on natural language processing techniques, comprises the following steps:
Preprocess the query text and the patent texts;
Retrieve the patent texts relevant to the query text, compute several similarity values using different similarity calculation methods, combine these values to recompute the similarity, and rank the patent texts by the new similarity value;
Apply several different decision methods to convert the similarity ranking of the patent texts into different patent class relevance rankings; integrate these rankings and re-rank to obtain a new patent class relevance ranking;
From the new patent class relevance ranking, select the patent class most relevant to the query text.
Text processing comprises preprocessing the text, obtaining candidate feature words, collecting statistics on the feature words, selecting features with a feature selection method, and converting the text into a vector representation. Specifically: remove the markup in the patent text that is not part of the patent text, and extract the patent text information, namely the patent number, patent IPC classification mark, patent title, abstract, claims, and description; for English text, keep all-caps words; remove words containing digits; remove stop words; lemmatize English text to obtain the feature candidate vocabulary; collect statistics on the feature candidate vocabulary to obtain term frequency, document frequency, and per-class word frequency information; select the feature vocabulary from the candidate feature words, compute the feature weight of each feature word in the feature vocabulary, and convert the patent texts and the query text into computable vectors according to the feature words and their weights.
The multiple similarity calculation methods yield similarity values between the query text and a patent text; these values are integrated with a log-linear model, computed as follows:
$$\mathrm{Sim}(\vec{D}_1, \vec{D}_2) = \frac{\exp\left(\vec{\theta} \cdot \vec{S}(\vec{D}_1, \vec{D}_2)\right)}{\sum_{k=0}^{n} \exp\left(\vec{\theta} \cdot \vec{S}(\vec{D}_1, \vec{d}_k)\right)}$$
where $\vec{S}(\vec{D}_1, \vec{D}_2)$ is the feature vector formed by the similarity values between query text $\vec{D}_1$ and patent text $\vec{D}_2$ obtained with the different similarity calculation methods, $\vec{\theta}$ is the weight vector for those similarity values, $n$ is the total number of patent texts relevant to the query text, and $\vec{d}_k$ is the $k$-th relevant patent text vector.
The multiple decision methods comprise the patent class weighted similarity summation method, the patent text similarity rank-position weighted summation method, and the patent text similarity summation method. The patent class weighted similarity summation is computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} (k_r)^{c_i} \times \mathrm{ICF} \times \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
$$\mathrm{ICF} = \log\left(\frac{N + 0.5}{C_x + 0.5}\right)$$
where $k_r$ is a penalty factor constant, $k$ is the number of candidate patent texts in the patent text similarity ranking, $c_i$ is the position that the patent class of candidate patent text $i$ obtains in the similarity ranking, $\mathrm{score}_{d_i}$ is the similarity value between the query text and patent text $d_i$, ICF is the inverse class frequency, $C_x$ is the number of texts in class $x$, $N$ is the total number of texts, $\mathrm{score}(x)$ is the relevance value between the query text and patent class $x$, and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
The patent text similarity rank-position weighted summation is computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} (k_t)^{i} \times \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
One way to integrate the multiple patent class relevance rankings is to take the rankings produced by combining the different similarity values with the different class decision methods, use them as features of the patent class position, and combine them with a Rank-SVM model.
Another way to integrate the multiple patent class relevance rankings is to sum, for each class, the positions at which it appears in the different rankings, and to use this sum to compute the new patent class relevance value.
The present invention has the following beneficial effects and advantages:
1. The method applies natural language processing techniques and uses multiple similarity calculation methods as the final measure of relevance between the query text and the patent texts, making full use of feature information from multiple angles. Finally, combining multiple systems lets them complement one another and improves system performance.
Description of drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a flowchart of text preprocessing;
Fig. 3 is a flowchart of the similarity calculation between the query text and a patent text;
Fig. 4 is a flowchart of the relevance calculation between the query text and a patent class.
Embodiment
The method of the present invention is further described below with reference to an embodiment and the accompanying drawings:
As shown in Fig. 1, a document retrieval method for the patent field comprises the following steps:
Preprocess the query text and the patent texts; retrieve the patent texts relevant to the query text, compute several similarity values using different similarity calculation methods, combine these values to recompute the similarity, and rank the patent texts by the new similarity value; apply several different decision methods to convert the similarity ranking of the patent texts into different patent class relevance rankings, integrate these rankings, and re-rank to obtain a new patent class relevance ranking; from the new patent class relevance ranking, select the patent class most relevant to the query text.
As shown in Fig. 2, preprocessing the query text and the patent texts comprises the following steps:
A) Remove the markup in the patent text that is not part of the patent text, and extract the patent text information: patent number, patent IPC classification mark, patent title, abstract, claims, and description. In the extracted text, remove non-letter or non-Chinese symbols inside words, for example '-', ',', '(', ')'. For English text, keep all-caps words. Remove words containing digits. Remove stop words, for example "claim" and "said" in English patents and "step" and "feature" in Chinese patents, as well as prepositions, adverbs, articles, and the like. Lemmatize English text to obtain the feature candidate vocabulary;
B) Collect statistics on the feature candidate vocabulary to obtain term frequency, document frequency, and per-class word frequency information;
C) Select the feature vocabulary from the candidate feature words, compute the feature weight of each feature word in the feature vocabulary, and convert the patent texts and the query text into computable vectors according to the feature words and their weights;
D) Using the feature words of the patents as index terms, build an inverted index over the patent documents and patent text vectors for storage.
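Steps A) to D) can be sketched for English text as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stop-word list is a hypothetical stand-in (the source names only "claim" and "said" as examples), symbols are replaced by spaces rather than deleted, lemmatization and feature selection are omitted, and the function names are invented for this sketch. The last function shows how such an index could be used to find patents that share at least one feature word with a query.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the source gives only "claim" and "said" as examples.
STOP_WORDS = {"claim", "said", "the", "a", "an", "of", "in", "and", "to"}


def tokenize(text):
    """Turn raw English text into candidate feature words (step A, simplified)."""
    # Replace symbols such as '-', ',', '(' and ')' with spaces (an assumption;
    # the source only says such symbols are removed). Digits are kept for now
    # so that words containing numbers can be dropped below.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    tokens = []
    for word in cleaned.split():
        if any(ch.isdigit() for ch in word):
            continue  # remove words that contain digits
        if not word.isupper():
            word = word.lower()  # all-caps words (e.g. acronyms) are kept as-is
        if word.lower() in STOP_WORDS:
            continue  # remove stop words
        tokens.append(word)
    return tokens


def build_inverted_index(docs):
    """Map each feature word to {patent id: term frequency} (steps B and D, simplified)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def candidate_patents(index, query_text):
    """Return the ids of patents sharing at least one feature word with the query."""
    hits = set()
    for word in set(tokenize(query_text)):
        hits.update(index.get(word, {}))
    return hits
```

A real system would add a lemmatizer and a Chinese word segmenter before indexing; both are left out here because the source does not name specific tools.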
As shown in Fig. 3, the similarity calculation with multiple methods comprises the following steps:
Find in the patent text collection the patent texts that share feature words with the query text; these form the relevant patent text set.
Compute the similarity between each patent in the relevant patent text set and the query text. This embodiment uses several similarity calculation methods, including the vector cosine method, the BM25 method, and the SMART method, calculated as follows:
1. Vector cosine method
The query text $\vec{D}_1$ and the patent text $\vec{D}_2$ are represented in the vector space model; the cosine of the two vectors is:
$$\cos(\vec{D}_1, \vec{D}_2) = \frac{\vec{D}_1 \cdot \vec{D}_2}{\|\vec{D}_1\| \cdot \|\vec{D}_2\|}$$
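The cosine formula can be sketched for sparse vectors as follows. Storing each vector as a dictionary from feature word to weight, and returning 0 for an empty vector, are assumptions of this sketch rather than details given in the source.

```python
import math


def cosine_similarity(query_vec, patent_vec):
    """Cosine of two sparse vectors given as {feature word: weight} dicts."""
    # Only feature words present in both vectors contribute to the dot product.
    dot = sum(w * patent_vec[t] for t, w in query_vec.items() if t in patent_vec)
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_p = math.sqrt(sum(w * w for w in patent_vec.values()))
    if norm_q == 0.0 or norm_p == 0.0:
        return 0.0  # convention for empty vectors; not specified in the source
    return dot / (norm_q * norm_p)
```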
2. BM25 method
BM25 has many variants; the formula used in this embodiment is:
$$\mathrm{score}(\vec{D}_1, \vec{D}_2) = \sum_{i=1}^{n} \mathrm{IDF}(t_i) \cdot \frac{f(t_i, \vec{D}_2) \cdot (k_1 + 1)}{f(t_i, \vec{D}_2) + k_1 \cdot \left(1 - b + b \cdot \frac{|\vec{D}_2|}{\mathrm{avgdl}}\right)}$$
where $n$ is the number of feature words in the query text $\vec{D}_1$; $f(t_i, \vec{D}_2)$ is the number of times feature word $t_i$ occurs in the patent text $\vec{D}_2$; $|\vec{D}_2|$ is the text length of the patent text $\vec{D}_2$; avgdl is the average text length in the set of patent texts relevant to the query text; $k_1$ and $b$ are free parameters, set in this embodiment to $k_1 = 2.0$ and $b = 0.75$; $\mathrm{IDF}(t_i)$ is the inverse document frequency, which serves as the weight of term $t_i$ and is computed as:
$$\mathrm{IDF}(t_i) = \log \frac{N - n(t_i) + 0.5}{n(t_i) + 0.5}$$
where $N$ is the total number of documents in the whole data set and $n(t_i)$ is the number of documents containing term $t_i$.
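The BM25 variant above can be sketched as follows, with the embodiment's values k1 = 2.0 and b = 0.75 as defaults. The argument names and the use of the natural logarithm are assumptions of this sketch; the source does not state the log base.

```python
import math


def idf(term, doc_freq, num_docs):
    """IDF(t) = log((N - n(t) + 0.5) / (n(t) + 0.5))."""
    n_t = doc_freq.get(term, 0)
    return math.log((num_docs - n_t + 0.5) / (n_t + 0.5))


def bm25_score(query_terms, patent_tf, patent_len, avgdl, doc_freq, num_docs,
               k1=2.0, b=0.75):
    """BM25 score of one patent text for the feature words of a query.

    query_terms: feature words of the query text
    patent_tf:   {feature word: occurrences in the patent text}
    patent_len:  length of the patent text
    avgdl:       average length over the relevant patent text set
    doc_freq:    {feature word: number of documents containing it}
    num_docs:    total number of documents in the data set
    """
    score = 0.0
    for term in query_terms:
        f = patent_tf.get(term, 0)
        if f == 0:
            continue  # a term absent from the patent contributes nothing
        denom = f + k1 * (1 - b + b * patent_len / avgdl)
        score += idf(term, doc_freq, num_docs) * f * (k1 + 1) / denom
    return score
```

With this IDF definition a term that occurs in more than half of the documents receives a negative weight, which is worth keeping in mind when the scores are normalized later.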
3. SMART method
The SMART formula is:
$$\mathrm{Sim}_{\mathrm{SMART}} = \sum_{t \in T} (\vec{D}_1 \times \vec{D}_2)$$
The weight $w_i$ of each feature dimension of the query text vector $\vec{D}_1$ is computed as:
$$w_i = (1 + \log(tf_i)) \times \log \frac{N + 1}{n}$$
The weight $w_i$ of each feature dimension of the patent text vector $\vec{D}_2$ is computed as:
$$w_i = \frac{1 + \log(tf_i)}{1 + \log(avtf)} \times \frac{1}{0.8 + 0.2 \, \frac{utf}{pivot}}$$
where $T$ is the set of feature words that occur in both the query text $\vec{D}_1$ and the patent text $\vec{D}_2$; $tf_i$ is the term frequency of the $i$-th feature word in the text vector; $N$ is the number of texts in the whole patent text collection, and $n$ is the number of patent texts in which the $i$-th feature occurs; $avtf$ is the average term frequency of feature words per document in the relevant patent text set; $utf$ is the number of feature words in the patent text vector $\vec{D}_2$; $pivot$ is the average number of feature words per document in the whole patent text collection.
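A minimal sketch of the SMART weighting follows. It reads the sum over $T$ as the sum, over shared feature words, of the product of the query-side and patent-side weights; that reading, the natural logarithm, and taking utf as the number of distinct feature words are assumptions of this sketch. The values avtf and pivot are passed in as precomputed statistics because the source does not spell out how they are aggregated.

```python
import math


def query_weight(tf, num_docs, doc_freq):
    """Query-side weight: (1 + log tf) * log((N + 1) / n)."""
    return (1 + math.log(tf)) * math.log((num_docs + 1) / doc_freq)


def patent_weight(tf, avtf, utf, pivot):
    """Patent-side weight: (1 + log tf) / (1 + log avtf) / (0.8 + 0.2 * utf / pivot)."""
    return (1 + math.log(tf)) / (1 + math.log(avtf)) / (0.8 + 0.2 * utf / pivot)


def smart_similarity(query_tf, patent_tf, doc_freq, num_docs, avtf, pivot):
    """Sum of weight products over feature words shared by query and patent.

    query_tf, patent_tf: {feature word: term frequency}
    doc_freq:            {feature word: number of patent texts containing it}
    """
    utf = len(patent_tf)  # number of distinct feature words in the patent (assumption)
    shared = set(query_tf) & set(patent_tf)
    return sum(
        query_weight(query_tf[t], num_docs, doc_freq[t])
        * patent_weight(patent_tf[t], avtf, utf, pivot)
        for t in shared
    )
```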
The similarity between the query text and each patent text is computed with each of the three methods.
The similarity values obtained from each method are normalized to values between 0 and 1.
The logarithm of each normalized similarity value is taken.
The log similarity values serve as the features of the log-linear model, computed as follows:
$$\mathrm{Sim}(\vec{D}_1, \vec{D}_2) = \frac{\exp\left(\vec{\theta} \cdot \vec{S}(\vec{D}_1, \vec{D}_2)\right)}{\sum_{k=0}^{n} \exp\left(\vec{\theta} \cdot \vec{S}(\vec{D}_1, \vec{d}_k)\right)}$$
where $\vec{S}(\vec{D}_1, \vec{D}_2)$ is the feature vector formed by the similarity values between query text $\vec{D}_1$ and patent text $\vec{D}_2$ obtained with the different similarity calculation methods, $\vec{\theta}$ is the weight vector for those similarity values, $n$ is the total number of patent texts relevant to the query text, and $\vec{d}_k$ is the $k$-th relevant patent text vector.
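The normalize, take logarithms, and combine steps can be sketched as follows. The source does not say which normalization is used or how the weight vector is obtained, so this sketch assumes min-max scaling with a small floor (so that the logarithm is defined) and takes the weights as given; the function names are invented for illustration.

```python
import math


def normalize(scores, eps=1e-6):
    """Min-max scale scores into [eps, 1]; the floor keeps log() defined (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [max((s - lo) / (hi - lo), eps) for s in scores]


def log_linear_combine(score_lists, theta):
    """Combine several similarity measures with a log-linear model.

    score_lists[m][k] is the raw score of method m for candidate patent k.
    theta[m] is the weight of method m (assumed to be given, e.g. tuned elsewhere).
    Returns one combined similarity per candidate; the values sum to 1.
    """
    # Features: logarithm of each normalized similarity value.
    feats = [[math.log(v) for v in normalize(scores)] for scores in score_lists]
    num_patents = len(score_lists[0])
    # theta . S(D1, d_k) for every candidate patent k.
    dots = [
        sum(theta[m] * feats[m][k] for m in range(len(theta)))
        for k in range(num_patents)
    ]
    # Subtracting the maximum avoids overflow and leaves the ratios unchanged.
    shift = max(dots)
    exps = [math.exp(d - shift) for d in dots]
    total = sum(exps)
    return [e / total for e in exps]
```

Sorting the candidates by the returned values in descending order gives the new patent text similarity ranking.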
As shown in Fig. 4, several patent class decision methods are applied to the patent text similarity rankings to compute relevance rankings between the query text and the patent classes. The decision methods used in this embodiment are the similarity summation method, the patent text similarity position-weighted summation method, and the patent class weighted summation method, computed as follows:
1. Similarity summation method, computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
where $x$ denotes an IPC class, $k$ is the number of candidate patent texts in the patent text similarity ranking, $\mathrm{score}_{d_i}$ is the similarity value of the $i$-th candidate patent text, and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
2. Patent class weighted summation method, computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} (k_r)^{c_i} \times \mathrm{ICF} \times \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
$$\mathrm{ICF} = \log\left(\frac{N + 0.5}{C_x + 0.5}\right)$$
where $k_r$ is a penalty factor constant, $k$ is the number of candidate patent texts in the patent text similarity ranking, $c_i$ is the position that the patent class of candidate patent text $i$ obtains in the similarity ranking, $\mathrm{score}_{d_i}$ is the similarity value between the query text and patent text $d_i$, ICF is the inverse class frequency, $C_x$ is the number of texts in class $x$, $N$ is the total number of texts, and $\mathrm{score}(x)$ is the relevance value between the query text and patent class $x$. $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
3. Patent text similarity position-weighted summation method, computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} (k_t)^{i} \times \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
where $k_t$ is a penalty factor constant, $k$ is the number of candidate patent texts in the patent text similarity ranking, and $\mathrm{score}_{d_i}$ is the similarity value between the query text and patent text $d_i$. $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
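The three decision methods can be sketched as follows. Several points are assumptions of this sketch because the source leaves them open: role(x, i) is treated as a 0/1 indicator, each patent is assumed to carry a single IPC class, $c_i$ is read as the order in which a class first appears in the similarity ranking, and the penalty constants default to 0.9, a value the source does not give.

```python
import math
from collections import defaultdict


def similarity_sum(ranked):
    """Method 1: sum the similarity of every candidate patent into its class.

    ranked: list of (ipc_class, similarity), most similar patent first.
    """
    scores = defaultdict(float)
    for ipc, sim in ranked:
        scores[ipc] += sim
    return dict(scores)


def class_weighted_sum(ranked, class_sizes, total_docs, k_r=0.9):
    """Method 2: weight each patent by (k_r ** c_i) * ICF of its class."""
    # c_i: position of the patent's class by first appearance in the ranking (assumption).
    class_pos = {}
    for ipc, _ in ranked:
        class_pos.setdefault(ipc, len(class_pos) + 1)
    scores = defaultdict(float)
    for ipc, sim in ranked:
        icf = math.log((total_docs + 0.5) / (class_sizes[ipc] + 0.5))
        scores[ipc] += (k_r ** class_pos[ipc]) * icf * sim
    return dict(scores)


def position_weighted_sum(ranked, k_t=0.9):
    """Method 3: weight each patent by k_t ** i, where i is its rank (from 1)."""
    scores = defaultdict(float)
    for i, (ipc, sim) in enumerate(ranked, start=1):
        scores[ipc] += (k_t ** i) * sim
    return dict(scores)
```

Each function returns a class-to-score mapping; sorting its items by score in descending order gives one patent class relevance ranking per method.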
The patent class relevance rankings produced by methods 1 to 3 are combined and the classes are re-ranked. Many combination schemes are possible; this embodiment uses the following two:
Take the patent class relevance rankings produced by combining the different similarity values with the different class decision methods, use them as features of the patent class position, and combine the rankings with a Rank-SVM model.
Sum, for each class, the positions at which it appears in the different patent class relevance rankings, and use this sum to compute the new patent class relevance value.
The values obtained by the above steps are sorted, and the patent class most relevant to the query text is selected according to these values.
The method of the present invention is not limited to the embodiments described in the detailed description; other embodiments derived by those skilled in the art from the technical solution of the present invention likewise fall within the scope of technical innovation of the present invention.
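The second combination scheme, summing rank positions, can be sketched as follows. The source does not say how a class that is missing from one ranking is handled or how ties are broken, so this sketch assigns a missing class the position just past the end of that ranking and breaks ties alphabetically; treating a smaller position sum as more relevant is also an assumption. The Rank-SVM scheme is not sketched because it requires a trained model.

```python
def combine_by_position_sum(rankings):
    """Re-rank patent classes by the sum of their positions across rankings.

    rankings: list of class lists, each ordered from most to least relevant.
    Returns the classes ordered by ascending position sum.
    """
    all_classes = set().union(*rankings)
    totals = {}
    for cls in all_classes:
        total = 0
        for ranking in rankings:
            if cls in ranking:
                total += ranking.index(cls) + 1  # positions start at 1
            else:
                total += len(ranking) + 1  # missing class: one past the end (assumption)
        totals[cls] = total
    # Smaller sum means the class ranked higher overall; ties broken alphabetically.
    return sorted(totals, key=lambda c: (totals[c], c))
```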

Claims (7)

1. A document retrieval method for the patent field, comprising the following steps:
Preprocessing the query text and the patent texts;
Retrieving the patent texts relevant to the query text, computing several similarity values using different similarity calculation methods, combining these values to recompute the similarity, and ranking the patent texts by the new similarity value;
Applying several different decision methods to convert the similarity ranking of the patent texts into different patent class relevance rankings; integrating these rankings and re-ranking to obtain a new patent class relevance ranking;
Selecting, from the new patent class relevance ranking, the patent class most relevant to the query text.
2. The document retrieval method for the patent field according to claim 1, characterized in that: text processing comprises preprocessing the text, obtaining candidate feature words, collecting statistics on the feature words, selecting features with a feature selection method, and converting the text into a vector representation, specifically:
Removing the markup in the patent text that is not part of the patent text, and extracting the patent text information, namely the patent number, patent IPC classification mark, patent title, abstract, claims, and description; for English text, keeping all-caps words; removing words containing digits; removing stop words; lemmatizing English text to obtain the feature candidate vocabulary;
Collecting statistics on the feature candidate vocabulary to obtain term frequency, document frequency, and per-class word frequency information;
Selecting the feature vocabulary from the candidate feature words, computing the feature weight of each feature word in the feature vocabulary, and converting the patent texts and the query text into computable vectors according to the feature words and their weights.
3. The document retrieval method for the patent field according to claim 1, characterized in that: the multiple similarity calculation methods yield similarity values between the query text and a patent text, and these values are integrated with a log-linear model, computed as follows:
$$\mathrm{Sim}(\vec{D}_1, \vec{D}_2) = \frac{\exp\left(\vec{\theta} \cdot \vec{S}(\vec{D}_1, \vec{D}_2)\right)}{\sum_{k=0}^{n} \exp\left(\vec{\theta} \cdot \vec{S}(\vec{D}_1, \vec{d}_k)\right)}$$
where $\vec{S}(\vec{D}_1, \vec{D}_2)$ is the feature vector formed by the similarity values between query text $\vec{D}_1$ and patent text $\vec{D}_2$ obtained with the different similarity calculation methods, $\vec{\theta}$ is the weight vector for those similarity values, $n$ is the total number of patent texts relevant to the query text, and $\vec{d}_k$ is the $k$-th relevant patent text vector.
4. The document retrieval method for the patent field according to claim 1, characterized in that: the multiple decision methods comprise the patent class weighted similarity summation method, the patent text similarity rank-position weighted summation method, and the patent text similarity summation method, wherein the patent class weighted similarity summation is computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} (k_r)^{c_i} \times \mathrm{ICF} \times \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
$$\mathrm{ICF} = \log\left(\frac{N + 0.5}{C_x + 0.5}\right)$$
where $k_r$ is a penalty factor constant, $k$ is the number of candidate patent texts in the patent text similarity ranking, $c_i$ is the position that the patent class of candidate patent text $i$ obtains in the similarity ranking, $\mathrm{score}_{d_i}$ is the similarity value between the query text and patent text $d_i$, ICF is the inverse class frequency, $C_x$ is the number of texts in class $x$, $N$ is the total number of texts, $\mathrm{score}(x)$ is the relevance value between the query text and patent class $x$, and $\mathrm{role}(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
5. The document retrieval method for the patent field according to claim 4, characterized in that: the patent text similarity rank-position weighted summation is computed as follows:
$$\mathrm{score}(x) = \sum_{i=1}^{k} (k_t)^{i} \times \mathrm{score}_{d_i} \times \mathrm{role}(x, i)$$
6. The document retrieval method for the patent field according to claim 1, characterized in that: integrating the multiple patent class relevance rankings comprises taking the rankings produced by combining the different similarity values with the different class decision methods, using them as features of the patent class position, and combining the rankings with a Rank-SVM model.
7. The document retrieval method for the patent field according to claim 1, characterized in that: integrating the multiple patent class relevance rankings comprises summing, for each class, the positions at which it appears in the different patent class relevance rankings, and using this sum to compute the new patent class relevance value.
CN200810012248A 2008-07-09 2008-07-09 Document retrieval method in patent field Active CN101625680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810012248A CN101625680B (en) 2008-07-09 2008-07-09 Document retrieval method in patent field

Publications (2)

Publication Number Publication Date
CN101625680A true CN101625680A (en) 2010-01-13
CN101625680B CN101625680B (en) 2012-08-29

Family

ID=41521531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810012248A Active CN101625680B (en) 2008-07-09 2008-07-09 Document retrieval method in patent field

Country Status (1)

Country Link
CN (1) CN101625680B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7664735B2 (en) * 2004-04-30 2010-02-16 Microsoft Corporation Method and system for ranking documents of a search result to improve diversity and information richness
CN100442292C (en) * 2007-03-22 2008-12-10 华中科技大学 Method for indexing and acquiring semantic net information

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102792262B (en) * 2010-02-03 2016-08-10 汤姆森路透社全球资源公司 Use the method and system of claim analysis sequence intellectual property document
CN102792262A (en) * 2010-02-03 2012-11-21 汤姆森路透社全球资源公司 Method and system for ranking intellectual property documents using claim analysis
US11281846B2 (en) 2011-11-02 2022-03-22 Microsoft Technology Licensing, Llc Inheritance of rules across hierarchical levels
CN107256275A (en) * 2011-11-02 2017-10-17 微软技术许可有限责任公司 Routing inquiry result
CN102768679A (en) * 2012-06-25 2012-11-07 深圳市汉络计算机技术有限公司 Searching method and searching system
CN102768679B (en) * 2012-06-25 2015-04-22 深圳市汉络计算机技术有限公司 Searching method and searching system
CN103577462B (en) * 2012-08-02 2018-10-16 北京百度网讯科技有限公司 A kind of Document Classification Method and device
CN103577462A (en) * 2012-08-02 2014-02-12 北京百度网讯科技有限公司 Document classification method and document classification device
CN103455609B (en) * 2013-09-05 2017-06-16 江苏大学 A kind of patent document similarity detection method based on kernel function Luke cores
WO2015032301A1 (en) * 2013-09-05 2015-03-12 江苏大学 Method for detecting the similarity of the patent documents on the basis of new kernel function luke kernel
CN103455609A (en) * 2013-09-05 2013-12-18 江苏大学 New kernel function Luke kernel-based patent document similarity detection method
CN104778276A (en) * 2015-04-29 2015-07-15 北京航空航天大学 Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency)
US10073890B1 (en) 2015-08-03 2018-09-11 Marca Research & Development International, Llc Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm
US10621499B1 (en) 2015-08-03 2020-04-14 Marca Research & Development International, Llc Systems and methods for semantic understanding of digital information
CN107193814A (en) * 2016-03-14 2017-09-22 北京京东尚科信息技术有限公司 The method and apparatus that the automatic taxonomic revision of books is realized in digital reading
CN107193814B (en) * 2016-03-14 2020-07-31 北京京东尚科信息技术有限公司 Method and device for realizing automatic book sorting in digital reading
US10540439B2 (en) 2016-04-15 2020-01-21 Marca Research & Development International, Llc Systems and methods for identifying evidentiary information
CN107153689A (en) * 2017-04-29 2017-09-12 安徽富驰信息技术有限公司 A kind of case search method based on Topic Similarity
CN108090047A (en) * 2018-01-10 2018-05-29 华南师范大学 A kind of definite method and apparatus of text similarity
CN108090047B (en) * 2018-01-10 2022-05-24 华南师范大学 Text similarity determination method and equipment
CN110633407A (en) * 2018-06-20 2019-12-31 百度在线网络技术(北京)有限公司 Information retrieval method, device, equipment and computer readable medium
US11977589B2 (en) 2018-06-20 2024-05-07 Baidu Online Network Technology (Beijing) Co., Ltd. Information search method, device, apparatus and computer-readable medium
CN109726401A (en) * 2019-01-03 2019-05-07 中国联合网络通信集团有限公司 A kind of patent portfolios generation method and platform
CN109726401B (en) * 2019-01-03 2022-09-23 中国联合网络通信集团有限公司 Patent combination generation method and system
CN109960757A (en) * 2019-02-27 2019-07-02 北京搜狗科技发展有限公司 Web search method and device
CN110334269A (en) * 2019-07-11 2019-10-15 中国船舶工业综合技术经济研究院 A kind of information retrieval method and system
CN110334269B (en) * 2019-07-11 2021-05-07 中国船舶工业综合技术经济研究院 Information retrieval method and system
CN110516062A (en) * 2019-08-26 2019-11-29 腾讯科技(深圳)有限公司 A kind of search processing method and device of document
CN110516062B (en) * 2019-08-26 2022-11-04 腾讯科技(深圳)有限公司 Method and device for searching and processing document

Also Published As

Publication number Publication date
CN101625680B (en) 2012-08-29

Similar Documents

Publication Publication Date Title
CN101625680B (en) Document retrieval method in patent field
CN101430695B (en) System and method for computing difference affinities of word
CN104199857B (en) A kind of tax document hierarchy classification method based on multi-tag classification
Sarkar Sentence clustering-based summarization of multiple text documents
CN106095949A (en) A kind of digital library's resource individuation recommendation method recommended based on mixing and system
Wang et al. Ptr: Phrase-based topical ranking for automatic keyphrase extraction in scientific publications
Zaw et al. Web document clustering using cuckoo search clustering algorithm based on levy flight
CN101097570A (en) Advertisement classification method capable of automatic recognizing classified advertisement type
CN106407182A (en) A method for automatic abstracting for electronic official documents of enterprises
CN104484380A (en) Personalized search method and personalized search device
CN109840532A (en) A kind of law court's class case recommended method based on k-means
Landthaler et al. Extending Full Text Search for Legal Document Collections Using Word Embeddings.
CN100511214C (en) Method and system for abstracting batch single document for document set
CN105279264A (en) Semantic relevancy calculation method of document
CN109670014A (en) A kind of Authors of Science Articles name disambiguation method of rule-based matching and machine learning
Wang et al. Neural related work summarization with a joint context-driven attention mechanism
CN104778157A (en) Multi-document abstract sentence generating method
Barla et al. From ambiguous words to key-concept extraction
CN1916904A (en) Method of abstracting single file based on expansion of file
Murthy et al. A comparative study on term weighting methods for automated telugu text categorization with effective classifiers
Zhu et al. Research on summary sentences extraction oriented to live sports text
Wang et al. Sentence-Ranking-Enhanced Keywords Extraction from Chinese Patents.
Wang et al. User intention-based document summarization on heterogeneous sentence networks
Kalita et al. An extractive approach of text summarization of Assamese using WordNet
Nghiem et al. Which one is better: presentation-based or content-based math search?

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant