CN101625680A - Document retrieval method in patent field - Google Patents
Document retrieval method in patent field
- Publication number
- CN101625680A
- Authority
- CN
- China
- Prior art keywords
- text
- similarity
- classification
- rightarrow
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a document retrieval method for the patent field, comprising the following steps: preprocessing a query text and patent texts; retrieving the patent texts related to the query text, computing several similarity values with different similarity calculation methods, combining these values into a new similarity, and ranking the patent texts by the new similarity; applying several decision methods to map the similarity ranking of the patent texts into different relevance rankings of patent classes; integrating the resulting patent class relevance rankings and re-ranking them to obtain a new patent class relevance ranking; and selecting from the new ranking the patent class most relevant to the query text. The method uses several similarity calculation methods to measure the final degree of correlation between the query text and the patent texts, exploits feature information from multiple angles, and combines several systems so that they complement one another and improve system performance.
Description
Technical field
The present invention relates to a data retrieval method, and in particular to a document retrieval method for the patent field.
Background technology
With the rapid development of science and technology, the number of documents recording scientific and technological achievements has grown dramatically, and patents receive more and more attention as one of the most important means of protecting intellectual property. Patent texts record the technical solutions of the newest inventions, but scientific and technological achievements are also recorded in non-patent texts such as research papers and technical reports. Patents and non-patent documents are related; for example, studying the relationship between research papers and patents makes it possible to predict technology trends. Studying patent documents together with non-patent research documents makes it possible to follow the latest technology in each field and thereby avoid duplicated development and infringement, to analyze the development of an entire technology sector, to analyze competitors' research status and strategy, and to carry out invalidity searches against patents. Retrieval across patent documents and non-patent documents is a relatively new problem in patent research.
Patent texts usually cite related patents or research papers, but studying the relationship between non-patent documents and patent texts purely through these citation relations is very limited. Moreover, a patent database holds millions of patent documents, so handling patents purely by hand is time-consuming and laborious. How to retrieve relevant patents from a huge patent database and obtain useful patent information is a difficult problem in patent research.
Current patent retrieval and classification techniques fall into two kinds: retrieval over an already classified patent database, and retrieval based on natural language processing.
Most early patent retrieval methods are based on a patent database. For example, the patent with publication number CN1996290A mainly uses the structured text information of patents to extract patent citation relations and build a patent association graph. Patents are then retrieved according to query conditions such as application number, patent number, filing date, announcement date, inventor, or patentee, together with the patents associated with the retrieved ones in the patent association graph. This method depends on the fixed structured text of the patent itself, is not intelligent enough, and does not analyze the patent content.
Methods based on natural language processing analyze the content of the patent text. From the title, abstract, description, claims, and other text of a patent they obtain useful features that characterize the patent, assign weights to these features, and retrieve related patent texts. For example, the article "Some Issues in the Automatic Classification of U.S. Patents" (by Leah S. Larkey, presented at the AAAI-98 workshop on text categorization) describes patent classification with natural language processing. The article "POSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval" (by In-Su Kang, Seung-Hoon Na, Jun-Ki Kim, and Jong-Hyeok Lee, Proceedings of the NTCIR-5 Workshop Meeting, December 6-9, 2005, Tokyo, Japan) applies natural language processing to patent retrieval.
However, existing methods are confined to keyword retrieval and only address retrieval among patent texts. They do not consider the relationships between non-patent texts and patent texts or between non-patent texts and patent classes, and so cannot provide intelligent full-text retrieval across non-patent and patent texts.
Summary of the invention
Existing document retrieval for the patent field does not consider the relationships between non-patent texts and patent texts or between non-patent texts and patent classes, and cannot provide intelligent full-text retrieval across non-patent and patent texts. To address these shortcomings, the technical problem to be solved by the present invention is to provide a patent retrieval method that represents patent texts as feature vectors, computes the similarity between a non-patent text and related patent texts, and retrieves the most relevant patent texts.
To solve the above technical problem, the present invention adopts a patent retrieval method based on natural language processing that comprises the following steps:
Preprocess the query text and the patent texts;
Retrieve the patent texts related to the query text, compute different similarity values with several different similarity calculation methods, combine these values to recompute the similarity, and rank the patent texts by the new similarity value;
Apply several different decision methods to convert the similarity ranking of the patent texts into different patent class relevance rankings; integrate these rankings and re-rank them to obtain a new patent class relevance ranking;
Select from the new patent class relevance ranking the patent class most relevant to the query text.
Text processing comprises preprocessing the text, obtaining candidate feature words, collecting statistics on the feature words, selecting features with a feature selection method, and converting the text into vector form. Specifically: remove the tags in the patent text that are not part of the patent text and extract the patent text information, namely the patent number, the patent IPC classification mark, the patent title, the abstract, the claims, and the description; for English text, keep words written entirely in capitals; remove words that contain digits; remove stop words; lemmatize English text to obtain the feature candidate vocabulary; collect statistics on the feature candidate vocabulary to obtain the term frequency, the document frequency, and the class frequency of each word; select the feature vocabulary from the feature candidate words, compute the feature weight of each feature word in the feature vocabulary, and convert the patent texts and the query text into computable vectors according to the feature words and their weights.
The several similarity calculation methods each produce a similarity value between the query text and a patent text, and these values are integrated with a log-linear model, computed as follows:

$$P(D_2 \mid D_1) = \frac{\exp\left(\lambda \cdot f(D_1, D_2)\right)}{\sum_{k=1}^{n} \exp\left(\lambda \cdot f(D_1, D_k)\right)}$$

where $f(D_1, D_2)$ is the feature vector formed from the similarity values that the different similarity calculation methods yield for the query text $D_1$ and the patent text $D_2$, $\lambda$ is the weight vector over these similarity values, $n$ is the total number of patent texts related to the query text, and $D_k$ denotes the k-th related patent text vector.
The several decision methods comprise a similarity sum weighted by patent class, a similarity sum weighted by the ranking position of the patent text, and a plain patent text similarity sum. The formula for the similarity sum weighted by patent class uses the following quantities: $k_r$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking; $c_i$ is the position, in the similarity ranking, of the patent class to which candidate patent text $i$ belongs; the similarity value is that between the query text and patent text $d_i$; ICF is the inverse class text frequency, where $C_x$ is the number of texts in class $x$ and $N$ is the total number of texts; $score(x)$ is the relevance value between the query text and patent class $x$; and $role(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
The similarity sum weighted by the ranking position of the patent text is computed by a separate formula.
The different patent class relevance rankings may be integrated by taking the rankings obtained from each combination of similarity value and class decision method as features of the patent class positions and combining them with a Rank-SVM model.
The different patent class relevance rankings may also be integrated by summing, for each class, the position values at which the class appears in the different rankings, and using the sum to compute the new patent class relevance value.
The present invention has the following beneficial effects and advantages:
1. The method uses natural language processing and takes several similarity calculation methods as the final measure of the degree of correlation between the query text and the patent texts, making full use of feature information from multiple angles. Finally, it combines several systems so that they complement one another, which improves system performance.
Description of drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a flowchart of text preprocessing;
Fig. 3 is a flowchart of the similarity calculation between the query text and the patent texts;
Fig. 4 is a flowchart of the relevance calculation between the query text and the patent classes.
Embodiment
The method of the invention is further described below with reference to an embodiment and the accompanying drawings:
As shown in Fig. 1, a document retrieval method for the patent field comprises the following steps:
Preprocess the query text and the patent texts; retrieve the patent texts related to the query text, compute different similarity values with several different similarity calculation methods, combine these values to recompute the similarity, and rank the patent texts by the new similarity value; apply several different decision methods to convert the similarity ranking of the patent texts into different patent class relevance rankings, integrate these rankings, and re-rank them to obtain a new patent class relevance ranking; select from the new patent class relevance ranking the patent class most relevant to the query text.
As shown in Fig. 2, preprocessing the query text and the patent texts comprises the following steps:
A) Remove the tags in the patent text that are not part of the patent text and extract the patent text information, namely the patent number, the patent IPC classification mark, the patent title, the abstract, the claims, and the description; remove non-letter or non-Chinese symbols inside words in the extracted patent text information, for example '-', ',', '(' and ')'; for English text, keep words written entirely in capitals; remove words that contain digits; remove stop words, for example "claim" and "said" in English patents and "step" and "feature" in Chinese patents, as well as prepositions, adverbs, articles, and the like; lemmatize English text to obtain the feature candidate vocabulary;
B) Collect statistics on the feature candidate vocabulary to obtain the term frequency, the document frequency, and the class frequency of each word;
C) Select the feature vocabulary from the feature candidate words, compute the feature weight of each feature word in the feature vocabulary, and convert the patent texts and the query text into computable vectors according to the feature words and their weights;
D) Using the feature words of the patents as index terms, build an inverted index document store for the patent documents and the patent text vectors.
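Steps A) and B) above can be sketched roughly as follows for English text. This is a minimal illustration, not the patented implementation: the stop-word list, the function names `tokenize` and `collect_statistics`, and the choice to lowercase words that are not all capitals are assumptions, and lemmatization is left out because it would need an external library.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the text names "claim" and "said" as examples.
STOP_WORDS = {"claim", "said", "the", "a", "an", "of", "in", "and", "to", "is"}

def tokenize(text):
    """Apply the word-level filters of step A) to one English text."""
    tokens = []
    for raw in text.split():
        # Strip non-letter symbols such as '-', ',', '(' and ')'.
        word = re.sub(r"[^A-Za-z0-9]", "", raw)
        if not word:
            continue
        # Remove words that contain a digit.
        if any(ch.isdigit() for ch in word):
            continue
        # Keep all-caps words unchanged; lowercase everything else (assumption).
        if not word.isupper():
            word = word.lower()
        # Remove stop words.
        if word.lower() in STOP_WORDS:
            continue
        tokens.append(word)
    return tokens

def collect_statistics(docs):
    """Step B): docs is a list of (ipc_class, text) pairs.

    Returns per-document term frequencies, the document frequency of each
    word, and for each word a count of documents per IPC class.
    """
    term_freqs = []
    doc_freq = Counter()
    class_freq = {}
    for ipc_class, text in docs:
        tf = Counter(tokenize(text))
        term_freqs.append(tf)
        doc_freq.update(tf.keys())
        for term in tf:
            class_freq.setdefault(term, Counter())[ipc_class] += 1
    return term_freqs, doc_freq, class_freq
```

The feature weights of step C) could then be derived from these counts, for example with a TF-IDF scheme; the source does not state which weighting it uses.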
As shown in Fig. 3, the similarity calculation with several different methods comprises the following steps:
In the patent text collection, find the patent texts that share at least one feature word with the query text; these form the related patent text set.
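Finding the patent texts that share a feature word with the query is the kind of lookup an inverted index supports directly. The sketch below assumes each patent is already reduced to a list of feature words; the names `build_inverted_index` and `related_documents` and the sample identifiers are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(doc_terms):
    """doc_terms maps a patent id to its feature words.

    Returns a mapping from each feature word to the set of patent ids
    that contain it.
    """
    index = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in set(terms):
            index[term].add(doc_id)
    return index

def related_documents(index, query_terms):
    """Return the ids of all patents sharing at least one feature word with the query."""
    related = set()
    for term in set(query_terms):
        related |= index.get(term, set())
    return related

# Usage with made-up data:
index = build_inverted_index({
    "P1": ["laser", "diode"],
    "P2": ["diode", "circuit"],
    "P3": ["polymer"],
})
candidates = related_documents(index, ["diode", "sensor"])
```

Only the candidates returned here need to be scored by the similarity methods, which keeps the expensive computation away from the bulk of the collection.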
Compute the similarity between each related patent in the related patent text set and the query text. This embodiment uses several similarity calculation methods, including the vector cosine method, the BM25 method, and the SMART method, computed as follows:
1. Vector cosine method
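The query text $D_1$ and the patent text $D_2$ are represented in the vector space model, and the cosine of the two vectors is computed as:

$$\cos(D_1, D_2) = \frac{D_1 \cdot D_2}{\lVert D_1 \rVert \, \lVert D_2 \rVert}$$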
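Cosine similarity over sparse feature vectors can be sketched as below. Representing each vector as a dictionary from feature word to weight is an assumption made for illustration, as is returning 0.0 when either vector is empty.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse vectors given as dicts."""
    # Only feature words present in both vectors contribute to the dot product.
    dot = sum(w * v2[t] for t, w in v1.items() if t in v2)
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```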
2. BM25 method
BM25 has many variants; the BM25 formula used in this embodiment is:

$$\mathrm{score}(D_1, D_2) = \sum_{i=1}^{n} \mathrm{IDF}(t_i) \cdot \frac{f(t_i, D_2) \cdot (k_1 + 1)}{f(t_i, D_2) + k_1 \cdot \left(1 - b + b \cdot \frac{|D_2|}{\mathrm{avgdl}}\right)}$$

where $n$ is the number of feature words in the query text $D_1$; $f(t_i, D_2)$ is the number of times feature word $t_i$ occurs in the patent text $D_2$; $|D_2|$ is the text length of the patent text $D_2$; avgdl is the average text length over the set of patent texts related to the query text; and $k_1$ and $b$ are free parameters, set in this embodiment to $k_1 = 2.0$ and $b = 0.75$. $\mathrm{IDF}(t_i)$, the inverse document frequency, is the weight of term $t_i$ and is computed as:

$$\mathrm{IDF}(t_i) = \log \frac{N - n(t_i) + 0.5}{n(t_i) + 0.5}$$

where $N$ is the total number of documents in the whole data set and $n(t_i)$ is the number of documents that contain term $t_i$.
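A BM25 score with the parameter values of this embodiment ($k_1 = 2.0$, $b = 0.75$) might be sketched as follows. The code assumes the widely used Okapi BM25 form with the `(N - n + 0.5) / (n + 0.5)` inverse document frequency; the embodiment's exact variant may differ, and the function names and argument layout are hypothetical.

```python
import math
from collections import Counter

def idf(term, doc_freq, n_docs):
    """Inverse document frequency of a term; doc_freq maps term -> document count."""
    n_t = doc_freq.get(term, 0)
    return math.log((n_docs - n_t + 0.5) / (n_t + 0.5))

def bm25(query_terms, doc_tokens, doc_freq, n_docs, avgdl, k1=2.0, b=0.75):
    """Score one patent text (a token list) against the query's feature words."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in set(query_terms):
        f = tf[term]
        if f == 0:
            continue  # a term absent from the document contributes nothing
        length_norm = 1 - b + b * doc_len / avgdl
        score += idf(term, doc_freq, n_docs) * f * (k1 + 1) / (f + k1 * length_norm)
    return score
```

Note that this inverse document frequency becomes negative for terms that occur in more than half of the documents, so raw BM25 scores are not guaranteed to be non-negative before normalization.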
3. SMART method
The SMART algorithm has its own computing formula, in which the weight $w_i$ of each feature dimension of the query text vector $D_1$ and the weight $w_i$ of each feature dimension of the patent text vector $D_2$ are each given by a separate formula. In these formulas, $T$ is the set of feature words that occur in both the query text $D_1$ and the patent text $D_2$; $tf_i$ is the term frequency of the i-th feature word in the text vector; $N$ is the number of texts in the whole patent text set and $n$ is the number of patent texts in which the i-th feature occurs; avtf is the average term frequency of the feature words per document in the related patent text set; utf is the number of feature words in the patent text vector $D_2$; and pivot is the average number of feature words per document in the whole patent text set.
The three methods are each used to compute a similarity value between the query text and each patent text.
The similarity values produced by each method are normalized to lie between 0 and 1.
The logarithm of each normalized similarity value is then taken.
The logarithms of the different similarity values serve as the features of the log-linear model, computed as follows:

$$P(D_2 \mid D_1) = \frac{\exp\left(\lambda \cdot f(D_1, D_2)\right)}{\sum_{k=1}^{n} \exp\left(\lambda \cdot f(D_1, D_k)\right)}$$

where $f(D_1, D_2)$ is the feature vector formed from the similarity values that the different similarity calculation methods yield for the query text $D_1$ and the patent text $D_2$, $\lambda$ is the weight vector over these similarity values, $n$ is the total number of patent texts related to the query text, and $D_k$ denotes the k-th related patent text vector.
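The normalize, take-logarithm, and combine steps can be illustrated with the sketch below. Several details are assumptions rather than part of the source: scores are normalized by dividing by the per-method maximum, a small floor `EPS` keeps the logarithm finite when a normalized score is 0, the combined score is normalized over the n related patent texts, and the weights passed in are hypothetical (the source does not say how they are trained).

```python
import math

EPS = 1e-6  # assumption: floor that keeps log() finite for zero scores

def normalize(scores):
    """Scale one method's scores into [0, 1] by dividing by the maximum."""
    top = max(scores)
    if top <= 0:
        return [0.0] * len(scores)
    return [max(s, 0.0) / top for s in scores]

def log_linear_combine(score_lists, weights):
    """Combine several similarity methods into one value per candidate.

    score_lists[j][k] is the similarity of method j for candidate k;
    weights[j] is the weight of method j.
    """
    # Features are the logarithms of the normalized similarity values.
    feats = [[math.log(max(s, EPS)) for s in normalize(col)] for col in score_lists]
    n = len(score_lists[0])
    raw = [sum(w * feats[j][k] for j, w in enumerate(weights)) for k in range(n)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(raw)
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# Usage with made-up scores for three candidate patents:
combined = log_linear_combine([[0.9, 0.3, 0.6], [12.0, 3.0, 9.0]], [1.0, 1.0])
ranking = sorted(range(len(combined)), key=lambda k: combined[k], reverse=True)
```

Sorting the candidates by the combined value gives the new patent text similarity ranking that the class decision methods work from.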
As shown in Fig. 4, several different patent class decision methods are applied to the different patent text similarity rankings to compute relevance rankings between the query text and the patent classes. The patent class decision methods used in this embodiment are the similarity sum, the similarity sum weighted by patent text ranking position, and the similarity sum weighted by patent class, computed as follows:
1. Similarity sum, computed as:

$$score(x) = \sum_{i=1}^{k} \mathrm{sim}(D_1, d_i) \cdot role(x, i)$$

where $x$ is an IPC class, $k$ is the number of candidate patent texts in the patent text similarity ranking, $\mathrm{sim}(D_1, d_i)$ is the similarity value of the i-th candidate patent text, and $role(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
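The plain similarity sum amounts to letting each of the top k candidate patents vote for its classes with its similarity value. A minimal sketch, assuming each candidate is given as a pair of similarity value and list of IPC classes, and that the function name `similarity_sum` and the sample data are hypothetical:

```python
from collections import defaultdict

def similarity_sum(ranked, k):
    """Score patent classes from the top-k candidates of a similarity ranking.

    ranked is a list of (similarity, ipc_classes) pairs sorted by
    decreasing similarity. Returns (class, score) pairs, best first.
    """
    scores = defaultdict(float)
    for sim, classes in ranked[:k]:
        # role(x, i) is 1 exactly for the classes that patent i belongs to.
        for x in classes:
            scores[x] += sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Usage with made-up candidates:
ranked = [(0.9, ["G06F"]), (0.8, ["H04L"]), (0.5, ["G06F", "H04L"]), (0.4, ["A61K"])]
class_ranking = similarity_sum(ranked, k=3)
```

The two weighted variants would follow the same loop with an extra factor per candidate; their exact formulas are not reproduced here, so they are not sketched.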
2. Similarity sum weighted by patent class. Its formula uses the following quantities: $k_r$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking; $c_i$ is the position, in the similarity ranking, of the patent class to which candidate patent text $i$ belongs; the similarity value is that between the query text and patent text $d_i$; ICF is the inverse class text frequency, where $C_x$ is the number of texts in class $x$ and $N$ is the total number of texts; $score(x)$ is the relevance value between the query text and patent class $x$; and $role(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
3. Similarity sum weighted by patent text ranking position. Its formula uses the following quantities: $k_i$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking; the similarity value is that between the query text and patent text $d_i$; and $role(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
The different patent class relevance rankings produced by methods 1 to 3 are combined, and the classes are re-ranked. Many combination schemes are possible; this embodiment uses the following two:
The patent class relevance rankings obtained from each combination of similarity value and class decision method are taken as features of the patent class positions, and the rankings are combined with a Rank-SVM model.
For each class, the position values at which the class appears in the different patent class relevance rankings are summed, and the sum is used to compute the new patent class relevance value.
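The second combination scheme, summing the positions of each class across rankings, can be sketched as below. Two points are assumptions not stated in the source: a smaller position sum is treated as more relevant, and a class missing from a ranking is assigned the position just past the end of that ranking. The name `position_sum` and the sample rankings are hypothetical.

```python
def position_sum(rankings):
    """Merge several class rankings (each a list, best first) by summed position."""
    all_classes = set().union(*rankings)
    totals = {}
    for x in all_classes:
        total = 0
        for ranking in rankings:
            # Assumption: a class absent from a ranking gets position len(ranking) + 1.
            total += ranking.index(x) + 1 if x in ranking else len(ranking) + 1
        totals[x] = total
    # Smaller sums rank first; ties are broken alphabetically for determinism.
    return sorted(totals, key=lambda x: (totals[x], x))

# Usage with three made-up rankings from different decision methods:
merged = position_sum([
    ["G06F", "H04L", "A61K"],
    ["H04L", "G06F", "A61K"],
    ["G06F", "A61K", "H04L"],
])
```

The Rank-SVM scheme would instead learn how to weigh the individual rankings from training data, which requires labeled queries and is outside the scope of this sketch.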
The similarity values between the query text and the patent texts obtained in the above steps are used for sorting, and the patent class most relevant to the query text is selected.
The method of the invention is not limited to the embodiment described in the detailed description; other embodiments that those skilled in the art derive from the technical solution of the invention likewise fall within the scope of technical innovation of the invention.
Claims (7)
1. A document retrieval method for the patent field, comprising the following steps:
Preprocessing a query text and patent texts;
Retrieving the patent texts related to the query text, computing different similarity values with several different similarity calculation methods, combining these values to recompute the similarity, and ranking the patent texts by the new similarity value;
Applying several different decision methods to convert the similarity ranking of the patent texts into different patent class relevance rankings; integrating these rankings and re-ranking them to obtain a new patent class relevance ranking;
Selecting from the new patent class relevance ranking the patent class most relevant to the query text.
2. The document retrieval method for the patent field according to claim 1, characterized in that: text processing comprises preprocessing the text, obtaining candidate feature words, collecting statistics on the feature words, selecting features with a feature selection method, and converting the text into vector form, specifically:
Removing the tags in the patent text that are not part of the patent text and extracting the patent text information, namely the patent number, the patent IPC classification mark, the patent title, the abstract, the claims, and the description; for English text, keeping words written entirely in capitals; removing words that contain digits; removing stop words; lemmatizing English text to obtain the feature candidate vocabulary;
Collecting statistics on the feature candidate vocabulary to obtain the term frequency, the document frequency, and the class frequency of each word;
Selecting the feature vocabulary from the feature candidate words, computing the feature weight of each feature word in the feature vocabulary, and converting the patent texts and the query text into computable vectors according to the feature words and their weights.
3. The document retrieval method for the patent field according to claim 1, characterized in that: the several similarity calculation methods each produce a similarity value between the query text and a patent text, and these values are integrated with a log-linear model, computed as follows:

$$P(D_2 \mid D_1) = \frac{\exp\left(\lambda \cdot f(D_1, D_2)\right)}{\sum_{k=1}^{n} \exp\left(\lambda \cdot f(D_1, D_k)\right)}$$

where $f(D_1, D_2)$ is the feature vector formed from the similarity values that the different similarity calculation methods yield for the query text $D_1$ and the patent text $D_2$, $\lambda$ is the weight vector over these similarity values, $n$ is the total number of patent texts related to the query text, and $D_k$ denotes the k-th related patent text vector.
4. The document retrieval method for the patent field according to claim 1, characterized in that: the several decision methods comprise a similarity sum weighted by patent class, a similarity sum weighted by the ranking position of the patent text, and a plain patent text similarity sum, wherein the formula for the similarity sum weighted by patent class uses the following quantities: $k_r$ is a penalty factor constant; $k$ is the number of candidate patent texts in the patent text similarity ranking; $c_i$ is the position, in the similarity ranking, of the patent class to which candidate patent text $i$ belongs; the similarity value is that between the query text and patent text $d_i$; ICF is the inverse class text frequency, where $C_x$ is the number of texts in class $x$ and $N$ is the total number of texts; $score(x)$ is the relevance value between the query text and patent class $x$; and $role(x, i)$ indicates whether patent text $d_i$ belongs to patent class $x$.
6. The document retrieval method for the patent field according to claim 1, characterized in that: the different patent class relevance rankings are integrated by taking the rankings obtained from each combination of similarity value and class decision method as features of the patent class positions and combining them with a Rank-SVM model.
7. The document retrieval method for the patent field according to claim 1, characterized in that: the different patent class relevance rankings are integrated by summing, for each class, the position values at which the class appears in the different rankings, and using the sum to compute the new patent class relevance value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810012248A CN101625680B (en) | 2008-07-09 | 2008-07-09 | Document retrieval method in patent field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810012248A CN101625680B (en) | 2008-07-09 | 2008-07-09 | Document retrieval method in patent field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101625680A true CN101625680A (en) | 2010-01-13 |
CN101625680B CN101625680B (en) | 2012-08-29 |
Family
ID=41521531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200810012248A Active CN101625680B (en) | 2008-07-09 | 2008-07-09 | Document retrieval method in patent field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101625680B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102768679A (en) * | 2012-06-25 | 2012-11-07 | 深圳市汉络计算机技术有限公司 | Searching method and searching system |
CN102792262A (en) * | 2010-02-03 | 2012-11-21 | 汤姆森路透社全球资源公司 | Method and system for ranking intellectual property documents using claim analysis |
CN103455609A (en) * | 2013-09-05 | 2013-12-18 | 江苏大学 | New kernel function Luke kernel-based patent document similarity detection method |
CN103577462A (en) * | 2012-08-02 | 2014-02-12 | 北京百度网讯科技有限公司 | Document classification method and document classification device |
CN104778276A (en) * | 2015-04-29 | 2015-07-15 | 北京航空航天大学 | Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency) |
CN107153689A (en) * | 2017-04-29 | 2017-09-12 | 安徽富驰信息技术有限公司 | A kind of case search method based on Topic Similarity |
CN107193814A (en) * | 2016-03-14 | 2017-09-22 | 北京京东尚科信息技术有限公司 | The method and apparatus that the automatic taxonomic revision of books is realized in digital reading |
CN107256275A (en) * | 2011-11-02 | 2017-10-17 | 微软技术许可有限责任公司 | Routing inquiry result |
CN108090047A (en) * | 2018-01-10 | 2018-05-29 | 华南师范大学 | A kind of definite method and apparatus of text similarity |
US10073890B1 (en) | 2015-08-03 | 2018-09-11 | Marca Research & Development International, Llc | Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm |
CN109726401A (en) * | 2019-01-03 | 2019-05-07 | 中国联合网络通信集团有限公司 | A kind of patent portfolios generation method and platform |
CN109960757A (en) * | 2019-02-27 | 2019-07-02 | 北京搜狗科技发展有限公司 | Web search method and device |
CN110334269A (en) * | 2019-07-11 | 2019-10-15 | 中国船舶工业综合技术经济研究院 | A kind of information retrieval method and system |
CN110516062A (en) * | 2019-08-26 | 2019-11-29 | 腾讯科技(深圳)有限公司 | A kind of search processing method and device of document |
CN110633407A (en) * | 2018-06-20 | 2019-12-31 | 百度在线网络技术(北京)有限公司 | Information retrieval method, device, equipment and computer readable medium |
US10540439B2 (en) | 2016-04-15 | 2020-01-21 | Marca Research & Development International, Llc | Systems and methods for identifying evidentiary information |
US10621499B1 (en) | 2015-08-03 | 2020-04-14 | Marca Research & Development International, Llc | Systems and methods for semantic understanding of digital information |
US11281846B2 (en) | 2011-11-02 | 2022-03-22 | Microsoft Technology Licensing, Llc | Inheritance of rules across hierarchical levels |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7664735B2 (en) * | 2004-04-30 | 2010-02-16 | Microsoft Corporation | Method and system for ranking documents of a search result to improve diversity and information richness |
CN100442292C (en) * | 2007-03-22 | 2008-12-10 | 华中科技大学 | Method for indexing and acquiring semantic net information |
-
2008
- 2008-07-09 CN CN200810012248A patent/CN101625680B/en active Active
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102792262B (en) * | 2010-02-03 | 2016-08-10 | 汤姆森路透社全球资源公司 | Use the method and system of claim analysis sequence intellectual property document |
CN102792262A (en) * | 2010-02-03 | 2012-11-21 | 汤姆森路透社全球资源公司 | Method and system for ranking intellectual property documents using claim analysis |
US11281846B2 (en) | 2011-11-02 | 2022-03-22 | Microsoft Technology Licensing, Llc | Inheritance of rules across hierarchical levels |
CN107256275A (en) * | 2011-11-02 | 2017-10-17 | 微软技术许可有限责任公司 | Routing inquiry result |
CN102768679A (en) * | 2012-06-25 | 2012-11-07 | 深圳市汉络计算机技术有限公司 | Searching method and searching system |
CN102768679B (en) * | 2012-06-25 | 2015-04-22 | 深圳市汉络计算机技术有限公司 | Searching method and searching system |
CN103577462B (en) * | 2012-08-02 | 2018-10-16 | 北京百度网讯科技有限公司 | A kind of Document Classification Method and device |
CN103577462A (en) * | 2012-08-02 | 2014-02-12 | 北京百度网讯科技有限公司 | Document classification method and document classification device |
CN103455609B (en) * | 2013-09-05 | 2017-06-16 | 江苏大学 | A kind of patent document similarity detection method based on kernel function Luke cores |
WO2015032301A1 (en) * | 2013-09-05 | 2015-03-12 | 江苏大学 | Method for detecting the similarity of the patent documents on the basis of new kernel function luke kernel |
CN103455609A (en) * | 2013-09-05 | 2013-12-18 | 江苏大学 | New kernel function Luke kernel-based patent document similarity detection method |
CN104778276A (en) * | 2015-04-29 | 2015-07-15 | 北京航空航天大学 | Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency) |
US10073890B1 (en) | 2015-08-03 | 2018-09-11 | Marca Research & Development International, Llc | Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm |
US10621499B1 (en) | 2015-08-03 | 2020-04-14 | Marca Research & Development International, Llc | Systems and methods for semantic understanding of digital information |
CN107193814A (en) * | 2016-03-14 | 2017-09-22 | 北京京东尚科信息技术有限公司 | Method and device for realizing automatic book sorting in digital reading |
CN107193814B (en) * | 2016-03-14 | 2020-07-31 | 北京京东尚科信息技术有限公司 | Method and device for realizing automatic book sorting in digital reading |
US10540439B2 (en) | 2016-04-15 | 2020-01-21 | Marca Research & Development International, Llc | Systems and methods for identifying evidentiary information |
CN107153689A (en) * | 2017-04-29 | 2017-09-12 | 安徽富驰信息技术有限公司 | Case search method based on topic similarity |
CN108090047A (en) * | 2018-01-10 | 2018-05-29 | 华南师范大学 | Text similarity determination method and equipment |
CN108090047B (en) * | 2018-01-10 | 2022-05-24 | 华南师范大学 | Text similarity determination method and equipment |
CN110633407A (en) * | 2018-06-20 | 2019-12-31 | 百度在线网络技术(北京)有限公司 | Information retrieval method, device, equipment and computer readable medium |
US11977589B2 (en) | 2018-06-20 | 2024-05-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information search method, device, apparatus and computer-readable medium |
CN109726401A (en) * | 2019-01-03 | 2019-05-07 | 中国联合网络通信集团有限公司 | Patent portfolio generation method and platform |
CN109726401B (en) * | 2019-01-03 | 2022-09-23 | 中国联合网络通信集团有限公司 | Patent combination generation method and system |
CN109960757A (en) * | 2019-02-27 | 2019-07-02 | 北京搜狗科技发展有限公司 | Web search method and device |
CN110334269A (en) * | 2019-07-11 | 2019-10-15 | 中国船舶工业综合技术经济研究院 | Information retrieval method and system |
CN110334269B (en) * | 2019-07-11 | 2021-05-07 | 中国船舶工业综合技术经济研究院 | Information retrieval method and system |
CN110516062A (en) * | 2019-08-26 | 2019-11-29 | 腾讯科技(深圳)有限公司 | Document search processing method and device |
CN110516062B (en) * | 2019-08-26 | 2022-11-04 | 腾讯科技(深圳)有限公司 | Method and device for searching and processing document |
Also Published As
Publication number | Publication date |
---|---|
CN101625680B (en) | 2012-08-29 |
Similar Documents
Publication | Title |
---|---|
CN101625680B (en) | Document retrieval method in patent field |
CN101430695B (en) | System and method for computing difference affinities of word |
CN104199857B (en) | Tax document hierarchical classification method based on multi-label classification |
Sarkar | Sentence clustering-based summarization of multiple text documents |
CN106095949A (en) | Personalized digital library resource recommendation method and system based on hybrid recommendation |
Wang et al. | Ptr: Phrase-based topical ranking for automatic keyphrase extraction in scientific publications |
Zaw et al. | Web document clustering using cuckoo search clustering algorithm based on Levy flight |
CN101097570A (en) | Advertisement classification method capable of automatically recognizing classified advertisement type |
CN106407182A (en) | Method for automatic abstracting of electronic official documents of enterprises |
CN104484380A (en) | Personalized search method and personalized search device |
CN109840532A (en) | Court similar-case recommendation method based on k-means |
Landthaler et al. | Extending Full Text Search for Legal Document Collections Using Word Embeddings. |
CN100511214C (en) | Method and system for abstracting batch single document for document set |
CN105279264A (en) | Semantic relevancy calculation method of document |
CN109670014A (en) | Author name disambiguation method for scientific papers based on rule matching and machine learning |
Wang et al. | Neural related work summarization with a joint context-driven attention mechanism |
CN104778157A (en) | Multi-document abstract sentence generating method |
Barla et al. | From ambiguous words to key-concept extraction |
CN1916904A (en) | Method of abstracting single file based on expansion of file |
Murthy et al. | A comparative study on term weighting methods for automated Telugu text categorization with effective classifiers |
Zhu et al. | Research on summary sentences extraction oriented to live sports text |
Wang et al. | Sentence-Ranking-Enhanced Keywords Extraction from Chinese Patents. |
Wang et al. | User intention-based document summarization on heterogeneous sentence networks |
Kalita et al. | An extractive approach of text summarization of Assamese using WordNet |
Nghiem et al. | Which one is better: presentation-based or content-based math search? |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |