CN110728143A - Method and equipment for identifying document key sentences - Google Patents

Info

Publication number
CN110728143A
Authority
CN
China
Prior art keywords
document
sentence
entries
sentences
importance scores
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910900141.1A
Other languages
Chinese (zh)
Inventor
翟光景
田进太
赵庆平
刘益东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Midu Information Technology Co Ltd
Original Assignee
Shanghai Midu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Midu Information Technology Co Ltd
Priority to CN201910900141.1A
Publication of CN110728143A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application aims to provide a method and equipment for identifying document key sentences. Compared with the prior art, the application performs word segmentation on a document based on its text content to obtain a plurality of entries corresponding to the document; calculates an entry importance score for each entry and determines the M top-ranked entries, where M is a preset value; splits the document into sentences to obtain a sentence set for the document; traverses the sentence set and screens out the sentences containing one or more of the M entries; and calculates sentence importance scores for the screened sentences based on the entry importance scores of the M entries, determining the one or more sentences with the highest sentence importance scores as the document key sentences.

Description

Method and equipment for identifying document key sentences
Technical Field
The application relates to the technical field of computers, in particular to a technology for identifying key sentences of a document.
Background
Public websites host a large amount of document data. A document usually contains a central sentence that represents its information, that is, the key sentence of the document. If the key sentence can be extracted, the content of the document can be grasped quickly, which helps in sharing or classifying the document. However, the prior art offers no technology for identifying the key sentence in a document.
Disclosure of Invention
The application aims to provide a method and equipment for identifying document key sentences.
According to one aspect of the application, a method for identifying a document key sentence is provided, wherein the method comprises the following steps:
performing word segmentation processing on a document based on the text content in the document to obtain a plurality of entries corresponding to the document;
calculating the entry importance score of each entry, and determining M entries with the entry importance scores ranked at the top, wherein M is a preset value;
performing sentence splitting processing on the document to obtain a sentence set related to the document;
traversing the sentence set, and screening out sentences containing one or more of the M entries;
and calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries, and determining one or more sentences with the highest sentence importance scores as document key sentences.
Further, the performing word segmentation processing on the document based on the text content in the document to obtain a plurality of entries corresponding to the document includes:
acquiring a title and a text of the document;
respectively carrying out word segmentation processing on the text contents of the title and the text of the document to obtain a plurality of title entries and text entries;
wherein the method further comprises:
and adding preset weight to the title entries to calculate entry importance scores of the weighted title entries.
Further, wherein the method further comprises:
performing semantic analysis on the screened sentences, and respectively endowing the screened sentences with preset weight values according to semantic analysis results;
wherein the calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries and the determining one or more sentences with the highest sentence importance scores as the document key sentences comprises:
and calculating the sentence importance scores of the screened sentences based on the importance scores of the M entries and the preset weight values, and determining one or more sentences with the highest sentence importance scores as document key sentences.
Further, wherein the method further comprises:
obtaining D public documents as a basic corpus set, wherein D is a preset value;
performing word segmentation processing on the documents in the basic corpus set to obtain basic entries;
the word segmentation processing of the document based on the text content in the document to obtain a plurality of entries corresponding to the document comprises:
and performing word segmentation processing on the document based on the text content in the document, and acquiring a plurality of entries corresponding to the document based on the basic entries.
Further, the formula for calculating the entry importance score of each entry is as follows:
$f_i = tf_{i,j} \times idf_i$, wherein:
[Figure BDA0002211565480000021: formula image, not reproduced in the text]
where n represents the number of times an entry appears in a document, D is the number of documents in the basic corpus, and $|\{j : t_i \in d_j\}|$ represents the number of documents in the basic corpus that contain the entry.
Further, the formula for calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries is as follows:
$F_i$ = the sum of the entry importance scores of the one or more of the M entries contained in the sentence.
Further, based on the importance scores of the M entries and the preset weight values, the formula for calculating the sentence importance scores of the screened sentences is as follows:
$S_i = F_i + E_i$, wherein $E_i$ represents the preset weight value of the i-th sentence.
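The formula image for the entry importance score is not reproduced in this text, so the following is only a minimal sketch that assumes the conventional tf-idf form suggested by the variable definitions: tf as the entry's count divided by the total entry count of the document, and idf as the logarithm of D over the number of corpus documents containing the entry. The function name `entry_scores` and the +1 in the idf denominator (to avoid division by zero for entries absent from the corpus) are assumptions, not part of the application.

```python
import math
from collections import Counter

def entry_scores(doc_entries, corpus_docs):
    """Return {entry: tf * idf} for one segmented document.

    doc_entries: list of entries from the target document.
    corpus_docs: list of entry sets, one per basic-corpus document.
    """
    counts = Counter(doc_entries)
    total = sum(counts.values())
    D = len(corpus_docs)
    scores = {}
    for entry, n in counts.items():
        tf = n / total
        df = sum(1 for d in corpus_docs if entry in d)
        # Assumption: +1 smoothing so an entry unseen in the corpus is still defined.
        idf = math.log(D / (1 + df))
        scores[entry] = tf * idf
    return scores

corpus = [{"a", "b"}, {"a"}, {"c"}, {"d"}]
scores = entry_scores(["a", "b", "b"], corpus)
```

With this smoothing, an entry that appears in every corpus document receives a negative idf, which pushes very common entries to the bottom of the ranking.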
According to another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the operations of the method as described above.
According to still another aspect of the present application, there is also provided an apparatus for document key sentence recognition, wherein the apparatus includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform operations of the method as previously described.
Compared with the prior art, the present application performs word segmentation on a document based on its text content to obtain a plurality of entries corresponding to the document; calculates an entry importance score for each entry and determines the M top-ranked entries, where M is a preset value; splits the document into sentences to obtain a sentence set for the document; traverses the sentence set and screens out the sentences containing one or more of the M entries; and calculates sentence importance scores for the screened sentences based on the entry importance scores of the M entries, determining the one or more sentences with the highest sentence importance scores as the document key sentences.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates a flow diagram of a method for document key sentence identification in accordance with an aspect of the subject application;
FIG. 2 illustrates a flow diagram of a method for word segmentation processing in accordance with a preferred embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
To further illustrate the technical means and effects adopted by the present application, the following description clearly and completely describes the technical solution of the present application with reference to the accompanying drawings and preferred embodiments.
FIG. 1 illustrates a flow diagram of a method for document key sentence identification provided in one aspect of the present application. The method is performed at a device 1, the method comprising the steps of:
S11, performing word segmentation processing on the document based on the text content in the document to obtain a plurality of entries corresponding to the document;
S12, calculating the entry importance score of each entry, and determining M entries with the entry importance scores ranked at the top, wherein M is a preset value;
S13, performing sentence splitting processing on the document to obtain a sentence set related to the document;
S14, traversing the sentence set, and screening out sentences containing one or more of the M entries;
S15, calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries, and determining one or more sentences with the highest sentence importance scores as document key sentences.
In this embodiment, in step S11, the device 1 performs word segmentation processing on the document based on the text content in the document, and obtains a plurality of entries corresponding to the document.
In the present application, the device 1 includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, a form of distributed computing in which a collection of loosely coupled computers forms one virtual supercomputer. This application places no limitation on the specific device 1.
Specifically, the device 1 obtains all of the text content of the document and performs word segmentation on the document based on that content. The segmentation may use any existing word segmentation method; word segmentation methods that may appear in the future, where applicable to the present application, also fall within its scope of protection and are incorporated herein by reference.
Fig. 2 shows a flowchart of a method for word segmentation processing according to a preferred embodiment of the present application. In the first step, variable initialization is performed, wherein S1 is a word string to be segmented, S2 is a segmented word string to be output, and the maximum word length MaxLen of the segmented word string is set to control the length of the segmented word string.
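The text describes only the initialization step of FIG. 2 (the input string S1, the output string S2, and the maximum word length MaxLen). These three variables are typical of dictionary-based forward maximum matching, so the sketch below assumes that technique; the loop itself, the function name `forward_max_match`, and the sample lexicon are illustrative assumptions rather than a reproduction of the figure.

```python
def forward_max_match(s1, lexicon, max_len=4):
    """Segment s1 greedily, preferring the longest lexicon entry at each position."""
    s2 = []  # segmented output, playing the role of S2
    i = 0
    while i < len(s1):
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(min(max_len, len(s1) - i), 0, -1):
            candidate = s1[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in lexicon:
                s2.append(candidate)
                i += size
                break
    return s2

# Hypothetical lexicon for illustration only.
lexicon = {"研究", "研究生", "生命", "起源"}
segments = forward_max_match("研究生命起源", lexicon, max_len=3)
```

The example also shows the known weakness of greedy matching: the longest match "研究生" is taken first, which leaves "命" stranded as a single character.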
Preferably, wherein the step S11 includes: acquiring a title and a text of the document; respectively carrying out word segmentation processing on the text contents of the title and the text of the document to obtain a plurality of title entries and text entries;
wherein the method further comprises: and adding preset weight to the title entries to calculate entry importance scores of the weighted title entries.
In this embodiment, the title and the text of the document may be segmented separately. Because entries from the title may matter more to the document, a preset weight is applied to the entries obtained from the title. For example, if a title entry's word frequency after segmentation is n, then after the preset weight a is applied its weighted frequency becomes a × n, where a is a number greater than 1. The specific value of a is not limited here.
Continuing in this embodiment, in step S12, the device 1 calculates the entry importance score of each entry, and determines M entries with the entry importance scores ranked at the top, where M is a preset value.
Specifically, the device 1 may count the occurrence frequency of each segmented entry, and may represent the entry importance score of the entry by the occurrence frequency, for example, the occurrence frequency of each entry may be directly used as the entry importance score, or a value obtained by normalizing the occurrence frequency may be used as the entry importance score, which is not specifically limited herein. After the entry importance scores of the entries are calculated, M entries with the top rank are selected, wherein M can be preset, and specific numerical values are not limited.
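As a minimal sketch of the frequency-based variant, the snippet below counts entry occurrences, counts each title occurrence a times, and keeps the M highest-scoring entries. How title and text counts are merged is not stated in the application; summing them is an assumption here, as are the function name `top_m_entries` and the alphabetical tie-break.

```python
from collections import Counter

def top_m_entries(title_entries, body_entries, a=2.0, m=3):
    """Score entries by frequency, weighting title occurrences by a, and keep the top m."""
    scores = Counter()
    for entry in body_entries:
        scores[entry] += 1.0
    for entry in title_entries:
        scores[entry] += a  # each title occurrence counts a times
    # Sort by score descending, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:m]

title = ["key", "sentence"]
body = ["document", "key", "document", "score"]
top = top_m_entries(title, body, a=2.0, m=2)
```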
Continuing in this embodiment, in step S13, the device 1 performs sentence splitting on the document to obtain a sentence set for the document. Specifically, the document may be split into sentences at punctuation marks, for example at commas, periods, or other punctuation, to obtain the sentence set for the document.
Continuing in this embodiment, in step S14, the device 1 traverses the sentence set and screens out sentences containing one or more of the M entries. Specifically, after the document is split into sentences, each sentence is searched to check whether it contains one or more of the M entries; if it does, the sentence is screened out, that is, retained as a candidate.
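Steps S13 and S14 can be sketched as follows. The punctuation set and the use of plain substring matching are assumptions made for illustration; a real implementation would likely match on segmented entries and would need to handle periods inside numbers or abbreviations.

```python
import re

def split_sentences(text):
    """Split on common Chinese and English sentence-ending punctuation."""
    parts = re.split(r"[。！？!?.;；]+", text)
    return [p.strip() for p in parts if p.strip()]

def screen_sentences(sentences, top_entries):
    """Keep sentences that contain at least one of the top entries."""
    return [s for s in sentences if any(e in s for e in top_entries)]

sentences = split_sentences("Cats sleep a lot. Dogs bark! Birds sing.")
screened = screen_sentences(sentences, {"Dogs", "sing"})
```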
Continuing in this embodiment, in said step S15, the device 1 calculates the sentence importance scores of the screened sentences based on the term importance scores of said M terms, and determines one or more sentences having the highest sentence importance scores as the document key sentences.
Specifically, the sentence importance score may be the sum of entry importance scores. For example, if a sentence contains P of the M entries, where P is at most M, the sentence importance score is obtained by adding the entry importance scores of those P entries.
The sentence importance scores of all screened sentences are calculated and sorted, and the one or more sentences with the highest sentence importance scores are taken as the document key sentences.
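The summation and ranking in step S15 can be illustrated with a short sketch. The function name `rank_sentences`, the parameter k (how many key sentences to return), and substring matching are assumptions for the example.

```python
def rank_sentences(sentences, entry_scores, k=1):
    """Return the k highest-scoring (sentence, score) pairs.

    A sentence's score is the sum of the scores of the top entries it contains.
    """
    scored = []
    for s in sentences:
        f = sum(score for entry, score in entry_scores.items() if entry in s)
        scored.append((s, f))
    # list.sort is stable, so sentences with equal scores keep document order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

entry_scores = {"alpha": 2.0, "beta": 1.0, "gamma": 0.5}
ranked = rank_sentences(["alpha beta", "beta", "gamma alpha beta"], entry_scores, k=2)
```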
Preferably, wherein the method further comprises: S16 (not shown), performing semantic analysis on the screened sentences, and assigning preset weight values to the screened sentences according to the semantic analysis result;
wherein the step S15 includes:
and calculating the sentence importance scores of the screened sentences based on the importance scores of the M entries and the preset weight values, and determining one or more sentences with the highest sentence importance scores as document key sentences.
In this embodiment, semantic analysis is also performed on the screened sentences, for example by analyzing whether each sentence contains a subject, a predicate, and an object, and preset weight values are then assigned to the screened sentences according to the result. For example, a sentence with a subject, predicate, and object is assigned the preset weight value Q, a sentence with only a subject and predicate is assigned the preset weight value Y, and all other cases are assigned 0, where Q > Y; the specific values are not limited. This assignment of weights is merely an example; other existing or future assignment methods, where applicable to the present application, are also included herein by reference.
In this embodiment, the sentence importance score may depend on the preset weight value of each sentence in addition to the importance scores of the M entries. For example, the score of a sentence may be obtained by summing the importance scores of the one or more of the M entries it contains and multiplying the sum by the weight value, where the weight value is greater than one; alternatively, the sum may be added to the weight value. These calculation methods are only examples; other existing or future calculation methods, where applicable to the present application, also fall within its scope and are incorporated herein by reference.
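The two combination rules can be compared in a small sketch. The values chosen for Q and Y are hypothetical (the application only requires Q > Y), and the semantic analysis itself is assumed to have been done elsewhere and passed in as three booleans.

```python
# Hypothetical preset weights; the source only requires Q > Y.
Q, Y = 2.0, 1.5

def preset_weight(has_subject, has_predicate, has_object):
    """Map a precomputed sentence-structure analysis to the preset weight E_i."""
    if has_subject and has_predicate and has_object:
        return Q
    if has_subject and has_predicate:
        return Y
    return 0.0

def sentence_score(f, e, mode="add"):
    """Combine the summed entry score f with the preset weight e."""
    if mode == "add":
        return f + e
    if mode == "multiply":
        # The source states the weight is greater than one in this variant.
        return f * e
    raise ValueError(f"unknown mode: {mode}")
```

Note that the multiplicative rule interacts badly with a weight of 0, since it would erase the entry-based score entirely; the additive rule, which the formula $S_i = F_i + E_i$ uses, does not have this problem.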
Preferably, wherein the method further comprises: S17 (not shown), acquiring D public documents as a basic corpus set, wherein D is a preset value; performing word segmentation processing on the documents in the basic corpus set to obtain basic entries;
wherein the step S11 includes: and performing word segmentation processing on the document based on the text content in the document, and acquiring a plurality of entries corresponding to the document based on the basic entries.
In this embodiment, a basic entry library is built by obtaining public documents and segmenting them. For example, 300,000 news and information articles are collected from various news websites as the basic corpus set, and word segmentation is performed on them to obtain the basic entries; that is, an entry dictionary is created to facilitate subsequent word segmentation of the document. The specific value of D is not limited; the larger D is, the more complete the resulting basic entry library. The word segmentation flow is shown in FIG. 2.
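A basic entry library of this kind can be sketched as a set of entries plus a per-entry document count, which is also what an idf computation needs later. The sketch assumes the corpus documents have already been segmented by some existing segmenter; the function name `build_lexicon` is hypothetical.

```python
from collections import Counter

def build_lexicon(segmented_corpus):
    """Return (lexicon, doc_freq) from a list of already-segmented documents."""
    doc_freq = Counter()
    for entries in segmented_corpus:
        doc_freq.update(set(entries))  # count each entry once per document
    return set(doc_freq), doc_freq

lexicon, doc_freq = build_lexicon([["a", "b", "a"], ["b", "c"]])
```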
In an embodiment of the present application, the following steps are included for identifying the key sentences in the document:
Step one: preparing the basic corpus, for example, collecting D news and information articles from various news websites;
Step two: segmenting the basic corpus according to the flow shown in FIG. 2, where the segmentation result is represented as the set $W = \{\{d_1, (w_1, w_2, w_3, \ldots, w_n)\}, \{d_2, (w_1, w_2, w_3, \ldots, w_n)\}, \ldots, \{d_n, (w_1, w_2, w_3, \ldots, w_n)\}\}$, where $d_i$ represents a document and $w_i$ represents an entry;
Step three: letting X be the document requiring key sentence identification, first performing word segmentation on the title and the text of X according to FIG. 2, and recording the segmentation results as:
title segmentation result $W_t = \{(w_{t1}, n_1), (w_{t2}, n_2), \ldots, (w_{tn}, n_n)\}$,
text segmentation result $W_c = \{(w_{c1}, n_1), (w_{c2}, n_2), \ldots, (w_{cn}, n_n)\}$,
where $w_i$ is an entry and $n_i$ is the word frequency of that entry;
Step four: applying the weight a to $W_t$, that is, the title segmentation result after the preset weight a is applied is $W_{ta} = \{(w_{t1}, a \cdot n_1), (w_{t2}, a \cdot n_2), \ldots, (w_{tn}, a \cdot n_n)\}$;
Step five: using the weighted $W_{ta}$ and $W_c$, calculating the entry importance score of each entry in X, where the entry importance score formula is:
$f_i = tf_{i,j} \times idf_i$, wherein:
[Figure BDA0002211565480000081: formula image, not reproduced in the text]
where n represents the number of times an entry appears in a document, D is the number of documents in the basic corpus, and $|\{j : t_i \in d_j\}|$ represents the number of documents in the basic corpus that contain the entry.
The M entries with the highest entry importance scores are determined as the $top_m$ entries, $top_m = \{(w_{t1}, f_1), (w_{t2}, f_2), \ldots, (w_{tm}, f_m)\}$, where $w_i$ is an entry and $f_i$ is the entry importance score of the corresponding entry;
Step six: splitting document X into sentences at punctuation marks to obtain a sentence set S, traversing S, and screening out the sentences that contain any one or more of the $top_m$ entries; the screened sentence set is denoted $S_t$;
Step seven: for each sentence $S_i$ in $S_t$, calculating its sentence importance score $F_i$ in the document, where $F_i$ is the sum of the entry importance scores of the one or more of the M entries contained in the sentence; the result is recorded as $S_{tf} = \{(S_1, F_1), (S_2, F_2), \ldots, (S_n, F_n)\}$;
Step eight: performing semantic analysis on each sentence $S_i$ in $S_t$. If $S_i$ has a subject, predicate, and object, its weight is set to Q; if it has only a subject and predicate, its weight is set to Y, where Q > Y; otherwise the weight is set to 0. Letting the weight corresponding to $S_i$ be $E_i$, the weights for $S_t$ are recorded as $S_{te} = \{(S_1, E_1), (S_2, E_2), \ldots, (S_n, E_n)\}$;
Step nine: calculating the sentence importance scores of the screened sentences based on the importance scores of the M entries and the preset weight values to obtain $S_{tfe} = \{(S_1, E_1 + F_1), (S_2, E_2 + F_2), \ldots, (S_n, E_n + F_n)\}$, and determining the one or more sentences with the highest sentence importance scores as the key sentences of the document.
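Steps three through nine can be strung together in one rough end-to-end sketch. Several simplifications are assumptions and not part of the application: a regex word tokenizer stands in for the FIG. 2 segmenter, the entry score uses the conventional tf-idf form with +1 smoothing in the idf denominator, title and text counts are summed, and the semantic weight $E_i$ is supplied by an optional caller-provided function `weight_fn` (defaulting to 0).

```python
import math
import re
from collections import Counter

def tokens(text):
    """Stand-in segmenter: lowercase word tokens (an assumption, not FIG. 2)."""
    return re.findall(r"\w+", text.lower())

def key_sentences(title, body, corpus, a=2.0, m=3, k=1, weight_fn=None):
    """Return the k best (sentence, score) pairs; corpus is a list of token sets."""
    # Steps three and four: word frequencies, with title counts scaled by a.
    counts = Counter(tokens(body))
    for w in tokens(title):
        counts[w] += a
    total = sum(counts.values())
    # Step five: tf-idf style entry scores, then keep the top m entries.
    D = len(corpus)
    scores = {}
    for w, n in counts.items():
        df = sum(1 for d in corpus if w in d)
        scores[w] = (n / total) * math.log(D / (1 + df))  # +1 avoids division by zero
    top = dict(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m])
    # Step six: split into sentences and keep those containing a top entry.
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", body) if s.strip()]
    results = []
    for s in sentences:
        words = set(tokens(s))
        hits = [w for w in top if w in words]
        if not hits:
            continue
        f = sum(top[w] for w in hits)            # step seven: F_i
        e = weight_fn(s) if weight_fn else 0.0   # step eight: E_i
        results.append((s, f + e))               # step nine: S_i = F_i + E_i
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:k]

corpus = [{"the", "cat"}, {"the", "dog"}, {"the", "bird"}, {"the", "fish"}]
body = "The lab built quantum sensors. The weather was mild. The team tested the sensors twice."
result = key_sentences("Quantum sensors", body, corpus, a=2.0, m=2, k=2)
```

In this toy run the word "the" appears in every corpus document and therefore scores lowest, while the title words dominate the top entries, so the sentence containing both title words ranks first.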
Furthermore, the embodiment of the present application also provides a computer readable medium, on which computer readable instructions are stored, and the computer readable instructions can be executed by a processor to implement the foregoing method.
The embodiment of the present application further provides an apparatus for identifying a document key sentence, where the apparatus includes:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the foregoing method.
For example, the computer readable instructions, when executed, cause the one or more processors to: performing word segmentation processing on a document based on the text content in the document to obtain a plurality of entries corresponding to the document;
calculating the entry importance score of each entry, and determining M entries with the entry importance scores ranked at the top, wherein M is a preset value; performing sentence splitting processing on the document to obtain a sentence set related to the document; traversing the sentence set, and screening out sentences containing one or more of the M entries; and calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries, and determining one or more sentences with the highest sentence importance scores as document key sentences.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (9)

1. A method for document key sentence identification, wherein the method comprises:
performing word segmentation processing on a document based on the text content in the document to obtain a plurality of entries corresponding to the document;
calculating the entry importance score of each entry, and determining M entries with the entry importance scores ranked at the top, wherein M is a preset value;
performing sentence splitting processing on the document to obtain a sentence set related to the document;
traversing the sentence set, and screening out sentences containing one or more of the M entries;
and calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries, and determining one or more sentences with the highest sentence importance scores as document key sentences.
2. The method of claim 1, wherein the performing word segmentation processing on the document based on the text content in the document to obtain a plurality of entries corresponding to the document comprises:
acquiring a title and a text of the document;
respectively carrying out word segmentation processing on the text contents of the title and the text of the document to obtain a plurality of title entries and text entries;
wherein the method further comprises:
and adding preset weight to the title entries to calculate entry importance scores of the weighted title entries.
3. The method according to claim 1 or 2, wherein the method further comprises:
performing semantic analysis on the screened sentences, and respectively endowing the screened sentences with preset weight values according to semantic analysis results;
wherein the calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries and the determining one or more sentences with the highest sentence importance scores as the document key sentences comprises:
and calculating the sentence importance scores of the screened sentences based on the importance scores of the M entries and the preset weight values, and determining one or more sentences with the highest sentence importance scores as document key sentences.
4. The method of any of claims 1-3, wherein the method further comprises:
obtaining D public documents as a basic corpus set, wherein D is a preset value;
performing word segmentation processing on the documents in the basic corpus set to obtain basic entries;
the word segmentation processing of the document based on the text content in the document to obtain a plurality of entries corresponding to the document comprises:
and performing word segmentation processing on the document based on the text content in the document, and acquiring a plurality of entries corresponding to the document based on the basic entries.
5. The method of claim 4, wherein the formula for calculating the importance score of each entry is:
$f_i = tf_{i,j} \times idf_i$, wherein:
[Figure FDA0002211565470000021: formula image, not reproduced in the text]
where n represents the number of times an entry appears in a document, D is the number of documents in the basic corpus, and $|\{j : t_i \in d_j\}|$ represents the number of documents in the basic corpus that contain the entry.
6. The method of claim 5, wherein the formula for calculating the sentence importance scores of the screened sentences based on the entry importance scores of the M entries is:
$F_i$ = the sum of the entry importance scores of the one or more of the M entries contained in the sentence.
7. The method of claim 6, wherein based on the importance scores of the M entries and the preset weight values, the formula for calculating the sentence importance scores of the screened sentences is:
$S_i = F_i + E_i$, wherein $E_i$ represents the preset weight value of the i-th sentence.
8. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 7.
9. An apparatus for document key sentence recognition, wherein the apparatus comprises:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 7.
CN201910900141.1A 2019-09-23 2019-09-23 Method and equipment for identifying document key sentences Pending CN110728143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910900141.1A CN110728143A (en) 2019-09-23 2019-09-23 Method and equipment for identifying document key sentences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910900141.1A CN110728143A (en) 2019-09-23 2019-09-23 Method and equipment for identifying document key sentences

Publications (1)

Publication Number Publication Date
CN110728143A true CN110728143A (en) 2020-01-24

Family

ID=69218277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910900141.1A Pending CN110728143A (en) 2019-09-23 2019-09-23 Method and equipment for identifying document key sentences

Country Status (1)

Country Link
CN (1) CN110728143A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446940A (en) * 2007-11-27 2009-06-03 北京大学 Method and device of automatically generating a summary for document set
CN104361081A (en) * 2014-11-13 2015-02-18 河海大学 WEB document-based automatic abstracting method
CN107133213A (en) * 2017-05-06 2017-09-05 广东药科大学 A kind of text snippet extraction method and system based on algorithm
US20170277679A1 (en) * 2016-03-23 2017-09-28 Kabushiki Kaisha Toshiba Information processing device, information processing method, and computer program product
CN109472021A (en) * 2018-10-12 2019-03-15 北京诺道认知医学科技有限公司 Critical sentence screening technique and device in medical literature based on deep learning
CN110134792A (en) * 2019-05-22 2019-08-16 北京金山数字娱乐科技有限公司 Text recognition method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200124