CN108363682A - Target text display method and device - Google Patents
- Publication number
- CN108363682A (application CN201810142223.XA)
- Authority
- CN
- China
- Prior art keywords
- word segment
- paragraph
- document
- abstract
- corresponding document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/106—Display of layout of documents; Previewing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3325—Reformulation based on results of preceding query
- G06F16/3326—Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
- G06F16/3328—Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages using graphical result space presentation or visualisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application disclose a target text display method and device that allow a user to quickly obtain required text from a massive collection of documents. The method includes: pre-establishing an inverted index that contains the document identifiers and paragraph identifiers corresponding to each word segment; obtaining a query word input by a user, the query word including a first word segment; querying, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the first word segment, and determining the documents and/or paragraphs corresponding to the first word segment from those identifiers; sorting the documents and/or paragraphs corresponding to the first word segment, and sending the corresponding document summaries and/or paragraph summaries to a terminal for display in the sorted order; and, upon obtaining a display request for any document summary or paragraph summary corresponding to the first word segment, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Description
Technical field
This application relates to the field of Internet technologies, and in particular to a target text display method and device.
Background
With the development of Internet technology, users face large volumes of information and documents to read. Usually, a user pages through documents in order to scan and read them quickly. However, when the user needs to find a particular point of interest in the documents, it is difficult to obtain the required information quickly. For example, when a financial analyst faces listed-company annual reports running to hundreds of pages, leafing through each document page by page to find a particular point of interest takes a great deal of time, and relevant content is easily missed.
Therefore, how to enable a user to quickly read the required content from massive documents is a technical problem that urgently needs to be solved.
Summary of the invention
In view of this, the embodiments of the present application provide a target text display method and device, to solve the technical problem in the prior art that a user cannot quickly obtain required text from massive documents.
To solve the above problem, the technical solutions provided by the embodiments of the present application are as follows:
A target text display method, in which an inverted index is established in advance, the inverted index including the document identifiers and paragraph identifiers corresponding to each word segment, the method including:
obtaining a query word input by a user, the query word including a first word segment;
querying, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the first word segment, and determining the documents and/or paragraphs corresponding to the first word segment according to those document identifiers and/or paragraph identifiers;
sorting the documents and/or paragraphs corresponding to the first word segment, and sending the document summaries and/or paragraph summaries corresponding to the first word segment to a terminal for display in the sorted order;
upon obtaining a display request for any document summary or paragraph summary corresponding to the first word segment, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the method further includes:
determining related words corresponding to the query word and sending the related words to the terminal for display, each related word including a second word segment;
querying, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the second word segment, and determining the documents and/or paragraphs corresponding to the second word segment according to those document identifiers and/or paragraph identifiers;
sorting the documents and/or paragraphs corresponding to the second word segment;
obtaining a query request for any related word, and sending the document summaries and/or paragraph summaries corresponding to the second word segment included in that related word to the terminal for display in the sorted order;
upon obtaining a display request for any document summary or paragraph summary corresponding to the second word segment, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, determining the related words corresponding to the query word includes:
determining the feature vector of the first word segment and the feature vectors of the other word segments according to a word segment feature model;
calculating the similarity between the feature vector of the first word segment and the feature vector of each other word segment;
determining, as related words, the word segments whose feature-vector similarity to the first word segment meets a preset condition.
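The similarity step above can be sketched as follows. This is a minimal illustration, not the application's implementation: the text does not name a similarity measure or the preset condition, so cosine similarity and a fixed threshold are assumptions, and the function names and toy vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_words(first_segment, vectors, threshold=0.8):
    """Return (word, similarity) pairs at or above the threshold, most similar first.

    `threshold` stands in for the unspecified "preset condition".
    """
    target = vectors[first_segment]
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word != first_segment
    ]
    kept = [(word, sim) for word, sim in scored if sim >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy feature vectors, invented for illustration only.
vectors = {
    "patent": [1.0, 0.0],
    "invention": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
print(related_words("patent", vectors))
```

A top-k cutoff would be an equally plausible reading of "preset condition"; the threshold form is used here only because it is the shortest to show.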
Optionally, determining the feature vector of the first word segment and the feature vectors of the other word segments according to the word segment feature model includes:
taking the initial feature vector of any word segment as the output of the word segment feature model and, according to the word order in each document, taking the initial feature vectors of the word segments within a preset range before and after that word segment as the input of the word segment feature model; training the word segment feature model; and, after the word segment feature model reaches a convergence condition, obtaining the feature vector of the first word segment and the feature vectors of the other word segments, the word segment feature model being a neural network model.
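The input/output arrangement described above (neighbouring word segments as input, the centre word segment as output) can be illustrated by how training pairs might be extracted from a segmented document. This sketch covers only pair extraction, not the neural network or its training; the function name and the `window` parameter, standing in for the "preset range", are hypothetical.

```python
def context_target_pairs(tokens, window=2):
    """Build (context, target) pairs from one document's word order.

    For each word segment, the segments up to `window` positions before and
    after it form the model input; the segment itself is the expected output.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs

for context, target in context_target_pairs(["annual", "report", "net", "profit"], window=1):
    print(context, "->", target)
```

Each pair would then be mapped to initial feature vectors and fed to the model until the convergence condition is reached.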
Optionally, the method further includes:
determining, according to historical query word records, predicted query words corresponding to the query word, each predicted query word including a third word segment;
querying, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the third word segment, and determining the documents and/or paragraphs corresponding to the third word segment according to those document identifiers and/or paragraph identifiers;
sorting the documents and/or paragraphs corresponding to the third word segment;
obtaining a query request for any predicted query word, and sending the document summaries and/or paragraph summaries corresponding to the third word segment included in that predicted query word to the terminal for display in the sorted order;
upon obtaining a display request for any document summary or paragraph summary corresponding to the third word segment, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, sorting the documents and/or paragraphs corresponding to the first word segment includes:
sorting the documents corresponding to the first word segment according to one or more of: the document type of each corresponding document, the number of occurrences of the first word segment in each corresponding document, the occurrence ratio of the first word segment in each corresponding document, the position at which the first word segment appears in each corresponding document, and the distance between the first word segments in each corresponding document;
and/or
sorting the paragraphs corresponding to the first word segment according to one or more of: the number of occurrences of the first word segment in each corresponding paragraph, the occurrence ratio of the first word segment in each corresponding paragraph, the position at which the first word segment appears in each corresponding paragraph, and the distance between the first word segments in each corresponding paragraph.
A target text display device, the device including:
an establishing unit, configured to establish an inverted index in advance, the inverted index including the document identifiers and paragraph identifiers corresponding to each word segment;
a first obtaining unit, configured to obtain a query word input by a user, the query word including a first word segment;
a first query unit, configured to query, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the first word segment, and to determine the documents and/or paragraphs corresponding to the first word segment according to those document identifiers and/or paragraph identifiers;
a first sorting unit, configured to sort the documents and/or paragraphs corresponding to the first word segment, and to send the document summaries and/or paragraph summaries corresponding to the first word segment to a terminal for display in the sorted order;
a first sending unit, configured to, upon obtaining a display request for any document summary or paragraph summary corresponding to the first word segment, send the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the device further includes:
a first determining unit, configured to determine related words corresponding to the query word and send the related words to the terminal for display, each related word including a second word segment;
a second query unit, configured to query, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the second word segment, and to determine the documents and/or paragraphs corresponding to the second word segment according to those document identifiers and/or paragraph identifiers;
a second sorting unit, configured to sort the documents and/or paragraphs corresponding to the second word segment;
a second obtaining unit, configured to obtain a query request for any related word, and to send the document summaries and/or paragraph summaries corresponding to the second word segment included in that related word to the terminal for display in the sorted order;
a second sending unit, configured to, upon obtaining a display request for any document summary or paragraph summary corresponding to the second word segment, send the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the first determining unit includes:
a first determining subunit, configured to determine the feature vector of the first word segment and the feature vectors of the other word segments according to a word segment feature model;
a calculating subunit, configured to calculate the similarity between the feature vector of the first word segment and the feature vector of each other word segment;
a second determining subunit, configured to determine, as related words, the word segments whose feature-vector similarity to the first word segment meets a preset condition.
Optionally, the first determining subunit is specifically configured to:
take the initial feature vector of any word segment as the output of the word segment feature model and, according to the word order in each document, take the initial feature vectors of the word segments within a preset range before and after that word segment as the input of the word segment feature model; train the word segment feature model; and, after the word segment feature model reaches a convergence condition, obtain the feature vector of the first word segment and the feature vectors of the other word segments, the word segment feature model being a neural network model.
Optionally, the device further includes:
a second determining unit, configured to determine, according to historical query word records, predicted query words corresponding to the query word, each predicted query word including a third word segment;
a third query unit, configured to query, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the third word segment, and to determine the documents and/or paragraphs corresponding to the third word segment according to those document identifiers and/or paragraph identifiers;
a third sorting unit, configured to sort the documents and/or paragraphs corresponding to the third word segment;
a third obtaining unit, configured to obtain a query request for any predicted query word, and to send the document summaries and/or paragraph summaries corresponding to the third word segment included in that predicted query word to the terminal for display in the sorted order;
a third sending unit, configured to, upon obtaining a display request for any document summary or paragraph summary corresponding to the third word segment, send the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the first sorting unit is specifically configured to:
sort the documents corresponding to the first word segment according to one or more of: the document type of each corresponding document, the number of occurrences of the first word segment in each corresponding document, the occurrence ratio of the first word segment in each corresponding document, the position at which the first word segment appears in each corresponding document, and the distance between the first word segments in each corresponding document;
and/or
sort the paragraphs corresponding to the first word segment according to one or more of: the number of occurrences of the first word segment in each corresponding paragraph, the occurrence ratio of the first word segment in each corresponding paragraph, the position at which the first word segment appears in each corresponding paragraph, and the distance between the first word segments in each corresponding paragraph.
It can be seen that the embodiments of the present application have the following beneficial effects:
The embodiments of the present application can record in advance, for each word segment in each document, the document identifier of the document in which it is located and the paragraph identifier of the paragraph in which it is located, and establish an inverted index. When a user needs to find a particular point of interest, the user can input a query word that includes one or more first word segments. After the query word is obtained, the document identifiers and/or paragraph identifiers corresponding to the first word segment can be found quickly from the pre-established inverted index, and the corresponding documents and/or paragraphs can then be obtained. These documents and/or paragraphs are sorted by their relevance to the query word, and the corresponding document summaries and/or paragraph summaries are sent to the user's terminal for display in the sorted order. The user can then trigger any document summary or paragraph summary to display the corresponding document page. This lets the user quickly browse a particular piece of content across multiple documents, which can greatly reduce the user's reading time and improve the reading experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of an application scenario provided by the embodiments of the present application;
Fig. 2 is a flowchart of a target text display method embodiment provided by the embodiments of the present application;
Fig. 3 is a schematic diagram of a display result of the target text display method provided by the embodiments of the present application;
Fig. 4 is a flowchart of another target text display method embodiment provided by the embodiments of the present application;
Fig. 5 is a flowchart of yet another target text display method embodiment provided by the embodiments of the present application;
Fig. 6 is a schematic diagram of a target text display device embodiment provided by the embodiments of the present application.
Detailed description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the embodiments of the present application are described in further detail below with reference to the accompanying drawings and specific implementations.
Referring to Fig. 1, a framework diagram of an exemplary application scenario of the embodiments of the present application is shown. A user can input a query word using the terminal 10, the query word including a first word segment. The server 20 can obtain the query word, query the document identifiers and/or paragraph identifiers corresponding to the first word segment according to the pre-established inverted index, and further determine the documents and/or paragraphs corresponding to the first word segment. The server 20 then sorts the documents and/or paragraphs corresponding to the first word segment and sends the corresponding document summaries and/or paragraph summaries to the terminal 10 for display in the sorted order. The user can trigger any displayed document summary or paragraph summary through the terminal 10. After obtaining a display request for any document summary or paragraph summary corresponding to the first word segment, the server 20 can send the document page corresponding to that document summary or paragraph summary to the terminal 10, and the terminal 10 can load and display the document page, so that the user can quickly read the documents or paragraphs corresponding to the query word.
Those skilled in the art will understand that the framework diagram shown in Fig. 1 is only one example in which the embodiments of the present application can be implemented. The scope of application of the embodiments of the present application is not limited in any respect by this framework.
It should be noted that the terminal 10 in the embodiments of the present application may be any user equipment, whether existing, under development, or developed in the future, that can interact with the server 20 through any form of wired and/or wireless connection (for example, Wi-Fi, LAN, cellular, or coaxial cable), including but not limited to: smartphones, non-smart mobile phones, tablet computers, laptop personal computers, desktop personal computers, minicomputers, midrange computers, and mainframe computers, whether existing, under development, or developed in the future. It should also be noted that the server 20 in the embodiments of the present application may be an example of an existing, under-development, or future-developed device capable of providing an information recommendation application service to users. The embodiments of the present application are not limited in this respect.
The target text display method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings.
Referring to Fig. 2, a flowchart of a target text display method embodiment provided in the embodiments of the present application is shown. The method can be applied to a server.
In the embodiments of the present application, before the steps of the target text display method embodiment are performed, an inverted index may be established in advance, the inverted index including the document identifiers and paragraph identifiers corresponding to each word segment.
After a massive collection of documents is obtained, each document is first segmented into words; a word segment that is repeated within a document may be treated as a single word segment. The document identifier of the document in which each word segment is located and the paragraph identifier of the paragraph in which it is located are recorded, thereby establishing an inverted index that contains the document identifiers and paragraph identifiers corresponding to each word segment. A document identifier may be a serial number assigned to the document or another identifier, and a paragraph identifier may be the serial number of the paragraph within its document or another identifier; the embodiments of the present application do not limit the form of the document identifier or the paragraph identifier. As an example of the inverted index, word segment 1 corresponds to document 001 and document 002, and word segment 1 corresponds to paragraphs 020 and 021 in document 001, paragraphs 005 and 007 in document 002, and so on.
It can be understood that as documents are continually updated, the content of the inverted index is updated accordingly; that is, the process of establishing the inverted index may include both creating a new inverted index and updating an existing one.
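The index construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: documents are given as lists of paragraph strings, paragraph identifiers are 1-based serial numbers, and word segmentation is approximated by whitespace splitting (real Chinese text would need an actual segmenter). The function name and sample data are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents, tokenize=str.split):
    """Map each word segment to {document_id: {paragraph_id, ...}}.

    documents: {document_id: [paragraph_text, ...]}
    Repeated occurrences collapse naturally because paragraph ids are stored in sets.
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, paragraphs in documents.items():
        for para_id, text in enumerate(paragraphs, start=1):
            for segment in tokenize(text):
                index[segment][doc_id].add(para_id)
    return index

# Illustrative data only.
documents = {
    "001": ["patent drafting guide", "annual report", "patent search tips"],
    "002": ["patent law", "finance news"],
}
index = build_inverted_index(documents)
print(dict(index["patent"]))
```

Calling the same loop on newly added documents against an existing `index` would cover the "update" case mentioned above, since entries are only ever added to the per-segment sets.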
The target text display method embodiment provided in this embodiment may include the following steps:
Step 201: Obtain a query word input by a user, the query word including a first word segment.
The user can input a query word through the terminal. The query word may include one or more first word segments, a first word segment being a word segment in the inverted index, and the server can obtain the query word. When the query word includes only one first word segment, that first word segment is the query word itself; for example, if the query word is "patent", the first word segment it includes is "patent". When the query word includes multiple first word segments, the query word can be segmented to determine them; for example, if the query word is "patent drafting", segmenting the query word determines that the first word segments it includes are "patent" and "drafting".
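One way the query segmentation in Step 201 might look is sketched below. The text does not say which segmentation algorithm is used; forward maximum matching against the inverted index's vocabulary is an assumption chosen for brevity, the function name is hypothetical, and the Chinese strings are illustrative stand-ins for the "patent drafting" example.

```python
def segment_query(query, vocabulary):
    """Split a query into word segments by forward maximum matching.

    At each position the longest vocabulary entry that matches is taken;
    characters not covered by the vocabulary are skipped.
    """
    max_len = max((len(word) for word in vocabulary), default=1)
    segments = []
    i = 0
    while i < len(query):
        for size in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + size]
            if candidate in vocabulary:
                segments.append(candidate)
                i += size
                break
        else:
            i += 1  # no vocabulary entry starts here; move on one character
    return segments

vocabulary = {"专利", "撰写"}  # assumed index vocabulary: "patent", "drafting"
print(segment_query("专利撰写", vocabulary))
```

Restricting matches to the index vocabulary mirrors the statement that a first word segment is a word segment in the inverted index.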
Step 202: According to the inverted index, query the document identifiers and/or paragraph identifiers corresponding to the first word segment, and determine the documents and/or paragraphs corresponding to the first word segment according to those document identifiers and/or paragraph identifiers.
As described above, once the first word segments are determined, the inverted index can be queried to obtain the document identifiers and/or paragraph identifiers corresponding to each first word segment, from which the documents and/or paragraphs corresponding to the first word segment can be determined. In other words, the documents and/or paragraphs in which the first word segment is located can be found among the massive documents.
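The lookup in Step 202 can be sketched as a dictionary query. The index layout (word segment to document identifier to set of paragraph identifiers) and the choice to take the union over multiple first word segments are assumptions; the text does not say whether results for several segments are united or intersected. The function name and sample index are hypothetical.

```python
def lookup(index, first_segments):
    """Return the document ids and (document id, paragraph id) pairs for the segments."""
    doc_ids = set()
    paragraph_ids = set()
    for segment in first_segments:
        for doc_id, para_ids in index.get(segment, {}).items():
            doc_ids.add(doc_id)
            paragraph_ids.update((doc_id, para_id) for para_id in para_ids)
    return doc_ids, paragraph_ids

# Sample index shaped like the example in the text (segment -> document -> paragraphs).
index = {
    "patent": {"001": {20, 21}, "002": {5, 7}},
    "drafting": {"001": {21}},
}
doc_ids, paragraph_ids = lookup(index, ["patent", "drafting"])
print(sorted(doc_ids), sorted(paragraph_ids))
```

Because each lookup is a hash access per word segment, the cost does not grow with the total amount of document text, which is the point of building the index in advance.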
Step 203: Sort the documents and/or paragraphs corresponding to the first word segment, and send the document summaries and/or paragraph summaries corresponding to the first word segment to the terminal for display in the sorted order.
Because the documents and paragraphs corresponding to the first word segment may be numerous, they need to be sorted before being displayed to the user. The ordering depends on the relevance of the first word segment to the document or paragraph: the higher the relevance between the first word segment and a document, the higher that document ranks, and the higher the relevance between the first word segment and a paragraph, the higher that paragraph ranks.
It can be understood that documents and paragraphs usually contain a large amount of text. Therefore, after the documents and/or paragraphs corresponding to the first word segment are sorted, only the corresponding document summaries and/or paragraph summaries may first be sent to the terminal for display in the sorted order, so that the user can first get a rough understanding of the documents and/or paragraphs that involve the query word and then select the content of interest for further reading, which greatly reduces the time the user needs to browse the documents.
A document summary may be the document title, the document's abstract, a key paragraph in the document, and so on. A paragraph summary may be text of a preset length from the paragraph or a key sentence in the paragraph; when the paragraph is short, the paragraph summary may also be the content of the paragraph itself. The embodiments of the present application do not limit the form of the document summary or the paragraph summary.
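As one illustration of the "text of a preset length" form of a paragraph summary, the sketch below returns a short paragraph unchanged and otherwise cuts a fixed-length window. Placing the window around the first occurrence of the word segment is an assumption, as are the function name and the `max_chars` parameter.

```python
def paragraph_summary(paragraph, segment, max_chars=40):
    """Return the paragraph itself if short, else a max_chars-long excerpt.

    The excerpt starts a little before the first occurrence of `segment`
    (assumed behaviour) and is clamped to the paragraph boundaries.
    """
    if len(paragraph) <= max_chars:
        return paragraph
    pos = paragraph.find(segment)
    if pos < 0:
        start = 0
    else:
        start = max(0, min(pos - max_chars // 4, len(paragraph) - max_chars))
    return paragraph[start:start + max_chars]

print(paragraph_summary("A short paragraph about a patent.", "patent"))
```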
A specific implementation of sorting the documents and/or paragraphs corresponding to the first word segment is described in detail below, and may include:
sorting the documents corresponding to the first word segment according to one or more of: the document type of each corresponding document, the number of occurrences of the first word segment in each corresponding document, the occurrence ratio of the first word segment in each corresponding document, the position at which the first word segment appears in each corresponding document, and the distance between the first word segments in each corresponding document;
and/or
sorting the paragraphs corresponding to the first word segment according to one or more of: the number of occurrences of the first word segment in each corresponding paragraph, the occurrence ratio of the first word segment in each corresponding paragraph, the position at which the first word segment appears in each corresponding paragraph, and the distance between the first word segments in each corresponding paragraph.
In the embodiments of the present application, when the documents corresponding to the first word segment need to be sorted, each corresponding document can be scored for its relevance to the first word segment, and the documents are then sorted according to the scoring results.
The rules for scoring documents may include, but are not limited to, one or more of the following:
(1) Different bonus scores may be set in advance for different types of documents; for example, a higher bonus score is set for documents of the technical paper type and a lower bonus score for documents of the Internet news type. The document type of a document corresponding to the first word segment can thus affect the document's rank.
(2) Different bonus scores may be set for documents according to the number of occurrences of the first word segment in each corresponding document. For example, if the same first word segment occurs more often in a first document than in a second document, the first document receives a higher bonus score than the second document, indicating that the first document is more relevant to the first word segment.
(3) Different bonus scores may be set for documents according to the occurrence ratio of the first word segment in each corresponding document. For example, if the same first word segment occurs the same number of times in a first document and in a second document, while the first document is much shorter than the second document, then the occurrence ratio of the first word segment is higher in the first document than in the second, and the first document receives a higher bonus score than the second document, indicating that the first document is more relevant to the first word segment.
(4) Different bonus scores may be set for documents according to the position at which the first word segment appears in each corresponding document. For example, the first or last paragraph of a document is often more representative of the document's content, so if the first word segment appears in the first or last paragraph of a document, a higher bonus score may be set for that document, and if the first word segment appears in other paragraphs, a lower bonus score may be set for that document.
(5) When the query word includes multiple first word segments, different bonus scores may also be set for documents according to the distance between the first word segments in each document. For example, if the query word includes two first word segments and the two appear adjacent to each other in a document, the query word appears as a whole in that document, so the document is more relevant to the query word and a higher bonus score may be set for it; if the different first word segments are far apart in the document, a lower bonus score may be set for it.
Similarly, when the paragraphs corresponding to the first participle need to be ranked, each paragraph corresponding to the first participle can be scored according to its degree of relevance to the first participle, and the paragraphs are then ranked according to the scoring results.
The rules for scoring paragraphs may include, but are not limited to, one or more of the following:
(1) Different bonus scores can be set for paragraphs according to the number of occurrences of the first participle in each corresponding paragraph. For example, if the same first participle occurs more often in the first paragraph than in the second paragraph, the bonus score of the first paragraph corresponding to the first participle is higher than that of the second paragraph, indicating that the first paragraph is more relevant to the first participle.
(2) Different bonus scores can be set for paragraphs according to the occurrence ratio of the first participle in each corresponding paragraph. For example, if the same first participle occurs the same number of times in the first paragraph and in the second paragraph, while the first paragraph is much shorter than the second paragraph, then the occurrence ratio of the first participle in the first paragraph is higher than in the second paragraph. The bonus score of the first paragraph corresponding to the first participle is therefore higher than that of the second paragraph, indicating that the first paragraph is more relevant to the first participle.
(3) Different bonus scores can be set for paragraphs according to the position at which the first participle occurs in each corresponding paragraph. For example, the first or last sentence of a paragraph is usually more representative of the paragraph's content. If the first participle occurs in the first or last sentence of a paragraph, a higher bonus score can be set for that paragraph; if it occurs elsewhere in the paragraph, a lower bonus score can be set for that paragraph.
(4) When the query word includes multiple first participles, different bonus scores can also be set for paragraphs according to the distance between the first participles in each paragraph. For example, if the query word includes two first participles and the two appear adjacent to each other in a paragraph, the query word appears in the paragraph as a whole, so the paragraph is highly relevant to the query word and a higher bonus score can be set for it. If the different first participles are far apart in the paragraph, a lower bonus score can be set for it.
Ranking the documents and/or paragraphs corresponding to the first participle in the above manner better reflects the relevance between the query word and the corresponding documents and/or paragraphs, so that the user reads the content most relevant to the query word first.
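The paragraph scoring rules above can be sketched as follows. This is only an illustrative sketch: the class name `Paragraph`, the function names, and all numeric weights are assumptions made for the example and are not specified by the present application.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    paragraph_id: str
    sentences: list  # each sentence is a list of participles


def min_gap(tokens, query):
    """Smallest distance between occurrences of two different query participles."""
    best = None
    last_seen = {}  # participle -> index of its most recent occurrence
    for i, token in enumerate(tokens):
        if token in query:
            for other, j in last_seen.items():
                if other != token and (best is None or i - j < best):
                    best = i - j
            last_seen[token] = i
    return best


def score_paragraph(paragraph, query_participles):
    """Add up bonus scores; every weight below is a hypothetical value."""
    tokens = [t for sentence in paragraph.sentences for t in sentence]
    query = set(query_participles)
    hits = [t for t in tokens if t in query]
    if not hits:
        return 0.0
    score = 1.0 * len(hits)                   # number of occurrences
    score += 10.0 * len(hits) / len(tokens)   # occurrence ratio
    first, last = paragraph.sentences[0], paragraph.sentences[-1]
    if query & set(first) or query & set(last):
        score += 2.0                          # first or last sentence
    if len(query) > 1:
        gap = min_gap(tokens, query)
        if gap is not None:
            score += 3.0 / gap                # adjacent participles score highest
    return score


def rank_paragraphs(paragraphs, query_participles):
    """Sort paragraphs from most to least relevant."""
    return sorted(paragraphs,
                  key=lambda p: score_paragraph(p, query_participles),
                  reverse=True)
```

The same structure could be reused for documents by adding a term for the document type and by testing the first and last paragraph instead of the first and last sentence.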
Step 204: Upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
Using the terminal, the user can view the document abstracts and/or paragraph abstracts corresponding to the first participle. If the user is interested in a certain document abstract or paragraph abstract, the user can click it to trigger a display request for the document abstract or paragraph abstract corresponding to any first participle. After obtaining the display request, the server can send the document page corresponding to that document abstract or paragraph abstract to the terminal. In the embodiments of the present application, each document can also be paginated in advance. After a display request for a certain paragraph abstract is obtained, the document page on which the paragraph is located can be retrieved and sent to the terminal, so that the terminal loads and displays only that document page. The other pages of the document do not need to be loaded in this process, which reduces the network resources occupied and improves the page loading speed. After a display request for a certain document abstract is obtained, the first page of the document can be sent to the terminal as the first document page to load; afterwards, as the user browses, the remaining document pages are sent to the terminal and loaded page by page.
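The page selection described above can be sketched as follows. This is a minimal sketch under assumptions: the function names and the fixed number of paragraphs per page are hypothetical, and the present application does not specify how documents are paginated.

```python
def paginate(paragraph_ids, paragraphs_per_page):
    """Split a document's ordered paragraph identifiers into pages."""
    return [paragraph_ids[i:i + paragraphs_per_page]
            for i in range(0, len(paragraph_ids), paragraphs_per_page)]


def page_for_request(pages, paragraph_id=None):
    """Return the index of the page to send to the terminal first.

    A request for a document abstract (paragraph_id is None) starts at the
    first page; a request for a paragraph abstract returns only the page
    that contains the requested paragraph.
    """
    if paragraph_id is None:
        return 0
    for index, page in enumerate(pages):
        if paragraph_id in page:
            return index
    raise KeyError(paragraph_id)
```

Because only one page index is resolved per request, the terminal never needs the other pages until the user navigates to them.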
In this way, the embodiments of the present application can record in advance, for each participle in each document, the document identifier of the document and the paragraph identifier of the paragraph in which the participle is located, and establish an inverted index. When the user needs to look up a certain point of interest, the user can input a query word that includes one or more first participles. After the query word is obtained, the document identifiers and/or paragraph identifiers corresponding to the first participle can be found quickly according to the pre-established inverted index, and the documents and/or paragraphs corresponding to the first participle can then be obtained. These documents and/or paragraphs are ranked according to their relevance to the query word, and the corresponding document abstracts and/or paragraph abstracts are sent to the user's terminal and displayed according to the ranking results. The user can then trigger any document abstract or paragraph abstract so that the corresponding document page is displayed. The user can thus quickly browse a given piece of content across multiple documents, which greatly saves reading time and improves the reading experience.
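The inverted index summarized above can be sketched as follows. This is an illustrative sketch only: the input layout (one list of participles per paragraph) and the paragraph identifier format `doc_id#pN` are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each participle to the documents and paragraphs that contain it.

    documents: {doc_id: [[participle, ...], ...]}, one inner list per
    paragraph, already segmented into participles.
    """
    index = defaultdict(lambda: {"documents": set(), "paragraphs": set()})
    for doc_id, paragraphs in documents.items():
        for number, participles in enumerate(paragraphs):
            paragraph_id = f"{doc_id}#p{number}"  # hypothetical identifier format
            for participle in participles:
                index[participle]["documents"].add(doc_id)
                index[participle]["paragraphs"].add(paragraph_id)
    return index


def lookup(index, participle):
    """Return (document identifiers, paragraph identifiers) for a participle."""
    entry = index.get(participle)
    if entry is None:
        return set(), set()
    return entry["documents"], entry["paragraphs"]
```

A query then costs one dictionary lookup per first participle, instead of a scan over every document.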
For ease of understanding, the display result of the target text display method provided by the embodiments of the present application is described below with reference to Figure 3.
If the server ranks both the documents and the paragraphs corresponding to the first participle, the document abstracts and paragraph abstracts corresponding to the first participle can be sent to the terminal and displayed in order. In the terminal, document abstracts can be shown in a document abstract display column 301, for example the document abstracts of documents 1 to 5 shown in ranked order, and paragraph abstracts can be shown in a paragraph abstract display column 302. The display of paragraph abstracts can be set to be tied to a document: when the user triggers a certain document abstract, the paragraph abstracts of the paragraphs included in the corresponding document are shown in the paragraph abstract display column 302 in ranked order. For example, when the user triggers the document abstract of document 2, the paragraph abstract display column 302 shows the paragraph abstracts of the paragraphs included in document 2. Alternatively, the display of paragraph abstracts can be set to be independent of documents: paragraphs from different documents are shown together in ranked order, and the relationship between each paragraph and its document can also be indicated. For example, when the user triggers the paragraph abstract of paragraph 3, the user can be informed that paragraph 3 belongs to document 2.
If the server ranks only the documents or only the paragraphs corresponding to the first participle, only the document abstract display column 301 or only the paragraph abstract display column 302 may be displayed.
When the user needs to read a certain document or a certain paragraph, the user can trigger the document abstract or paragraph abstract to request display of the corresponding document page, which is then shown in the document page display area 303 of the terminal.
In the embodiments of the present application, the user can get a general understanding of the content corresponding to the query word from the document abstracts and paragraph abstracts. If more detail is needed, the user can click a certain document abstract or paragraph abstract and read the corresponding document page, which enables more efficient fragmented browsing and reading.
Based on the above embodiment, Figure 4 shows a flowchart of another embodiment of the target text display method provided in the embodiments of the present application. In this embodiment, after the user inputs a query word, related words of the query word can also be shown to the user, and the documents and/or paragraphs corresponding to the related words are calculated in advance. When the user needs to query a related word, the related content can be sent to the terminal quickly, so that the user can read the target text quickly. This embodiment may include the following steps:
Step 401: Determine the related words corresponding to the query word and send the related words to the terminal for display, where the related words include a second participle.
In the embodiments of the present application, after the user inputs a query word, related words of the query word can also be recommended to the user's terminal. Similarly, a related word may include one or more second participles, and each second participle is a participle in the inverted index. For example, if the user inputs the query word "artificial intelligence", the related words may be "robot", "neural network", and so on.
Recommending related words to the user helps the user discover content that may be worth querying further, and it also spares the user the input step: the user can query a related word directly by clicking it, which improves the user experience.
In some possible implementations of the embodiments of the present application, the process of determining the related words corresponding to the query word may include:
Determining the feature vector of the first participle and the feature vectors of the other participles according to a participle feature model; calculating the similarity between the feature vector of the first participle and the feature vector of each other participle; and determining as related words the participles whose similarity to the feature vector of the first participle satisfies a preset condition.
That is, the feature vector of the first participle and the feature vectors of the other participles are obtained first, and the similarity between the feature vector of the first participle and the feature vector of each other participle is calculated. In general, the similarity between feature vectors can be characterized by the Euclidean distance between them, with a smaller distance indicating a higher similarity. The participles whose similarity to the feature vector of the first participle satisfies the preset condition are then determined as related words. The preset condition may be that the similarity ranks within a preset top number, or that the similarity reaches a preset threshold, and so on. The preset number and preset threshold can be set according to actual conditions, and the present application does not limit them.
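The selection of related words by Euclidean distance can be sketched as follows. This is a minimal sketch: the function name, the parameters `top_n` and `max_distance`, and the toy two-dimensional vectors in the usage note are assumptions made for illustration.

```python
import math


def related_words(query_participle, vectors, top_n=None, max_distance=None):
    """Return other participles ordered from nearest to farthest.

    vectors: {participle: feature vector as a sequence of floats}.
    top_n models the "within a preset top number" condition and
    max_distance models the "reaches a preset threshold" condition.
    """
    target = vectors[query_participle]
    scored = sorted(
        (math.dist(target, vector), word)
        for word, vector in vectors.items()
        if word != query_participle
    )
    if max_distance is not None:
        scored = [(d, w) for d, w in scored if d <= max_distance]
    if top_n is not None:
        scored = scored[:top_n]
    return [word for _, word in scored]
```

For example, with hypothetical vectors in which "neural network" and "robot" lie close to "artificial intelligence" and an unrelated word lies far away, calling `related_words("artificial intelligence", vectors, top_n=2)` returns the two nearby words.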
In some possible implementations of the embodiments of the present application, determining the feature vector of the first participle and the feature vectors of the other participles according to the participle feature model may specifically include:
Taking the initial feature vector of any participle as the output of the participle feature model; taking, according to the word order in each document, the initial feature vectors of the participles within a preset range before and after that participle as the input of the participle feature model; training the participle feature model; and, after the participle feature model reaches a convergence condition, obtaining the feature vector of the first participle and the feature vectors of the other participles. The participle feature model is a neural network model.
In the embodiments of the present application, a large number of documents can first be used as the training corpus, and the training corpus is segmented into participles. It can be understood that the participles in a document have a word-order relationship. For example, the corpus sentence "patent writing is very important" can be divided in order into the participles "patent", "writing", "very" and "important". The initial feature vector of any participle in the corpus is taken as the output of the participle feature model; for example, the initial feature vector of the participle "writing" is taken as the output. The initial feature vector can be a random feature vector, and the feature vector can be an n-dimensional feature vector, for example a 128-dimensional feature vector, where n is a positive integer. The initial feature vectors of the participles within a preset range before and after this participle are then taken as the input of the participle feature model; for example, the initial feature vectors of "patent", "very" and "important" are taken as the input. This yields one input-output pair for the participle feature model. By analogy, a large number of input-output pairs can be determined from the large number of documents and used to train the participle feature model until it reaches the convergence condition. The process of training the participle feature model is also the process of training the feature vectors of the participles. After training is complete, the feature vector of each participle can be obtained.
The participle feature model can be one of, or a combination of, neural networks such as a deep neural network DNN (Deep Neural Network), a recurrent neural network RNN (Recurrent Neural Network), and a long short-term memory network LSTM (Long Short-Term Memory); a shallow neural network model can also be used, such as a BP (Back Propagation) neural network or an RBF (Radial Basis Function) neural network model.
In the embodiments of the present application, the participle feature model is trained without labeling the training corpus, and the feature vector of each participle can still be obtained.
Step 402: Query the document identifiers and/or paragraph identifiers corresponding to the second participle according to the inverted index, and determine the documents and/or paragraphs corresponding to the second participle according to those document identifiers and/or paragraph identifiers.
Querying the document identifiers and/or paragraph identifiers corresponding to the second participle and then determining the corresponding documents and/or paragraphs is similar to doing so for the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 403: Rank the documents and/or paragraphs corresponding to the second participle.
In some possible implementations of the embodiments of the present application, the process of ranking the documents and/or paragraphs corresponding to the second participle may include:
Ranking the documents corresponding to the second participle according to one or more of: the document type of the documents corresponding to the second participle, the number of occurrences of the second participle in each corresponding document, the occurrence ratio of the second participle in each corresponding document, the occurrence position of the second participle in each corresponding document, and the distance between the second participles in each corresponding document;
And/or
Ranking the paragraphs corresponding to the second participle according to one or more of: the number of occurrences of the second participle in each corresponding paragraph, the occurrence ratio of the second participle in each corresponding paragraph, the occurrence position of the second participle in each corresponding paragraph, and the distance between the second participles in each corresponding paragraph.
Ranking the documents and/or paragraphs corresponding to the second participle is similar to ranking those corresponding to the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 404: Obtain a query request for any related word, and send the document abstracts and/or paragraph abstracts corresponding to the second participle included in that related word to the terminal for display in order according to the ranking results.
After the terminal displays the related words, the user can initiate a query request for a related word by triggering it. After receiving the query request for the related word, the server can directly retrieve the ranking results that have already been calculated, and send the document abstracts and/or paragraph abstracts corresponding to the second participle included in the related word to the terminal for display in order according to those results. The time for calculating the ranking results is saved at this point, so the user quickly obtains the document abstracts and/or paragraph abstracts needed.
Step 405: Upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
The implementation of step 405 is similar to that of step 204. For the related description, refer to the above embodiment; details are not repeated here.
In this embodiment, the related words of the query word can be determined and recommended to the user, so that the user can query again using a related word. In addition, once the related words are determined, the documents and/or paragraphs corresponding to each related word are calculated and ranked in advance. When the user queries a certain related word, the ranking results can be retrieved quickly and sent directly to the user, which saves a large amount of time and further increases the speed at which the user reads related content in a massive collection of documents.
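The precomputation described above can be sketched as a small cache. This is only an illustration under assumptions: the class `RankingCache` and its method names are hypothetical, and the `rank` callable stands in for whatever lookup and ranking procedure the server uses.

```python
class RankingCache:
    """Store ranking results so later queries skip the ranking step."""

    def __init__(self, rank):
        self._rank = rank      # callable: word -> ranked list of abstracts
        self._results = {}

    def precompute(self, words):
        """Rank ahead of time, e.g. for related words shown to the user."""
        for word in words:
            self._results[word] = self._rank(word)

    def query(self, word):
        """Return stored results, ranking on demand only on a cache miss."""
        if word not in self._results:
            self._results[word] = self._rank(word)
        return self._results[word]
```

The same structure would apply to any set of words whose ranking can be prepared before the user asks for them.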
Based on any of the above embodiments, Figure 5 shows a flowchart of another embodiment of the target text display method provided in the embodiments of the present application. In this embodiment, the query word that the user is most likely to input next after inputting the current query word can also be determined from the historical query word records of the users; that is, the predicted query word corresponding to the query word is determined, and the documents and/or paragraphs corresponding to the predicted query word are calculated in advance. When the user needs to query the predicted query word, the related content can be sent to the terminal quickly, so that the user can read the target text quickly. This embodiment may include the following steps:
Step 501: Determine the predicted query word corresponding to the query word according to the historical query word records, where the predicted query word includes a third participle.
The server keeps the query records of each user, and the order in which different users input query words can be determined from these records. According to the historical query word records, the query word most likely to be input next after a given query word can then be determined as the predicted query word. For example, if statistics show that, after querying "artificial intelligence", users most often query "neural network" next, then the predicted query word corresponding to the query word "artificial intelligence" is "neural network". Similarly, a predicted query word may include one or more third participles, and each third participle is a participle in the inverted index.
It can be understood that the predicted query word may be one or more of the above related words, or may differ from the above related words.
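Determining a predicted query word from historical records can be sketched as a successor count. This is a minimal sketch under assumptions: the function names and the representation of each user's history as an ordered list of query words are hypothetical.

```python
from collections import Counter, defaultdict


def build_successor_counts(histories):
    """Count, for each query word, which query word users entered next.

    histories: one ordered list of query words per user.
    """
    counts = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, query_word):
    """Return the most frequent follow-up query word, or None if unseen."""
    followers = counts.get(query_word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

If two of three users who queried "artificial intelligence" went on to query "neural network", `predict_next` returns "neural network" for that query word, matching the example above.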
Step 502: Query the document identifiers and/or paragraph identifiers corresponding to the third participle according to the inverted index, and determine the documents and/or paragraphs corresponding to the third participle according to those document identifiers and/or paragraph identifiers.
Querying the document identifiers and/or paragraph identifiers corresponding to the third participle and then determining the corresponding documents and/or paragraphs is similar to doing so for the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 503: Rank the documents and/or paragraphs corresponding to the third participle.
In some possible implementations of the embodiments of the present application, the process of ranking the documents and/or paragraphs corresponding to the third participle may include:
Ranking the documents corresponding to the third participle according to one or more of: the document type of the documents corresponding to the third participle, the number of occurrences of the third participle in each corresponding document, the occurrence ratio of the third participle in each corresponding document, the occurrence position of the third participle in each corresponding document, and the distance between the third participles in each corresponding document;
And/or
Ranking the paragraphs corresponding to the third participle according to one or more of: the number of occurrences of the third participle in each corresponding paragraph, the occurrence ratio of the third participle in each corresponding paragraph, the occurrence position of the third participle in each corresponding paragraph, and the distance between the third participles in each corresponding paragraph.
Ranking the documents and/or paragraphs corresponding to the third participle is similar to ranking those corresponding to the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 504: Obtain a query request for any predicted query word, and send the document abstracts and/or paragraph abstracts corresponding to the third participle included in that predicted query word to the terminal for display in order according to the ranking results.
When the query word that the user inputs next is a predicted query word, the server obtains a query request for that predicted query word. The server can then directly retrieve the ranking results that have already been calculated, and send the document abstracts and/or paragraph abstracts corresponding to the third participle included in the predicted query word to the terminal for display in order according to those results. The time for calculating the ranking results is saved at this point, so the user quickly obtains the document abstracts and/or paragraph abstracts needed.
Step 505: Upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
The implementation of step 505 is similar to that of step 204. For the related description, refer to the above embodiment; details are not repeated here.
In this embodiment, the predicted query word corresponding to the query word can be determined, and the documents and/or paragraphs corresponding to each predicted query word are calculated and ranked in advance. When the user queries a certain predicted query word, the ranking results can be retrieved quickly and sent directly to the user, which saves a large amount of time and further increases the speed at which the user reads related content in a massive collection of documents.
Referring to Figure 6, the embodiments of the present application also provide an embodiment of a target text display device, which may include:
An establishing unit 601, configured to pre-establish an inverted index, where the inverted index includes the document identifier and the paragraph identifier corresponding to each participle;
A first acquisition unit 602, configured to obtain a query word input by the user, where the query word includes a first participle;
A first query unit 603, configured to query the document identifiers and/or paragraph identifiers corresponding to the first participle according to the inverted index, and to determine the documents and/or paragraphs corresponding to the first participle according to those document identifiers and/or paragraph identifiers;
A first ranking unit 604, configured to rank the documents and/or paragraphs corresponding to the first participle, and to send the document abstracts and/or paragraph abstracts corresponding to the first participle to the terminal for display in order according to the ranking results;
A first sending unit 605, configured to send, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first participle, the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
In some possible implementations of the embodiments of the present application, the device further includes:
A first determination unit, configured to determine the related words corresponding to the query word and to send the related words to the terminal for display, where the related words include a second participle;
A second query unit, configured to query the document identifiers and/or paragraph identifiers corresponding to the second participle according to the inverted index, and to determine the documents and/or paragraphs corresponding to the second participle according to those document identifiers and/or paragraph identifiers;
A second ranking unit, configured to rank the documents and/or paragraphs corresponding to the second participle;
A second acquisition unit, configured to obtain a query request for any related word, and to send the document abstracts and/or paragraph abstracts corresponding to the second participle included in that related word to the terminal for display in order according to the ranking results;
A second sending unit, configured to send, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second participle, the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
In some possible implementations of the embodiments of the present application, the first determination unit may include:
A first determination subunit, configured to determine the feature vector of the first participle and the feature vectors of the other participles according to the participle feature model;
A calculation subunit, configured to calculate the similarity between the feature vector of the first participle and the feature vector of each other participle;
A second determination subunit, configured to determine as related words the participles whose similarity to the feature vector of the first participle satisfies a preset condition.
In some possible implementations of the embodiments of the present application, the first determination subunit may be specifically configured to:
Take the initial feature vector of any participle as the output of the participle feature model; take, according to the word order in each document, the initial feature vectors of the participles within a preset range before and after that participle as the input of the participle feature model; train the participle feature model; and, after the participle feature model reaches a convergence condition, obtain the feature vector of the first participle and the feature vectors of the other participles, where the participle feature model is a neural network model.
In some possible implementations of the embodiments of the present application, the device may further include:
A second determination unit, configured to determine the predicted query word corresponding to the query word according to the historical query word records, where the predicted query word includes a third participle;
A third query unit, configured to query the document identifiers and/or paragraph identifiers corresponding to the third participle according to the inverted index, and to determine the documents and/or paragraphs corresponding to the third participle according to those document identifiers and/or paragraph identifiers;
A third ranking unit, configured to rank the documents and/or paragraphs corresponding to the third participle;
A third acquisition unit, configured to obtain a query request for any predicted query word, and to send the document abstracts and/or paragraph abstracts corresponding to the third participle included in that predicted query word to the terminal for display in order according to the ranking results;
A third sending unit, configured to send, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third participle, the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
In some possible implementations of the embodiments of the present application, the first ranking unit may be specifically configured to:
Rank the documents corresponding to the first participle according to one or more of: the document type of the documents corresponding to the first participle, the number of occurrences of the first participle in each corresponding document, the occurrence ratio of the first participle in each corresponding document, the occurrence position of the first participle in each corresponding document, and the distance between the first participles in each corresponding document;
And/or
Rank the paragraphs corresponding to the first participle according to one or more of: the number of occurrences of the first participle in each corresponding paragraph, the occurrence ratio of the first participle in each corresponding paragraph, the occurrence position of the first participle in each corresponding paragraph, and the distance between the first participles in each corresponding paragraph.
In some possible implementations of the embodiments of the present application, the second ranking unit may be specifically configured to:
Rank the documents corresponding to the second participle according to one or more of: the document type of the documents corresponding to the second participle, the number of occurrences of the second participle in each corresponding document, the occurrence ratio of the second participle in each corresponding document, the occurrence position of the second participle in each corresponding document, and the distance between the second participles in each corresponding document;
And/or
Rank the paragraphs corresponding to the second participle according to one or more of: the number of occurrences of the second participle in each corresponding paragraph, the occurrence ratio of the second participle in each corresponding paragraph, the occurrence position of the second participle in each corresponding paragraph, and the distance between the second participles in each corresponding paragraph.
In some possible implementations of the embodiments of the present application, the third sorting unit may be specifically configured to:
sort the documents corresponding to the third segmented word according to one or more of: the document type of each corresponding document, the number of occurrences of the third segmented word in each corresponding document, the occurrence ratio of the third segmented word in each corresponding document, the occurrence position of the third segmented word in each corresponding document, and the distance between the third segmented words in each corresponding document;
and/or
sort the paragraphs corresponding to the third segmented word according to one or more of: the number of occurrences of the third segmented word in each corresponding paragraph, the occurrence ratio of the third segmented word in each corresponding paragraph, the occurrence position of the third segmented word in each corresponding paragraph, and the distance between the third segmented words in each corresponding paragraph.
In the embodiments of the present application, for each segmented word in each document, the document identifier of the document and the paragraph identifier of the paragraph in which the word occurs can be recorded in advance, and an inverted index can be established from these records. When a user needs to look up a particular point of interest, the user can enter a query word that includes one or more first segmented words. After the query word is obtained, the document identifiers and/or paragraph identifiers corresponding to the first segmented word can be found quickly from the pre-established inverted index, and the documents and/or paragraphs corresponding to the first segmented word can then be obtained. These documents and/or paragraphs are sorted by their relevance to the query word, and the corresponding document abstracts and/or paragraph abstracts are sent, according to the sorting result, to the user's terminal for display. The user can then trigger any document abstract or paragraph abstract so that the document page corresponding to it is displayed. In this way the user can quickly browse a given piece of content across multiple documents, which greatly reduces reading time and improves the reading experience.
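The index-and-lookup flow summarized above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the function names `build_inverted_index` and `lookup` are hypothetical, whitespace splitting stands in for a real word segmenter, and requiring every query word to appear in the same paragraph is one possible reading of how multiple first segmented words are combined.

```python
from collections import defaultdict


def build_inverted_index(documents, tokenize=str.split):
    """Map each word to the set of (document id, paragraph id) pairs containing it.

    `documents` is {doc_id: [paragraph_text, ...]}; the paragraph id is the
    position of the paragraph within its document.
    """
    index = defaultdict(set)
    for doc_id, paragraphs in documents.items():
        for para_id, text in enumerate(paragraphs):
            for word in tokenize(text):
                index[word.lower()].add((doc_id, para_id))
    return index


def lookup(index, query, tokenize=str.split):
    """Return the (doc_id, para_id) pairs that contain every word of the query."""
    postings = [index.get(word.lower(), set()) for word in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)
```

The pairs returned by `lookup` identify the candidate documents and paragraphs that would then be sorted and summarized before being sent to the terminal.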
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the systems or devices disclosed in the embodiments correspond to the methods disclosed in the embodiments, they are described relatively briefly; for the relevant details, refer to the description of the method.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A target text display method, characterized in that an inverted index is established in advance, the inverted index comprising the document identifier and paragraph identifier corresponding to each segmented word, the method comprising:
obtaining a query word input by a user, the query word comprising a first segmented word;
querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first segmented word, and determining the document and/or paragraph corresponding to the first segmented word according to that document identifier and/or paragraph identifier;
sorting the documents and/or paragraphs corresponding to the first segmented word, and sending, according to the sorting result, the document abstracts and/or paragraph abstracts corresponding to the first segmented word to a terminal for display in order;
upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first segmented word, sending the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
2. The method according to claim 1, characterized in that the method further comprises:
determining related words corresponding to the query word and sending the related words to the terminal for display, each related word comprising a second segmented word;
querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the second segmented word, and determining the document and/or paragraph corresponding to the second segmented word according to that document identifier and/or paragraph identifier;
sorting the documents and/or paragraphs corresponding to the second segmented word;
obtaining a query request for any related word, and sending the document abstracts and/or paragraph abstracts corresponding to the second segmented word contained in that related word to the terminal, according to the sorting result, for display in order;
upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second segmented word, sending the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
3. The method according to claim 2, characterized in that determining the related words corresponding to the query word comprises:
determining the feature vector of the first segmented word and the feature vectors of the other segmented words according to a segmented-word feature model;
calculating the similarity between the feature vector of the first segmented word and the feature vector of each other segmented word;
determining, as related words, the segmented words whose similarity to the feature vector of the first segmented word meets a preset condition.
4. The method according to claim 3, characterized in that determining the feature vector of the first segmented word and the feature vectors of the other segmented words according to the segmented-word feature model comprises:
taking the initial feature vector of any segmented word as the output of the segmented-word feature model and, according to the word order in each document, taking the initial feature vectors of the segmented words within a preset range before and after that segmented word as the input of the segmented-word feature model; training the segmented-word feature model; and, after the segmented-word feature model reaches a convergence condition, obtaining the feature vector of the first segmented word and the feature vectors of the other segmented words, the segmented-word feature model being a neural network model.
5. The method according to claim 1, characterized in that the method further comprises:
determining, according to historical query word records, predicted query words corresponding to the query word, each predicted query word comprising a third segmented word;
querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the third segmented word, and determining the document and/or paragraph corresponding to the third segmented word according to that document identifier and/or paragraph identifier;
sorting the documents and/or paragraphs corresponding to the third segmented word;
obtaining a query request for any predicted query word, and sending the document abstracts and/or paragraph abstracts corresponding to the third segmented word contained in that predicted query word to the terminal, according to the sorting result, for display in order;
upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third segmented word, sending the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
6. The method according to claim 1, characterized in that sorting the documents and/or paragraphs corresponding to the first segmented word comprises:
sorting the documents corresponding to the first segmented word according to one or more of: the document type of each corresponding document, the number of occurrences of the first segmented word in each corresponding document, the occurrence ratio of the first segmented word in each corresponding document, the occurrence position of the first segmented word in each corresponding document, and the distance between the first segmented words in each corresponding document;
and/or
sorting the paragraphs corresponding to the first segmented word according to one or more of: the number of occurrences of the first segmented word in each corresponding paragraph, the occurrence ratio of the first segmented word in each corresponding paragraph, the occurrence position of the first segmented word in each corresponding paragraph, and the distance between the first segmented words in each corresponding paragraph.
7. A target text display device, characterized in that the device comprises:
an establishing unit, configured to establish an inverted index in advance, the inverted index comprising the document identifier and paragraph identifier corresponding to each segmented word;
a first obtaining unit, configured to obtain a query word input by a user, the query word comprising a first segmented word;
a first query unit, configured to query, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first segmented word, and to determine the document and/or paragraph corresponding to the first segmented word according to that document identifier and/or paragraph identifier;
a first sorting unit, configured to sort the documents and/or paragraphs corresponding to the first segmented word, and to send, according to the sorting result, the document abstracts and/or paragraph abstracts corresponding to the first segmented word to a terminal for display in order;
a first sending unit, configured to, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first segmented word, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
8. The device according to claim 7, characterized in that the device further comprises:
a first determining unit, configured to determine related words corresponding to the query word and to send the related words to the terminal for display, each related word comprising a second segmented word;
a second query unit, configured to query, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the second segmented word, and to determine the document and/or paragraph corresponding to the second segmented word according to that document identifier and/or paragraph identifier;
a second sorting unit, configured to sort the documents and/or paragraphs corresponding to the second segmented word;
a second obtaining unit, configured to obtain a query request for any related word, and to send the document abstracts and/or paragraph abstracts corresponding to the second segmented word contained in that related word to the terminal, according to the sorting result, for display in order;
a second sending unit, configured to, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second segmented word, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
9. The device according to claim 8, characterized in that the first determining unit comprises:
a first determining subunit, configured to determine the feature vector of the first segmented word and the feature vectors of the other segmented words according to a segmented-word feature model;
a calculating subunit, configured to calculate the similarity between the feature vector of the first segmented word and the feature vector of each other segmented word;
a second determining subunit, configured to determine, as related words, the segmented words whose similarity to the feature vector of the first segmented word meets a preset condition.
10. The device according to claim 9, characterized in that the first determining subunit is specifically configured to:
take the initial feature vector of any segmented word as the output of the segmented-word feature model and, according to the word order in each document, take the initial feature vectors of the segmented words within a preset range before and after that segmented word as the input of the segmented-word feature model; train the segmented-word feature model; and, after the segmented-word feature model reaches a convergence condition, obtain the feature vector of the first segmented word and the feature vectors of the other segmented words, the segmented-word feature model being a neural network model.
11. The device according to claim 7, characterized in that the device further comprises:
a second determining unit, configured to determine, according to historical query word records, predicted query words corresponding to the query word, each predicted query word comprising a third segmented word;
a third query unit, configured to query, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the third segmented word, and to determine the document and/or paragraph corresponding to the third segmented word according to that document identifier and/or paragraph identifier;
a third sorting unit, configured to sort the documents and/or paragraphs corresponding to the third segmented word;
a third obtaining unit, configured to obtain a query request for any predicted query word, and to send the document abstracts and/or paragraph abstracts corresponding to the third segmented word contained in that predicted query word to the terminal, according to the sorting result, for display in order;
a third sending unit, configured to, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third segmented word, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
12. The device according to claim 7, characterized in that the first sorting unit is specifically configured to:
sort the documents corresponding to the first segmented word according to one or more of: the document type of each corresponding document, the number of occurrences of the first segmented word in each corresponding document, the occurrence ratio of the first segmented word in each corresponding document, the occurrence position of the first segmented word in each corresponding document, and the distance between the first segmented words in each corresponding document;
and/or
sort the paragraphs corresponding to the first segmented word according to one or more of: the number of occurrences of the first segmented word in each corresponding paragraph, the occurrence ratio of the first segmented word in each corresponding paragraph, the occurrence position of the first segmented word in each corresponding paragraph, and the distance between the first segmented words in each corresponding paragraph.
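For illustration only, and not as part of the claims: the related-word determination recited in claims 3 and 9 (compare the feature vector of the first segmented word with those of the other segmented words and keep the words whose similarity meets a preset condition) can be sketched in Python as follows. Cosine similarity, the threshold of 0.8, the `top_k` cutoff and the function names are assumptions; the claims require only a similarity measure and a preset condition, and the feature vectors are assumed to have been produced already by a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_words(query_word, vectors, threshold=0.8, top_k=5):
    """Return up to `top_k` words whose vectors are similar enough to the query word's.

    `vectors` is {word: feature vector}; the preset condition assumed here is
    "cosine similarity at or above `threshold`".
    """
    query_vector = vectors[query_word]
    scored = [
        (word, cosine_similarity(query_vector, vector))
        for word, vector in vectors.items()
        if word != query_word
    ]
    kept = [(word, s) for word, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in kept[:top_k]]
```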
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810142223.XA CN108363682A (en) | 2018-02-11 | 2018-02-11 | A kind of target text display methods and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108363682A (en) | 2018-08-03 |
Family
ID=63005884
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810142223.XA Pending CN108363682A (en) | 2018-02-11 | 2018-02-11 | A kind of target text display methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363682A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050149494A1 (en) * | 2002-01-16 | 2005-07-07 | Per Lindh | Information data retrieval, where the data is organized in terms, documents and document corpora |
CN101246492A (en) * | 2008-02-26 | 2008-08-20 | 华中科技大学 | Full text retrieval system based on natural language |
CN103617266A (en) * | 2013-12-03 | 2014-03-05 | 北京奇虎科技有限公司 | Personalized extension search method, device and system |
WO2017131753A1 (en) * | 2016-01-29 | 2017-08-03 | Entit Software Llc | Text search of database with one-pass indexing including filtering |
Non-Patent Citations (2)
Title |
---|
冯贵川: "基于Word2vec的文本建模及分类研究", 《中国优秀硕士学位论文全文数据库信息科技辑(月刊)》 * |
杨沛: "单汉字和词索引机制的模式比较", 《集美航海学院学报》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162617A (en) * | 2018-09-29 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Extract method, apparatus, language processing engine and the medium of summary info |
CN110162617B (en) * | 2018-09-29 | 2022-11-04 | 腾讯科技(深圳)有限公司 | Method, apparatus, language processing engine and medium for extracting summary information |
CN109710844A (en) * | 2018-12-20 | 2019-05-03 | 中国银行业监督管理委员会福建监管局 | The method and apparatus for quick and precisely positioning file based on search engine |
CN110795553A (en) * | 2019-09-09 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Abstract generation method and device |
CN110795553B (en) * | 2019-09-09 | 2024-04-23 | 腾讯科技(深圳)有限公司 | Digest generation method and device |
CN113448984A (en) * | 2021-07-15 | 2021-09-28 | 中国银行股份有限公司 | Document positioning display method and device, server and electronic equipment |
CN113448984B (en) * | 2021-07-15 | 2024-03-26 | 中国银行股份有限公司 | Document positioning display method and device, server and electronic equipment |
CN114722194A (en) * | 2022-03-15 | 2022-07-08 | 电子科技大学 | Automatic construction method of emergency time sequence based on abstract generation algorithm |
CN114722194B (en) * | 2022-03-15 | 2023-05-09 | 电子科技大学 | Automatic construction method for emergency time sequence based on abstract generation algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363682A (en) | A kind of target text display methods and device | |
CN109871483A (en) | A kind of determination method and device of recommendation information | |
CN101119326B (en) | Method and device for managing instant communication conversation record | |
CN101641697B (en) | Related search queries for a webpage and their applications | |
US20160048754A1 (en) | Classifying resources using a deep network | |
CN105808590B (en) | Search engine implementation method, searching method and device | |
CN106547871A (en) | Method and apparatus is recalled based on the Search Results of neutral net | |
CN109241526B (en) | Paragraph segmentation method and device | |
US20150186938A1 (en) | Search service advertisement selection | |
CN111797214A (en) | FAQ database-based problem screening method and device, computer equipment and medium | |
CN104899322A (en) | Search engine and implementation method thereof | |
CN108319627A (en) | Keyword extracting method and keyword extracting device | |
CN107679082A (en) | Question and answer searching method, device and electronic equipment | |
CN106874292A (en) | Topic processing method and processing device | |
CN106776860A (en) | One kind search abstraction generating method and device | |
CN110727862A (en) | Method and device for generating query strategy of commodity search | |
CN108509499A (en) | A kind of searching method and device, electronic equipment | |
CN109271514A (en) | Generation method, classification method, device and the storage medium of short text disaggregated model | |
CN109948140B (en) | Word vector embedding method and device | |
CN110489638A (en) | A kind of searching method, device, server, system and storage medium | |
CN113342948A (en) | Intelligent question and answer method and device | |
CN108694183A (en) | A kind of search method and device | |
US11176209B2 (en) | Dynamically augmenting query to search for content not previously known to the user | |
CN106021615A (en) | Method and device for optimizing title search | |
CN117391824B (en) | Method and device for recommending articles based on large language model and search engine |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180803 |