CN108363682A - Target text display method and device - Google Patents

Target text display method and device

Info

Publication number
CN108363682A
Authority
CN
China
Prior art keywords
segmented word
paragraph
document
abstract
corresponding document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810142223.XA
Other languages
Chinese (zh)
Inventor
张晓东
陈利人
翟忠武
苏波
李效云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Digital Science & Technology Co Ltd
Original Assignee
Guangzhou Digital Science & Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Digital Science & Technology Co Ltd
Priority to CN201810142223.XA
Publication of CN108363682A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/106 Display of layout of documents; Previewing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G06F16/3326 Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
    • G06F16/3328 Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages using graphical result space presentation or visualisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application discloses a target text display method and device that let a user quickly obtain required text from a massive collection of documents. The method includes: establishing in advance an inverted index that contains the document identifier and paragraph identifier corresponding to each segmented word; obtaining a query word input by a user, the query word including a first segmented word; querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first segmented word, and determining the corresponding document and/or paragraph from those identifiers; ranking the documents and/or paragraphs corresponding to the first segmented word, and sending the corresponding document summaries and/or paragraph summaries to a terminal for display in ranked order; and, upon obtaining a display request for the document summary or paragraph summary corresponding to any first segmented word, sending the document page corresponding to that summary to the terminal so that the terminal loads and displays the page.

Description

Target text display method and device
Technical field
The present application relates to the field of Internet technology, and in particular to a target text display method and device.
Background
With the development of Internet technology, users face a large volume of information and documents to read. Typically a user pages through documents in order, skimming them quickly. But when the user needs to find a particular topic of interest within those documents, it is difficult to obtain the required information quickly. For example, a financial analyst faced with many annual reports of listed companies, each running to hundreds of pages, would spend a great deal of time paging through every document to find one topic of interest, and could easily miss relevant content. Therefore, how to enable a user to quickly read the required content from a massive collection of documents is a technical problem that urgently needs to be solved.
Summary of the invention
In view of this, embodiments of the present application provide a target text display method and device, to solve the technical problem in the prior art that a user cannot quickly obtain required text from a massive collection of documents.
To solve the above problem, the technical solutions provided by the embodiments of the present application are as follows:
A target text display method, wherein an inverted index is established in advance, the inverted index including the document identifier and paragraph identifier corresponding to each segmented word, the method comprising:
Obtaining a query word input by a user, the query word including a first segmented word;
Querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first segmented word, and determining the document and/or paragraph corresponding to the first segmented word according to that document identifier and/or paragraph identifier;
Ranking the documents and/or paragraphs corresponding to the first segmented word, and sending the document summaries and/or paragraph summaries corresponding to the first segmented word to a terminal for display in order according to the ranking result;
Upon obtaining a display request for the document summary or paragraph summary corresponding to any first segmented word, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the method further includes:
Determining related words corresponding to the query word and sending the related words to the terminal for display, each related word including a second segmented word;
Querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the second segmented word, and determining the document and/or paragraph corresponding to the second segmented word according to that document identifier and/or paragraph identifier;
Ranking the documents and/or paragraphs corresponding to the second segmented word;
Obtaining a query request for any related word, and sending the document summaries and/or paragraph summaries corresponding to the second segmented word included in that related word to the terminal for display in order according to the ranking result;
Upon obtaining a display request for the document summary or paragraph summary corresponding to any second segmented word, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, determining the related words corresponding to the query word includes:
Determining the feature vector of the first segmented word and the feature vectors of the other segmented words according to a segmented-word feature model;
Calculating the similarity between the feature vector of the first segmented word and the feature vector of each other segmented word;
Determining as related words those segmented words whose feature-vector similarity to the first segmented word meets a preset condition.
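The similarity step above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text does not say which similarity measure or preset condition is used, so cosine similarity and a fixed threshold of 0.8 are assumptions here, and the names `cosine`, `related_terms`, and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity (an assumed measure; the source only says "similarity").
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related_terms(query_token, vectors, threshold=0.8):
    # "Preset condition" is assumed to be: similarity >= threshold.
    q = vectors[query_token]
    scored = [(tok, cosine(q, vec)) for tok, vec in vectors.items() if tok != query_token]
    kept = [(tok, s) for tok, s in scored if s >= threshold]
    # Most similar terms first.
    return sorted(kept, key=lambda pair: -pair[1])

# Toy 2-dimensional feature vectors (made up for illustration).
vectors = {
    "patent": [1.0, 0.0],
    "invention": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
related = related_terms("patent", vectors)
print(related)
```

In practice the preset condition could equally be a top-k cutoff; the threshold form is used here only because it is the simplest to show.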
Optionally, determining the feature vector of the first segmented word and the feature vectors of the other segmented words according to the segmented-word feature model includes:
Taking the initial feature vector of any segmented word as the output of the segmented-word feature model and, following the word order in each document, taking the initial feature vectors of the segmented words within a preset range before and after that segmented word as the input of the model; training the segmented-word feature model; and, after the model reaches a convergence condition, obtaining the feature vector of the first segmented word and the feature vectors of the other segmented words, the segmented-word feature model being a neural network model.
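The training setup described above, where the surrounding words are the input and the centre word is the output, can be illustrated by the way training pairs are formed. The sketch below only builds the (context, target) pairs; it does not train a network. The arrangement resembles a continuous bag-of-words setup, although the text does not name a specific architecture. The function name `context_target_pairs` and the `window` parameter (standing in for the "preset range") are hypothetical.

```python
def context_target_pairs(tokens, window=2):
    # For each token, collect up to `window` tokens before and after it.
    # The context is the model input; the centre token is the model output.
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

tokens = ["annual", "report", "of", "listed", "company"]
pairs = context_target_pairs(tokens, window=1)
for context, target in pairs:
    print(context, "->", target)
```

Each pair would then be mapped to initial feature vectors and fed to the neural network until the convergence condition is reached.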
Optionally, the method further includes:
Determining, according to historical query word records, predicted query words corresponding to the query word, each predicted query word including a third segmented word;
Querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the third segmented word, and determining the document and/or paragraph corresponding to the third segmented word according to that document identifier and/or paragraph identifier;
Ranking the documents and/or paragraphs corresponding to the third segmented word;
Obtaining a query request for any predicted query word, and sending the document summaries and/or paragraph summaries corresponding to the third segmented word included in that predicted query word to the terminal for display in order according to the ranking result;
Upon obtaining a display request for the document summary or paragraph summary corresponding to any third segmented word, sending the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, ranking the documents and/or paragraphs corresponding to the first segmented word includes:
Ranking the documents corresponding to the first segmented word according to one or more of: the document type of each corresponding document, the number of occurrences of the first segmented word in each corresponding document, the occurrence ratio of the first segmented word in each corresponding document, the position at which the first segmented word occurs in each corresponding document, and the distance between the first segmented words in each corresponding document;
And/or
Ranking the paragraphs corresponding to the first segmented word according to one or more of: the number of occurrences of the first segmented word in each corresponding paragraph, the occurrence ratio of the first segmented word in each corresponding paragraph, the position at which the first segmented word occurs in each corresponding paragraph, and the distance between the first segmented words in each corresponding paragraph.
A target text display device, the device comprising:
An establishing unit, configured to establish an inverted index in advance, the inverted index including the document identifier and paragraph identifier corresponding to each segmented word;
A first obtaining unit, configured to obtain a query word input by a user, the query word including a first segmented word;
A first query unit, configured to query, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first segmented word, and to determine the document and/or paragraph corresponding to the first segmented word according to that document identifier and/or paragraph identifier;
A first ranking unit, configured to rank the documents and/or paragraphs corresponding to the first segmented word, and to send the document summaries and/or paragraph summaries corresponding to the first segmented word to a terminal for display in order according to the ranking result;
A first sending unit, configured to, upon obtaining a display request for the document summary or paragraph summary corresponding to any first segmented word, send the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the device further includes:
A first determining unit, configured to determine related words corresponding to the query word and send the related words to the terminal for display, each related word including a second segmented word;
A second query unit, configured to query, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the second segmented word, and to determine the document and/or paragraph corresponding to the second segmented word according to that document identifier and/or paragraph identifier;
A second ranking unit, configured to rank the documents and/or paragraphs corresponding to the second segmented word;
A second obtaining unit, configured to obtain a query request for any related word, and to send the document summaries and/or paragraph summaries corresponding to the second segmented word included in that related word to the terminal for display in order according to the ranking result;
A second sending unit, configured to, upon obtaining a display request for the document summary or paragraph summary corresponding to any second segmented word, send the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the first determining unit includes:
A first determining subunit, configured to determine the feature vector of the first segmented word and the feature vectors of the other segmented words according to a segmented-word feature model;
A calculating subunit, configured to calculate the similarity between the feature vector of the first segmented word and the feature vector of each other segmented word;
A second determining subunit, configured to determine as related words those segmented words whose feature-vector similarity to the first segmented word meets a preset condition.
Optionally, the first determining subunit is specifically configured to:
Take the initial feature vector of any segmented word as the output of the segmented-word feature model and, following the word order in each document, take the initial feature vectors of the segmented words within a preset range before and after that segmented word as the input of the model; train the segmented-word feature model; and, after the model reaches a convergence condition, obtain the feature vector of the first segmented word and the feature vectors of the other segmented words, the segmented-word feature model being a neural network model.
Optionally, the device further includes:
A second determining unit, configured to determine, according to historical query word records, predicted query words corresponding to the query word, each predicted query word including a third segmented word;
A third query unit, configured to query, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the third segmented word, and to determine the document and/or paragraph corresponding to the third segmented word according to that document identifier and/or paragraph identifier;
A third ranking unit, configured to rank the documents and/or paragraphs corresponding to the third segmented word;
A third obtaining unit, configured to obtain a query request for any predicted query word, and to send the document summaries and/or paragraph summaries corresponding to the third segmented word included in that predicted query word to the terminal for display in order according to the ranking result;
A third sending unit, configured to, upon obtaining a display request for the document summary or paragraph summary corresponding to any third segmented word, send the document page corresponding to that document summary or paragraph summary to the terminal, so that the terminal loads and displays the document page.
Optionally, the first ranking unit is specifically configured to:
Rank the documents corresponding to the first segmented word according to one or more of: the document type of each corresponding document, the number of occurrences of the first segmented word in each corresponding document, the occurrence ratio of the first segmented word in each corresponding document, the position at which the first segmented word occurs in each corresponding document, and the distance between the first segmented words in each corresponding document;
And/or
Rank the paragraphs corresponding to the first segmented word according to one or more of: the number of occurrences of the first segmented word in each corresponding paragraph, the occurrence ratio of the first segmented word in each corresponding paragraph, the position at which the first segmented word occurs in each corresponding paragraph, and the distance between the first segmented words in each corresponding paragraph.
It can be seen that the embodiments of the present application have the following beneficial effects:
The embodiments of the present application can record in advance, for each segmented word in each document, the document identifier of the document and the paragraph identifier of the paragraph in which it occurs, and establish an inverted index from these records. When a user needs to find a topic of interest, the user can input a query word that includes one or more first segmented words. After the query word is obtained, the document identifiers and/or paragraph identifiers corresponding to the first segmented word can be found quickly from the pre-established inverted index, and the corresponding documents and/or paragraphs can then be obtained. These documents and/or paragraphs are ranked by relevance to the query word, and the corresponding document summaries and/or paragraph summaries are sent to the user's terminal for display according to the ranking result. The user can then trigger any document summary or paragraph summary to display the corresponding document page. The user can thus quickly browse a given piece of content across multiple documents, which substantially saves reading time and improves the reading experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
Fig. 2 is a flowchart of an embodiment of a target text display method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a display result of the target text display method provided by an embodiment of the present application;
Fig. 4 is a flowchart of another embodiment of a target text display method provided by an embodiment of the present application;
Fig. 5 is a flowchart of a further embodiment of a target text display method provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of an embodiment of a target text display device provided by an embodiment of the present application.
Detailed description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the embodiments of the present application are described in further detail below with reference to the accompanying drawings and specific implementations.
Referring to Fig. 1, which shows a schematic framework of an exemplary application scenario of the embodiments of the present application. A user can use terminal 10 to input a query word, which includes a first segmented word. Server 20 can obtain the query word, query the document identifier and/or paragraph identifier corresponding to the first segmented word according to the pre-established inverted index, and thereby determine the documents and/or paragraphs corresponding to the first segmented word. Server 20 then ranks those documents and/or paragraphs and, according to the ranking result, sends the corresponding document summaries and/or paragraph summaries to terminal 10 for display in order. The user can trigger any displayed document summary or paragraph summary through terminal 10. After obtaining a display request for the document summary or paragraph summary corresponding to any first segmented word, server 20 can send the document page corresponding to that summary to terminal 10, and terminal 10 can load and display the page, so that the user can quickly read the documents or paragraphs corresponding to the query word.
Those skilled in the art will understand that the framework shown in Fig. 1 is only one example in which embodiments of the present application may be implemented. The scope of application of the embodiments is not limited by any aspect of this framework.
It should be noted that terminal 10 in the embodiments of the present application may be any user equipment, whether existing, under development, or developed in the future, that can interact with server 20 through any form of wired and/or wireless connection (for example, Wi-Fi, LAN, cellular, coaxial cable), including but not limited to existing, in-development, or future smartphones, feature phones, tablet computers, laptop personal computers, desktop personal computers, minicomputers, midrange computers, and mainframe computers. It should also be noted that server 20 in the embodiments of the present application may be one example of an existing, in-development, or future device that can provide an information recommendation application service to users. The embodiments of the present application are not limited in this respect.
The target text display method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings.
Referring to Fig. 2, which shows a flowchart of an embodiment of the target text display method provided by the embodiments of the present application; the method may be applied to a server.
In the embodiments of the present application, before the steps of the target text display method embodiment are performed, an inverted index may be established in advance, the inverted index including the document identifier and paragraph identifier corresponding to each segmented word.
After a massive collection of documents is obtained, each document is first segmented into words; a segmented word that repeats within a document may be treated as a single segmented word. The document identifier of the document containing each segmented word and the paragraph identifier of the paragraph containing it are recorded, thereby establishing an inverted index that includes the document identifier and paragraph identifier corresponding to each segmented word. The document identifier may be a serial number or other identifier of the document, and the paragraph identifier may be the paragraph's serial number within its document or another identifier; the embodiments of the present application do not limit the form of either identifier. For example, in the inverted index, segmented word 1 corresponds to document 001 and document 002, and more specifically to paragraphs 020 and 021 in document 001 and paragraphs 005 and 007 in document 002.
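The index construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the whitespace-based `tokenize` function is a hypothetical stand-in for a real word segmenter, paragraph identifiers are assumed to be 1-based serial numbers within each document, and the sample documents are made up.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical segmenter: a whitespace split stands in for real word segmentation.
    return text.lower().split()

def build_inverted_index(documents):
    """documents: {doc_id: [paragraph_text, ...]}
    returns:   {token: {doc_id: [paragraph ids in ascending order]}}"""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, paragraphs in documents.items():
        for para_id, text in enumerate(paragraphs, start=1):
            # set() treats a token repeated within a paragraph as a single entry.
            for token in set(tokenize(text)):
                index[token][doc_id].add(para_id)
    return {tok: {d: sorted(p) for d, p in docs.items()} for tok, docs in index.items()}

documents = {
    "001": ["patent law overview", "drafting a patent claim"],
    "002": ["annual report summary", "patent portfolio and revenue"],
}
index = build_inverted_index(documents)
print(index["patent"])
```

Storing paragraph identifiers under each document identifier keeps both levels of lookup in one structure, so a query can be answered at document or paragraph granularity.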
It will be understood that as documents are continually updated, the content of the inverted index is updated accordingly; that is, establishing the inverted index may include both creating a new index and updating an existing one.
The target text display method embodiment provided here may include the following steps:
Step 201: Obtain a query word input by a user, the query word including a first segmented word.
The user can input a query word through the terminal. The query word may include one or more first segmented words, each of which is a segmented word in the inverted index, and the server can obtain the query word. When the query word includes only one first segmented word, that first segmented word is the query word itself; for example, if the query word is "patent", the first segmented word it includes is "patent". When the query word includes multiple first segmented words, the query word can be segmented to determine them; for example, if the query word is "patent drafting", segmenting it determines that the first segmented words are "patent" and "drafting".
Step 202: According to the inverted index, query the document identifier and/or paragraph identifier corresponding to the first segmented word, and determine the document and/or paragraph corresponding to the first segmented word according to that document identifier and/or paragraph identifier.
As described above, once the first segmented words are determined, the inverted index can be queried to obtain the document identifier and/or paragraph identifier corresponding to each first segmented word, and the corresponding documents and/or paragraphs can then be determined. That is, the documents and/or paragraphs in which the first segmented word occurs can be located within the massive collection of documents.
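The lookup in step 202 can be sketched as follows. The text does not say how results for several query terms are combined, so taking the union of their hits is an assumption here; the `lookup` function and the sample index literal are hypothetical.

```python
def lookup(index, query_tokens):
    """Return {doc_id: set of paragraph ids} for documents containing any query token.
    Combining terms by union is an assumption; the source does not specify it."""
    hits = {}
    for token in query_tokens:
        for doc_id, para_ids in index.get(token, {}).items():
            hits.setdefault(doc_id, set()).update(para_ids)
    return hits

# A small inverted index in the shape {token: {doc_id: [paragraph ids]}}.
index = {
    "patent": {"001": [20, 21], "002": [5, 7]},
    "drafting": {"001": [21]},
}
hits = lookup(index, ["patent", "drafting"])
print(hits)
```

Because each lookup is a dictionary access, the cost depends on the number of matching entries rather than on the size of the document collection.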
Step 203: Rank the documents and/or paragraphs corresponding to the first segmented word, and send the document summaries and/or paragraph summaries corresponding to the first segmented word to the terminal for display in order according to the ranking result.
Because the first segmented word may correspond to a large number of documents and paragraphs, the corresponding documents and/or paragraphs need to be ranked before they are shown to the user. The ranking depends on the relevance of the first segmented word to the document or paragraph: the more relevant the first segmented word is to a document, the higher that document ranks; the more relevant the first segmented word is to a paragraph, the higher that paragraph ranks.
It will be understood that documents and paragraphs usually contain a relatively large amount of text. Therefore, after the documents and/or paragraphs corresponding to the first segmented word are ranked, only the corresponding document summaries and/or paragraph summaries may first be sent to the terminal for display in ranked order, so that the user can first get a rough idea of the documents and/or paragraphs that involve the query word and then select the content of interest for further reading. This greatly reduces the time the user needs to browse the documents.
A document summary may be the document title, the content of the document's abstract, certain key paragraphs of the document, and so on. A paragraph summary may be text of a preset length from the paragraph or certain key sentences in the paragraph; when the paragraph is short, the paragraph summary may also be the paragraph content itself. The embodiments of the present application do not limit the form of the document summary or the paragraph summary.
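One of the paragraph summary forms mentioned above, text of a preset length with a short paragraph used as-is, can be sketched like this. The function name `paragraph_summary`, the `max_chars` parameter, and the trailing ellipsis are illustrative assumptions, not details from the source.

```python
def paragraph_summary(paragraph, max_chars=40):
    # A short paragraph is its own summary.
    if len(paragraph) <= max_chars:
        return paragraph
    # Otherwise keep the first max_chars characters and mark the truncation.
    return paragraph[:max_chars].rstrip() + "..."

short = paragraph_summary("Revenue grew.")
long_text = "The company reported strong growth in its patent licensing business this year."
clipped = paragraph_summary(long_text, max_chars=30)
print(short)
print(clipped)
```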
The specific implementation of ranking the documents and/or paragraphs corresponding to the first segmented word is described in detail below, and may include:
Ranking the documents corresponding to the first segmented word according to one or more of: the document type of each corresponding document, the number of occurrences of the first segmented word in each corresponding document, the occurrence ratio of the first segmented word in each corresponding document, the position at which the first segmented word occurs in each corresponding document, and the distance between the first segmented words in each corresponding document;
And/or
Ranking the paragraphs corresponding to the first segmented word according to one or more of: the number of occurrences of the first segmented word in each corresponding paragraph, the occurrence ratio of the first segmented word in each corresponding paragraph, the position at which the first segmented word occurs in each corresponding paragraph, and the distance between the first segmented words in each corresponding paragraph.
It in the embodiment of the present application, can be to the first participle when needing to be ranked up the corresponding document of the first participle Corresponding each piece document carries out scoring with the degree of correlation of the first participle respectively, according to appraisal result each piece corresponding to the first participle Document is ranked up.
One or more of the following items can include but is not limited to the rule that document scores:
(1) can be that different bonus point values, such as the text of technical paper type is respectively set in different types of document in advance Higher bonus point value, lower bonus point value of document setup of Internet news type etc. is arranged in shelves, then the corresponding text of the first participle The Doctype of shelves can influence the sequence of document.
(2) occurrence number that can be according to the first participle in every corresponding document is the different bonus point of document setup Value, such as occurrence number of the same first participle in first document are more than the occurrence number in second document, then right The bonus point value of corresponding first document of the first participle is higher than the bonus point value of corresponding second document of the first participle, represent this One document and the first participle are more relevant.
(3) can be the different bonus point of document setup according to appearance ratio of the first participle in every corresponding document Value, such as occurrence number of the same first participle in first document are identical as the occurrence number in second document, together When first document length be much smaller than the length of second document, i.e. appearance ratio of the first participle in first document Higher than the appearance ratio in second document, then the bonus point value of first document corresponding to the first participle is higher than the first participle The bonus point value of corresponding second document, represents first document and the first participle is more relevant.
(4) can be the different bonus point of document setup according to appearance position of the first participle in every corresponding document Value, such as first section in document or endpiece can more represent document content, if the first participle occur first section in a document or Person's endpiece can be that higher bonus point value is arranged in the document, and other paragraphs in a document occurs in the first participle, can be this article Lower bonus point value is arranged in shelves.
(5) When the query word includes multiple first participles, different bonus scores may also be set for documents according to the distance between those first participles in each document. For example, if the query word includes two first participles and the two appear adjacent to each other in a document, the query word appears in the document as a whole, so the document is more relevant to the query word and a higher bonus score may be set for it; if the first participles are far apart in the document, a lower bonus score may be set for it.
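The document scoring rules above can be sketched roughly as follows. This is a minimal illustration, not the scoring method of the application: the `Document` class, the `TYPE_BONUS` table, and every numeric weight are assumptions chosen for the example, since the source gives no concrete values. Rule (5) is left out here because it needs more than one participle.

```python
from dataclasses import dataclass

# Hypothetical bonus values per document type; the source specifies no numbers.
TYPE_BONUS = {"technical_paper": 3.0, "news": 1.0}


@dataclass
class Document:
    doc_id: str
    doc_type: str
    paragraphs: list  # each paragraph is a list of participles (tokens)


def score_document(doc, participle):
    """Sum illustrative bonus scores for rules (1) to (4)."""
    tokens = [t for p in doc.paragraphs for t in p]
    count = tokens.count(participle)
    if count == 0:
        return 0.0
    score = TYPE_BONUS.get(doc.doc_type, 0.0)   # rule (1): document type
    score += float(count)                       # rule (2): occurrence count
    score += 10.0 * count / len(tokens)         # rule (3): occurrence ratio
    # Rule (4): extra bonus if the participle is in the first or last paragraph.
    if participle in doc.paragraphs[0] or participle in doc.paragraphs[-1]:
        score += 2.0
    return score


def rank_documents(docs, participle):
    """Return ids of matching documents, most relevant first."""
    scored = [(score_document(d, participle), d.doc_id) for d in docs]
    scored = [(s, doc_id) for s, doc_id in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

Summing the bonuses is only one possible way of combining the rules; a weighted sum or any other monotonic combination would fit the description equally well.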
Similarly, when the paragraphs corresponding to the first participle need to be ranked, each such paragraph may be scored for its relevance to the first participle, and the paragraphs may then be ranked according to the scoring results.
The rules for scoring paragraphs may include, but are not limited to, one or more of the following:
(1) Different bonus scores may be set for paragraphs according to the number of occurrences of the first participle in each corresponding paragraph. For example, if the same first participle occurs more often in a first paragraph than in a second paragraph, the bonus score of the first paragraph for that first participle is higher than the bonus score of the second paragraph, indicating that the first paragraph is more relevant to the first participle.
(2) Different bonus scores may be set for paragraphs according to the occurrence ratio of the first participle in each corresponding paragraph. For example, if the same first participle occurs the same number of times in a first paragraph and in a second paragraph, while the first paragraph is much shorter than the second paragraph, the occurrence ratio of the first participle is higher in the first paragraph than in the second paragraph. The bonus score of the first paragraph for that first participle is then higher than the bonus score of the second paragraph, indicating that the first paragraph is more relevant to the first participle.
(3) Different bonus scores may be set for paragraphs according to the occurrence position of the first participle in each corresponding paragraph. For example, the first or last sentence of a paragraph is often more representative of the paragraph's content. If the first participle appears in the first or last sentence of a paragraph, a higher bonus score may be set for that paragraph; if the first participle appears elsewhere in the paragraph, a lower bonus score may be set for that paragraph.
(4) When the query word includes multiple first participles, different bonus scores may also be set for paragraphs according to the distance between those first participles in each paragraph. For example, if the query word includes two first participles and the two appear adjacent to each other in a paragraph, the query word appears in the paragraph as a whole, so the paragraph is more relevant to the query word and a higher bonus score may be set for it; if the first participles are far apart in the paragraph, a lower bonus score may be set for it.
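The distance-based rule for queries with several first participles can be illustrated with a small sketch. The function names, the `max_bonus` value, and the choice of dividing the bonus by the distance are all assumptions made for this example; the source only states that closer participles earn a higher bonus. The sketch assumes two distinct participles.

```python
def min_distance(tokens, a, b):
    """Smallest gap, in token positions, between participles a and b.

    Returns None when either participle is absent from the paragraph.
    """
    best = None
    last_a = last_b = None
    for i, tok in enumerate(tokens):
        if tok == a:
            last_a = i
            if last_b is not None:
                gap = i - last_b
                best = gap if best is None else min(best, gap)
        if tok == b:
            last_b = i
            if last_a is not None and last_a != i:
                gap = i - last_a
                best = gap if best is None else min(best, gap)
    return best


def proximity_bonus(tokens, a, b, max_bonus=5.0):
    """Hypothetical bonus: adjacent participles get max_bonus, farther ones less."""
    gap = min_distance(tokens, a, b)
    if gap is None:
        return 0.0
    return max_bonus / gap
```

With this choice, two adjacent participles (gap of 1) receive the full bonus, and the bonus shrinks as the gap grows; the same function applies unchanged to a whole document treated as one token list.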
Ranking the documents and/or paragraphs corresponding to the first participle in the above manner better reflects the relevance between the query word and the corresponding documents and/or paragraphs, so that the user reads the content most relevant to the query word first.
Step 204: Upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
Using the terminal, the user can view the document abstracts and/or paragraph abstracts corresponding to the first participle. If the user is interested in a particular document abstract or paragraph abstract, the user can click it to trigger a display request for the document abstract or paragraph abstract corresponding to any first participle. After obtaining such a display request, the server can send the document page corresponding to that document abstract or paragraph abstract to the terminal. In the embodiments of the present application, each document may also be paginated in advance. After a display request for a paragraph abstract is obtained, the document page on which that paragraph is located can be obtained and sent to the terminal, so that the terminal loads and displays only that document page. The other pages of the document need not be loaded in the process, which greatly reduces the network resources used and improves the page loading speed. After a display request for a document abstract is obtained, the first page of the document can be sent to the terminal as the document page to be loaded first; afterwards, as the user browses, the remaining document pages of the document are sent to the terminal and loaded page by page.
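The page-level loading described above might be sketched as follows. The fixed number of paragraphs per page and the function names are assumptions for illustration; the source does not say how documents are paginated.

```python
def paginate(paragraphs, per_page):
    """Split a document into pages and map each paragraph index to its page number."""
    pages = [paragraphs[i:i + per_page] for i in range(0, len(paragraphs), per_page)]
    page_of = {idx: idx // per_page for idx in range(len(paragraphs))}
    return pages, page_of


def page_for_paragraph(pages, page_of, paragraph_index):
    """Page to send when a paragraph abstract is triggered: only the page holding it."""
    return pages[page_of[paragraph_index]]


def page_for_document(pages):
    """Page to send first when a document abstract is triggered: the first page."""
    return pages[0]
```

Because the paragraph-to-page map is built in advance, serving a paragraph request is a single dictionary lookup and never touches the other pages of the document.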
In this way, the embodiments of the present application can record in advance, for each participle in each document, the document identifier of the document and the paragraph identifier of the paragraph in which the participle is located, and establish an inverted index. When the user needs to look up a topic of interest, the user can input a query word that includes one or more first participles. After the query word is obtained, the document identifiers and/or paragraph identifiers corresponding to the first participle can be found quickly from the pre-established inverted index, and the documents and/or paragraphs corresponding to the first participle can then be obtained. These documents and/or paragraphs are ranked by their relevance to the query word, and the corresponding document abstracts and/or paragraph abstracts are sent to the user's terminal and displayed according to the ranking results. The user can then trigger any document abstract or paragraph abstract to have the corresponding document page displayed. The user can thus quickly browse a particular piece of content across multiple documents, which greatly reduces reading time and improves the reading experience.
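As a rough illustration of the inverted index summarized above, the following sketch maps each participle to the (document identifier, paragraph identifier) pairs in which it occurs. The input layout and the use of a paragraph's position as its identifier are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build {participle: set of (doc_id, paragraph_index)}.

    docs: {doc_id: [paragraph, ...]}, each paragraph being a list of participles.
    """
    index = defaultdict(set)
    for doc_id, paragraphs in docs.items():
        for para_idx, paragraph in enumerate(paragraphs):
            for participle in paragraph:
                index[participle].add((doc_id, para_idx))
    return index


def lookup(index, participle):
    """Return the document ids and the (doc_id, paragraph_index) pairs for a participle."""
    postings = index.get(participle, set())
    doc_ids = sorted({doc_id for doc_id, _ in postings})
    paragraph_ids = sorted(postings)
    return doc_ids, paragraph_ids
```

A query then costs one dictionary lookup per first participle, independent of the number of documents, which is what makes the lookup fast.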
For ease of understanding, the display result of the target text display method provided by the embodiments of the present application is described below with reference to Figure 3.
If the server has ranked both the documents and the paragraphs corresponding to the first participle, the corresponding document abstracts and paragraph abstracts may be sent to the terminal and displayed in order. On the terminal, document abstracts may be shown in the document abstract display column 301, for example the document abstracts of documents 1 to 5 in ranked order, and paragraph abstracts may be shown in the paragraph abstract display column 302. The display of paragraph abstracts may be tied to a document: when the user triggers a document abstract, the paragraph abstract display column 302 shows, in ranked order, the paragraph abstracts of the paragraphs included in that document. For example, when the user triggers the document abstract of document 2, the paragraph abstract display column 302 shows the paragraph abstracts of the paragraphs included in document 2. Alternatively, the display of paragraph abstracts may be independent of any document: paragraphs from different documents are displayed together in ranked order, and the relationship between each paragraph and its document may also be indicated. For example, when the user triggers the paragraph abstract of paragraph 3, the user may be informed that paragraph 3 belongs to document 2.
If the server has ranked only the documents or only the paragraphs corresponding to the first participle, only the document abstract display column 301 or only the paragraph abstract display column 302 may be shown.
When the user wants to read a particular document or paragraph, the user can trigger the document abstract or paragraph abstract to request display of the corresponding document page, which is then displayed in the document page display area 303 of the terminal.
In the embodiments of the present application, the user can gain a general understanding of the content corresponding to the query word from the document abstracts and paragraph abstracts and, if more detail is needed, can click a document abstract or paragraph abstract to read the corresponding document page, so that the user can browse and read in a fragmented manner more efficiently.
Based on the above embodiment, Figure 4 shows a flowchart of another embodiment of the target text display method provided in the embodiments of the present application. In this embodiment, after the user inputs a query word, related terms of the query word may also be shown to the user, and the documents and/or paragraphs corresponding to the related terms are computed in advance. When the user queries a related term, the relevant content can be sent to the terminal quickly, enabling the user to read the target text quickly. This embodiment may include the following steps:
Step 401: Determine the related term corresponding to the query word and send the related term to the terminal for display, the related term including a second participle.
In the embodiments of the present application, after the user inputs a query word, related terms of the query word may also be recommended to the user's terminal. Similarly, a related term may include one or more second participles, each second participle being a participle in the inverted index. For example, if the user inputs the query word "artificial intelligence", the related terms may be "robot", "neural network", and so on.
Recommending related terms to the user helps the user discover content that may be worth querying further, and it also spares the user from typing: the user can query a related term directly by clicking it, which improves the user experience.
In some possible implementations of the embodiments of the present application, determining the related term corresponding to the query word may include:
determining the feature vector of the first participle and the feature vectors of the other participles according to a participle feature model; calculating the similarity between the feature vector of the first participle and the feature vector of each other participle; and determining as related terms the participles whose similarity to the feature vector of the first participle meets a preset condition.
That is, the feature vector of the first participle and the feature vectors of the other participles are obtained first, and the similarity between the feature vector of the first participle and the feature vector of each other participle is calculated. In general, the similarity between feature vectors can be characterized by the Euclidean distance between them. The participles whose similarity to the feature vector of the first participle meets a preset condition are then determined as related terms. The preset condition may be that the relevance ranks within a preset number of top positions, or that the relevance reaches a preset threshold, and so on; the preset number and preset threshold may be set according to actual conditions and are not limited by the present application.
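The selection of related terms by vector similarity can be sketched as below. The toy two-dimensional vectors and the parameter names `top_k` and `max_distance` are assumptions for illustration; they stand in for the preset number and preset threshold mentioned in the text. Note that with Euclidean distance a smaller value means a higher similarity.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def related_terms(vectors, query, top_k=2, max_distance=None):
    """Return up to top_k participles closest to `query`, nearest first.

    vectors: {participle: feature vector}. If max_distance is given,
    participles farther away than that threshold are discarded.
    """
    q = vectors[query]
    candidates = [(euclidean(q, v), w) for w, v in vectors.items() if w != query]
    if max_distance is not None:
        candidates = [(d, w) for d, w in candidates if d <= max_distance]
    candidates.sort()
    return [w for _, w in candidates[:top_k]]
```

Either condition can be used alone: passing only `top_k` implements the "top preset number" variant, while a large `top_k` with a `max_distance` implements the threshold variant.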
In some possible implementations of the embodiments of the present application, determining the feature vector of the first participle and the feature vectors of the other participles according to the participle feature model may specifically include:
taking the initial feature vector of any participle as the output of the participle feature model and, following the word order in each document, taking the initial feature vectors of the participles within a preset range before and after that participle as the input of the participle feature model; training the participle feature model; and, after the participle feature model reaches a convergence condition, obtaining the feature vector of the first participle and the feature vectors of the other participles, the participle feature model being a neural network model.
In the embodiments of the present application, a large number of documents may first be used as a training corpus, and the training corpus is segmented into participles. It will be understood that the participles in a document have a word order; for example, the corpus sentence "patent drafting is very important" can be segmented, in order, into "patent", "drafting", "very" and "important". The initial feature vector of any participle in the corpus is taken as the output of the participle feature model, for example the initial feature vector of the participle "drafting". The initial feature vector may be a random feature vector, and a feature vector may be an n-dimensional feature vector, for example a 128-dimensional feature vector, where n is a positive integer. The initial feature vectors of the participles within a preset range before and after this participle are then taken as the input of the participle feature model, for example the initial feature vectors of "patent", "very" and "important", which yields one input-output pair for the participle feature model. By analogy, a large number of inputs and outputs for the participle feature model can be determined from the documents and used to train the participle feature model until it reaches a convergence condition. Training the participle feature model is also the process of training the feature vectors of the participles, so once training is complete the feature vector of each participle is obtained.
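The construction of training inputs and outputs described above can be illustrated by generating (context, target) pairs from a segmented sentence. This sketch covers only the pairing step, not the neural network training itself; the function name and the `window` parameter, which stands in for the preset range, are assumptions for the example.

```python
def training_pairs(tokens, window=2):
    """Pair each participle (model output) with its neighbours (model input).

    The input for a participle is every participle within `window`
    positions before or after it, in document order.
    """
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs
```

With the segmented sentence `["patent", "drafting", "very", "important"]` and a window of 2, the pair produced for "drafting" has the input `["patent", "very", "important"]`, matching the example in the text. In a real model each participle in a pair would be replaced by its current feature vector before being fed to the network.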
The participle feature model may be one or a combination of neural networks such as a deep neural network (DNN), a recurrent neural network (RNN), or a long short-term memory network (LSTM); a shallow neural network model may also be used, such as a back propagation (BP) neural network or a radial basis function (RBF) neural network model.
The process of training the participle feature model in the embodiments of the present application obtains the feature vector of each participle without requiring the training corpus to be labeled.
Step 402: According to the inverted index, query the document identifiers and/or paragraph identifiers corresponding to the second participle, and determine the documents and/or paragraphs corresponding to the second participle according to those document identifiers and/or paragraph identifiers.
Querying the document identifiers and/or paragraph identifiers corresponding to the second participle and then determining the corresponding documents and/or paragraphs is similar to doing so for the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 403: Rank the documents and/or paragraphs corresponding to the second participle.
In some possible implementations of the embodiments of the present application, ranking the documents and/or paragraphs corresponding to the second participle may include:
ranking the documents corresponding to the second participle according to one or more of: the document type of each document corresponding to the second participle, the number of occurrences of the second participle in each corresponding document, the occurrence ratio of the second participle in each corresponding document, the occurrence position of the second participle in each corresponding document, and the distance between the second participles in each corresponding document;
and/or
ranking the paragraphs corresponding to the second participle according to one or more of: the number of occurrences of the second participle in each corresponding paragraph, the occurrence ratio of the second participle in each corresponding paragraph, the occurrence position of the second participle in each corresponding paragraph, and the distance between the second participles in each corresponding paragraph.
Ranking the documents and/or paragraphs corresponding to the second participle is similar to ranking those corresponding to the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 404: Obtain a query request for any related term, and send the document abstracts and/or paragraph abstracts corresponding to the second participle included in that related term to the terminal for display in order according to the ranking results.
After the terminal displays the related terms, the user can trigger any related term to initiate a query request for it. On receiving the query request, the server can directly retrieve the ranking results that have already been computed and send the document abstracts and/or paragraph abstracts corresponding to the second participle included in the related term to the terminal for display in order according to those ranking results. This saves the time of computing the ranking results at query time, so the user quickly obtains the required document abstracts and/or paragraph abstracts.
Step 405: Upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
Step 405 is implemented in a similar way to step 204. For the related description, refer to the above embodiment; details are not repeated here.
In this embodiment, the related terms of the query word can be determined and recommended to the user, who can then query a related term. Meanwhile, once the related terms are determined, the documents and/or paragraphs corresponding to each related term are computed and ranked in advance. When the user queries a related term, the ranking results can be retrieved quickly and sent directly to the user, which saves considerable time and further increases the speed at which the user reads relevant content in a massive collection of documents.
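The precomputation described in this embodiment can be sketched as a small cache in front of the ranking step. The class name and the `rank_fn` callback are hypothetical; `rank_fn` stands for whatever lookup-and-rank procedure produces the ordered abstracts for a term.

```python
class PrecomputedResults:
    """Stores ranking results computed ahead of the user's next query."""

    def __init__(self, rank_fn):
        self._rank_fn = rank_fn  # hypothetical: term -> ranked list of abstracts
        self._cache = {}

    def precompute(self, terms):
        """Rank each related (or predicted) term before the user asks for it."""
        for term in terms:
            if term not in self._cache:
                self._cache[term] = self._rank_fn(term)

    def get(self, term):
        """Return stored results, computing them only on a cache miss."""
        if term not in self._cache:
            self._cache[term] = self._rank_fn(term)
        return self._cache[term]
```

When the user clicks a related term, `get` returns immediately from the cache, so the ranking cost is paid while the user is still reading the results of the original query.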
Based on any of the above embodiments, Figure 5 shows a flowchart of another embodiment of the target text display method provided in the embodiments of the present application. In this embodiment, the query word that a user is most likely to input next after inputting a given query word, that is, the predicted query word corresponding to the query word, may also be determined from the historical query word records of the users, and the documents and/or paragraphs corresponding to the predicted query word are computed in advance. When the user queries the predicted query word, the relevant content can be sent to the terminal quickly, enabling the user to read the target text quickly. This embodiment may include the following steps:
Step 501: Determine the predicted query word corresponding to the query word according to the historical query word records, the predicted query word including a third participle.
The server records the queries of each user, and the order in which different users input query words can be determined from these query records. From the historical query word records, the query word most likely to be input next after a given query word can therefore be determined and used as the predicted query word. For example, if statistics show that after querying "artificial intelligence" users most often go on to query "neural network", then the predicted query word corresponding to the query word "artificial intelligence" is "neural network". Similarly, a predicted query word may include one or more third participles, each third participle being a participle in the inverted index.
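One simple way to derive a predicted query word from such records is to count which query most often follows each query. This is a sketch under the assumption that each user's history is available as an ordered list of query words; the function names are illustrative, and the source does not prescribe this particular statistic.

```python
from collections import Counter, defaultdict


def build_next_query_counts(histories):
    """Count, for each query word, how often each other query word follows it.

    histories: one list of query words per user, in input order.
    """
    counts = defaultdict(Counter)
    for queries in histories:
        for current, following in zip(queries, queries[1:]):
            counts[current][following] += 1
    return counts


def predict_next_query(counts, query):
    """Return the most frequent follow-up query, or None if none was recorded."""
    followers = counts.get(query)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

The counts can be rebuilt or updated incrementally as new query records arrive, so the prediction tracks how users actually move between topics.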
It will be understood that the predicted query word may be one or more of the above related terms, or may be different from the above related terms.
Step 502: According to the inverted index, query the document identifiers and/or paragraph identifiers corresponding to the third participle, and determine the documents and/or paragraphs corresponding to the third participle according to those document identifiers and/or paragraph identifiers.
Querying the document identifiers and/or paragraph identifiers corresponding to the third participle and then determining the corresponding documents and/or paragraphs is similar to doing so for the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 503: Rank the documents and/or paragraphs corresponding to the third participle.
In some possible implementations of the embodiments of the present application, ranking the documents and/or paragraphs corresponding to the third participle may include:
ranking the documents corresponding to the third participle according to one or more of: the document type of each document corresponding to the third participle, the number of occurrences of the third participle in each corresponding document, the occurrence ratio of the third participle in each corresponding document, the occurrence position of the third participle in each corresponding document, and the distance between the third participles in each corresponding document;
and/or
ranking the paragraphs corresponding to the third participle according to one or more of: the number of occurrences of the third participle in each corresponding paragraph, the occurrence ratio of the third participle in each corresponding paragraph, the occurrence position of the third participle in each corresponding paragraph, and the distance between the third participles in each corresponding paragraph.
Ranking the documents and/or paragraphs corresponding to the third participle is similar to ranking those corresponding to the first participle. For the related description, refer to the above embodiment; details are not repeated here.
Step 504: Obtain a query request for any predicted query word, and send the document abstracts and/or paragraph abstracts corresponding to the third participle included in that predicted query word to the terminal for display in order according to the ranking results.
When the query word that the user inputs next is the predicted query word, the server obtains a query request for the predicted query word. At this point the server can directly retrieve the ranking results that have already been computed and send the document abstracts and/or paragraph abstracts corresponding to the third participle included in the predicted query word to the terminal for display in order according to those ranking results. This saves the time of computing the ranking results at query time, so the user quickly obtains the required document abstracts and/or paragraph abstracts.
Step 505: Upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
Step 505 is implemented in a similar way to step 204. For the related description, refer to the above embodiment; details are not repeated here.
In this embodiment, the predicted query word corresponding to the query word can be determined, and the documents and/or paragraphs corresponding to each predicted query word are computed and ranked in advance. When the user queries a predicted query word, the ranking results can be retrieved quickly and sent directly to the user, which saves considerable time and further increases the speed at which the user reads relevant content in a massive collection of documents.
Referring to Figure 6, the embodiments of the present application further provide an embodiment of a target text display device, which may include:
an establishing unit 601, configured to pre-establish an inverted index, the inverted index including the document identifier and paragraph identifier corresponding to each participle;
a first acquisition unit 602, configured to obtain a query word input by a user, the query word including a first participle;
a first query unit 603, configured to query, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the first participle, and to determine the documents and/or paragraphs corresponding to the first participle according to those document identifiers and/or paragraph identifiers;
a first sequencing unit 604, configured to rank the documents and/or paragraphs corresponding to the first participle, and to send the document abstracts and/or paragraph abstracts corresponding to the first participle to the terminal for display in order according to the ranking results;
a first transmission unit 605, configured to, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
In some possible implementations of the embodiments of the present application, the device further includes:
a first determination unit, configured to determine the related term corresponding to the query word and to send the related term to the terminal for display, the related term including a second participle;
a second query unit, configured to query, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the second participle, and to determine the documents and/or paragraphs corresponding to the second participle according to those document identifiers and/or paragraph identifiers;
a second sequencing unit, configured to rank the documents and/or paragraphs corresponding to the second participle;
a second acquisition unit, configured to obtain a query request for any related term, and to send the document abstracts and/or paragraph abstracts corresponding to the second participle included in that related term to the terminal for display in order according to the ranking results;
a second transmission unit, configured to, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
In some possible implementations of the embodiments of the present application, the first determination unit may include:
a first determination subunit, configured to determine the feature vector of the first participle and the feature vectors of the other participles according to a participle feature model;
a computation subunit, configured to calculate the similarity between the feature vector of the first participle and the feature vector of each other participle;
a second determination subunit, configured to determine as related terms the participles whose similarity to the feature vector of the first participle meets a preset condition.
In some possible implementations of the embodiments of the present application, the first determination subunit may be specifically configured to:
take the initial feature vector of any participle as the output of the participle feature model and, following the word order in each document, take the initial feature vectors of the participles within a preset range before and after that participle as the input of the participle feature model; train the participle feature model; and, after the participle feature model reaches a convergence condition, obtain the feature vector of the first participle and the feature vectors of the other participles, the participle feature model being a neural network model.
In some possible implementations of the embodiments of the present application, the device may further include:
a second determination unit, configured to determine the predicted query word corresponding to the query word according to the historical query word records, the predicted query word including a third participle;
a third query unit, configured to query, according to the inverted index, the document identifiers and/or paragraph identifiers corresponding to the third participle, and to determine the documents and/or paragraphs corresponding to the third participle according to those document identifiers and/or paragraph identifiers;
a third sequencing unit, configured to rank the documents and/or paragraphs corresponding to the third participle;
a third acquisition unit, configured to obtain a query request for any predicted query word, and to send the document abstracts and/or paragraph abstracts corresponding to the third participle included in that predicted query word to the terminal for display in order according to the ranking results;
a third transmission unit, configured to, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third participle, send the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
In some possible implementations of the embodiments of the present application, the first sequencing unit may be specifically configured to:
rank the documents corresponding to the first participle according to one or more of: the document type of each document corresponding to the first participle, the number of occurrences of the first participle in each corresponding document, the occurrence ratio of the first participle in each corresponding document, the occurrence position of the first participle in each corresponding document, and the distance between the first participles in each corresponding document;
and/or
rank the paragraphs corresponding to the first participle according to one or more of: the number of occurrences of the first participle in each corresponding paragraph, the occurrence ratio of the first participle in each corresponding paragraph, the occurrence position of the first participle in each corresponding paragraph, and the distance between the first participles in each corresponding paragraph.
In some possible implementations of the embodiments of the present application, the second sequencing unit may be specifically configured to:
rank the documents corresponding to the second participle according to one or more of: the document type of each document corresponding to the second participle, the number of occurrences of the second participle in each corresponding document, the occurrence ratio of the second participle in each corresponding document, the position at which the second participle occurs in each corresponding document, and the distance between the second participles in each corresponding document;
and/or
rank the paragraphs corresponding to the second participle according to one or more of: the number of occurrences of the second participle in each corresponding paragraph, the occurrence ratio of the second participle in each corresponding paragraph, the position at which the second participle occurs in each corresponding paragraph, and the distance between the second participles in each corresponding paragraph.
In some possible implementations of the embodiments of the present application, the third sequencing unit may be specifically configured to:
rank the documents corresponding to the third participle according to one or more of: the document type of each document corresponding to the third participle, the number of occurrences of the third participle in each corresponding document, the occurrence ratio of the third participle in each corresponding document, the position at which the third participle occurs in each corresponding document, and the distance between the third participles in each corresponding document;
and/or
rank the paragraphs corresponding to the third participle according to one or more of: the number of occurrences of the third participle in each corresponding paragraph, the occurrence ratio of the third participle in each corresponding paragraph, the position at which the third participle occurs in each corresponding paragraph, and the distance between the third participles in each corresponding paragraph.
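The ranking factors listed for the sequencing units (document type, number of occurrences, occurrence ratio, occurrence position, and distance between participles) can be illustrated with a minimal Python sketch. The source does not say how the factors are combined, so the unweighted sum, the `TYPE_WEIGHT` table, and the names `Candidate`, `score`, and `rank` are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical document-type weights; the source names document type as a
# ranking factor but does not say how it is weighted.
TYPE_WEIGHT = {"manual": 1.0, "faq": 0.5}


@dataclass
class Candidate:
    doc_id: str
    doc_type: str
    tokens: list  # participles of the document, in word order


def score(cand, query_terms):
    """Combine the ranking factors with an (assumed) unweighted sum."""
    positions = [i for i, t in enumerate(cand.tokens) if t in query_terms]
    if not positions:
        return 0.0
    total = len(cand.tokens)
    count = len(positions)                  # number of occurrences
    ratio = count / total                   # occurrence ratio
    earliness = 1.0 - positions[0] / total  # earlier first occurrence scores higher
    # Smallest gap between occurrences of two different query participles.
    gaps = [b - a for a, b in zip(positions, positions[1:])
            if cand.tokens[a] != cand.tokens[b]]
    proximity = 1.0 / min(gaps) if gaps else 0.0
    return TYPE_WEIGHT.get(cand.doc_type, 0.1) + count + ratio + earliness + proximity


def rank(candidates, query_terms):
    """Return document ids ordered from most to least relevant."""
    terms = set(query_terms)
    ordered = sorted(candidates, key=lambda c: score(c, terms), reverse=True)
    return [c.doc_id for c in ordered]
```

The same scoring could be applied to paragraphs by treating each paragraph's participle list as `tokens` and dropping the document-type term.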
In the embodiments of the present application, the document identifier of the document and the paragraph identifier of the paragraph in which each participle of each document is located can be recorded in advance to establish an inverted index. When a user needs to look up a particular point of interest, the user can enter a query word that includes one or more first participles. After the query word is obtained, the document identifiers and/or paragraph identifiers corresponding to the first participle can be found quickly from the pre-established inverted index, and the documents and/or paragraphs corresponding to the first participle can then be obtained. These documents and/or paragraphs are ranked by their relevance to the query word, and the corresponding document abstracts and/or paragraph abstracts are sent to the user's terminal for display according to the ranking results. The user can then trigger any document abstract or paragraph abstract to display the corresponding document page. In this way the user can quickly browse a given piece of content across multiple documents, which substantially reduces the user's reading time and improves the reading experience.
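As a rough illustration of the inverted index described here, the following Python sketch maps each participle (a segmented word) to the set of (document identifier, paragraph identifier) pairs in which it occurs. The function names, the use of paragraph position as the paragraph identifier, and the choice to take the union of hits across query participles are assumptions; the source does not specify them.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: [paragraph, ...]}, each paragraph a list of participles."""
    index = defaultdict(set)
    for doc_id, paragraphs in docs.items():
        for para_id, tokens in enumerate(paragraphs):
            for token in tokens:
                # Record where the participle occurs: (document id, paragraph id).
                index[token].add((doc_id, para_id))
    return index


def lookup(index, query_terms):
    """Return the sorted (doc_id, para_id) pairs that contain any query participle."""
    hits = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return sorted(hits)
```

Because the index is built once in advance, answering a query only requires one dictionary lookup per participle rather than a scan of every document.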
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the system or device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for the relevant details, see the description of the method.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A target text display method, characterized in that an inverted index is established in advance, the inverted index including the document identifier and paragraph identifier corresponding to each participle, the method comprising:
obtaining a query word input by a user, the query word including a first participle;
querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first participle, and determining the document and/or paragraph corresponding to the first participle according to the document identifier and/or paragraph identifier corresponding to the first participle;
ranking the documents and/or paragraphs corresponding to the first participle, and sending the document abstracts and/or paragraph abstracts corresponding to the first participle to a terminal for display in the order given by the ranking results;
upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first participle, sending the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
2. The method according to claim 1, characterized in that the method further comprises:
determining a related term corresponding to the query word and sending the related term to the terminal for display, the related term including a second participle;
querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the second participle, and determining the document and/or paragraph corresponding to the second participle according to the document identifier and/or paragraph identifier corresponding to the second participle;
ranking the documents and/or paragraphs corresponding to the second participle;
obtaining a query request for any related term, and sending the document abstracts and/or paragraph abstracts corresponding to the second participle included in that related term to the terminal for display in the order given by the ranking results;
upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second participle, sending the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
3. The method according to claim 2, characterized in that determining the related term corresponding to the query word comprises:
determining the feature vector of the first participle and the feature vectors of the other participles according to a participle feature model;
calculating the similarity between the feature vector of the first participle and the feature vector of each of the other participles;
determining, as related terms, the participles whose feature-vector similarity to the first participle meets a preset condition.
4. The method according to claim 3, characterized in that determining the feature vector of the first participle and the feature vectors of the other participles according to the participle feature model comprises:
taking the initial feature vector of any participle as the output of the participle feature model and, according to the word order in each document, taking the initial feature vectors of the participles within a preset range before and after that participle as the input of the participle feature model; training the participle feature model; and, after the participle feature model reaches a convergence condition, obtaining the feature vector of the first participle and the feature vectors of the other participles, the participle feature model being a neural network model.
5. The method according to claim 1, characterized in that the method further comprises:
determining, according to historical query word records, a predicted query word corresponding to the query word, the predicted query word including a third participle;
querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the third participle, and determining the document and/or paragraph corresponding to the third participle according to the document identifier and/or paragraph identifier corresponding to the third participle;
ranking the documents and/or paragraphs corresponding to the third participle;
obtaining a query request for any predicted query word, and sending the document abstracts and/or paragraph abstracts corresponding to the third participle included in that predicted query word to the terminal for display in the order given by the ranking results;
upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third participle, sending the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
6. The method according to claim 1, characterized in that ranking the documents and/or paragraphs corresponding to the first participle comprises:
ranking the documents corresponding to the first participle according to one or more of: the document type of each document corresponding to the first participle, the number of occurrences of the first participle in each corresponding document, the occurrence ratio of the first participle in each corresponding document, the position at which the first participle occurs in each corresponding document, and the distance between the first participles in each corresponding document;
and/or
ranking the paragraphs corresponding to the first participle according to one or more of: the number of occurrences of the first participle in each corresponding paragraph, the occurrence ratio of the first participle in each corresponding paragraph, the position at which the first participle occurs in each corresponding paragraph, and the distance between the first participles in each corresponding paragraph.
7. A target text display device, characterized in that the device comprises:
an establishing unit, for establishing an inverted index in advance, the inverted index including the document identifier and paragraph identifier corresponding to each participle;
a first acquiring unit, for obtaining a query word input by a user, the query word including a first participle;
a first query unit, for querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the first participle, and determining the document and/or paragraph corresponding to the first participle according to the document identifier and/or paragraph identifier corresponding to the first participle;
a first sequencing unit, for ranking the documents and/or paragraphs corresponding to the first participle, and sending the document abstracts and/or paragraph abstracts corresponding to the first participle to a terminal for display in the order given by the ranking results;
a first transmission unit, for sending, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any first participle, the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
8. The device according to claim 7, characterized in that the device further comprises:
a first determination unit, for determining a related term corresponding to the query word and sending the related term to the terminal for display, the related term including a second participle;
a second query unit, for querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the second participle, and determining the document and/or paragraph corresponding to the second participle according to the document identifier and/or paragraph identifier corresponding to the second participle;
a second sequencing unit, for ranking the documents and/or paragraphs corresponding to the second participle;
a second acquiring unit, for obtaining a query request for any related term, and sending the document abstracts and/or paragraph abstracts corresponding to the second participle included in that related term to the terminal for display in the order given by the ranking results;
a second transmission unit, for sending, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any second participle, the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
9. The device according to claim 8, characterized in that the first determination unit comprises:
a first determination subunit, for determining the feature vector of the first participle and the feature vectors of the other participles according to a participle feature model;
a computation subunit, for calculating the similarity between the feature vector of the first participle and the feature vector of each of the other participles;
a second determination subunit, for determining, as related terms, the participles whose feature-vector similarity to the first participle meets a preset condition.
10. The device according to claim 9, characterized in that the first determination subunit is specifically configured to:
take the initial feature vector of any participle as the output of the participle feature model and, according to the word order in each document, take the initial feature vectors of the participles within a preset range before and after that participle as the input of the participle feature model; train the participle feature model; and, after the participle feature model reaches a convergence condition, obtain the feature vector of the first participle and the feature vectors of the other participles, the participle feature model being a neural network model.
11. The device according to claim 7, characterized in that the device further comprises:
a second determination unit, for determining, according to historical query word records, a predicted query word corresponding to the query word, the predicted query word including a third participle;
a third query unit, for querying, according to the inverted index, the document identifier and/or paragraph identifier corresponding to the third participle, and determining the document and/or paragraph corresponding to the third participle according to the document identifier and/or paragraph identifier corresponding to the third participle;
a third sequencing unit, for ranking the documents and/or paragraphs corresponding to the third participle;
a third acquiring unit, for obtaining a query request for any predicted query word, and sending the document abstracts and/or paragraph abstracts corresponding to the third participle included in that predicted query word to the terminal for display in the order given by the ranking results;
a third transmission unit, for sending, upon obtaining a display request for the document abstract or paragraph abstract corresponding to any third participle, the document page corresponding to that document abstract or paragraph abstract to the terminal, so that the terminal loads and displays the document page.
12. The device according to claim 7, characterized in that the first sequencing unit is specifically configured to:
rank the documents corresponding to the first participle according to one or more of: the document type of each document corresponding to the first participle, the number of occurrences of the first participle in each corresponding document, the occurrence ratio of the first participle in each corresponding document, the position at which the first participle occurs in each corresponding document, and the distance between the first participles in each corresponding document;
and/or
rank the paragraphs corresponding to the first participle according to one or more of: the number of occurrences of the first participle in each corresponding paragraph, the occurrence ratio of the first participle in each corresponding paragraph, the position at which the first participle occurs in each corresponding paragraph, and the distance between the first participles in each corresponding paragraph.
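
The related-term determination recited in claims 3 and 9 (compare the feature vector of the first participle with those of the other participles and keep the ones that meet a preset condition) can be sketched as follows. The claims only say "similarity" and "preset condition"; the use of cosine similarity, the threshold of 0.8, and the names `cosine` and `related_terms` are assumptions for illustration, and the feature vectors are taken as already trained.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_terms(term, vectors, threshold=0.8):
    """Return the other participles whose similarity to `term` meets the threshold,
    most similar first. `vectors` maps each participle to its feature vector."""
    base = vectors[term]
    scored = [(other, cosine(base, vec))
              for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, sim in scored if sim >= threshold]
```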
CN201810142223.XA 2018-02-11 2018-02-11 A kind of target text display methods and device Pending CN108363682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810142223.XA CN108363682A (en) 2018-02-11 2018-02-11 A kind of target text display methods and device

Publications (1)

Publication Number Publication Date
CN108363682A true CN108363682A (en) 2018-08-03

Family

ID=63005884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810142223.XA Pending CN108363682A (en) 2018-02-11 2018-02-11 A kind of target text display methods and device

Country Status (1)

Country Link
CN (1) CN108363682A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710844A (en) * 2018-12-20 2019-05-03 中国银行业监督管理委员会福建监管局 The method and apparatus for quick and precisely positioning file based on search engine
CN110162617A (en) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 Extract method, apparatus, language processing engine and the medium of summary info
CN110795553A (en) * 2019-09-09 2020-02-14 腾讯科技(深圳)有限公司 Abstract generation method and device
CN113448984A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Document positioning display method and device, server and electronic equipment
CN114722194A (en) * 2022-03-15 2022-07-08 电子科技大学 Automatic construction method of emergency time sequence based on abstract generation algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050149494A1 (en) * 2002-01-16 2005-07-07 Per Lindh Information data retrieval, where the data is organized in terms, documents and document corpora
CN101246492A (en) * 2008-02-26 2008-08-20 华中科技大学 Full text retrieval system based on natural language
CN103617266A (en) * 2013-12-03 2014-03-05 北京奇虎科技有限公司 Personalized extension search method, device and system
WO2017131753A1 (en) * 2016-01-29 2017-08-03 Entit Software Llc Text search of database with one-pass indexing including filtering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
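冯贵川 (Feng Guichuan): "基于Word2vec的文本建模及分类研究" (Research on Text Modeling and Classification Based on Word2vec), 《中国优秀硕士学位论文全文数据库信息科技辑(月刊)》 (China Master's Theses Full-text Database, Information Science and Technology, monthly) *
杨沛 (Yang Pei): "单汉字和词索引机制的模式比较" (A Comparison of Single-Chinese-Character and Word Indexing Mechanisms), 《集美航海学院学报》 (Journal of Jimei Navigation Institute) *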

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162617A (en) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 Extract method, apparatus, language processing engine and the medium of summary info
CN110162617B (en) * 2018-09-29 2022-11-04 腾讯科技(深圳)有限公司 Method, apparatus, language processing engine and medium for extracting summary information
CN109710844A (en) * 2018-12-20 2019-05-03 中国银行业监督管理委员会福建监管局 The method and apparatus for quick and precisely positioning file based on search engine
CN110795553A (en) * 2019-09-09 2020-02-14 腾讯科技(深圳)有限公司 Abstract generation method and device
CN110795553B (en) * 2019-09-09 2024-04-23 腾讯科技(深圳)有限公司 Digest generation method and device
CN113448984A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Document positioning display method and device, server and electronic equipment
CN113448984B (en) * 2021-07-15 2024-03-26 中国银行股份有限公司 Document positioning display method and device, server and electronic equipment
CN114722194A (en) * 2022-03-15 2022-07-08 电子科技大学 Automatic construction method of emergency time sequence based on abstract generation algorithm
CN114722194B (en) * 2022-03-15 2023-05-09 电子科技大学 Automatic construction method for emergency time sequence based on abstract generation algorithm

Similar Documents

Publication Publication Date Title
CN108363682A (en) A kind of target text display methods and device
CN109871483A (en) A kind of determination method and device of recommendation information
CN101119326B (en) Method and device for managing instant communication conversation record
CN101641697B (en) Related search queries for a webpage and their applications
US20160048754A1 (en) Classifying resources using a deep network
CN105808590B (en) Search engine implementation method, searching method and device
CN106547871A (en) Method and apparatus is recalled based on the Search Results of neutral net
CN109241526B (en) Paragraph segmentation method and device
US20150186938A1 (en) Search service advertisement selection
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
CN104899322A (en) Search engine and implementation method thereof
CN108319627A (en) Keyword extracting method and keyword extracting device
CN107679082A (en) Question and answer searching method, device and electronic equipment
CN106874292A (en) Topic processing method and processing device
CN106776860A (en) One kind search abstraction generating method and device
CN110727862A (en) Method and device for generating query strategy of commodity search
CN108509499A (en) A kind of searching method and device, electronic equipment
CN109271514A (en) Generation method, classification method, device and the storage medium of short text disaggregated model
CN109948140B (en) Word vector embedding method and device
CN110489638A (en) A kind of searching method, device, server, system and storage medium
CN113342948A (en) Intelligent question and answer method and device
CN108694183A (en) A kind of search method and device
US11176209B2 (en) Dynamically augmenting query to search for content not previously known to the user
CN106021615A (en) Method and device for optimizing title search
CN117391824B (en) Method and device for recommending articles based on large language model and search engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180803