CN113806491A - Information processing method, device, equipment and medium - Google Patents


Publication number
CN113806491A
Authority
CN
China
Prior art keywords
document
text
query
similarity
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111143741.1A
Other languages
Chinese (zh)
Other versions
CN113806491B (en)
Inventor
李舒
周永鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Comac Software Co ltd
Shanghai Aviation Industry Group Co ltd
Original Assignee
Comac Software Co ltd
Shanghai Aviation Industry Group Co ltd
Application filed by Comac Software Co ltd, Shanghai Aviation Industry Group Co ltd
Priority to CN202111143741.1A
Publication of CN113806491A
Application granted
Publication of CN113806491B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an information processing method, apparatus, device, and medium. The method comprises: acquiring a query request sent by a query terminal, the query request carrying a query text; ranking the documents in a search library by similarity according to the keywords in the query text and the first label of each document stored in the search library; for each document, determining, according to the second label corresponding to each fragment in the document, a target fragment text whose similarity to the query text meets a preset requirement; and ranking the target fragment text of each document by the similarity of the documents and sending the ranked result to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity. The method addresses the problem that retrieval results are not accurate enough.

Description

Information processing method, device, equipment and medium
Technical Field
The present application relates to the field of information processing, and in particular, to a method, an apparatus, a device, and a medium for information processing.
Background
As technology advances, scientific research data accumulates continuously, and research resources are generally stored in databases in digital form so that users can consult them later during production. Search engines emerged to make the data in these databases searchable. A search engine is a system that collects information from the internet with a specific computer program according to a certain strategy, organizes and processes that information, provides a retrieval service to users, and displays the retrieved information to them.
However, in existing methods of looking up data through a search engine, the user enters a query text and the retrieval system simply returns documents related to it. The user must then locate the relevant passage within each document, so the retrieval results are not accurate enough.
Disclosure of Invention
In view of the above, an object of the present application is to provide an information processing method, apparatus, device and medium, which are used to solve the problem in the prior art that search results are not accurate enough.
In a first aspect, an embodiment of the present application provides an information processing method, where the method includes:
acquiring a query request sent by a query terminal; the query request carries a query text;
according to the keywords in the query text and the first label of each document stored in a search library, carrying out similarity ranking on the documents in the search library;
for each document, determining a target fragment text with similarity meeting preset requirements with the query text according to a second label corresponding to each fragment in the document;
and ranking the target fragment text of each document by the similarity of the documents, and sending the ranked result to the query terminal so that the query terminal displays the target fragment text of each document in order of similarity.
In one possible embodiment, the first tag of the document includes a second tag corresponding to each fragmented text in the document and a third tag corresponding to a headline of the document;
the second label comprises any one or more of the following words: subheading keywords and body-text keywords of the fragment text, first associated words correlated with the subheading keywords, and second associated words correlated with the body-text keywords;
the third label includes the following words: a headline keyword of the document and a third associated word having a correlation with the headline keyword.
In one possible embodiment, the first tag of each document stored in the search repository is obtained by:
for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text;
and integrating the second label of each fragment text in the document and the third label corresponding to the headline of the document into the first label of the document aiming at each document in the search library.
In one possible embodiment, the second label of the fragmented text is determined by:
determining at least one keyword of the fragment text based on content information of the fragment text;
determining associated words having correlation with the keywords according to the similarity between the keywords and each candidate word in an associated word bank;
and determining the keywords and associated words with correlation to the keywords as second labels of the fragment texts.
In a possible embodiment, for each document in the search library, performing fragmentation processing on the document according to a preset segmentation requirement to obtain at least one fragmented text, including:
for each document in the search library, if the document includes subheadings, fragmenting the document by subheading to obtain at least one fragmented text; and if the document does not include subheadings, fragmenting the document by paragraph to obtain at least one fragmented text.
In one possible embodiment, ranking the documents in the search library by similarity according to the keywords in the query text and the first tag of each document stored in the search library includes:
for each document stored in the search library, calculating the vocabulary similarity between each vocabulary in the first label of the document and the keywords in the query text;
for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the calculated vocabulary similarity of each vocabulary in the first label of the document and the weight of each vocabulary in the first label of the document;
and ranking the documents in the search library by similarity according to the document similarity of each document.
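The weighted ranking described in the three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not give a formula, so the weighted average, the exact-match `vocab_similarity` placeholder, and the names `document_similarity` and `rank_documents` are all hypothetical.

```python
from typing import Dict, List


def vocab_similarity(tag_word: str, keyword: str) -> float:
    """Placeholder word-level similarity (assumption): 1.0 on exact match, else 0.0."""
    return 1.0 if tag_word == keyword else 0.0


def document_similarity(first_label: Dict[str, float], keywords: List[str]) -> float:
    """Weighted average of each label word's best similarity to any query keyword.

    first_label maps each word in the document's first label to its weight.
    The weighted-average form is an assumption; the source only says that
    vocabulary similarities and weights are combined.
    """
    total_weight = sum(first_label.values())
    if total_weight == 0 or not keywords:
        return 0.0
    score = 0.0
    for word, weight in first_label.items():
        best = max(vocab_similarity(word, kw) for kw in keywords)
        score += weight * best
    return score / total_weight


def rank_documents(library: Dict[str, Dict[str, float]], keywords: List[str]) -> List[str]:
    """Return document ids ordered from most to least similar to the query keywords."""
    return sorted(
        library,
        key=lambda doc_id: document_similarity(library[doc_id], keywords),
        reverse=True,
    )
```

For instance, with keywords "single chip microcomputer" and "voltage", a document whose whole first label matches scores 1.0 and is ranked ahead of a document in which only part of the weighted label matches.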
In one possible embodiment, the method further comprises:
and sending the keywords in the query text to a query terminal so that the query terminal can highlight the keywords in the query text contained in the target fragment text of each document.
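As an illustration of what the query terminal might do with the keywords it receives, the sketch below wraps every keyword occurrence in a target fragment text with markers. The function name `highlight`, the `<em>` markers, and the case-insensitive matching are assumptions; the source only states that the keywords are highlighted.

```python
import re
from typing import List


def highlight(fragment_text: str, keywords: List[str],
              open_mark: str = "<em>", close_mark: str = "</em>") -> str:
    """Wrap each keyword occurrence in the fragment text with the given markers."""
    usable = [k for k in keywords if k]
    if not usable:
        return fragment_text
    # Longest keywords first so that "operating voltage" wins over "voltage".
    usable.sort(key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in usable)
    return re.sub(
        pattern,
        lambda m: f"{open_mark}{m.group(0)}{close_mark}",
        fragment_text,
        flags=re.IGNORECASE,
    )
```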
In a second aspect, an embodiment of the present application provides an information processing apparatus, including:
the acquisition module is used for acquiring the query request sent by the query terminal; the query request carries a query text;
the ranking module is used for ranking the similarity of the documents in the search library according to the keywords in the query text and the first label of each document stored in the search library;
the determining module is used for determining a target fragment text with similarity meeting preset requirements with the query text according to the second label corresponding to each fragment in each document;
and the sending module is used for ranking the target fragment text of each document by the similarity of the documents and sending the ranked result to the query terminal so that the query terminal displays the target fragment text of each document in order of similarity.
In one possible embodiment, the first tag of the document includes a second tag corresponding to each fragmented text in the document and a third tag corresponding to a headline of the document;
the second label comprises any one or more of the following words: subheading keywords and body-text keywords of the fragment text, first associated words correlated with the subheading keywords, and second associated words correlated with the body-text keywords;
the third label includes the following words: a headline keyword of the document and a third associated word having a correlation with the headline keyword.
In one possible embodiment, the first tag of each document stored in the search library, as used by the ranking module, is obtained by:
for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text;
and integrating the second label of each fragment text in the document and the third label corresponding to the headline of the document into the first label of the document aiming at each document in the search library.
In a possible embodiment, the second label of the fragment text, as used by the determining module, is determined by:
determining at least one keyword of the fragment text based on content information of the fragment text;
determining associated words having correlation with the keywords according to the similarity between the keywords and each candidate word in an associated word bank;
and determining the keywords and associated words with correlation to the keywords as second labels of the fragment texts.
In a possible embodiment, for each document in the search library, performing fragmentation processing on the document according to a preset segmentation requirement to obtain at least one fragmented text, including:
for each document in the search library, if the document includes subheadings, fragmenting the document by subheading to obtain at least one fragmented text; and if the document does not include subheadings, fragmenting the document by paragraph to obtain at least one fragmented text.
In a possible embodiment, the ranking module, when configured to rank the documents in the search base according to the similarity between the keywords in the query text and the first tag of each document stored in the search base, is specifically configured to:
for each document stored in the search library, calculating the vocabulary similarity of each vocabulary in the first label of the document and the keywords in the query text;
for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the calculated vocabulary similarity of each vocabulary in the first label of the document and the weight of each vocabulary in the first label of the document;
and according to the document similarity of each document, carrying out similarity ranking on the documents in the search library.
In one possible embodiment, the apparatus further comprises:
and the display unit is used for sending the keywords in the query text to a query terminal so that the query terminal can highlight the keywords in the query text contained in the target fragment text of each document.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the steps of the method in the first aspect are implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
In the present application, the query request sent by the query terminal and carrying the query text is acquired, so that the content the user wants to search for at the query terminal is determined from the query text in the request. The documents in the search library are ranked by similarity according to the keywords in the query text and the first label of each document stored in the search library, so that the documents can be displayed on the display terminal in order of their similarity to the query text. For each document, a target fragment text whose similarity to the query text meets the preset requirement is determined according to the second label corresponding to each fragment in the document. The target fragment texts are then ranked by the similarity of their documents and sent to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
With this method, target fragment texts highly similar to the query text are displayed on the display terminal, so the search result can be narrowed to a specific section or paragraph, and the target fragment texts and their corresponding documents are displayed in descending order of similarity.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating a method for determining a first tag of a document according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application, and as shown in fig. 1, the method is implemented by the following steps:
Step 101, acquiring a query request sent by a query terminal; the query request carries a query text.
Specifically, the query text may be a whole document, a word, or a sentence; its content is what the user wants to search for through the query terminal. The query request includes the query text, the time at which the query text was submitted, user information, and the like, and may also include one or more document types selected by the user. The document types may include: news, forum, encyclopedia, document library, and web page. When the user does not set a document search range, all documents in the search library are searched by default.
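The contents of the query request and the default search range can be pictured with a small data structure. This is only a sketch: the class name `QueryRequest`, its field names, and the helper `search_scope` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class QueryRequest:
    """Hypothetical shape of a query request sent by the query terminal."""
    query_text: str
    submitted_at: datetime
    user_info: Dict[str, str] = field(default_factory=dict)
    # None (or an empty list) means the user did not set a search range.
    document_types: Optional[List[str]] = None


def search_scope(request: QueryRequest, all_types: List[str]) -> List[str]:
    """Return the document types to search; default to every type in the library."""
    if not request.document_types:
        return list(all_types)
    return [t for t in request.document_types if t in all_types]
```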
Step 102, ranking the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library.
Specifically, the keywords in the query text may be marked by the system in advance or at query time, or may be retrieved and extracted by the system from big data or from the search library. For example, when the query text is "what is the operating voltage of the single chip microcomputer", if first labels for "single chip microcomputer" and "voltage" have been marked in the search library in advance, the keywords in the query text are determined to be "single chip microcomputer" and "operating voltage". Alternatively, a keyword extraction formula or extraction model may be set for the query text to extract key words such as verbs and nouns from it. In the embodiment of the application, the documents in the search library may be classified according to their first tags or corresponding fields, so that the user can search by category, which reduces the amount of data the system processes. The first label of the document comprises a second label corresponding to each fragment text in the document and a third label corresponding to the headline of the document; the second label comprises any one or more of the following words: subheading keywords and body-text keywords of the fragment text, first associated words correlated with the subheading keywords, and second associated words correlated with the body-text keywords; the third label includes the following words: a headline keyword of the document and a third associated word correlated with the headline keyword.
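One simple way to realize the label-based keyword determination is to look for label words that are already marked in the search library inside the query text. The sketch below assumes plain substring matching and returns the matched label words themselves; the function name `extract_keywords` is hypothetical, and a real system could equally use an extraction model, as the text notes.

```python
from typing import Iterable, List


def extract_keywords(query_text: str, known_labels: Iterable[str]) -> List[str]:
    """Return the pre-marked label words that occur in the query text, longest first."""
    text = query_text.lower()
    hits = [label for label in known_labels if label and label.lower() in text]
    hits.sort(key=len, reverse=True)
    kept: List[str] = []
    for hit in hits:
        # Skip a hit that is already covered by a longer kept keyword.
        if not any(hit.lower() in longer.lower() for longer in kept):
            kept.append(hit)
    return kept
```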
The similarity between the determined keywords of the query text and the first tag of each document stored in the search library is determined; from it, the similarity between each document in the search library and the query text is determined, and the documents in the search library are ranked by similarity value.
Step 103, determining a target fragment text with similarity meeting preset requirements with the query text according to the second label corresponding to each fragment in each document.
Specifically, each fragment in the document corresponds to the key information of a section or paragraph in the document, and the more finely the document is divided into fragments, the more second tags of the document are stored in the search library.
For each document, the similarity to the query text is determined according to the second labels marked for the document and the keywords in the query text, and the text formed by the words, sentences, or paragraphs in the document whose similarity to the query text meets the preset requirement is taken as the target fragment text.
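A minimal sketch of selecting target fragment texts follows. The source does not define the similarity measure or the "preset requirement", so the keyword-overlap ratio, the 0.5 threshold, and the names `fragment_similarity` and `target_fragments` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def fragment_similarity(second_label: Set[str], keywords: Set[str]) -> float:
    """Share of query keywords that appear in the fragment's second label (assumed measure)."""
    if not keywords:
        return 0.0
    return len(second_label & keywords) / len(keywords)


def target_fragments(fragments: Dict[str, Set[str]], keywords: Set[str],
                     threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep fragments whose similarity meets the threshold, most similar first.

    fragments maps a fragment id to the set of words in its second label.
    """
    scored = [(fid, fragment_similarity(label, keywords)) for fid, label in fragments.items()]
    kept = [(fid, score) for fid, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```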
Step 104, ranking the target fragment text of each document by the similarity of the documents, and sending the ranked result to the query terminal so that the query terminal displays the target fragment text of each document in order of similarity.
Specifically, after the target fragment texts whose similarity to the query text meets the preset requirement are obtained in step 103, the target fragment texts and their corresponding documents are ranked by their similarity to the query text, and the ranked target fragment texts, together with the document corresponding to each, are sent to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity. In the embodiment of the application, the display terminal may display the target fragment text in several ways: for example, it may provide a path leading to the document corresponding to the target fragment text, or it may jump directly to the region of that document where the target fragment text is located and highlight the content of the target fragment text in the document.
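The ordering performed before sending can be sketched as a sort over result records. The record type `Hit`, its fields, and the choice to sort by document similarity first and fragment similarity second are assumptions; the source only says that fragments and documents are arranged by similarity value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical result record: one target fragment text and its document."""
    doc_id: str
    fragment_text: str
    doc_similarity: float
    fragment_similarity: float


def order_hits(hits: List[Hit]) -> List[Hit]:
    """Sort by document similarity, then by fragment similarity, both descending."""
    return sorted(
        hits,
        key=lambda h: (h.doc_similarity, h.fragment_similarity),
        reverse=True,
    )
```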
Specifically, for each document, a fragment text is the text formed by the content of one fragment in the document. When each fragment is a paragraph of the document, the content of each paragraph forms a fragment text; when each fragment is a section of the document, all the content of that section is regarded as one fragment text, no matter how many paragraphs of text or images the section contains. If a fragment contains non-text information such as images or audio, that information may be deleted or converted into text by specific means. The subheading of a fragment text, the keywords in the subheading, the body-text keywords of the fragment text, the first associated words correlated with the subheading keywords, and the second associated words correlated with the body-text keywords may all serve as second labels of the fragment text. A second label may also consist of at least two words and the correspondence between them. Each fragment text may have at least one second label.
In a possible implementation, fig. 2 is a flowchart illustrating a method for determining a first tag of a document according to an embodiment of the present application, where as shown in fig. 2, the first tag of each document stored in the search library is obtained by:
Step 201, for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text.
Specifically, before the document is fragmented according to the preset segmentation requirement, the method further includes: obtaining each document in a database, determining the type of each document, and, if the document is of a text type, storing it in the search library. The documents in the database may be uploaded or acquired from public channels, or collected by means such as big data statistics. If the document is of a non-text type, it may be converted into a text-type document by a text conversion plug-in and then stored in the database.
The number of documents in the search library is limited. Each document in the search library is fragmented so as to divide it into at least one fragmented text. The preset segmentation requirement can be adjusted to the actual situation: for example, the text of each paragraph in the document may be used as a fragment text, or the text formed by the content of each section in the document may be used as a fragment text. In the embodiment of the present application, a document determined to be of the text type may also be stored in a preset area of the database, and an access interface to that preset area is provided to the search library, so that the user can access the target document in the database through the interface in the search library.
Step 202, for each document in the search library, integrating the second label of each fragment text in the document and the third label corresponding to the headline of the document into the first label of the document.
Specifically, after the fragmented texts are obtained in step 201, each fragmented text is marked with at least one second tag and the headline of the document is marked with at least one third tag, and the document's second tags and third tags together serve as the first tag of the document.
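Integrating the tags can be pictured as a set union. In this sketch the first label is represented as a plain set of words, which is an assumption (the source also mentions per-word weights elsewhere), and the function name `build_first_label` is hypothetical.

```python
from typing import Dict, Set


def build_first_label(second_labels: Dict[str, Set[str]], third_label: Set[str]) -> Set[str]:
    """Merge the second labels of all fragment texts with the headline's third label.

    second_labels maps a fragment id to the words in that fragment's second label.
    """
    first_label = set(third_label)
    for label in second_labels.values():
        first_label |= label
    return first_label
```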
In one possible embodiment, the second label of the fragmented text is determined by: determining at least one keyword of the fragment text based on content information of the fragment text; determining associated words having correlation with the keywords according to the similarity between the keywords and each candidate word in an associated word bank; and determining the keywords and the associated words correlated with the keywords as second labels of the fragment text.
Specifically, the candidate words are words whose semantics are similar or identical to those of a keyword in the fragmented text, such as near-synonyms, synonyms, paraphrases, and abbreviations. The content information is all the content constituting the fragmented text, including but not limited to: subheading, headline, text content, and image and audio content. At least one keyword of the fragment text is determined from its content information; the keywords include the headline, the subheading, body-text words, and summary words of the fragmented text. For each keyword determined for the fragment text, the similarity between the keyword and each candidate word in the associated word bank is determined, and the candidate words whose similarity meets a preset association threshold are determined as the associated words of that keyword. Each keyword of the fragment text and the associated words correlated with it are determined as a second label of the fragment text.
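The threshold step can be sketched as below. The source does not say how similarity to candidate words is computed, so the default here uses character-level matching from Python's standard `difflib` purely as a stand-in; a semantic model could be passed in through the `similarity` parameter. The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Set


def char_similarity(a: str, b: str) -> float:
    """Stand-in similarity (assumption): character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def associated_words(keyword: str, candidates: Sequence[str], threshold: float = 0.8,
                     similarity: Callable[[str, str], float] = char_similarity) -> List[str]:
    """Candidate words whose similarity to the keyword meets the association threshold."""
    return [c for c in candidates if c != keyword and similarity(keyword, c) >= threshold]


def second_label(keywords: Sequence[str], candidates: Sequence[str], threshold: float = 0.8,
                 similarity: Callable[[str, str], float] = char_similarity) -> Set[str]:
    """Keywords of a fragment text plus the associated words of each keyword."""
    label = set(keywords)
    for keyword in keywords:
        label.update(associated_words(keyword, candidates, threshold, similarity))
    return label
```

Character-level matching will not find true synonyms or abbreviations, so in practice the `similarity` argument would be where a semantic model is plugged in.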
In a possible embodiment, fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragmented text includes: for each document in the search library, if the document includes subheadings, fragmenting the document by subheading to obtain at least one fragmented text; and if the document does not include subheadings, fragmenting the document by paragraph to obtain at least one fragmented text.
Specifically, in the embodiment of the present application, a document is divided by subheading, and the text formed by each subheading and the content under it is used as a fragment text; when the document has no subheadings or only one title, the document is divided by paragraph, and the text formed by the content of each paragraph is used as a fragment text. The embodiment of the application does not limit the way fragment texts are divided; for example, they may also be divided according to keywords, manner of description, or described content.
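The subheading-or-paragraph rule can be sketched as follows. How subheadings are recognized is not specified in the source, so the default predicate (a line starting with "## ") is purely an assumption, as are the function names; paragraphs are assumed to be separated by blank lines.

```python
import re
from typing import Callable, List


def default_is_subheading(line: str) -> bool:
    """Assumed convention: a subheading is a line that starts with '## '."""
    return line.startswith("## ")


def fragment_document(text: str,
                      is_subheading: Callable[[str], bool] = default_is_subheading) -> List[str]:
    """Split by subheading when any exist, otherwise by blank-line-separated paragraph."""
    lines = text.splitlines()
    if any(is_subheading(line) for line in lines):
        fragments: List[str] = []
        current: List[str] = []
        for line in lines:
            # A new subheading closes the fragment collected so far.
            if is_subheading(line) and current:
                fragments.append("\n".join(current).strip())
                current = []
            current.append(line)
        if current:
            fragments.append("\n".join(current).strip())
        return [f for f in fragments if f]
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
```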
In the method and apparatus of the present application, the query request that is sent by the query terminal and carries the query text is obtained, so that the content the user wants to search for at the query terminal is determined from the query text in the query request. The documents in the search library are ranked by similarity according to the keywords in the query text and the first label of each document stored in the search library, so that the documents are displayed on the display terminal in order of their similarity to the query text. For each document, the target fragment text whose similarity to the query text meets a preset requirement is determined according to the second label corresponding to each fragment in the document. The target fragment texts of the documents are then ranked together with the document similarities, and the ranked result is sent to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
In this way, the target fragment texts with high similarity to the query text are displayed on the display terminal, the search result is accurate down to a specific section or paragraph, and the target fragment texts and their corresponding documents are displayed in descending order of similarity.
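The selection of target fragment texts within one document can be sketched as follows. The scoring rule used here (the fraction of query keywords found in a fragment's second label), the threshold 0.5, and the name `select_target_fragments` are assumptions made for illustration; the present application only requires that the similarity meet a preset requirement.

```python
def select_target_fragments(query_keywords, fragments, threshold=0.5):
    """Pick the fragments of one document that match the query well enough.

    fragments: list of (fragment_text, second_label_words) pairs.
    Returns (fragment_text, score) pairs in descending order of score.
    """
    query = {kw.lower() for kw in query_keywords}
    if not query:
        return []
    results = []
    for text, label in fragments:
        label_words = {w.lower() for w in label}
        # Assumed score: share of query keywords present in the second label.
        score = len(query & label_words) / len(query)
        if score >= threshold:
            results.append((text, score))
    return sorted(results, key=lambda item: item[1], reverse=True)
```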
In one possible embodiment, step 102, ranking the documents in the search library by similarity according to the keywords in the query text and the first tag of each document stored in the search library, includes:
Step 1021, for each document stored in the search library, calculating the vocabulary similarity between each word in the first tag of the document and the keywords in the query text.
Specifically, the similarity may be calculated by comparing the words in the first tag with the words in the query text word by word, or by using a model to determine the semantic similarity between them; in either case the result is a similarity value between each word in the first tag and the words in the query text.
Step 1022, for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the vocabulary similarity calculated for each word in the first tag of the document and the weight of each word in the first tag of the document.
Specifically, the weight of a word may be preset, or may be determined by the tag group to which the word belongs. For example, words in the second tag may be given a different weight from words in the third tag; alternatively, the words of the document may be classified, with headline words given a first value as weight, subtitle words a second value, and body-text keywords a third value. After the similarity values between each word in the first tags of all documents in the search library and the keywords in the query text have been calculated in step 1021, the document similarity between the keywords in the query text and each document stored in the search library is calculated from those similarity values and the weights. Adjusting the weight of each word in the first tag yields different document similarity rankings.
Step 1023, ranking the documents in the search library by similarity according to the document similarity of each document.
Specifically, after the document similarity between each document in the search library and the query text has been calculated in step 1022, the documents are ranked. In the embodiment of the present application, a minimum similarity threshold may be set so that documents whose document similarity is below the threshold are not displayed, or so that words in the first tag whose vocabulary similarity is below the threshold are removed and do not take part in the document similarity calculation, thereby reducing the time the user spends browsing useless information or the amount of computation performed by the system.
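Steps 1021 to 1023 can be sketched as follows. The text does not give a formula for combining vocabulary similarities and weights, so this sketch assumes a weighted average in which each tag word contributes its best similarity against any query keyword. The exact-match `word_similarity`, the function names, and the sample weights are likewise hypothetical.

```python
def word_similarity(a, b):
    """Stand-in vocabulary similarity (assumption): 1.0 on exact match."""
    return 1.0 if a.lower() == b.lower() else 0.0


def document_similarity(query_keywords, first_tag, similarity=word_similarity):
    """first_tag: list of (word, weight) pairs for one document."""
    total_weight = sum(weight for _, weight in first_tag)
    if total_weight == 0 or not query_keywords:
        return 0.0
    score = 0.0
    for word, weight in first_tag:
        # Step 1021: best vocabulary similarity of this tag word.
        best = max(similarity(word, kw) for kw in query_keywords)
        # Step 1022: accumulate the weighted contribution.
        score += weight * best
    return score / total_weight


def rank_documents(query_keywords, library, min_similarity=0.0,
                   similarity=word_similarity):
    """Step 1023: rank documents, dropping those below the minimum threshold.

    library: dict mapping a document id to its first tag.
    """
    scored = [(doc_id, document_similarity(query_keywords, tag, similarity))
              for doc_id, tag in library.items()]
    kept = [(doc_id, s) for doc_id, s in scored if s >= min_similarity]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Giving headline words a larger weight than body-text keywords, as in the sample data of the test, moves documents whose headline matches the query toward the top of the ranking.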
In one possible embodiment, the method further comprises:
sending the keywords in the query text to the query terminal, so that the query terminal highlights the keywords of the query text that appear in the target fragment text of each document.
Specifically, at the query terminal, the keywords in the query text are displayed separately or highlighted, and the target fragment text found for those keywords is displayed. Words in the target fragment text whose similarity to the keywords of the query text exceeds a preset threshold are also highlighted.
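The highlighting performed at the query terminal can be sketched as follows. This covers only exact, case-insensitive keyword matches; highlighting of words that are merely similar to a keyword would need the similarity function and threshold, which are not shown. The function name `highlight` and the `<mark>` markers are assumptions.

```python
import re


def highlight(fragment_text, keywords, open_tag="<mark>", close_tag="</mark>"):
    """Wrap every case-insensitive occurrence of a keyword in markers."""
    # Longer keywords first, so that a longer keyword wins over a
    # shorter keyword that is a prefix of it.
    words = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not words:
        return fragment_text
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}",
                       fragment_text)
```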
Fig. 3 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application, corresponding to the information processing method in fig. 1, and as shown in fig. 3, the information processing apparatus includes: an obtaining module 301, a sorting module 302, a determining module 303, and a sending module 304.
An obtaining module 301, configured to obtain a query request sent by a query terminal; the query request carries query text.
A ranking module 302, configured to rank, according to the keyword in the query text and the first tag of each document stored in the search library, the documents in the search library according to the similarity.
The determining module 303 is configured to determine, for each document, a target fragment text whose similarity to the query text meets a preset requirement according to the second tag corresponding to each fragment in the document.
A sending module 304, configured to rank the target fragment text of each document and the similarity of the document, and send the target fragment text of each document and the similarity to the query terminal, so that the query terminal displays the target fragment text of each document according to the similarity ranking.
In one possible embodiment, the first tag of the document includes a second tag corresponding to each fragmented text in the document and a third tag corresponding to a headline of the document.
The second label comprises any one or more of the following words: a subtitle keyword and a text keyword of the fragment text, a first associated word having a correlation with the subtitle keyword, and a second associated word having a correlation with the text keyword.
The third label includes the following words: a headline keyword of the document and a third associated word having a correlation with the headline keyword.
In a possible embodiment, the first tag of each document stored in the search library, as used by the ranking module, is obtained by the following steps.
For each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragment text.
For each document in the search library, integrating the second label of each fragment text in the document and the third label corresponding to the headline of the document into the first label of the document.
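The integration step can be sketched as a merge of word lists. Treating labels as ordered lists of words and dropping repeated words are assumptions of this sketch, as is the name `build_first_label`; the present application does not say how duplicates across fragments are handled.

```python
def build_first_label(third_label, second_labels):
    """Merge the headline's third label and every fragment's second label
    into one first label, keeping first-seen order and dropping repeats."""
    first_label, seen = [], set()
    all_words = list(third_label) + [w for label in second_labels for w in label]
    for word in all_words:
        if word not in seen:
            seen.add(word)
            first_label.append(word)
    return first_label
```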
In a possible embodiment, the second label of the fragment text, as used by the determining module, is determined by the following steps.
Determining at least one keyword of the fragment text based on content information of the fragment text.
Determining the associated words having a correlation with the keywords according to the similarity between the keywords and each candidate word in the associated word library.
Determining the keywords and the associated words having a correlation with the keywords as the second label of the fragment text.
In a possible embodiment, fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragment text includes:
for each document in the search library, if the document includes subtitles, fragmenting the document by subtitle to obtain at least one fragment text; and if the document does not include subtitles, fragmenting the document by paragraph to obtain at least one fragment text.
In a possible embodiment, the ranking module, when ranking the documents in the search library by similarity according to the keywords in the query text and the first tag of each document stored in the search library, is specifically configured to:
for each document stored in the search library, calculate the vocabulary similarity between each word in the first tag of the document and the keywords in the query text;
for each document stored in the search library, calculate the document similarity between the keywords in the query text and the document according to the vocabulary similarity calculated for each word in the first tag of the document and the weight of each word in the first tag of the document; and
rank the documents in the search library by similarity according to the document similarity of each document.
In one possible embodiment, the apparatus further comprises:
a display unit, configured to send the keywords in the query text to the query terminal, so that the query terminal highlights the keywords of the query text that appear in the target fragment text of each document.
In the method and apparatus of the present application, the query request that is sent by the query terminal and carries the query text is obtained, so that the content the user wants to search for at the query terminal is determined from the query text in the query request. The documents in the search library are ranked by similarity according to the keywords in the query text and the first label of each document stored in the search library, so that the documents are displayed on the display terminal in order of their similarity to the query text. For each document, the target fragment text whose similarity to the query text meets a preset requirement is determined according to the second label corresponding to each fragment in the document. The target fragment texts of the documents are then ranked together with the document similarities, and the ranked result is sent to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
In this way, the target fragment texts with high similarity to the query text are displayed on the display terminal, the search result is accurate down to a specific section or paragraph, and the target fragment texts and their corresponding documents are displayed in descending order of similarity.
Corresponding to the method in fig. 1, a computer device 400 is further provided in the embodiments of the present application, and fig. 4 is a schematic structural diagram of a computer device provided in the embodiments of the present application, where the device includes a memory 401, a processor 402, and a computer program stored in the memory 401 and executable on the processor 402, where the processor 402 implements the information processing method when executing the computer program.
Specifically, the memory 401 and the processor 402 can be general-purpose memory and processor, which are not limited in particular, and when the processor 402 runs the computer program stored in the memory 401, the method for processing information can be executed, so as to solve the problem in the prior art that the search result is not accurate enough.
Corresponding to the information processing method in fig. 1, the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the information processing method.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the information processing method can be performed, which solves the problem in the prior art that search results are not accurate enough.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the portion of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, anyone skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the present disclosure and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of information processing, comprising:
acquiring a query request sent by a query terminal; the query request carries a query text;
according to the keywords in the query text and the first label of each document stored in a search library, carrying out similarity ranking on the documents in the search library;
for each document, determining a target fragment text with similarity meeting preset requirements with the query text according to a second label corresponding to each fragment in the document;
and ranking the target fragment text of each document together with the document similarity, and sending the ranked result to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
2. The method of claim 1, wherein the first tag of the document comprises a second tag corresponding to each fragmented text in the document and a third tag corresponding to a headline of the document;
the second label comprises any one or more of the following words: a subtitle keyword and a text keyword of the fragment text, a first associated word having a correlation with the subtitle keyword, and a second associated word having a correlation with the text keyword;
the third label includes the following words: a headline keyword of the document and a third associated word having a correlation with the headline keyword.
3. The method of claim 1, wherein the first tag for each document stored in the search library is obtained by:
for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text;
and integrating the second label of each fragment text in the document and the third label corresponding to the headline of the document into the first label of the document aiming at each document in the search library.
4. The method of claim 3, wherein the second label of the fragmented text is determined by:
determining at least one keyword of the fragment text based on content information of the fragment text;
determining associated words having a correlation with the keywords according to the similarity between the keywords and each candidate word in an associated word library;
and determining the keywords and associated words with correlation to the keywords as second labels of the fragment texts.
5. The method of claim 3, wherein fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragment text comprises:
for each document in the search library, if the document comprises subtitles, fragmenting the document by subtitle to obtain at least one fragment text; and if the document does not comprise subtitles, fragmenting the document by paragraph to obtain at least one fragment text.
6. The method of claim 1, wherein ranking the documents in the search library by similarity according to the keywords in the query text and the first tag of each document stored in the search library comprises:
for each document stored in the search library, calculating the vocabulary similarity of each vocabulary in the first label of the document and the keywords in the query text;
for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the calculated vocabulary similarity corresponding to each word in the first label of the document and the weight of each word in the first label of the document;
and according to the document similarity of each document, carrying out similarity ranking on the documents in the search library.
7. The method of claim 1, further comprising:
and sending the keywords in the query text to a query terminal so that the query terminal can highlight the keywords in the query text contained in the target fragment text of each document.
8. An information processing apparatus, comprising:
the acquisition module is used for acquiring the query request sent by the query terminal; the query request carries a query text;
the ranking module is used for ranking the similarity of the documents in the search library according to the keywords in the query text and the first label of each document stored in the search library;
the determining module is used for determining a target fragment text with similarity meeting preset requirements with the query text according to the second label corresponding to each fragment in each document;
and the sending module is used for sequencing the target fragment text of each document and the similarity of the documents and sending the sequence to the query terminal so that the query terminal displays the target fragment text of each document according to the similarity sequence.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of the preceding claims 1-7 are implemented by the processor when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of the preceding claims 1 to 7.
CN202111143741.1A 2021-09-28 2021-09-28 Information processing method, device, equipment and medium Active CN113806491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143741.1A CN113806491B (en) 2021-09-28 2021-09-28 Information processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113806491A true CN113806491A (en) 2021-12-17
CN113806491B CN113806491B (en) 2024-06-25

Family

ID=78938891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143741.1A Active CN113806491B (en) 2021-09-28 2021-09-28 Information processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113806491B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357511A (en) * 2021-12-30 2022-04-15 北京鼎普科技股份有限公司 Method and device for marking key content of document and user terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678412A (en) * 2012-09-21 2014-03-26 北京大学 Document retrieval method and device
CN108038096A (en) * 2017-11-10 2018-05-15 平安科技(深圳)有限公司 Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing
US20190347281A1 (en) * 2016-11-11 2019-11-14 Dennemeyer Octimine Gmbh Apparatus and method for semantic search

Also Published As

Publication number Publication date
CN113806491B (en) 2024-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant