CN109147934B

CN109147934B - Inquiry data recommendation method, device, computer equipment and storage medium

Info

Publication number: CN109147934B
Application number: CN201810724291.7A
Authority: CN
Inventors: 高羽; 柳恭; 葛培明; 孙行智
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-07-04
Filing date: 2018-07-04
Publication date: 2023-04-11
Anticipated expiration: 2038-07-04
Also published as: CN109147934A; WO2020007028A1

Abstract

The application relates to an inquiry data recommendation method, an inquiry data recommendation device, computer equipment and a storage medium. The method comprises the following steps: obtaining a current question to be answered and performing word segmentation, extracting feature words according to the word segmentation result, and obtaining a first feature word set corresponding to the current question to be answered; acquiring a second feature word set corresponding to each index node in a pre-established index; calculating, as a first similarity, the cosine similarity between the first feature word set and each second feature word set, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set; acquiring the question-answer pairs corresponding to the target index nodes from an inquiry database; and calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending inquiry data according to the selected target question-answer pair.

Description

Inquiry data recommendation method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of online inquiry technology, and in particular, to an inquiry data recommendation method, apparatus, computer device, and storage medium.

Background

With the rapid development of internet technology, online inquiry and online health consultation based on the internet are favored by more and more people. In both online inquiry and online health consultation, each user expects the most prompt response from the physician after asking the question.

In the traditional technology, after a doctor sees a question of a user, the doctor needs to think, organize the language, write and answer and finally click and send, and the user can see a reply to the question, so that the inquiry efficiency is low.

Disclosure of Invention

In view of the above, it is necessary to provide an inquiry data recommendation method, apparatus, computer device and storage medium capable of improving inquiry efficiency.

An inquiry data recommendation method comprising:

obtaining a current question to be answered, performing word segmentation on the current question to be answered, extracting feature words according to the word segmentation result, and obtaining a first feature word set corresponding to the current question to be answered;

acquiring a second feature word set corresponding to each index node in a pre-established index;

respectively calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set;

obtaining question-answer pairs corresponding to each target index node in the target index node set from a question-answer database;

and respectively calculating second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending the inquiry data according to the selected target question-answer pair.

In one embodiment, before the step of obtaining the current question to be answered, the method further comprises:

acquiring an inquiry information set corresponding to each historical inquiry, and preprocessing the inquiry information set;

extracting question-answer pairs from the preprocessed inquiry information set, and performing feature extraction on the extracted question-answer pairs;

storing the question-answer pairs and the features corresponding to the question-answer pairs correspondingly into an inquiry database;

and establishing an index for the inquiry database according to the features.

In one embodiment, the extracting of question-answer pairs from the preprocessed inquiry information set includes:

acquiring a user identifier corresponding to each piece of inquiry information in the inquiry information set, wherein the user identifier is an inquiry user identifier or a doctor user identifier;

filtering inquiry information corresponding to the doctor user identification according to a preset rule;

and extracting question-answer pairs according to punctuation marks and question words for the filtered inquiry information set.

In one embodiment, the feature extraction of the extracted question-answer pairs includes:

performing word segmentation on the extracted questions in the question-answer pairs to obtain a word set corresponding to the questions;

and respectively matching each word in the word set with each word in a pre-established feature word library, and when the matching is successful, taking the word as the extracted feature.

In one embodiment, the step of respectively calculating a first similarity between a first feature word set corresponding to the current question to be answered and a second feature word set corresponding to each index node includes:

calculating a feature weight of each feature word in the first feature word set to obtain a first calculation result, and selecting a keyword according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered;

calculating the feature weight of each feature word in a second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node;

obtaining a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set;

and respectively calculating the cosine values of included angles between the first word frequency vectors and the second word frequency vectors to obtain first similarity.

In one embodiment, the calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes:

calculating the initial feature weight of each feature word in the first feature word set by a term frequency-inverse document frequency (TF-IDF) algorithm;

when any one feature word in the first feature word set meets a preset adjusting rule, adjusting the initial feature weight of the feature word according to the preset adjusting rule to obtain a final feature weight;

and when any one feature word in the first feature word set does not meet a preset adjusting rule, taking the initial feature weight as a final feature weight.
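The weighting described above can be sketched as follows. This is only an illustrative sketch: the smoothed IDF variant, the function names `feature_weights` and `select_keywords`, and the `adjust` callback standing in for the "preset adjusting rule" are all assumptions, since the text does not specify the rule or the exact TF-IDF formula.

```python
import math
from collections import Counter


def feature_weights(feature_words, corpus, adjust=None):
    """Return {word: weight} for one feature word set.

    corpus is a list of feature word sets (one per stored question).
    adjust is a hypothetical callback (word, weight) -> weight that stands in
    for the preset adjusting rule; it should return the weight unchanged for
    words the rule does not cover.
    """
    counts = Counter(feature_words)
    total = sum(counts.values())
    weights = {}
    for word, count in counts.items():
        tf = count / total
        document_frequency = sum(1 for document in corpus if word in document)
        # Smoothed IDF (an assumed variant) so the weight is always positive.
        idf = math.log((1 + len(corpus)) / (1 + document_frequency)) + 1.0
        weight = tf * idf
        if adjust is not None:
            weight = adjust(word, weight)
        weights[word] = weight
    return weights


def select_keywords(weights, count):
    """Pick the `count` feature words with the largest final weight."""
    return sorted(weights, key=weights.get, reverse=True)[:count]
```

A rarer feature word receives a larger initial weight than a common one, and the callback can then raise or lower individual weights before keywords are selected.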

An inquiry data recommendation apparatus, the apparatus comprising:

the first feature word set acquisition module is used for acquiring a current question to be answered, segmenting the current question to be answered, extracting feature words according to segmentation results and obtaining a first feature word set corresponding to the current question to be answered;

the second feature word set acquisition module is used for acquiring a second feature word set corresponding to each index node in the pre-established index;

a target index node set obtaining module, configured to calculate first similarities between a first feature word set corresponding to the current question to be answered and second feature word sets corresponding to the index nodes, respectively, and sort the index nodes according to a first similarity calculation result to select a preset number of index nodes as target index nodes, so as to obtain a target index node set;

the question-answer pair acquisition module is used for acquiring question-answer pairs corresponding to each target index node in the target index node set from the inquiry database;

and the recommending module is used for respectively calculating second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting each question-answer pair according to the second similarity calculation result to select a target question-answer pair, and recommending the inquiry data according to the selected target question-answer pair.

In one embodiment, the apparatus further comprises:

the preprocessing module is used for acquiring an inquiry information set corresponding to each historical inquiry and preprocessing the inquiry information set;

the feature extraction module is used for extracting question-answer pairs from the preprocessed inquiry information set and performing feature extraction on the extracted question-answer pairs;

the storage module is used for storing the question-answer pairs and the features corresponding to the question-answer pairs correspondingly into an inquiry database;

and the index establishing module is used for establishing an index for the inquiry database according to the features.

A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above-mentioned inquiry data recommendation method when executing the computer program.

A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned inquiry data recommendation method.

According to the method, the device, the computer equipment and the storage medium for recommending the inquiry data, firstly, a feature word sequence set corresponding to a question to be answered is obtained, then, first similarity between the feature word sequence set of the question to be answered and a feature word sequence set of each node in an index is calculated, some nodes with the maximum similarity are selected as target nodes, then, question-answer pairs corresponding to the nodes are searched, second similarity between the question to be answered and the question in the question-answer pairs is calculated, some question-answer pairs with the maximum similarity are selected as the target question-answer pairs, and inquiry data are recommended according to the question-answer pairs.

Drawings

FIG. 1 is a diagram illustrating an exemplary scenario for a method for recommending inquiry data;

FIG. 2 is a schematic flow chart of a method for recommending inquiry data in one embodiment;

FIG. 3 is a flowchart illustrating the steps performed before step S202 according to an embodiment;

FIG. 4 is a flowchart illustrating a step S304 according to an embodiment;

FIG. 5 is a flowchart illustrating step S206 according to an embodiment;

FIG. 6 is a flowchart illustrating a step S502 according to an embodiment;

FIG. 7 is a block diagram showing the construction of an apparatus for recommending inquiry data in one embodiment;

FIG. 8 is a block diagram showing the construction of an apparatus for recommending inquiry data in another embodiment;

FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The inquiry data recommendation method provided by the application can be applied to the application environment shown in fig. 1. The inquiry terminal 102 and the doctor terminal 104 each communicate with the server 106 via a network. After receiving the current question to be answered sent by the inquiry terminal 102, the server 106 performs word segmentation on the current question to be answered, extracts feature words according to the word segmentation result, and obtains a first feature word set corresponding to the current question to be answered. The server 106 then obtains a second feature word set corresponding to each index node in a pre-established index, calculates a first similarity between the first feature word set and the second feature word set of each index node, and sorts the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, obtaining a target index node set. Next, the server 106 looks up the question-answer pair corresponding to each target index node in the inquiry database, calculates a second similarity between the current question to be answered and the question of each question-answer pair, sorts the question-answer pairs according to the second similarity calculation result to select target question-answer pairs, and recommends inquiry data to the doctor terminal 104 according to the selected target question-answer pairs. The inquiry data may be the entire target question-answer pair or only the answer information in the target question-answer pair.

The inquiry terminal 102 and the doctor terminal 104 may be, but not limited to, various personal computers, notebook computers, smart phones, and tablet computers, and the server 106 may be implemented by an independent server or a server cluster formed by a plurality of servers.

In one embodiment, as shown in fig. 2, there is provided an inquiry data recommendation method, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:

step S202, obtaining a current question to be answered, performing word segmentation on the current question to be answered, extracting feature words according to word segmentation results, and obtaining a first feature word set corresponding to the current question to be answered.

Specifically, the question to be answered refers to the inquiry question input by the inquiry user at the inquiry terminal. When an inquiry user inputs an inquiry question at the inquiry terminal, the server receives the inquiry question sent by the inquiry terminal and performs word segmentation on it to obtain a word segmentation result, where the word segmentation result is the word sequence formed by the words obtained after segmentation. For example, segmenting the question "My belly hurts, what should I do" may yield the word segmentation result: I / belly pain / what to do.

To segment the current question to be answered, the question may first be split into complete sentences according to punctuation marks, and each resulting sentence is then segmented into words. For example, each sentence may be segmented by a string-matching word segmentation method, such as: the forward maximum matching method, which segments the character string of a sentence from left to right; the reverse maximum matching method, which segments the character string of a sentence from right to left; the shortest path word segmentation method, which requires that the number of words cut out of the character string of a sentence be minimal; or the bidirectional maximum matching method, which performs word segmentation matching in the forward and reverse directions simultaneously. Each sentence may also be segmented by a semantic word segmentation method, that is, a method in which the machine judges word boundaries using syntactic information and semantic information in order to handle ambiguity. Each sentence may also be segmented by a statistical word segmentation method: according to phrase statistics from the historical search records of the current user or of users in general, two adjacent words that frequently occur together are treated as one phrase during segmentation.
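As an illustration of the string-matching approach, the following minimal sketch implements forward and reverse maximum matching over a character string. The `vocabulary` set, the `max_word_len` limit and the single-character fallback are assumptions made for the example, not details given in the text.

```python
def forward_max_match(sentence, vocabulary, max_word_len=4):
    """Segment left to right, always taking the longest dictionary word."""
    words = []
    start = 0
    while start < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_word_len, len(sentence) - start), 0, -1):
            candidate = sentence[start:start + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                start += size
                break
    return words


def reverse_max_match(sentence, vocabulary, max_word_len=4):
    """Segment right to left, always taking the longest dictionary word."""
    words = []
    end = len(sentence)
    while end > 0:
        for size in range(min(max_word_len, end), 0, -1):
            candidate = sentence[end - size:end]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words
```

The two directions can disagree on the same input: with the toy vocabulary `{"ab", "abc", "cd"}`, the string `"abcd"` is cut as `abc / d` going forward and as `ab / cd` going in reverse, which is why a bidirectional method compares both results.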

Further, the server extracts the feature words according to the word segmentation result. In one embodiment, each word in the word segmentation result is matched one by one against each word in a pre-established feature vocabulary library, and each matched word is taken as a feature word. In one embodiment, two words match when they are identical. In another embodiment, two words match when the similarity between them exceeds a preset threshold; for example, two near-synonymous expressions such as "belly pain" and "belly ache" match each other. The feature vocabulary library may be built from authoritative explanations of various diseases obtained from an existing medical database, including professional information such as the overview, symptoms, complications, therapeutic drugs and routine examinations of each disease. It may also be built from medical information on various drugs, such as the types of diseases a drug is mainly used to treat. It may further be built from specific types of information (for example, treatment plans, therapeutic drugs, the department to which the medical data belongs, and the clinical manifestations of different diseases) collected in real time or periodically, by tools such as web crawlers, from open medical data sources on the internet (for example, questions, answers and discussions about different diseases on forums, or newly published medical cases and medical question-and-answer texts).
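The matching step might look like the sketch below. The text does not say how word-to-word similarity is measured, so the use of `difflib.SequenceMatcher` here is an assumption, as are the function name `extract_feature_words` and the default threshold of 1.0 (exact matches only).

```python
from difflib import SequenceMatcher


def extract_feature_words(words, feature_vocabulary, threshold=1.0):
    """Keep each segmented word that matches an entry of the vocabulary.

    A word matches when it is identical to an entry, or when its similarity
    ratio with an entry reaches `threshold` (a value below 1.0 enables fuzzy
    matching).
    """
    features = []
    for word in words:
        for entry in feature_vocabulary:
            if word == entry or SequenceMatcher(None, word, entry).ratio() >= threshold:
                features.append(word)
                break  # one matching entry is enough
    return features
```

For example, `extract_feature_words(["I", "belly pain", "what to do"], {"belly pain", "fever"})` keeps only `"belly pain"`, while lowering the threshold lets close variants of a vocabulary entry through as well.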

Step S204, acquiring a second feature word set corresponding to each index node in the pre-established index.

Specifically, question-answer pairs are extracted in advance from historical inquiry data, and feature extraction is then performed on them. The extracted features include at least the feature words corresponding to the question in each question-answer pair, and these feature words form a second feature word set. Each question-answer pair and its corresponding features are stored in the same row of a data table of the inquiry database, and an index is finally established on the inquiry database according to the column in which the features are located. Each index node in the index includes an index value and a pointer: the index value includes at least the second feature word set corresponding to a question-answer pair, and the pointer refers to a memory area that records a reference to the data of the corresponding row stored on the hard disk. A question-answer pair is an information pair formed by a question from the inquiry user and an answer from the doctor. It may consist of one question from the inquiry user and one answer from the doctor, one question and several answers, several consecutive questions and one answer, or several consecutive questions and several consecutive answers.
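The relationship between an index node and its table row can be pictured with a small in-memory sketch. Here a Python list stands in for the data table and an integer row position stands in for the pointer; the names `IndexNode`, `build_index` and `fetch_question_answer_pair` and the dictionary keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexNode:
    feature_words: frozenset  # index value: the second feature word set
    row_pointer: int          # stand-in for the pointer to the table row


def build_index(table):
    """Create one index node per row of the (in-memory) inquiry table."""
    return [
        IndexNode(frozenset(row["features"]), row_id)
        for row_id, row in enumerate(table)
    ]


def fetch_question_answer_pair(table, node):
    """Follow the node's pointer back to the row and return its pair."""
    row = table[node.row_pointer]
    return row["question"], row["answer"]
```

Only the feature word sets held in the nodes need to be compared with the incoming question; the full row is read through the pointer only for nodes that are selected.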

In this embodiment, the server sequentially traverses each index node in the index, and retrieves an index value of the index node to obtain a second feature word set corresponding to each index node.

Step S206, respectively calculating first similarity between a first feature word set corresponding to the current question to be answered and a second feature word set corresponding to each index node, and sequencing the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes to obtain a target index node set.

Specifically, the first similarity represents the degree of similarity between the first feature word set and the second feature word set. In one embodiment, the first similarity may be a cosine similarity. To calculate the cosine similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to any index node, keywords may first be extracted from the two sets respectively, obtaining a first keyword set corresponding to the question to be answered and a second keyword set corresponding to the index node; the word frequency vectors of the first keyword set and the second keyword set are then calculated; finally, the cosine of the angle between the two word frequency vectors is calculated to obtain the cosine similarity.
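A minimal sketch of the cosine calculation follows, assuming the keywords have already been selected and are passed in as lists (so repeated keywords contribute to the word frequency). The function name and the choice to return 0.0 for an empty vector are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(first_keywords, second_keywords):
    """Cosine of the angle between two word frequency vectors."""
    first_counts = Counter(first_keywords)
    second_counts = Counter(second_keywords)
    # Both vectors are laid out over the union of the two keyword sets.
    vocabulary = sorted(set(first_counts) | set(second_counts))
    first_vector = [first_counts[word] for word in vocabulary]
    second_vector = [second_counts[word] for word in vocabulary]
    dot = sum(a * b for a, b in zip(first_vector, second_vector))
    norm = math.sqrt(sum(a * a for a in first_vector)) * math.sqrt(
        sum(b * b for b in second_vector)
    )
    return dot / norm if norm else 0.0
```

Identical keyword lists give a similarity of 1, and lists with no keyword in common give 0.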

Further, the server sorts the index nodes of the index according to the cosine similarity, and selects a preset number of index nodes as target index nodes according to the sorting result to obtain a target index node set. In an embodiment, the server may arrange the index nodes in descending order of cosine similarity and select the top N1 index nodes as the target index nodes, where N1 is a preset value that may be set and adjusted according to experience.
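The top-N1 selection can be sketched with the standard library's `heapq.nlargest`, which avoids sorting the whole list when N1 is small. The `(node, similarity)` tuple layout and the function name are assumptions for the example.

```python
import heapq


def select_top_nodes(scored_nodes, n1):
    """Return the n1 nodes with the highest similarity, best first.

    scored_nodes is an iterable of (node, similarity) tuples.
    """
    best = heapq.nlargest(n1, scored_nodes, key=lambda pair: pair[1])
    return [node for node, _ in best]
```

The same helper applies unchanged to the later selection of the top N2 question-answer pairs by string similarity.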

Step S208, obtaining question-answer pairs corresponding to each target index node in the target index node set from the inquiry database.

Specifically, each index node in the index stores a pointer to the corresponding row of the table in the inquiry database. The data of the row corresponding to the index node can be obtained through the pointer, and the question-answer pair is the data of one column in that row, so the question-answer pair corresponding to an index node can be obtained through the index node.

Step S210, respectively calculating second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting each question-answer pair according to the second similarity calculation result to select a target question-answer pair, and recommending the inquiry data according to the selected target question-answer pair.

Specifically, the second similarity represents the degree of similarity between the current question to be answered and the question corresponding to each question-answer pair. In one embodiment, the second similarity may be a string similarity, and calculating it may include the following steps. After the server acquires the question-answer pair corresponding to each target index node, it calculates the edit distance between the current question to be answered and the question in each obtained question-answer pair, where the edit distance is the minimum number of single-character edits (substitutions, insertions and deletions) required to change one character string into the other. The string similarity between the current question to be answered and the question in each obtained question-answer pair is then calculated from the edit distance by the formula: similarity = (max(x, y) - Levenshtein) / max(x, y), where x is the length of the character string corresponding to the question to be answered, y is the length of the character string corresponding to the question in the question-answer pair, and Levenshtein is the edit distance.
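The formula above can be sketched as follows, using the standard dynamic-programming computation of the Levenshtein distance. The function names are illustrative, and treating two empty strings as fully similar is an assumption, since the formula is undefined when both lengths are zero.

```python
def levenshtein(source, target):
    """Minimum number of single-character edits turning source into target."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            cost = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def string_similarity(question, candidate):
    """similarity = (max(x, y) - Levenshtein) / max(x, y)."""
    longest = max(len(question), len(candidate))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(question, candidate)) / longest
```

For instance, "kitten" and "sitting" have an edit distance of 3 and a longest length of 7, so their similarity is 4/7.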

Further, the server sorts the question-answer pairs obtained in step S208 according to the string similarity, selects a preset number of question-answer pairs as target question-answer pairs according to the sorting result, and recommends inquiry data according to the target question-answer pairs. In an embodiment, the server may arrange the question-answer pairs obtained in step S208 in descending order of string similarity and select the top N2 question-answer pairs as the target question-answer pairs, where N2 is a preset value that may be adjusted according to experience.

In an embodiment, the server may recommend all the target question-answer pairs to the doctor terminal, may select any one of the target question-answer pairs to recommend to the doctor terminal, or may recommend the first-ranked question-answer pair to the doctor terminal. The specific manner of recommendation is not limited in this application.

In another embodiment, the server may directly select the answers in the target question-answer pairs for recommendation to the doctor terminal: it may recommend the answers in all the target question-answer pairs, the answer in any one question-answer pair, or the answer in the first-ranked question-answer pair. The specific manner of recommendation is not limited in this application.

In the method for recommending the inquiry data, a server firstly obtains a feature word set corresponding to a question to be answered, then calculates first similarity between the feature word set of the question to be answered and feature word sets of index nodes in an index, selects some nodes with the maximum similarity as target nodes, then searches question-answer pairs corresponding to the nodes, calculates second similarity between the question to be answered and questions in the question-answer pairs, selects some question-answer pairs with the maximum character string similarity as target question-answer pairs, and recommends the inquiry data according to the question-answer pairs.

In one embodiment, as shown in fig. 3, step S202 is preceded by:

step S302, acquiring an inquiry information set corresponding to each historical inquiry, and preprocessing the inquiry information set.

Specifically, a historical inquiry refers to each inquiry completed before the current time, and the inquiry information set refers to the set of information in one complete inquiry, consisting of the inquiry information of the inquiry user and the reply information of the doctor user.

In the present embodiment, the preprocessing includes sentence segmentation, coreference resolution, context processing, and the like. Sentence segmentation refers to splitting one piece of information into single sentences. Coreference resolution refers to determining what the pronouns in a sentence refer to, which can be computed through syntactic analysis and edit distance. Context processing refers to completing a sentence from its context. For example, given D: "Are you dizzy?" and U: "Yes", the reply is expanded to "Yes, I am dizzy", so that the meaning expressed by the second sentence is more complete. Context processing uses syntactic analysis and sentence-type judgment.
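Of the preprocessing steps, sentence segmentation is the simplest to illustrate. The sketch below splits a message at sentence-ending punctuation; the particular punctuation set (Western and Chinese full stops, question marks and exclamation marks) and the function name are assumptions, and coreference resolution and context completion are not attempted here.

```python
import re

# A run of non-terminator characters followed by its terminators, if any.
SENTENCE_PATTERN = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(message):
    """Split one piece of inquiry information into single sentences."""
    return [
        sentence.strip()
        for sentence in SENTENCE_PATTERN.findall(message)
        if sentence.strip()
    ]
```

Calling `split_sentences("Are you dizzy? Yes.")` returns the two sentences separately, each keeping its own closing punctuation so that later steps can still recognise questions.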

And step S304, extracting question-answer pairs from the preprocessed inquiry information set, and performing feature extraction on the extracted question-answer pairs.

Specifically, in one complete inquiry the inquiry user usually raises questions several times, and the doctor replies after each question. The question raised by the inquiry user each time and the doctor's reply to that question form a question-answer pair. Extracting question-answer pairs means extracting these pairs from the inquiry information corresponding to one complete inquiry.

Further, the server performs feature extraction on the extracted question-answer pairs. In one embodiment, the feature extraction may be extracting keywords for questions in question-answer pairs. In another embodiment, the extracted features may be, for example, the number of sentences in the question-answer pair, the number of adjectives, the interrogative words, and so on.

Step S306, storing the question-answer pairs and the features corresponding to the question-answer pairs correspondingly into the inquiry database.

Specifically, the server correspondingly stores the characteristics corresponding to the question-answer pairs and the question-answer pairs in the inquiry database, namely, the characteristics corresponding to the question-answer pairs and the question-answer pairs are stored as different columns in the same row of a table in the database.

In one embodiment, the inquiry user communicates with the doctor through an instant message during inquiry, and the message carries respective user identifiers of both communication parties, including an inquiry user identifier and a doctor user identifier, specifically, the information sent by the inquiry terminal carries the inquiry user identifier, and the information sent by the doctor terminal carries the doctor user identifier.

And step S308, establishing an index for the inquiry database according to the characteristics.

Specifically, the server establishes an index according to the column of the inquiry database in which the features are stored. Each node in the index corresponds to one row of data in the inquiry database, and that row includes at least a question-answer pair and the features corresponding to the question-answer pair.

In one embodiment, the server may also build the index based on both the user identifier and the features.
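Steps S306 and S308 can be sketched with an in-memory SQLite database. The table name `inquiry`, the column names, the index name and the choice to serialise the feature words as JSON text are all assumptions for illustration; the text only requires that a pair and its features share a row and that the index is built on the feature column.

```python
import json
import sqlite3


def build_inquiry_database(question_answer_rows):
    """Store (question, answer, features) rows and index the feature column."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE inquiry ("
        "id INTEGER PRIMARY KEY, question TEXT, answer TEXT, features TEXT)"
    )
    # The pair and its features are written as columns of the same row.
    connection.executemany(
        "INSERT INTO inquiry (question, answer, features) VALUES (?, ?, ?)",
        [
            (question, answer, json.dumps(sorted(features)))
            for question, answer, features in question_answer_rows
        ],
    )
    connection.execute("CREATE INDEX idx_inquiry_features ON inquiry (features)")
    connection.commit()
    return connection
```

A database B-tree index of this kind speeds up lookups on the feature column; the similarity ranking described in the text would still be computed by the server over the indexed feature values.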

In the embodiment, by extracting features of the inquiry information and establishing the index, when the similarity between the question to be answered and each question-answer pair is calculated, the whole database does not need to be traversed, and the calculation is only carried out according to the question to be answered and the index value, so that the calculation efficiency is remarkably improved.

In one embodiment, as shown in fig. 4, the extracting of question-answer pairs from the preprocessed inquiry information set includes:

step S304A, a user identification corresponding to each piece of inquiry information in the inquiry information set is obtained, and the user identification is an inquiry user identification or a doctor user identification.

Specifically, each piece of inquiry information corresponds to a user identifier, the corresponding user identifier of the message sent by the inquiry terminal is the inquiry user identifier, and the corresponding user identifier of the message sent by the doctor terminal is the doctor user identifier.

Step S304B, filtering the inquiry information corresponding to the doctor user identifier according to a preset rule.

Specifically, the preset rule includes at least: filtering out messages that end with a question word, and filtering out messages that match a preset script. The question words may be, for example, "what should I do", "what is going on", "why", and the like. Preset scripts are statements preset at the doctor terminal to save reply time, such as "please wait a moment", "OK, I am not on duty at present", and the like.
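A minimal sketch of this filter is shown below. The `(role, text)` message layout, the role labels `"patient"` and `"doctor"`, and the English question words and preset scripts are assumptions chosen to mirror the examples in the text.

```python
QUESTION_WORDS = ("what should i do", "what is going on", "why")
PRESET_SCRIPTS = {"please wait a moment", "ok, i am not on duty at present"}
TRAILING_PUNCTUATION = "?？.。!！ "


def filter_doctor_messages(messages):
    """Drop doctor messages that end with a question word or are preset scripts.

    messages is a list of (role, text) tuples; patient messages are kept as is.
    """
    kept = []
    for role, text in messages:
        if role == "doctor":
            normalized = text.strip().rstrip(TRAILING_PUNCTUATION).lower()
            if normalized.endswith(QUESTION_WORDS) or normalized in PRESET_SCRIPTS:
                continue  # not a usable answer
        kept.append((role, text))
    return kept
```

Removing these messages first means that the pairing step only sees doctor messages that can plausibly serve as answers.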

Step S308C, extracting question-answer pairs from the filtered inquiry information set according to punctuation marks and question words.

Specifically, the filtered inquiry information is traversed starting from the first piece, and the user identification corresponding to each piece is obtained in turn. When the user identification of a piece of inquiry information is the inquiry user identification, it is determined whether that piece contains a question sentence. If so, the question sentence is taken as a question of a question-answer pair. Then, starting from the first piece of inquiry information below the question that corresponds to the doctor user identification, all consecutive pieces corresponding to the doctor user identification are collected until the next piece corresponding to the inquiry user identification appears, and the collected pieces are taken as the answer to the question sentence, forming the question-answer pair. An extracted question-answer pair may include one question and one answer, several consecutive questions, or several consecutive answers; the specific combination depends on the actual inquiry, and the application is not limited in this respect.
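The traversal described above can be sketched as follows. The regular expression used to detect a question sentence, the `(role, text)` message layout, and joining consecutive messages with a space are assumptions for illustration; the application only states that punctuation marks and question words are used.

```python
import re

# Assumed heuristic: a question mark or a common question word marks a question.
QUESTION_PATTERN = re.compile(r"[?？]|\b(why|how|what)\b", re.IGNORECASE)


def is_question(text):
    return bool(QUESTION_PATTERN.search(text))


def extract_qa_pairs(messages):
    """Pair patient questions with the run of doctor messages that follows them."""
    pairs, questions, answers = [], [], []
    for role, text in messages:
        if role == "patient":
            if answers:
                # A patient message ends the current run of doctor answers.
                pairs.append((" ".join(questions), " ".join(answers)))
                questions, answers = [], []
            if is_question(text):
                questions.append(text)
        elif role == "doctor" and questions:
            answers.append(text)
    if questions and answers:
        pairs.append((" ".join(questions), " ".join(answers)))
    return pairs


messages = [
    ("patient", "I have a cough."),
    ("patient", "Why does it get worse at night?"),
    ("doctor", "Answer A1."),
    ("doctor", "Answer A2."),
    ("patient", "Thanks."),
    ("patient", "What should I do?"),
    ("doctor", "Answer B."),
]
pairs = extract_qa_pairs(messages)
```

Here `pairs` holds two entries: the first question paired with both consecutive doctor messages, and the second question paired with the single answer that follows it.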

In one embodiment, the feature extraction is performed on the extracted question-answer pairs, and comprises the following steps: performing word segmentation on the questions in the extracted question-answer pairs to obtain a word set corresponding to the questions; and respectively matching each word in the word set with each word in a pre-established feature word library, and when the matching is successful, taking the word as the extracted feature.

Specifically, the server may first perform word segmentation on the questions in the extracted question-answer pairs to obtain a word set corresponding to the questions. To do so, a question may first be split into complete sentences according to punctuation marks, and each sentence is then segmented into words. For example, each sentence may be segmented with a string-matching segmentation method: the forward maximum matching method, which segments the character string of a sentence from left to right; the reverse maximum matching method, which segments the character string from right to left; the shortest path method, which minimizes the number of words cut out of the character string; or the bidirectional maximum matching method, which performs segmentation matching in the forward and reverse directions simultaneously. Each sentence may also be segmented with a semantic segmentation method, a segmentation method based on machine judgment that uses syntactic and semantic information to resolve ambiguity. Each sentence may also be segmented with a statistical segmentation method: phrase statistics are gathered from the historical search records of the current user or of users in general, and when two adjacent words co-occur with high frequency, they are treated as one phrase during segmentation.
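As an illustration of the string-matching family mentioned above, the following is a minimal sketch of forward maximum matching. The function name, the toy vocabulary, and the fallback of emitting a single character when nothing matches are assumptions; an unspaced English string stands in for unsegmented text.

```python
def forward_max_match(sentence, vocabulary, max_len=None):
    """Segment left to right, always taking the longest vocabulary word available."""
    if max_len is None:
        max_len = max((len(word) for word in vocabulary), default=1)
    tokens, i = [], 0
    while i < len(sentence):
        # Try the longest window first and shrink until a word matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


vocabulary = {"head", "headache", "cough"}
tokens = forward_max_match("headachecough", vocabulary)
```

The result is `["headache", "cough"]`: the longer entry "headache" is preferred over "head" because the longest window is tried first. The reverse variant would apply the same idea scanning from the right end.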

Further, each word in the word set obtained by word segmentation is matched one by one against the words in a pre-established feature word library, and the matched words are taken as feature words. In one embodiment, two words match when they are identical. In another embodiment, two words match when their similarity exceeds a preset threshold; for example, two differently worded expressions that both mean "belly pain" are treated as matching each other. The feature word library may be built from authoritative explanations of various diseases obtained from an existing medical database, including professional information such as the profile, symptoms, complications, therapeutic drugs, and routine examinations of each disease; from medical information on various drugs, such as the types of disease each drug mainly treats; or from specific types of information (for example, the treatment schemes, therapeutic drugs, departments, and clinical manifestations corresponding to different diseases) collected in real time or at regular intervals, with tools such as a web crawler, from open medical data sources on the internet (for example, questions, answers, and discussions about different diseases on major forums, or newly published medical cases and medical question-answer texts).
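The two matching modes can be sketched as below. The application does not say how word similarity is measured, so the character-level ratio from Python's standard `difflib.SequenceMatcher` is used purely as a stand-in, and the threshold of 0.6 and the sample words are likewise assumptions.

```python
from difflib import SequenceMatcher


def matches(word, lexicon_word, threshold=0.6):
    """Exact match, or similarity at or above the threshold (assumed measure)."""
    if word == lexicon_word:
        return True
    return SequenceMatcher(None, word, lexicon_word).ratio() >= threshold


def extract_features(words, lexicon, threshold=0.6):
    """Keep every segmented word that matches some entry of the feature word library."""
    return [
        word for word in words
        if any(matches(word, entry, threshold) for entry in lexicon)
    ]


lexicon = {"belly ache", "cough"}
features = extract_features(["my", "belly pain", "today", "cough"], lexicon)
```

With these inputs, "cough" matches exactly and "belly pain" matches "belly ache" through the similarity threshold, while "my" and "today" are discarded.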

In one embodiment, as shown in fig. 5, the step of calculating a first similarity between a first feature word set corresponding to the current question to be answered and a second feature word set corresponding to each index node respectively includes:

Step S502, calculating the feature weight of each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered.

Specifically, the feature weight represents the importance of a feature: the greater the feature weight, the more important the feature word and the better it represents the meaning of the word set. In one embodiment, the feature weight of each feature word may be calculated with a term frequency-inverse document frequency (TF-IDF) algorithm. In this embodiment, the first calculation result obtained after calculating the feature weights is the weight value corresponding to each feature word in the first feature word set. The feature words can be sorted by weight value, and keywords are then selected according to the sorting result to obtain the first keyword set.

In an embodiment, the server may perform descending order arrangement on each feature word in the first feature word set according to the feature weight, and then select a preset number of feature words ranked at the top as the key words, thereby obtaining the first key word set.

Step S504, calculating the feature weight of each feature word in the second feature word set to obtain a second calculation result, and selecting the keywords according to the second calculation result to obtain a second keyword set corresponding to each index node.

Specifically, a term frequency-inverse document frequency algorithm may be used to calculate the feature weight of each feature word in the second feature word set to obtain the second calculation result, which is the feature weight of each feature word in the second feature word set. The feature words may be sorted by weight, and keywords are then selected according to the sorting result to obtain the second keyword set.

In an embodiment, the server may perform descending order arrangement on each feature word in the second feature word set according to the feature weight, and then select a preset number of feature words ranked at the top as the key words, thereby obtaining the second key word set.
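The descending sort and top-k selection used for both keyword sets can be sketched in a few lines. The function name, the alphabetical tie-break, and the sample weights are assumptions for illustration.

```python
def select_keywords(weights, k):
    """Return the k feature words with the highest weights, in descending order."""
    # Ties are broken alphabetically so the result is deterministic (an assumption).
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Illustrative weights, not values from the application.
first_weights = {"cough": 0.5, "smoking": 0.3, "insomnia": 0.1}
first_keywords = select_keywords(first_weights, 2)
```

Here `first_keywords` is `["cough", "smoking"]`; the same function would be applied to the weights of each index node to obtain its second keyword set.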

Step S506, a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node are obtained according to the first keyword set and the second keyword set.

Specifically, the first keyword set and the second keyword set are merged into a union. The word frequency of each keyword in the union is then calculated in the first feature word set and in the second feature word set respectively, and the first word frequency vector and the second word frequency vector are generated from these word frequencies. For example, suppose the first feature word set is cough/smoking/insomnia with the keyword set {cough, smoking}, and the second feature word set is headache/cough/runny nose/chill with the keyword set {headache, runny nose}. Merging the two keyword sets gives {cough, smoking, headache, runny nose}. The word frequencies of these words in the first feature word set are cough 1, smoking 1, headache 0, runny nose 0, and in the second feature word set they are cough 1, smoking 0, headache 1, runny nose 1. The first word frequency vector is therefore [1, 1, 0, 0] and the second word frequency vector is [1, 0, 1, 1].
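The construction of the two word frequency vectors can be sketched as follows, using the cough/smoking/insomnia example. The function name and the choice to keep the union in first-seen order are assumptions; any fixed ordering of the union works as long as both vectors use it.

```python
from collections import Counter


def word_frequency_vectors(first_words, second_words, first_keywords, second_keywords):
    """Count each keyword of the union in both feature word sets."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    union = list(dict.fromkeys(list(first_keywords) + list(second_keywords)))
    first_counts, second_counts = Counter(first_words), Counter(second_words)
    first_vector = [first_counts[word] for word in union]
    second_vector = [second_counts[word] for word in union]
    return union, first_vector, second_vector


union, first_vector, second_vector = word_frequency_vectors(
    ["cough", "smoking", "insomnia"],
    ["headache", "cough", "runny nose", "chill"],
    ["cough", "smoking"],
    ["headache", "runny nose"],
)
```

For these inputs the union is cough, smoking, headache, runny nose, the first vector is `[1, 1, 0, 0]`, and the second vector is `[1, 0, 1, 1]`.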

Step S508, calculating cosine values of included angles between the first word frequency vectors and the second word frequency vectors respectively to obtain first similarities.

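Specifically, the cosine similarity is calculated as follows:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where $n$ ($n \ge 2$) is the dimension of the word frequency vectors, $A_i$ is the $i$-th component of the first word frequency vector, and $B_i$ is the $i$-th component of the second word frequency vector.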
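A direct sketch of this cosine calculation is given below. Returning 0.0 when either vector is all zeros is an assumption added to avoid division by zero; the application does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two word frequency vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)


# Vectors for the cough/smoking/headache/runny-nose keyword union.
first_similarity = cosine_similarity([1, 1, 0, 0], [1, 0, 1, 1])
```

For these two vectors the dot product is 1 and the norms are √2 and √3, so the first similarity is 1/√6, roughly 0.408.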

In this embodiment, keywords are extracted from the feature word sets and word frequency vectors are built from them before the cosine similarity of the two feature word sets is calculated. Compared with treating the question to be answered and each question-answer pair as two full documents and calculating their similarity directly, this reduces the amount of computation and improves calculation efficiency.

In one embodiment, as shown in fig. 6, calculating a feature weight for each feature word in the first feature word set to obtain a first calculation result includes:

Step S602, calculating the initial feature weight of each feature word in the first feature word set by using a term frequency-inverse document frequency algorithm.

Specifically, the word frequency TF is calculated first, with reference to the following formula:

$$\mathrm{TF} = \frac{\text{number of times the word appears in the document}}{\text{total number of words in the document}}$$

Then the inverse document frequency IDF is calculated; in its standard form:

$$\mathrm{IDF} = \log\frac{\text{total number of documents in the corpus}}{\text{number of documents containing the word}}$$

Finally, the initial feature weight is calculated:

$$W = \mathrm{TF} \times \mathrm{IDF}$$
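The initial weight calculation can be sketched as below. The IDF used here is the textbook form, log of the corpus size over the number of documents containing the word; that form, the zero result for unseen words, and the toy corpus are assumptions, since smoothed IDF variants are also common.

```python
import math


def term_frequency(word, document):
    """Occurrences of the word divided by the total number of words in the document."""
    return document.count(word) / len(document)


def inverse_document_frequency(word, corpus):
    """Textbook IDF; returns 0.0 for a word that no document contains (assumption)."""
    containing = sum(1 for document in corpus if word in document)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf_weights(document, corpus):
    """Initial feature weight W = TF * IDF for every distinct word of a non-empty document."""
    return {
        word: term_frequency(word, document) * inverse_document_frequency(word, corpus)
        for word in set(document)
    }


# Toy corpus of already segmented feature word lists.
corpus = [
    ["cough", "smoking", "insomnia"],
    ["headache", "cough", "runny nose"],
    ["cough", "fever"],
    ["headache", "insomnia"],
]
weights = tf_idf_weights(corpus[0], corpus)
```

In this corpus "smoking" appears in one document out of four and "cough" in three, so "smoking" receives the larger weight even though both occur once in the first document.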

Step S604, sequentially judging whether each feature word in the first feature word set meets a preset adjusting rule, if so, entering step S606; if not, the process proceeds to step S608.

Step S606, adjusting the initial feature weight of the feature word according to the adjustment rule to obtain the final feature weight.

Step S608, taking the initial feature weight as the final feature weight.

Specifically, the preset adjustment rule is a manually set rule for adjusting the feature weights of feature words. In one embodiment, the preset adjustment rule may be as follows: when two feature words appear together and the difference between their feature weights is smaller than a preset threshold, the weight of one of the words is adjusted so that the difference is no longer smaller than the preset threshold. For example, when "headache" and "hand pain" both appear as feature words and the difference between their feature weights is smaller than 0.2, the feature weight of "headache" is adjusted so that the difference between the two feature weights is at least 0.2. The intent is to increase the weight of the feature word whose symptom has the greater influence, thereby improving the accuracy of keyword selection.
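One way to express such a rule is sketched below. The rule table of `(preferred, other)` pairs, the choice to raise the preferred word to exactly the other weight plus the threshold, and the sample weights are assumptions; the application only requires that the resulting difference is not smaller than the threshold.

```python
def adjust_weights(weights, rules, threshold=0.2):
    """Apply pairwise adjustment rules and return a new weight dictionary."""
    adjusted = dict(weights)  # leave the initial weights untouched
    for preferred, other in rules:
        if preferred in adjusted and other in adjusted:
            if abs(adjusted[preferred] - adjusted[other]) < threshold:
                # Raise the more influential word so the gap reaches the threshold.
                adjusted[preferred] = adjusted[other] + threshold
    return adjusted


# Illustrative initial weights; only the headache/hand pain pair has a rule.
initial = {"headache": 0.35, "hand pain": 0.30, "cough": 0.50}
final = adjust_weights(initial, rules=[("headache", "hand pain")])
```

Words that no rule touches, such as "cough" here, keep their initial weight as the final weight, which corresponds to step S608.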

In this embodiment, the accuracy of selecting the keyword can be improved by adjusting the feature weight.

It should be understood that although the steps in the flow diagrams of figs. 2-6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 2-6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 7, there is provided an inquiry data recommending apparatus 700 including:

a first feature word set obtaining module 702, configured to obtain a current question to be answered, perform word segmentation on the current question to be answered, extract feature words according to word segmentation results, and obtain a first feature word set corresponding to the current question to be answered;

a second feature word set obtaining module 704, configured to obtain a second feature word set corresponding to each index node in a pre-established index;

a target index node set obtaining module 706, configured to calculate first similarities between a first feature word set corresponding to the current question to be answered and second feature word sets corresponding to the index nodes, respectively, and sort the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes, so as to obtain a target index node set;

a question-answer pair obtaining module 708, configured to obtain, from the inquiry database, a question-answer pair corresponding to each target index node in the target index node set;

the recommending module 710 is configured to calculate second similarities between the current question to be answered and the questions corresponding to the question-answer pairs, sort the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommend the inquiry data according to the selected target question-answer pair.
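How modules 706, 708 and 710 cooperate can be pictured as a coarse-then-fine retrieval. In the sketch below a Jaccard overlap stands in for both the first and the second similarity, which is purely an assumption for brevity; the index layout, the function names and the sample data are also hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def recommend(query_text, query_features, index, top_n=2):
    """Two-stage retrieval over a list of index nodes (dictionaries)."""
    # Stage 1 (module 706): rank index nodes by first similarity, keep top_n.
    coarse = sorted(
        index,
        key=lambda node: jaccard(query_features, node["features"]),
        reverse=True,
    )[:top_n]
    # Stage 2 (modules 708 and 710): re-rank the fetched question-answer pairs.
    query_tokens = query_text.lower().split()
    fine = sorted(
        coarse,
        key=lambda node: jaccard(query_tokens, node["question"].lower().split()),
        reverse=True,
    )
    return fine[0] if fine else None


index = [
    {"question": "why do i cough at night", "answer": "answer 1", "features": {"cough"}},
    {"question": "why do i cough after smoking", "answer": "answer 2", "features": {"cough", "smoking"}},
    {"question": "why does my head hurt", "answer": "answer 3", "features": {"headache"}},
]
best = recommend("why do i cough after smoking so much", {"cough", "smoking"}, index)
```

Only the two nodes that survive the first stage are compared in full against the query, and the node whose question is closest, here the second one, supplies the recommended answer.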

In one embodiment, as shown in fig. 8, the apparatus further comprises:

the preprocessing module 802 is configured to obtain an inquiry information set corresponding to previous inquiries, and preprocess the inquiry information set;

the feature extraction module 804 is configured to extract question-answer pairs from the preprocessed inquiry information set, and perform feature extraction on the extracted question-answer pairs;

a storage module 806, configured to store the question-answer pairs and the features corresponding to the question-answer pairs in an inquiry database;

and an index establishing module 808, configured to establish an index for the inquiry database according to the characteristics.

In one embodiment, the feature extraction module 804 is further configured to obtain a user identifier corresponding to each piece of inquiry information in the inquiry information set, where the user identifier is an inquiry user identifier or a doctor user identifier; filtering inquiry information corresponding to the doctor user identification according to a preset rule; and extracting question-answer pairs according to punctuation marks and question words for the filtered inquiry information set.

In one embodiment, the feature extraction module 804 is further configured to perform word segmentation on the questions in the extracted question-answer pairs to obtain a word set corresponding to the questions; and respectively matching each word in the word set with each word in a pre-established feature word library, and when the matching is successful, taking the word as the extracted feature.

In an embodiment, the target index node set obtaining module 706 is further configured to calculate a feature weight for each feature word in the first feature word set to obtain a first calculation result, and select a keyword according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculating the feature weight of each feature word in the second feature word set to obtain a second calculation result, and selecting a keyword according to the second calculation result to obtain a second keyword set corresponding to each index node; obtaining a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; and respectively calculating the cosine values of included angles between the first word frequency vectors and the second word frequency vectors to obtain first similarity.

In one embodiment, the target index node set obtaining module 706 is further configured to calculate an initial feature weight of each feature word in the first feature word set by using a term frequency-inverse document frequency algorithm; when any feature word in the first feature word set meets a preset adjustment rule, adjust the initial feature weight of the feature word according to the preset adjustment rule to obtain a final feature weight; and when any feature word in the first feature word set does not meet the preset adjustment rule, take the initial feature weight as the final feature weight.

For specific limitations of the inquiry data recommendation device, reference may be made to the above limitations of the inquiry data recommendation method, which are not repeated here. The modules in the above inquiry data recommendation device may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as question-answer pairs and the features corresponding to the question-answer pairs. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program is executed by the processor to implement an inquiry data recommendation method.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor that implements the following steps when executing the computer program: obtaining a current question to be answered, performing word segmentation on the current question to be answered, extracting feature words according to the word segmentation result, and obtaining a first feature word set corresponding to the current question to be answered; acquiring a second feature word set corresponding to each index node in a pre-established index; respectively calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes to obtain a target index node set; obtaining question-answer pairs corresponding to each target index node in the target index node set from an inquiry database; and respectively calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending the inquiry data according to the selected target question-answer pair.

In one embodiment, before the step of obtaining the current question to be answered, the processor further implements the following steps when executing the computer program: acquiring an inquiry information set corresponding to previous inquiries, and preprocessing the inquiry information set; extracting question-answer pairs from the preprocessed inquiry information set, and performing feature extraction on the extracted question-answer pairs; storing the question-answer pairs and the features corresponding to the question-answer pairs in an inquiry database; and establishing an index for the inquiry database according to the features.

In one embodiment, extracting question-answer pairs from the preprocessed inquiry information set comprises: acquiring a user identification corresponding to each piece of inquiry information in the inquiry information set, wherein the user identification is an inquiry user identification or a doctor user identification; filtering inquiry information corresponding to the doctor user identification according to a preset rule; and extracting question-answer pairs from the filtered inquiry information set according to punctuation marks and question words.

In one embodiment, the step of calculating a first similarity between a first feature word set corresponding to a current question to be answered and a second feature word set corresponding to each index node includes: calculating a feature weight of each feature word in the first feature word set to obtain a first calculation result, and selecting a keyword according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculating the feature weight of each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node; obtaining a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; and respectively calculating the cosine values of included angles between the first word frequency vectors and the second word frequency vectors to obtain first similarity.

In one embodiment, calculating the feature weight for each feature word in the first feature word set to obtain a first calculation result includes: calculating the initial characteristic weight of each characteristic word in the first characteristic word set by adopting a word frequency-inverse document frequency algorithm; when any one feature word in the first feature word set meets a preset adjusting rule, adjusting the initial feature weight of the feature word according to the preset adjusting rule to obtain a final feature weight; and when any one feature word in the first feature word set does not meet the preset adjustment rule, taking the initial feature weight as the final feature weight.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, performs the following steps: obtaining a current question to be answered, performing word segmentation on the current question to be answered, extracting feature words according to the word segmentation result, and obtaining a first feature word set corresponding to the current question to be answered; acquiring a second feature word set corresponding to each index node in a pre-established index; respectively calculating a first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node, and sorting the index nodes according to the first similarity calculation result to select a preset number of index nodes as target index nodes to obtain a target index node set; obtaining question-answer pairs corresponding to each target index node in the target index node set from an inquiry database; and respectively calculating a second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending the inquiry data according to the selected target question-answer pair.

In one embodiment, before the step of obtaining the current question to be answered, the computer program, when executed by the processor, further implements the following steps: acquiring an inquiry information set corresponding to previous inquiries, and preprocessing the inquiry information set; extracting question-answer pairs from the preprocessed inquiry information set, and performing feature extraction on the extracted question-answer pairs; storing the question-answer pairs and the features corresponding to the question-answer pairs in an inquiry database; and establishing an index for the inquiry database according to the features.

In one embodiment, extracting question-answer pairs from the preprocessed inquiry information set comprises: acquiring a user identification corresponding to each piece of inquiry information in the inquiry information set, wherein the user identification is an inquiry user identification or a doctor user identification; filtering inquiry information corresponding to the doctor user identification according to a preset rule; and extracting question-answer pairs from the filtered inquiry information set according to punctuation marks and question words.

It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of the present specification.

The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An inquiry data recommendation method, the method comprising:

obtaining question-answer pairs corresponding to all target index nodes in a target index node set from an inquiry database;

respectively calculating second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting the question-answer pairs according to the second similarity calculation result to select a target question-answer pair, and recommending inquiry data according to the selected target question-answer pair;

the calculating the first similarity between the first feature word set corresponding to the current question to be answered and the second feature word set corresponding to each index node respectively comprises: calculating a feature weight of each feature word in the first feature word set to obtain a first calculation result, and selecting a keyword according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculating the feature weight of each feature word in a second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node; obtaining a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; respectively calculating the cosine values of included angles between the first word frequency vectors and the second word frequency vectors to obtain first similarity;

the calculating the feature weight of each feature word in the first feature word set to obtain a first calculation result includes: calculating the initial characteristic weight of each characteristic word in the first characteristic word set by adopting a word frequency-inverse document frequency algorithm; when any one feature word in the first feature word set meets a preset adjusting rule, adjusting the initial feature weight of the feature word according to the preset adjusting rule to obtain a final feature weight; and when any one feature word in the first feature word set does not meet a preset adjusting rule, taking the initial feature weight as a final feature weight.

2. The method of claim 1, wherein the step of obtaining the current question to be answered is preceded by the steps of:

3. The method of claim 2, wherein extracting question-answer pairs from the pre-processed inquiry information comprises:

4. The method according to claim 2 or 3, wherein the extracting features of the extracted question-answer pairs comprises:

5. An inquiry data recommendation apparatus, the apparatus comprising:

the first feature word set acquisition module is used for acquiring a current question to be answered, segmenting words of the current question to be answered, extracting feature words according to word segmentation results and obtaining a first feature word set corresponding to the current question to be answered;

the recommendation module is used for respectively calculating second similarity between the current question to be answered and the question corresponding to each question-answer pair, sorting each question-answer pair according to the second similarity calculation result to select a target question-answer pair, and recommending the inquiry data according to the selected target question-answer pair;

the target index node set acquisition module is further used for calculating a feature weight of each feature word in the first feature word set to obtain a first calculation result, and selecting keywords according to the first calculation result to obtain a first keyword set corresponding to the current question to be answered; calculating a feature weight of each feature word in the second feature word set to obtain a second calculation result, and selecting keywords according to the second calculation result to obtain a second keyword set corresponding to each index node; obtaining a first word frequency vector corresponding to the current question to be answered and a second word frequency vector corresponding to each index node according to the first keyword set and the second keyword set; and calculating the cosine of the included angle between the first word frequency vector and each second word frequency vector to obtain the first similarity;

the target index node set acquisition module is also used for calculating an initial feature weight of each feature word in the first feature word set using a term frequency-inverse document frequency (TF-IDF) algorithm; when any feature word in the first feature word set meets a preset adjustment rule, adjusting the initial feature weight of that feature word according to the preset adjustment rule to obtain a final feature weight; and when any feature word in the first feature word set does not meet the preset adjustment rule, taking the initial feature weight as the final feature weight.
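
The word frequency vector and cosine step performed by the target index node set acquisition module can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the vector dimensions are assumed to be the union of the two keyword sets, and the function names, the node identifiers, and the `top_n` parameter are hypothetical.

```python
import math
from collections import Counter

def frequency_vectors(tokens_a, tokens_b, keywords_a, keywords_b):
    """Build two word frequency vectors over the union of both keyword sets.

    Using the union as the shared vocabulary is an assumption of this sketch.
    """
    vocab = sorted(set(keywords_a) | set(keywords_b))
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    return [count_a[w] for w in vocab], [count_b[w] for w in vocab]

def cosine(u, v):
    """Cosine of the included angle between u and v (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_index_nodes(question_tokens, question_keywords, nodes, top_n):
    """Return the ids of the top_n index nodes by first similarity.

    nodes maps a node id to a (tokens, keywords) pair; ties are broken by id.
    """
    scored = []
    for node_id, (tokens, keywords) in nodes.items():
        u, v = frequency_vectors(question_tokens, tokens, question_keywords, keywords)
        scored.append((cosine(u, v), node_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [node_id for _, node_id in scored[:top_n]]

# Illustrative data only.
nodes = {
    "respiratory": (["cough", "cough", "fever"], ["cough", "fever"]),
    "skin": (["rash", "itch"], ["rash", "itch"]),
    "general": (["fever", "fatigue"], ["fever", "fatigue"]),
}
target_nodes = rank_index_nodes(["fever", "cough", "cough"], ["fever", "cough"], nodes, top_n=2)
```

Restricting the vectors to keywords keeps them short, so comparing the question against every index node stays cheap before the finer second-similarity comparison on question-answer pairs.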

6. The apparatus of claim 5, further comprising:

the storage module is used for correspondingly storing the question-answer pairs and the feature pairs corresponding to the question-answer pairs into a query database;

7. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, performs the steps of the method according to any one of claims 1 to 4.

8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.