CN112800209A - Conversation corpus recommendation method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN112800209A (CN202110118764.0A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- similarity
- calculating
- content
- session
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a conversation corpus recommendation method, a device, a storage medium and an electronic device, comprising: obtaining the questioning content and the session history of the session; calculating a first similarity between each corpus in the recommended corpus and the question content; calculating a second similarity between each corpus in the recommended corpus and the session history; calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity; and recommending corpora based on the matching degree. By calculating the similarity between the materials to be matched and the historical chat records of the clients and combining the similarity scores of the questioning contents and the materials to be matched, the comprehensive scores of the questioning contents and the materials are given, the interested topics of the clients are considered, and the recommendation accuracy is improved.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for recommending a corpus of conversations, a storage medium, and an electronic device.
Background
When staff communicate with external clients through enterprise WeChat, the enterprise's existing solutions, sales scripts, articles and other information related to the client's topics of interest can be recommended to the staff in real time through a sidebar, based on the client's chat content. The staff can then conveniently and selectively send this information to the client, which can greatly improve the efficiency of communication between staff and external clients and increase the client's willingness to sign a deal.
In the real-time recommendation process, the existing technical method matches the titles of the materials against the questioning content: a similarity calculation method is used to calculate the similarity between the questioning content and each material, and the materials are then displayed in the sidebar from high to low according to the similarity. However, the inventor finds that when scoring and recommending are carried out based on the prior art, it is difficult to match the requirements of the chat counterpart accurately.
Therefore, how to match the requirement of the other party more accurately becomes a technical problem to be solved urgently.
Disclosure of Invention
The application provides a conversation corpus recommendation method, a conversation corpus recommendation device, a storage medium and electronic equipment, and at least solves the technical problem of how to accurately match the requirement of an opposite side in the related technology.
According to an aspect of an embodiment of the present application, a method for recommending conversation corpora is provided, including: obtaining the questioning content and the session history of the session; calculating a first similarity between each corpus in the recommended corpus and the question content; calculating a second similarity between each corpus in the recommended corpus and the session history; calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity; and recommending the corpus based on the matching degree.
Optionally, the calculating a first similarity between each corpus in the recommended corpus and the question content includes: selecting a first important word set based on each corpus and the questioning content; calculating the word frequency vector of each corpus and the questioning content relative to the words in the first important word set; and calculating the similarity between the word frequency vectors to obtain the first similarity.
Optionally, the selecting a first important word set based on each corpus and the questioning content includes: calculating a first importance degree value of each corpus and each word in the questioning content; and selecting the words with the importance degree values larger than a preset value as a first important word set.
Optionally, the calculating the second similarity of each corpus in the recommended corpus to the session history includes: selecting a second important word set based on each corpus and the session history record; calculating a word frequency vector of each corpus and the conversation history relative to the words in the second important word set; and calculating the similarity between the word frequency vectors to obtain the second similarity.
Optionally, the selecting a second important word set based on each corpus and the session history includes: calculating a second importance value of each corpus and each word in the conversation history; and selecting the words with the importance degree values larger than a preset value as a second important word set.
Optionally, the conversation corpus recommendation method further includes: performing word segmentation on the questioning content, each corpus and the historical conversation record respectively; and removing the preset auxiliary words.
Optionally, the calculating the matching degree of each corpus to the questioning content based on the first similarity and the second similarity includes: and calculating the product of the first similarity and the second similarity as the matching degree.
According to another aspect of the embodiments of the present application, there is also provided a conversational corpus recommending apparatus, including: the acquisition module is used for acquiring the questioning content and the session history of the session; the first calculation module is used for calculating the first similarity between each corpus in the recommended corpus and the questioning content; the second calculation module is used for calculating a second similarity between each corpus in the recommended corpus and the session history record; a third calculating module, configured to calculate a matching degree between each corpus and the question content based on the first similarity and the second similarity; and the recommending module is used for recommending the linguistic data based on the matching degree.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; wherein the memory is used for storing the computer program; a processor for performing the method steps in any of the above embodiments by running the computer program stored on the memory.
According to a further aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method steps of any of the above embodiments when the computer program is executed.
In the embodiment of the application, the questioning content and the session history of the session are acquired; respectively calculating a first similarity between each corpus in the recommended corpus and the question content and a second similarity between each corpus in the recommended corpus and the session history; calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity; and recommending corpora based on the matching degree. By calculating the similarity between the materials to be matched and the historical chat records of the clients and combining the similarity scores of the questioning contents and the materials to be matched, the comprehensive scores of the questioning contents and the materials are given, the interested topics of the clients are considered, and the recommendation accuracy is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a diagram illustrating a hardware environment of an alternative corpus recommendation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an alternative corpus recommendation method according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating an alternative corpus recommendation device according to an embodiment of the present application;
FIG. 4 is a block diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present application, a method for recommending conversation corpora is provided. Optionally, in this embodiment, the above-mentioned conversation corpus recommendation method may be applied to a hardware environment formed by the terminal 102 and the server 104 as shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal 102 through a network and may be used to provide services for the terminal or for a client installed on the terminal. A database may be provided on the server or independently of the server to provide data storage services for the server 104, and the server may also handle cloud services. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the terminal 102 is not limited to a PC, a mobile phone, a tablet computer, and the like. The conversation corpus recommendation method in the embodiment of the application may be executed by the server 104, by the terminal 102, or by both the server 104 and the terminal 102 together. When the terminal 102 executes the conversation corpus recommendation method according to the embodiment of the present application, the method may also be executed by a client installed on the terminal.
Taking execution of the conversation corpus recommendation method of this embodiment by the terminal 102 and/or the server 104 as an example, FIG. 2 is a schematic flowchart of an optional conversation corpus recommendation method according to this embodiment. As shown in FIG. 2, the flow of the method may include the following steps:
step S202, obtaining the questioning content and the session history record of the session;
step S204, calculating a first similarity between each corpus in the recommended corpus and the question content;
step S206, calculating a second similarity between each corpus in the recommended corpus and the session history record;
step S208, calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and step S210, recommending corpora based on the matching degree.
Through the steps S202 to S210, the session questioning content and the session history are acquired; respectively calculating a first similarity between each corpus in the recommended corpus and the question content and a second similarity between each corpus in the recommended corpus and the session history; calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity; and recommending corpora based on the matching degree. By calculating the similarity between the materials to be matched and the historical chat records of the clients and combining the similarity scores of the questioning contents and the materials to be matched, the comprehensive scores of the questioning contents and the materials are given, the interested topics of the clients are considered, and the recommendation accuracy is improved.
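The flow of steps S202 to S210 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names `cosine` and `recommend` are hypothetical, tokens are obtained by whitespace splitting, and the similarity is a plain cosine over raw term counts without the important-word selection described later.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(c * c for c in ca.values()))
            * math.sqrt(sum(c * c for c in cb.values())))
    return dot / norm if norm else 0.0


def recommend(question, history, corpora):
    """Score every corpus title and return the titles, best match first."""
    q_tokens = question.lower().split()            # S202: questioning content
    h_tokens = " ".join(history).lower().split()   # S202: session history
    scored = []
    for title in corpora:
        t_tokens = title.lower().split()
        first = cosine(t_tokens, q_tokens)         # S204: first similarity
        second = cosine(t_tokens, h_tokens)        # S206: second similarity
        scored.append((first * second, title))     # S208: matching degree
    scored.sort(key=lambda pair: pair[0], reverse=True)  # S210: recommend
    return [title for _, title in scored]


ranked = recommend(
    "how much is the premium plan",
    ["we need a plan for a large team", "is the premium tier worth it"],
    ["premium plan pricing", "company introduction", "team plan overview"],
)
```

In this toy input the title that overlaps with both the question and the history ranks first, and the title that shares no words with either ranks last.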
As for the technical solution in step S202, in this embodiment the session questioning content may include the questioning content of the other party. The session content may be analyzed in real time, for example by classifying it and performing keyword recognition, to obtain the session questioning content. As an exemplary embodiment, the session history may be obtained based on basic information of the current counterpart, such as name, age, gender and account ID: semantic information and context information in the historical group chat session content may be parsed, and a session history related to the current session questioning content may be mined based on the semantic information and the context information. In this embodiment, all session records with the current counterpart within a preset time period may also be acquired as the session history, for example the session records of the previous 3 days, the previous week, or the previous N days or weeks.
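Selecting the session history by a preset time period can be illustrated as follows. This is a sketch only: the function name `recent_history`, the `(timestamp, text)` record layout and the 3-day default window are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def recent_history(records, now, window=timedelta(days=3)):
    """Return the text of records whose timestamp lies within `window` before `now`."""
    cutoff = now - window
    return [text for ts, text in records if cutoff <= ts <= now]


now = datetime(2021, 1, 28, 12, 0)
records = [
    (datetime(2021, 1, 20, 9, 0), "asked about pricing"),
    (datetime(2021, 1, 26, 15, 30), "asked about deployment"),
    (datetime(2021, 1, 28, 11, 0), "asked about support plans"),
]
history = recent_history(records, now)
week_history = recent_history(records, now, window=timedelta(weeks=2))
```

Passing a different `window` covers the "previous week" or "previous N days or weeks" variants mentioned above.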
For the technical solution in step S204, a first similarity between each corpus in the recommended corpus and the question content is calculated. In this embodiment, a recommended corpus may be obtained first, where the recommended corpus may be a recommended corpus preset in advance, for example, corpora such as a marketing technique, product data, a product use manual, and an enterprise introduction. In this embodiment, the first similarity between the query content and the identification information of each corpus, such as the corpus title, can be calculated for each corpus in the recommended corpus. The first similarity may characterize a similarity between the corpus and the current query content.
For the technical solution in step S206, a second similarity between each corpus in the recommended corpus and the session history is calculated. As an exemplary embodiment, the recommended corpus is as described in the above embodiments. In this embodiment, a similarity between the session history and each corpus may be calculated. For example, the topics of interest of the chat counterpart in the historical session may be determined by parsing semantic information and context information in the historical session content, and the second similarity between the session history and each corpus in the recommended corpus may be calculated based on those topics of interest. In this embodiment, the second similarity may be used to represent the similarity between the user's historical topics of interest and each corpus in the recommended corpus.
For the technical solution in step S208, a matching degree between each corpus and the question content is calculated based on the first similarity and the second similarity. In this embodiment, the first similarity, calculated from the current session question content, and the second similarity, calculated from the session history, may be combined to obtain the matching degree between each corpus and the current session question content.
For the technical solution in step S210, after the matching degree is obtained, the corpora may be sorted by matching score. Exemplarily, the corpora are recommended in real time from high to low according to the matching score of each material, and are displayed in a sidebar of the conversation chat window or otherwise presented to the user for selection.
As an exemplary embodiment, the first similarity may be calculated from the word frequencies of the session questioning content and of the corpus. For example, a first important word set may be selected based on each corpus and the questioning content; the word frequency vectors of each corpus and of the questioning content relative to the words in the first important word set are calculated; and the similarity between the word frequency vectors is calculated to obtain the first similarity. The first important word set may be obtained by calculating a first importance degree value of each word in each corpus and in the questioning content, and selecting the words whose importance degree values are larger than a preset value as the first important word set. As an illustrative example, the tf-idf value of each word in the material and in the questioning content is calculated; important words (for example 20, 30, or more or fewer) are selected from the material and from the questioning content in descending order of tf-idf value and combined into a set; and the word frequencies of the material and of the questioning content are calculated for the words in the set to obtain a corpus word frequency vector Vcorpus and a session questioning content word frequency vector Vquery. The similarity between the word frequency vectors is calculated using cosine similarity:
sim(Vcorpus, Vquery) = (Vcorpus · Vquery) / (|Vcorpus| × |Vquery|)
where sim(Vcorpus, Vquery) is the first similarity, Vcorpus is the corpus word frequency vector, and Vquery is the session questioning content word frequency vector.
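The tf-idf word selection and cosine similarity described above can be sketched as follows. The source does not specify a tf-idf variant, so the smoothed idf used here is an assumption, as are the helper names (`tf_idf`, `important_words`, `tf_vector`, `cosine`, `similarity`) and the choice of taking the top `top_k` words from each text rather than applying a preset threshold.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, all_docs):
    """tf-idf of each word in one document (smoothed idf, an assumed variant)."""
    counts = Counter(doc_tokens)
    n_docs = len(all_docs)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in all_docs if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(doc_tokens)) * idf
    return scores


def important_words(doc_a, doc_b, all_docs, top_k=20):
    """Union of the top_k highest tf-idf words of each document, as a sorted list."""
    words = set()
    for doc in (doc_a, doc_b):
        scores = tf_idf(doc, all_docs)
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        words.update(ranked[:top_k])
    return sorted(words)


def tf_vector(doc_tokens, vocabulary):
    """Word frequency of the document for each word in the important word set."""
    counts = Counter(doc_tokens)
    return [counts[w] for w in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(doc_a, doc_b, all_docs, top_k=20):
    """Cosine similarity of the two word frequency vectors over the important words."""
    vocabulary = important_words(doc_a, doc_b, all_docs, top_k)
    return cosine(tf_vector(doc_a, vocabulary), tf_vector(doc_b, vocabulary))


query = ["premium", "plan", "price"]
title = ["premium", "plan", "pricing", "guide"]
other = ["company", "introduction"]
docs = [query, title, other]
first_similarity = similarity(title, query, docs)
```

The same `similarity` function could be applied to a corpus and the tokenized session history to obtain the second similarity.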
As an exemplary embodiment, the calculation of the second similarity may be similar to the calculation of the first similarity. For example, a second important word set may be selected based on each corpus and the session history; the word frequency vectors of each corpus and of the session history relative to the words in the second important word set are calculated; and the similarity between the word frequency vectors is calculated to obtain the second similarity. The second important word set may be obtained by calculating a second importance degree value of each word in each corpus and in the session history, and selecting the words whose importance degree values are larger than a preset value as the second important word set. As an illustrative example, the tf-idf value of each word in the material and in the session history is calculated; important words (for example 20, 30, or more or fewer) are selected from the material and from the session history in descending order of tf-idf value and combined into a set; and the word frequencies of the material and of the session history are calculated for the words in the set to obtain a corpus word frequency vector Vcorpus and a session history word frequency vector Vrecords. The similarity between the word frequency vectors is calculated using cosine similarity:
sim(Vcorpus, Vrecords) = (Vcorpus · Vrecords) / (|Vcorpus| × |Vrecords|)
where sim(Vcorpus, Vrecords) is the second similarity, Vcorpus is the corpus word frequency vector, and Vrecords is the session history word frequency vector.
As an illustrative example, after the session questioning content and the session history are obtained, the session questioning content, the session history and the titles of all the materials may be segmented into words, and auxiliary words that carry no practical meaning, such as "o" and "what", may be removed. As an illustrative example, semantic recognition may be performed on the session questioning content to obtain intention information of the session counterpart, semantic information and context information in the historical session content may be analyzed to determine the topics of interest of the counterpart in the historical session, and the recommended corpus may be determined based on the intention information and the topics of interest in the session history.
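The word segmentation and auxiliary-word removal step can be illustrated with a small Python sketch. It assumes English-like text that can be split with a regular expression; Chinese text would need a dedicated word segmenter, which is not shown here. The function name `tokenize` and the contents of the `PARTICLES` stop list are hypothetical.

```python
import re

# Hypothetical list of filler words to drop; a real deployment would supply its own.
PARTICLES = {"oh", "um", "ah"}


def tokenize(text, particles=PARTICLES):
    """Lowercase the text, split it into word tokens, and drop the listed particles."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in particles]


tokens = tokenize("Oh, what does the premium plan cost?")
```

The resulting token lists are the input to the importance and word frequency calculations described above.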
As an exemplary embodiment, a composite score of the questioning content and the material is calculated:
score(query, corpus, records) = sim(Vcorpus, Vquery) × sim(Vcorpus, Vrecords)
where sim(Vcorpus, Vquery) is the first similarity, and sim(Vcorpus, Vrecords) is the second similarity.
After the matching score is obtained, real-time recommendation can be performed from high to low according to the matching score of each material, and the recommendation is displayed to a sidebar.
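A short sketch of the composite scoring and ranking step follows. The similarity values are made-up illustrative numbers, and the function name `rank_corpora` and the dictionary layout are assumptions.

```python
def rank_corpora(first_sims, second_sims):
    """Multiply the two similarities per corpus and sort from high to low."""
    scores = {name: first_sims[name] * second_sims[name] for name in first_sims}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical similarity values for three materials.
first_sims = {"pricing guide": 0.8, "deployment manual": 0.6, "company intro": 0.0}
second_sims = {"pricing guide": 0.1, "deployment manual": 0.5, "company intro": 0.9}
ranked = rank_corpora(first_sims, second_sims)
order = [name for name, _ in ranked]
```

Because the score is a product, a material whose similarity to either the question or the history is zero receives a score of zero, so in this example "company intro" ranks last despite its high similarity to the history.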
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
According to another aspect of the embodiment of the present application, there is also provided a corpus recommendation device for implementing the corpus recommendation method. Fig. 3 is a schematic diagram of an alternative corpus recommendation device according to an embodiment of the present application, and as shown in fig. 3, the device may include:
an obtaining module 302, configured to obtain session question content and a session history;
a first calculating module 304, configured to calculate a first similarity between each corpus in the recommended corpus and the question content;
a second calculating module 306, configured to calculate a second similarity between each corpus in the recommended corpus and the session history;
a third calculating module 308, configured to calculate a matching degree between each corpus and the question content based on the first similarity and the second similarity;
and the recommending module 310 is configured to recommend the corpus based on the matching degree.
It should be noted that the obtaining module 302 in this embodiment may be configured to execute the step S202, the first calculating module 304 in this embodiment may be configured to execute the step S204, the second calculating module 306 in this embodiment may be configured to execute the step S206, the third calculating module 308 in this embodiment may be configured to execute the step S208, and the recommending module 310 in this embodiment may be configured to execute the step S210.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may be operated in a hardware environment as shown in fig. 1, and may be implemented by software, or may be implemented by hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the conversation corpus recommendation method, where the electronic device may be a server, a terminal, or a combination thereof.
Fig. 4 is a block diagram of an alternative electronic device according to an embodiment of the present application, as shown in fig. 4, including a processor 402, a communication interface 404, a memory 406, and a communication bus 408, where the processor 402, the communication interface 404, and the memory 406 communicate with each other via the communication bus 408, where,
a memory 406 for storing a computer program;
the processor 402, when executing the computer program stored in the memory 406, performs the following steps:
acquiring session questioning content and session history records;
calculating a first similarity between each corpus in the recommended corpus and the question content;
calculating a second similarity between each corpus in the recommended corpus and the session history;
calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and recommending corpora based on the matching degree.
Alternatively, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
As an example, as shown in fig. 4, the memory 406 may include, but is not limited to, the obtaining module 302, the first calculating module 304, the second calculating module 306, the third calculating module 308, and the recommending module 310 of the conversational corpus recommending apparatus. In addition, the apparatus may further include, but is not limited to, other module units in the conversation corpus recommendation device, which is not described in detail in this example.
The processor may be a general-purpose processor, including but not limited to a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Those skilled in the art will understand that the structure shown in FIG. 4 is only an illustration. The device implementing the above conversation corpus recommendation method may be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. FIG. 4 is a diagram illustrating the structure of the electronic device. For example, the terminal device may include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 4, or have a configuration different from that shown in FIG. 4.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
According to still another aspect of an embodiment of the present application, there is also provided a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the conversation corpus recommendation method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
acquiring session questioning content and session history records;
calculating a first similarity between each corpus in the recommended corpus and the question content;
calculating a second similarity between each corpus in the recommended corpus and the session history;
calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and recommending corpora based on the matching degree.
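The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: whitespace splitting stands in for word segmentation, cosine similarity is assumed as the similarity between word-frequency vectors, the matching degree is taken as the product of the two similarities (the option named in claim 7), and the names `recommend`, `similarity`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def term_frequency_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(tokens_a, tokens_b):
    """Similarity of two token lists over the union of their words."""
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    return cosine_similarity(
        term_frequency_vector(tokens_a, vocabulary),
        term_frequency_vector(tokens_b, vocabulary),
    )


def recommend(question, history, corpora, top_k=3):
    """Rank candidate corpora by first similarity * second similarity."""
    # Assumption: whitespace splitting replaces real word segmentation.
    q_tokens = question.lower().split()
    h_tokens = history.lower().split()
    scored = []
    for corpus in corpora:
        c_tokens = corpus.lower().split()
        first = similarity(c_tokens, q_tokens)   # corpus vs. questioning content
        second = similarity(c_tokens, h_tokens)  # corpus vs. session history
        scored.append((first * second, corpus))  # matching degree (product)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [corpus for _, corpus in scored[:top_k]]
```

With a product as the matching degree, a corpus that shares no words with either the question or the history scores zero, so only candidates relevant to both are ranked highly.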
Optionally, the specific example in this embodiment may refer to the example described in the above embodiment, which is not described again in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a U disk, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, and may also be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.
Claims (10)
1. A conversation corpus recommendation method is characterized by comprising the following steps:
obtaining the questioning content and the session history of the session;
calculating a first similarity between each corpus in the recommended corpus and the question content;
calculating a second similarity between each corpus in the recommended corpus and the session history;
calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and recommending corpora based on the matching degree.
2. The method according to claim 1, wherein the calculating a first similarity between each corpus in the recommended corpus and the question content comprises:
selecting a first important word set based on each corpus and the questioning content;
calculating the word frequency vector of each corpus and the questioning content relative to the words in the first important word set;
and calculating the similarity between the word frequency vectors to obtain the first similarity.
3. The method according to claim 2, wherein said selecting a first set of important words based on said each corpus and said questioning content comprises:
calculating a first importance degree value of each corpus and each word in the questioning content;
and selecting the words with the importance degree values larger than a preset value as a first important word set.
4. The method according to claim 1, wherein the calculating the second similarity between each corpus in the recommended corpus and the session history comprises:
selecting a second important word set based on each corpus and the session history record;
calculating a word frequency vector of each corpus and the conversation history relative to the words in the second important word set;
and calculating the similarity between the word frequency vectors to obtain the second similarity.
5. The method according to claim 4, wherein said selecting a second set of important words based on said each corpus and said session history comprises:
calculating a second importance value of each corpus and each word in the conversation history;
and selecting the words with the importance degree values larger than a preset value as a second important word set.
6. The corpus recommendation method of claim 1, further comprising:
performing word segmentation on the questioning content, each corpus and the session history respectively;
and removing the preset auxiliary words.
7. The method for recommending corpus according to claim 1, wherein said calculating a matching degree of each corpus to said questioning content based on said first similarity and said second similarity comprises:
and calculating the product of the first similarity and the second similarity as the matching degree.
8. A conversational corpus recommendation device, comprising:
the acquisition module is used for acquiring the questioning content and the session history of the session;
the first calculation module is used for calculating the first similarity between each corpus in the recommended corpus and the questioning content;
the second calculation module is used for calculating a second similarity between each corpus in the recommended corpus and the session history record;
a third calculating module, configured to calculate a matching degree between each corpus and the question content based on the first similarity and the second similarity;
and the recommending module is used for recommending corpora based on the matching degree.
9. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein said processor, said communication interface and said memory communicate with each other via said communication bus,
the memory for storing a computer program;
the processor is configured to execute the steps of the conversation corpus recommendation method according to any one of claims 1 to 7 by executing the computer program stored in the memory.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the conversational corpus recommendation method steps of any one of claims 1 to 7 when the computer program is executed.
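As an illustration of the preprocessing and important-word selection recited in claims 2, 3 and 6, the following is a minimal sketch under stated assumptions. The claims do not say here how the importance degree value is computed, so relative frequency over the two texts is assumed purely for illustration. The stop list `AUXILIARY_WORDS`, the whitespace tokenizer, the threshold value, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical stop list standing in for the "preset auxiliary words".
AUXILIARY_WORDS = {"the", "a", "an", "of", "to", "is"}


def preprocess(text):
    """Split on whitespace (a stand-in for word segmentation) and drop auxiliary words."""
    return [w for w in text.lower().split() if w not in AUXILIARY_WORDS]


def important_words(tokens_a, tokens_b, threshold):
    """Keep words whose importance value exceeds the preset threshold.

    Assumption: importance is the word's relative frequency in the two
    texts combined; the claims do not fix a particular measure.
    """
    counts = Counter(tokens_a) + Counter(tokens_b)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(w for w, c in counts.items() if c / total > threshold)


def frequency_vector(tokens, words):
    """Word-frequency vector of tokens relative to the important word set."""
    counts = Counter(tokens)
    return [counts[w] for w in words]


corpus_tokens = preprocess("the refund of the order is free")
question_tokens = preprocess("is the refund free")
words = important_words(corpus_tokens, question_tokens, threshold=0.3)
corpus_vector = frequency_vector(corpus_tokens, words)
question_vector = frequency_vector(question_tokens, words)
```

The two vectors share the same dimensions because both are built over the same important word set, which is what makes a vector similarity between them well defined.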
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110118764.0A CN112800209A (en) | 2021-01-28 | 2021-01-28 | Conversation corpus recommendation method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112800209A true CN112800209A (en) | 2021-05-14 |
Family
ID=75812462
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110118764.0A Pending CN112800209A (en) | 2021-01-28 | 2021-01-28 | Conversation corpus recommendation method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800209A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105989040A (en) * | 2015-02-03 | 2016-10-05 | 阿里巴巴集团控股有限公司 | Intelligent question-answer method, device and system |
CN106354856A (en) * | 2016-09-05 | 2017-01-25 | 北京百度网讯科技有限公司 | Enhanced deep neural network search method and device based on artificial intelligence |
CN110795618A (en) * | 2019-09-12 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Content recommendation method, device, equipment and computer readable storage medium |
CN110795542A (en) * | 2019-08-28 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Dialogue method and related device and equipment |
CN112100354A (en) * | 2020-09-16 | 2020-12-18 | 北京奇艺世纪科技有限公司 | Man-machine conversation method, device, equipment and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113157896A (en) * | 2021-05-26 | 2021-07-23 | 中国平安人寿保险股份有限公司 | Voice conversation generation method and device, computer equipment and storage medium |
CN113157896B (en) * | 2021-05-26 | 2024-03-29 | 中国平安人寿保险股份有限公司 | Voice dialogue generation method and device, computer equipment and storage medium |
JP7192039B1 (en) | 2021-06-14 | 2022-12-19 | 株式会社大和総研 | Matching system and program |
JP2022190557A (en) * | 2021-06-14 | 2022-12-26 | 株式会社大和総研 | Matching system and program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180336193A1 (en) | Artificial Intelligence Based Method and Apparatus for Generating Article | |
CN110298029B (en) | Friend recommendation method, device, equipment and medium based on user corpus | |
CN113283238B (en) | Text data processing method and device, electronic equipment and storage medium | |
CN111310440A (en) | Text error correction method, device and system | |
CN107862058B (en) | Method and apparatus for generating information | |
CN112732893B (en) | Text information extraction method and device, storage medium and electronic equipment | |
CN110427453B (en) | Data similarity calculation method, device, computer equipment and storage medium | |
CN112199588A (en) | Public opinion text screening method and device | |
CN112800209A (en) | Conversation corpus recommendation method and device, storage medium and electronic equipment | |
CN109190123B (en) | Method and apparatus for outputting information | |
CN112765364A (en) | Group chat session ordering method and device, storage medium and electronic equipment | |
WO2021118746A1 (en) | Systems and methods for generating labeled short text sequences | |
CN112597292B (en) | Question reply recommendation method, device, computer equipment and storage medium | |
CN113934834A (en) | Question matching method, device, equipment and storage medium | |
CN112100491A (en) | Information recommendation method, device and equipment based on user data and storage medium | |
CN112784032A (en) | Conversation corpus recommendation evaluation method and device, storage medium and electronic equipment | |
CN111079854A (en) | Information identification method, device and storage medium | |
CN116775815B (en) | Dialogue data processing method and device, electronic equipment and storage medium | |
CN114528851B (en) | Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium | |
CN112615774B (en) | Instant messaging information processing method and device, instant messaging system and electronic equipment | |
CN113010664B (en) | Data processing method and device and computer equipment | |
CN110535749B (en) | Dialogue pushing method and device, electronic equipment and storage medium | |
CN110502698B (en) | Information recommendation method, device, equipment and storage medium | |
CN110413637B (en) | Information recommendation method, device and equipment | |
CN110737750B (en) | Data processing method and device for analyzing text audience and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||