CN112800209A - Conversation corpus recommendation method and device, storage medium and electronic equipment - Google Patents

Conversation corpus recommendation method and device, storage medium and electronic equipment

Info

Publication number
CN112800209A
Authority
CN
China
Prior art keywords
corpus
similarity
calculating
content
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110118764.0A
Other languages
Chinese (zh)
Inventor
王毅君
徐凯波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202110118764.0A
Publication of CN112800209A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a conversation corpus recommendation method and device, a storage medium and an electronic device. The method comprises: obtaining the question content and the session history of a session; calculating a first similarity between each corpus in the recommended corpus and the question content; calculating a second similarity between each corpus in the recommended corpus and the session history; calculating the matching degree of each corpus and the question content based on the first similarity and the second similarity; and recommending corpora based on the matching degree. By calculating the similarity between each material to be matched and the client's historical chat records, and combining it with the similarity score between the question content and that material, a comprehensive score for the question content and the material is obtained. The client's topics of interest are thereby taken into account, and recommendation accuracy is improved.

Description

Conversation corpus recommendation method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for recommending a corpus of conversations, a storage medium, and an electronic device.
Background
When staff communicate with external clients through enterprise WeChat, a sidebar can recommend to the staff, in real time and according to the client's chat content, the enterprise's existing solutions, sales scripts, articles and other information related to the client's topics of interest. The staff can then conveniently select information and send it to the client, which can greatly improve the efficiency of communication between staff and external clients and increase the client's willingness to sign a contract.
In the real-time recommendation process, the existing technical method matches the question content against the titles of the materials: a similarity calculation method is used to calculate the similarity between the question content and each material, and the materials are then displayed in the sidebar in descending order of similarity. However, the inventor finds that when scoring and recommending are carried out based on the prior art, it is difficult to match the requirements of the chat counterpart accurately.
Therefore, how to match the requirement of the other party more accurately becomes a technical problem to be solved urgently.
Disclosure of Invention
The application provides a conversation corpus recommendation method, a conversation corpus recommendation device, a storage medium and electronic equipment, and at least solves the technical problem of how to accurately match the requirement of an opposite side in the related technology.
According to an aspect of an embodiment of the present application, a method for recommending conversation corpora is provided, including: obtaining the questioning content and the session history of the session; calculating a first similarity between each corpus in the recommended corpus and the question content; calculating a second similarity between each corpus in the recommended corpus and the session history; calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity; and recommending the corpus based on the matching degree.
Optionally, the calculating a first similarity between each corpus in the recommended corpus and the question content includes: selecting a first important word set based on each corpus and the questioning content; calculating the word frequency vector of each corpus and the questioning content relative to the words in the first important word set; and calculating the similarity between the word frequency vectors to obtain the first similarity.
Optionally, the selecting a first important word set based on each corpus and the questioning content includes: calculating a first importance degree value of each corpus and each word in the questioning content; and selecting the words with the importance degree values larger than a preset value as a first important word set.
Optionally, the calculating the second similarity of each corpus in the recommended corpus to the session history includes: selecting a second important word set based on each corpus and the session history record; calculating a word frequency vector of each corpus and the conversation history relative to the words in the second important word set; and calculating the similarity between the word frequency vectors to obtain the second similarity.
Optionally, the selecting a second important word set based on each corpus and the session history includes: calculating a second importance value of each corpus and each word in the conversation history; and selecting the words with the importance degree values larger than a preset value as a second important word set.
Optionally, the conversation corpus recommendation method further includes: performing word segmentation on the questioning content, each corpus and the historical conversation record respectively; and removing the preset auxiliary words.
Optionally, the calculating the matching degree of each corpus to the questioning content based on the first similarity and the second similarity includes: and calculating the product of the first similarity and the second similarity as the matching degree.
According to another aspect of the embodiments of the present application, there is also provided a conversational corpus recommending apparatus, including: the acquisition module is used for acquiring the questioning content and the session history of the session; the first calculation module is used for calculating the first similarity between each corpus in the recommended corpus and the questioning content; the second calculation module is used for calculating a second similarity between each corpus in the recommended corpus and the session history record; a third calculating module, configured to calculate a matching degree between each corpus and the question content based on the first similarity and the second similarity; and the recommending module is used for recommending the linguistic data based on the matching degree.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; wherein the memory is used for storing the computer program; a processor for performing the method steps in any of the above embodiments by running the computer program stored on the memory.
According to a further aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method steps of any of the above embodiments when the computer program is executed.
In the embodiment of the application, the question content and the session history of the session are acquired; a first similarity between each corpus in the recommended corpus and the question content and a second similarity between each corpus in the recommended corpus and the session history are calculated respectively; the matching degree of each corpus and the question content is calculated based on the first similarity and the second similarity; and corpora are recommended based on the matching degree. By calculating the similarity between each material to be matched and the client's historical chat records, and combining it with the similarity score between the question content and that material, a comprehensive score for the question content and the material is obtained. The client's topics of interest are thereby taken into account, and recommendation accuracy is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a diagram illustrating a hardware environment of an alternative corpus recommendation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an alternative corpus recommendation method according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating an alternative corpus recommendation device according to an embodiment of the present application;
FIG. 4 is a block diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present application, a method for recommending conversation corpora is provided. Optionally, in this embodiment, the conversation corpus recommendation method may be applied to a hardware environment formed by the terminal 102 and the server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal 102 through a network and may be used to provide services for the terminal or for a client installed on the terminal. A database may be provided on the server or independently of the server to provide data storage services for the server 104, and the server may also be used to handle cloud services. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the terminal 102 includes but is not limited to a PC, a mobile phone, a tablet computer, and the like. The conversation corpus recommendation method in the embodiment of the application may be executed by the server 104, by the terminal 102, or by both the server 104 and the terminal 102. When the terminal 102 executes the conversation corpus recommendation method according to the embodiment of the present application, the method may also be executed by a client installed on the terminal.
Taking as an example the case in which the conversation corpus recommendation method in the present embodiment is executed by the terminal 102 and/or the server 104, fig. 2 is a schematic flow chart of an optional conversation corpus recommendation method according to the present embodiment. As shown in fig. 2, the flow of the method may include the following steps:
step S202, obtaining the questioning content and the session history record of the session;
step S204, calculating a first similarity between each corpus in the recommended corpus and the question content;
step S206, calculating a second similarity between each corpus in the recommended corpus and the session history record;
step S208, calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and step S210, recommending corpora based on the matching degree.
Through the steps S202 to S210, the session question content and the session history are acquired; a first similarity between each corpus in the recommended corpus and the question content and a second similarity between each corpus in the recommended corpus and the session history are calculated respectively; the matching degree of each corpus and the question content is calculated based on the first similarity and the second similarity; and corpora are recommended based on the matching degree. By calculating the similarity between each material to be matched and the client's historical chat records, and combining it with the similarity score between the question content and that material, a comprehensive score for the question content and the material is obtained. The client's topics of interest are thereby taken into account, and recommendation accuracy is improved.
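The flow of steps S202 to S210 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `recommend` and `jaccard` are hypothetical, and the word-overlap similarity is a simple stand-in that is not taken from the application (which describes a word-frequency cosine similarity in its examples). The product used for the matching degree follows the optional embodiment described in this application.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (an assumption, not the patent's method): word-set overlap in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def recommend(
    question: str,
    history: List[str],
    corpora: List[str],
    similarity: Callable[[str, str], float] = jaccard,
) -> List[Tuple[str, float]]:
    """Rank corpora by matching degree, highest first."""
    # S202: the question content and session history are assumed to be already obtained.
    history_text = " ".join(history)
    scored = []
    for corpus in corpora:
        first = similarity(corpus, question)       # S204: first similarity
        second = similarity(corpus, history_text)  # S206: second similarity
        scored.append((corpus, first * second))    # S208: matching degree (product)
    # S210: recommend in descending order of matching degree.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Passing the similarity function as a parameter keeps the orchestration separate from the particular similarity measure, so a different measure can be substituted without changing the flow.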
As for the technical solution in step S202, in this embodiment, the session question content may include the question content of the other party. The session content may be analyzed in real time, for example by classifying the session content and performing keyword recognition, to obtain the session question content. As an exemplary embodiment, the session history may be obtained based on basic information of the current counterpart, such as name, age, gender, account ID and other basic personal information: semantic information and context information in the content of the historical group chat sessions may be parsed, and a session history related to the current session question content may be mined based on the semantic information and the context information. In this embodiment, all session records with the current counterpart within a preset time period may also be acquired as the session history, for example the session records of the previous 3 days, the previous week, or the previous N days or weeks.
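Selecting the session history by a preset time period can be sketched as follows. This is a minimal sketch, assuming each message is stored as a hypothetical `(timestamp, text)` pair; the name `recent_history` and the record shape are illustrative and are not specified in the application.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record shape: (timestamp, message text).
Message = Tuple[datetime, str]


def recent_history(messages: List[Message], now: datetime, days: int = 3) -> List[str]:
    """Return the texts of messages sent within the last `days` days, oldest first."""
    cutoff = now - timedelta(days=days)
    recent = [m for m in messages if cutoff <= m[0] <= now]
    return [text for _, text in sorted(recent, key=lambda m: m[0])]
```

The `days` parameter corresponds to the preset time period; a window of 3 or 7 days would match the examples given above.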
For the technical solution in step S204, a first similarity between each corpus in the recommended corpus and the question content is calculated. In this embodiment, a recommended corpus may be obtained first, where the recommended corpus may be a recommended corpus preset in advance, for example, corpora such as a marketing technique, product data, a product use manual, and an enterprise introduction. In this embodiment, the first similarity between the query content and the identification information of each corpus, such as the corpus title, can be calculated for each corpus in the recommended corpus. The first similarity may characterize a similarity between the corpus and the current query content.
For the technical solution in step S206, a second similarity between each corpus in the recommended corpus and the session history is calculated. As an exemplary embodiment, the recommended corpus may be as described in the above embodiments. In this embodiment, a similarity between the session history and each corpus may be calculated. For example, the topic of interest of the chat counterpart may be determined based on the session history, such as by parsing semantic information and context information in the content of the historical sessions, and a second similarity between the session history and each corpus in the recommended corpus may be calculated based on that topic of interest. In this embodiment, the second similarity may be used to represent the similarity between the user's historical topics of interest and each corpus in the recommended corpus.
For the technical solution in step S208, a matching degree between each corpus and the question content is calculated based on the first similarity and the second similarity. In this embodiment, the first similarity, calculated based on the current session question content, and the second similarity, calculated based on the session history, may be combined to obtain the matching degree between each corpus and the current session question content.
For the technical solution in step S210, after the matching degree is obtained, the corpora may be sorted based on the matching degree score. For example, corpora are recommended in real time in descending order of the matching score of each material, and are displayed in a sidebar of the conversation chat window or presented to the user in another way for selection.
As an exemplary embodiment, the first similarity may be calculated from word frequencies in the session question content and in each corpus. For example, a first important word set may be selected based on each corpus and the question content; the word frequency vectors of each corpus and of the question content over the words in the first important word set are calculated; and the similarity between the word frequency vectors is calculated to obtain the first similarity. The first important word set may be obtained by calculating a first importance value for each word in each corpus and in the question content, and selecting the words whose importance values are larger than a preset value. As an illustrative example, the tf-idf value of each word in the material and in the question content is calculated; the important words (for example 20, 30, or more or fewer) in the material and in the question content are selected in descending order of tf-idf value and combined into a set; the word frequencies of the material and of the question content are calculated for the words in the set to obtain a corpus word frequency vector Vcorpus and a session question content word frequency vector Vquery; and the similarity between the word frequency vectors is calculated using cosine similarity:
sim(Vcorpus, Vquery) = (Vcorpus · Vquery) / (|Vcorpus| × |Vquery|)
where sim(Vcorpus, Vquery) is the first similarity, Vcorpus is the corpus word frequency vector, Vquery is the session question content word frequency vector, · denotes the dot product, and |V| denotes the Euclidean norm of V.
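The tf-idf selection, word frequency vectors and cosine similarity described above can be sketched as follows. This is a sketch under stated assumptions rather than the application's implementation: the application does not specify the tf-idf variant or the document collection used for the idf term, so the smoothed idf `log((1 + N) / (1 + df)) + 1` and the `all_docs` parameter are assumptions, and all function names are hypothetical. Inputs are assumed to be already segmented into word lists.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(doc: Sequence[str], all_docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """tf-idf per word of `doc`; the smoothed idf variant is an assumption."""
    counts = Counter(doc)
    n = len(all_docs)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for d in all_docs if word in d)
        idf = math.log((1 + n) / (1 + df)) + 1.0
        scores[word] = (count / len(doc)) * idf
    return scores


def important_words(a: Sequence[str], b: Sequence[str],
                    all_docs: Sequence[Sequence[str]], k: int = 20) -> List[str]:
    """Union of the top-k tf-idf words of each text, in a fixed sorted order."""
    chosen = set()
    for doc in (a, b):
        scores = tf_idf(doc, all_docs)
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        chosen.update(ranked[:k])
    return sorted(chosen)


def tf_vector(doc: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Word frequency of `doc` for each word in `vocab`."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def first_similarity(corpus_tokens: Sequence[str], query_tokens: Sequence[str],
                     all_docs: Sequence[Sequence[str]], k: int = 20) -> float:
    vocab = important_words(corpus_tokens, query_tokens, all_docs, k)
    return cosine(tf_vector(corpus_tokens, vocab), tf_vector(query_tokens, vocab))
```

Because word frequencies are non-negative, the resulting similarity lies between 0 and 1, with 0 meaning that the two texts share none of the selected important words.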
As an exemplary embodiment, the second similarity may be calculated in a manner similar to the first similarity. For example, a second important word set may be selected based on each corpus and the session history; the word frequency vectors of each corpus and of the session history over the words in the second important word set are calculated; and the similarity between the word frequency vectors is calculated to obtain the second similarity. The second important word set may be obtained by calculating a second importance value for each word in each corpus and in the session history, and selecting the words whose importance values are larger than a preset value. As an illustrative example, the tf-idf value of each word in the material and in the session history is calculated; the important words (for example 20, 30, or more or fewer) in the material and in the session history are selected in descending order of tf-idf value and combined into a set; the word frequencies of the material and of the session history are calculated for the words in the set to obtain a corpus word frequency vector Vcorpus and a session history word frequency vector Vrecords; and the similarity between the word frequency vectors is calculated using cosine similarity:
sim(Vcorpus, Vrecords) = (Vcorpus · Vrecords) / (|Vcorpus| × |Vrecords|)
where sim(Vcorpus, Vrecords) is the second similarity, Vcorpus is the corpus word frequency vector, Vrecords is the session history word frequency vector, · denotes the dot product, and |V| denotes the Euclidean norm of V.
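The alternative way of building the important word set, keeping every word whose importance value exceeds a preset value, can be sketched for the session history as follows. This is a hedged sketch: relative word frequency is used as a stand-in importance value (the illustrative example above uses tf-idf), the default `preset` of 0.1 is arbitrary, and the function names are hypothetical. The history is assumed to be a list of already segmented messages.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def words_above(importance: Dict[str, float], preset: float) -> List[str]:
    """Keep words whose importance value is strictly greater than `preset`."""
    return sorted(w for w, s in importance.items() if s > preset)


def relative_frequency(tokens: Sequence[str]) -> Dict[str, float]:
    """Stand-in importance value (an assumption); the source's example uses tf-idf."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def second_similarity(corpus_tokens: Sequence[str],
                      history_messages: Sequence[Sequence[str]],
                      preset: float = 0.1) -> float:
    # Flatten all history messages into one token list.
    history_tokens = [t for msg in history_messages for t in msg]
    vocab = sorted(
        set(words_above(relative_frequency(corpus_tokens), preset))
        | set(words_above(relative_frequency(history_tokens), preset))
    )
    c_counts, h_counts = Counter(corpus_tokens), Counter(history_tokens)
    u = [c_counts[w] for w in vocab]  # corpus word frequency vector
    v = [h_counts[w] for w in vocab]  # session history word frequency vector
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

Compared with a fixed top-k selection, a threshold lets the size of the important word set vary with the text, so a long history with many frequent words contributes more dimensions than a short one.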
As an illustrative example, after obtaining the session question content and the session history, word segmentation may be performed on the session question content, the session history and the titles of all the materials, and auxiliary words that carry no practical meaning, such as "o", "what", etc., may be removed. As an illustrative example, semantic recognition may be performed on the session question content to obtain intention information of the session counterpart, semantic information and context information in the historical session content may be analyzed to determine the counterpart's topics of interest in the historical sessions, and a recommended corpus may be determined based on the intention information and the topics of interest in the session history.
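The preprocessing step can be sketched as follows. This is only a toy sketch: the regular-expression split works for space-delimited text, whereas Chinese text would need a dedicated word segmenter (for example the jieba library), which is not shown here. The auxiliary word list is hypothetical apart from "o" and "what", which are the examples given above, and the function names are illustrative.

```python
import re
from typing import Iterable, List, Set

# Preset auxiliary words: "o" and "what" come from the example above;
# the other entries are hypothetical additions for illustration.
AUXILIARY_WORDS: Set[str] = {"o", "what", "ah", "um"}


def segment(text: str) -> List[str]:
    """Toy word segmentation: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def preprocess(text: str, auxiliary: Iterable[str] = AUXILIARY_WORDS) -> List[str]:
    """Segment `text` and drop the preset auxiliary words."""
    drop = set(auxiliary)
    return [t for t in segment(text) if t not in drop]
```

The same `preprocess` function would be applied to the question content, to each message in the session history and to each material title, so that all three are compared over the same kind of tokens.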
As an exemplary embodiment, a composite score of the question content and the material is calculated:
score(query, corpus, records) = sim(Vcorpus, Vquery) × sim(Vcorpus, Vrecords)
where sim(Vcorpus, Vquery) is the first similarity, and sim(Vcorpus, Vrecords) is the second similarity.
After the matching score is obtained, real-time recommendation can be performed from high to low according to the matching score of each material, and the recommendation is displayed to a sidebar.
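The composite scoring and ranking can be sketched as follows, assuming the two similarities have already been computed and are keyed by a hypothetical corpus identifier. The function name `rank_by_matching_degree` and the `top_n` cutoff are illustrative assumptions; the application only states that materials are recommended from high to low by matching score.

```python
from typing import Dict, List, Tuple


def rank_by_matching_degree(
    first: Dict[str, float],
    second: Dict[str, float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Multiply the two similarities per corpus and return the top_n, highest first."""
    # A corpus with no history similarity recorded is treated as 0.0 (an assumption).
    scores = {cid: first[cid] * second.get(cid, 0.0) for cid in first}
    # Sort by descending score; ties are broken by identifier for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

One consequence of using a product is that a material with a high first similarity but a very low second similarity is ranked below a material that matches both the question and the history moderately well, and a material whose second similarity is zero receives a matching degree of zero.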
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
According to another aspect of the embodiment of the present application, there is also provided a corpus recommendation device for implementing the corpus recommendation method. Fig. 3 is a schematic diagram of an alternative corpus recommendation device according to an embodiment of the present application, and as shown in fig. 3, the device may include:
an obtaining module 302, configured to obtain session question content and a session history;
a first calculating module 304, configured to calculate a first similarity between each corpus in the recommended corpus and the question content;
a second calculating module 306, configured to calculate a second similarity between each corpus in the recommended corpus and the session history;
a third calculating module 308, configured to calculate a matching degree between each corpus and the question content based on the first similarity and the second similarity;
and the recommending module 310 is configured to recommend the corpus based on the matching degree.
It should be noted that the obtaining module 302 in this embodiment may be configured to execute the step S202, the first calculating module 304 may be configured to execute the step S204, the second calculating module 306 may be configured to execute the step S206, the third calculating module 308 may be configured to execute the step S208, and the recommending module 310 may be configured to execute the step S210.
It should be noted here that the examples and application scenarios implemented by the modules described above are the same as those of the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above, as a part of the apparatus, may operate in a hardware environment as shown in fig. 1, and may be implemented by software or by hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, there is also provided an electronic device for implementing the conversation corpus recommendation method, where the electronic device may be a server, a terminal, or a combination thereof.
Fig. 4 is a block diagram of an alternative electronic device according to an embodiment of the present application, as shown in fig. 4, including a processor 402, a communication interface 404, a memory 406, and a communication bus 408, where the processor 402, the communication interface 404, and the memory 406 communicate with each other via the communication bus 408, where,
a memory 406 for storing a computer program;
the processor 402, when executing the computer program stored in the memory 406, performs the following steps:
acquiring session questioning content and session history records;
calculating a first similarity between each corpus in the recommended corpus and the question content;
calculating a second similarity between each corpus in the recommended corpus and the session history;
calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and recommending corpora based on the matching degree.
Alternatively, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
As an example, as shown in fig. 4, the memory 406 may include, but is not limited to, the obtaining module 302, the first calculating module 304, the second calculating module 306, the third calculating module 308, and the recommending module 310 of the conversational corpus recommending apparatus. In addition, the apparatus may further include, but is not limited to, other module units in the conversation corpus recommendation device, which is not described in detail in this example.
The processor may be a general-purpose processor, including but not limited to a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Those skilled in the art will understand that the structure shown in fig. 4 is only illustrative. The device implementing the above conversation corpus recommendation method may be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 4 does not limit the structure of the electronic device. For example, the terminal device may include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 4, or have a different configuration from that shown in FIG. 4.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disk, ROM, RAM, magnetic or optical disk, and the like.
According to still another aspect of an embodiment of the present application, there is also provided a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the conversation corpus recommendation method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
acquiring session questioning content and session history records;
calculating a first similarity between each corpus in the recommended corpus and the questioning content;
calculating a second similarity between each corpus in the recommended corpus and the session history;
calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and recommending corpora based on the matching degree.
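The two similarity steps are described in terms of word segmentation, removal of preset auxiliary words, selection of an important word set by a preset threshold, and comparison of word-frequency vectors. The sketch below shows one way these pieces could fit together. Several choices are assumptions made only for illustration: whitespace splitting stands in for word segmentation, `AUXILIARY_WORDS` is a made-up list, the importance value is a smoothed TF-IDF-style score, cosine similarity is used to compare the vectors, and the default threshold of 0.1 is arbitrary. The source does not fix any of these.

```python
import math
from collections import Counter
from typing import List, Set

# Hypothetical preset auxiliary words to discard after segmentation.
AUXILIARY_WORDS = {"the", "a", "an", "of", "to", "is"}


def tokenize(text: str) -> List[str]:
    """Stand-in for word segmentation: lowercase, split on whitespace, drop auxiliary words."""
    return [w for w in text.lower().split() if w not in AUXILIARY_WORDS]


def important_words(texts: List[List[str]], threshold: float) -> Set[str]:
    """Select words whose importance value (assumed: tf * smoothed idf) exceeds the threshold."""
    n = len(texts)
    doc_freq = Counter(word for tokens in texts for word in set(tokens))
    selected: Set[str] = set()
    for tokens in texts:
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log((1 + n) / (1 + doc_freq[word])) + 1
            if tf * idf > threshold:
                selected.add(word)
    return selected


def frequency_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(corpus: str, other: str, threshold: float = 0.1) -> float:
    """Similarity between a corpus and either the questioning content or the session history."""
    a, b = tokenize(corpus), tokenize(other)
    if not a or not b:
        return 0.0
    vocabulary = sorted(important_words([a, b], threshold))
    return cosine(frequency_vector(a, vocabulary), frequency_vector(b, vocabulary))
```

Calling `similarity(corpus, question)` would give a first similarity and `similarity(corpus, history)` a second similarity; raising `threshold` shrinks the important word set and makes the comparison depend on fewer, higher-scoring words.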
Optionally, the specific example in this embodiment may refer to the example described in the above embodiment, which is not described again in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, and may also be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (10)

1. A conversation corpus recommendation method is characterized by comprising the following steps:
obtaining the questioning content and the session history of the session;
calculating a first similarity between each corpus in the recommended corpus and the questioning content;
calculating a second similarity between each corpus in the recommended corpus and the session history;
calculating the matching degree of each corpus and the questioning content based on the first similarity and the second similarity;
and recommending corpora based on the matching degree.
2. The method according to claim 1, wherein the calculating a first similarity between each corpus in the recommended corpus and the questioning content comprises:
selecting a first important word set based on each corpus and the questioning content;
calculating the word frequency vector of each corpus and the questioning content relative to the words in the first important word set;
and calculating the similarity between the word frequency vectors to obtain the first similarity.
3. The method according to claim 2, wherein said selecting a first important word set based on each corpus and the questioning content comprises:
calculating a first importance degree value of each corpus and each word in the questioning content;
and selecting the words with the importance degree values larger than a preset value as a first important word set.
4. The method according to claim 1, wherein the calculating the second similarity between each corpus in the recommended corpus and the session history comprises:
selecting a second important word set based on each corpus and the session history record;
calculating a word frequency vector of each corpus and the conversation history relative to the words in the second important word set;
and calculating the similarity between the word frequency vectors to obtain the second similarity.
5. The method according to claim 4, wherein said selecting a second set of important words based on said each corpus and said session history comprises:
calculating a second importance value of each corpus and each word in the conversation history;
and selecting the words with the importance degree values larger than a preset value as a second important word set.
6. The corpus recommendation method of claim 1, further comprising:
performing word segmentation on the questioning content, each corpus and the session history respectively;
and removing the preset auxiliary words.
7. The method according to claim 1, wherein said calculating the matching degree of each corpus and the questioning content based on said first similarity and said second similarity comprises:
and calculating the product of the first similarity and the second similarity as the matching degree.
8. A conversational corpus recommendation device, comprising:
the acquisition module is used for acquiring the questioning content and the session history of the session;
the first calculation module is used for calculating the first similarity between each corpus in the recommended corpus and the questioning content;
the second calculation module is used for calculating a second similarity between each corpus in the recommended corpus and the session history record;
a third calculating module, configured to calculate a matching degree between each corpus and the question content based on the first similarity and the second similarity;
and the recommending module is used for recommending the linguistic data based on the matching degree.
9. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein said processor, said communication interface and said memory communicate with each other via said communication bus,
the memory for storing a computer program;
the processor is configured to execute the steps of the conversation corpus recommendation method according to any one of claims 1 to 7 by executing the computer program stored in the memory.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the conversational corpus recommendation method steps of any one of claims 1 to 7 when the computer program is executed.
CN202110118764.0A 2021-01-28 2021-01-28 Conversation corpus recommendation method and device, storage medium and electronic equipment Pending CN112800209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110118764.0A CN112800209A (en) 2021-01-28 2021-01-28 Conversation corpus recommendation method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112800209A true CN112800209A (en) 2021-05-14

Family

ID=75812462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110118764.0A Pending CN112800209A (en) 2021-01-28 2021-01-28 Conversation corpus recommendation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112800209A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989040A (en) * 2015-02-03 2016-10-05 阿里巴巴集团控股有限公司 Intelligent question-answer method, device and system
CN106354856A (en) * 2016-09-05 2017-01-25 北京百度网讯科技有限公司 Enhanced deep neural network search method and device based on artificial intelligence
CN110795618A (en) * 2019-09-12 2020-02-14 腾讯科技(深圳)有限公司 Content recommendation method, device, equipment and computer readable storage medium
CN110795542A (en) * 2019-08-28 2020-02-14 腾讯科技(深圳)有限公司 Dialogue method and related device and equipment
CN112100354A (en) * 2020-09-16 2020-12-18 北京奇艺世纪科技有限公司 Man-machine conversation method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157896A (en) * 2021-05-26 2021-07-23 中国平安人寿保险股份有限公司 Voice conversation generation method and device, computer equipment and storage medium
CN113157896B (en) * 2021-05-26 2024-03-29 中国平安人寿保险股份有限公司 Voice dialogue generation method and device, computer equipment and storage medium
JP7192039B1 (en) 2021-06-14 2022-12-19 株式会社大和総研 Matching system and program
JP2022190557A (en) * 2021-06-14 2022-12-26 株式会社大和総研 Matching system and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination