CN116308668A - Self-adaptive dialogue recommendation method and system for new commodity and new user - Google Patents

Self-adaptive dialogue recommendation method and system for new commodity and new user

Info

Publication number
CN116308668A
Authority
CN
China
Prior art keywords
user
current
node
commodity
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310300114.7A
Other languages
Chinese (zh)
Inventor
张业勤 (Zhang Yeqin)
阮锦绣 (Ruan Jinxiu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202310300114.7A
Publication of CN116308668A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Biomedical Technology (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a self-adaptive dialogue recommendation method and system for new commodities and new users. The method comprises the following steps: constructing a user interaction graph from users, commodities, commodity attributes and dialogues; learning representations of all nodes in the current graph with an inductive graph algorithm based on the user interaction graph; obtaining state and action representations from the node representations; introducing state transitions and rewards on top of the state and action representations to form a Markov decision process, and learning an action value function through a deep Q network algorithm; and, in each round of dialogue recommendation, either querying an attribute or recommending a commodity according to the modeled action value function. The invention dynamically obtains representations of newly added commodities and users from neighbor information, and models the dialogue policy and the recommendation policy jointly by introducing the graph model, so that the two policies can fuse with each other and interact better.

Description

Self-adaptive dialogue recommendation method and system for new commodity and new user
Technical Field
The invention relates to the field of interactive dialogue recommendation, and in particular to a self-adaptive dialogue recommendation method and system for new commodities and new users.
Background
A recommender system is a tool that suggests items likely to interest a target user. Recommendation systems arose because, in daily work and decision-making, people often follow advice provided by others. With the development of e-commerce websites, the number and variety of products are growing rapidly, forcing users to spend a great deal of time finding the products they want to buy. A recommendation system that can produce recommendations from the full catalogue of goods according to user preference is therefore urgently needed.
However, a conventional recommendation system recommends commodities based on the user's interaction history and the interaction history of the user's friends; it lacks direct feedback from the user and cannot adapt well to drift in user preferences. Moreover, a user may not know what he likes before being shown some viable choices. Interactive dialogue recommendation systems are therefore favored. Yet recommending purely from prior knowledge such as the user's interaction history cannot solve the preference-drift problem, and the dialogue recommendation system also inherits problems intrinsic to recommendation systems: for example, adding new commodities or new users online renders the model unavailable. On the other hand, completely discarding prior knowledge such as the user's interaction history greatly increases meaningless exploration. How to keep the recommendation system performing adaptive dialogue recommendation when new commodities and new users are added online is therefore a worthwhile and urgent problem.
Disclosure of Invention
The invention aims to: provide a self-adaptive dialogue recommendation method and system for new commodities and new users that at least partially solve the problems in the prior art.
The technical scheme is as follows: to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, an adaptive dialogue recommendation method for new commodities and new users includes the following steps:
taking users, commodities, commodity attributes and dialogues as nodes, and taking the connections between a commodity and its corresponding commodity attributes, between a dialogue and the preferred commodities or preferred attributes it involves, between a user and his historical dialogues, and between a user and his friend users as the edges between nodes, to construct a user interaction graph;
initializing the embedding vectors of the nodes on the user interaction graph; for each node, finding all source nodes that have that node as target node, matrix-transforming the representations of the source nodes with a graph embedding algorithm according to the type of edge between source node and target node, aggregating the transformed representations of all source nodes, and learning the aggregation function with an inductive graph algorithm;
concatenating the representation corresponding to the user who initiated the dialogue with the representation corresponding to the current dialogue to form the current state representation, and concatenating the probability that the current candidate node will be selected in the next round given the current dialogue with the probability that the current candidate node will be selected in the next round given the currently interacted nodes to form the current action representation, wherein the current candidate nodes comprise candidate commodity nodes, which satisfy all of the user's preferred attributes in the current dialogue while excluding the commodities rejected in the current dialogue, and candidate attribute nodes, which are all attributes of the current candidate commodity set while excluding the attributes the user has preferred or rejected in the current dialogue;
introducing state transitions and rewards on top of the current state representation and current action representation to form a Markov decision process, and learning an action value function through a deep Q network algorithm;
based on the modeled action value function, in each round of dialogue recommendation obtaining the action and state representations from the current user interaction graph, inputting them into the action value function, and selecting the input action with the highest output action value for execution.
According to certain embodiments of the first aspect, in the user interaction graph, the user's historical dialogues and the user's friend users serve as data describing the user's prior characteristics, while the dialogues and the preferred commodities and preferred attributes they involve serve as posterior information.
According to certain implementations of the first aspect, matrix-transforming the representations of the source nodes using the graph embedding algorithm includes: the found source nodes are called the neighbor nodes of the target node; borrowing the idea of the relational graph convolutional neural network, for all neighbor nodes, before information aggregation, different transformation matrices are set according to the different edge relations with the target node: θ_r X_j, r ∈ R, j ∈ N(i), where X_j is the representation of source node j obtained by initializing all nodes in the graph through a random walk or the adjacency matrix, θ_r is the transformation matrix associated with relation edge r, R is the set of relation edges, and N(i) denotes the neighbor (source) nodes of target node i.
According to certain embodiments of the first aspect, aggregating the neighbor nodes yields the representation of the target node as:
X_i' = f_aggregate({θ_r X_j | r ∈ R, j ∈ N(i)})
where f_aggregate(·) is the aggregation function; the inductive graph algorithm learns the aggregation function from known data, so that when a new node joins, a vectorized representation of the newly joined node can be obtained by aggregating the representations of its neighbor nodes.
According to certain embodiments of the first aspect, the probability that the current candidate node will be selected in the next round given the current dialogue is obtained using a link prediction algorithm between graph nodes; the probability that the current candidate node will be selected in the next round given the currently interacted nodes is obtained by predicting the probability that the candidate node has a certain type of edge with the current round's dialogue node.
According to certain embodiments of the first aspect, the state transition connects, in the user interaction graph, the current dialogue node with the attribute or commodity the user preferred in the current round after the user responds to the action;
the rewards comprise five types: r_rec_suc: when the user accepts the recommended commodity, a positive reward of a first amount; r_rec_fail: when the user rejects the recommended commodity, a negative penalty of the first amount; r_ask_suc: when the user accepts the queried attribute, a positive reward of a second amount; r_ask_fail: when the user rejects the queried attribute, a negative reward of the second amount; r_quit: when the dialogue exceeds the maximum number of rounds, a negative reward of the first amount; wherein the first amount is greater than the second amount;
based on the Markov decision process, a dueling deep Q network algorithm separates the action value function into a value function and an advantage function, computed with two networks, thereby completing the learning of the action value function.
According to certain embodiments of the first aspect, the value function is constructed by a fully connected neural network that takes the current state as input, judges the value of the current state, and outputs the value V(S) of the current state; the advantage function is constructed by a fully connected neural network that takes the current state and an action as input, evaluates the value generated by taking that action in the current state, and outputs the value A(S, a) of the action in the current state; the action value function is formed by summing the value function and the advantage function: Q(S, a) = V(S) + A(S, a), and the action value function performs parameter learning on existing data through a policy gradient algorithm.
In a second aspect, an adaptive dialogue recommendation system for new commodities and new users includes:
a graph construction module, configured to take users, commodities, commodity attributes and dialogues as nodes, and take the connections between a commodity and its corresponding commodity attributes, between a dialogue and the preferred commodities or preferred attributes it involves, between a user and his historical dialogues, and between a user and his friend users as the edges between nodes, to construct a user interaction graph;
an embedding module, configured to initialize the embedding vectors of the nodes on the user interaction graph, find for each node all source nodes that have that node as target node, matrix-transform the representations of the source nodes with a graph embedding algorithm according to the type of edge between source node and target node, aggregate the transformed representations of all source nodes, and learn the aggregation function with an inductive graph algorithm;
a feature representation module, configured to concatenate the representation corresponding to the user who initiated the dialogue with the representation corresponding to the current dialogue to form the current state representation, and concatenate the probability that the current candidate node will be selected in the next round given the current dialogue with the probability that the current candidate node will be selected in the next round given the currently interacted nodes to form the current action representation, wherein the current candidate nodes comprise candidate commodity nodes, which satisfy all of the user's preferred attributes in the current dialogue while excluding the commodities rejected in the current dialogue, and candidate attribute nodes, which are all attributes of the current candidate commodity set while excluding the attributes the user has preferred or rejected in the current dialogue;
a policy learning module, configured to introduce state transitions and rewards on top of the current state representation and current action representation to form a Markov decision process, and learn an action value function through a deep Q network algorithm; and
a policy execution module, configured to, in each round of dialogue recommendation and based on the modeled action value function, obtain the action and state representations from the current user interaction graph, input them into the action value function, and select the input action with the highest output action value for execution.
In a third aspect, a computer device comprises: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the steps of the adaptive dialogue recommendation method for new commodities and new users according to the first aspect of the invention.
In a fourth aspect, a computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive dialogue recommendation method for new commodities and new users according to the first aspect of the invention.
The beneficial effects are that:
1. The invention provides a self-adaptive dialogue recommendation method and system for new commodities and new users that solve the problem of nodes being added online through an inductive algorithm: when a new user joins, an initial representation of the new user is obtained by dynamically aggregating the representations of friend user nodes, and when a new commodity joins, an initial representation of the new commodity node is obtained by dynamically aggregating the representations of the attribute nodes corresponding to the new commodity, so that adaptive dialogue recommendation can proceed.
2. The invention models the dialogue policy and the recommendation policy with a unified model, so that the two policies can fuse with each other and interact more effectively.
3. The invention models the user interaction history and the current user's dialogue simultaneously through the user interaction graph, so that the graph aggregation algorithm lets prior information directly influence the state and action representations of the current user's dialogue, thereby providing meaningful initial representations and reducing ineffective exploration in the early stage of policy learning.
Drawings
FIG. 1 is a flowchart of the adaptive dialogue recommendation method for new commodities and new users provided by an embodiment of the invention;
FIG. 2 is a user interaction graph provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the representations of all nodes in the current graph learned by the inductive graph algorithm provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of the modeling process of actions and states provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of the policy learning process using the modeled action value function provided by an embodiment of the invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
To solve the problem of user preference drift, current dialogue recommendation policies operate entirely on the user's current feedback and discard all prior information, so the policy learning process often explores meaninglessly. Meanwhile, the addition of new commodities forces the dialogue recommendation policy to be learned again, which is very inefficient and time-consuming. The method fuses prior information into policy learning through the user interaction graph, improving the exploration efficiency of policy learning, reducing the number of meaningless explorations, and learning a more valuable policy model. At the same time, using the inductive graph algorithm, the neighbor nodes of a new commodity, such as its commodity attribute nodes, can be used to induce a vector representation of the new commodity, so the policy need not be relearned.
Fig. 1 is a flowchart of the adaptive dialogue recommendation method for new commodities and new users provided by an embodiment of the present invention. The method can be executed by a server, and the resulting model is applicable to various terminal application scenarios. The server can run the dialogue recommendation policy model to conduct a policy-driven dialogue with the user and send the system's questions or replies to the terminal, so that the terminal presents them to the user. Taking a server as the executing body, the method is described in detail below with reference to fig. 2; its specific steps are as follows:
s101, constructing a user interaction diagram.
Users, commodities, commodity attributes and dialogues are taken as nodes; commodity attributes are also simply called attributes hereinafter. As shown in fig. 2, the user nodes include "Li Qing" and "Da Ke"; the dialogue nodes include dialogue 1, dialogue 2 and dialogue 3; the commodities or attributes a user prefers are embodied in the dialogues. The commodity nodes include a treadmill and a mini refrigerator. Commodity attributes come both from the commodities themselves and from the preferences embodied in dialogues: in this embodiment the attributes of the treadmill include sports, household and running, and the attributes of the mini refrigerator include dormitory use, small and refrigerated food; in addition, the attributes involved in dialogues include, for example, grinding soybean milk and cutting meat. By taking a commodity and its corresponding attributes, a dialogue and the preferred attributes it involves, a user and his historical dialogues, and a user and his friend users as different types of edges, the user nodes, commodity attribute nodes, commodity nodes and dialogue nodes are connected to one another, and the user interaction graph shown in fig. 2 is constructed. Commodity nodes and attribute nodes are connected by the commodity's attributes as relation edges, forming the commodity attribute graph shown at reference numeral 21. A dialogue node is connected to attribute or commodity nodes by the attributes or commodities the user prefers in the dialogue as relation edges, and together with the commodity attribute graph forms the user dialogue history graph shown at reference numeral 22. User nodes and dialogue nodes are connected by the user and his dialogues as relation edges, and users are connected to each other by friend relation edges; together these form the user interaction graph shown at reference numeral 23.
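As an illustration of this construction step, the following is a minimal sketch assuming a networkx MultiGraph with a "kind" attribute per node and a "rel" attribute per edge; all node names and relation labels are hypothetical, not taken from the patent.

```python
# Sketch of the S101 user interaction graph (nodes: user / item / attribute /
# dialog; edges: the four relation types described above). Illustrative only.
import networkx as nx

G = nx.MultiGraph()

# Nodes, each tagged with its kind.
for name, kind in [("Li Qing", "user"), ("Da Ke", "user"),
                   ("mini refrigerator", "item"), ("treadmill", "item"),
                   ("dormitory use", "attribute"), ("small", "attribute"),
                   ("refrigerated food", "attribute"),
                   ("dialog 1", "dialog"), ("dialog 2", "dialog")]:
    G.add_node(name, kind=kind)

# The four edge (relation) types used to build the graph.
G.add_edge("mini refrigerator", "dormitory use", rel="item_has_attr")
G.add_edge("mini refrigerator", "small", rel="item_has_attr")
G.add_edge("mini refrigerator", "refrigerated food", rel="item_has_attr")
G.add_edge("dialog 2", "mini refrigerator", rel="dialog_prefers")
G.add_edge("Li Qing", "dialog 1", rel="user_had_dialog")
G.add_edge("Li Qing", "dialog 2", rel="user_had_dialog")
G.add_edge("Li Qing", "Da Ke", rel="friend")
```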
S102, learning the representations of all nodes in the current graph using the inductive graph algorithm.
The representations of all nodes in the graph are first initialized through a random walk or the adjacency matrix. Then, for each node in the graph, all edges taking that node as target node are found, which yields all the source nodes corresponding to it; the found source nodes are called the neighbor nodes of the target node. Next, borrowing the idea of the relational graph convolutional neural network, before information is aggregated over all neighbor nodes, different transformation matrices are set according to the different relations of the edges between the neighbor nodes and the target node: θ_r X_j, r ∈ R, j ∈ N(i), where R is the set of relation edges, N(i) denotes the neighbor nodes of target node i, X_j is the representation of source node j from the random-walk or adjacency-matrix initialization, and θ_r is the transformation matrix associated with relation edge r. Different relation edges have different transformation matrices θ_r. The characteristic of the relational graph convolutional neural network is that different transformation matrices are generated for the different relation edges between source and target nodes, and multiplying a transformation matrix by a source node representation maps that representation onto the target node.
After the neighbor nodes are aggregated, the representation of the target node is obtained as:
X_i' = f_aggregate(θ_r1 X_j1, …, θ_r1 X_jm, …, θ_rn X_jk, …)
where r_1, …, r_n denote the n different relation edges and j_1, …, j_m index the source nodes whose edges to the target node are of the first relation type. As shown in fig. 3, the "dialogue" node is the target node and the "refrigerated food", "Li Qing", "dormitory use" and "mini refrigerator" nodes are the corresponding source nodes; the edges from the "refrigerated food", "dormitory use" and "mini refrigerator" nodes to the "dialogue" node belong to one relation edge, and the edge from the "Li Qing" node to the "dialogue" node belongs to another. Then
E_dialog = f_aggregate(θ_r1 X_refrigerated food, θ_r1 X_dormitory, θ_r1 X_mini refrigerator, θ_r2 X_Li Qing)
with refrigerated food, dormitory, mini refrigerator ∈ r_1 and Li Qing ∈ r_2.
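As a hedged illustration of this relation-aware aggregation (not the patent's own implementation), the following PyTorch sketch uses one linear map θ_r per relation type and a mean aggregator, one of the learnable aggregators named later (mean, LSTM or pooling); all class names, layer names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class RelationalAggregator(nn.Module):
    """One transformation matrix per relation edge type, then f_aggregate."""
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        # theta_r for each relation edge r in R
        self.theta = nn.ModuleList([nn.Linear(dim, dim, bias=False)
                                    for _ in range(num_relations)])
        # maps concate(self, aggregated neighbors) to the updated representation
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x_self, neighbors):
        # neighbors: list of (relation_id, tensor of shape [n_r, dim])
        transformed = [self.theta[r](x_j) for r, x_j in neighbors]
        agg = torch.cat(transformed, dim=0).mean(dim=0)   # mean f_aggregate
        return torch.relu(self.update(torch.cat([x_self, agg], dim=-1)))
```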
The objective of the inductive graph algorithm is to learn the aggregation function f_aggregate(·) from known data, so that when a new node joins, a vectorized representation of the newly joined node is obtained by aggregating the representations of its neighbor nodes. That is, a new node is represented by aggregating all of its neighbor nodes at the moment it joins, so the goal is to learn how to aggregate the information of those neighbor nodes, as a function, to obtain the most effective representation of the new node. New nodes mainly refer to new commodities and new users in the inference stage; at the same time, as a dialogue proceeds, the vectorized embeddings of the dialogue node and the user are updated by the inductive graph algorithm, and in the training stage nodes of various types need to be randomly sampled for training. To summarize the graph algorithm: first set the aggregation depth to K; for a node, sample its neighbors (first-order neighbors); for the first-order neighbors, sample their neighbors (second-order neighbors); and so on until the K-order neighbors have been sampled. The aggregation function then acts on the K-order neighbors to obtain the representations of the (K-1)-order neighbors, and so on until the aggregated representations of the first-order neighbors, and finally of the node itself, are obtained. Assume K = 2 and take fig. 2 as an example. When Li Qing's dialogue 2 mentions the preferred commodity mini refrigerator, in order to obtain the representation of dialogue 2, its second-order neighbors are found first: dormitory use, small and refrigerated food (neighbors of the mini refrigerator), and dialogue 1 and Da Ke (neighbors of Li Qing; dialogue 3 only appears after dialogue 2 ends, so there is no dialogue 3 yet). The information of dialogue 1 and Da Ke is aggregated with an aggregation function (which may be a learnable aggregator such as mean aggregation, LSTM aggregation or a pooling aggregator) to obtain Li Qing's neighbor information. Li Qing's representation is concatenated with this neighbor information and passed through a fully connected neural network to obtain Li Qing's updated representation, and all first-order neighbor nodes of dialogue 2 are updated in the same way. Finally, the updated representation of dialogue 2 is computed in the same manner from the updated first-order neighbor information. The learning loss of the inductive graph algorithm can come either from the downstream link prediction task or from the final recommendation task.
S103, obtaining the state and action representations.
Since the subsequent policy model is modeled as a Markov decision process, the four most important parts of a Markov decision process must be defined: state, action, transition and reward. Of these, what most needs clarifying is the modeling of states and actions. As shown in fig. 4, the representation of the current dialogue's user node and the representation of the current dialogue node are concatenated to form the state representation, as shown at reference numeral 43:
state = concate(E_user, E_dialog)
An action describes what the system can do in the current round, namely: continue to query the user's preferred attributes, or recommend commodities to the user according to the preference information obtained from the current dialogue. The action space therefore contains both commodity nodes and attribute nodes. All selectable attribute nodes are called candidate attribute nodes, and all selectable commodity nodes are called candidate commodity nodes. At round t, the candidate commodity set includes all commodities satisfying the user's current preference attributes, but excludes the commodities the user has already rejected:
V_cand^t = V_pref^t \ V_rej^t
where, with P_u^t denoting the set of user preference attributes up to round t and P_v the attribute set of commodity v, V_pref^t = {v | P_u^t ⊆ P_v} is the set of all commodities satisfying the user's current preference attributes at round t, and V_rej^t is the set of commodities rejected by the user up to round t. At round t, the candidate attribute set includes all attributes contained in the current candidate commodity nodes, but excludes the attributes the user rejected in rounds 1 to t-1 as well as the attributes the user accepted in rounds 1 to t-1:
P_cand^t = (⋃_{v ∈ V_cand^t} P_v) \ (P_acc^t ∪ P_rej^t)
where ⋃_{v ∈ V_cand^t} P_v is the set of all attributes contained in the candidate commodity nodes, P_acc^t denotes the attributes the user accepted in rounds 1 to t-1, and P_rej^t denotes the attributes the user rejected in rounds 1 to t-1. The link prediction task can simply be regarded as a classification task whose objective is to judge the probability that an edge exists between two nodes in the current graph. Specifically, the probability that an edge exists between the current dialogue node and a certain attribute or commodity node predicts that attribute or commodity as the user's current preference. The simplest realization of this task is to concatenate the representations of the two nodes and pass them through a fully connected neural network that finally outputs the probability (a number between 0 and 1) that an edge exists between the two nodes.
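That link-prediction classifier can be sketched as below; the hidden size and layer count are assumptions.

```python
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    """concate(two node representations) -> MLP -> P(edge exists) in (0, 1)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, e_dialog, e_cand):
        return self.mlp(torch.cat([e_dialog, e_cand], dim=-1))
```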
Each candidate node's representation and the probability that the candidate node will be selected in the next round given the current dialogue are concatenated to form the representation of the current round's actions, as shown at reference numeral 42:
action = concate(E_cand, p(cand))
where E_cand is the embedded vector representation of a candidate, i.e. the candidate node representation obtained by the aggregation of the graph algorithm, and V_cand and P_cand together constitute the set of all candidates:
cand = P_cand ∪ V_cand
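The candidate sets and action vectors defined above can be sketched as follows; item_attrs and the function names are illustrative assumptions that mirror the definitions of V_cand^t and P_cand^t.

```python
import torch

def candidate_sets(item_attrs, pref_attrs, rej_items, acc_attrs, rej_attrs):
    """item_attrs: dict item -> set of attributes; the rest are sets."""
    # V_cand^t: items carrying every preferred attribute, minus rejected items
    v_cand = {v for v, attrs in item_attrs.items() if pref_attrs <= attrs}
    v_cand -= rej_items
    # P_cand^t: attributes of the candidate items, minus attributes already
    # accepted or rejected in rounds 1..t-1
    p_cand = set().union(*(item_attrs[v] for v in v_cand))
    p_cand -= acc_attrs | rej_attrs
    return v_cand, p_cand

def action_vector(e_cand: torch.Tensor, p_selected: torch.Tensor):
    # action = concate(E_cand, p(cand)) for a single candidate node
    return torch.cat([e_cand, p_selected.reshape(1)], dim=-1)
```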
S104, modeling the action value function.
On the basis of the state and action representations from the previous step, supplementing the definitions of state transition and reward constitutes a complete Markov decision process. The state transition connects, in the user interaction graph, the current dialogue node with the attribute or commodity the user preferred in the current round after the user responds to the action; the state and action representations are then updated by the inductive graph algorithm. The state is formed by concatenating the representations of the current user node and the current dialogue node; as the dialogue proceeds, the user indicates preferred attributes or commodities, so the dialogue node becomes connected to more commodity and attribute nodes, which are at the same time second-order neighbors of the user node, and the representations of the current user node and the current dialogue node output by the inductive graph algorithm change accordingly.
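Continuing the earlier networkx sketch (an illustration under assumed names, not the patent's implementation), a transition simply adds the new preference edge before re-aggregation:

```python
# After the user accepts a queried attribute, connect it to the current
# dialogue node; the affected dialogue and user representations are then
# recomputed with the inductive aggregator sketched in S102.
G.add_node("dialog 3", kind="dialog")
G.add_node("grinding soybean milk", kind="attribute")
G.add_edge("dialog 3", "grinding soybean milk", rel="dialog_prefers")
```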
the rewards include five types of rewards:
r rec_suc : when the user accepts the recommended commodity, the user has forward rewards of a first amount;
r rec_fail : when the user refuses the recommended commodity, the user has a negative punishment of the first limit;
r ask_suc the user receives the attribute of inquiry and has the positive rewards of the second limit;
r ask_fail : when the user refuses the inquired attribute, the user has negative rewards of the second amount;
r quit : when the dialogue exceeds the maximum number of roundsThen there is a negative prize for the first credit;
wherein the first credit is greater than the second credit.
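A sketch of this reward scheme follows; the concrete amounts are assumptions chosen only so that the first amount exceeds the second.

```python
FIRST, SECOND = 1.0, 0.1   # assumed amounts; the patent only fixes FIRST > SECOND

def reward(event: str) -> float:
    return {
        "rec_suc":  +FIRST,    # recommended commodity accepted
        "rec_fail": -FIRST,    # recommended commodity rejected
        "ask_suc":  +SECOND,   # queried attribute accepted
        "ask_fail": -SECOND,   # queried attribute rejected
        "quit":     -FIRST,    # dialogue exceeded the maximum number of rounds
    }[event]
```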
Based on this Markov decision process, a dueling deep Q network (Dueling DQN) algorithm is introduced as the unified policy learning module for the dialogue policy and the recommendation policy, as shown in fig. 5. Such value-based algorithms explicitly model the expected reward of performing each action in each state. Considering the continuity and infinity of the state representation, the continuity of the action representation, and factors such as action restriction, network oscillation and overestimation, the invention uses the Dueling DQN algorithm as the policy learning module to model the action value function.
Referring to fig. 5, the Dueling DQN separates the action value function into a value function and an advantage function, computed by two networks. The value function 52 is constructed by a fully connected neural network; it takes the current state as input, judges the value of that state, and outputs the value V(S). V(S) is a neural network whose input is the current state and whose output is a prediction of the value of that state. This measure of value relates to the rewards mentioned above. First, once a dialogue recommendation task has been completed, the reward of the dialogue can be computed exactly from the five reward types defined above, and this is ultimately the value of the whole dialogue. The value of the state one round before the end of the dialogue is then derived backwards: it is the value of the final whole dialogue minus the value of the action that moved the dialogue towards its end (multiple actions could have been taken in that round; only one or a few lead to this final dialogue, while other choices would lead to other dialogues), and so on back to the value of the opening sentence. Training a neural network on this makes it possible to predict the value of the current state of a new dialogue. The first fully connected layer and the hidden layer further extract features of the state and the action, and the second fully connected layer computes the value of the action (advantage function) and the value of the state (value function) from these deep features.
The advantage function 51 is constructed by a fully connected neural network; it takes the current state and an action as input, evaluates the value generated by taking that action in the current state, and outputs A(S, a). A(S, a) is the value of taking action a in state S. Taking fig. 2 as an example, in dialogue 3 the user's known preferences are grinding soybean milk and cutting meat; if the action selected by the system is to ask whether the user's preferred commodity is a wall-breaking blender, and the user's preference really is a wall-breaking blender, this action generates a large positive value. Specifically, the value generated by the action is the value of the final dialogue minus the value of the current state.
The action value function 53 is formed by summing the value function 52 and the advantage function 51: Q(S, a) = V(S) + A(S, a). The action value function 53 performs parameter learning on existing data through a policy gradient algorithm.
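The two-branch architecture can be sketched as below; it sums the heads as Q(S, a) = V(S) + A(S, a) exactly as stated above (the mean-advantage correction common in Dueling DQN implementations is omitted because the text does not use it), and the class name and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DuelingQ(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # value function 52: input the current state, output V(S)
        self.value = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # advantage function 51: input the current state and an action,
        # output A(S, a)
        self.advantage = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        # action value function 53: Q(S, a) = V(S) + A(S, a)
        return self.value(state) + self.advantage(
            torch.cat([state, action], dim=-1))
```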
S105, generating a knowledge-fused dialogue that queries an attribute or recommends a commodity according to the policy.
Based on the modeled action value function, in each round of dialogue recommendation the representations of the actions and the state are obtained from the current user interaction graph and input into the action value function, and the input action with the highest output action value is selected for execution.
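One decision round can be sketched as follows, reusing the hypothetical DuelingQ and action-vector sketches above:

```python
import torch

def select_action(q_net, state, candidate_actions):
    """candidate_actions: list of (node, action_vector) over P_cand ∪ V_cand."""
    values = torch.stack([q_net(state, a) for _, a in candidate_actions])
    node, _ = candidate_actions[values.argmax().item()]
    return node   # attribute node -> query it; commodity node -> recommend it
```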
By using the inductive graph algorithm, the invention dynamically obtains representations of newly added commodities and users from neighbor information, and obtains better self-representations by aggregating neighbor node vectors, so that downstream tasks such as link prediction and reinforcement learning benefit. By introducing the graph model to model the dialogue policy and the recommendation policy simultaneously, the dialogue policy module and the recommendation policy module can fuse with each other and interact better. During a dialogue, the representation of the dialogue's user and the representation of the current dialogue node are concatenated as the state, and each candidate node's representation is concatenated with the link-prediction probability between the candidate node and the previous round's user preference node as the action representation; a reinforcement learning algorithm models the action value function, and finally whether the next node is a commodity or an attribute is chosen by maximizing the value of the whole dialogue, with a recommendation or a dialogue query performed according to the type of the selected next node.
The invention also provides an adaptive dialogue recommendation system for new commodities and new users, comprising:
a graph construction module, configured to take users, commodities, commodity attributes and dialogues as nodes, and take the connections between a commodity and its corresponding commodity attributes, between a dialogue and the preferred commodities or preferred attributes it involves, between a user and his historical dialogues, and between a user and his friend users as the edges between nodes, to construct a user interaction graph;
an embedding module, configured to initialize the embedding vectors of the nodes on the user interaction graph, find for each node all source nodes that have that node as target node, matrix-transform the representations of the source nodes with a graph embedding algorithm according to the type of edge between source node and target node, aggregate the transformed representations of all source nodes, and learn the aggregation function with an inductive graph algorithm;
a feature representation module, configured to concatenate the representation corresponding to the user who initiated the dialogue with the representation corresponding to the current dialogue to form the current state representation, and concatenate the probability that the current candidate node will be selected in the next round given the current dialogue with the probability that the current candidate node will be selected in the next round given the currently interacted nodes to form the current action representation, wherein the current candidate nodes comprise candidate commodity nodes, which satisfy all of the user's preferred attributes in the current dialogue while excluding the commodities rejected in the current dialogue, and candidate attribute nodes, which are all attributes of the current candidate commodity set while excluding the attributes the user has preferred or rejected in the current dialogue;
a policy learning module, configured to introduce state transitions and rewards on top of the current state representation and current action representation to form a Markov decision process, and learn an action value function through a deep Q network algorithm; and
a policy execution module, configured to, in each round of dialogue recommendation and based on the modeled action value function, obtain the action and state representations from the current user interaction graph, input them into the action value function, and select the input action with the highest output action value for execution.
It should be understood that the adaptive dialogue recommendation system for new commodities and new users in the embodiment of the present invention can implement all the technical solutions of the above method embodiments; the function of each functional module can be realized according to the methods in the above method embodiments, and for the specific implementation process reference is made to the relevant descriptions in the above embodiments, which are not repeated here.
The present invention also provides a computer device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the steps of the adaptive dialogue recommendation method for new commodities and new users as described above.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive dialogue recommendation method for new commodities and new users as described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium. In the context of the present invention, the computer-readable medium may be considered to be tangible and non-transitory. Non-limiting examples of non-transitory tangible computer readable media include non-volatile memory circuits (e.g., flash memory circuits, erasable programmable read-only memory circuits, or masked read-only memory circuits), volatile memory circuits (e.g., static random access memory circuits or dynamic random access memory circuits), magnetic storage media (e.g., analog or digital magnetic tape or hard disk drives), and optical storage media (e.g., CDs, DVDs, or blu-ray discs), among others.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
Moreover, although operations are depicted in a particular order, this should be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the invention. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to the specific details of the above embodiments, and various equivalent changes can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the equivalent changes belong to the protection scope of the present invention.

Claims (10)

1. A self-adaptive dialogue recommendation method for new commodities and new users, characterized by comprising the following steps:
taking users, commodities, commodity attributes and dialogues as nodes, and taking the connections between a commodity and its corresponding commodity attributes, between a dialogue and the preferred commodities or preferred attributes it involves, between a user and his historical dialogues, and between a user and his friend users as the edges between nodes, to construct a user interaction graph;
initializing the embedding vectors of the nodes on the user interaction graph; for each node, finding all source nodes that have that node as target node, matrix-transforming the representations of the source nodes with a graph embedding algorithm according to the type of edge between source node and target node, aggregating the transformed representations of all source nodes, and learning the aggregation function with an inductive graph algorithm;
concatenating the representation corresponding to the user who initiated the dialogue with the representation corresponding to the current dialogue to form the current state representation, and concatenating the probability that the current candidate node will be selected in the next round given the current dialogue with the probability that the current candidate node will be selected in the next round given the currently interacted nodes to form the current action representation, wherein the current candidate nodes comprise candidate commodity nodes, which satisfy all of the user's preferred attributes in the current dialogue while excluding the commodities rejected in the current dialogue, and candidate attribute nodes, which are all attributes of the current candidate commodity set while excluding the attributes the user has preferred or rejected in the current dialogue;
introducing state transitions and rewards on top of the current state representation and current action representation to form a Markov decision process, and learning an action value function through a deep Q network algorithm;
based on the modeled action value function, in each round of dialogue recommendation obtaining the action and state representations from the current user interaction graph, inputting them into the action value function, and selecting the input action with the highest output action value for execution.
2. The method of claim 1, wherein, in the user interaction graph, the user's historical dialogues and the user's friend users serve as data describing the user's prior characteristics, while the dialogues and the preferred commodities and preferred attributes they involve serve as posterior information.
3. The method of claim 1, wherein matrix-transforming the representations of the source nodes using the graph embedding algorithm comprises: the found source nodes are called the neighbor nodes of the target node; borrowing the idea of the relational graph convolutional neural network, for all neighbor nodes, before information aggregation, different transformation matrices are set according to the different edge relations with the target node: θ_r X_j, r ∈ R, j ∈ N(i), where X_j is the representation of source node j obtained by initializing all nodes in the graph through a random walk or the adjacency matrix, θ_r is the transformation matrix associated with relation edge r, R is the set of relation edges, and N(i) denotes the neighbor (source) nodes of target node i.
4. The method according to claim 3, wherein aggregating the neighbor nodes yields the representation of the target node as:
X_i' = f_aggregate({θ_r X_j | r ∈ R, j ∈ N(i)})
where f_aggregate(·) is the aggregation function; the inductive graph algorithm learns the aggregation function from known data, so that when a new node joins, a vectorized representation of the newly joined node can be obtained by aggregating the representations of its neighbor nodes.
5. The method according to claim 1, wherein the probability that the current candidate node will be selected in the next round given the current dialogue is obtained using a link prediction algorithm between graph nodes; the probability that the current candidate node will be selected in the next round given the currently interacted nodes is obtained by predicting the probability that the candidate node has a certain type of edge with the current round's dialogue node.
6. The method of claim 5, wherein the state transition connects, in the user interaction graph, the current dialogue node with the attribute or commodity the user preferred in the current round after the user responds to the action;
the rewards comprise five types: r_rec_suc: when the user accepts the recommended commodity, a positive reward of a first amount; r_rec_fail: when the user rejects the recommended commodity, a negative penalty of the first amount; r_ask_suc: when the user accepts the queried attribute, a positive reward of a second amount; r_ask_fail: when the user rejects the queried attribute, a negative reward of the second amount; r_quit: when the dialogue exceeds the maximum number of rounds, a negative reward of the first amount; wherein the first amount is greater than the second amount;
based on the Markov decision process, a dueling deep Q network algorithm separates the action value function into a value function and an advantage function, computed with two networks, thereby completing the learning of the action value function.
7. The method of claim 6, wherein the value function is constructed by a fully connected neural network that takes the current state as input, judges the value of the current state, and outputs the value V(S) of the current state; the advantage function is constructed by a fully connected neural network that takes the current state and an action as input, evaluates the value generated by taking that action in the current state, and outputs the value A(S, a) of the action in the current state; the action value function is formed by summing the value function and the advantage function: Q(S, a) = V(S) + A(S, a), and the action value function performs parameter learning on existing data through a policy gradient algorithm.
8. An adaptive dialogue recommendation system for new commodities and new users, characterized by comprising:
a graph construction module, configured to take users, commodities, commodity attributes and dialogues as nodes, and take the connections between a commodity and its corresponding commodity attributes, between a dialogue and the preferred commodities or preferred attributes it involves, between a user and his historical dialogues, and between a user and his friend users as the edges between nodes, to construct a user interaction graph;
an embedding module, configured to initialize the embedding vectors of the nodes on the user interaction graph, find for each node all source nodes that have that node as target node, matrix-transform the representations of the source nodes with a graph embedding algorithm according to the type of edge between source node and target node, aggregate the transformed representations of all source nodes, and learn the aggregation function with an inductive graph algorithm;
a feature representation module, configured to concatenate the representation corresponding to the user who initiated the dialogue with the representation corresponding to the current dialogue to form the current state representation, and concatenate the probability that the current candidate node will be selected in the next round given the current dialogue with the probability that the current candidate node will be selected in the next round given the currently interacted nodes to form the current action representation, wherein the current candidate nodes comprise candidate commodity nodes, which satisfy all of the user's preferred attributes in the current dialogue while excluding the commodities rejected in the current dialogue, and candidate attribute nodes, which are all attributes of the current candidate commodity set while excluding the attributes the user has preferred or rejected in the current dialogue;
a policy learning module, configured to introduce state transitions and rewards on top of the current state representation and current action representation to form a Markov decision process, and learn an action value function through a deep Q network algorithm; and
a policy execution module, configured to, in each round of dialogue recommendation and based on the modeled action value function, obtain the action and state representations from the current user interaction graph, input them into the action value function, and select the input action with the highest output action value for execution.
9. A computer device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the steps of the adaptive dialogue recommendation method for new commodities and new users according to any one of claims 1-7.
10. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the adaptive dialogue recommendation method for new commodities and new users according to any one of claims 1-7.
CN202310300114.7A 2023-03-26 2023-03-26 Self-adaptive dialogue recommendation method and system for new commodity and new user Pending CN116308668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310300114.7A CN116308668A (en) 2023-03-26 2023-03-26 Self-adaptive dialogue recommendation method and system for new commodity and new user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310300114.7A CN116308668A (en) 2023-03-26 2023-03-26 Self-adaptive dialogue recommendation method and system for new commodity and new user

Publications (1)

Publication Number Publication Date
CN116308668A true CN116308668A (en) 2023-06-23

Family

ID=86784910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310300114.7A Pending CN116308668A (en) 2023-03-26 2023-03-26 Self-adaptive dialogue recommendation method and system for new commodity and new user

Country Status (1)

Country Link
CN (1) CN116308668A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination