CN117056595A - Interactive project recommendation method and device and computer readable storage medium - Google Patents

Interactive project recommendation method and device and computer readable storage medium

Info

Publication number
CN117056595A
Authority
CN
China
Prior art keywords
user
item
function
network
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310966515.6A
Other languages
Chinese (zh)
Inventor
魏文国
陈俊儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN202310966515.6A priority Critical patent/CN117056595A/en
Publication of CN117056595A publication Critical patent/CN117056595A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2219Large Object storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an interactive item recommendation method, an apparatus and a computer readable storage medium. The method comprises the following steps: acquiring user embedded features and item embedded features from randomly extracted sample batches; passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector; determining a policy function in an actor network according to an action value function obtained through a critic network and an entropy term regulated by a self-updating temperature coefficient; obtaining a current state according to the collaborative feature vector; and obtaining a current action according to the current state and the policy function, and converting the current action into an item recommendation list. The application enhances the utilization of similarity information between users and items through the GMF network, and reduces the influence of user preference changes on recommended content through the continuously updated critic network and actor network.

Description

Interactive project recommendation method and device and computer readable storage medium
Technical Field
The application relates to the field of big-data-oriented deep learning, in particular to an interactive item recommendation method, apparatus and computer readable storage medium.
Background
An interactive recommendation system based on deep reinforcement learning can learn a user's preferences through interaction with the user and recommend relevant items accordingly. Compared with traditional recommendation systems, such systems are more sensitive to changes in user preferences perceived from the environment.
The main problems faced by deep reinforcement learning algorithms, and by the recommendation systems modeled on them, are similar: the complexity of action exploration and decision-making in a high-dimensional state space, and convergence in policy optimization. Although a traditional deep-learning-based recommendation system can improve its ability to predict a user's fixed preferences by analysing user- and item-related features, it cannot make accurate recommendations for users whose preferences change continuously over time.
Disclosure of Invention
The embodiments of the application provide an interactive item recommendation method, apparatus and computer readable storage medium, which enhance the utilization of similarity information between users and items through a GMF (Generalized Matrix Factorization) network, and reduce the influence of bias terms caused by user preference changes on recommended content through a continuously updated critic network and actor network.
To achieve the above object, a first aspect of an embodiment of the present application provides an interactive item recommendation method, including:
acquiring user embedded features and item embedded features from randomly extracted sample batches;
passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector;
determining a policy function in an actor network according to an action value function obtained through a critic network and an entropy term regulated by a self-updating temperature coefficient;
obtaining a current state according to the collaborative feature vector;
and obtaining a current action according to the current state and the policy function, and converting the current action into an item recommendation list.
In a possible implementation manner of the first aspect, the training procedure of the GMF network is:
acquiring real user preference from the history record;
obtaining a plurality of user embedded features and a plurality of item embedded features through sampling;
in the GMF layer, performing feature crossing between each user embedded feature and the corresponding item embedded feature to obtain a plurality of cross features;
fitting the plurality of cross features by randomly selecting and deactivating a portion of neurons in a dropout layer;
and in the prediction layer, each time a user predicted preference is obtained from one cross feature, updating the prediction function according to the difference between the user predicted preference and the user's real preference.
In a possible implementation manner of the first aspect, the obtaining of the user predicted preference from one cross feature specifically includes:
constructing a prediction function in the form of a fully connected layer;
and obtaining the user predicted preference according to the prediction function.
In a possible implementation manner of the first aspect, the updating of the prediction function according to the difference between the user predicted preference and the user's real preference specifically includes:
updating the gradient of the loss function by comparing the difference between the user predicted preference and the user's real preference;
and updating the parameters of the prediction function and the dropout layer according to the loss function.
In a possible implementation manner of the first aspect, the obtaining a current action according to the current state and the policy function specifically includes:
determining the set of selectable items in the current state according to the current state;
determining the value of the mask vector according to the recommendation record of each item;
and obtaining the current action according to the set of selectable items in the current state, the mask vector and the policy function.
In a possible implementation manner of the first aspect, the converting the current action into an item recommendation list specifically includes:
acquiring recommendation probability corresponding to each item;
arranging all items in descending order according to the recommendation probability;
and forming the item recommendation list from a preset number of top-ranked items.
In a possible implementation manner of the first aspect, after the obtaining the current action and converting the current action into the item recommendation list, the method further includes:
obtaining a current reward through real-time interaction with a user environment;
obtaining a historical reward through historical interactions with the user's environment;
obtaining a target state according to the historical reward and the current reward, and storing the current state, the current action, the current reward and the target state as a quadruple in a prioritized experience replay pool;
sampling quadruples from the prioritized experience replay pool, inputting the sampling result into the critic network, and obtaining an action value term from the action value function in the critic network;
and updating the policy function according to the action value term and an entropy update term obtained through the alpha function.
In a possible implementation manner of the first aspect, after the updating of the policy function, the method further includes:
obtaining a policy function evaluation term according to the action value function and the policy function;
updating the action value function and the alpha function according to the policy function evaluation term and the real action value term; the real action value term is obtained from the current reward and a target state value; the target state value comprises a target action value term discounted by an attenuation factor and an entropy term obtained through the alpha function; and the target action value term is obtained by evaluating the target state with the policy function.
A second aspect of an embodiment of the present application provides an interactive item recommendation apparatus, including:
the random acquisition module is used for acquiring user embedded features and item embedded features from randomly extracted sample batches;
the vector acquisition module is used for passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector;
the function determination module is used for determining a policy function in the actor network according to the action value function obtained through the critic network and the entropy term regulated by the self-updating temperature coefficient;
the state acquisition module is used for obtaining the current state according to the collaborative feature vector;
and the item recommendation module is used for obtaining the current action according to the current state and the policy function and converting the current action into an item recommendation list.
A third aspect of embodiments of the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements an interactive item recommendation method as described above.
Compared with the prior art, in the interactive item recommendation method, apparatus and computer readable storage medium provided by the embodiments of the application, the recommendation process is completed by the GMF network, the critic network, the actor network, the simulated user environment and the prioritized experience replay pool. Cross features between users and items are trained through the GMF network, enhancing the recommendation agent's utilization of similarity information between users and items; the critic network and the actor network are continuously updated through the alpha function containing the self-updating temperature coefficient α, reducing the influence of bias terms on the recommendation result.
Because the GMF network learns users whose preferences change over time, updated user embedded features are obtained to adapt the GMF network to these changes; when the critic and actor networks are updated, the current reward obtained from real-time interaction with the user serves as the update basis, yielding new critic and actor networks that adapt to the user's changes.
Drawings
FIG. 1 is a flow chart of an interactive item recommendation method according to an embodiment of the application;
FIG. 2 is a schematic diagram of the updating of the GMF network, the actor network and the critic network according to an embodiment of the present application;
fig. 3 is a schematic diagram of a GMF network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a user environment according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, an embodiment of the present application provides an interactive item recommendation method, which includes:
S10, acquiring user embedded features and item embedded features from randomly extracted sample batches;
S11, passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector;
S12, determining a policy function in the actor network according to an action value function obtained through the critic network and an entropy term regulated by the self-updating temperature coefficient;
S13, obtaining a current state according to the collaborative feature vector;
S14, obtaining a current action according to the current state and the policy function, and converting the current action into an item recommendation list.
This embodiment adopts the Generalized Matrix Factorization (GMF) algorithm in S10-S11 to train the cross features between users and items, thereby enhancing the recommendation agent's utilization of similarity information between users and items. Meanwhile, S13-S14 adopt a critic network and an actor network to realize Soft Actor-Critic (SAC) algorithm modeling. As shown in fig. 2, the combination of the GMF network, the simulated user environment, the prioritized experience replay pool, the critic network and the actor network can be regarded as a recommendation agent; the recommendation agent performs item recommendation so as to reduce the influence of bias on the system, and policy optimization finally improves the recommendation agent's ability to predict user preferences. The training process of the recommendation agent mainly comprises five parts: the GMF network, the actor network and the critic network based on the SAC algorithm, the simulated user environment, and the prioritized experience replay technique.
In S10, the user embedded features and item embedded features are obtained from randomly extracted batches. Each batch contains a certain number of user-item pairs (i.e., a given user's preference for a given item): one part consists of positive sample pairs formed by a user and items they like, and the other part of negative sample pairs formed by a user and items they dislike.
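By way of illustration only, the following is a minimal Python sketch of how such a batch might be assembled; the function name, the 1:1 negative-sampling ratio and the batch size are assumptions for illustration, not details taken from the patent.

```python
import random

def sample_batch(positive_pairs, num_items, batch_size=256, neg_ratio=1):
    """Hypothetical sketch: randomly extract a batch of user-item pairs.
    Positive pairs (user, liked item, label 1) come from the history;
    for each one, `neg_ratio` negative pairs (user, non-interacted item,
    label 0) are drawn at random. Ratio and size are assumed values."""
    pos = random.sample(positive_pairs, batch_size // (1 + neg_ratio))
    seen = set(positive_pairs)
    batch = [(u, i, 1.0) for u, i in pos]
    for u, _ in pos:
        for _ in range(neg_ratio):
            j = random.randrange(num_items)
            while (u, j) in seen:  # resample until a non-interacted item is found
                j = random.randrange(num_items)
            batch.append((u, j, 0.0))
    return batch
```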
It should be noted that a feature/feature vector in the present application refers to a vector that can represent certain characteristics of a thing. User features and item features are used to characterize users and items, respectively, where an item may be a movie on a video website, on-demand music on a music website, a specific commodity on a shopping website, a local service on a local-services platform, and so on. Since all such websites/platforms provide recommendation services to users, "item" is a general term for various content services; for convenience of illustrating the principles, the application to movie items is taken as the example.
The GMF network architecture is shown in fig. 3. The GMF network predicts user preferences by analysing the embedded features of users and items, and trains its network parameters by comparing the predicted preferences with the real preferences, obtaining embedded vectors that contain user- and item-related features. These embedded vectors are the user features and item features trained through the GMF network; they contain the crossing information of the user and the item, which is also called the cross feature.
The input of the GMF network is the sampled user and item embedded features together with the user's real preference for the item, and the output is the predicted preference of the user for the item. Training updates the gradient of the cross-entropy loss by comparing the predicted preference with the real preference, finally obtaining embedded vectors containing user- and item-related features.
Illustratively, the training procedure of the GMF network is:
acquiring real user preference from the history record;
obtaining a plurality of user embedded features and a plurality of item embedded features through sampling;
in the GMF layer, performing feature crossing between each user embedded feature and the corresponding item embedded feature to obtain a plurality of cross features;
fitting the plurality of cross features by randomly selecting and deactivating a portion of neurons in a dropout layer;
and in the prediction layer, each time a user predicted preference is obtained from one cross feature, updating the prediction function according to the difference between the user predicted preference and the user's real preference.
The GMF network obtains the most direct representation of the user's movie preferences from the history, i.e., it represents user preferences through an implicit feedback matrix:

$$y_{u,i}=\begin{cases}1, & \text{the interaction of user } u \text{ with item } i \text{ is consistent with the user's preference}\\ 0, & \text{otherwise}\end{cases}$$

where $U$ and $I$ denote the user set and the item set, and each element $y_{u,i}$ ($u\in U$, $i\in I$) of the matrix indicates the preference of user $u$ for item $i$: 1 is consistent with the user's preference, 0 is not.
The rows and columns of the matrix represent the initialized user features and item features, respectively.
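A minimal sketch of building this implicit feedback matrix from a rating history; the convention that ratings of at least 4 on a 1-5 scale count as consistent with the user's preference is an illustrative assumption, not stated above:

```python
import numpy as np

def build_implicit_feedback(ratings, num_users, num_items, threshold=4.0):
    """y[u, i] = 1 if user u's recorded rating of item i meets the
    threshold, else 0. The 4.0 threshold is an assumed convention."""
    y = np.zeros((num_users, num_items), dtype=np.float32)
    for user, item, rating in ratings:  # history records (u, i, rating)
        if rating >= threshold:
            y[user, item] = 1.0
    return y

# e.g. three history records (user_id, item_id, rating)
y = build_implicit_feedback([(0, 2, 5.0), (0, 3, 2.0), (1, 2, 4.0)], 2, 4)
```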
The feature-crossing operation of the GMF network is:

$$g_{u,i} = d\left(e_u \odot e_i\right)$$

where $d$ denotes the dropout layer, which reduces the influence of overfitting on the algorithm by randomly selecting and deactivating a portion of neurons; $e_u$ and $e_i$ denote the user embedding and the item embedding, respectively; and the element-wise product yields the user-item cross feature $g_{u,i}$.
Illustratively, the obtaining of the user predicted preference from one cross feature specifically includes:
constructing a prediction function in the form of a fully connected layer;
and obtaining the user predicted preference according to the prediction function.
The prediction layer of the GMF network usually takes the form of a fully connected layer. Let $g = [g_1, g_2, \dots, g_M]$ and $p = [p_1, p_2, \dots, p_N]$ denote its input and output, respectively; the fully connected layer is

$$p = \sigma\left(W_p\, g + b_p\right)$$

where $\sigma$ denotes the sigmoid activation function, $W_p$ the connection weight matrix and $b_p$ the bias term.
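Putting the GMF layer, dropout layer and prediction layer together, a minimal PyTorch sketch of the network described above might look as follows; the embedding size, dropout rate and the scalar-preference output are assumptions for illustration:

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    """Sketch of the GMF network: embeddings -> element-wise product
    (feature crossing) -> dropout -> fully connected layer -> sigmoid.
    dim=32 and p_drop=0.2 are assumed values."""
    def __init__(self, num_users, num_items, dim=32, p_drop=0.2):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)  # e_u
        self.item_emb = nn.Embedding(num_items, dim)  # e_i
        self.dropout = nn.Dropout(p_drop)             # dropout layer d
        self.predict = nn.Linear(dim, 1)              # W_p, b_p

    def forward(self, users, items):
        g = self.user_emb(users) * self.item_emb(items)  # g_{u,i} = e_u ⊙ e_i
        g = self.dropout(g)
        return torch.sigmoid(self.predict(g)).squeeze(-1)  # p = σ(W_p g + b_p)
```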
Illustratively, the updating of the prediction function according to the difference between the user predicted preference and the user's real preference specifically includes:
updating the gradient of the loss function by comparing the difference between the user predicted preference and the user's real preference;
and updating the parameters of the prediction function and the dropout layer according to the loss function.
The prediction of the user's item preference is obtained through the fully connected layer, the loss function is obtained by comparison with the real user preference, and the GMF model is trained by gradient updates of the loss function. The loss function is:

$$l_n = -w_n\left(t_n \log p_n + (1 - t_n)\log(1 - p_n)\right)$$

where $n \in N$ indexes the samples of the batch, $l_n$ denotes the cross-entropy loss, $w_n$ the loss-function weight, and $p_n$ and $t_n$ the predicted and real values, respectively.
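A hedged sketch of one training step against this loss, using PyTorch's built-in weighted binary cross-entropy as the counterpart of $l_n$; uniform weights are a reasonable default when no $w_n$ is specified:

```python
import torch
import torch.nn.functional as F

def gmf_train_step(model, optimizer, users, items, targets, weights):
    """One gradient update of the GMF network. `targets` are the real
    preferences t_n, `weights` play the role of w_n."""
    preds = model(users, items)  # p_n
    loss = F.binary_cross_entropy(preds, targets, weight=weights)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```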
The user embedded features trained through the GMF network are concatenated with the item embedded features to form user-item collaborative features, which are stored in a temporary file; this file is later used for the state embedding in the subsequent recommendation-agent training part (the actor network).
The SAC algorithm was originally a deep reinforcement learning algorithm for robot skill learning. The main elements of the algorithm include states, actions, rewards and policies.
The application uses a SAC algorithm with an alpha function (containing the self-updating temperature coefficient α) in the recommendation agent. The main flow by which the recommendation agent generates recommendations and learns user preferences is as follows:
(1) Concatenate the user embedded feature vector and the item embedded feature vector trained by the GMF network to obtain the current state, and let the policy function in the actor network select the current action;
(2) Convert the current action into a recommendation list, and obtain the current reward through interaction with the user environment;
(3) Obtain a target state from the historical rewards and the current reward, and store the current state, current action, current reward and target state as a quadruple in the prioritized experience replay pool;
(4) Sample quadruples from the prioritized experience replay pool, and obtain the Q value (action value) through the Q function (action value function) in the critic network;
(5) Update the policy function in the actor network and the Q function and alpha function in the critic network.
Illustratively, the obtaining a current action according to the current state and the policy function specifically includes:
determining the set of selectable items in the current state according to the current state;
determining the value of the mask vector according to the recommendation record of each item;
and obtaining the current action according to the set of selectable items in the current state, the mask vector and the policy function.
The current action is obtained as:

$$a_{u,t}=\begin{cases}\arg\max_{a\in A_{u,t}}\big(\pi(a\mid s_{u,t-1})\cdot m_{u,t}^{i}\big), & t>\epsilon\\ \text{random choice from } A_{u,t}, & t\le\epsilon\end{cases}$$

where $A_{u,t}$ denotes the set of selectable items in the current state, and $a_{u,t}$ denotes the action by which the current policy function $\pi(a\mid s_{u,t-1})$ and the mask vector $m_{u,t}^{i}$ select the current recommended content for user $u$ from the current state. When $t>\epsilon$, the action selects the highest-probability content among the selectable actions; when $t\le\epsilon$, content is selected at random. For the mask vector $m_{u,t}^{i}$, a value of 0 indicates that item $i$ has already been recommended to user $u$, and a value of 1 indicates it has not. The subscript $t-1$ denotes the value each variable carries into the current training round, e.g., the value from round 1 serves as the current value in round 2.
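A minimal sketch of this masked action selection; the tensor shapes and the reading of ε as an exploration-step threshold follow the description above, and the variable names are ours:

```python
import torch

def select_action(policy_probs, mask, t, eps):
    """policy_probs: pi(a|s) over all items (1-D tensor); mask: 1.0 for
    items not yet recommended to the user, 0.0 otherwise. For t <= eps
    a random selectable item is drawn (exploration); for t > eps the
    highest-probability selectable item is taken."""
    if t <= eps:
        candidates = torch.nonzero(mask, as_tuple=False).squeeze(-1)
        return candidates[torch.randint(len(candidates), (1,))].item()
    return torch.argmax(policy_probs * mask).item()  # mask zeroes recommended items
```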
Illustratively, the converting the current action into the item recommendation list specifically includes:
acquiring recommendation probability corresponding to each item;
arranging all items in descending order according to the recommendation probability;
and forming the item recommendation list from a preset number of top-ranked items.
In this embodiment, the conversion criterion is each item's recommendation probability. For example, in top10 recommendation, the current action obtained by the agent assigns a recommendation probability to each item in the movie item list, and the recommendation list is generated from the 10 items with the highest recommendation probabilities.
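Converting the action's per-item probabilities into a top-k list is a straightforward descending sort; a minimal sketch:

```python
import torch

def to_recommendation_list(probs, k=10):
    """Keep the k items with the highest recommendation probability,
    in descending order (k=10 matches the top10 example)."""
    topk = torch.topk(probs, k=min(k, probs.numel()))
    return topk.indices.tolist()  # item ids, highest probability first
```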
Illustratively, after the obtaining the current action and converting the current action into the item recommendation list, the method further comprises:
obtaining a current reward through real-time interaction with a user environment;
obtaining a historical reward through historical interactions with the user's environment;
obtaining a target state according to the historical reward and the current reward, and storing the current state, the current action, the current reward and the target state as a quadruple in a prioritized experience replay pool (a simplified sketch of such a pool follows this list);
sampling quadruples from the prioritized experience replay pool, inputting the sampling result into the critic network, and obtaining an action value term from the action value function in the critic network;
and updating the policy function according to the action value term and an entropy update term obtained through the alpha function.
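The following is a simplified sketch of such a replay pool; the patent's embodiment uses a binary-tree (sum-tree) structure for priorities, while this illustrative version samples proportionally to priorities from a flat buffer, which is simpler but slower:

```python
import random
from collections import deque

class ReplayPool:
    """Stores (state, action, reward, target_state) quadruples with a
    priority each; sampling is proportional to priority. A flat deque is
    an assumed simplification of the binary-tree structure."""
    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def add(self, quadruple, priority=1.0):
        self.data.append(quadruple)
        self.priorities.append(priority)

    def sample(self, batch_size=32):
        k = min(batch_size, len(self.data))
        return random.choices(self.data, weights=self.priorities, k=k)
```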
The policy function $\pi$ in the actor network is updated by minimizing

$$J_{\pi}=\mathbb{E}\big[\alpha\log\pi(a'_{u,t}\mid s_{u,t})-Q_{\omega}(s_{u,t},a'_{u,t})\big]$$

where $\alpha\log\pi(a'_{u,t}\mid s_{u,t})$ denotes the entropy term adjusted by the self-updating temperature coefficient $\alpha$ (the alpha function), $Q_{\omega}(s_{u,t},a'_{u,t})$ denotes the reward term evaluated with the action value function of the critic part, and $a'_{u,t}$ ranges over all possible actions predicted by the current policy function.
Illustratively, after the updating of the policy function, the method further includes:
obtaining a policy function evaluation term according to the action value function and the policy function;
updating the action value function and the alpha function according to the policy function evaluation term and the real action value term; the real action value term is obtained from the current reward and a target state value; the target state value comprises a target action value term discounted by an attenuation factor and an entropy term obtained through the alpha function; and the target action value term is obtained by evaluating the target state with the policy function.
The current Q function in the critic network is updated by minimizing

$$J_{Q}(\omega)=\mathbb{E}\big[\big(Q_{\omega}(s_{u,t},a_{u,t})-\hat{y}\big)^{2}\big]$$

This contains two Q-value terms. $Q_{\omega}(s_{u,t},a_{u,t})$ denotes the predicted action value term, whose main function is to participate directly in the evaluation of the actor's policy function, thereby guiding the optimization and updating of the policy network. $\hat{y}$ denotes the true action value term based on the reward $r(s_{u,t},a_{u,t})$ and the target state value:

$$\hat{y}=r(s_{u,t},a_{u,t})+\gamma\big(Q_{\bar{\omega}}(s_{u,t+1},a_{u,t+1})-\alpha\log\pi(a_{u,t+1}\mid s_{u,t+1})\big)$$

where the target state value is the target action value term $Q_{\bar{\omega}}$, discounted by the attenuation factor $\gamma$, minus the entropy term containing the self-updating temperature coefficient $\alpha$; the target action $a_{u,t+1}$ is obtained by evaluating the target state with the policy function. The parameters of the target action value function follow a flexible update based on the smoothing factor $\tau$:

$$\bar{\omega}\leftarrow\tau\omega+(1-\tau)\bar{\omega}$$
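A sketch of the corresponding critic update and the flexible (Polyak) target update; the discount γ = 0.99 and τ = 0.005 are assumed values, and `actor` is a hypothetical callable returning π(·|s):

```python
import torch
import torch.nn.functional as F

def critic_loss(q_net, target_q_net, actor, batch, log_alpha, gamma=0.99):
    """MSE between the predicted Q-value and the true action value term
    y = r + gamma * E_{a'~pi}[Q_bar(s', a') - alpha * log pi(a'|s')]."""
    states, actions, rewards, next_states = batch
    alpha = log_alpha.exp().detach()
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_probs = actor(next_states)               # pi(.|s')
        next_log_probs = torch.log(next_probs + 1e-8)
        target_q = target_q_net(next_states)          # Q_bar(s', .)
        v_next = (next_probs * (target_q - alpha * next_log_probs)).sum(dim=1)
        y = rewards + gamma * v_next                  # true action value term
    return F.mse_loss(q_pred, y)

def soft_update(target_net, net, tau=0.005):
    """Flexible update of the target critic with smoothing factor tau."""
    for tp, p in zip(target_net.parameters(), net.parameters()):
        tp.data.mul_(1 - tau).add_(tau * p.data)
```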
the self-updating temperature control factor α in the reviewer network is denoted as α network:
in the formulaIs the minimum entropy constant, i.e. the opposite number of motion space dimensions.
The above embodiments also use a user-environment simulation technique and the prioritized experience replay technique. Referring to fig. 4, when simulating the user environment, the embodiments use an offline data set to simulate user behavior, judging whether the user likes an item according to the user's specific rating of that item in the history. For historical experience storage and extraction, the embodiments use a binary-tree-based data structure that can store experience priorities; this part is not the focus of the embodiments and is not developed in detail.
Compared with the prior art, in the interactive item recommendation method provided by the embodiments of the application, the recommendation process is completed by the GMF network, the critic network, the actor network, the simulated user environment and the prioritized experience replay pool. Cross features between users and items are trained through the GMF network, enhancing the recommendation agent's utilization of similarity information between users and items; the critic network and the actor network are continuously updated through the alpha function containing the self-updating temperature coefficient α, reducing the influence of bias terms on the recommendation result.
Because the GMF network learns users whose preferences change over time, updated user embedded features are obtained to adapt the GMF network to these changes; when the critic and actor networks are updated, the current reward obtained from real-time interaction with the user serves as the update basis, yielding new critic and actor networks that adapt to the user's changes.
A second aspect of an embodiment of the present application provides an interactive item recommendation apparatus, including: a random acquisition module, a vector acquisition module, a function determination module, a state acquisition module and an item recommendation module.
The random acquisition module is used for acquiring user embedded features and item embedded features from randomly extracted sample batches.
The vector acquisition module is used for passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector.
The function determination module is used for determining a policy function in the actor network according to the action value function obtained through the critic network and the entropy term regulated by the self-updating temperature coefficient.
The state acquisition module is used for obtaining the current state according to the collaborative feature vector.
And the item recommendation module is used for obtaining the current action according to the current state and the policy function and converting the current action into an item recommendation list.
It will be clear to those skilled in the art that for convenience and brevity of description, reference may be made to the corresponding procedure in the foregoing method embodiments for the specific working procedure of the above-described system, which is not further described herein.
Compared with the prior art, in the interactive item recommendation apparatus provided by the embodiments of the application, the recommendation process is completed by the GMF network, the critic network, the actor network, the simulated user environment and the prioritized experience replay pool. Cross features between users and items are trained through the GMF network, enhancing the recommendation agent's utilization of similarity information between users and items; the critic network and the actor network are continuously updated through the alpha function containing the self-updating temperature coefficient α, reducing the influence of bias terms on the recommendation result.
Because the GMF network learns users whose preferences change over time, updated user embedded features are obtained to adapt the GMF network to these changes; when the critic and actor networks are updated, the current reward obtained from real-time interaction with the user serves as the update basis, yielding new critic and actor networks that adapt to the user's changes.
An embodiment of the application provides a computer device. The computer device of this embodiment includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the method embodiments described above when the computer program is executed.
The computer device can be a smart phone, a tablet computer, a desktop computer, a cloud server and other computing devices. The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the figures are merely examples of computer devices and are not limiting of computer devices, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input and output devices, network access devices, etc.
The processor may be a central processing unit (Central Processing Unit, CPU), it may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may in some embodiments be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. The memory may in other embodiments also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device. Further, the memory may also include both internal storage units and external storage devices of the computer device. The memory is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, etc., such as program code for the computer program, etc. The memory may also be used to temporarily store data that has been output or is to be output.
In addition, the embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the steps in any of the above-mentioned method embodiments.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements an interactive item recommendation method as described above.
Embodiments of the present application provide a computer program product which, when run on a computer device, causes the computer device to perform the steps of the method embodiments described above.
In several embodiments provided by the present application, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the application, such changes and modifications are also intended to be within the scope of the application.

Claims (10)

1. An interactive item recommendation method, comprising:
acquiring user embedded features and item embedded features from randomly extracted sample batches;
passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector;
determining a policy function in an actor network according to an action value function obtained through a critic network and an entropy term regulated by a self-updating temperature coefficient;
obtaining a current state according to the collaborative feature vector;
and obtaining a current action according to the current state and the policy function, and converting the current action into an item recommendation list.
2. The interactive item recommendation method of claim 1, wherein the training process of the GMF network is:
acquiring real user preference from the history record;
obtaining a plurality of user embedded features and a plurality of item embedded features through sampling;
in the GMF layer, performing feature crossing between each user embedded feature and the corresponding item embedded feature to obtain a plurality of cross features;
fitting the plurality of cross features by randomly selecting and deactivating a portion of neurons in a dropout layer;
and in the prediction layer, each time a user predicted preference is obtained from one cross feature, updating the prediction function according to the difference between the user predicted preference and the user's real preference.
3. The interactive item recommendation method according to claim 2, wherein said obtaining a user predicted preference from one cross feature specifically comprises:
constructing a prediction function by adopting a full connection layer mode;
and obtaining user prediction preference according to the prediction function.
4. The interactive item recommendation method according to claim 2, wherein said updating the prediction function according to the difference between said user predicted preference and said user's real preference specifically comprises:
updating the gradient of the loss function by comparing the difference between the user predicted preference and the user's real preference;
and updating the parameters of the prediction function and the dropout layer according to the loss function.
5. The interactive item recommendation method of claim 1, wherein said obtaining a current action based on said current state and said policy function comprises:
determining the set of selectable items in the current state according to the current state;
determining the value of the mask vector according to the recommendation record of each item;
and obtaining the current action according to the set of selectable items in the current state, the mask vector and the policy function.
6. The interactive item recommendation method according to claim 1, wherein said converting the current action into an item recommendation list specifically comprises:
acquiring the recommendation probability corresponding to each item;
arranging all items in descending order according to the recommendation probability;
and forming the item recommendation list from a preset number of top-ranked items.
7. The interactive item recommendation method of claim 1, wherein after obtaining a current action and converting the current action into an item recommendation list, further comprising:
obtaining a current reward through real-time interaction with a user environment;
obtaining a historical reward through historical interactions with the user's environment;
obtaining a target state according to the historical reward and the current reward, and storing the current state, the current action, the current reward and the target state as a quadruple in a prioritized experience replay pool;
sampling quadruples from the prioritized experience replay pool, inputting the sampling result into the critic network, and obtaining an action value term from the action value function in the critic network;
and updating the policy function according to the action value term and an entropy update term obtained through the alpha function.
8. The interactive item recommendation method of claim 7, wherein after said updating said policy function, further comprising:
obtaining a policy function evaluation term according to the action value function and the policy function;
updating the action value function and the alpha function according to the policy function evaluation term and the real action value term; the real action value term is obtained from the current reward and a target state value; the target state value comprises a target action value term discounted by an attenuation factor and an entropy term obtained through the alpha function; and the target action value term is obtained by evaluating the target state with the policy function.
9. An interactive item recommendation device, comprising:
the random acquisition module is used for acquiring user embedded features and item embedded features from randomly extracted sample batches;
the vector acquisition module is used for passing the user embedded features and the item embedded features through a trained GMF network to obtain a collaborative feature vector formed by concatenating the user embedded feature vector and the item embedded feature vector;
the function determination module is used for determining a policy function in the actor network according to the action value function obtained through the critic network and the entropy term regulated by the self-updating temperature coefficient;
the state acquisition module is used for obtaining the current state according to the collaborative feature vector;
and the item recommendation module is used for obtaining the current action according to the current state and the policy function and converting the current action into an item recommendation list.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the interactive item recommendation method according to any one of claims 1 to 8.
CN202310966515.6A 2023-08-02 2023-08-02 Interactive project recommendation method and device and computer readable storage medium Pending CN117056595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310966515.6A CN117056595A (en) 2023-08-02 2023-08-02 Interactive project recommendation method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310966515.6A CN117056595A (en) 2023-08-02 2023-08-02 Interactive project recommendation method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117056595A true CN117056595A (en) 2023-11-14

Family

ID=88668447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310966515.6A Pending CN117056595A (en) 2023-08-02 2023-08-02 Interactive project recommendation method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117056595A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779415A (en) * 2021-10-22 2021-12-10 平安科技(深圳)有限公司 Training method, device and equipment of news recommendation model and storage medium


Similar Documents

Publication Publication Date Title
US20210256403A1 (en) Recommendation method and apparatus
Chen et al. Deep reinforcement learning in recommender systems: A survey and new perspectives
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
US20230153857A1 (en) Recommendation model training method, recommendation method, apparatus, and computer-readable medium
CN111090756B (en) Artificial intelligence-based multi-target recommendation model training method and device
CN113254792B (en) Method for training recommendation probability prediction model, recommendation probability prediction method and device
CN110717099B (en) Method and terminal for recommending film
US20230162005A1 (en) Neural network distillation method and apparatus
WO2022166115A1 (en) Recommendation system with adaptive thresholds for neighborhood selection
CN109903103B (en) Method and device for recommending articles
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN116010684A (en) Article recommendation method, device and storage medium
CN111931054B (en) Sequence recommendation method and system based on improved residual error structure
CN117056595A (en) Interactive project recommendation method and device and computer readable storage medium
WO2022166125A1 (en) Recommendation system with adaptive weighted baysian personalized ranking loss
US20240037133A1 (en) Method and apparatus for recommending cold start object, computer device, and storage medium
WO2024051707A1 (en) Recommendation model training method and apparatus, and resource recommendation method and apparatus
CN110489435B (en) Data processing method and device based on artificial intelligence and electronic equipment
CN113449176A (en) Recommendation method and device based on knowledge graph
CN115905872A (en) Model training method, information recommendation method, device, equipment and medium
CN114493674A (en) Advertisement click rate prediction model and method
CN111897943A (en) Session record searching method and device, electronic equipment and storage medium
KR102612986B1 (en) Online recomending system, method and apparatus for updating recommender based on meta-leaining
CN117786234B (en) Multimode resource recommendation method based on two-stage comparison learning
CN111931058B (en) Sequence recommendation method and system based on self-adaptive network depth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination