CN111414539B - Recommendation system neural network training method and device based on feature enhancement - Google Patents


Info

Publication number
CN111414539B
Authority
CN
China
Prior art keywords
characteristic information
attribute
samples
neural network
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010197501.9A
Other languages
Chinese (zh)
Other versions
CN111414539A (en)
Inventor
施韶韵
张敏
郝斌
李大任
张瑞
于新星
单厚智
刘奕群
马少平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Zhizhe Sihai Beijing Technology Co Ltd
Original Assignee
Tsinghua University
Zhizhe Sihai Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Zhizhe Sihai Beijing Technology Co Ltd filed Critical Tsinghua University
Priority to CN202010197501.9A priority Critical patent/CN111414539B/en
Publication of CN111414539A publication Critical patent/CN111414539A/en
Application granted granted Critical
Publication of CN111414539B publication Critical patent/CN111414539B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/95: Retrieval from the web
    • G06F 16/953: Querying, e.g. by the use of web search engines
    • G06F 16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43: Querying
    • G06F 16/435: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/06: Buying, selling or leasing transactions
    • G06Q 30/0601: Electronic shopping [e-shopping]
    • G06Q 30/0631: Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a feature-enhancement-based method and device for training a recommendation system neural network, the method comprising: inputting a plurality of first samples in a first training set into the neural network to be trained in the t-th round, to obtain prediction scores corresponding to the first samples; determining the neural network's degree of attention to each attribute according to the feature information of the first samples and their corresponding prediction scores; determining the enhancement probability of each attribute according to an attention threshold and the degree of attention to each attribute; determining the feature information to be updated, from the feature information of the first samples, according to a first enhancement rate and the enhancement probabilities; updating first samples in the first training set according to the feature information to be updated and a noise feature value, to obtain an updated second training set; and performing the t-th round of training of the neural network according to the second training set. Embodiments of the present disclosure may improve the robustness of the neural network.

Description

Recommendation system neural network training method and device based on feature enhancement
Technical Field
The disclosure relates to the field of machine learning, in particular to a recommendation system neural network training method and device based on feature enhancement.
Background
Deep learning is a branch of machine learning in which data is analyzed and modeled with deep neural networks to discover the relationships between input features and prediction targets. Deep learning has achieved significant results in many areas, such as computer vision, computational linguistics, and information retrieval.
The design of deep neural networks typically focuses on the network architecture, feature representation, and so on. During training, a deep neural network is prone to overfitting, over-relying on some features while neglecting others. For example, if gender, age, weight, height, body measurements, shoulder width, and similar characteristics are taken as inputs when predicting a person's degree of obesity, a network trained for a long time without constraint is likely to overfit, attending only to gender and weight while under-utilizing the other, more indirect features. Moreover, during use of the network some features may be noisy, e.g., the recorded weight may be inaccurate, and excessive reliance on such noisy features degrades the accuracy of the predictions.
Disclosure of Invention
In view of this, the disclosure provides a recommendation system neural network training method and device based on feature enhancement.
According to an aspect of the present disclosure, there is provided a feature-enhancement-based recommendation system neural network training method, the method comprising:
inputting a plurality of first samples in a preset first training set into a neural network to be trained in a t-th round for processing, to obtain prediction scores corresponding to the plurality of first samples, where t is a positive integer and each first sample comprises feature information representing user attributes and feature information representing object attributes of an object to be recommended;
determining the degree of attention of the neural network to each attribute according to the feature information of the plurality of first samples and the prediction scores corresponding to them;
determining the enhancement probability of each attribute according to a preset attention threshold and the degree of attention of the neural network to each attribute;
determining the feature information to be updated, from the feature information of the plurality of first samples, according to a first enhancement rate of that feature information and the enhancement probability of each attribute;
updating first samples in the first training set according to the feature information to be updated and a preset noise feature value, to obtain an updated second training set; and
performing the t-th round of training of the neural network according to the second training set,
wherein the neural network is applied to a recommendation system and used for predicting a user's score for an object to be recommended in the recommendation system.
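The steps above can be sketched as a single training round. This is a minimal illustration, not the patent's implementation: the prediction, fitting, and attention functions are supplied by the caller, and the thresholding rule, the 0.5 rescaling ratio, and the noise value are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_round(X, y, predict, fit, attention_fn,
                attn_threshold=0.2, enhance_rate=0.1, noise_value=0.0):
    """One feature-enhancement training round over feature matrix X
    (rows = first samples, columns = attributes)."""
    preds = predict(X)                                   # prediction scores
    attn = attention_fn(X, preds)                        # per-attribute attention
    # attributes below the threshold keep their attention as enhancement
    # probability; others are rescaled (the 0.5 ratio is illustrative)
    probs = np.where(attn < attn_threshold, attn, attn * 0.5)
    probs = probs / probs.sum()
    # choose second samples, then one attribute per sample according to `probs`
    n_enhance = int(enhance_rate * X.shape[0])
    rows = rng.choice(X.shape[0], size=n_enhance, replace=False)
    cols = rng.choice(X.shape[1], size=n_enhance, p=probs)
    X2 = X.copy()
    X2[rows, cols] = noise_value                         # second training set
    fit(X2, y)                                           # train this round
    return X2
```

Calling `train_round` once per round re-estimates the attention each time, so the set of concealed features tracks what the network currently over-relies on.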
In one possible implementation, determining the degree of attention of the neural network to each attribute according to the feature information of the plurality of first samples and the prediction scores corresponding to them includes:
for any first sample in the first training set, determining the first contribution value of each piece of feature information of the first sample to the prediction score, according to the feature information of the first sample and the prediction score corresponding to it;
for any one of a plurality of attributes, determining, from the first contribution values of the feature information of each first sample, the second contribution values of the feature information corresponding to that attribute; and
determining the average of the second contribution values as the neural network's degree of attention to the attribute.
In one possible implementation, determining the feature information to be updated from the feature information of the plurality of first samples, according to a preset first enhancement rate of that feature information and the enhancement probability of each attribute, includes:
determining the enhancement quantity of the feature information of the plurality of first samples according to the preset first enhancement rate;
randomly selecting, from the plurality of first samples of the first training set, a number of second samples equal to the enhancement quantity; and
for any second sample, randomly selecting one attribute from the plurality of attributes according to the enhancement probabilities of the attributes, and determining the feature information corresponding to the selected attribute in that second sample as feature information to be updated.
In one possible implementation, the method further includes:
determining a second enhancement rate of the characteristic information of the first samples during the t-th training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset enhancement rate change value of each round;
and determining a first enhancement rate of the characteristic information of the first samples in the t-th training according to the maximum enhancement rate and the second enhancement rate.
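One plausible reading of this two-rate schedule, with invented parameter values: the second enhancement rate grows linearly with the round index t, and the first enhancement rate is the second rate capped at the maximum. The patent names only the inputs, not the exact formula.

```python
def enhancement_rates(t, initial_rate=0.05, max_rate=0.5, step=0.01):
    """Sketch of the per-round enhancement-rate schedule (hedged: the
    linear growth and the cap are assumptions, parameter values invented)."""
    second_rate = initial_rate + t * step    # per-round change accumulates
    first_rate = min(second_rate, max_rate)  # never exceed the maximum rate
    return first_rate, second_rate
```

With these values the first enhancement rate rises from 5% and saturates at 50% after 45 rounds, gradually making the enhanced training set harder.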
In one possible implementation, determining the enhancement probability of each attribute according to a preset attention threshold and the neural network's degree of attention to each attribute includes:
for any attribute, when the neural network's degree of attention to the attribute is smaller than the preset attention threshold, determining that degree of attention as the enhancement probability of the attribute.
In one possible implementation, determining the enhancement probability of each attribute according to a preset attention threshold and the neural network's degree of attention to each attribute further includes:
for any attribute, when the neural network's degree of attention to the attribute is greater than or equal to the preset attention threshold, determining the product of that degree of attention and a preset adjustment proportion as the enhancement probability of the attribute.
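The two cases can be combined in one function. A sketch: the threshold and the adjustment proportion are hypothetical values, and whether the proportion damps or amplifies over-attended attributes is not fixed by the text; here it amplifies them, matching the stated goal of discouraging over-reliance on dominant features.

```python
import numpy as np

def enhancement_probabilities(attention, threshold=0.2, adjust_ratio=2.0):
    """Per-attribute enhancement probability: attention below the threshold
    is used as-is; attention at or above it is multiplied by the preset
    adjustment proportion. The result is normalised so it can serve as a
    sampling distribution over attributes."""
    attention = np.asarray(attention, dtype=float)
    probs = np.where(attention < threshold,
                     attention,            # under-attended: keep as-is
                     attention * adjust_ratio)  # over-attended: rescaled
    return probs / probs.sum()
```

With `adjust_ratio > 1`, the attributes the network leans on hardest become the most likely to be concealed in the next round.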
In one possible implementation, the neural network includes an input layer, N levels of intermediate layers, and an output layer, where the input layer receives the feature information of each first sample, the output layer outputs the prediction score corresponding to each first sample, the N levels of intermediate layers output N levels of intermediate feature information during processing, and N is a positive integer. Determining the first contribution value of each piece of feature information of the first sample to the prediction score, according to the feature information of the first sample and the prediction score corresponding to it, includes:
determining the contribution value of each piece of level-N intermediate feature information to the prediction score, according to the prediction score corresponding to the first sample;
determining the contribution value of each piece of level-(N-1) intermediate feature information to the prediction score, according to the contribution values of the level-N intermediate feature information, the level-N intermediate feature information, and the level-(N-1) intermediate feature information;
determining the contribution value of each piece of level-(i-1) intermediate feature information to the prediction score, according to the contribution values of the level-i intermediate feature information, the level-i intermediate feature information, and the level-(i-1) intermediate feature information, where i is an integer and 2 ≤ i ≤ N; and
determining the first contribution value of each piece of feature information of the first sample to the prediction score, according to the contribution values of the level-1 intermediate feature information, the level-1 intermediate feature information, and the feature information of the first sample.
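For a network of linear layers the level-by-level recursion above can be written directly. This is an LRP-style sketch using an epsilon stabiliser; the patent does not fix the exact redistribution formula, and nonlinearities and bias handling are omitted.

```python
import numpy as np

def feature_contributions(weights, activations, prediction, eps=1e-9):
    """Back-propagate the prediction score from level N down to level 1,
    yielding a first contribution value per input feature.

    weights:     list of (d_in, d_out) matrices for layers 1..N
    activations: activations[0] is the sample's feature vector,
                 activations[i] the level-i intermediate feature information
    """
    relevance = np.atleast_1d(np.asarray(prediction, dtype=float))
    # walk from the output layer back toward the input layer
    for W, a in zip(reversed(weights), reversed(activations)):
        z = a @ W                    # the layer's pre-activation output
        s = relevance / (z + eps)    # relevance share per output unit
        relevance = a * (s @ W.T)    # redistribute onto the layer's inputs
    return relevance                 # contributions of the input features
```

A useful property of this rule is approximate conservation: the contributions at every level sum (up to the epsilon term) to the prediction score being explained.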
In a possible implementation, the feature information of the plurality of first samples in the first training set is represented by a feature matrix, each row of the feature matrix representing one first sample, and each column of the feature matrix representing one attribute.
According to another aspect of the present disclosure, there is provided a recommendation system neural network training apparatus based on feature enhancement, the apparatus comprising:
a prediction score determining module, configured to input a plurality of first samples in a preset first training set into a neural network to be trained in a t-th round for processing, to obtain prediction scores corresponding to the plurality of first samples, where t is a positive integer and each first sample comprises feature information representing user attributes and feature information representing object attributes of an object to be recommended;
an attention degree determining module, configured to determine the neural network's degree of attention to each attribute according to the feature information of the plurality of first samples and the prediction scores corresponding to them;
an enhancement probability determining module, configured to determine the enhancement probability of each attribute according to a preset attention threshold and the neural network's degree of attention to each attribute;
a to-be-updated feature determining module, configured to determine the feature information to be updated from the feature information of the plurality of first samples, according to a first enhancement rate of that feature information and the enhancement probability of each attribute;
a training set updating module, configured to update first samples in the first training set according to the feature information to be updated and a preset noise feature value, to obtain an updated second training set; and
a training module, configured to perform the t-th round of training of the neural network according to the second training set,
wherein the neural network is applied to a recommendation system and used for predicting a user's score for an object to be recommended in the recommendation system.
In one possible implementation, the apparatus further includes:
the first enhancement rate determining module is used for determining second enhancement rates of the characteristic information of the first samples during the t-th training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset enhancement rate change value of each round;
and the second enhancement rate determining module is used for determining the first enhancement rate of the characteristic information of the first samples in the t-th training according to the maximum enhancement rate and the second enhancement rate.
According to embodiments of the present disclosure, when the method is applied to training a recommendation system neural network, the enhancement probability of each attribute can be determined from the degrees of attention of the neural network to be trained in the current round. A feature-enhanced training set for the current round is then built from these enhancement probabilities and the preset first enhancement rate, and used to train the neural network. In this way, the neural network's attention to different attributes is optimized during training, so that all feature information is used comprehensively during prediction and overfitting to, or over-reliance on, part of the feature information is avoided. At the same time, when some feature information is noisy, the neural network can still make full use of the remaining feature information, which improves both the robustness of the neural network and the accuracy of its predictions.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 illustrates a flow chart of a feature-enhancement-based recommendation system neural network training method, according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of an application scenario of a feature-enhancement-based recommendation system neural network training method, according to an embodiment of the present disclosure.
FIG. 3 illustrates a block diagram of a feature-enhancement-based recommendation system neural network training device, in accordance with an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
The feature-enhancement-based recommendation system neural network training method according to embodiments of the present disclosure may be executed on a processor, which may be a general-purpose processor such as a CPU (Central Processing Unit), or an artificial-intelligence processor (IPU) for performing artificial-intelligence operations, such as a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), or a DSP (Digital Signal Processor). The present disclosure does not limit the specific type of processor.
Feature enhancement, as described in the embodiments of the present disclosure, refers to randomly concealing part of the initial feature information, for example by setting it to an invalid value or a noise value. That is, the feature information after enhancement contains more noisy or invalid features than the initial feature information, so the score corresponding to an enhanced sample is harder to predict than that of the initial sample. Training the neural network on such a feature-enhanced training set can therefore improve its robustness.
In one possible implementation, the recommendation system neural network may be a neural network applied to a recommendation system for predicting a user's score for an object to be recommended in the recommendation system. The recommendation system may include various recommendation systems, such as a movie work recommendation system, a commodity recommendation system, a literature work recommendation system, a shared knowledge recommendation system in a knowledge sharing platform, and the like. The objects to be recommended in the recommendation system may also include various kinds, such as movie works, commodities, literary works, shared knowledge, multimedia materials, documents, and the like. The present disclosure does not limit the specific application scenario of the recommendation system and the specific content of the object to be recommended.
FIG. 1 illustrates a flow chart of a feature-enhancement-based recommendation system neural network training method, according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
step S11: inputting a plurality of first samples in a preset first training set into a neural network to be trained in a t-th round for processing, to obtain prediction scores corresponding to the plurality of first samples, where t is a positive integer and each first sample comprises feature information representing user attributes and feature information representing object attributes of an object to be recommended;
step S12: determining the degree of attention of the neural network to each attribute according to the feature information of the plurality of first samples and the prediction scores corresponding to them;
step S13: determining the enhancement probability of each attribute according to a preset attention threshold and the degree of attention of the neural network to each attribute;
step S14: determining the feature information to be updated from the feature information of the plurality of first samples, according to a first enhancement rate of that feature information and the enhancement probability of each attribute;
step S15: updating first samples in the first training set according to the feature information to be updated and a preset noise feature value, to obtain an updated second training set;
step S16: performing the t-th round of training of the neural network according to the second training set.
According to embodiments of the present disclosure, when the method is applied to training a recommendation system neural network, the enhancement probability of each attribute can be determined from the degrees of attention of the neural network to be trained in the current round. A feature-enhanced training set for the current round is then built from these enhancement probabilities and the preset first enhancement rate, and used to train the neural network. In this way, the neural network's attention to different attributes is optimized during training, so that all feature information is used comprehensively during prediction and overfitting to, or over-reliance on, part of the feature information is avoided. At the same time, when some feature information is noisy, the neural network can still make full use of the remaining feature information, which improves both the robustness of the neural network and the accuracy of its predictions.
In one possible implementation, the first training set may be determined prior to training the neural network. The first training set may include a plurality of first samples and reference scores corresponding to the plurality of first samples. Wherein each first sample may include feature information representing a user attribute and feature information representing an object attribute of the object to be recommended.
In one possible implementation, the user attributes may include user identifier, age, gender, occupation, city, and the like. Different objects to be recommended may have different object attributes. For example, when the object to be recommended is a movie, its object attributes may include the movie identifier, title, director, starring actors, year of release, region of distribution, one or more genres (e.g., science fiction, romance, war), and so on; when the object to be recommended is a commodity, the object attributes may include the commodity identifier, name, production date, manufacturer, price, and so on; when the object to be recommended is a literary work, its object attributes may include the work's identifier, title, author, keywords, and so on; when the object to be recommended is shared knowledge, its object attributes may include the identifier, name, keywords, number of visits, and so on; when the object to be recommended is multimedia data, its object attributes may include the identifier, name, format, keywords, size, and so on; when the object to be recommended is a document, its object attributes may include the document's identifier, name, format, keywords, size, and so on. The user identifier (i.e., the user ID) uniquely identifies the user, and the identifier of an object to be recommended uniquely identifies that object.
In one possible implementation, the object attribute of the object to be recommended may be determined according to a specific application scenario of the neural network. The application scene is different, the object to be recommended may be different, and the object attribute may be different. It should be understood that, those skilled in the art may set the user attribute and the specific content of the object attribute of the object to be recommended according to the specific application scenario of the neural network, which is not limited in this disclosure.
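As a concrete illustration (attribute names, values, and the rating scale are invented for this example, not prescribed by the disclosure), a first sample for a movie recommender might combine the two attribute groups like this:

```python
# User attributes and object attributes of one first sample; the reference
# score is the user's known rating of this movie in the first training set.
user_attributes = {"user_id": 1042, "age": 29, "gender": "F",
                   "occupation": "engineer", "city": "Beijing"}
object_attributes = {"movie_id": 77, "title": "Example Film",
                     "year_of_release": 2019, "genre": "science fiction"}

first_sample = {**user_attributes, **object_attributes}
reference_score = 4.5   # e.g. a rating on a 1-5 scale
```

In the feature-matrix representation described earlier, each such sample becomes one row and each attribute one column.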
In one possible implementation, after the first training set is determined, in step S11 the plurality of first samples in the preset first training set may be input into the neural network to be trained in the t-th round for processing, and the score of the user for the object to be recommended in each first sample is predicted, to obtain the prediction scores corresponding to the plurality of first samples, where t is a positive integer and the t-th round is the current training round of the neural network.
In one possible implementation, in step S12 the degree of attention of the neural network to each attribute may be determined according to the feature information of the plurality of first samples and their corresponding prediction scores. The degree of attention to an attribute represents the correlation between the prediction scores output by the neural network and that attribute. The sum of the degrees of attention over all attributes equals 1, or differs from 1 only within an error margin.
In one possible implementation, the degree of attention of the neural network to each attribute may be determined by layer-wise relevance propagation (LRP), based on the feature information of the plurality of first samples and their corresponding prediction scores. The degree of attention may also be determined by other means, which is not limited by the present disclosure.
In one possible implementation manner, after determining the attention degree of the neural network to each attribute, in step S13, the enhancement probability of each attribute may be determined according to the preset attention degree threshold and the attention degree of the neural network to each attribute. The value range of the attention threshold is more than 0 and less than 1.
In one possible implementation, the attention threshold may be different for different training rounds, that is, the attention threshold may vary according to the variation of the current training round t. For example, the attention threshold may increase with increasing t. The attention threshold during the t-th training may be preset by a person skilled in the art according to the actual situation, which is not limited in this disclosure.
In one possible implementation, the higher the neural network's degree of attention to an attribute, the more important the network considers that attribute to be. When determining the enhancement probabilities, the degrees of attention can first be classified against the preset attention threshold, then adjusted per class according to the training requirements, and the adjusted degree of attention of each attribute is taken as its enhancement probability for the t-th round of training.
In one possible implementation manner, in step S14, the feature information to be updated may be determined from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probabilities of the respective attributes. The value range of the first enhancement rate is greater than 0 and less than 1.
In one possible implementation, the preset first enhancement rate of the feature information of the first samples may represent the proportion of that feature information to be enhanced. The first enhancement rate may differ from training round to training round, that is, the first enhancement rate may vary with the current training round t; for example, the first enhancement rate may increase with increasing t. The first enhancement rate in the t-th round of training can be preset by a person skilled in the art according to the actual situation, and the present disclosure is not limited thereto.
In one possible implementation manner, the enhancement quantity of the feature information of the plurality of first samples may be determined according to a preset first enhancement rate of the feature information of the plurality of first samples, and then the feature information to be updated may be randomly determined from the feature information of the plurality of first samples according to the enhancement quantity and the enhancement probability of each attribute.
In one possible implementation manner, after determining the feature information to be updated, in step S15, the first sample in the first training set may be updated according to the feature information to be updated and the preset noise feature value, so as to obtain the updated second training set.
In one possible implementation, the preset noise characteristic value may be represented as a specific number. For example, the noise characteristic value U may be represented by the numeral 0. Those skilled in the art may preset specific values of the noise characteristic value according to actual situations, which is not limited in this disclosure.
In one possible implementation, when updating the first sample in the first training set, the feature information to be updated may be replaced with a preset noise feature value, so as to obtain an updated second training set. The second training set is a feature enhanced training set compared to the initial first training set.
In one possible implementation, after the second training set is obtained, in step S16, the t-th round of training may be performed on the neural network according to the second training set: the samples in the second training set are input into the neural network for processing to obtain prediction scores corresponding to the samples in the second training set, and the parameters of the neural network are adjusted according to the errors between the prediction scores and the corresponding reference values, so as to obtain the neural network after the t-th round of training.
In one possible implementation, the first enhancement rate and the attention threshold may be adjusted as the training round increases; for example, both may increase as the training round increases, until preset maximum values are reached. In this way, the quantity of feature information to be updated gradually increases to its maximum, and the importance of the attributes corresponding to the feature information to be updated gradually increases, so that the strength of feature enhancement applied to the second training set can be gradually raised as the training rounds increase, which further improves the stability of the neural network training process.
In one possible implementation, the second training set of each training round is updated from the initial first training set without accumulating updates on the second training set of the previous round. In this way, the second training sets of each training round can be independent of each other, and the diversity of training samples is increased.
In one possible implementation, training may be ended when the neural network meets a preset training end condition, resulting in a trained neural network. The preset training end condition may be set according to the actual situation; for example, the training end condition may be that the performance of the neural network on the validation set declines for a preset number of consecutive rounds (for example, 5 consecutive rounds); the training end condition may also be that the loss function of the neural network has decreased to a certain degree or has converged within a certain threshold; the training end condition may also be other conditions. The present disclosure does not limit the specific content of the training end condition.
In one possible implementation, the trained neural network may be applied to a recommendation system for predicting a user's score for an object to be recommended in the recommendation system. The user attribute and the object attribute of the object to be recommended can be determined according to the specific application scene of the recommendation system, and input data corresponding to the user attribute and the object attribute are determined; then inputting the input data into a trained neural network for processing, and predicting the score of the user to the object to be recommended; according to the scores predicted by the neural network, the recommendation system can determine a preset number of recommended objects from the objects to be recommended and recommend the recommended objects to the user.
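As a non-limiting illustration of this scoring-and-ranking step, a minimal Python sketch follows. The `model` callable and the input layout (user attributes concatenated with object attributes) are assumptions for illustration only, not fixed by the disclosure:

```python
import numpy as np

def recommend_top_k(model, user_features, item_features, k):
    """Score each candidate object for one user and return the indices of
    the k objects with the highest predicted scores.

    `model` is any callable mapping an input feature vector to a predicted
    score; concatenating user attributes with object attributes is an
    illustrative input layout, not part of the disclosure.
    """
    scores = [model(np.concatenate([user_features, item]))
              for item in item_features]
    order = np.argsort(scores)[::-1]  # highest predicted score first
    return order[:k].tolist()
```

With a toy scoring function, the two highest-scoring candidates are returned in descending score order.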
In one possible implementation, the method may further include: determining a second enhancement rate of the characteristic information of the first samples during the t-th training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset enhancement rate change value of each round; and determining a first enhancement rate of the characteristic information of the first samples in the t-th training according to the maximum enhancement rate and the second enhancement rate.
Wherein the value range of the preset initial enhancement rate is greater than or equal to 0 and less than 1; the value range of the preset maximum enhancement rate is greater than 0 and less than or equal to 1; and the value range of the preset per-round enhancement rate change value is greater than 0 and less than 1. The specific values of the initial enhancement rate, the maximum enhancement rate, and the per-round enhancement rate change value can be set by those skilled in the art according to the actual situation, which is not limited in this disclosure.
In one possible implementation manner, the second enhancement rate of the characteristic information of the plurality of first samples during the t-th training may be determined according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of the enhancement rate per round; then judging the relation between the second enhancement rate and the maximum enhancement rate, and determining the second enhancement rate as the first enhancement rate of the characteristic information of a plurality of first samples in the t-th training under the condition that the second enhancement rate is smaller than or equal to the maximum enhancement rate; in the case where the second enhancement rate is greater than the maximum enhancement rate, the maximum enhancement rate is determined as the first enhancement rate of the feature information of the plurality of first samples at the time of the t-th round of training.
In one possible implementation, the first enhancement rate $s_t$ of the feature information of the plurality of first samples during the t-th round of training may be determined by the following formula (1):

$$s_t = \min(s,\ s_0 + \Delta \cdot t) \tag{1}$$

In formula (1), $s$ represents the preset maximum enhancement rate with $s \in (0, 1]$, $s_0$ represents the preset initial enhancement rate with $s_0 \in [0, 1)$, and $\Delta$ represents the preset per-round enhancement rate change value.
In this embodiment, the second enhancement rate of the feature information of the plurality of first samples during the t-th training may be determined according to the initial enhancement rate, the maximum enhancement rate, and the change value of each enhancement rate, and the minimum value of the maximum enhancement rate and the second enhancement rate may be determined as the first enhancement rate of the feature information of the plurality of first samples during the t-th training, so that the first enhancement rate may be gradually increased from the initial enhancement rate to the maximum enhancement rate and then remain unchanged as the training round increases.
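The schedule of formula (1) can be sketched directly in Python (function and parameter names are illustrative):

```python
def first_enhancement_rate(t, s_max, s0, delta):
    """Formula (1): s_t = min(s, s0 + delta * t).

    The first enhancement rate grows linearly from the initial rate s0 by
    delta per training round and saturates at the maximum rate s_max.
    """
    return min(s_max, s0 + delta * t)
```

For example, with an initial rate of 0, a per-round change of 0.25, and a maximum of 0.5, the rate reaches the maximum at round 2 and then stays constant.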
In one possible implementation, the feature information of the plurality of first samples in the first training set may be represented by a feature matrix, each row of the feature matrix representing one first sample, and each column of the feature matrix representing one attribute.
For example, the first training set includes n first samples, and each first sample includes m pieces of feature information corresponding to m preset attributes. The feature information of the first samples in the first training set may be represented as a feature matrix $D = \{d_{u,v}\}_{n \times m}$, where the u-th row of the feature matrix D represents the u-th first sample, the v-th column of the feature matrix D represents the v-th attribute, and the element $d_{u,v}$ of the feature matrix D represents the feature information corresponding to the v-th attribute of the u-th first sample; n, m, u, v are positive integers with $1 \le u \le n$ and $1 \le v \le m$.
In this embodiment, the plurality of first samples in the first training set are represented as feature matrices, so that the neural network processing is facilitated, and the processing efficiency of the neural network can be improved.
In one possible implementation, step S12 may include:
for any first sample in a first training set, respectively determining first contribution values of each piece of characteristic information of the first sample to a predictive value according to the characteristic information of the first sample and the predictive value corresponding to the first sample;
For any one attribute of a plurality of attributes, determining a second contribution value of the characteristic information corresponding to the attribute from the first contribution values of the characteristic information of each first sample;
and determining the average value of the second contribution value as the attention of the neural network to the attribute.
In one possible implementation, when determining the degree of interest of the neural network in each attribute, a first contribution value of each feature information of the first sample to the predictive score may be first determined. For any first sample in the first training set, according to the characteristic information of the first sample and the predictive value corresponding to the first sample, the first contribution value of each characteristic information of the first sample to the predictive value of the first sample can be respectively determined through interlayer correlation propagation.
For example, for any first sample in the first training set, assuming that the first sample includes m pieces of feature information, according to the m pieces of feature information of the first sample and the prediction scores corresponding to the first sample, through inter-layer correlation propagation, the first contribution value of each piece of feature information of the first sample to the prediction scores thereof can be determined respectively, that is, each piece of feature information in the first sample corresponds to one first contribution value, and the first sample includes m pieces of feature information, and m pieces of first contribution values can be determined.
In one possible implementation manner, after determining the first contribution value, for any attribute of the plurality of attributes, determining a second contribution value of feature information corresponding to the attribute from the first contribution values of the feature information of each first sample, averaging the determined second contribution values, and determining the average value as the attention of the neural network to the attribute.
For example, the first training set includes n first samples. For the v-th attribute, the first contribution values of the feature information corresponding to the v-th attribute may be selected from the first contribution values of the n first samples and determined as second contribution values, giving n second contribution values in total; an average is then taken over the n second contribution values and determined as the degree of attention of the neural network to the v-th attribute. That is, the degree of attention of the neural network to the v-th attribute is $F_v = \frac{1}{n}\sum_{u=1}^{n} R_{u,v}$, where $R_{u,v}$ represents the first contribution value of the v-th piece of feature information of the u-th first sample to its prediction score.
In this embodiment, first contribution values of the feature information of each first sample to the predicted score of the feature information may be determined first, then, for any attribute, second contribution values of feature information corresponding to the attribute may be determined from the first contribution values, and an average value of the second contribution values may be determined as a degree of attention of the neural network to the attribute, so that accuracy of the degree of attention may be improved.
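Given a matrix of first contribution values (one row per first sample, e.g. obtained by layer-wise relevance propagation), the averaging step above is a column-wise mean; a minimal sketch, with illustrative names:

```python
import numpy as np

def attribute_attention(first_contributions):
    """Degree of attention F_v of the network to each of the m attributes.

    `first_contributions` is an (n, m) array whose row u holds the first
    contribution values of the m pieces of feature information of the u-th
    first sample to its prediction score. The column-wise mean is the
    average of the second contribution values described above.
    """
    return np.asarray(first_contributions, dtype=float).mean(axis=0)
```

When the per-sample contributions each sum to 1, the resulting degrees of attention also sum to 1, consistent with step S12.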
In one possible implementation manner, the neural network may include an input layer, an N-level middle layer, and an output layer, where the input layer inputs characteristic information of each first sample, the output layer outputs a prediction value corresponding to each first sample, the N-level middle layer outputs N-level middle characteristic information in a processing procedure, and N is a positive integer,
according to the characteristic information of the first sample and the predictive value corresponding to the first sample, determining a first contribution value of each characteristic information of the first sample to the predictive value respectively includes:
according to the predictive value corresponding to the first sample, determining the contribution value of each N-th intermediate characteristic information to the predictive value;
according to the contribution value of each N-th intermediate characteristic information to the predictive value, the N-th intermediate characteristic information and the N-1 th intermediate characteristic information, determining the contribution value of each N-1 th intermediate characteristic information to the predictive value;
according to the contribution value of each ith intermediate characteristic information to the predictive value, the ith intermediate characteristic information and the ith-1 intermediate characteristic information, determining the contribution value of each ith-1 intermediate characteristic information to the predictive value, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
And respectively determining a first contribution value of each piece of characteristic information of the first sample to the predictive value according to the contribution value of each piece of level 1 intermediate characteristic information to the predictive value, the level 1 intermediate characteristic information and the characteristic information of the first sample.
In one possible implementation, the neural network may include an input layer, an N-level middle layer, and an output layer, the input layer inputting the feature information of each first sample, the output layer outputting the prediction scores corresponding to each first sample, the N-level middle layer outputting the N-level middle feature information in the processing procedure, respectively.
In one possible implementation manner, for any first sample, when determining the first contribution value of each piece of characteristic information of the first sample to the predicted value of the first sample, starting from the predicted value output by the output layer, sequentially determining the contribution value of each piece of intermediate characteristic information output by each layer to the predicted value through interlayer correlation propagation according to the hierarchical structure of the neural network from layer to layer until determining the first contribution value of each piece of characteristic information of the input first sample to the predicted value.
In one possible implementation, the contribution value of each nth level intermediate feature information to the predictive score may be determined according to the predictive score corresponding to the first sample. For example, assuming that the nth intermediate feature information is a predicted value for E class labels (where E is a positive integer), from among the E nth intermediate feature information, the contribution value of the nth intermediate feature information corresponding to the correct class label to the predicted value is determined to be 1, and the contribution value of the other nth intermediate feature information to the predicted value is determined to be 0, according to the predicted value corresponding to the first sample.
Then, according to the contribution value of each N-th intermediate characteristic information to the predictive value, the N-th intermediate characteristic information and the N-1 th intermediate characteristic information, the contribution value of each N-1 th intermediate characteristic information to the predictive value can be respectively determined through the correlation propagation between the N-1 th intermediate layer and the N-th intermediate layer.
In one possible implementation manner, according to the contribution value of each i-th intermediate characteristic information to the predicted value, the i-th intermediate characteristic information and the i-1-th intermediate characteristic information, determining the contribution value of each i-1-th intermediate characteristic information to the predicted value through correlation propagation between the i-1-th intermediate layer and the i-1-th intermediate layer, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
For example, the input of the neural network is any first sample, and the i-th intermediate layer of the neural network is a fully connected layer. The q-th piece of i-th-level intermediate feature information $z_q^{(i)}$ output by the i-th intermediate layer can be determined by the following formula (2):

$$z_q^{(i)} = \mathrm{relu}\Big(\sum_p w_{p,q}^{(i)}\, z_p^{(i-1)} + b_q^{(i)}\Big) \tag{2}$$

In formula (2), $z_p^{(i-1)}$ represents the p-th piece of (i−1)-th-level intermediate feature information output by the (i−1)-th intermediate layer, $w_{p,q}^{(i)}$ represents the weights of the fully connected layer, $b_q^{(i)}$ represents the bias of the fully connected layer, relu(x) = max(0, x) is a nonlinear activation function, and p and q are both positive integers.
According to back-propagation through the fully connected layer, the contribution value $R_k^{(i-1)}$ of the k-th piece of (i−1)-th-level intermediate feature information $z_k^{(i-1)}$ to the prediction score can be determined by the following formula (3):

$$R_k^{(i-1)} = \sum_q \frac{z_k^{(i-1)}\, w_{k,q}^{(i)}}{z_q^{(i)} + \epsilon \cdot \mathrm{sign}(z_q^{(i)})}\, R_q^{(i)} \tag{3}$$

In formula (3), k is a positive integer, $w_{k,q}^{(i)}$ represents the weights of the fully connected layer, $R_q^{(i)}$ represents the contribution value of the q-th piece of i-th-level intermediate feature information to the prediction score, $\epsilon$ is a parameter in inter-layer correlation propagation with $\epsilon > 0$, and sign(z) is the sign function: sign(z) = 1 when z ≥ 0, otherwise sign(z) = −1.
In one possible implementation manner, the first contribution value of each piece of characteristic information of the first sample to the prediction value may be determined according to the contribution value of each piece of level 1 intermediate characteristic information to the prediction value, the level 1 intermediate characteristic information and the characteristic information of the first sample through inter-layer correlation propagation.
In this embodiment, starting from the predicted value corresponding to the first sample, according to the hierarchical structure of the neural network, the first contribution value of each piece of characteristic information of the first sample to the predicted value is determined forward layer by layer through inter-layer correlation propagation, so that the accuracy of the first contribution value can be improved.
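One propagation step of the form of formula (3) can be sketched as follows. This is a simplified illustration for a single fully connected layer; variable names are not from the disclosure, and the level-i feature values are taken as given inputs:

```python
import numpy as np

def lrp_epsilon_backward(z_prev, W, z_curr, R_curr, eps=1e-6):
    """One epsilon-LRP step through a fully connected layer, as in formula (3).

    z_prev : (P,) intermediate feature information of level i-1
    W      : (P, Q) weights of the i-th fully connected layer
    z_curr : (Q,) intermediate feature information z_q of level i
    R_curr : (Q,) contribution values of level i to the prediction score
    Returns the (P,) contribution values of level i-1 to the prediction score.
    """
    z_prev = np.asarray(z_prev, dtype=float)
    W = np.asarray(W, dtype=float)
    z_curr = np.asarray(z_curr, dtype=float)
    R_curr = np.asarray(R_curr, dtype=float)
    sign = np.where(z_curr >= 0, 1.0, -1.0)   # sign(z) = 1 for z >= 0, else -1
    denom = z_curr + eps * sign               # epsilon-stabilized denominator
    # R_k = sum_q  z_k * w_kq / denom_q * R_q
    return (z_prev[:, None] * W / denom[None, :] * R_curr[None, :]).sum(axis=1)
```

For small ε the total contribution is approximately conserved from level to level, which is what makes the layer-by-layer propagation down to the input features meaningful.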
In one possible implementation, step S13 may include: for any attribute, determining the attention degree as the enhanced probability of the attribute under the condition that the attention degree of the neural network to the attribute is smaller than a preset attention degree threshold value.
In one possible implementation, the preset attention threshold may be determined according to an increasing function of the current training round t, so that the enhancement probabilities of the feature information are automatically controlled according to the current training round. The increasing function may be set by those skilled in the art as desired, and this disclosure is not limited in this regard.
In one possible implementation manner, for any attribute, the relationship between the attention of the neural network to the attribute and a preset attention threshold can be judged; in the event that the degree of interest of the neural network in the attribute is less than the degree of interest threshold, the degree of interest of the neural network in the attribute may be determined as an enhanced probability of the attribute.
In one possible implementation, step S13 may further include: for any attribute, determining the product of the attention degree and a preset adjustment proportion as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is larger than or equal to a preset attention degree threshold value.
The value range of the preset adjustment ratio is greater than 0 and less than 1; for example, the adjustment ratio may be set to 0.1. Those skilled in the art may set the specific value of the adjustment ratio according to the training requirements, which is not limited by the present disclosure.
In one possible implementation, for any attribute, in the case that the degree of interest of the neural network for the attribute is greater than or equal to a preset degree of interest threshold, the attribute may be considered to be of higher importance, and for maintaining the conventional training of the neural network, the probability of enhancement of the attribute may be reduced. The product of the attention of the neural network to the attribute and the preset adjustment proportion can be determined as the enhanced probability of the attribute.
In one possible implementation, the enhancement probability $P_v$ of the v-th attribute may be determined by the following formula (4):

$$P_v = \begin{cases} \delta \cdot F_v, & F_v \ge \sigma(t) \cdot \max\{F_1, \ldots, F_m\} \\ F_v, & F_v < \sigma(t) \cdot \max\{F_1, \ldots, F_m\} \end{cases} \tag{4}$$

In formula (4), $F_v$ represents the degree of attention of the neural network to the v-th attribute, $\delta$ represents the preset adjustment ratio ($0 < \delta < 1$), $\sigma(t)$ is an increasing function of t, $\max\{F_1, \ldots, F_m\}$ represents the maximum of the degrees of attention $F_1, \ldots, F_m$ of the neural network to the m attributes, and $\sigma(t) \cdot \max\{F_1, \ldots, F_m\}$ represents the attention threshold during the t-th round of training.
In one possible implementation, after the enhancement probability of each attribute is determined, the enhancement probabilities may be normalized. The normalized enhancement probability of an attribute can be determined by the following formula (5):

$$P'_v = \frac{P_v}{\sum_j P_j} \tag{5}$$

In formula (5), $P'_v$ represents the normalized enhancement probability of the v-th attribute, $P_j$ represents the enhancement probability of the j-th attribute before normalization with $1 \le j \le m$, and $\sum_j P_j$ represents the sum of the enhancement probabilities of all attributes before normalization.
In this embodiment, when the degree of attention of the neural network to an attribute is greater than or equal to the preset attention threshold, the product of the degree of attention and the preset adjustment ratio is determined as the enhancement probability of the attribute. The enhancement probability of more important attributes can thus be reduced in the initial stage of training (for example, the first several rounds) to maintain conventional training, that is, to ensure that the neural network makes use of important features early in training.
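The thresholding and normalization steps can be sketched together in Python. The adjustment ratio `delta` and the increasing function `sigma` below are illustrative defaults, not values fixed by the disclosure:

```python
import numpy as np

def enhancement_probabilities(F, t, delta=0.1, sigma=lambda t: min(1.0, 0.1 * t)):
    """Damp the enhancement probability of high-attention attributes, then normalize.

    F holds the degrees of attention F_1..F_m of the neural network to the
    m attributes; t is the current training round.
    """
    F = np.asarray(F, dtype=float)
    threshold = sigma(t) * F.max()              # attention threshold for round t
    P = np.where(F >= threshold, delta * F, F)  # damp attributes at/above threshold
    return P / P.sum()                          # normalize so probabilities sum to 1
```

In the example below, the two attributes at or above the threshold are damped by `delta`, so the low-attention attribute ends up most likely to be enhanced, matching the intent of keeping important features intact early in training.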
In one possible implementation, step S14 may include:
determining the enhancement quantity of the characteristic information of the plurality of first samples according to a preset first enhancement rate of the characteristic information of the plurality of first samples;
randomly selecting a plurality of second samples from a plurality of first samples of the first training set, wherein the number of the second samples is the same as the enhancement number;
and for any second sample, randomly selecting one attribute from a plurality of attributes according to the enhanced probability of each attribute, and determining the characteristic information corresponding to the randomly selected attribute in the second sample as the characteristic information to be updated.
In one possible implementation, the enhancement quantity of the feature information of the plurality of first samples may be determined according to the preset first enhancement rate of the feature information of the plurality of first samples. For example, suppose that in the t-th round of training the preset first enhancement rate of the feature information of the plurality of first samples is $s_t$ and the total number of pieces of feature information of the plurality of first samples in the first training set is $n \times m$; then the enhancement quantity of the feature information of the plurality of first samples is $n \times m \times s_t$.
In one possible implementation, a plurality of second samples may be randomly selected from the plurality of first samples of the first training set, the number of second samples being the same as the enhancement quantity of the feature information of the plurality of first samples, where the randomly selected second samples may repeat. For example, if the enhancement quantity of the feature information of the plurality of first samples is $n \times m \times s_t$, the number of randomly selected second samples is likewise $n \times m \times s_t$.
In one possible implementation manner, for any second sample, according to the enhanced probability of each attribute, one attribute is randomly selected from a plurality of attributes, and feature information corresponding to the randomly selected attribute in the second sample is determined as feature information to be updated.
In one possible implementation, the enhancement quantity of the feature information of the plurality of first samples may be greater than the number of the plurality of first samples. In this case, repeated feature information to be updated may exist. When repeated feature information to be updated exists, feature information to be updated may be reselected from the plurality of first samples according to the number of repeats, until no feature information to be updated is repeated. For example, suppose the enhancement quantity of the feature information of the plurality of first samples is 100 and, among the 100 selected pieces of feature information to be updated, 5 pieces repeat other pieces; then 5 non-repeating pieces of feature information to be updated need to be reselected from the plurality of first samples, so that none of the 100 pieces of feature information to be updated is repeated.
The determination of the feature information to be updated is illustrated below. Suppose that in the t-th round of training the preset first enhancement rate of the feature information of the plurality of first samples is $s_t$ and the first training set is represented as the feature matrix $D = \{d_{u,v}\}_{n \times m}$. The enhancement quantity $G = n \times m \times s_t$ of the feature information of the plurality of first samples may first be determined; then G rows of the feature matrix D are randomly selected to determine G second samples $g_a$, where a is a positive integer and $1 \le a \le G$. For any second sample $g_a$, one attribute c (where $1 \le c \le m$, and the probability of the c-th attribute being selected is the normalized enhancement probability $P'_c$ of the c-th attribute) may be randomly selected from the plurality of attributes according to the enhancement probabilities of the attributes, giving the row-column pair $(g_a, c)$ corresponding to the second sample $g_a$. In the same way, the row-column pairs corresponding to all G second samples can be obtained. It is then judged whether repeated row-column pairs exist among the G row-column pairs; if so, row-column pairs are reselected according to the number of repeats until none of the G row-column pairs is repeated. The feature information at the positions corresponding to the G row-column pairs is then determined as the feature information to be updated.
Suppose the preset noise feature value is U = 0. Then 0 may be used to replace the feature information at the positions corresponding to the G row-column pairs in the feature matrix D, with the other feature information kept unchanged, so as to obtain an updated feature matrix $D_G$ representing the updated second training set. The t-th round of training of the neural network may then be performed according to the feature matrix $D_G$.
In this embodiment, the number of enhancements can be determined according to the first enhancement rates of the feature information of the plurality of first samples, and according to the number of enhancements, a plurality of second samples are randomly selected from the plurality of first samples, and then the feature information to be updated in the plurality of second samples is determined according to the enhancement probabilities of the respective attributes, so that the determined feature information to be updated can satisfy the first enhancement rates and the enhancement probabilities of the respective attributes during the t-th training, and the accuracy of the feature information to be updated is improved.
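The selection-and-masking procedure above can be sketched as follows (function names and the uniform row sampling are illustrative; `P_norm` holds the normalized per-attribute enhancement probabilities):

```python
import numpy as np

def feature_enhance(D, s_t, P_norm, noise_value=0.0, rng=None):
    """Build the second training set: mask G = round(n*m*s_t) distinct cells of D.

    A row (second sample) is drawn uniformly, a column (attribute) is drawn
    according to the normalized enhancement probabilities P_norm, and
    duplicate row-column pairs are redrawn until G distinct pairs remain.
    The selected cells are replaced by the noise feature value.
    """
    rng = rng or np.random.default_rng()
    n, m = D.shape
    G = int(round(n * m * s_t))           # enhancement quantity
    chosen = set()
    while len(chosen) < G:                # redraw until no row-column pair repeats
        row = int(rng.integers(n))
        col = int(rng.choice(m, p=P_norm))
        chosen.add((row, col))
    D_G = D.copy()                        # leave the first training set intact
    for (u, v) in chosen:
        D_G[u, v] = noise_value           # replace with the noise feature value U
    return D_G
```

Because the masking starts from a copy of D each time it is called, repeated calls reproduce the property that each round's second training set is derived from the initial first training set rather than from the previous round's.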
Fig. 2 shows a schematic diagram of an application scenario of a feature-enhancement-based recommendation system neural network training method, according to an embodiment of the present disclosure. As shown in fig. 2, an initial first training set may be first determined in step S201, where the first training set may include a plurality of first samples, and the first training set may be represented as a feature matrix D, then in step S202, a current training round t is determined, and in step S203, the plurality of first samples in the first training set are input into a neural network to be trained in the t-th round for processing, to obtain prediction values corresponding to the plurality of first samples;
thereafter, in step S204, the degree of attention of the neural network to each attribute may be determined according to the feature information of the plurality of first samples and the prediction scores corresponding to the plurality of first samples, with $F_v$ representing the degree of attention of the neural network to the v-th attribute; in step S205, the attention threshold for the t-th round of training is determined, for example as $\sigma(t) \cdot \max\{F_1, \ldots, F_m\}$;
Then in step S206, it may be determined for each attribute whether the degree of interest of the neural network is greater than or equal to the attention threshold. If the degree of interest in the attribute is greater than or equal to the threshold, step S207 is performed: the enhancement probability of the attribute = preset adjustment ratio × degree of interest; for example, if the degree of interest F_v in the v-th attribute is greater than or equal to the attention threshold, the enhancement probability of the v-th attribute is P_v = δ·F_v, where δ denotes the adjustment ratio. Otherwise, step S208 is performed: the enhancement probability of the attribute = the degree of interest; for example, if F_v is smaller than the attention threshold, P_v = F_v.
Step S209 is then executed, and the enhancement probabilities of the attributes determined in the steps S207 and S208 are normalized, so that normalized enhancement probabilities of the attributes are obtained;
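Steps S205–S209 can be sketched as follows. This is a hedged illustration; `enhancement_probs` and its signature are assumptions, not the patent's code:

```python
import numpy as np

def enhancement_probs(F, sigma_t, delta):
    """Per-attribute enhancement probabilities (steps S205-S209).

    F       : attention of the network to each attribute, shape (m,)
    sigma_t : round-dependent coefficient, e.g. 1.1 / (1 + exp(3 - t))
    delta   : preset adjustment ratio (0 < delta < 1)
    """
    F = np.asarray(F, dtype=float)
    threshold = sigma_t * F.max()               # attention threshold (S205)
    P = np.where(F >= threshold, delta * F, F)  # S207 / S208
    return P / P.sum()                          # normalization (S209)
```

Attributes whose attention exceeds the threshold are damped by δ, so after normalization the less-attended attributes receive relatively higher enhancement probability.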
After step S202 is performed, in step S210, the first enhancement rate s_t of the feature information of the plurality of first samples during the t-th round of training is determined; and in step S211, the enhancement number of the feature information of the plurality of first samples is determined according to the first enhancement rate determined in step S210;
after steps S211 and S209 are performed, in step S212, the feature information to be updated may be determined according to the enhancement number of the feature information of the plurality of first samples determined in step S211 and the enhancement probabilities of the respective attributes determined in step S209, and the first samples in the first training set may be updated according to the feature information to be updated and the preset noise feature value to obtain an updated second training set, whose feature matrix is represented as D_G.
After the second training set is determined in step S212, step S213 may be performed to carry out the t-th round of training on the neural network according to the second training set. After the t-th round is completed, in step S214 it may be determined whether the neural network meets a preset training end condition. If not, step S215 is executed: the training round is incremented by 1 (t = t+1), and step S202 is executed again for the next round of training. If the training end condition is met, training ends and the trained neural network is obtained.
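The loop S201–S215 as a whole can be sketched as follows. The `model` object and its methods (`forward`, `attention_per_attribute`, `train_one_round`, `val_score`) are hypothetical stand-ins for the components the figure describes, and the σ(t) and s_t formulas are the ones used in the worked example below:

```python
import numpy as np

def train_with_feature_enhancement(model, D, s0, s_max, delta_s, delta,
                                   u=0.0, patience=5, rng=None):
    """Sketch of Fig. 2: each round re-derives the enhanced training set
    from the original feature matrix D, then trains one round on it."""
    rng = rng or np.random.default_rng(0)
    n, m = D.shape
    best, bad_rounds, t = np.inf, 0, 1
    while True:
        preds = model.forward(D)                      # S203
        F = model.attention_per_attribute(D, preds)   # S204
        sigma_t = 1.1 / (1 + np.exp(3 - t))           # S205
        thr = sigma_t * F.max()
        P = np.where(F >= thr, delta * F, F)          # S206-S208
        P = P / P.sum()                               # S209
        s_t = min(s_max, s0 + delta_s * t)            # S210
        G = int(n * m * s_t)                          # S211
        rows = rng.integers(0, n, size=G)             # S212: pick positions
        cols = rng.choice(m, size=G, p=P)
        D_G = D.copy()
        D_G[rows, cols] = u                           # noise feature value
        model.train_one_round(D_G)                    # S213
        score = model.val_score()                     # S214 (lower = better)
        if score >= best:
            bad_rounds += 1
        else:
            best, bad_rounds = score, 0
        if bad_rounds >= patience:                    # end condition
            return model
        t += 1                                        # S215
```

The `patience=5` end condition mirrors the example given later (effect on the validation set not improving for 5 consecutive rounds).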
The recommendation system neural network training method based on feature enhancement is described below with reference to specific examples.
Assume the neural network is applied to a film and television recommendation system and used to predict a user's rating of a work; the prediction result is expressed as a score in the range of 1 to 5. There are 4 user attributes, namely user identifier, age, gender and occupation; the object to be recommended is a film or television work with 21 object attributes, namely film identifier, year and 19 film categories (science fiction, romance, war, etc.).
The total number of samples is 100,000, and the feature information of each sample includes feature information representing the user attributes and feature information representing the object attributes of the film or television work.
In the training of the t-th round, the specific values of the preset variables are shown in Table 1 below:

TABLE 1

    n       number of samples                      10^5
    m       number of attributes                   25
    σ(t)    attention-threshold coefficient        1.1/(1+e^(3-t))
    δ       adjustment ratio                       0.1
    s       maximum enhancement rate               0.2
    s_0     initial enhancement rate               0.1
    Δ       per-round enhancement-rate change      0.005
    u       noise feature value                    0
All samples can first be represented as a feature matrix D_1 with 1×10^5 rows and 25 columns. Each sample is then input into the neural network to be trained in the t-th round for processing, obtaining the prediction value corresponding to each sample. According to the prediction value and the feature information of each sample, the degree of interest of the neural network in each attribute is determined through layer-wise relevance propagation; the degree of interest in the v'-th attribute can be expressed as F_{v'}, where v' = 1, …, 25.
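A minimal sketch of the attention computation via layer-wise relevance propagation, for a one-hidden-layer ReLU network with a single output. The epsilon rule used here is an illustrative assumption (the patent does not fix a propagation rule), and averaging the absolute per-feature relevances over samples corresponds to the mean of second contribution values described earlier:

```python
import numpy as np

def lrp_attention(X, W1, b1, W2, b2, eps=1e-6):
    """Per-attribute attention for a 1-hidden-layer ReLU network.

    X: (n, d) feature matrix; W1: (d, h); b1: (h,); W2: (h, 1); b2: scalar.
    Relevance starts at the prediction value and is propagated back
    layer by layer in proportion to each unit's contribution."""
    H = np.maximum(X @ W1 + b1, 0.0)               # hidden activations
    y = (H @ W2).ravel() + b2                      # prediction values
    # output -> hidden: contribution of each hidden unit to y
    Z_out = H * W2.ravel()
    R_hidden = Z_out * (y[:, None] / (Z_out.sum(1, keepdims=True) + eps))
    # hidden -> input: contribution of each feature to each hidden unit
    R_in = np.zeros_like(X)
    for j in range(W1.shape[1]):
        Z_in = X * W1[:, j]
        R_in += Z_in * (R_hidden[:, [j]] / (Z_in.sum(1, keepdims=True) + eps))
    return np.abs(R_in).mean(axis=0)               # attention per attribute
```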
The enhancement probability of each attribute can be determined according to σ(t) = 1.1/(1+e^(3-t)), δ = 0.1 and the degree of interest of the neural network in each attribute, by the above formula (4). The enhancement probability P_{v'} of the v'-th attribute is:

P_{v'} = δ·F_{v'} if F_{v'} ≥ σ(t)·max{F_1, …, F_25}, and P_{v'} = F_{v'} otherwise.
Normalization is then carried out by formula (5) to obtain the normalized enhancement probability of each attribute; the normalized enhancement probability of the v'-th attribute can be expressed as P'_{v'}.
Then, according to the maximum enhancement rate s = 0.2 and the initial enhancement rate s_0 = 0.1, the first enhancement rate s_t of the feature information of all samples during the t-th round of training is determined by the above formula (1):

s_t = min(s, s_0 + Δ·t) = min(0.2, 0.1 + 0.005t);
The enhancement number G of the feature information of all samples can then be determined:

G = n × m × s_t = 10^5 × 25 × min(0.2, 0.1 + 0.005t);
According to the enhancement number G, G rows are randomly selected from the feature matrix D_1. For each of the selected G rows, one column is randomly selected according to the enhancement probabilities of the attributes, the probability of the v'-th column being selected being P'_{v'}. The feature information at the resulting G row-column positions is the feature information to be updated. Replacing this feature information in D_1 with the noise feature value u = 0 yields the updated feature matrix; compared with the initial feature matrix D_1, the updated matrix is the feature matrix after feature enhancement.
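Using the concrete values of this example (s = 0.2, s_0 = 0.1, Δ = 0.005, u = 0), the enhancement of the feature matrix can be sketched as follows; the function name and the with-replacement row sampling are assumptions:

```python
import numpy as np

def enhance_matrix(D1, t, P_norm, s=0.2, s0=0.1, delta_rate=0.005, u=0.0,
                   rng=None):
    """Return the feature-enhanced copy of D1 for training round t."""
    rng = rng or np.random.default_rng()
    n, m = D1.shape
    s_t = min(s, s0 + delta_rate * t)        # formula (1)
    G = int(n * m * s_t)                     # enhancement number
    rows = rng.integers(0, n, size=G)        # G randomly selected rows
    cols = rng.choice(m, size=G, p=P_norm)   # column v' chosen with prob P'_{v'}
    D1G = D1.copy()
    D1G[rows, cols] = u                      # replace with the noise feature value
    return D1G
```

Note that the original matrix D_1 is left untouched, so the next round can re-derive a fresh enhanced matrix from it.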
All samples can be divided into a training set, a test set and a validation set in the ratio 8:1:1. Accordingly, the enhanced feature matrix can be divided in the ratio 8:1:1 into a feature matrix corresponding to the training set, a feature matrix corresponding to the test set, and a feature matrix corresponding to the validation set.
The feature matrix corresponding to the training set may be used to train the neural network in the t-th round. After the t-th round of training, the feature matrix corresponding to the test set may be used to evaluate the effect of the neural network, and the feature matrix corresponding to the validation set may be used to judge whether the neural network meets a preset training end condition; for example, the condition may be that the effect of the neural network on the validation set decreases for 5 consecutive rounds. When the condition is not met, the next round of training is performed; when it is met, training ends and the trained neural network is obtained.
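A minimal sketch of the 8:1:1 row-wise split; the patent does not specify whether rows are shuffled, so the permutation here is an assumption:

```python
import numpy as np

def split_811(DG, rng=None):
    """Split the (enhanced) feature matrix row-wise into train/test/val
    feature matrices in the ratio 8:1:1."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(DG))           # shuffle rows (assumed)
    n_train = int(0.8 * len(DG))
    n_test = int(0.1 * len(DG))
    train = DG[idx[:n_train]]
    test = DG[idx[n_train:n_train + n_test]]
    val = DG[idx[n_train + n_test:]]
    return train, test, val
```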
In one possible implementation, RMSE (root mean square error) may be used to evaluate the effect of the neural network; the smaller the RMSE value, the better the effect. It was verified that a neural network trained with the above method has a smaller RMSE value than a neural network trained without it, so the above training method can be considered to yield a better-performing neural network with more accurate prediction results.
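The RMSE used for evaluation is the standard definition:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error; lower is better (here, ratings on a 1-5 scale)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```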
According to this embodiment of the present disclosure, when applied to neural network training for a recommendation system, the attention threshold and the first enhancement rate can be adjusted according to the current training round. In this way, feature information corresponding to attributes with lower attention is enhanced in the initial stage of training (e.g., the first few rounds), and feature information corresponding to attributes with higher attention is gradually enhanced as the training rounds increase. This prompts the neural network to learn from partially noisy feature information, avoids overfitting to or over-reliance on part of the feature information, improves the robustness of the neural network, and improves the accuracy of its predictions.
It should be noted that, although the recommendation system neural network training method based on feature enhancement is described above by taking the above embodiments as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, so long as the technical scheme of the disclosure is met.
FIG. 3 illustrates a block diagram of a feature-enhancement-based recommendation system neural network training device, in accordance with an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes:
the prediction value determining module 31 is configured to input a plurality of first samples in a preset first training set into a neural network to be trained in the t-th round for processing, to obtain prediction values corresponding to the plurality of first samples, where t is a positive integer and the first samples include feature information representing user attributes and feature information representing object attributes of an object to be recommended;
a degree of interest determining module 32, configured to determine the degree of interest of the neural network on each attribute according to the feature information of the plurality of first samples and the prediction values corresponding to the plurality of first samples;
the enhancement probability determining module 33 is configured to determine enhancement probabilities of the respective attributes according to a preset attention threshold and an attention of the neural network to the respective attributes;
The to-be-updated feature determining module 34 is configured to determine feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and enhancement probabilities of the respective attributes;
a training set updating module 35, configured to update a first sample in the first training set according to the feature information to be updated and a preset noise feature value, so as to obtain an updated second training set;
a training module 36, configured to perform a t-th training on the neural network according to the second training set,
the neural network is applied to a recommendation system and used for predicting the score of a user on an object to be recommended in the recommendation system.
In one possible implementation, the apparatus further includes:
the first enhancement rate determining module is used for determining second enhancement rates of the characteristic information of the first samples during the t-th training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset enhancement rate change value of each round;
and the second enhancement rate determining module is used for determining the first enhancement rate of the characteristic information of the first samples in the t-th training according to the maximum enhancement rate and the second enhancement rate.
According to the embodiment of the present disclosure, when applied to neural network training for a recommendation system, the enhancement probability of each attribute can be determined according to the attention of the neural network to be trained in the current training round; the feature-enhanced training set used in the current round is then determined according to the enhancement probability of each attribute and the preset first enhancement rate, and is used for training the neural network. In this way, the attention of the neural network to different attributes can be optimized during training, so that all feature information is comprehensively utilized in prediction and overfitting to or over-reliance on part of the feature information is avoided; moreover, when part of the feature information is noisy, the neural network can fully utilize the other feature information for prediction, improving both the robustness of the neural network and the accuracy of its predictions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (9)

1. A recommendation system neural network training method based on feature enhancement, the method comprising:
inputting a plurality of first samples in a preset first training set into a neural network to be trained in a t-th round for processing to obtain predictive values corresponding to the plurality of first samples, wherein t is a positive integer, and the first samples comprise characteristic information representing user attributes and characteristic information representing object attributes of an object to be recommended;
according to the characteristic information of the plurality of first samples and the prediction values corresponding to the plurality of first samples, the attention degree of the neural network to each attribute is respectively determined;
respectively determining the enhancement probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute;
determining feature information to be updated from the feature information of the plurality of first samples according to a first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute; the first enhancement rate is used for representing enhancement proportions of characteristic information of the plurality of first samples;
updating a first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
According to the second training set, the neural network is trained for the t-th round,
the neural network is applied to a recommendation system and used for predicting the score of a user on an object to be recommended in the recommendation system; the attention threshold and the first enhancement rate are adjusted according to the value of t;
according to the characteristic information of the first samples and the prediction values corresponding to the first samples, determining the attention degree of the neural network to each attribute respectively, wherein the attention degree comprises the following steps:
for any first sample in a first training set, respectively determining first contribution values of each piece of characteristic information of the first sample to a predictive value according to the characteristic information of the first sample and the predictive value corresponding to the first sample;
for any one attribute of a plurality of attributes, determining a second contribution value of the characteristic information corresponding to the attribute from the first contribution values of the characteristic information of each first sample;
and determining the average value of the second contribution value as the attention of the neural network to the attribute.
2. The method according to claim 1, wherein determining the feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and enhancement probabilities of the respective attributes includes:
Determining the enhancement quantity of the characteristic information of the plurality of first samples according to a preset first enhancement rate of the characteristic information of the plurality of first samples;
randomly selecting a plurality of second samples from a plurality of first samples of the first training set, wherein the number of the second samples is the same as the enhancement number;
and for any second sample, randomly selecting one attribute from a plurality of attributes according to the enhanced probability of each attribute, and determining the characteristic information corresponding to the randomly selected attribute in the second sample as the characteristic information to be updated.
3. The method according to claim 1, wherein the method further comprises:
determining a second enhancement rate of the characteristic information of the first samples during the t-th training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset enhancement rate change value of each round;
and determining a first enhancement rate of the characteristic information of the first samples in the t-th training according to the maximum enhancement rate and the second enhancement rate.
4. The method of claim 1, wherein determining the probability of enhancement for each attribute according to a predetermined attention threshold and the attention of the neural network to each attribute, respectively, comprises:
For any attribute, determining the attention degree as the enhanced probability of the attribute under the condition that the attention degree of the neural network to the attribute is smaller than a preset attention degree threshold value.
5. The method of claim 4, wherein determining the probability of enhancement for each attribute according to a predetermined attention threshold and the attention of the neural network to each attribute, respectively, further comprises:
for any attribute, determining the product of the attention degree and a preset adjustment proportion as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is larger than or equal to a preset attention degree threshold value.
6. The method of claim 1, wherein the neural network comprises an input layer, N-level intermediate layers, and an output layer, the input layer inputting the characteristic information of each first sample, the output layer outputting the predictive value corresponding to each first sample, the N-level intermediate layers outputting N-level intermediate characteristic information during processing, respectively, N being a positive integer,
according to the characteristic information of the first sample and the predictive value corresponding to the first sample, determining a first contribution value of each characteristic information of the first sample to the predictive value respectively includes:
According to the predictive value corresponding to the first sample, determining the contribution value of each N-th intermediate characteristic information to the predictive value;
according to the contribution value of each N-th intermediate characteristic information to the predictive value, the N-th intermediate characteristic information and the N-1 th intermediate characteristic information, determining the contribution value of each N-1 th intermediate characteristic information to the predictive value;
according to the contribution value of each ith intermediate characteristic information to the predictive value, the ith intermediate characteristic information and the ith-1 intermediate characteristic information, determining the contribution value of each ith-1 intermediate characteristic information to the predictive value, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
and respectively determining a first contribution value of each piece of characteristic information of the first sample to the predictive value according to the contribution value of each piece of level 1 intermediate characteristic information to the predictive value, the level 1 intermediate characteristic information and the characteristic information of the first sample.
7. The method of claim 1, wherein the characteristic information of the plurality of first samples in the first training set is represented by a characteristic matrix, each row of the characteristic matrix representing one first sample, and each column of the characteristic matrix representing one attribute.
8. A recommendation system neural network training device based on feature enhancement, the device comprising:
the prediction value determining module is used for inputting a plurality of first samples in a preset first training set into a neural network to be trained in a t-th round for processing to obtain prediction values corresponding to the plurality of first samples, wherein t is a positive integer, and the first samples comprise characteristic information representing user attributes and characteristic information representing object attributes of objects to be recommended;
the attention degree determining module is used for determining attention degrees of the neural network to all the attributes according to the characteristic information of the first samples and the predictive values corresponding to the first samples;
the enhancement probability determining module is used for determining enhancement probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute;
the to-be-updated feature determining module is used for determining feature information to be updated from the feature information of the plurality of first samples according to a first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute; the first enhancement rate is used for representing enhancement proportions of characteristic information of the plurality of first samples;
The training set updating module is used for updating a first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
the training module is used for training the neural network for the t-th round according to the second training set,
the neural network is applied to a recommendation system and used for predicting the score of a user on an object to be recommended in the recommendation system; the attention threshold and the first enhancement rate are adjusted according to the value of t;
the attention degree determining module is further configured to:
for any first sample in a first training set, respectively determining first contribution values of each piece of characteristic information of the first sample to a predictive value according to the characteristic information of the first sample and the predictive value corresponding to the first sample;
for any one attribute of a plurality of attributes, determining a second contribution value of the characteristic information corresponding to the attribute from the first contribution values of the characteristic information of each first sample;
and determining the average value of the second contribution value as the attention of the neural network to the attribute.
9. The apparatus of claim 8, wherein the apparatus further comprises:
The first enhancement rate determining module is used for determining second enhancement rates of the characteristic information of the first samples during the t-th training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset enhancement rate change value of each round;
and the second enhancement rate determining module is used for determining the first enhancement rate of the characteristic information of the first samples in the t-th training according to the maximum enhancement rate and the second enhancement rate.
CN202010197501.9A 2020-03-19 2020-03-19 Recommendation system neural network training method and device based on feature enhancement Active CN111414539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010197501.9A CN111414539B (en) 2020-03-19 2020-03-19 Recommendation system neural network training method and device based on feature enhancement


Publications (2)

Publication Number Publication Date
CN111414539A CN111414539A (en) 2020-07-14
CN111414539B true CN111414539B (en) 2023-09-01

Family

ID=71493180


Country Status (1)

Country Link
CN (1) CN111414539B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347361B (en) * 2020-11-16 2024-03-01 百度在线网络技术(北京)有限公司 Method for recommending object, neural network, training method, training equipment and training medium thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108902B1 (en) * 2017-09-18 2018-10-23 CS Disco, Inc. Methods and apparatus for asynchronous and interactive machine learning using attention selection techniques
CN109509054A (en) * 2018-09-30 2019-03-22 平安科技(深圳)有限公司 Method of Commodity Recommendation, electronic device and storage medium under mass data
CN109902222A (en) * 2018-11-30 2019-06-18 华为技术有限公司 Recommendation method and device
WO2020020088A1 (en) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 Neural network model training method and system, and prediction method and system



Similar Documents

Publication Publication Date Title
CN109408731B (en) Multi-target recommendation method, multi-target recommendation model generation method and device
CN112598462B (en) Personalized recommendation method and system based on collaborative filtering and deep learning
US11770571B2 (en) Matrix completion and recommendation provision with deep learning
CN110717098B (en) Meta-path-based context-aware user modeling method and sequence recommendation method
CN110688479B (en) Evaluation method and sequencing network for generating abstract
CN113268669B (en) Relation mining-oriented interest point recommendation method based on joint neural network
CN112464100B (en) Information recommendation model training method, information recommendation method, device and equipment
CN109063120B (en) Collaborative filtering recommendation method and device based on clustering
CN112861945A (en) Multi-mode fusion lie detection method
Jung et al. Prediction Data Processing Scheme using an Artificial Neural Network and Data Clustering for Big Data.
CN112256965A (en) Neural collaborative filtering model recommendation method based on lambdamat
CN113221019A (en) Personalized recommendation method and system based on instant learning
CN113515690A (en) Training method of content recall model, content recall method, device and equipment
CN111414539B (en) Recommendation system neural network training method and device based on feature enhancement
US11941867B2 (en) Neural network training using the soft nearest neighbor loss
Chen et al. A New Approach for Mobile Advertising Click‐Through Rate Estimation Based on Deep Belief Nets
CN115760271A (en) Electromechanical commodity personalized recommendation method and system based on graph neural network
CN117216281A (en) Knowledge graph-based user interest diffusion recommendation method and system
Althbiti et al. Addressing data sparsity in collaborative filtering based recommender systems using clustering and artificial neural network
CN114692972A (en) Training method and device of behavior prediction system
CN113869943A (en) Article recommendation method, device, equipment and storage medium
CN110415081B (en) Content-based matching recommendation method for user personalized products
CN113449182A (en) Knowledge information personalized recommendation method and system
CN116911949A (en) Article recommendation method based on boundary rank loss and neighborhood perception graph neural network
CN109918576B (en) Microblog attention recommendation method based on joint probability matrix decomposition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant