CN111651558A - Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model - Google Patents

Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model Download PDF

Info

Publication number
CN111651558A
CN111651558A
Authority
CN
China
Prior art keywords
module
article
hypersphere
training
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010389075.9A
Other languages
Chinese (zh)
Other versions
CN111651558B (en)
Inventor
郑海涛
汪杨
刘昊
肖喜
沈颖
周岚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202010389075.9A priority Critical patent/CN111651558B/en
Publication of CN111651558A publication Critical patent/CN111651558A/en
Application granted granted Critical
Publication of CN111651558B publication Critical patent/CN111651558B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The hypersphere cooperative measurement recommendation device and method based on a pre-training semantic model comprise a pre-training implicit module and a cooperative measurement recommendation module, wherein the pre-training implicit module comprises an article text information encoder and decoder, and the cooperative measurement recommendation module comprises a hypersphere mapping module and a fusion loss function module. The encoder and decoder produce a vectorized representation of the article text information; the hypersphere mapping module maps the initialized positive and negative user and article hidden vectors onto the same high-dimensional hypersphere manifold using an angle metric, and the fusion loss function module trains the user and article hidden vectors after hypersphere mapping, optimizing the intra-class and inter-class distances of the positive and negative user-article sample pairs. At prediction time, the text vector of an article and the corresponding user and article hidden vectors are obtained, and the article recommendation result for the user is obtained by computing the cosine distance between them.

Description

Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model
Technical Field
The invention relates to computer application, in particular to a hypersphere (or called super-dimensional sphere) cooperative measurement recommendation device and method based on a pre-training semantic model.
Background
Recommendation algorithms lie at the intersection of artificial intelligence and computer technology; they analyze a user's historical behavior data and build computational models in order to predict which articles the user will like and to give recommendations. Progress in recommendation algorithms improves the efficiency and experience of processing massive amounts of online information in the face of "information overload", and for content producers an efficient recommendation system helps content spread on a platform and reach its target users more quickly and accurately, improving the efficiency and quality of content distribution. Commonly used websites in shopping, movies, social networking and other fields, such as ***, Taobao, Baidu and Douban, could not operate without the support of recommendation systems.
Against the background of the rapid development of the internet era, individual users live in an environment of information explosion and must select from massive amounts of information every day; in this context, the importance of recommendation systems to both users and platforms keeps growing. A recommendation system with excellent performance can help a user find, within limited time, the goods or information that suit them best. Generally speaking, a recommendation system profiles the user and models the features of the articles based on the user's basic information and past browsing and interaction records (ratings, likes, etc.). The rating a user gives an article reflects the user's degree of preference for it, and the basic information of the article can serve as part of the article-side features, complementing the article representation and improving the recommendation effect.
In recent years, with the rapid development of the natural language processing field (NLP), the importance of modeling for text preprocessing has gradually become prominent, and a trend is gradually toward modeling text information by using a semantic model and completing downstream tasks (classification, translation, recommendation, and the like) according to the result of preprocessing, thereby achieving a good effect. In the recommendation system, the scoring information of the articles by the user can be used for modeling through a traditional collaborative filtering model or a deep learning technology. The name, attribute, category and other text information of the article can be preprocessed by the intelligent semantic technology and then used as the input of the recommendation system, so that the recommendation effect is improved.
Recommendation systems based on collaborative metric learning have recently achieved good results; they are a competitive collaborative-filtering recommendation technology that mainly focuses on the implicit feedback of users and articles. The model maps the hidden vectors of users and articles into a high-dimensional geometric metric space and learns the relation between users and articles through a pairwise triplet loss function. The main contribution of Euclidean-distance collaborative metric learning is that it captures the user-article relationship through Euclidean distance and optimizes the model by minimizing the Euclidean distance of positive-feedback user-article sample pairs.
However, current learning based on Euclidean distance metrics also has many problems. Existing collaborative metric recommendation methods use Euclidean distance as the similarity measure between the hidden vectors of users and articles and map the user and article vectors into a high-dimensional geometric space that is constrained by the triangle inequality, so the case where one user likes several articles at the same time becomes difficult to train and the article hidden vectors tend to concentrate in the same region. Because of this geometric constraint, the hidden vectors of articles liked by the same user cluster in a small area, which increases the difficulty of model training and reduces recommendation performance. Finally, Euclidean metric learning considers only the distance between user and article hidden vectors as the index of their correlation, and neglects directionality, an important spatial attribute.
On the other hand, in the starting phase of recommendation, such as the recall phase, the question of how to recommend when the accumulated user interaction data is insufficient is known as the cold-start problem. Although many advanced methods have been proposed to improve the performance of recommendation models, the item-side cold-start problem persists. How to alleviate cold start in a recommendation system by using deep semantic technology on the prior information of existing articles and users is a direction worth researching and exploring.
Disclosure of Invention
The present invention is directed to overcoming at least one of the above technical defects, and provides a hypersphere cooperative metric recommendation apparatus and method based on a pre-training semantic model.
In order to achieve the purpose, the invention adopts the following technical scheme:
a hypersphere cooperative measurement recommendation device based on a pre-training semantic model comprises a pre-training implicit module and a cooperative measurement recommendation module, wherein the pre-training implicit module comprises an article text information encoder and a decoder, and the cooperative measurement recommendation module comprises a hypersphere mapping module and a fusion loss function module;
the text information of the article is firstly subjected to an encoder and a decoder of the pre-training latent semantic module to obtain the text information vectorization representation of the article, so that the text information vectorization representation is used for training and predicting the collaborative metric recommendation module and cold starting of the article; the hypersphere mapping module is used for mapping initialized positive and negative user and object hidden vectors into the same high-dimensional hypersphere manifold in an angle measurement mode, and the fusion loss function module is used for training the user and object hidden vectors after hypersphere mapping and optimizing the intra-class inter-class distance of the positive and negative user object sample pairs;
inputting article text information into the pre-training latent semantic module to obtain an article text information feature vector, and adding the article text information feature vector into the article latent vector in the collaborative metric recommendation model to perform the training; in the prediction stage, a text vector of an article and a corresponding user article hidden vector are obtained, and an article recommendation result of a user is obtained by calculating the cosine distance between the text vector and the corresponding user article hidden vector.
Further:
the system also comprises a positive and negative random sampling module which is used for randomly sampling the training samples and then providing the training samples to the cooperative measurement recommending module.
The pre-training latent semantic module is a BERT pre-training model, and semantic vector expression of the object text is obtained through the pre-training latent semantic module.
The article encoder comprises a position embedding module and a plurality of layers of coding layers, wherein the position embedding module is used for embedding words in input and recording position information of each word in a text, the plurality of layers of coding layers comprise a multi-head attention module, a feedforward neural network module and a standardization module, the multi-head attention module is used for capturing the relation between words and words among input text vectors and obtaining the relation between words in sentences, the feedforward neural network is used for improving the generalization of the module and increasing the depth of the neural network, and the standardization module is used for regularizing to ensure the consistency of output.
The item decoder has a self-attention module for calculating a relationship between currently parsed and already parsed text; and the output vector decoded by the article decoder is used as a final vectorization representation of the text semantic features.
The feedforward neural network is a residual error network, and the residual error network and the result of the upper layer calculation are added to keep the memory; and finally, the article decoder outputs the probability of softmax and calculates the matching probability of the source text and the target text.
The hypersphere mapping module maps the hidden vectors of the user and the article to the hypersphere of the hypersphere manifold, so that the modular lengths of the hidden vectors of the user and the article are consistent, and the influence caused by the modular length of the vector is counteracted.
The fusion loss function module comprises a ternary loss function module and a logistic loss function module, wherein the ternary loss function module is used for increasing the inter-class distance of the positive and negative training sample pairs, and the logistic loss function module is used for reducing the intra-class distance between the object and the user in the positive sample pairs; and training the model by forming a hybrid loss function based on hypersphere metric learning through the ternary loss function module and the logistic loss function module, so that the model aggregates positive sample objects of the user and simultaneously distinguishes the positive sample objects and the negative sample objects.
The fusion loss function module also adds an adjustment factor to control the weight ratio of the ternary loss function and the logistic loss function.
The collaborative metric recommendation task is treated as a binary classification task, and the positive sample pairs are optimized by the logistic regression loss function so that the included angle between the user and the positive-feedback article tends to 0, in order to model and aggregate the positive samples.
A hypersphere cooperative measurement recommendation method based on a pre-training semantic model is used for recommendation.
A method for training the hypersphere cooperative metric recommendation device comprises the following steps: obtaining vectorization representation of article text information by using the pre-training latent semantic module, and fusing the vectorization representation with the article latent vector in the collaborative measurement recommendation module; sampling positive and negative samples of a user and an article, and mapping the positive and negative samples to the same high-dimensional hypersphere manifold hypersphere; training positive and negative sample articles and positive sample articles simultaneously by fusing loss functions, and optimizing the inter-class distance of the positive and negative samples and the intra-class distance of the user and the articles in the positive sample until convergence.
The invention has the following beneficial effects:
the invention provides a hypersphere cooperative measurement recommendation device and method based on a pre-training semantic model, which are used for performing cooperative filtering recommendation based on hypersphere measurement and can help users and content producers to effectively improve recommendation experience and effect. The method effectively solves the problem of collaborative metric recommendation in a pair-by-pair training scene, and builds the hidden vectors of the user and the articles for the text information of the data and the existing attributes of the articles through the history of the user to give recommendations to the user. The article text information features are obtained through the pre-training semantic model and are fused into the article hidden vector, so that the problem of article cold start at the recommendation starting stage is solved. By using the angle measurement mode, the hidden vectors of the user and the articles are mapped to the same high-dimensional manifold hypersphere, so that the problem of geometric constraint brought by the traditional cooperative measurement recommendation method is eliminated or relieved. And in the aspect of model optimization, the intra-class inter-class distance of the positive and negative user article sample pairs is optimized simultaneously through a fusion loss function module. Compared with the traditional collaborative metric recommendation, the method and the device effectively improve the recommendation effect, can complete the end-to-end recommendation process, and obtain better recommendation performance than the traditional recommendation scheme.
Drawings
Fig. 1 is a flowchart of a hypersphere cooperative metric recommendation method based on a pre-training semantic model according to an embodiment of the present invention.
FIG. 2 is a block diagram of a text encoding module for an article according to an embodiment of the present invention.
FIG. 3 is a block diagram of an article text decoding module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of euclidean distance and angular distance measurements.
FIG. 5 shows the general case of margin without the hypersphere constraint.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
Referring to fig. 1 to 3, an embodiment of the present invention provides a hypersphere cooperative metric recommendation apparatus based on a pre-training semantic model, including a pre-training implicit module and a cooperative metric recommendation module, where the pre-training implicit module includes an article text information encoder and a decoder, and the cooperative metric recommendation module includes a hypersphere mapping module and a fusion loss function module;
the text information of the article is firstly subjected to an encoder and a decoder of the pre-training latent semantic module to obtain the text information vectorization representation of the article, so that the text information vectorization representation is used for training and predicting the collaborative metric recommendation module and cold starting of the article; the hypersphere mapping module is used for mapping initialized positive and negative user and object hidden vectors into the same high-dimensional hypersphere manifold in an angle measurement mode, and the fusion loss function module is used for training the user and object hidden vectors after hypersphere mapping and optimizing the intra-class inter-class distance of the positive and negative user object sample pairs;
inputting article text information into the pre-training latent semantic module to obtain an article text information feature vector, and adding the article text information feature vector into the article latent vector in the collaborative metric recommendation model to perform the training; in the prediction stage, a text vector of an article and a corresponding user article hidden vector are obtained, and an article recommendation result of a user, typically an article recommendation list, is obtained by calculating the cosine distance between the text vector and the corresponding user article hidden vector, so that recommendation is completed.
In a preferred embodiment, the system further comprises a positive and negative random sampling module, which is used for randomly sampling the training samples and then providing the training samples to the cooperative metric recommendation module.
In a preferred embodiment, the pre-training latent semantic module is implemented based on a BERT pre-training model, from which a semanticized vector representation of the article text is obtained.
The embodiment of the invention also provides a hypersphere cooperative measurement recommendation method based on the pre-training semantic model, and the hypersphere cooperative measurement recommendation device is used for recommendation.
The embodiment of the invention also provides a method for training the hypersphere cooperative metric recommendation device, which comprises the following steps: obtaining vectorization representation of article text information by using the pre-training latent semantic module, and fusing the vectorization representation with the article latent vector in the collaborative measurement recommendation module; sampling positive and negative samples of a user and an article, and mapping the positive and negative samples to the same high-dimensional hypersphere manifold hypersphere; training positive and negative sample articles and positive sample articles simultaneously by fusing loss functions, and optimizing the inter-class distance of the positive and negative samples and the intra-class distance of the user and the articles in the positive sample until convergence.
The hypersphere collaborative metric recommendation device and method based on the pre-training semantic model, provided by the invention, can be used for collaborative filtering recommendation based on hypersphere metric, and can help users and content producers to effectively improve recommendation experience and effect. The method effectively solves the problem of collaborative metric recommendation in a pair-by-pair training scene, and builds the hidden vectors of the user and the articles for the text information of the data and the existing attributes of the articles through the history of the user to give recommendations to the user. The article text information features are obtained through the pre-training semantic model and are fused into the article hidden vector, so that the problem of article cold start at the recommendation starting stage is solved. By using the angle measurement mode, the hidden vectors of the user and the articles are mapped to the same high-dimensional manifold hypersphere, so that the problem of geometric constraint brought by the traditional cooperative measurement recommendation method is eliminated or relieved. And in the aspect of model optimization, the intra-class inter-class distance of the positive and negative user article sample pairs is optimized simultaneously through a fusion loss function module. Compared with the traditional collaborative metric recommendation, the method and the device effectively improve the recommendation effect, can complete the end-to-end recommendation process, and obtain better recommendation performance than the traditional recommendation scheme.
As shown in fig. 1, the hypersphere cooperative metric recommendation apparatus according to the embodiment of the present invention mainly includes a latent semantic module and a cooperative metric recommendation module. The latent semantic module is composed of a BERT pre-training model and mainly comprises two components, a Transformer encoder and a decoder; the collaborative metric recommendation module is divided into two components: a hypersphere manifold mapping component and a fusion penalty function component.
The connection relation of each module is shown in fig. 1, the text information of the article firstly passes through an encoder and a decoder part of a pre-training semantic module to obtain the text information vectorization representation of the article, the vectorization representation is used as a part of an article hidden vector in a subsequent recommending module, the part participates in the training and prediction of a recommending model, and meanwhile, the cold start of the article is assisted. And then, randomly sampling the training samples by the positive and negative sampling modules to increase the generalization of the method. In the hypersphere mapping module, the hidden vectors of the user and the object are mapped to the high-dimensional hypersphere manifold at the same time, so that the problem of geometric constraint in collaborative measurement recommendation is solved, and the representability of the vectors of the user and the object is improved. The fusion loss function part trains the hidden vectors of the user objects after the hypersphere mapping, wherein a ternary loss function is responsible for increasing the inter-class distance of a positive training sample pair and a negative training sample pair, and a logistic loss function is responsible for reducing the intra-class distance between the objects and the users in the positive training sample pair. And further adding an adjusting factor to control the weight ratio of the ternary loss function and the logistic loss function and control the intra-class distance and the inter-class distance of the positive and negative samples to be within a reasonable range.
The specific implementation process of the embodiment of the invention can comprise two stages: a training phase and a prediction phase. In the training stage, the existing article text information is input into a coding and decoding module in a BERT pre-training model to obtain an article text information characteristic vector, and the article text information characteristic vector is added into an article hidden vector in a collaborative measurement recommendation model for training. In the prediction stage, firstly, a text vector of an article and a corresponding user article hidden vector are obtained, and an article recommendation list of a user is obtained by calculating the cosine distance between the text vector and the corresponding user article hidden vector, so that recommendation is completed.
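As an illustration of the prediction stage, the sketch below ranks articles for one user by cosine similarity between the user hidden vector and article hidden vectors fused with their text vectors. This is a minimal sketch assuming PyTorch; the function and variable names are illustrative and not taken from the patent.

```python
# Illustrative prediction-stage ranking: fuse item text vectors into item hidden
# vectors, then rank by cosine similarity to the user hidden vector.
import torch
import torch.nn.functional as F

def recommend_top_k(user_vec, item_vecs, item_text_vecs, k=10):
    """user_vec: (d,); item_vecs, item_text_vecs: (n, d)."""
    fused_items = item_vecs + item_text_vecs                 # fuse text features into item hidden vectors
    sims = F.cosine_similarity(user_vec.unsqueeze(0), fused_items, dim=-1)   # (n,)
    # smaller cosine distance means larger cosine similarity, so take the largest similarities
    return torch.topk(sims, k).indices

# usage with random placeholder vectors
user_vec = torch.randn(64)
item_vecs, item_text_vecs = torch.randn(1000, 64), torch.randn(1000, 64)
top_items = recommend_top_k(user_vec, item_vecs, item_text_vecs, k=10)
```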
An article text encoding module:
As shown in fig. 2, in the article text encoding module the text information of the article is input into a pre-trained BERT semantic model to obtain the semantic vector representation of the article text.
The core component of the BERT model (Bidirectional Encoder Representations from Transformers) is the Transformer. The Transformer was redesigned to address the weaknesses of the traditional recurrent neural network (RNN), solving the RNN's efficiency problems, its defects in information transmission, and the like.
The article encoding module first embeds the input words; because the entire input text enters the encoding module at once, the position embedding module records the position information of each word in the text. The embedded input then passes through multiple encoding layers, each comprising a multi-head attention module, a feedforward neural network module and a normalization module. Multi-head attention captures the word-to-word relations among the input text vectors and thus the relations between words inside a sentence; the feedforward neural network improves the generalization of the module and increases the depth of the network so that deeper semantic relations of the text can be obtained; finally, normalization is applied to keep the output consistent.
The feedforward neural network of the encoding module is a residual network: its output is added to the result of the previous layer's computation, so a degree of memory is retained.
The article text information is encoded by the encoding module and then input into a decoding module in the BERT pre-training model for decoding. The decoded output vector is used as a final vectorized representation of the semantic features of the text.
An article text decoding module:
Fig. 3 shows the article text decoding module. In the decoding module, the output of the encoding module serves as the input; the data is first weighted by a multi-head self-attention model and then sent to the next computation module. Unlike the encoding module, the decoding module adds an attention mechanism over the encoder output (encoder-decoder attention).
The self-attention module computes the relationship between the text currently being parsed and the text already parsed, while the encoder-decoder attention module computes the relationship between the currently parsed text and the text vectors produced by the encoding module.
The decoding module has a structure similar to the encoding module, with one additional softmax probability output: a linear layer followed by softmax produces the matching probability between the source text and the target text.
For obtaining the article text vector, only an intermediate product of this process is needed, namely the vectorized representation of the text information.
The hypersphere manifold maps the module:
before the hypersphere manifold mapping module, the text hidden vectors of the user and the article are initialized. And adding a vector obtained by the article text through a BERT pre-training model into the article hidden vector to participate in the training and prediction of the recommendation module. After the text hidden vectors of the articles and the users are obtained, random positive and negative sampling is carried out on the article sample pairs of the users according to the data set, and preparation is made for the subsequent training of the fusion loss function.
Fig. 4 shows the difference between the Euclidean metric and the angular metric. Under Euclidean distance measurement, the three articles cannot be correctly distinguished, because the Euclidean distance carries a geometric constraint, the triangle inequality. Under angle measurement, all three articles can be correctly distinguished as positive-sample articles.
In the cooperative measurement method, a ternary loss is often used as a training loss function of the model, and the ternary loss function achieves the effect of correctly distinguishing samples by increasing the distance between a positive sample and a negative sample. However, in the cosine metric space, the value of the interval margin in the ternary loss function is affected by both the angular relationship between the vectors and the modulo length of the vectors.
Fig. 5 is a schematic diagram of the margin in the general case. As can be seen from the figure, the margin of the ternary loss function is influenced simultaneously by the modulo lengths of the article hidden vectors and by the angle between the positive and negative article vectors. Meanwhile, under angle measurement, the optimization target of the method is only the angular relation between users and articles, from which the similarity score prediction of the recommendation task is given. In the general case shown in fig. 5, the margin therefore implicitly introduces the modulo length of the hidden vectors into training, which is not desired.
In summary, the hidden vector of the user object is mapped to the hypersphere of the hypersphere manifold, so that the modular lengths of all the hidden vectors of the user and the object are consistent, the influence caused by the modular length of the vector is offset, and the recommendation effect of the invention is further improved.
A fusion loss function module:
most collaborative filtering methods using metric learning mainly aim to better distinguish positive and negative sample pairs, which distinguish positive feedback articles and negative feedback articles of users as much as possible through a ternary loss function. However, it is a natural and effective way to reduce the angle between the hidden vector of the user and the positive feedback article as much as possible.
Based on this, the method of this embodiment regards the collaborative metric recommendation task as a binary classification task and optimizes the positive sample pairs with a logistic regression loss function. The included angle between the user and the positive-feedback article then tends to 0, so the recommendation method can better model and aggregate positive samples. The model is then trained through a new hybrid loss function based on hypersphere metric learning.
On the other hand, an adjustment parameter is used to balance the influence of the two loss functions, the ternary loss function and the logistic loss, on model training. Through the proposed hybrid loss function, the model can aggregate the user's positive-sample articles while distinguishing positive-sample articles from negative-sample articles.
The training method of the whole device comprises the following steps:
and acquiring vectorization representation of the article text information by using a pre-training semantic model, and fusing the vectorization representation with the article hidden vector in the recommendation module. And then random positive and negative sample sampling is carried out on the user and the article, and the samples are mapped onto the same high-dimensional hypersphere manifold hypersphere, so that the consistency of angle measurement learning is ensured. And finally, training the positive and negative sample articles and the positive sample articles simultaneously by using the proposed fusion loss function, and optimizing the inter-class distance of the positive and negative samples and the intra-class distance of the user and the articles in the positive sample until convergence.
Specific embodiments of the present invention are further described below with reference to the accompanying drawings.
As shown in fig. 1, the embodiment of the present invention mainly comprises a pre-training BERT module and a hypersphere cooperation metric recommendation module. The BERT pre-training model is divided into an encoder module and a decoder module, and the hypersphere cooperation measurement recommending module is divided into a hypersphere mapping module and a fusion loss function module. The implementation details of the individual modules are as follows.
An encoding and decoding module:
As shown in FIGS. 2 and 3, the Transformer-based encoding and decoding modules mainly comprise a position embedding layer, a multi-head attention layer, a feedforward neural network layer and a regularization layer. In addition, the output part of the decoding module has a linear layer and a softmax probability output layer.
The position embedding layer computes, for each word of the input text, a position embedding corresponding to its position. Position encoding introduces positional relations into model training so that the model can distinguish words at different positions. The formula for position embedding is as follows:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos represents the position of the word, i indexes the embedding dimension, and d_model is the embedding dimension.
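The sinusoidal position embedding above can be sketched as follows, assuming PyTorch; max_len and d_model are illustrative parameters.

```python
# Sketch of the standard sinusoidal position embedding defined by the formulas above.
import torch

def positional_encoding(max_len, d_model):
    pos = torch.arange(max_len).unsqueeze(1).float()          # (max_len, 1)
    i = torch.arange(0, d_model, 2).float()                   # even dimension indices
    angle = pos / torch.pow(10000.0, i / d_model)             # (max_len, d_model/2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)    # PE(pos, 2i)
    pe[:, 1::2] = torch.cos(angle)    # PE(pos, 2i+1)
    return pe

pe = positional_encoding(max_len=128, d_model=768)
```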
The expression for the multi-head attention layer is as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k)·V
Multi-head attention applies this scaled dot-product attention in several parallel heads and concatenates their outputs; d_k is the dimension of the key vectors. In the encoder, Q, K and V all come from the same input after the embedding layer operation, i.e. the model computes the attention weights over the sequence input itself, which is the origin of the name self-attention; the resulting output is denoted by Z.
After the multi-head attention mechanism calculation, the Transformer is followed by a feedforward neural network:
FFN(Z) = max(0, Z·W_1 + b_1)·W_2 + b_2
and performing standard layer processing on the last layer of the encoder, and updating the vectorization representation. By stacking multiple layers of transform structures, text features of the article can be extracted better through the deep neural network.
In the part that extracts text semantic information, a pre-trained BERT model is used: the article-side text information is input into the BERT model to obtain a semantic representation vector. In the downstream task, i.e. the recommendation task, the article hidden vector and the article semantic vector jointly participate in the final scoring model. To align the dimensionality of the article hidden vector and semantic vector, one neural network layer is added after the BERT semantic model, with a number of nodes equal to the hidden-vector dimension k. This also ensures that the article semantic vector adapts and extends from the semantic space to the recommendation task.
y = BERT_pretrain(x)
z = σ(y·W_{l×k} + b)
The formulas above summarize the computation of the article text information by the BERT pre-training model, where W_{l×k} is an l×k parameter weight matrix, σ is the sigmoid activation function, y is the output vector of the BERT model with dimension l, and z is the output of y after one fully connected layer, with dimension k.
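A sketch of this step follows, assuming the HuggingFace transformers package and the bert-base-chinese checkpoint (both are assumptions; the patent only specifies a pre-trained BERT model): the article text is encoded by BERT and its output vector y (dimension l) is projected by one fully connected layer to dimension k.

```python
# Obtain an article text vector with pre-trained BERT, then project it to the
# hidden-vector dimension k, matching z = sigmoid(y W + b) above.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
k = 64
project = nn.Sequential(nn.Linear(bert.config.hidden_size, k), nn.Sigmoid())

def item_text_vector(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        y = bert(**inputs).pooler_output     # (1, l): sentence-level output vector
    return project(y).squeeze(0)             # (k,): aligned with the article hidden vector

z = item_text_vector("无线蓝牙耳机 降噪 长续航")
```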
A hypersphere mapping module:
From fig. 5 it is clear that the margin m is affected simultaneously by the angle θ + θ' between the positive and negative samples and by the positive- and negative-sample modulo lengths d and d'. Meanwhile, under angle measurement the optimization target should be only the angular relation between users and articles, from which the similarity score prediction of the recommendation task is given. Therefore, the method maps the hidden vectors of users and articles onto the hypersphere of the hypersphere manifold so that the modulo lengths of all user and article hidden vectors are consistent, offsetting the influence of the vector modulo length; each user hidden vector p and article hidden vector q is converted into p* and q*:
p* = r·p/|p|,   q* = r·q/|q|
where |p| and |q| are the two-norms (L2 norms) of p and q, i.e. the modulo lengths of the user and article hidden vectors, and r is the radius of the hypersphere manifold. This gives the computation formula of the ternary (triplet) loss function:
loss_triplet = Σ_{(p, q, q')} [ m + d(p*, q*) − d(p*, q'*) ]_+
where the sum runs over sampled triplets of a user p, a positive-feedback article q and a negative-sample article q', d(·,·) is the distance on the hypersphere, m is the margin, and [x]_+ = max(0, x).
Mapping users and articles onto a hypersphere in geometric space, so that each user and article is regarded as a point on the hypersphere, also gives the recommendation method a brand-new geometric interpretation.
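A minimal sketch of the mapping above follows, assuming PyTorch: every user and article hidden vector is rescaled to the same modulo length r, so that only its direction (angle) carries information.

```python
# Hypersphere mapping: v* = r * v / ||v||_2, applied row-wise.
import torch
import torch.nn.functional as F

def to_hypersphere(vectors, r=1.0):
    """vectors: (n, d) -> (n, d) with every row having L2 norm r."""
    return r * F.normalize(vectors, p=2, dim=-1)

p_star = to_hypersphere(torch.randn(32, 64), r=1.0)
assert torch.allclose(p_star.norm(dim=-1), torch.ones(32))   # all modulo lengths equal r
```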
Fusion loss function module
The collaborative filtering recommendation task is regarded as a binary classification task, and the cosine distance between the user and the article in a positive sample pair is optimized with a logistic regression loss function; the optimization target is:
s_i = sigmoid(−d(p, q))
Thus, we have:
loss_logistic = −Σ_i [ y_i·log(s_i) + (1 − y_i)·log(1 − s_i) ]
where d(p, q) is the distance between the user hidden vector p and the positive-feedback article q, and y_i indicates the category label of article q. The logistic loss function considers only the positive sample pairs <p, q>. Through model training, −d(p, q) = cos(θ) approaches 1, which means that the angle between the user and the positive-feedback article approaches 0; this allows the proposed method to better model and aggregate positive samples.
Thus, we have a completely new hybrid loss function based on hypersphere metric learning:
loss_hybrid = α·loss_triplet + (1 − α)·loss_logistic
In this formula, the parameter α balances the influence of the two loss functions, the triplet loss and the logistic loss, on model training. Through the proposed hybrid loss function, the recommendation method can aggregate the user's positive-sample articles while distinguishing positive-sample articles from negative-sample articles.
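The hybrid loss can be sketched as follows, assuming PyTorch and taking d(p, q) as the negative cosine similarity so that −d(p, q) = cos(θ), as in the description; the margin and α values are illustrative, not specified by the patent.

```python
# Hybrid loss: margin-based triplet term (inter-class separation) plus a logistic
# term on positive pairs (intra-class pull), weighted by alpha.
import torch
import torch.nn.functional as F

def hybrid_loss(user, pos_item, neg_item, margin=0.5, alpha=0.5):
    """All inputs: (batch, d) hidden vectors already mapped onto the hypersphere."""
    d_pos = -F.cosine_similarity(user, pos_item, dim=-1)    # d(p, q) = -cos(theta)
    d_neg = -F.cosine_similarity(user, neg_item, dim=-1)
    triplet = F.relu(margin + d_pos - d_neg).mean()         # push negatives beyond the margin
    s = torch.sigmoid(-d_pos)                               # s_i = sigmoid(-d(p, q)), label y_i = 1
    logistic = F.binary_cross_entropy(s, torch.ones_like(s))  # pulls the user-positive angle toward 0
    return alpha * triplet + (1 - alpha) * logistic

loss = hybrid_loss(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
```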
The training method of the whole device comprises the following steps:
First, the text information of an article is input into the BERT pre-training model to obtain the text feature vector of the article, which is fused into the article hidden vector of the recommendation task and participates in the training of the recommendation module. In the recommendation module based on hypersphere metric learning, the hidden vectors of users and articles are first initialized, and positive and negative sample pairs are then randomly sampled from the data set. After sampling, all user and article hidden vectors are mapped onto the same hypersphere, and the model is finally trained with the fusion loss function: the ternary loss function trains positive and negative sample pairs and increases the inter-class cosine distance between positive and negative samples, the logistic loss function trains the positive sample pairs and reduces the cosine angle between a positive-sample article and the user, and the adjustment factor α balances the training weights of the two loss functions, until the model converges.
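Putting the steps above together, a toy end-to-end training loop might look like the following sketch, assuming PyTorch; the embedding sizes, random triplets and hyperparameters are placeholders, and the random text matrix stands in for pre-computed BERT text vectors.

```python
# Toy end-to-end training loop: fuse text vectors, map to the hypersphere,
# optimize the hybrid (triplet + logistic) loss with Adam.
import torch
import torch.nn.functional as F

n_users, n_items, k = 100, 500, 64
r, margin, alpha = 1.0, 0.5, 0.5
user_emb = torch.nn.Embedding(n_users, k)
item_emb = torch.nn.Embedding(n_items, k)
item_text = torch.randn(n_items, k)          # stands in for frozen BERT text vectors of dimension k
opt = torch.optim.Adam(list(user_emb.parameters()) + list(item_emb.parameters()), lr=1e-3)

# toy triplets (user, positive article, negative article); real triplets come from the sampling module
u = torch.randint(0, n_users, (256,))
pos = torch.randint(0, n_items, (256,))
neg = torch.randint(0, n_items, (256,))

for step in range(200):
    p = r * F.normalize(user_emb(u), dim=-1)                      # hypersphere mapping of user vectors
    qp = r * F.normalize(item_emb(pos) + item_text[pos], dim=-1)  # article vector fused with its text vector
    qn = r * F.normalize(item_emb(neg) + item_text[neg], dim=-1)
    d_pos = -F.cosine_similarity(p, qp, dim=-1)                   # d(p, q) = -cos(theta)
    d_neg = -F.cosine_similarity(p, qn, dim=-1)
    triplet = F.relu(margin + d_pos - d_neg).mean()               # inter-class separation
    logistic = F.binary_cross_entropy(torch.sigmoid(-d_pos), torch.ones_like(d_pos))  # intra-class pull
    loss = alpha * triplet + (1 - alpha) * logistic
    opt.zero_grad(); loss.backward(); opt.step()
```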
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (10)

1. A hypersphere cooperative measurement recommendation device based on a pre-training semantic model is characterized by comprising a pre-training implicit module and a cooperative measurement recommendation module, wherein the pre-training implicit module comprises an article text information encoder and a decoder, and the cooperative measurement recommendation module comprises a hypersphere mapping module and a fusion loss function module;
the text information of the article is firstly subjected to an encoder and a decoder of the pre-training latent semantic module to obtain the text information vectorization representation of the article, so that the text information vectorization representation is used for training and predicting the collaborative metric recommendation module and cold starting of the article; the hypersphere mapping module is used for mapping initialized positive and negative user and object hidden vectors into the same high-dimensional hypersphere manifold in an angle measurement mode, and the fusion loss function module is used for training the user and object hidden vectors after hypersphere mapping and optimizing the intra-class inter-class distance of the positive and negative user object sample pairs;
inputting article text information into the pre-training latent semantic module to obtain an article text information feature vector, and adding the article text information feature vector into the article latent vector in the collaborative metric recommendation model to perform the training; in the prediction stage, a text vector of an article and a corresponding user article hidden vector are obtained, and an article recommendation result of a user is obtained by calculating the cosine distance between the text vector and the corresponding user article hidden vector.
2. The hypersphere cooperative metric recommendation device of claim 1, further comprising a positive and negative random sampling module for randomly sampling training samples and providing the training samples to the cooperative metric recommendation module.
3. The hypersphere cooperative metric recommendation device of claim 1 or 2, wherein said pre-training implicit module is a BERT pre-training model, from which a semanticized vector representation of an item text is obtained.
4. The hypersphere cooperative metric recommendation device of any one of claims 1 to 3, wherein the article encoder comprises a position embedding module and multiple coding layers, the position embedding module embeds words into input and records position information of each vocabulary in a text, and the multiple coding layers comprise a multi-head attention module, a feedforward neural network module and a normalization module, wherein the multi-head attention module is used for capturing word-word relations between input text vectors to obtain word-word relations inside sentences, the feedforward neural network is used for improving generalization of the modules and increasing depth of the neural network, and the normalization module is used for performing normalization to ensure consistency of output.
5. The hypersphere cooperative metric recommendation device of claim 4, wherein said item decoder has a self-attention module for computing a relationship between currently parsed and already parsed text; and the output vector decoded by the article decoder is used as a final vectorization representation of the text semantic features.
6. The hypersphere cooperative metric recommendation device of claim 5, wherein said feedforward neural network is a residual network, which is summed with the result of the upper layer calculation to maintain memory; and finally, the article decoder outputs the probability of softmax and calculates the matching probability of the source text and the target text.
7. The hypersphere cooperative metric recommendation device of any of claims 1 to 6, wherein the hypersphere mapping module maps the hidden vectors of the user and the object onto the hypersphere of the hypersphere manifold, so that the modulo lengths of the hidden vectors of the user and the object are consistent to offset the influence of the vector modulo length.
8. The hypersphere cooperative metric recommendation device of any of claims 1-6, wherein the fusion loss function module comprises a ternary loss function module for increasing an inter-class distance of a positive and negative training sample pair and a logistic loss function module for decreasing an intra-class distance of an item and a user in the positive sample pair; forming a hybrid loss function based on hypersphere metric learning through the ternary loss function module and the logistic loss function module to train the model, so that the model aggregates positive sample objects of the user and simultaneously distinguishes the positive sample objects and the negative sample objects; preferably, the fusion loss function module further adds an adjustment factor to control the weight ratio of the ternary loss function and the logistic loss function; preferably, the synergy metric recommendation task is treated as a two-classification task, and the positive sample pairs are optimized by the logistic regression loss function so that the angle between the user and the positive feedback item will tend to 0 in order to model and aggregate the positive samples.
9. A hypersphere cooperative metric recommendation method based on a pre-trained semantic model is characterized in that the hypersphere cooperative metric recommendation device as claimed in any one of claims 1 to 8 is used for recommendation.
10. A method of training a hypersphere cooperative metric recommendation device as recited in any of claims 1 to 8, comprising: obtaining vectorization representation of article text information by using the pre-training latent semantic module, and fusing the vectorization representation with the article latent vector in the collaborative measurement recommendation module; sampling positive and negative samples of a user and an article, and mapping the positive and negative samples to the same high-dimensional hypersphere manifold hypersphere; training positive and negative sample articles and positive sample articles simultaneously by fusing loss functions, and optimizing the inter-class distance of the positive and negative samples and the intra-class distance of the user and the articles in the positive sample until convergence.
CN202010389075.9A 2020-05-09 2020-05-09 Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model Active CN111651558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010389075.9A CN111651558B (en) 2020-05-09 2020-05-09 Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010389075.9A CN111651558B (en) 2020-05-09 2020-05-09 Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model

Publications (2)

Publication Number Publication Date
CN111651558A true CN111651558A (en) 2020-09-11
CN111651558B CN111651558B (en) 2023-04-07

Family

ID=72344019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010389075.9A Active CN111651558B (en) 2020-05-09 2020-05-09 Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model

Country Status (1)

Country Link
CN (1) CN111651558B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329833A (en) * 2020-10-28 2021-02-05 浙江大学 Image metric learning method based on spherical surface embedding
CN112528016A (en) * 2020-11-19 2021-03-19 重庆兆光科技股份有限公司 Text classification method based on low-dimensional spherical projection
CN113553577A (en) * 2021-06-01 2021-10-26 中国人民解放军战略支援部队信息工程大学 Unknown user malicious behavior detection method and system based on hypersphere variational automatic encoder
CN115223019A (en) * 2022-07-14 2022-10-21 清华大学 All-time parking space detection method based on fusion of camera and laser radar
WO2023065032A1 (en) * 2021-10-21 2023-04-27 The Toronto-Dominion Bank Distance-based pair loss for collaborative filtering
CN116112288A (en) * 2023-04-07 2023-05-12 天翼云科技有限公司 Network intrusion detection method, device, electronic equipment and readable storage medium
CN116720005A (en) * 2023-08-10 2023-09-08 四川大学 Data collaborative comparison recommendation model of self-adaptive noise

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263256A (en) * 2019-06-21 2019-09-20 西安电子科技大学 Personalized recommendation method based on multi-modal heterogeneous information
CN110517121A (en) * 2019-09-23 2019-11-29 重庆邮电大学 Method of Commodity Recommendation and the device for recommending the commodity based on comment text sentiment analysis
CN110795625A (en) * 2019-10-25 2020-02-14 腾讯科技(深圳)有限公司 Recommendation method and device, computer equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263256A (en) * 2019-06-21 2019-09-20 西安电子科技大学 Personalized recommendation method based on multi-modal heterogeneous information
CN110517121A (en) * 2019-09-23 2019-11-29 重庆邮电大学 Method of Commodity Recommendation and the device for recommending the commodity based on comment text sentiment analysis
CN110795625A (en) * 2019-10-25 2020-02-14 腾讯科技(深圳)有限公司 Recommendation method and device, computer equipment and storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329833A (en) * 2020-10-28 2021-02-05 浙江大学 Image metric learning method based on spherical surface embedding
CN112528016A (en) * 2020-11-19 2021-03-19 重庆兆光科技股份有限公司 Text classification method based on low-dimensional spherical projection
CN112528016B (en) * 2020-11-19 2024-05-07 重庆兆光科技股份有限公司 Text classification method based on low-dimensional spherical projection
CN113553577A (en) * 2021-06-01 2021-10-26 中国人民解放军战略支援部队信息工程大学 Unknown user malicious behavior detection method and system based on hypersphere variational automatic encoder
CN113553577B (en) * 2021-06-01 2023-03-24 中国人民解放军战略支援部队信息工程大学 Unknown user malicious behavior detection method and system based on hypersphere variational automatic encoder
WO2023065032A1 (en) * 2021-10-21 2023-04-27 The Toronto-Dominion Bank Distance-based pair loss for collaborative filtering
CN115223019A (en) * 2022-07-14 2022-10-21 清华大学 All-time parking space detection method based on fusion of camera and laser radar
CN116112288A (en) * 2023-04-07 2023-05-12 天翼云科技有限公司 Network intrusion detection method, device, electronic equipment and readable storage medium
CN116112288B (en) * 2023-04-07 2023-08-04 天翼云科技有限公司 Network intrusion detection method, device, electronic equipment and readable storage medium
CN116720005A (en) * 2023-08-10 2023-09-08 四川大学 Data collaborative comparison recommendation model of self-adaptive noise
CN116720005B (en) * 2023-08-10 2023-10-20 四川大学 System of data collaborative comparison recommendation model based on self-adaptive noise

Also Published As

Publication number Publication date
CN111651558B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111651558B (en) Hyperspherical surface cooperative measurement recommendation device and method based on pre-training semantic model
CN111488734B (en) Emotional feature representation learning system and method based on global interaction and syntactic dependency
CN111709518A (en) Method for enhancing network representation learning based on community perception and relationship attention
CN109359302B (en) Optimization method of domain word vectors and fusion ordering method based on optimization method
CN115239937B (en) Cross-modal emotion prediction method
Cai et al. Intelligent question answering in restricted domains using deep learning and question pair matching
CN113343125B (en) Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
CN112925977A (en) Recommendation method based on self-supervision graph representation learning
CN112016002A (en) Mixed recommendation method integrating comment text level attention and time factors
CN114398976A (en) Machine reading understanding method based on BERT and gate control type attention enhancement network
CN109033294A (en) A kind of mixed recommendation method incorporating content information
CN113239159A (en) Cross-modal retrieval method of videos and texts based on relational inference network
CN114332519A (en) Image description generation method based on external triple and abstract relation
CN112836007B (en) Relational element learning method based on contextualized attention network
Yuan et al. Semantic distance adversarial learning for text-to-image synthesis
CN117953405A (en) Audio-visual zero-order learning method integrating inter-mode attention and intra-mode information of attention
CN117539999A (en) Cross-modal joint coding-based multi-modal emotion analysis method
CN113343118A (en) Hot event discovery method under mixed new media
CN116205227A (en) Keyword generation method and system based on variation inference theory
CN115310004A (en) Graph nerve collaborative filtering recommendation method fusing project time sequence relation
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition
CN113268657B (en) Deep learning recommendation method and system based on comments and item descriptions
CN116049377A (en) Context-aware recommendation system and method based on matrix decomposition and CRNN
CN114298435A (en) Athlete ability prediction method based on improved Transformer
Zhu et al. Few-shot temporal knowledge graph completion based on meta-optimization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant