FR3144362A1 - Recommendation system and method using multivariate data learning by collaborative filtering

Info

Publication number: FR3144362A1
Application number: FR2214530A
Authority: FR
Inventors: Pierre BLANCHART
Original assignee: Commissariat à l'Énergie Atomique (CEA); Commissariat à l'Énergie Atomique et aux Énergies Alternatives (CEA)
Current assignee: Commissariat à l'Énergie Atomique et aux Énergies Alternatives (CEA)
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2024-06-28

Abstract

A system (10) for recommending products by collaborative filtering is proposed. The system implements iterations of a learning function applied to a dataset, the learning function comprising: an initialization block (142) for determining training individuals, each associated with a behavior relating to products, and neighboring individuals drawn from the dataset; an inference block (144) for determining probabilities, each associated with an individual and a product and determined via similarity measures obtained by applying a mapping function, which has learning parameters, to the training individuals and the neighboring individuals; and a backpropagation block (148) for updating the learning parameters using said probabilities compared with the associated behavior. The system (10) determines products to recommend to a targeted individual using the updated parameters. Figure for the abstract: [Fig. 2]

Description

Recommendation system and method using multivariate data learning by collaborative filtering

The present invention relates generally to the field of computer systems, and in particular to a recommendation system and method using supervised learning of multivariate data by collaborative filtering with an optimized similarity measure.

Recommendation systems are conventionally used to determine product recommendations for users (hereinafter called "individuals") of a product provider system (an e-commerce system, for example).

Known recommendation generation systems combine neural network models with collaborative filtering data learning techniques.

Some existing systems are based on jointly learning a metric between individuals and products and latent space representations of the individuals and products, the learning using a gradient descent algorithm, as described for example in the article "Collaborative metric learning" by C. K. Hsieh et al., 2017, Proceedings of the 26th International Conference on World Wide Web, pages 193–201. Other existing systems learn a similarity measure between products from the users' purchase history, using an algorithm applied to a neural network and a similarity criterion of the "contrastive objective" type, as described in the article "Collaborative metric learning recommendation system: Application to theatrical movie releases" by M. Campo et al., 2018, arXiv preprint arXiv:1803.00202. However, in such solutions, the data learning cannot exploit potential exogenous information about the users.

There is thus a need for a system and a method capable of improving the learning of multivariate data and of providing a better solution for generating product recommendations for individuals.

The present invention improves the situation by proposing a system for determining product recommendations by collaborative filtering, comprising a data learning device able to receive an initial dataset comprising a plurality of initial individuals from a data source. The learning device is configured to implement a plurality of iterations of a learning function. The learning function comprises:
- an initialization block configured to sample a plurality of training individuals from the initial dataset, each training individual being associated with a set of multivariate characteristics and with a behavior vector associated with a plurality of products, the initialization block being further configured to determine a plurality of neighboring individuals from the initial dataset by applying a neighborhood function to each of the sets of multivariate characteristics of the training individuals;
- an inference block configured to determine, for each training individual, a plurality of probabilities associated with the products and with a plurality of similarity measures, the similarity measures being determined by applying a mapping function to the training individuals and to the neighboring individuals, the mapping function being associated with learning parameters;
- a backpropagation block (148) configured to update the learning parameters by applying a backpropagation phase for each training individual, the update being carried out from at least one comparison of the probabilities with the associated behavior vector.

The system further comprises an application module configured to determine one or more of the products to recommend to a targeted individual from the data source, based on a determination of probabilities associated with the targeted individual using the learning parameters updated by the plurality of iterations of the learning function.

In embodiments, the mapping function may be applied so as to determine, respectively, a training latent space and a neighborhood latent space. The similarity measures may further be determined as a function of differences between the training latent space and the neighborhood latent space. The backpropagation phase may further be applied based on the gradient of the similarity measures with respect to the training latent space. The application module (16) may be configured to apply:
- the neighborhood function to the targeted individual, to generate a plurality of application neighboring individuals drawn from the initial dataset;
- the mapping function to the targeted individual and to the application neighboring individuals, to determine, respectively, an estimated latent element and an application neighborhood latent space.

The probabilities may be estimated from similarity measures determined as a function of differences between the estimated latent element and the application neighborhood latent space.

The neighborhood function may comprise, for each training individual:
- a determination of distance measures between the training individual and each of the initial individuals;
- a comparison of each distance measure with a predetermined distance threshold value.

The learning device may be configured to generate neighborhood data subsets from the comparison.

The plurality of neighboring individuals may be determined from an intersection between the neighborhood data subsets.
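The neighborhood function described above (distance measures, threshold comparison, neighborhood subsets, intersection) can be illustrated with a minimal sketch. The Euclidean distance, the "less than or equal" comparison, and the names `euclidean`, `neighborhood_subsets` and `common_neighbors` are assumptions made for illustration; the text does not specify the distance measure or the exact condition defining each subset.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood_subsets(training, initial, threshold):
    """For each training individual, collect the indices of the initial
    individuals whose distance is at most `threshold` (assumed condition)."""
    subsets = []
    for t in training:
        subset = {k for k, x in enumerate(initial) if euclidean(t, x) <= threshold}
        subsets.append(subset)
    return subsets


def common_neighbors(subsets):
    """Neighboring individuals taken as the intersection of the subsets."""
    if not subsets:
        return set()
    return set.intersection(*subsets)


# Toy data: four initial individuals described by two numerical characteristics.
initial = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
training = [(0.0, 0.0), (1.0, 0.0)]

subsets = neighborhood_subsets(training, initial, threshold=1.2)
neighbors = common_neighbors(subsets)
```

In this toy example the distant individual at index 3 never enters a subset, and only the individuals close to both training individuals survive the intersection.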

Advantageously, the similarity measures may be determined from a differentiable function of the Gaussian kernel type or from a differentiable function of the Student-t kernel type.
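As an illustration of the two kernel families mentioned, the sketch below computes a similarity between two latent vectors. The exact parameterizations (a bandwidth `sigma` for the Gaussian kernel, a degrees-of-freedom value `alpha` for the Student-t kernel) are common textbook forms and are assumptions here, not the forms used by the invention.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two latent vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel: exp(-||u - v||^2 / (2 * sigma^2)); sigma is assumed."""
    return math.exp(-squared_distance(u, v) / (2.0 * sigma ** 2))


def student_t_kernel(u, v, alpha=1.0):
    """Student-t kernel: (1 + ||u - v||^2 / alpha) ** (-(alpha + 1) / 2);
    alpha is assumed."""
    return (1.0 + squared_distance(u, v) / alpha) ** (-(alpha + 1.0) / 2.0)


z_target = (0.0, 0.0)
z_neighbor = (1.0, 0.0)
sim_gauss = gaussian_kernel(z_target, z_neighbor)
sim_student = student_t_kernel(z_target, z_neighbor)
```

Both functions are smooth in their inputs, which is what allows a gradient of the similarity with respect to the latent vectors to be computed during backpropagation. The Student-t form has heavier tails, so distant neighbors keep a larger weight than with the Gaussian form.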

In embodiments, the learning function may further comprise a comparison block configured to determine and evaluate evaluation criteria from the probabilities and the training behavior vectors. The backpropagation phase may further be applied based on the gradient of the probabilities with respect to the similarity measures.

The evaluation criteria may be determined from a cost function of the cross-entropy type. The data learning device may be configured to determine a loss coefficient from the evaluation criteria, to evaluate the value of the loss coefficient against a predetermined minimum evaluation criterion, and to generate a stopping point of the learning function if the value of the loss coefficient is less than or equal to the minimum evaluation criterion. The data learning device may then be configured to stop the iterations of the learning function in response to detecting the stopping point.
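The cross-entropy criterion and the stopping test can be sketched as follows. Averaging the per-individual cross-entropies to obtain the loss coefficient is an assumption, as are the names `cross_entropy`, `mean_loss` and `should_stop` and the clipping constant `eps`.

```python
import math


def cross_entropy(behavior, probabilities, eps=1e-12):
    """Cross-entropy between a behavior vector (a discrete distribution)
    and predicted probabilities; eps avoids log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(behavior, probabilities))


def mean_loss(behaviors, predictions):
    """Loss coefficient taken as the mean cross-entropy (assumed aggregation)."""
    losses = [cross_entropy(y, p) for y, p in zip(behaviors, predictions)]
    return sum(losses) / len(losses)


def should_stop(loss, min_criterion):
    """Stopping point: the loss is less than or equal to the minimum criterion."""
    return loss <= min_criterion


behaviors = [[1.0, 0.0]]
predictions = [[0.9, 0.1]]
loss = mean_loss(behaviors, predictions)
stop = should_stop(loss, min_criterion=0.2)
```

A training loop would call `should_stop` after each iteration and leave the loop as soon as it returns `True`.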

In embodiments, the inference block (144) may further be configured to sample, for each training individual, a plurality of elements from among the probabilities. The backpropagation phase may further be applied based on the gradient of the sampled elements with respect to the similarity measures. The evaluation criteria may be determined from an error function of the MAP type. The data learning device may be configured to determine a gain coefficient from the evaluation criteria, to evaluate the value of the gain coefficient against a predetermined maximum evaluation criterion, and to generate a stopping point of the learning function if the value of the gain coefficient is greater than or equal to the maximum evaluation criterion. The data learning device may then be configured to stop the iterations of the learning function in response to detecting the stopping point.

The similarity measures may further be determined as a function of a similarity parameter. The data learning device may be configured to update the similarity parameter at each iteration from a gradient of the error function with respect to the similarity parameter.

The mapping function may comprise an implementation and use of a convolutional neural network built from one-dimensional convolution blocks.

The invention also provides a method for determining product recommendations by collaborative filtering, implemented by a recommendation determination system able to process an initial dataset comprising a plurality of initial individuals from a data source. The method comprises a plurality of iterations of supervised data learning steps consisting of:
- sampling a plurality of training individuals from an initial dataset, each training individual being associated with a set of multivariate characteristics and with a behavior vector associated with a plurality of products;
- determining a plurality of neighboring individuals from the initial dataset, using a neighborhood function applied to each of the sets of multivariate characteristics of the training individuals;
- determining, for each training individual, a plurality of probabilities associated with the products and with similarity measures, the similarity measures being generated by applying a mapping function to the training individuals and to the neighboring individuals, the mapping function being associated with learning parameters;
- updating the learning parameters by applying a backpropagation phase, for each training individual, from at least one comparison of the probabilities with the associated behavior vector.

The method further comprises a step of determining one or more of the products to recommend to a targeted individual from the data source, based on a determination of probabilities associated with the targeted individual using the learning parameters updated by the plurality of iterations of the learning steps.

The system and the supervised data learning method according to the embodiments of the invention allow the joint optimization of a transformation from sets of input characteristics to a latent space and of a similarity measure between latent space representations, the optimized similarity measure being learned using a learning paradigm derived from the principle of collaborative filtering.

In particular, they provide a solution, in the form of a deep neural network, for determining product recommendations for individuals from multivariate data and as a function of similarity measures between sets of characteristics, using a predictive paradigm of the collaborative filtering type.

Description of the figures

Other characteristics, details and advantages of the invention will become apparent on reading the description given with reference to the appended drawings, which are provided by way of example.

Figure 1 is a diagram representing the overall structure of a system for determining recommendations by collaborative filtering, according to embodiments of the invention.

Figure 2 is a diagram representing functional blocks associated with a learning algorithm, according to embodiments of the invention.

A further figure is a diagram representing functional blocks associated with a prediction algorithm for determining recommendations, according to embodiments of the invention.

A further figure is an example of a possible visual representation of a training latent space, according to embodiments of the invention.

A further figure is a flowchart representing the steps of the data learning method, according to embodiments of the invention.

A further figure is a flowchart representing the steps of the recommendation method, according to embodiments of the invention.

Identical references are used in the figures to designate identical or similar elements.

Detailed description

Figure 1 shows an example of a system 10 for determining recommendations by collaborative filtering, comprising a data source 12, a data learning device 14 and a learning application module 16, according to certain embodiments of the invention.

As used herein, the expression "data prediction by collaborative filtering" refers to predicting the behavior of a given individual from a weighted linear combination of the behavior of similar neighboring individuals. The weights of this combination are expressed through the similarity (or similarity measure) between the given individual whose behavior is to be predicted (for example, a prediction about the purchase of a product) and each of the similar individuals.
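The definition above can be illustrated with a short sketch in which the predicted behavior is a similarity-weighted combination of the neighbors' behavior vectors. Normalizing the similarities so that the weights sum to 1 is an assumption made here so that the result remains a discrete distribution; the function name `predict_behavior` is hypothetical.

```python
def predict_behavior(similarities, neighbor_behaviors):
    """Predict a behavior vector as a weighted linear combination of the
    neighbors' behavior vectors, weighted by similarity.

    The weights are the similarities divided by their sum (assumed
    normalization).
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    weights = [s / total for s in similarities]
    n_products = len(neighbor_behaviors[0])
    return [
        sum(w * y[j] for w, y in zip(weights, neighbor_behaviors))
        for j in range(n_products)
    ]


# Two neighbors, two products: the first neighbor is three times more similar.
similarities = [3.0, 1.0]
neighbor_behaviors = [[1.0, 0.0], [0.0, 1.0]]
predicted = predict_behavior(similarities, neighbor_behaviors)
```

Here the prediction leans toward the first product because the neighbor who bought it is the more similar one.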

The data source 12 comprises one or more datasets that may be included in and/or acquired by the system 10. A dataset is a set of multivariate data generated by combining data of any type.

In particular, a multivariate dataset may be characterized by a set of individuals. As used herein, the term "individual" designates a set of information relating to a given user of the system 10. Each individual is thus associated with a set of data corresponding to so-called exogenous "characteristics" of the individual.

For example and without limitation, an individual may represent a current customer or a future customer. A set of characteristics of an individual may comprise information from a customer database containing information relating to the individuals. A characteristic may be, without restriction, a numerical value or a categorical value. A categorical value may in particular be converted into a numerical value using one-hot encoding ("1-of-n encoding"). A categorical value may be, for example, information on "age", "sex", "income", "sector of activity", "seniority index", etc.
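One-hot encoding, as mentioned above, maps a categorical value to a vector with a single 1 at the position of its category. The sketch below is illustrative; the category list and the function name `one_hot` are hypothetical.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 1-of-n vector over `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


# Hypothetical categories for a "sector of activity" characteristic.
sectors = ["retail", "finance", "health"]
encoded = one_hot("finance", sectors)

# Numerical characteristics can then be concatenated with the encoded ones.
features = [42.0] + encoded
```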

In embodiments, each individual of a dataset may further be associated with a set (or vector) of data corresponding to an "expected behavior" or "actual behavior" of the individual.

For example and without limitation, the recommendation determination system 10 (hereinafter also called the "recommendation system") may be a system for determining purchase recommendations relating to a set of at least one product, each product being indexed by an integer between 1 and the number of products.

In embodiments, the behavior vector of an individual may be a discrete distribution. Such a behavior vector may correspond, for example, to a normalized purchase history of the individual. In this case, for example, each term of the behavior vector may represent the number of purchases of a specific product by the individual over a period of time. For example, the period of time may be one and a half years. In embodiments, each term of the behavior vector may be normalized, that is, divided by the individual's total number of purchases over the same period of time, according to the following formula (01):

et (01) And (01)
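As an illustration only, the normalization described above can be sketched in Python. The function name `normalize_purchases` and the sample counts are hypothetical and are not taken from the application; the sketch simply divides each per-product count by the individual's total count.

```python
def normalize_purchases(counts):
    """Turn raw per-product purchase counts into a discrete distribution."""
    total = sum(counts)
    if total == 0:
        # Assumption: an individual without purchases has no behavior vector.
        raise ValueError("individual has no purchases in the period")
    return [c / total for c in counts]


# Hypothetical purchase counts for 4 products over the observation period.
behavior = normalize_purchases([3, 0, 1, 4])
```

Each term of `behavior` is non-negative and the terms sum to 1, which is what makes the vector usable as a discrete distribution.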

As shown in Figure 1, the data learning device 14 is able to receive an initial data set from the data source 12. In particular, the initial data set comprises a plurality of individuals, called 'initial individuals'. Each initial individual is associated with an initial set of characteristics and an initial behavior vector. The initial data set can be represented by the following expression (02):

(02)

The data learning device 14 is configured to apply (or implement) a plurality of iterations of a learning algorithm to the initial data set. In particular, the learning algorithm (or learning function, learning model) can be an algorithm of the stochastic gradient descent type.

The learning algorithm can be built from an implementation of any conventional learning method and/or neural network architecture, generally characterized by a mapping function associated with a set of learning parameters. The learning parameters depend on the learning method used. For example and without limitation, the learning method can be based on a convolutional neural network built from one-dimensional (1D) convolution blocks. A neural network with linear layers can also be used.

To facilitate understanding of the embodiments of the invention, definitions and notions relating to the learning algorithm and to neural networks are detailed below.

Generally speaking, a neural network is a computational model comprising computing operations, called "neurons", interconnected by links, or "synapses", implemented in the form of digital memories. A neural network may comprise a plurality of successive layers of neurons, including an input layer, an output layer, and one or more intermediate layers. The input layer carries the input data of the network, that is, in particular, a set of characteristics. A neural network is associated with a set of learning parameters. For example and without limitation, a set of learning parameters may be a set of learning model data relating to biases associated with the neurons of the network and/or to synaptic weights associated with the links between different neurons of the network. A set of learning parameters can further be associated with convolutional filter weights.

Furthermore, at each iteration, the stochastic gradient descent algorithm comprises an inference phase (or forward propagation phase) and a phase of updating the learning parameters of the model (also called the gradient backpropagation phase, or backward phase). The inference phase propagates the input characteristic data from the input layer through the successive layers of the neural network. Each layer of a neural network takes its inputs from the outputs of the previous layer. The data are propagated up to the output layer, which carries the result of the computations performed by the neural network. This result can be used to determine the prediction of the network for the input data considered. The backpropagation phase corrects the errors between the outputs obtained in the inference phase and the expected outputs (or behavior vector) for the input data considered.

In particular, the learning algorithm compares a prediction with an expected output according to an error or evaluation criterion, also called the 'objective', and, depending on this comparison, updates the learning parameters at each iteration to improve the final result that the neural network can deliver.

In the backpropagation phase, any errors obtained by one or more neurons of the network are "backpropagated" to the links and neurons connected to them, backwards from the last layer, layer by layer, to the first layer. Where the error criterion is defined from a differentiable function, the backpropagation can be a gradient backpropagation, in which, in each neuron of a given layer, an error message to be backpropagated is determined using the derivative of the error criterion with respect to the output of the neuron of the layer considered.
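The forward and backward phases can be illustrated on a deliberately tiny model. The sketch below assumes a single linear neuron and a squared-error criterion, neither of which is prescribed by the application; the names `sgd_step`, `learning_rate` and the sample data are hypothetical.

```python
def sgd_step(weights, bias, features, target, learning_rate=0.1):
    """One iteration: forward pass, error derivative, parameter update."""
    # Forward (inference) phase: weighted sum of the inputs.
    prediction = sum(w * x for w, x in zip(weights, features)) + bias
    # Derivative of the criterion 0.5 * (prediction - target)**2
    # with respect to the output of the neuron.
    error_signal = prediction - target
    # Backward phase: the chain rule sends the error signal to each parameter.
    new_weights = [w - learning_rate * error_signal * x
                   for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error_signal
    # The returned loss is the one measured before the update.
    return new_weights, new_bias, 0.5 * error_signal ** 2


weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(50):
    weights, bias, loss = sgd_step(weights, bias, [1.0, 2.0], 3.0)
    losses.append(loss)
```

On this toy example the loss shrinks at every iteration, which is the behavior the comparison between prediction and expected output is meant to produce.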

The figure schematically represents functional blocks associated with the applied learning algorithm, according to certain embodiments of the invention.

The learning algorithm comprises an initialization block 142 configured to generate a training data set from the initial data set. In particular, the training data set comprises a plurality of individuals called 'training individuals'. Each training individual is sampled from the set of initial individuals. Each training individual is then associated with a set of training characteristics and a training behavior vector (also called 'training behavioral distribution'). The training data set can be defined by the following expression (03):

(03)

The initialization block 142 is further configured to generate a neighborhood data set from the initial data set. In particular, the neighborhood data set comprises a plurality of individuals called 'neighboring individuals'. Each neighboring individual is then associated with a set of neighborhood characteristics and a neighborhood behavior vector (also called 'neighborhood behavioral distribution'). The neighborhood data set can be defined by the following expression (04):

(04)

Each neighboring individual is chosen from the set of initial individuals. In particular, a given initial individual is chosen by means of a neighborhood function applied between that initial individual and the sampled training individuals.

In embodiments, applying the neighborhood function may comprise determining distance measures between training individuals and initial individuals. A distance measure is computed between at least part of the data in the set of training characteristics and at least part of the data in the initial set of characteristics. For example and without limitation, the distance measure can be computed as a Euclidean distance between the set of training characteristics and the initial set of characteristics.

Applying the neighborhood function may further comprise comparing each of the determined distance measures with a predetermined distance threshold value. The distance threshold value can in particular be predefined according to the distance measurement method used.

Advantageously, the data learning device can be configured to generate subsets of neighborhood data from the comparison of the distance measures with the distance threshold value. In particular, an initial individual can be included in the neighborhood data subset associated with a training individual if the distance measure between the initial individual and the training individual is greater than the distance threshold value. The neighborhood data subset thus excludes the training individual itself. In this case, each neighborhood data subset can be represented by the following expression (05):

(05)
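A minimal sketch of how one neighborhood data subset could be built, assuming a Euclidean distance and individuals stored as lists of numeric characteristics. The function name `neighborhood_subset`, the threshold value and the sample data are hypothetical.

```python
import math


def neighborhood_subset(training_features, initial_features, threshold):
    """Indices of initial individuals farther than `threshold`
    from the given training individual."""
    return {
        j
        for j, features in enumerate(initial_features)
        if math.dist(training_features, features) > threshold
    }


# Hypothetical initial individuals, each described by 2 characteristics.
initial = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [3.0, 4.0]]
# Individual 0 plays the role of the training individual.
subset = neighborhood_subset(initial[0], initial, threshold=0.5)
```

Because the distance of an individual to itself is zero, the strict "greater than" comparison leaves the training individual out of its own subset, along with any initial individual that lies too close to it.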

According to embodiments, the neighborhood data set associated with a training data set can then be generated from an intersection of some or all of the neighborhood data subsets. The neighborhood data set can thus also be defined by the following expression (06):

(06)
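The intersection step can be sketched as follows, again assuming a Euclidean distance. The helper names `far_from` and `neighborhood_set`, the choice of training indices and the data are hypothetical illustrations.

```python
import math


def far_from(center, candidates, threshold):
    """Indices of candidates farther than `threshold` from `center`."""
    return {j for j, c in enumerate(candidates) if math.dist(center, c) > threshold}


def neighborhood_set(training_indices, initial_features, threshold):
    """Intersect the per-training-individual subsets."""
    subsets = [
        far_from(initial_features[i], initial_features, threshold)
        for i in training_indices
    ]
    return set.intersection(*subsets) if subsets else set()


initial = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [3.0, 4.0], [1.2, 1.0]]
# Individuals 0 and 2 are taken as the sampled training individuals.
neighbors = neighborhood_set([0, 2], initial, threshold=0.5)
```

Here only individual 3 is farther than the threshold from both training individuals, so it is the only one kept; the training individuals themselves never survive the intersection, since each is absent from its own subset.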

Using a neighborhood data set together with the distance threshold value prevents the learning model from using overly similar individuals to predict a given training individual, and consequently gives it good generalization capabilities. In other words, this enables the learning model to interpolate properly between the sets of characteristics of different individuals of a database, even in regions where there are few or no similar individuals "nearby".

Furthermore, taking the intersection of the neighborhood data subsets excludes every training individual from the neighborhood data set.

The learning algorithm also comprises an inference block 144 configured to apply an inference phase to the sets of training characteristics and of neighborhood characteristics.

The inference phase applies the mapping function to the set of training characteristics so as to determine a latent space called the training latent space. The inference block 144 is also configured to apply this mapping function to the set of neighborhood characteristics so as to determine another latent space called the neighborhood latent space.

The training and neighborhood latent spaces are the result data of the computations obtained, for example, at the output layer of a neural network. The training and neighborhood latent spaces are in particular tensors comprising training latent elements and neighborhood latent elements, respectively.

The inference block 144 is also configured to determine behavior prediction vectors (also called 'predicted behavioral distributions'). Each behavior prediction vector is associated with a training individual and is determined from the neighborhood behavior vectors and from similarity measures defined between the training individual and each of the neighboring individuals. A behavior prediction vector can thus be defined by the following expression (07):

(07)
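One plausible reading of this step, offered as an assumption rather than as the exact form of expression (07), is a similarity-weighted average of the neighborhood behavior vectors, normalized by the sum of the similarities. The function name `predict_behavior` and the sample values are hypothetical.

```python
def predict_behavior(similarities, neighbor_behaviors):
    """Similarity-weighted average of neighborhood behavior vectors.

    Assumption: weights are normalized by their sum, so that the result
    is again a discrete distribution when every input vector is one.
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must have a positive sum")
    n_products = len(neighbor_behaviors[0])
    return [
        sum(s * y[k] for s, y in zip(similarities, neighbor_behaviors)) / total
        for k in range(n_products)
    ]


# Two neighbors, two products: the more similar neighbor dominates.
prediction = predict_behavior([0.75, 0.25], [[1.0, 0.0], [0.0, 1.0]])
```

Under this assumption the prediction for a training individual is driven mostly by the neighbors whose latent elements are closest to its own.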

In particular, a behavior prediction vector associated with an individual corresponds to a discrete probability distribution over the products.

In particular, each term of the behavior prediction vector can correspond to the probability that a training individual buys the corresponding product, as defined by the following expression (08):

(08)

In embodiments, each similarity measure can be computed from a differentiable similarity function taking into account a training latent element and a neighborhood latent element. In this case, the inference block 144 can advantageously be configured to determine a neighborhood tensor corresponding to the differences between the training latent elements and the neighborhood latent elements, as defined by the following formula (09):

(09)
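The sketch below illustrates the two operations just described: applying one shared mapping function to both sets of characteristics, then forming the tensor of pairwise differences. The mapping is assumed here to be a single bias-free linear layer, which is only one of the options the description allows; `linear_map`, `neighborhood_tensor` and the weights are hypothetical.

```python
def linear_map(weights, features):
    """Hypothetical mapping function: one linear layer without bias."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def neighborhood_tensor(training_latents, neighbor_latents):
    """delta[i][j] holds the element-wise difference between training
    latent element i and neighborhood latent element j."""
    return [
        [[a - b for a, b in zip(zi, zj)] for zj in neighbor_latents]
        for zi in training_latents
    ]


weights = [[1.0, 0.0], [0.0, 2.0]]
# The same parameters are applied to training and neighborhood characteristics.
train_z = [linear_map(weights, x) for x in [[1.0, 1.0]]]
neigh_z = [linear_map(weights, x) for x in [[0.0, 1.0], [2.0, 0.0]]]
delta = neighborhood_tensor(train_z, neigh_z)
```

The resulting tensor has one entry per (training individual, neighboring individual) pair, which is the input a differentiable similarity function needs.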

Each similarity measure is further determined according to a similarity parameter. The result of a similarity measure can, for example, be a real value between 0 and 1.

In one embodiment, the differentiable similarity function can be of the "Gaussian kernel" type. In this case, the similarity measure can be defined by the following expression (10):

(10)

Alternatively, the differentiable similarity function can be of the Student-t kernel type, and the similarity measure can be defined by the following expression (11):

(11)
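For illustration, common textbook forms of the two kernels are sketched below. The exact parameterization of expressions (10) and (11) is not assumed to match: the way the similarity parameter (here `sigma`) enters each formula is an assumption, and the function names are hypothetical.

```python
import math


def gaussian_similarity(difference, sigma):
    """Gaussian kernel on a latent difference vector (assumed form)."""
    squared_norm = sum(d * d for d in difference)
    return math.exp(-squared_norm / (2.0 * sigma ** 2))


def student_t_similarity(difference, sigma):
    """Student-t kernel with one degree of freedom (assumed form)."""
    squared_norm = sum(d * d for d in difference)
    return 1.0 / (1.0 + squared_norm / sigma ** 2)


near = gaussian_similarity([0.0, 0.0], sigma=1.0)
far_gauss = gaussian_similarity([3.0, 4.0], sigma=1.0)
far_student = student_t_similarity([3.0, 4.0], sigma=1.0)
```

Both functions are differentiable and return values in the interval from 0 to 1, equal to 1 for identical latent elements. The Student-t form decays more slowly, so distant neighbors keep a larger share of the weight than under the Gaussian form.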

According to certain embodiments, the inference block 144 can also be configured to sample (or choose or determine), for each training individual, a plurality of elements from among the terms of the associated behavior prediction vector.

It should be noted that, by construction, a behavioral distribution is a probability distribution over the products. Such sampling can therefore be performed by 'independent draws' of products following the multimodal law of the behavior prediction vector defined by expression (07) above, using, for example and without limitation, a sampling principle of the "inverse transform method" type. Such sampling corresponds to drawing a product with the probability given by the corresponding term of the behavior prediction vector.
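The inverse transform method mentioned above can be sketched with the Python standard library. The function name `sample_products` and the example distributions are hypothetical; products are identified by their zero-based position in the distribution.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_products(distribution, n_draws, rng):
    """Independent draws of product indices by inverse transform sampling."""
    cumulative = list(accumulate(distribution))
    draws = []
    for _ in range(n_draws):
        # Uniform draw mapped onto the cumulative distribution.
        u = rng.random() * cumulative[-1]
        # First index whose cumulative probability exceeds u.
        index = bisect_right(cumulative, u)
        # Guard against floating-point rounding at the upper end.
        draws.append(min(index, len(distribution) - 1))
    return draws


draws = sample_products([0.2, 0.8], 10, random.Random(0))
```

A product with a larger predicted probability occupies a wider slice of the cumulative scale and is therefore drawn more often, possibly several times in the same sample.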

Sampling according to the multimodal law may correspond to implementing a ranking of some or all of the products, corresponding for example to the products that a training individual is likely to buy. In particular, when a given product is purchased several times by a training individual (or customer), this product is classified as a relevant or important product, and is then likely to appear several times (that is, in several samples) in the sampling.

In embodiments, the learning algorithm may comprise a comparison block 146 configured to compare each predicted behavioral distribution with the corresponding training behavioral distribution. To this end, the comparison block 146 can be configured to determine evaluation criteria from an error function or cost function. Each evaluation criterion is associated with a training individual and is determined from the difference between the behavior prediction vector and the training behavior vector.

In a variant embodiment, the cost function can be a differentiable cost function. In particular, the cost function can be a function of the 'cross-entropy' type. In this case, the evaluation criterion can be defined by the following expression (12):

(12)
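A minimal sketch of a cross-entropy criterion between a training behavior vector and a behavior prediction vector, using the standard definition. The function name `cross_entropy` and the clipping constant `eps` are assumptions added for numerical safety.

```python
import math


def cross_entropy(target, prediction, eps=1e-12):
    """Cross-entropy of `prediction` relative to `target`.

    `eps` (an assumption) avoids taking the logarithm of zero.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, prediction))


good = cross_entropy([1.0, 0.0], [0.9, 0.1])
poor = cross_entropy([1.0, 0.0], [0.5, 0.5])
```

The criterion decreases as the predicted distribution puts more mass on the products the individual actually bought, so `good` is smaller than `poor`.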

Alternatively, in embodiments where a sampling is determined from the behavior prediction vector, the cost function can be a function of the "mean of 'average precision'" type, also called MAP (mean average precision). In this case, the evaluation criterion can be defined according to the following equation (13):

(13)

It should be noted that a MAP-type cost function is not differentiable. The evaluation criterion defined by equation (13) corresponds to the mean of the 'average precisions' (AP) of each element of the sampling. The average precision of a given order can be defined as the mean of the precisions at each of the orders of the sampling. Furthermore, the precision at a given order can be defined as the number of 'relevant' elements among that many first elements (that is, the first recommended products).
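The quantities defined above can be sketched as follows. Two assumptions are made and should be read as such: the precision at order k is divided by k (the usual convention, whereas the text speaks of a number of relevant elements), and the average precision averages the precision over every order of the ranking. All function names are hypothetical.

```python
def precision_at(ranked, relevant, k):
    """Share of relevant products among the first k ranked products."""
    hits = sum(1 for product in ranked[:k] if product in relevant)
    return hits / k


def average_precision(ranked, relevant):
    """Mean of the precisions at orders 1..len(ranked) (assumed definition)."""
    if not ranked:
        return 0.0
    n = len(ranked)
    return sum(precision_at(ranked, relevant, k) for k in range(1, n + 1)) / n


def mean_average_precision(rankings, relevants):
    """Mean of the average precisions over several rankings."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


# Two hypothetical rankings of products 1 and 2; only product 1 is relevant.
score = mean_average_precision([[1, 2], [2, 1]], [{1}, {1}])
```

Ranking the relevant product first yields an average precision of 0.75 against 0.25 when it comes second. The score changes by jumps when the ranking changes, which is why it cannot be differentiated with respect to the learning parameters.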

Advantageously, a non-differentiable cost function can be combined with the stochastic gradient descent algorithm by means of a reinforcement learning method, in particular the "policy-gradient" method. The "policy-gradient" method (applied to other problems) is described in particular in the article "Simple statistical gradient-following algorithms for connectionist reinforcement learning" by R. J. Williams, 1992, Machine Learning, 8(3):229–256, or in the article "Policy gradient methods for reinforcement learning with function approximation" by R. S. Sutton et al., 1999, Advances in Neural Information Processing Systems 12.
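The principle of a policy-gradient estimator can be sketched on a toy case. The example assumes a softmax distribution over three products parameterized directly by logits, and a hypothetical reward that stands in for a non-differentiable score; it is a generic score-function estimator, not the specific procedure of the application.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_estimate(logits, reward, n_samples, rng):
    """Monte Carlo estimate of the gradient of the expected reward
    with respect to the logits (score-function estimator)."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        r = reward(action)
        for k in range(len(logits)):
            # Gradient of log softmax: indicator(k == action) - probs[k].
            indicator = 1.0 if k == action else 0.0
            grad[k] += r * (indicator - probs[k]) / n_samples
    return grad


def reward_first(action):
    # Hypothetical non-differentiable reward: 1 if product 0 is drawn.
    return 1.0 if action == 0 else 0.0


grad = policy_gradient_estimate([0.0, 0.0, 0.0], reward_first, 2000, random.Random(0))
```

The reward is only evaluated on sampled products and never differentiated; the gradient comes from the log-probability of the sample. A gradient ascent step along `grad` raises the logit of the rewarded product and lowers the others.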

In embodiments where the cost function is differentiable, during a given iteration of the learning algorithm, the comparison block 146 can be configured to determine a loss coefficient from the error values associated with each training individual. In particular, the loss coefficient can be defined according to the following equation (14):

(14)

The loss coefficient is to be minimized at each iteration. The comparison block 146 can thus also be configured to evaluate the value of the loss coefficient against a predetermined minimum evaluation criterion. The comparison block 146 can further be configured to generate a stopping point of the learning model if the value of the loss coefficient is less than or equal to the minimum evaluation criterion.

In the embodiments where the cost function is of the MAP type, during a given iteration of the learning algorithm, the comparison block 146 can be configured to determine an objective (or gain coefficient) from the evaluation criteria associated with each training individual. In particular, the gain coefficient can be defined as the mathematical expectation of the evaluation criterion.

In embodiments, the gain coefficient can be defined according to the following equation (15):

(15)

The gain coefficient is to be maximized at each iteration. The comparison block 146 can therefore also be configured to evaluate the value of the gain coefficient against a predetermined maximum evaluation criterion. The comparison block 146 can further be configured to generate a stopping point for the learning model if the value of the gain coefficient is greater than or equal to the maximum evaluation criterion.

In certain embodiments, during an iteration, the data learning device 14 can be configured to stop applying the learning model at the end of that iteration, in response to the detection of a stopping point. In this case, the number of iterations of the learning algorithm can then be set equal to the index of that iteration, for example.

The learning algorithm also comprises a backpropagation block 148 configured to apply a gradient backpropagation phase, the gradients being computed from the mapping function.

The backpropagation block 148 is then configured to carry out the gradient backpropagation phase for each training individual from a “return message”.

In the embodiments where the cost function is differentiable, the return message can correspond to the gradient of the cost function with respect to the learning parameter. This return message can be determined by the chain rule (the theorem for differentiating composite functions) from the following gradients:
- the gradient of the error function with respect to the behavior prediction vector;
- the gradient of the behavior prediction vector with respect to the similarity measures;
- the gradient of the similarity measures with respect to the elements of the latent training space; and
- the gradient of the elements of the latent training space with respect to the learning parameters.
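Written out, the chain rule composes these four gradients in sequence. The symbols below are illustrative assumptions, since the notation of the source is not reproduced here: $\theta$ for the learning parameters, $z_i$ for the latent element of training individual $i$, $s_{ij}$ for its similarity to neighbor $j$, $\hat{y}_i$ for its behavior prediction vector and $E$ for the error function. Following the list above, only the path through the training latent elements is shown.

```latex
\frac{\partial E}{\partial \theta}
  = \frac{\partial E}{\partial \hat{y}_i}
    \sum_{j}
      \frac{\partial \hat{y}_i}{\partial s_{ij}}\,
      \frac{\partial s_{ij}}{\partial z_i}\,
      \frac{\partial z_i}{\partial \theta}
```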

In particular, the gradient of the behavior prediction vector with respect to a similarity measure can be derived from equation (07) and defined according to the following expression (16):

(16)

In particular, if the cost function is of the ‘cross-entropy’ type, the gradient of the cost function with respect to the behavior prediction vector can be defined according to the following expression (17):

(17)
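As an illustration of such a gradient, the sketch below assumes a binary cross-entropy summed over products, with predicted purchase probabilities strictly between 0 and 1. Expression (17) is not reproduced here, so this particular form is an assumption, and the function names are hypothetical.

```python
import math


def cross_entropy(y_true, y_pred):
    """Binary cross-entropy summed over products (assumed form of the cost)."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, y_pred)
    )


def cross_entropy_grad(y_true, y_pred):
    """Gradient of the cost with respect to each predicted probability.

    d/dp [-(y log p + (1 - y) log(1 - p))] = (p - y) / (p (1 - p)).
    """
    return [(p - y) / (p * (1 - p)) for y, p in zip(y_true, y_pred)]


# Hypothetical observed behavior and predicted probabilities for three products.
y_true = [1, 0, 1]
y_pred = [0.8, 0.3, 0.6]
print(cross_entropy_grad(y_true, y_pred))
```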

Alternatively, in the embodiments where the cost function is of the MAP type, and therefore not differentiable, the return message can correspond to the gradient of the mathematical expectation of the evaluation criterion with respect to the learning parameter. This return message can be determined according to the following formula (18):

(18)

In expression (18) above, the gradient term corresponds to the gradient of the sampling element with respect to the learning parameter. In particular, this term can be determined by the chain rule from the following gradients:
- the gradient of the sampling element with respect to the similarity measures;
- the gradient of the similarity measures with respect to the elements of the latent training space; and
- the gradient of the elements of the latent training space with respect to the learning parameter.

In this case, the gradient of the sampling element with respect to a similarity measure can be derived from equation (07) and defined according to the following expression (19):

(19)
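The idea of differentiating an expectation rather than the non-differentiable criterion itself can be illustrated with the score-function (REINFORCE) estimator. The sketch below is generic and is not the sampling scheme of the source: it assumes a categorical distribution given by a softmax over hypothetical `logits`, with one hypothetical reward per outcome, and checks the Monte Carlo estimate against the closed-form gradient.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, rewards, n_samples, rng):
    """Monte Carlo estimate of d E[reward] / d logits via the score function."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        k = rng.choices(range(len(probs)), weights=probs)[0]
        for j in range(len(logits)):
            # d log p_k / d logit_j = 1[j == k] - p_j for a softmax distribution
            score = (1.0 if j == k else 0.0) - probs[j]
            grad[j] += rewards[k] * score / n_samples
    return grad


def exact_gradient(logits, rewards):
    """Closed form: d E[reward] / d logit_j = p_j (r_j - E[reward])."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


rng = random.Random(0)          # fixed seed for reproducibility
logits = [0.2, -0.1, 0.4]       # hypothetical values
rewards = [1.0, 0.0, 0.5]       # hypothetical, non-differentiable scores
estimate = reinforce_gradient(logits, rewards, 20000, rng)
exact = exact_gradient(logits, rewards)
print(estimate)
print(exact)
```

The estimator only needs the reward values, never their derivatives, which is what makes it usable with a ranking-type criterion.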

In the embodiment where the differentiable similarity function is of the Gaussian kernel type, the gradient of a similarity measure with respect to a latent training element can be derived from equation (10) and defined by the following expression (20):

(20)
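A Gaussian-kernel similarity and its gradient can be sketched as follows. Equations (10) and (20) are not reproduced here, so the kernel form exp(-||z_i - z_j||² / (2σ²)) is an assumption, with the hypothetical `sigma` standing in for the similarity parameter.

```python
import math


def gaussian_similarity(z_i, z_j, sigma):
    """Gaussian kernel between two latent elements (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gaussian_similarity_grad(z_i, z_j, sigma):
    """Gradient of the kernel with respect to z_i: -s * (z_i - z_j) / sigma^2."""
    s = gaussian_similarity(z_i, z_j, sigma)
    return [-s * (a - b) / sigma ** 2 for a, b in zip(z_i, z_j)]


# Hypothetical two-dimensional latent elements.
z_train = [0.5, -1.0]
z_neighbor = [1.5, 0.0]
print(gaussian_similarity(z_train, z_neighbor, sigma=2.0))
print(gaussian_similarity_grad(z_train, z_neighbor, sigma=2.0))
```

The gradient points from the training element toward the neighbor, so increasing the similarity pulls the two latent elements together.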

Alternatively, in the embodiment where the differentiable similarity function is of the Student-t kernel type, the gradient of a similarity measure with respect to a latent training element can be derived from equation (11) and defined by the following expression (21):

(21)
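The Student-t case can be sketched in the same way. Equations (11) and (21) are not reproduced here, so the kernel form (1 + ||z_i - z_j||² / ν)^(-(ν+1)/2) is an assumption, and the hypothetical `nu` (degrees of freedom) stands in for the similarity parameter.

```python
def student_t_similarity(z_i, z_j, nu):
    """Student-t kernel between two latent elements (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    return (1.0 + sq_dist / nu) ** (-(nu + 1.0) / 2.0)


def student_t_similarity_grad(z_i, z_j, nu):
    """Gradient with respect to z_i.

    ds/dz_i = -((nu + 1) / nu) * s * (z_i - z_j) / (1 + ||z_i - z_j||^2 / nu)
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    s = student_t_similarity(z_i, z_j, nu)
    scale = -((nu + 1.0) / nu) * s / (1.0 + sq_dist / nu)
    return [scale * (a - b) for a, b in zip(z_i, z_j)]


# Hypothetical two-dimensional latent elements.
z_train = [0.5, -1.0]
z_neighbor = [1.5, 0.0]
print(student_t_similarity(z_train, z_neighbor, nu=1.0))
print(student_t_similarity_grad(z_train, z_neighbor, nu=1.0))
```

Compared with the Gaussian kernel, the heavier tails keep a non-negligible gradient for distant neighbors.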

The backpropagation block 148 is therefore configured to determine the return messages associated with each training individual, as a function of each of the neighborhood behavior vectors and of the similarity measures.

The backpropagation block 148 is also configured to update the learning parameters at each iteration as a function of the backpropagation of the gradients.

In particular, during an iteration, the update of the learning parameters (which are denoted separately for the current iteration and for the following iteration) can be carried out by taking into account a given learning rate value and the learning gap determined from the backpropagation of the gradients. The update of the learning parameters can thus be defined according to the following equation (22):

(22)

In particular, the learning gap can be determined from the return messages, according to the following equation (23):

(23)
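The update step can be sketched as a plain gradient-descent step. Equations (22) and (23) are not reproduced here, so two assumptions are made: the learning gap is taken as the mean of the per-individual return messages, and the parameters move against it (the loss-minimization case; the sign would flip when a gain is maximized). All names are hypothetical.

```python
def learning_gap(return_messages):
    """Average the per-individual return messages, component by component.

    Assumption: a plain mean; the aggregation in equation (23) may differ.
    """
    n = len(return_messages)
    return [sum(component) / n for component in zip(*return_messages)]


def update_parameters(theta, return_messages, learning_rate):
    """One gradient-descent step: theta_next = theta - learning_rate * gap."""
    gap = learning_gap(return_messages)
    return [t - learning_rate * g for t, g in zip(theta, gap)]


# Hypothetical parameters and return messages for two training individuals.
theta = [1.0, 2.0]
messages = [[0.2, -0.4], [0.4, 0.0]]
print(update_parameters(theta, messages, learning_rate=0.5))
```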

Advantageously, the backpropagation block 148 can also be configured to update a similarity parameter at each iteration, as a function of the backpropagation of the gradients.

In embodiments, the similarity parameter is updated from the gradient of the error function with respect to the similarity parameter. For example and without limitation, this gradient can be determined from the differences between the behavior prediction vectors and the neighborhood behavior vectors, the differences between the behavior prediction vectors and the training behavior vectors, and the similarity measures.

The learning application module 16 of the recommendation determination system 10 is used to predict information. The application module 16, also called the generalization module, is configured to apply a collaborative-filtering prediction algorithm to a set of input characteristics associated with an input individual (or ‘target individual’, ‘target user’ or ‘targeted customer’) and taken from the data source 12. The application module 16 is further configured to estimate, using the learning parameter updated over the plurality of iterations of the learning model, an estimated behavior vector associated with the input individual, by applying the prediction algorithm. In other words, the application module 16 is configured to recommend one or more products to a targeted individual, based on the probabilities that the targeted individual will buy the products.

The corresponding figure schematically represents functional blocks associated with the prediction algorithm applied, according to certain embodiments of the invention.

The prediction algorithm may comprise an application initialization block 162 configured to generate a subset of neighborhood data, referred to as the application neighborhood data subset. In particular, the application neighborhood data subset is taken from the initial data set and comprises a plurality of applied neighboring individuals. Each applied neighboring individual is associated with a set of application neighborhood characteristics and an application neighborhood behavior vector. The application neighborhood data subset can be generated by applying the neighborhood function to the set of input characteristics with respect to the initial data set. The application neighborhood data subset can then be defined by the following expression (24):

(24)
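A neighborhood selection of this kind can be sketched as follows. Expression (24) is not reproduced here, so the sketch assumes that the neighborhood function keeps the individuals whose characteristics lie within a Euclidean distance threshold of the input characteristics; the record layout `(features, behavior)` and the function name are hypothetical.

```python
import math


def neighborhood(x_input, dataset, threshold):
    """Return the (features, behavior) records within `threshold` of x_input.

    Assumption: Euclidean distance on the raw characteristics.
    """
    return [
        (features, behavior)
        for features, behavior in dataset
        if math.dist(x_input, features) <= threshold
    ]


# Hypothetical initial data set: two characteristics, two products.
dataset = [
    ((0.0, 0.0), [1, 0]),
    ((1.0, 0.0), [0, 1]),
    ((5.0, 5.0), [1, 1]),
]
print(neighborhood((0.0, 0.0), dataset, threshold=1.5))
```

Passing `math.inf` as the threshold keeps every individual of the initial data set.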

Advantageously, once the model has been learned, the distance threshold value can be set aside and all the initial individuals of the initial data set can be used for the prediction associated with the input individual, in which case the application neighborhood data subset is the whole initial data set.

The prediction algorithm may also comprise an application inference block 164 configured to apply the mapping function to the set of input characteristics so as to determine a latent element, called the estimated latent element. The application inference block 164 is also configured to apply the mapping function to the sets of neighborhood characteristics so as to determine a latent space, called the application neighborhood latent space. The application neighborhood latent space is a tensor made up of latent elements.

The application inference block 164 can thus also be configured to determine the estimated behavior vector from the application neighborhood behavior vectors and from similarity measures defined between the input individual and each of the applied neighboring individuals, as described in equation (10) or (11). The estimated behavior vector can thus be defined by the following expression (25):

(25)

The estimated behavior vector comprises one element per product. Each element corresponds to a probability that the targeted individual will buy the associated product. The system 10 can thus be configured to sort and/or select and/or determine one or more elements so as to determine one or more products to recommend to the targeted individual. For example and without limitation, the system 10 can be configured to select the element closest to 1 and to determine the associated product in order to recommend it to the targeted individual.
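The prediction and selection steps can be sketched as follows. Expression (25) is not reproduced here, so the sketch assumes that the estimated behavior vector is a similarity-weighted average of the neighbors' behavior vectors; the function names and the `top_k` parameter are hypothetical.

```python
def predict_behavior(similarities, neighbor_behaviors):
    """Similarity-weighted average of the neighbors' behavior vectors.

    Assumption: normalized weighted mean; expression (25) may differ.
    """
    total = sum(similarities)
    n_products = len(neighbor_behaviors[0])
    return [
        sum(s * y[k] for s, y in zip(similarities, neighbor_behaviors)) / total
        for k in range(n_products)
    ]


def recommend(y_hat, top_k=1):
    """Return the indices of the top_k products with the highest probability."""
    ranked = sorted(range(len(y_hat)), key=lambda k: y_hat[k], reverse=True)
    return ranked[:top_k]


# Hypothetical similarities to three neighbors and their purchase vectors.
similarities = [0.5, 0.3, 0.2]
neighbor_behaviors = [[1, 0, 1], [0, 1, 1], [0, 0, 1]]
y_hat = predict_behavior(similarities, neighbor_behaviors)
print(y_hat)
print(recommend(y_hat, top_k=2))
```

Because the elements are probabilities, selecting the element closest to 1 amounts to taking the largest one. In practice, products already purchased by the targeted individual would probably be filtered out before ranking.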

In particular, the system 10 can be implemented in the form of one or more computing devices or systems (hereinafter called the computer). As shown in the figure, the computer (or system) 10 may also comprise an input/output (I/O) interface module 10-i (also called HMI, for Human-Machine Interface). The module 10-i may comprise a display device providing such a human-machine interface and data entry means for entering data into the graphical interface (for example a pointing device, voice-control microphones, an alphanumeric keyboard, a video screen, a touch screen, numeric keypads, push buttons, control buttons, etc.).

The computer 10 may also be operably coupled to one or more external resources 20 via a network 30. The external resources 20 may include, without being limited to, servers, databases, mass storage devices, peripheral devices, cloud-based network services, or any other suitable computing resource that can be used by the computer 10.

As shown in the figure, the system 10 for determining recommendations by collaborative filtering may comprise a module 18 for exploiting the results of the learning and prediction algorithms.

In embodiments, the exploitation module 18 can be configured to generate a representation of a given latent space, for example the latent training space, as a function of a given dimensionality value, denoted K and corresponding to the dimensionality of the latent space.

For example and without limitation, the dimensionality value can be a display dimensionality value defined in order to render (or display) the representation on a graphical interface. The display dimensionality value can, for example, be set to 2 by a user of the system 10.

Figure 4 illustrates a representation (projection or projection map) of a latent training space in a plane (for K=2). In the example shown in Figure 4, the latent training space results from the mapping function applied during one of the iterations of the learning algorithm to data taken from purchase histories of products (for example banking products) by individuals. Generating the representation on the graphical interface can include determining 2 hyper-parameters, which represent the two axes of the figure.

According to embodiments, producing the visual representation may also comprise applying different colors to the display points according to different data from the sets of characteristics associated with the individuals. For example and without limitation, a visual representation can be produced so as to observe the graphical grouping of certain data for so-called “clustering” applications. This is the case in the example shown in the figure, where the display points are colored according to the product consumption profiles of the individuals, that is to say the combinations of products that the individuals have purchased in the past (i.e. in gray, the individuals who have purchased: product P1 in panel (a), products P3 and P22 in panel (b), and products P1, P3 and P22 in panel (c)). A representation can also be generated so as to render one or more data groupings on the graphical interface, which makes it possible to detect comparable consumption profiles. In this case, the coloring makes it possible to profile individuals, for example by observing the distribution of socio-economic characteristics within a grouping, so as to determine a distribution law of the different combinations of purchased products as a function of certain multivariate characteristics. The display points can also be colored according to a characteristic associated with the individuals, for example with ‘new customers’ in red in panel (d). Comparing the visual representations (i.e. panels (c) and (d), which are almost identical) makes it possible in this case to detect readily that ‘new customers’ exclusively buy products P1, P3 and P22.

Observing the distribution of socio-economic characteristics within the two groupings ‘new customers’ and ‘products P1, P3 and P22’ then makes it possible to determine a distribution law of multivariate characteristics common to individuals as a function of combinations of purchased products.

According to embodiments, producing the visual representation may also comprise using a data clustering algorithm of the DBSCAN type (Density-Based Spatial Clustering of Applications with Noise) or of the HDBSCAN type (Hierarchical Density-Based Spatial Clustering of Applications with Noise). This type of algorithm can be used for “clustering” applications.

In certain embodiments, the exploitation module 18 can be configured to evaluate the performance of the data learning, and in particular to compute the performance in terms of number of iterations, speed of application of the learning algorithm, or conformity of the results with respect to a test data set (that is to say, applying the prediction algorithm and verifying that minimum loss coefficient values or maximum gain coefficient values are obtained).

The corresponding figure is a flowchart describing a data learning method 500, implemented by a data learning device 14 using collaborative filtering, according to embodiments of the invention.

In step 510, the initial data set comprising the initial individuals is received. In step 520, the parameters for carrying out the iterations of a learning algorithm (or learning function) are received. These parameters may include, for example, a number of iterations, the parameters and elements of the mapping function, the distance threshold value, or the learning rate.

In step 532, the training individuals are sampled from the initial data set to generate the training data set. In step 534, the neighboring individuals are drawn from the initial data set on the basis of distance measurements between training individuals and initial individuals, to generate the neighborhood data set.

In step 550, the mapping function is applied in parallel to the sets of characteristics of the training individuals and of the neighboring individuals to generate, respectively, the latent training space and the latent neighborhood space.

In step 570, the behavior prediction vectors associated with the training individuals are determined, in particular from the neighborhood behavior vectors and from similarity measures defined between a training individual and each of the neighboring individuals. Each similarity measure is determined beforehand, in step 560, as a function of a similarity parameter and of the training and neighborhood latent spaces.
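Steps 560 and 570 can be sketched as a similarity-weighted average of the neighbors' behavior vectors. The claims name Gaussian and Student-t kernels as possible differentiable similarity functions. The exact kernel expressions, the role given to the similarity parameter (`sigma` here) and the normalization by the sum of similarities are assumptions of this sketch.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_kernel(z, z_n, sigma):
    """Gaussian-type similarity (assumed form)."""
    return math.exp(-sq_dist(z, z_n) / (2.0 * sigma ** 2))


def student_t_kernel(z, z_n, sigma):
    """Student-t-type similarity with one degree of freedom (assumed form)."""
    return 1.0 / (1.0 + sq_dist(z, z_n) / sigma ** 2)


def predict_behavior(z, latent_neigh, behavior_neigh, sigma, kernel=gaussian_kernel):
    """Step 560: similarities between z and each neighbor.
    Step 570: per-product probability as the similarity-weighted mean
    of the neighbors' behavior vectors."""
    sims = [kernel(z, z_n, sigma) for z_n in latent_neigh]
    total = sum(sims)
    n_products = len(behavior_neigh[0])
    return [
        sum(s * y[k] for s, y in zip(sims, behavior_neigh)) / total
        for k in range(n_products)
    ]


# Toy case: two neighbors at equal distance with opposite behaviors.
probs = predict_behavior(
    [0.0, 0.0],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1, 0], [0, 1]],
    sigma=1.0,
)
```

With binary behavior vectors, each output lies between 0 and 1, so it can be read as a probability that the individual is associated with the product.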

In step 580, a "return message" of the gradient backpropagation phase is defined for each training individual. In particular, this message depends on the gradient of the similarity measures with respect to the elements of the training latent space.

In step 590, the gradient backpropagation phase is applied and the learning parameter is updated.
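To make steps 580 and 590 concrete, the sketch below computes a "return message" as the gradient of a loss with respect to a training individual's latent vector, chaining through the probabilities and the similarity measures, and then applies one gradient step to the weights of a linear mapping. It assumes a Gaussian-type kernel, a per-product binary cross-entropy loss, a linear mapping, and neighbor latent vectors held constant during the step. These are simplifications chosen for illustration, and all function names are hypothetical.

```python
import math


def forward(z, latent_neigh, behavior_neigh, sigma):
    """Similarities (Gaussian-type, assumed) and similarity-weighted probabilities."""
    sims = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(z, z_n)) / (2.0 * sigma ** 2))
        for z_n in latent_neigh
    ]
    total = sum(sims)
    n_products = len(behavior_neigh[0])
    probs = [
        sum(s * y[k] for s, y in zip(sims, behavior_neigh)) / total
        for k in range(n_products)
    ]
    return sims, total, probs


def cross_entropy(probs, target, eps=1e-12):
    """Binary cross-entropy summed over products (assumed loss)."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, target)
    )


def return_message(z, latent_neigh, behavior_neigh, target, sigma, eps=1e-12):
    """Step 580 (sketch): dL/dz, obtained by chaining dL/dp, dp/ds and ds/dz."""
    sims, total, probs = forward(z, latent_neigh, behavior_neigh, sigma)
    dl_dp = [-t / (p + eps) + (1 - t) / (1 - p + eps) for p, t in zip(probs, target)]
    grad_z = [0.0] * len(z)
    for s, z_n, y in zip(sims, latent_neigh, behavior_neigh):
        # dp_k/ds_j = (y_jk - p_k) / total
        dl_ds = sum(g * (y_k - p) for g, y_k, p in zip(dl_dp, y, probs)) / total
        for d in range(len(z)):
            # ds_j/dz_d = -s_j * (z_d - z_n_d) / sigma^2
            grad_z[d] += dl_ds * (-s * (z[d] - z_n[d]) / sigma ** 2)
    return grad_z


def update_weights(weights, grad_z, x, lr):
    """Step 590 (sketch): for z = W x, dL/dW[r][c] = grad_z[r] * x[c]."""
    return [[w - lr * g * xi for w, xi in zip(row, x)] for row, g in zip(weights, grad_z)]
```

A full implementation would also propagate the gradient through the neighbors' latent vectors, since they are produced by the same mapping function; that path is omitted here to keep the chain rule readable.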

The figure is a flowchart describing a method 600 for determining data recommendations by collaborative filtering, according to embodiments of the invention. The method can be implemented by the system 10.

In step 500, a plurality of learning functions is applied to the initial data set (the step implementing the data learning function).

In step 610, the set of characteristics of a target individual is received.

In step 630, the applied neighboring individuals are sampled from the initial data set, based on distance measures between the target individual and the initial individuals, to generate the application neighborhood data set.

In step 650, the mapping function is applied in parallel to the sets of characteristics of the target individual and of the applied neighboring individuals to generate, respectively, the estimated latent element and the application neighborhood latent space.

In step 670, the estimated behavior vector associated with the target individual is determined, in particular from the application neighborhood behavior vectors and from similarity measures defined between the input individual (that is, the target individual) and each of the applied neighboring individuals. Each similarity measure is determined beforehand, in step 660, as a function of a similarity parameter, of the estimated latent element and of the application neighborhood latent space.

In step 680, one or more elements of the estimated behavior vector are sorted and/or selected and/or determined, which yields, in step 690, one or more products to recommend to the targeted individual.
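Steps 680 and 690 amount to ranking the products by their estimated probability and keeping the best ones. The sketch below shows one possible selection rule, a top-k cut combined with an optional minimum probability. The parameters `k` and `min_prob`, the product identifiers and the function name `recommend` are illustrative assumptions.

```python
def recommend(probabilities, product_ids, k=3, min_prob=0.0):
    """Sort products by decreasing estimated probability, drop those below
    `min_prob`, and return at most `k` product identifiers."""
    ranked = sorted(
        zip(product_ids, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [pid for pid, p in ranked if p >= min_prob][:k]


# Hypothetical estimated behavior vector for four products.
estimated = [0.1, 0.9, 0.5, 0.7]
products = ["a", "b", "c", "d"]
top_two = recommend(estimated, products, k=2)
```

Because Python's `sorted` is stable, products with equal probabilities keep their original relative order, which makes the output deterministic.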

Those skilled in the art will readily understand that certain steps can be carried out simultaneously and/or in a different order, for example in an order defined by the recommendation determination system 10 and/or the data learning device 14.

The invention can be implemented in numerous applications and in various technical fields. The recommendation determination system 10 and/or the data learning device 14 and the methods according to the embodiments of the invention can be used, for example, in industrial systems (such as cybersecurity systems or manufacturing processes) or military systems, in the fields of advertising and audiovisual media, in the banking or tax field, and in devices for imaging, medical diagnosis or medical diagnosis support. More generally, the invention applies to any field in which there is a need for recommendation, prediction and/or data learning.

Those skilled in the art will understand that the invention can be implemented as a computer program comprising instructions for its execution. The computer program can be recorded on a processor-readable recording medium. Reference to a computer program which, when executed, performs any of the functions described above is not limited to an application program running on a single host computer. Rather, the terms computer program and software are used here in a general sense to refer to any type of computer code (for example, application software, firmware, microcode, or any other form of computer instruction) that can be used to program one or more processors to implement aspects of the techniques described here. The computing means or resources may in particular be distributed ("cloud computing"), possibly using peer-to-peer technologies.

The software code may be executed on any suitable processor (for example, a microprocessor), processor core or set of processors, whether provided in a single computing device or distributed across several computing devices (for example, devices that may be accessible in the environment of the device). The executable code of each program enabling the programmable device to implement the processes according to the invention can be stored, for example, on the hard disk or in read-only memory. In general, the program or programs can be loaded into one of the storage means of the device before being executed. The central unit can control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, these instructions being stored on the hard disk, in the read-only memory or in the other aforementioned storage elements.

The recommendation determination system 10 and/or the data learning device 14 can be implemented on one or more distributed computing units comprising a plurality of physical cores (for example, 16 cores). The embodiments of the invention are particularly suited to such a distributed implementation and to porting onto multi-core hardware technologies, which makes it possible to obtain a rapid response to a user request whatever the size of the learning or prediction model considered. The recommendation determination system 10 and/or the data learning device 14 can be implemented in various languages, such as C++.

The invention is not limited to the embodiments described above by way of non-limiting example. It encompasses all the alternative embodiments that could be envisaged by those skilled in the art. In particular, those skilled in the art will understand that the invention is not limited to the various functional blocks, which are described by way of non-limiting examples.

Claims

System for determining product recommendations by collaborative filtering (10), comprising a data learning device (14) able to receive an initial data set comprising a plurality of initial individuals originating from a data source (12), the learning device (14) being configured to implement a plurality of iterations of a learning function, said learning function comprising:

an initialization block (142) configured to sample a plurality of training individuals from said initial data set, each training individual being associated with a set of multivariate characteristics and with a behavior vector associated with a plurality of products, the initialization block (142) being further configured to determine a plurality of neighboring individuals from said initial data set, by applying a neighborhood function to each of the sets of multivariate characteristics of said training individuals;
an inference block (144) configured to determine, for each training individual, a plurality of probabilities associated with said products and with a plurality of similarity measures, said similarity measures being determined by applying a mapping function to said training individuals and to said neighboring individuals, said mapping function being associated with learning parameters;
a backpropagation block (148) configured to update said learning parameters by applying a backpropagation phase for each training individual, said update being carried out from at least one comparison of said probabilities with said associated behavior vector;

the system (10) further comprising an application module (16) configured to determine one or more of said products to recommend to a targeted individual originating from said data source (12), from a determination of probabilities associated with said targeted individual using the learning parameters updated by the plurality of iterations of said learning function.

System (10) according to claim 1, in which said mapping function is applied so as to determine, respectively, a training latent space and a neighborhood latent space, said similarity measures being further determined as a function of differences between the training latent space and the neighborhood latent space, in which said backpropagation phase is further applied from the calculation of the gradient of the similarity measures with respect to the training latent space, and in which said application module (16) is configured to apply:

said neighborhood function to said targeted individual, to generate a plurality of applied neighboring individuals originating from said initial data set;
said mapping function to said targeted individual and to said applied neighboring individuals, to determine, respectively, an estimated latent element and an application neighborhood latent space;

said probabilities being estimated from similarity measures determined as a function of differences between the estimated latent element and the application neighborhood latent space.

System (10) according to one of the preceding claims, in which the neighborhood function comprises, for each training individual:

a determination of distance measures between said training individual and each of said initial individuals;
a comparison of each distance measure with a predetermined distance threshold value;

the learning device (14) being configured to generate neighborhood data subsets from said comparison such that ;
said determination of said plurality of neighboring individuals being carried out from an intersection between the neighborhood data subsets.

System (10) according to one of the preceding claims, in which said similarity measures are determined from a differentiable function of the Gaussian kernel type, or from a differentiable function of the Student-t kernel type.

System (10) according to one of the preceding claims, in which said learning function further comprises a comparison block (146) configured to determine and evaluate evaluation criteria from said probabilities and said training behavior vectors, and in which said backpropagation phase is further applied from the calculation of the gradient of said probabilities with respect to said similarity measures.

System (10) according to claim 5, in which said evaluation criteria are determined from a cost function of the cross-entropy type, said data learning device (14) being configured to determine a loss coefficient from said evaluation criteria, to evaluate the value of the loss coefficient against a predetermined minimum evaluation criterion, and to generate a stopping point of said learning function if the value of the loss coefficient is less than or equal to the minimum evaluation criterion, said data learning device (14) then being configured to stop the iterations of said learning function in response to the detection of the stopping point.

System (10) according to one of claims 1 to 4, in which the inference block (144) is further configured to sample, for each training individual, a plurality of elements among said probabilities, in which said backpropagation phase is further applied from the calculation of the gradient of said sampled elements with respect to said similarity measures, and in which said evaluation criteria are determined from an error function of the MAP type, said data learning device (14) being configured to determine a gain coefficient from said evaluation criteria, to evaluate the value of the gain coefficient against a predetermined maximum evaluation criterion, and to generate a stopping point of said learning function if the value of the gain coefficient is greater than or equal to the maximum evaluation criterion, said data learning device (14) then being configured to stop the iterations of said learning function in response to the detection of the stopping point.

System (10) according to claim 6, in which said similarity measures are further determined as a function of a similarity parameter, and in which the data learning device (14) is configured to update said similarity parameter at each iteration from a gradient of the error function with respect to said similarity parameter.

System (10) according to one of the preceding claims, in which the mapping function comprises an implementation and a use of a convolutional neural network built from blocks of the one-dimensional convolution type.

Method for determining product recommendations by collaborative filtering (10), implemented by a recommendation determination system (10), said system (10) being able to process an initial data set comprising a plurality of initial individuals originating from a data source (12), characterized in that the method comprises a plurality of iterations of supervised data learning steps consisting of:

sampling a plurality of training individuals from an initial data set, each training individual being associated with a set of multivariate characteristics and with a behavior vector associated with a plurality of products,
determining a plurality of neighboring individuals from said initial data set, from a neighborhood function applied to each of the sets of multivariate characteristics of said training individuals;
determining, for each training individual, a plurality of probabilities associated with said products and with similarity measures, said similarity measures being generated by applying a mapping function to said training individuals and to said neighboring individuals, said mapping function being associated with learning parameters;
updating said learning parameters by applying a backpropagation phase, for each training individual, from at least one comparison of said probabilities with said associated behavior vector;

the method further comprising a step consisting of determining one or more of said products to recommend to a targeted individual originating from said data source (12), from a determination of probabilities associated with said targeted individual using the learning parameters updated by the plurality of iterations of learning steps.