CN113221017B - Rough arrangement method and device and storage medium - Google Patents

Info

Publication number
CN113221017B
Authority
CN
China
Prior art keywords
model
loss
student
teacher
feature data
Prior art date
Legal status
Active
Application number
CN202110770191.XA
Other languages
Chinese (zh)
Other versions
CN113221017A (en)
Inventor
关乔
单厚智
张耀荣
孙志鹏
喻明鹤
王鹏宇
张瑞
Current Assignee
Zhizhe Sihai Beijing Technology Co Ltd
Original Assignee
Zhizhe Sihai Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhizhe Sihai Beijing Technology Co Ltd filed Critical Zhizhe Sihai Beijing Technology Co Ltd
Priority to CN202110770191.XA priority Critical patent/CN113221017B/en
Publication of CN113221017A publication Critical patent/CN113221017A/en
Application granted granted Critical
Publication of CN113221017B publication Critical patent/CN113221017B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a coarse ranking method, apparatus, and storage medium. The method includes: obtaining user-side feature data based on an online request, inputting the user-side feature data into a two-tower coarse ranking model, and outputting an item embedding vector ranking result corresponding to the online request, wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model. This can greatly improve the effect of the two-tower coarse ranking model without increasing online resource consumption or latency, while improving the overall effect of the recommendation system.

Description

Rough arrangement method and device and storage medium
Technical Field
The present application relates to the field of intelligent recommendation technologies, and in particular to a coarse ranking method and apparatus and a storage medium.
Background
An industrial recommendation system typically includes four stages: recall, coarse ranking, fine ranking, and re-ranking. Each stage acts like a funnel, sifting out the items of greater interest to the user from a large collection of items. The coarse ranking model mainly filters the recalled results, reducing the computational pressure on the fine ranking model while preserving effectiveness as much as possible.
The current mainstream coarse ranking model in industry is a deep learning model represented by the two-tower model. The two-tower model has strong expressive capability with good online resource overhead and performance, so it is increasingly widely used. However, the existing two-tower coarse ranking model still has the following defects:
1. In the two-tower coarse ranking model, for online performance reasons, the user-side features and item-side features are completely separated. Cross features cannot be used, and the model cannot cross user-side features with item-side features, which reduces the effect of the model.
2. Although both are deep neural network models, the discrepancy (diff) between the estimated scores of the two-tower coarse ranking model and the downstream fine ranking model (generally a multilayer perceptron (MLP) or related structure) is large, which hurts the overall effect of the recommendation system.
Disclosure of Invention
The embodiments of the present application provide a coarse ranking method, apparatus, and storage medium to address two defects of the prior art: the two-tower coarse ranking model cannot use cross features, which reduces the effect of the model, and the large discrepancy (diff) between the estimated scores of the two-tower coarse ranking model and the fine ranking model reduces the overall effect of the recommendation system. The method can greatly improve the effect of the two-tower coarse ranking model without increasing online resource consumption or latency, while improving the overall effect of the recommendation system.
In a first aspect, an embodiment of the present application provides a coarse ranking method, including:
acquiring user-side feature data based on an online request;
inputting the user-side feature data into a two-tower coarse ranking model, and outputting an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model.
Optionally, according to the coarse ranking method of an embodiment of the present application, the step of training with the fine ranking model as the teacher network and the two-tower coarse ranking model as the student network includes:
training the teacher model and the student model simultaneously based on a preset training sample set, and updating parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
wherein each training sample in the training sample set includes shared feature data and unique feature data; the shared feature data is feature data used in training by both the teacher model and the student model, and the unique feature data is feature data used in training only by the teacher model; the weight parameters of the shared feature data are shared between the teacher model and the student model, and the unique feature data of the teacher model's training is the cross features of the user-side feature data and the item-side feature data.
Optionally, according to the coarse ranking method of an embodiment of the present application, updating the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss includes:
updating the feature embedding parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
meanwhile, when the teacher model has completed training, updating the network layer parameters of the teacher model based on the loss of the teacher model, and updating the network layer parameters of the student model based on the loss of the student model and the distillation loss; otherwise, updating the network layer parameters of the teacher model and the student model based only on the loss of the teacher model and the loss of the student model, respectively.
Optionally, according to the coarse ranking method of an embodiment of the present application, the loss of the teacher model and the loss of the student model are the classification cross-entropy losses of the teacher model and the student model, respectively, and the distillation loss is the mean square error between the logits outputs of the teacher model and the student model.
Optionally, according to the coarse ranking method of an embodiment of the present application, inputting the user-side feature data into the two-tower coarse ranking model and outputting the item embedding vector ranking result corresponding to the online request includes:
determining a corresponding user embedding vector based on the user-side feature data;
determining corresponding candidate item embedding vectors based on the online request;
and performing a similarity calculation between the user embedding vector and the candidate item embedding vectors, and outputting the item embedding vector ranking result corresponding to the online request based on the similarity calculation result.
Optionally, according to the coarse ranking method of an embodiment of the present application, determining the corresponding candidate item embedding vectors based on the online request includes:
determining the corresponding candidate item embedding vectors based on the candidate-set item_ids of the online request and the version information of the two-tower coarse ranking model;
wherein the version information of the two-tower coarse ranking model is determined after the two-tower coarse ranking model completes training, and the candidate item embedding vectors are computed in advance by the trained two-tower coarse ranking model based on item-side feature data within a preset time period.
Optionally, according to the coarse ranking method of an embodiment of the present application, the updating of the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss is performed by back propagation.
In a second aspect, an embodiment of the present application further provides a coarse ranking apparatus, including:
a user-side feature data determining unit, configured to acquire user-side feature data based on an online request;
a coarse ranking unit, configured to input the user-side feature data into a two-tower coarse ranking model and output an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a program or instructions stored on the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the coarse ranking method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a processor-readable storage medium, on which a program or instructions are stored, and when executed by a processor, the program or instructions implement the steps of the coarse ranking method according to the first aspect.
With the coarse ranking method, apparatus, and storage medium of the embodiments of the present application, user-side feature data are obtained based on an online request and input into the two-tower coarse ranking model, and an item embedding vector ranking result corresponding to the online request is output. Because the two-tower coarse ranking model is obtained by joint training with the fine ranking model as the teacher model and the two-tower coarse ranking model as the student model, the effect of the two-tower coarse ranking model can be greatly improved without increasing online resource consumption or latency, while the overall effect of the recommendation system is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow chart of a coarse ranking method provided in an embodiment of the present application;
fig. 2 is a schematic diagram of an overall architecture of a teacher-student network provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a coarse ranking apparatus provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the embodiments of the present application, the term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application provide a coarse ranking method, apparatus, and storage medium to address two defects of the prior art: the two-tower coarse ranking model cannot use cross features, which reduces the effect of the model, and the large discrepancy between the estimated scores of the two-tower coarse ranking model and the fine ranking model reduces the overall effect of the recommendation system. The method can greatly improve the effect of the two-tower coarse ranking model and improve the overall effect of the recommendation system without increasing online resource consumption or latency.
The method and the apparatus are based on the same inventive concept. Because the principles by which they solve the problem are similar, the implementations of the apparatus and the method may refer to each other, and repeated descriptions are omitted.
Fig. 1 is a schematic flow chart of a coarse ranking method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
Step 101: obtaining user-side feature data based on an online request.
Step 102: inputting the user-side feature data into a two-tower coarse ranking model and outputting an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model.
Specifically, the workflow of the existing mainstream two-tower coarse ranking model includes an offline training stage and an online inference stage. In the offline training stage, the model is trained on training data. In the online inference stage, after the feature values related to the user are obtained, they are input into the model to produce a user embedding vector; the item embedding vector corresponding to each candidate-set item_id is then obtained by accessing redis with the item_id, and the similarity between the two is computed for ranking, as sketched below.
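To make this serving flow concrete, the following Python sketch mirrors the path just described. The helper names (user_tower, redis_client) and the inner-product similarity are assumptions for illustration only, not the patent's actual implementation.

```python
# Minimal sketch of the two-tower serving flow; names are illustrative.
import numpy as np

def serve_request(user_features: np.ndarray,
                  candidate_item_ids: list,
                  user_tower,            # trained user-side network (assumed)
                  redis_client) -> list:
    """Rank candidate items for one online request."""
    # 1. Real-time forward pass of the user tower.
    user_emb = user_tower(user_features)          # shape: (d,)

    # 2. Item embeddings were precomputed offline and cached in redis.
    item_embs = np.stack([
        np.frombuffer(redis_client.get(item_id), dtype=np.float32)
        for item_id in candidate_item_ids
    ])                                            # shape: (n, d)

    # 3. Similarity = inner product; sort candidates by score, descending.
    scores = item_embs @ user_emb
    order = np.argsort(-scores)
    return [candidate_item_ids[i] for i in order]
```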
The online inference stage of the embodiment of the present application is exactly the same as that of the original two-tower coarse ranking model: for an online request, the feature data that the user side of the two-tower coarse ranking model depends on (i.e., the user-side feature data) are first obtained; the user-side feature data are then input into the two-tower coarse ranking model, and the item embedding vector ranking result corresponding to the online request is output. With this design, the online inference stage of the coarse ranking method adds no extra resource consumption or latency. Meanwhile, the two-tower coarse ranking model of the embodiment is obtained by joint training with the fine ranking model as the teacher model and the two-tower coarse ranking model as the student model, which improves the effect of the two-tower coarse ranking model, alleviates the large discrepancy between the estimated scores of the coarse ranking model and the downstream fine ranking model, and improves the overall effect of the recommendation system.
The coarse ranking method provided by the embodiment of the present application obtains user-side feature data based on an online request, inputs the user-side feature data into the two-tower coarse ranking model, and outputs the item embedding vector ranking result corresponding to the online request. Because the two-tower coarse ranking model is obtained by joint training with the fine ranking model as the teacher model and the two-tower coarse ranking model as the student model, the effect of the two-tower coarse ranking model can be greatly improved without increasing online resource consumption or latency, while the overall effect of the recommendation system is improved.
Based on the above embodiment, the step of training with the fine ranking model as the teacher network and the two-tower coarse ranking model as the student network includes:
training the teacher model and the student model simultaneously based on a preset training sample set, and updating parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
wherein each training sample in the training sample set includes shared feature data and unique feature data; the shared feature data is feature data used in training by both the teacher model and the student model, and the unique feature data is feature data used in training only by the teacher model; the weight parameters of the shared feature data are shared between the teacher model and the student model, and the unique feature data of the teacher model's training is the cross features of the user-side feature data and the item-side feature data.
Specifically, fig. 2 is a schematic diagram of the overall architecture of the teacher-student network provided in the embodiment of the present application. The teacher network on the left is the MLP network commonly used in standard fine ranking models, and the student network on the right is the two-tower network commonly used in coarse ranking models. Each training sample includes shared feature data and unique feature data. The shared features are features that both the teacher model and the student model can use; in addition, the teacher model can use some unique features, mainly cross features that the student model cannot use. For example, if the user has a topic-list feature and the item also has a topic-list feature, a simple cross feature can indicate whether the user's topics and the item's topics match, as sketched below. For the two-tower model, because the online user embedding vector and item embedding vector are combined directly by inner product, offline training cannot use cross features directly at all; otherwise the offline and online behavior would be inconsistent.
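A minimal sketch of the topic-match cross feature mentioned above follows; the function and field names are illustrative assumptions:

```python
# Illustrative cross feature: do the user's topics overlap the item's?
def topic_match_feature(user_topics: set, item_topics: set) -> float:
    """Usable by the teacher (MLP), which sees user and item features
    jointly; unusable by the two-tower student, whose towers never see
    each other's raw features."""
    return 1.0 if user_topics & item_topics else 0.0
```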
Using cross features allows the teacher model to be trained better. Meanwhile, the embodiment of the present application introduces the idea of knowledge distillation into the training of the two-tower coarse ranking model: the fine ranking model "teaches" the coarse ranking model so that the student model (i.e., the two-tower coarse ranking model) learns better, improving the effect of the two-tower coarse ranking model.
In the offline training stage, the teacher network and the student network are trained simultaneously, and the weight parameters of the features common to the teacher model and the student model (i.e., the shared features) are shared, so that the feature embedding weight parameters of the student model are trained better with the help of the teacher model, as illustrated by the sketch below.
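The following PyTorch sketch illustrates this weight sharing under stated assumptions (the table size, layer shapes, and inner-product scoring are invented for the example): both networks look up the same embedding table, so gradients from the teacher also train the embeddings the student uses.

```python
import torch
import torch.nn as nn

shared_emb = nn.Embedding(100_000, 32)   # one table for shared features

teacher_head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
user_tower = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
item_tower = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))

user_ids = torch.randint(0, 100_000, (8,))
item_ids = torch.randint(0, 100_000, (8,))
u, i = shared_emb(user_ids), shared_emb(item_ids)   # same table, both models

# Teacher (MLP) sees user and item features jointly (and could also take
# cross features); the student scores by inner product of its two towers.
teacher_logits = teacher_head(torch.cat([u, i], dim=-1))
student_logits = (user_tower(u) * item_tower(i)).sum(-1, keepdim=True)
```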
Because the teacher model performs better than the student model, the embodiment of the present application uses the output of the teacher model to assist the training of the student model by additionally computing the mean square error between the logits outputs of the teacher model and the student model; this loss is recorded as the distillation loss. Besides the distillation loss, the loss function of each of the teacher and student models remains its original classification cross-entropy loss. The loss of the entire teacher-student network (i.e., the total loss) is calculated as follows (the loss of the original two-tower model contains only the student loss term):
Total loss = Teacher loss + Student loss + distil_weight * Distil loss
Distil loss = F(logits_t, logits_s)
Here, Teacher loss and Student loss are the original classification cross-entropy losses of the teacher model and the student model, respectively; Distil loss is the distillation loss; and distil_weight is a hyperparameter that adjusts the proportion of the distillation loss in the total loss. When distil_weight is 0, the model degenerates into something like a multi-task model in which the teacher model and the student model are jointly trained only through the shared bottom-layer feature weight parameters; experiments show that even a student model trained only in this weight-sharing manner performs better than a completely independently trained student model. F() is the mean square error formula, and logits_t and logits_s are the logits outputs of the teacher model and the student model, respectively. The parameters of the teacher model and the student model are updated based on the loss of the teacher model, the loss of the student model, and the distillation loss, yielding the trained student model (i.e., the two-tower coarse ranking model).
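A minimal PyTorch sketch of this total-loss computation follows. The patent specifies classification cross-entropy terms and a mean-square-error distillation term; writing the cross-entropy as binary cross-entropy over click labels is an assumption, as are the function interfaces:

```python
import torch
import torch.nn.functional as F

def total_loss(logits_t: torch.Tensor,   # teacher logits output
               logits_s: torch.Tensor,   # student logits output
               labels: torch.Tensor,     # click labels in {0, 1}
               distil_weight: float = 1.0) -> torch.Tensor:
    # Original classification cross-entropy losses of each model.
    teacher_loss = F.binary_cross_entropy_with_logits(logits_t, labels)
    student_loss = F.binary_cross_entropy_with_logits(logits_s, labels)
    # Distillation loss: mean square error of the two logits outputs.
    distil_loss = F.mse_loss(logits_s, logits_t)
    return teacher_loss + student_loss + distil_weight * distil_loss
```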
The coarse ranking method provided by the embodiment of the present application trains the teacher model and the student model simultaneously on a preset training sample set and updates the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss. The weight parameters of the feature data shared by the teacher model and the student model are shared, and the unique feature data of the teacher model's training is the cross features of the user-side and item-side feature data, so the two-tower coarse ranking model can be trained better.
Based on the above embodiment, updating the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss includes:
updating the feature embedding parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
meanwhile, when the teacher model has completed training, updating the network layer parameters of the teacher model based on the loss of the teacher model, and updating the network layer parameters of the student model based on the loss of the student model and the distillation loss; otherwise, updating the network layer parameters of the teacher model and the student model based only on the loss of the teacher model and the loss of the student model, respectively.
Specifically, the parameters of the teacher model and the student model include feature embedding parameters and network layer parameters, and the models are trained as follows:
In the initial training stage, the teacher model is not yet fully trained and its expressive capability is relatively poor, so using the teacher model to assist the training of the student model may even be counterproductive. For this reason, the embodiment of the present application provides a method for adaptively adjusting the loss.
When the global step exceeds a threshold, the teacher model is considered fully trained (i.e., training is completed); at this point the teacher model can guide the training of the student model, and the loss of the whole model becomes the sum of the loss of the teacher model, the loss of the student model, and the distillation loss. Note, however, that doing this directly would let the distillation loss affect the teacher model's network layer parameters; that is, the update of the teacher network layer parameters would be directly affected by the output of the student model, degrading the final effect of both the teacher model and the student model. To address this, the embodiment provides a method for shielding the teacher model's training from the gradient of the distillation loss: during training, the value of the teacher's logits output (with its gradient stopped) replaces the original logits when the distillation loss is computed, so the distillation loss helps train the student model without directly affecting the update of the teacher model's network layer parameters. That is, the network layer parameters of the teacher model are updated based only on the loss of the teacher model, while the network layer parameters of the student model are updated based on the loss of the student model and the distillation loss. Experiments show that this method works better and converges faster than computing the distillation loss directly from the start; a sketch of the mechanism follows.
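The following PyTorch sketch illustrates the adaptive loss with teacher-gradient shielding described above; the threshold value and the function interface are assumptions, and detach() stands in for "taking only the value of the teacher's logits output":

```python
import torch
import torch.nn.functional as F

WARMUP_STEPS = 10_000  # assumed threshold for "teacher fully trained"

def adaptive_total_loss(logits_t: torch.Tensor,
                        logits_s: torch.Tensor,
                        labels: torch.Tensor,
                        global_step: int,
                        distil_weight: float = 1.0) -> torch.Tensor:
    teacher_loss = F.binary_cross_entropy_with_logits(logits_t, labels)
    student_loss = F.binary_cross_entropy_with_logits(logits_s, labels)
    if global_step <= WARMUP_STEPS:
        # Teacher not yet fully trained: no distillation term, so each
        # network is updated only by its own cross-entropy loss.
        return teacher_loss + student_loss
    # detach() keeps only the *value* of the teacher logits, so the
    # distillation gradient flows into the student but never back into
    # the teacher's network layer parameters.
    distil_loss = F.mse_loss(logits_s, logits_t.detach())
    return teacher_loss + student_loss + distil_weight * distil_loss
```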
As for the feature embedding parameters: throughout the entire training process, the feature embedding parameters of the teacher model and the student model are always updated based on the sum of the loss of the teacher model, the loss of the student model, and the distillation loss. In this way, the feature embedding weight parameters of the student model can be trained better with the help of the teacher model.
According to the coarse ranking method provided by the embodiment of the present application, the feature embedding parameters of the teacher model and the student model are updated based on the loss of the teacher model, the loss of the student model, and the distillation loss; meanwhile, when the teacher model has completed training, the network layer parameters of the teacher model are updated based on the loss of the teacher model, and the network layer parameters of the student model are updated based on the loss of the student model and the distillation loss; otherwise, the network layer parameters of the teacher model and the student model are updated based only on their respective losses. This maximizes the training efficiency and effect of the two-tower coarse ranking model.
Based on the above embodiment, the loss of the teacher model and the loss of the student model are the classification cross-entropy losses of the teacher model and the student model, respectively, and the distillation loss is the mean square error between the logits outputs of the teacher model and the student model.
The specific contents and effects thereof have been described in the foregoing embodiments, and are not described herein again.
Based on the above embodiment, inputting the user-side feature data into the two-tower coarse ranking model and outputting the item embedding vector ranking result corresponding to the online request includes:
determining a corresponding user embedding vector based on the user-side feature data;
determining corresponding candidate item embedding vectors based on the online request;
and performing a similarity calculation between the user embedding vector and the candidate item embedding vectors, and outputting the item embedding vector ranking result corresponding to the online request based on the similarity calculation result.
Specifically, the online inference stage of the two-tower coarse ranking model in the embodiment of the present application is exactly the same as that of the traditional two-tower coarse ranking model, adding no extra resource consumption or latency.
According to the coarse ranking method provided by the embodiment of the present application, the corresponding user embedding vector is determined based on the user-side feature data, the corresponding candidate item embedding vectors are determined based on the online request, a similarity calculation is performed between the user embedding vector and the candidate item embedding vectors, and the item embedding vector ranking result corresponding to the online request is output based on the similarity calculation result, without adding extra resource consumption or latency while preserving the effect of the model.
Based on the above embodiment, determining the corresponding candidate item embedding vectors based on the online request includes:
determining the corresponding candidate item embedding vectors based on the candidate-set item_ids of the online request and the version information of the two-tower coarse ranking model;
wherein the version information of the two-tower coarse ranking model is determined after the two-tower coarse ranking model completes training, and the candidate item embedding vectors are computed in advance by the trained two-tower coarse ranking model based on item-side feature data within a preset time period.
Specifically, after the two-tower coarse ranking model is trained offline, version information for the model is generated. Meanwhile, all item-side feature data over a period of history are deduplicated by item_id, and the item embedding vector corresponding to each item_id (also carrying the version information) is computed once by the trained student model and written to redis (Remote Dictionary Server). The student model is then saved to online HDFS (Hadoop Distributed File System), and the version information of the new model is written to redis. Online servers periodically access the redis entry that stores the model version information; if there is an update, they pull the model from HDFS accordingly and update the version information of the online model. In this way, the online model and the item embedding vectors can be updated continuously, and the version information of the model and the item embedding vectors can be matched quickly for the coarse ranking process, continuously improving the effect of the two-tower coarse ranking model. The flow is sketched below.
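The offline export flow just described can be sketched as follows; the redis key format and helper names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def export_item_embeddings(item_feature_log,   # iterable of (item_id, features)
                           item_tower,         # trained item tower (assumed
                                               # to return an np.ndarray)
                           version: str,
                           redis_client):
    # Deduplicate the historical item-side feature data by item_id.
    deduped = {}
    for item_id, feats in item_feature_log:
        deduped.setdefault(item_id, feats)
    # Compute each item embedding once with the trained student model and
    # cache it under a key that carries the model version.
    for item_id, feats in deduped.items():
        emb = np.asarray(item_tower(feats), dtype=np.float32)
        redis_client.set(f"item_emb:{version}:{item_id}", emb.tobytes())
    # Publish the new version; online servers poll this key periodically
    # and pull the matching model from HDFS when it changes.
    redis_client.set("coarse_model_version", version)
```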
In the online inference stage, for an online request, the user-side feature data required by the student model are first obtained and input into the student model to compute the user embedding vector in real time; meanwhile, redis is accessed according to the item_ids of the request's candidate set and the version information of the current model to obtain the candidate-set item embedding vectors corresponding to that model version, and the similarity with the user embedding vector is then computed for ranking.
The coarse ranking method provided by the embodiment of the present application determines the corresponding candidate item embedding vectors based on the candidate-set item_ids of the online request and the version information of the two-tower coarse ranking model, where the version information is determined after the two-tower coarse ranking model completes training and the candidate item embedding vectors are computed in advance by the trained model from item-side feature data within a preset time period. This allows the online model and the item embedding vectors to be updated continuously, continuously improving the effect of the two-tower coarse ranking model.
Based on the above embodiment, the updating of the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss is performed by back propagation.
Specifically, based on the loss of the teacher model, the loss of the student model, and the distillation loss, the network layer parameters and feature embedding parameters of the models can be updated rapidly through back propagation, accelerating model convergence; a minimal sketch follows.
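For completeness, a minimal PyTorch sketch of one back-propagation update follows; the stand-in linear models, the Adam optimizer, and the binary cross-entropy losses are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(16, 1)   # stand-in for the MLP teacher
student = nn.Linear(16, 1)   # stand-in for the two-tower student
opt = torch.optim.Adam(list(teacher.parameters()) +
                       list(student.parameters()), lr=1e-3)

x = torch.randn(8, 16)
labels = torch.randint(0, 2, (8, 1)).float()
loss = (F.binary_cross_entropy_with_logits(teacher(x), labels)
        + F.binary_cross_entropy_with_logits(student(x), labels))
opt.zero_grad()
loss.backward()   # one back-propagation pass computes all gradients
opt.step()        # updates network-layer and embedding parameters
```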
According to the coarse ranking method provided by the embodiment of the present application, updating the parameters of the teacher model and the student model by back propagation based on the loss of the teacher model, the loss of the student model, and the distillation loss allows the model parameters to be updated quickly and accelerates model convergence.
Fig. 3 is a schematic structural diagram of a coarse ranking apparatus provided in an embodiment of the present application. As shown in fig. 3, the apparatus includes:
a user-side feature data determining unit 301, configured to acquire user-side feature data based on an online request;
a coarse ranking unit 302, configured to input the user-side feature data into a two-tower coarse ranking model and output an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model.
Based on the above embodiment, the apparatus further includes:
a model training unit, configured to train the teacher model and the student model simultaneously based on a preset training sample set, and update parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
wherein each training sample in the training sample set includes shared feature data and unique feature data; the shared feature data is feature data used in training by both the teacher model and the student model, and the unique feature data is feature data used in training only by the teacher model; the weight parameters of the shared feature data are shared between the teacher model and the student model, and the unique feature data of the teacher model's training is the cross features of the user-side feature data and the item-side feature data.
Based on the above embodiment, updating the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss includes:
updating the feature embedding parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
meanwhile, when the teacher model has completed training, updating the network layer parameters of the teacher model based on the loss of the teacher model, and updating the network layer parameters of the student model based on the loss of the student model and the distillation loss; otherwise, updating the network layer parameters of the teacher model and the student model based only on the loss of the teacher model and the loss of the student model, respectively.
Based on the above embodiment, the loss of the teacher model and the loss of the student model are the classification cross-entropy losses of the teacher model and the student model, respectively, and the distillation loss is the mean square error between the logits outputs of the teacher model and the student model.
Based on the above embodiment, the coarse ranking unit includes:
a user embedding vector determining subunit, configured to determine a corresponding user embedding vector based on the user-side feature data;
an item embedding vector determining subunit, configured to determine corresponding candidate item embedding vectors based on the online request;
and a ranking subunit, configured to perform a similarity calculation between the user embedding vector and the candidate item embedding vectors and output an item embedding vector ranking result corresponding to the online request based on the similarity calculation result.
Based on the above embodiment, the item embedding vector determining subunit is specifically configured to:
determine the corresponding candidate item embedding vectors based on the candidate-set item_ids of the online request and the version information of the two-tower coarse ranking model;
wherein the version information of the two-tower coarse ranking model is determined after the two-tower coarse ranking model completes training, and the candidate item embedding vectors are computed in advance by the trained two-tower coarse ranking model based on item-side feature data within a preset time period.
Based on the above embodiment, the updating of the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss is performed by back propagation.
It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Fig. 4 illustrates a schematic physical structure diagram of an electronic device. As shown in fig. 4, the electronic device may include: a processor (processor) 401, a communications interface (Communications Interface) 402, a memory (memory) 403, and a communication bus 404, wherein the processor 401, the communications interface 402, and the memory 403 communicate with each other through the communication bus 404. The processor 401 may call logic instructions in the memory 403 to perform the coarse ranking method provided by the various embodiments described above.
In addition, the logic instructions in the memory 403 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as an independent product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It should be noted that the apparatus provided in the embodiment of the present invention can implement all the method steps of the method embodiment and achieve the same technical effects; detailed descriptions of the parts and beneficial effects identical to those of the method embodiment are omitted here.
On the other hand, an embodiment of the present application further provides a processor-readable storage medium storing a computer program, where the computer program causes the processor to execute the method provided by each of the above embodiments, including:
acquiring user-side feature data based on an online request;
inputting the user-side feature data into a two-tower coarse ranking model, and outputting an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer-executable instructions. These computer-executable instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These processor-executable instructions may also be stored in a processor-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the processor-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (9)

1. A coarse ranking method, comprising:
acquiring user-side feature data based on an online request;
inputting the user-side feature data into a two-tower coarse ranking model, and outputting an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model;
the step of training with the fine ranking model as the teacher network and the two-tower coarse ranking model as the student network comprises:
training the teacher model and the student model simultaneously based on a preset training sample set, and updating parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
wherein each training sample in the training sample set includes shared feature data and unique feature data; the shared feature data is feature data used by both the teacher model and the student model, and the unique feature data is feature data used only by the teacher model; the teacher model and the student model share the weight parameters of the shared feature data, and the unique feature data of the teacher model's training is the cross features of the user-side feature data and the item-side feature data.
2. The coarse ranking method of claim 1, wherein updating the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss comprises:
updating the feature embedding parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
meanwhile, when the teacher model has completed training, updating the network layer parameters of the teacher model based on the loss of the teacher model, and updating the network layer parameters of the student model based on the loss of the student model and the distillation loss; otherwise, updating the network layer parameters of the teacher model and the student model based only on the loss of the teacher model and the loss of the student model, respectively.
3. The coarse ranking method of claim 2, wherein the loss of the teacher model and the loss of the student model are the classification cross-entropy losses of the teacher model and the student model, respectively, and the distillation loss is the mean square error between the logits outputs of the teacher model and the student model.
4. The coarse ranking method of claim 1, wherein inputting the user-side feature data into the two-tower coarse ranking model and outputting the item embedding vector ranking result corresponding to the online request comprises:
determining a corresponding user embedding vector based on the user-side feature data;
determining corresponding candidate item embedding vectors based on the online request;
and performing a similarity calculation between the user embedding vector and the candidate item embedding vectors, and outputting the item embedding vector ranking result corresponding to the online request based on the similarity calculation result.
5. The coarse ranking method of claim 4, wherein determining the corresponding candidate item embedding vectors based on the online request comprises:
determining the corresponding candidate item embedding vectors based on the candidate-set item_ids of the online request and the version information of the two-tower coarse ranking model;
wherein the version information of the two-tower coarse ranking model is determined after the two-tower coarse ranking model completes training, and the candidate item embedding vectors are computed in advance by the trained two-tower coarse ranking model based on item-side feature data within a preset time period.
6. The coarse ranking method of claim 1, wherein the updating of the parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss is performed by back propagation.
7. A coarse ranking apparatus, comprising:
a user-side feature data determining unit, configured to acquire user-side feature data based on an online request;
a coarse ranking unit, configured to input the user-side feature data into a two-tower coarse ranking model and output an item embedding vector ranking result corresponding to the online request;
wherein the two-tower coarse ranking model is obtained by joint training with a fine ranking model as the teacher model and the two-tower coarse ranking model as the student model;
the apparatus further comprises:
a model training unit, configured to train the teacher model and the student model simultaneously based on a preset training sample set, and update parameters of the teacher model and the student model based on the loss of the teacher model, the loss of the student model, and the distillation loss;
wherein each training sample in the training sample set includes shared feature data and unique feature data; the shared feature data is feature data used by both the teacher model and the student model, and the unique feature data is feature data used only by the teacher model; the teacher model and the student model share the weight parameters of the shared feature data, and the unique feature data of the teacher model's training is the cross features of the user-side feature data and the item-side feature data.
8. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the coarse ranking method of any one of claims 1 to 6.
9. A processor-readable storage medium, on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the coarse ranking method of any one of claims 1 to 6.
CN202110770191.XA 2021-07-08 2021-07-08 Rough arrangement method and device and storage medium Active CN113221017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110770191.XA CN113221017B (en) 2021-07-08 2021-07-08 Rough arrangement method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110770191.XA CN113221017B (en) 2021-07-08 2021-07-08 Rough arrangement method and device and storage medium

Publications (2)

Publication Number Publication Date
CN113221017A CN113221017A (en) 2021-08-06
CN113221017B true CN113221017B (en) 2021-10-29

Family

ID=77081147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110770191.XA Active CN113221017B (en) 2021-07-08 2021-07-08 Rough arrangement method and device and storage medium

Country Status (1)

Country Link
CN (1) CN113221017B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334743A (en) * 2019-06-10 2019-10-15 浙江大学 A kind of progressive transfer learning method based on the long memory network in short-term of convolution
CN111445008A (en) * 2020-03-24 2020-07-24 暗物智能科技(广州)有限公司 Knowledge distillation-based neural network searching method and system
CN111709497A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN112883265A (en) * 2021-02-10 2021-06-01 北京三快在线科技有限公司 Information recommendation method and device, server and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113614738A (en) * 2019-03-22 2021-11-05 国际商业机器公司 Unification of multiple models with individual target classes using distillation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334743A (en) * 2019-06-10 2019-10-15 浙江大学 A kind of progressive transfer learning method based on the long memory network in short-term of convolution
CN111445008A (en) * 2020-03-24 2020-07-24 暗物智能科技(广州)有限公司 Knowledge distillation-based neural network searching method and system
CN111709497A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN112883265A (en) * 2021-02-10 2021-06-01 北京三快在线科技有限公司 Information recommendation method and device, server and computer readable storage medium

Also Published As

Publication number Publication date
CN113221017A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN109902222B (en) Recommendation method and device
US10649794B2 (en) Aggregate features for machine learning
CN110458663B (en) Vehicle recommendation method, device, equipment and storage medium
CN113361680B (en) Neural network architecture searching method, device, equipment and medium
EP4383136A2 (en) Population based training of neural networks
CN111815432B (en) Financial service risk prediction method and device
CN104182474A (en) Method for recognizing pre-churn users
CN111494964B (en) Virtual article recommendation method, model training method, device and storage medium
WO2019001359A1 (en) Data processing method and data processing apparatus
CN111597326B (en) Method and device for generating commodity description text
CN110096617B (en) Video classification method and device, electronic equipment and computer-readable storage medium
CN111369299A (en) Method, device and equipment for identification and computer readable storage medium
CN113610552B (en) User loss prediction method and device
CN111309887A (en) Method and system for training text key content extraction model
CN106897282B (en) User group classification method and device
JP6230987B2 (en) Language model creation device, language model creation method, program, and recording medium
CN113221017B (en) Rough arrangement method and device and storage medium
CN111984842B (en) Bank customer data processing method and device
US20230308360A1 (en) Methods and systems for dynamic re-clustering of nodes in computer networks using machine learning models
CN113392867A (en) Image identification method and device, computer equipment and storage medium
CN110008880A (en) A kind of model compression method and device
CN113259158B (en) Network flow prediction method and equipment, model construction and training method and device
CN113849634B (en) Method for improving interpretability of depth model recommendation scheme
CN109299321B (en) Method and device for recommending songs
CN114092162A (en) Recommendation quality determination method, and training method and device of recommendation quality determination model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant