CN115048560A - Data processing method and related device


Info

Publication number
CN115048560A
CN115048560A
Authority
CN
China
Prior art keywords
network
tendency information
weight
recommendation
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210326504.7A
Other languages
Chinese (zh)
Inventor
赖金财
曹泽麟
董振华
徐君
何秀强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Renmin University of China
Original Assignee
Huawei Technologies Co Ltd
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd and Renmin University of China
Priority application: CN202210326504.7A
Publication: CN115048560A
Related PCT application: PCT/CN2023/084704 (published as WO2023185925A1)
Legal status: Pending

Classifications

    • G06F 16/9035 (Physics; Computing; Electric digital data processing; Information retrieval; Querying; Filtering based on additional data, e.g. user or group profiles)
    • G06F 16/909 (Information retrieval; Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using geographical or spatial information, e.g. location)
    • G06N 3/08 (Computing arrangements based on biological models; Neural networks; Learning methods)


Abstract

The application discloses a data processing method that can be applied to the field of artificial intelligence, including: acquiring an operation log, where the operation log includes first operation data of a user in a first recommendation scene; obtaining, according to the first operation data, a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively; obtaining first tendency information and second tendency information through the task networks based on the feature representations; and fusing the first tendency information and the second tendency information using the first weight and the second weight produced by the first gating network to obtain first target tendency information. The method can reduce interference among different recommendation scenes and alleviates the drop in prediction accuracy that a single-task model suffers under multi-scene joint modeling because the data of different scenes follow different distributions.

Description

Data processing method and related device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a data processing method and related apparatus.
Background
Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
In a recommendation scenario, the system records the interaction information between the user and the system, such as an operation log, and uses this information as the data source for training the core recommendation model (e.g., the ranking model) of the search system. User operation logs have the characteristics of large data volume and strong timeliness. However, user operations are often biased: users tend to operate on objects ranked near the top, so a recommendation model trained only on user operation logs cannot accurately learn the real correlation between the user's true intention and the recommended objects. An important reason for this is the presence of severe position bias in the system. Taking a search scene as an example, a user enters the search system, inputs a query word in the search box, and the system immediately feeds back the query results and presents them on the user interface. Objects displayed at different positions receive different amounts of user attention, which leads to position bias: the user tends to interact with objects at better positions in the search result list, and this tendency is unrelated to whether those objects reflect the user's true intention.
Because of position bias, the user's implicit feedback data used to train the model, namely whether an object was operated on, cannot reflect the user's true search intention. If the implicit feedback data is used directly as positive and negative training samples, the resulting recommendation model is biased, and as the model is continuously updated a Matthew effect forms, making the model more and more biased. To obtain a more accurate recommendation model, the position bias needs to be corrected during offline training so that its influence is eliminated. The inverse propensity score (IPS) technique is a commonly used position-bias correction technique: it estimates a position propensity score for each sample and inversely weights the loss function during training, so that samples with a high position propensity score receive a lower weight, achieving the position-correction effect. The position bias also differs across recommendation scenarios. For example, some recommendation scenarios actively recommend several objects the user may want to operate on at that moment, while others recommend several objects related to a search term input by the user; the position biases at different positions differ between these two kinds of scenarios.
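As a minimal illustration of the IPS idea described above (this is a sketch, not code from the application; the function name and the clipping constant are assumptions), inversely weighting a per-sample loss by an estimated position propensity score can be written as:

```python
def ips_weighted_loss(sample_losses, propensities, clip=0.1):
    """Inverse-propensity-weighted loss: samples whose position is easily
    observed (high propensity score) receive a lower weight, which corrects
    for position bias during offline training."""
    total = 0.0
    for loss, p in zip(sample_losses, propensities):
        p = max(p, clip)      # clipping is a common guard against huge weights
        total += loss / p     # inverse weighting by the propensity score
    return total / len(sample_losses)

# A sample shown at a highly visible position (propensity 1.0) counts less
# than one shown at a poorly visible position (propensity 0.25).
print(ips_weighted_loss([0.5, 0.5], [1.0, 0.25]))  # → 1.25
```

The clipping step bounds the variance of the estimator, at the cost of a small bias for rarely observed positions.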
In an existing implementation, a contextual position-based model (CPBM) is used to calculate the position bias (also referred to as tendency information), and the existing scheme trains a single CPBM on the operation data of multiple scenes. However, under multi-scene joint modeling, because the position biases at different positions differ across recommendation scenes, the prediction accuracy of the tendency information decreases.
Disclosure of Invention
In a first aspect, the present application provides a data processing method, including: obtaining an operation log, where the operation log includes first operation data of a user in a first recommendation scene, and the first operation data includes operation data of the user when the same recommendation object, or a recommendation object with similarity higher than a threshold, is located at different recommendation positions in the first recommendation scene; obtaining, according to the first operation data, a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively; obtaining, according to the first feature representation, first tendency information through a first task network, where the first tendency information represents the influence of a recommendation position in a recommendation scene on the operation behavior of the user; obtaining, according to the second feature representation, second tendency information through a second task network, where the second tendency information represents the influence of a recommendation position in a recommendation scene on the operation behavior of the user; obtaining, according to the first operation data, a first weight for the first tendency information and a second weight for the second tendency information through a first gating network; and fusing, according to the first weight and the second weight, the first tendency information and the second tendency information to obtain first target tendency information, where the first target tendency information represents the influence of a recommendation position in the first recommendation scene on the operation behavior of the user, and the first target tendency information is used for training a recommendation model.
In this embodiment of the present application, the first gating network can obtain a set of weights for the first recommendation scene, and the outputs of the multiple task networks are fused based on these weights to obtain the position bias (first target tendency information) of the first recommendation scene. Through numerical control of the weights, the first gating network can identify and fuse the information related to the first recommendation scene in the output of each task network. On one hand, the relevance among different recommendation scenes can be learned; on the other hand, the dynamic weighting reduces interference among different recommendation scenes, which alleviates the drop in prediction accuracy that a single-task model suffers under multi-scene joint modeling because the data of different scenes follow different distributions.
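The weight-and-fuse step can be sketched as follows; this is a minimal illustration under the assumption that the first gating network produces one logit per task network from scene-conditioned inputs and that the fusion is a weighted summation (the function names are hypothetical):

```python
import math

def softmax(logits):
    """Normalize gate logits into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_tendency(task_outputs, gate_logits):
    """Fuse the tendency information emitted by each task network using
    the weights produced by the gating network for the current scene."""
    weights = softmax(gate_logits)   # e.g. the first weight and second weight
    return sum(w * t for w, t in zip(weights, task_outputs))

# Two task networks emit tendency scores 0.8 and 0.2; the gate, conditioned
# on the first scene's operation data, leans toward the first task network.
target_tendency = fuse_tendency([0.8, 0.2], [2.0, 0.0])
```

Because the weights are produced per scene, each scene extracts a different mixture of the shared task networks' outputs, which is what limits cross-scene interference.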
In a possible implementation, the obtaining, according to the first operation data, a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively, includes: obtaining, according to the first operation data, a first initial feature representation and a second initial feature representation through the first feature extraction network and the second feature extraction network, respectively; obtaining, according to the first operation data, a third weight for the first initial feature representation and a fourth weight for the second initial feature representation through a second gating network; fusing the first initial feature representation and the second initial feature representation according to the third weight and the fourth weight to obtain the first feature representation; obtaining, according to the first operation data, a fifth weight for the first initial feature representation and a sixth weight for the second initial feature representation through a third gating network; and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
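Under the assumption that the second and third gating networks each emit one weight per initial feature representation and that the fusion is again a weighted summation, the per-task feature fusion can be sketched as (variable names are hypothetical):

```python
def fuse_feature_representations(initial_feats, gate_weights):
    """Weighted element-wise sum of the initial feature representations
    produced by the shared feature extraction networks."""
    fused = [0.0] * len(initial_feats[0])
    for w, feat in zip(gate_weights, initial_feats):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused

# Two feature extraction networks emit 3-dimensional initial representations.
first_initial, second_initial = [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]
# The second gating network favors the first network for the first task network;
# the third gating network favors the second network for the second task network.
first_feature = fuse_feature_representations([first_initial, second_initial], [0.7, 0.3])
second_feature = fuse_feature_representations([first_initial, second_initial], [0.2, 0.8])
```

Each task network thus receives its own mixture of the shared extraction networks, rather than a single shared representation.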
In one possible implementation, the operation log further includes second operation data of the user in a second recommendation scene, where the second operation data includes operation data of the user when the same recommendation object, or a recommendation object with similarity higher than a threshold, is located at different recommendation positions in the second recommendation scene. The method further includes: obtaining, according to the second operation data, a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network, respectively; obtaining, according to the third feature representation, third tendency information through the first task network, where the third tendency information represents the influence of a recommendation position in a recommendation scene on the operation behavior of the user; obtaining, according to the fourth feature representation, fourth tendency information through the second task network, where the fourth tendency information represents the influence of a recommendation position in a recommendation scene on the operation behavior of the user; obtaining, according to the second operation data, a seventh weight for the third tendency information and an eighth weight for the fourth tendency information through the first gating network; and fusing, according to the seventh weight and the eighth weight, the third tendency information and the fourth tendency information to obtain second target tendency information, where the second target tendency information represents the influence of a recommendation position in the second recommendation scene on the operation behavior of the user, and the second target tendency information is used for training a recommendation model.
In one possible implementation, the fusing includes weighted summation.
In one possible implementation, the first task network or the second task network is a contextual position-based model (CPBM).
In one possible implementation, the first feature extraction network or the second feature extraction network is a network comprising a multi-layer perceptron (MLP).
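As a minimal sketch of such a feature extraction network (the layer sizes and the use of ReLU are assumptions for illustration; a real implementation would use a deep-learning framework), an MLP forward pass looks like:

```python
def mlp_forward(x, layers):
    """Forward pass of a multi-layer perceptron: each layer is a weight
    matrix (list of rows) and a bias vector, followed by a ReLU."""
    h = x
    for weights, biases in layers:
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        h = [max(0.0, v) for v in h]  # ReLU activation
    return h

# One hidden layer mapping a 2-dim input to a 3-dim feature representation.
layer = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.5, 0.0])
features = mlp_forward([1.0, -1.0], [layer])  # → [1.0, 0.0, 0.0]
```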
In one possible implementation, the method further includes: acquiring a first ground truth of the tendency information corresponding to the first operation data; determining a first loss according to the first tendency information and the first ground truth; and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the first loss.
In one possible implementation, the method further comprises: acquiring a second true value of the tendency information corresponding to the second operation data; and determining a second loss according to the second tendency information and the second true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the second loss.
In one possible implementation, during each iterative update, the losses obtained for the individual recommendation scenes may be fused (e.g., added), and the parameters updated based on the fused loss. Because the click-through rate and the observation propensity score of each position differ across scenes, the MCPBM converges at different speeds when trained on data from different scenes, so the weights of the different tasks are adjusted by exponential weighting.
In one possible implementation, the method further comprises: acquiring a first convergence degree; adjusting the first loss according to the first convergence degree to obtain an adjusted first loss, wherein the adjusted first loss is inversely related to the first convergence degree; the updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the first loss includes: and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the adjusted first loss.
That is, for a recommendation scene with a higher degree of convergence, the weight of its loss function may be set smaller, and for a recommendation scene with a lower degree of convergence, the weight may be set higher, so that the convergence progress of the recommendation scenes stays roughly in step and the prediction accuracy of the model improves.
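A hedged sketch of this convergence-aware weighting (the exponential form and the convergence measure are assumptions for illustration; the application only specifies that the adjusted loss is inversely related to the convergence degree):

```python
import math

def convergence_loss_weights(convergences):
    """Exponentially weight per-scene losses so that scenes that have
    converged less receive larger loss weights."""
    # convergence in (0, 1], where 1 means fully converged
    logits = [1.0 - c for c in convergences]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fused_loss(scene_losses, convergences):
    """Fuse the per-scene losses with convergence-dependent weights."""
    weights = convergence_loss_weights(convergences)
    return sum(w * l for w, l in zip(weights, scene_losses))

# Scene 2 has converged less (0.3 < 0.9), so its loss is weighted higher.
weights = convergence_loss_weights([0.9, 0.3])
```

The softmax normalization keeps the total loss scale stable while shifting emphasis toward the slower scene.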
In one possible implementation, the method further comprises:
acquiring a second loss of the recommendation model, where the second loss is obtained from a feedforward pass of the recommendation model on the first operation data;
and adjusting the second loss according to the first tendency information to obtain an adjusted second loss, wherein the adjusted second loss is used for updating parameters of the recommendation model.
In a second aspect, the present application provides a data processing apparatus, the apparatus comprising:
the obtaining module is used for obtaining an operation log, where the operation log includes first operation data of a user in a first recommendation scene, and the first operation data includes operation data of the user when the same recommendation object, or a recommendation object with similarity higher than a threshold, is located at different recommendation positions in the first recommendation scene;
the feature extraction module is used for respectively obtaining a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network according to the first operation data;
the tendency information calculation module is used for obtaining first tendency information through a first task network according to the first characteristic representation, wherein the first tendency information is used for representing the influence of a recommended position in a recommended scene on the operation behavior of the user; according to the second feature representation, second tendency information is obtained through a second task network, and the second tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
a weight determining module, configured to obtain, according to the first operation data, a first weight of the first tendency information and a second weight of the second tendency information through a first gating network, respectively;
and the fusion module is used for fusing the first tendency information and the second tendency information according to the first weight and the second weight to obtain first target tendency information, wherein the first target tendency information is used for representing the influence of the recommended position in the first recommended scene on the operation behavior of the user, and the first target tendency information is used for training a recommended model.
In a possible implementation, the feature extraction module is specifically configured to:
according to the first operation data, a first initial feature representation and a second initial feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first operation data, respectively obtaining a third weight represented by the first initial characteristic and a fourth weight represented by the second initial characteristic through a second gating network;
according to the third weight and the fourth weight, fusing the first initial feature representation and the second initial feature representation to obtain the first feature representation;
according to the first operation data, respectively obtaining a fifth weight represented by the first initial characteristic and a sixth weight represented by the second initial characteristic through a third gating network;
and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
In one possible implementation, the operation log further includes second operation data of the user in a second recommendation scene, where the second operation data includes operation data of the user when the same recommendation object or a recommendation object with a similarity higher than a threshold is located at a different recommendation position in the second recommendation scene;
the feature extraction module is specifically configured to:
according to the second operation data, respectively obtaining a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network;
the tendency information calculation module is specifically configured to:
according to the third feature representation, third tendency information is obtained through the first task network, and the third tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user; according to the fourth feature representation, fourth tendency information is obtained through the second task network, and the fourth tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
the weight determining module is specifically configured to: according to the second operation data, respectively obtaining a seventh weight of the third tendency information and an eighth weight of the fourth tendency information through the first gating network;
the fusion module is specifically configured to:
and according to the seventh weight and the eighth weight, fusing the third tendency information and the fourth tendency information to obtain second target tendency information, wherein the second target tendency information is used for representing the influence of the recommended position in the second recommended scene on the operation behavior of the user, and the second target tendency information is used for training a recommendation model.
In one possible implementation, the fusing includes weighted summation.
In one possible implementation, the first task network or the second task network is a contextual position-based model (CPBM).
In one possible implementation, the first feature extraction network or the second feature extraction network is a network comprising a multi-layer perceptron (MLP).
In one possible implementation, the obtaining module is further configured to:
acquiring a first ground truth of the tendency information corresponding to the first operation data;
the device further comprises:
and the model training module is used for determining a first loss according to the first tendency information and the first true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the first loss.
In one possible implementation, the obtaining module is further configured to:
acquiring a second true value of the tendency information corresponding to the second operation data;
the model training module is further configured to determine a second loss according to the second tendency information and the second true value, and perform parameter update on the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the second loss.
In one possible implementation, the obtaining module is further configured to:
acquiring a first convergence degree;
the device further comprises:
a loss adjusting module, configured to adjust the first loss according to the first convergence degree to obtain an adjusted first loss, where the adjusted first loss is negatively correlated with the first convergence degree;
the model training module is specifically configured to:
and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the adjusted first loss.
In one possible implementation, the obtaining module is further configured to:
acquiring a second loss of the recommendation model, where the second loss is obtained from a feedforward pass of the recommendation model on the first operation data;
the loss adjusting module is further configured to adjust the second loss according to the first tendency information to obtain an adjusted second loss, where the adjusted second loss is used to perform parameter update on the recommendation model.
In a third aspect, an embodiment of the present application provides a data processing apparatus, which may include a memory, a processor, and a bus system, where the memory is used for storing a program, and the processor is used for executing the program in the memory to perform any one of the methods described in the first aspect.
In a fourth aspect, embodiments of the present application provide a training apparatus, which may include a memory, a processor, and a bus system, where the memory is used for storing programs, and the processor is used for executing the programs in the memory to perform any one of the methods described in the first aspect.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the first aspect and any optional method described above.
In a sixth aspect, embodiments of the present application provide a computer program product, which includes code for implementing the first aspect and any optional method when the code is executed.
In a seventh aspect, the present application provides a chip system, which includes a processor configured to support an execution device or a training device in implementing the functions recited in the above aspects, for example, transmitting or processing the data or information recited in the above methods. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete devices.
An embodiment of the application provides a data processing method, including: obtaining an operation log, where the operation log includes first operation data of a user in a first recommendation scene, and the first operation data includes operation data of the user when the same recommendation object, or a recommendation object with similarity higher than a threshold, is located at different recommendation positions in the first recommendation scene; obtaining, according to the first operation data, a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively; obtaining, according to the first feature representation, first tendency information through a first task network, where the first tendency information represents the influence of a recommendation position in a recommendation scene on the operation behavior of the user; obtaining, according to the second feature representation, second tendency information through a second task network, where the second tendency information represents the influence of a recommendation position in a recommendation scene on the operation behavior of the user; obtaining, according to the first operation data, a first weight for the first tendency information and a second weight for the second tendency information through a first gating network; and fusing, according to the first weight and the second weight, the first tendency information and the second tendency information to obtain first target tendency information, where the first target tendency information represents the influence of a recommendation position in the first recommendation scene on the operation behavior of the user, and the first target tendency information is used for training a recommendation model.
In this embodiment of the application, the first gating network can obtain a set of weights for the first recommendation scene, and the outputs of the multiple task networks are fused based on these weights to obtain the position bias (first target tendency information) of the first recommendation scene. Through numerical control of the weights, the first gating network can identify and fuse the information related to the first recommendation scene in the output of each task network. On one hand, the relevance among different recommendation scenes can be learned; on the other hand, the dynamic weighting reduces interference among different recommendation scenes, which alleviates the drop in prediction accuracy that a single-task model suffers under multi-scene joint modeling because the data of different scenes follow different distributions.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence body framework;
fig. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;
fig. 3 is a schematic diagram of an information recommendation process according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a model provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of an execution device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a training apparatus provided in an embodiment of the present application;
fig. 9 is a schematic diagram of a chip according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of the artificial intelligence system will be described first. Please refer to fig. 1, which shows a schematic structural diagram of the artificial intelligence main framework. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes starting from data acquisition, for example, the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data, information, knowledge, wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by intelligent chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA and the like); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating human intelligent reasoning in a computer or intelligent system, in which the machine uses formalized information to think about and solve problems according to an inference control strategy; a typical function is search and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, commercialize intelligent information decision-making, and realize practical deployment. The main application fields include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.
The embodiment of the application can be applied to the field of information recommendation, and particularly can be applied to application markets, music playing recommendation, video playing recommendation, reading recommendation, news information recommendation, information recommendation in webpages and the like. The application can be applied to a recommendation system, and the recommendation system can determine a recommendation object based on the recommendation method provided by the application, where the recommendation object may be, for example and without limitation, an Application (APP), an audio/video, a webpage, news information, and other items.
In a recommendation system, information recommendation may include processes such as prediction and recommendation. Prediction means predicting the user's preference degree for each item, which can be reflected by the probability that the user selects the item. Recommendation may mean sorting the recommendation objects according to the prediction result, for example, sorting in descending order of the predicted preference degree, and recommending information to the user based on the sorted result.
For example, in a scenario of an application market, the recommendation system may recommend an application program to the user based on the result of the ranking, in a scenario of a music recommendation, the recommendation system may recommend music to the user based on the result of the ranking, and in a scenario of a video recommendation, the recommendation system may recommend a video to the user based on the result of the ranking.
Next, an application architecture of the embodiment of the present application is described.
The system architecture provided by the embodiment of the present application is described in detail below with reference to fig. 2. Fig. 2 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in FIG. 2, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data collection system 560.
The execution device 510 includes a computation module 511, an I/O interface 512, a pre-processing module 513, and a pre-processing module 514. The target model/rule 501 may be included in the calculation module 511, with the pre-processing module 513 and the pre-processing module 514 being optional.
The data acquisition device 560 is used to acquire training samples. In an embodiment of the present application, a training sample may be a historical operation record of a user, and the historical operation record may be a behavior log (or operation log) of the user. The operation log may include the user's operation information for an item, and the operation information may include an operation type, an identification of the user, and an identification of the item. When the item is an e-commerce product, the operation type may include, but is not limited to, clicking, purchasing, returning, adding to a shopping cart, and the like; when the item is an application program, the operation type may be, but is not limited to, clicking, downloading, and the like.
In one possible implementation, the operation log may include first operation data of the user in the first recommendation scenario, and second operation data of the user in a second recommendation scenario.
The training samples may be data used for training the multi-gate contextual position-based model (MCPBM) or the initialized recommendation model. After the training samples are collected, the data collection device 560 stores the training samples in the database 530.
The training device 520 may train the initialized recommendation model or MCPBM based on the training samples maintained in the database 530 to obtain the target model/rule 501. In this embodiment of the present application, the target model/rule 501 may be a trained MCPBM, and the MCPBM may obtain the position bias (for example, the tendency information in this embodiment) in a certain recommendation scene based on the user's operation data in that recommendation scene. The target model/rule 501 may also be a recommendation model, and the recommendation model may predict, based on the user's operation information for an item, the probability that the user performs an operation of the corresponding operation type on the item, where the probability may be used for information recommendation.
It should be noted that, in practical applications, the training samples maintained in the database 530 do not necessarily all come from the collection of the data collection device 560, and may also be received from other devices, or may be obtained by performing data expansion based on the data collected by the data collection device 560 (for example, the second operation type of the target user on the first item in the embodiment of the present application). It should be noted that, the training device 520 does not necessarily perform the training of the target model/rule 501 based on the training samples maintained by the database 530, and may also obtain the training samples from the cloud or other places for performing the model training, and the above description should not be taken as a limitation to the embodiments of the present application.
The target model/rule 501 obtained by training according to the training device 520 may be applied to different systems or devices, for example, the executing device 510 shown in fig. 2, where the executing device 510 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an Augmented Reality (AR)/Virtual Reality (VR) device, a vehicle-mounted terminal, or a server or a cloud.
In fig. 2, the execution device 510 configures an input/output (I/O) interface 512 for data interaction with an external device, and a user may input data (e.g., an operation log in the embodiment of the present application) to the I/O interface 512 through a client device 540.
The pre-processing module 513 and the pre-processing module 514 are used to pre-process the input data received through the I/O interface 512. It should be understood that the pre-processing module 513 and the pre-processing module 514 may be absent, or there may be only one pre-processing module. When the pre-processing module 513 and the pre-processing module 514 are not present, the input data may be processed directly using the calculation module 511.
During the process of preprocessing the input data by the execution device 510 or performing the calculation and other related processes by the calculation module 511 of the execution device 510, the execution device 510 may call the data, codes and the like in the data storage system 550 for corresponding processes, or store the data, instructions and the like obtained by corresponding processes in the data storage system 550.
Finally, the I/O interface 512 presents the processing results to the client device 540 for presentation to the user.
In this embodiment of the present application, the execution device 510 may obtain a code stored in the data storage system 550 to implement the data processing method in this embodiment of the present application.
In this embodiment, the execution device 510 may include a hardware circuit (e.g., an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a Digital Signal Processor (DSP), a microprocessor, a microcontroller, or a combination of these hardware circuits), for example, the execution device 510 may be a hardware system with an instruction execution function, such as a CPU, a DSP, or a hardware system without an instruction execution function, such as an ASIC, an FPGA, or a combination of the above hardware systems without an instruction execution function and the hardware system with an instruction execution function.
Specifically, the execution device 510 may be a hardware system having a function of executing instructions, the information recommendation method provided in the embodiment of the present application may be a software code stored in the data storage system 550, and the execution device 510 may acquire the software code from the data storage system 550 and execute the acquired software code to implement the data processing method provided in the embodiment of the present application.
It should be understood that the executing device 510 may be a combination of a hardware system without a function of executing instructions and a hardware system with a function of executing instructions, and some steps of the recommended method provided by the embodiment of the present application may also be implemented by a hardware system without a function of executing instructions in the executing device 510, which is not limited herein.
In the case shown in fig. 2, the user can manually give input data, and this "manually giving input data" can be operated through the interface provided by the I/O interface 512. Alternatively, the client device 540 may automatically send input data to the I/O interface 512; if the client device 540 is required to obtain the user's authorization before automatically sending input data, the user may set the corresponding permissions in the client device 540. The user can view, at the client device 540, the results output by the execution device 510, and the specific presentation form may be display, sound, action, and the like. The client device 540 may also serve as a data collection terminal, collecting the input data of the I/O interface 512 and the output results of the I/O interface 512 as new sample data, as shown in the figure, and storing the new sample data in the database 530. Of course, instead of being collected by the client device 540, the I/O interface 512 may directly store the input data of the I/O interface 512 and the output results of the I/O interface 512 shown in the figure in the database 530 as new sample data.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 2, the data storage system 550 is an external memory with respect to the execution device 510, and in other cases, the data storage system 550 may be disposed in the execution device 510. It is to be appreciated that the execution device 510 described above can be deployed in the client device 540.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
1. Click probability (click-through rate, CTR)
The click probability, also referred to as the click-through rate, is the ratio of the number of clicks on recommended information (e.g., recommended items) on a website or application to the number of its exposures. The click-through rate is generally an important indicator for measuring a recommendation system.
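As a trivial illustration of the definition above (the function name and numbers are not from the patent, just an example):

```python
# Click-through rate: ratio of clicks to exposures of the recommended item.
def click_through_rate(clicks, exposures):
    return clicks / exposures

ctr = click_through_rate(clicks=30, exposures=1000)  # 0.03
```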
2. Personalized recommendation system
The personalized recommendation system is a system which analyzes by using a machine learning algorithm according to historical data (such as operation information in the embodiment of the application) of a user, predicts a new request according to the analysis, and gives a personalized recommendation result.
3. Offline training
Offline training refers to the module in a personalized recommendation system that iteratively updates the parameters of the recommendation model according to a machine learning algorithm and the user's historical data (such as the operation information in the embodiments of the present application) until set requirements are met.
4. Online prediction (online inference)
Online prediction refers to predicting, based on an offline-trained model, the user's preference degree for recommended items in the current context according to the features of the user, the items, and the context, and predicting the probability that the user selects the recommended items.
5. Position bias: in a search/advertising/recommendation system, a ranking model is trained on the user's implicit feedback data. The user pays different amounts of attention to documents displayed at different positions, which causes position bias; that is, the user tends to interact with documents at better positions in the search result list, and this positional tendency of the user is irrelevant to whether the documents reflect the user's real intention.
6. Multitask learning (multi task learning): in machine learning, a plurality of tasks can be modeled and solved at the same time, and the learning efficiency and the performance index of the model in one or more tasks can be improved by researching the commonality and the difference among the plurality of tasks.
For example, fig. 3 is a schematic diagram of a recommendation system provided in an embodiment of the present application. As shown in fig. 3, when a user enters the system, a recommendation request is triggered. The recommendation system inputs the request and its related information (e.g., the operation information in the embodiments of the present application) into the recommendation model, and then predicts the user's selection rate for the items in the system. Further, the items may be sorted in descending order of the predicted selection rate, or based on some function of the selection rate; that is, the recommendation system may present the items in order at different positions as the recommendation result for the user. The user browses the items at their various positions and performs user actions such as browsing, selecting, and downloading. Meanwhile, the user's actual behavior can be stored in a log as training data, and the parameters of the recommendation model are continuously updated through the offline training module, thereby improving the prediction effect of the model.
For example, when a user opens the application market in a smart terminal (e.g., a mobile phone), the recommendation system of the application market may be triggered. The recommendation system of the application market predicts the probability that the user downloads each candidate APP according to the user's historical behavior log, for example, the user's historical download records and selection records, and the application market's own features, such as environmental feature information of time and place. Based on the calculated result, the recommendation system of the application market can display the candidate APPs in descending order of predicted probability value, thereby increasing the download probability of the candidate APPs.
For example, the APP with the higher predicted user selection rate may be presented at the front recommended position, and the APP with the lower predicted user selection rate may be presented at the rear recommended position.
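The descending ordering described above can be sketched in a few lines. This is an illustrative example only; the APP names and predicted probabilities are invented for the demonstration:

```python
# Hypothetical predicted download probabilities for candidate APPs.
candidates = {"app_a": 0.12, "app_b": 0.55, "app_c": 0.31}

# Present candidates in descending order of predicted selection rate,
# so the APP with the highest predicted rate takes the front position.
ranked = sorted(candidates, key=candidates.get, reverse=True)
# ranked -> ["app_b", "app_c", "app_a"]
```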
The recommended model may be a neural network model, and the following describes terms and concepts related to a neural network that may be involved in embodiments of the present application.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an operation unit that takes $x_s$ (i.e., input data) and an intercept of 1 as inputs, and the output of the operation unit may be:

$$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next layer, and the activation function may, for example, be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.
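The single-neuron computation above can be sketched directly (a minimal illustration, with arbitrary example weights chosen so the pre-activation is exactly zero):

```python
import numpy as np

def sigmoid(z):
    # example activation function f
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # output = f(sum_s W_s * x_s + b)
    return sigmoid(np.dot(w, x) + b)

out = neuron(x=np.array([1.0, 2.0]), w=np.array([0.5, -0.25]), b=0.0)
# 0.5*1 - 0.25*2 + 0 = 0, and sigmoid(0) = 0.5
```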
(2) Deep neural network
Deep neural networks (DNNs), also known as multi-layer neural networks, can be understood as neural networks with many hidden layers, where "many" has no particular threshold. Dividing a DNN by the position of its layers, the layers inside the DNN can be classified into three categories: input layer, hidden layer, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. The layers are fully connected, that is, any neuron in the $i$-th layer is connected to every neuron in the $(i+1)$-th layer. Although a DNN looks complex, the work of each layer is not; in short, it is the following linear relational expression:

$$\vec{y} = \alpha(W \vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the bias vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients $W$ and bias vectors $\vec{b}$ is large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary: the coefficient from the $k$-th neuron of the $(L-1)$-th layer to the $j$-th neuron of the $L$-th layer is defined as $W^L_{jk}$. Note that the input layer has no $W$ parameter. In deep neural networks, more hidden layers make the network better able to depict complex situations in the real world. Theoretically, a model with more parameters has higher complexity and larger "capacity", which means it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
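The layer-wise operation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be stacked into a minimal forward pass. This is an illustrative sketch only: the layer sizes are arbitrary, the weights are random, and tanh is chosen as an example activation.

```python
import numpy as np

def forward(x, layers):
    # Apply y = alpha(W x + b) layer by layer; the input layer itself
    # carries no W parameter, so only hidden/output layers appear here.
    for W, b in layers:
        x = np.tanh(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),   # 3 -> 4 hidden
          (rng.standard_normal((2, 4)), np.zeros(2))]   # 4 -> 2 output
y = forward(np.ones(3), layers)
```

Each entry of the output is bounded by the tanh activation, regardless of the random weights.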
(3) Loss function
In training a deep neural network, the output of the deep neural network is expected to be as close as possible to the value that is really desired to be predicted. Therefore, the predicted value of the current network can be compared with the truly desired target value, and the weight vector of each layer of the neural network can be updated according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., parameters are pre-configured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and the adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of loss functions (loss functions) or objective functions (objective functions), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example: the higher its output value (loss), the larger the difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
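A concrete loss function makes the "difference between predicted value and target value" above tangible. The patent does not specify a loss; mean squared error is used here purely as a common example:

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error: larger output means a larger gap between
    # the network's prediction and the truly desired target value.
    return float(np.mean((pred - target) ** 2))

loss = mse_loss(np.array([0.8, 0.3]), np.array([1.0, 0.0]))
# ((0.8-1.0)^2 + (0.3-0.0)^2) / 2 = (0.04 + 0.09) / 2 = 0.065
```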
(4) Back propagation algorithm
A back propagation (BP) algorithm can be used during training to correct the parameters of the initial model, so that the error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output, where an error loss is produced, and the parameters of the initial model are updated by back-propagating the error-loss information, so that the error loss converges. The back propagation algorithm is a backward propagation movement dominated by the error loss, aimed at obtaining the optimal parameters of the model, such as the weight matrices.
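The error-driven update can be demonstrated on the smallest possible model. This is an illustrative sketch, not the patent's training procedure: a single weight $w$ for the model $y = wx$ with squared-error loss, where the gradient plays the role of the back-propagated error information.

```python
def train_step(w, x, target, lr=0.1):
    pred = w * x
    # dL/dw for L = (pred - target)^2; this gradient is what
    # backpropagation computes for each parameter in a deep network.
    grad = 2 * (pred - target) * x
    return w - lr * grad   # adjust the weight to reduce the error loss

w = 0.0
for _ in range(100):
    w = train_step(w, x=1.0, target=0.5)
# w converges toward 0.5, the value that minimizes the loss
```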
In a recommendation scene, the system records the interaction information between the user and the system, such as an operation log, and uses this interaction information as the data source for training the core recommendation model (e.g., the ranking model) in the search system. The user operation log has characteristics such as a large data volume and strong timeliness. However, user operations are often biased: the user tends to operate on objects ranked at the top. If the recommendation model is trained only according to the user's operation log, it cannot accurately learn the real correlation between the user's real intention and the recommended objects. An important reason for this is the presence of severe position bias in the system. Taking a search scene as an example, the user enters the search system and inputs a query word in the search box, and the system immediately feeds back the query results and presents them on the user interaction interface. The user pays different amounts of attention to objects displayed at different positions, which leads to position bias; that is, the user tends to interact with objects at better positions in the search result list, and this tendency is irrelevant to whether the objects reflect the user's real intention.
Due to the existence of position bias, the user's implicit feedback data used for training the model, i.e., whether an operation occurred, cannot reflect the user's real search intention. If the user's implicit feedback data is used directly as positive and negative samples for training, the resulting recommendation model is biased, and as the model keeps being updated, a Matthew effect forms, making the model more and more biased. To obtain a more accurate recommendation model, the position bias needs to be corrected during offline training, so that the influence of position bias is eliminated. The inverse propensity score (IPS) technique is a commonly used position-bias correction technique: by estimating the position propensity score of a sample, the loss function is inversely weighted during training, so that samples with high position propensity scores receive lower weights, achieving a position-correction effect. The position bias also differs across recommendation scenes; for example, some recommendation scenes actively recommend several objects the user may currently want to operate on, while others recommend several objects related to a search term input by the user, and in these two kinds of recommendation scenes, the position biases of different positions are also different.
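The IPS reweighting described above can be sketched as follows. This is a hedged illustration, not the patent's formula: each per-sample loss is divided by its position propensity score, so clicks gathered at highly examined positions weigh less.

```python
import numpy as np

def ips_weighted_loss(losses, propensities):
    """losses: per-sample loss values; propensities: estimated position
    tendency scores (higher = position more likely to be examined)."""
    # Inverse weighting: high-propensity samples get a lower weight.
    return float(np.mean(losses / propensities))

per_sample = np.array([0.2, 0.2])
propensity = np.array([1.0, 0.5])  # position 2 examined half as often
loss = ips_weighted_loss(per_sample, propensity)
# (0.2/1.0 + 0.2/0.5) / 2 = 0.3
```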
In an existing implementation, a contextual position-based model (CPBM) based on context information is used to calculate a position bias (or referred to as bias information), and an existing solution trains the same CPBM based on operation data of multiple scenes, however, in the case of multi-scene joint modeling, since the position biases of different positions in different recommended scenes are different, prediction accuracy of the bias information is reduced.
The data processing method provided by the application can solve the problem that the prediction accuracy is reduced due to the fact that a single task model is influenced by different data in distribution under the condition of multi-scene combined modeling.
Referring to fig. 4, fig. 4 is a schematic diagram of an embodiment of a data processing method provided in an embodiment of the present application, and as shown in fig. 4, a data processing method provided in an embodiment of the present application includes:
401. the method comprises the steps of obtaining an operation log, wherein the operation log comprises first operation data of a user in a first recommendation scene, and the first operation data comprises operation data of the user when the same recommendation object or a recommendation object with similarity higher than a threshold value is located at different recommendation positions in the first recommendation scene.
In embodiments of the present application, the executing subject of step 401 may be a terminal device, which may be a portable mobile device, such as, but not limited to, a mobile or portable computing device (e.g., a smartphone), a personal computer, a server computer, a handheld device (e.g., a tablet) or laptop, a multiprocessor system, a gaming console or controller, a microprocessor-based system, a set top box, a programmable consumer electronics, a mobile phone, a mobile computing and/or communication device with a wearable or accessory form factor (e.g., a watch, glasses, a headset or earpiece), a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and so on.
In this embodiment of the application, the execution body of step 401 may alternatively be a server on the cloud side; the server may receive an operation log sent by the terminal device and thereby obtain the operation log.
For convenience of description, the following does not distinguish the form of the execution body and refers to it simply as the execution device. In this embodiment of the application, the execution device may obtain an operation log, where the operation log includes first operation data of a user in a first recommendation scene, and the first operation data includes operation data of the user when the same recommendation object, or a recommendation object whose similarity is higher than a threshold, is located at different recommendation positions in the first recommendation scene.
In a search system, models are continuously updated in an iterative manner, and different types of recommendation models may coexist; as a result, different recommendation algorithms process similar search content and produce data with different recommendation results.
In a possible implementation, intervention harvesting may be used to obtain data with different recommendation results. Intervention harvesting exploits a natural intervention strategy: by focusing on differences in document ranking as users interact with different recommendation algorithms, it estimates the probability that each position is observed. Using the differences between ranking algorithms, user click data can be collected for the same item shown at different positions under the same search term; because the item's relevance is unchanged, differences in user clicks are attributable only to position bias.
In a possible implementation, first operation data of the user in the first recommendation scene may be acquired, where the first operation data may be data of the above-described different recommendation results, that is, the first operation data includes operation data of the user when the same recommendation object or a recommendation object with a similarity higher than a threshold is located at different recommendation positions in the first recommendation scene. Through the first operation data, the position bias (i.e., tendency information) of each recommended position in the first recommended scene can be mined.
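The data-collection idea above can be sketched as follows; the log record format, field order, and function name are illustrative assumptions, not part of the original scheme:

```python
from collections import defaultdict

def harvest_interventions(click_log):
    """Group log records by (query, item) and keep only pairs shown at two or
    more distinct positions, so click differences can be attributed to
    position bias rather than relevance."""
    by_pair = defaultdict(list)
    for query, item, position, clicked in click_log:
        by_pair[(query, item)].append((position, clicked))
    return {pair: obs for pair, obs in by_pair.items()
            if len({pos for pos, _ in obs}) >= 2}

log = [
    ("design game", "app_a", 1, 1),   # app_a shown at position 1, clicked
    ("design game", "app_a", 3, 0),   # same item at position 3, not clicked
    ("design game", "app_b", 2, 0),   # app_b only ever shown at one position
]
data = harvest_interventions(log)
```

Only `app_a` survives the filter, since it is the only item observed at more than one position under the same search term.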
The operation data may include the user's operation results for a plurality of objects recommended by the recommendation model, where an object may also be described as an item. An object may be a physical item or a virtual item, such as an APP, audio/video, a web page, or news information. The attribute information of an object may be at least one of the object name, the developer, the installation package size, the category, and the rating. Taking an application program as an example, the category of the object may be chat, game, office, and the like, and the rating may be scores, comments, and the like given to the object; this application does not limit the specific type of the attribute information of an item.
The operation result may include whether an operation occurred, or the type of the operation. The operation type may be the type of the user's behavioral operation on an item: on network platforms and in applications, a user often interacts with items in various forms (that is, various operation types), such as browsing, clicking, adding to a shopping cart, and purchasing on an e-commerce platform. These various behaviors reflect the user's preferences and help characterize the user accurately.
In a possible implementation, the operation log further includes second operation data of the user in a second recommendation scene, where the second operation data includes operation data of the user when the same recommendation object or a recommendation object with a similarity higher than a threshold is located at a different recommendation position in the second recommendation scene.
In addition, the operation log may further include operation data of the user in recommendation scenes other than the first recommendation scene and the second recommendation scene, which is not limited in this embodiment of the application.
In one possible implementation, the operational data may be context information.
402. Obtain a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively, according to the first operation data.
In one possible implementation, the first feature extraction network and the second feature extraction network are different feature extraction networks. That is, the operation data may be input into a plurality of feature extraction networks, respectively, to obtain a plurality of feature representations.
In one possible implementation, the first or second feature extraction network is a network comprising a multilayer perceptron (MLP).
In a possible implementation, the obtaining, according to the first operation data, a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network respectively specifically includes: obtaining, according to the first operation data, a first initial feature representation and a second initial feature representation through the first feature extraction network and the second feature extraction network, respectively; obtaining, according to the first operation data, a third weight for the first initial feature representation and a fourth weight for the second initial feature representation through a second gating network; fusing the first initial feature representation and the second initial feature representation according to the third weight and the fourth weight to obtain the first feature representation; obtaining, according to the first operation data, a fifth weight for the first initial feature representation and a sixth weight for the second initial feature representation through a third gating network; and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
That is, a first initial feature representation and a second initial feature representation may be obtained through the first feature extraction network and the second feature extraction network, respectively, and may be fused into the first feature representation using a third weight (corresponding to the first initial feature representation) and a fourth weight (corresponding to the second initial feature representation) obtained through the second gating network. Similarly, the first initial feature representation and the second initial feature representation may be fused into the second feature representation using a fifth weight (corresponding to the first initial feature representation) and a sixth weight (corresponding to the second initial feature representation) obtained through the third gating network.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a network model according to an embodiment of the present application. The feature extraction networks may be called expert networks, and each expert network Expert_i is composed of a multi-layer perceptron (MLP); the input of each expert network may be the context information of all scenes (which may include, for example, the first operation data). The gating network Gate_i selects a weighted combination of the outputs of the multiple expert networks as the input to the upper-layer network, and the input of each gating network is likewise the context information of all scenes (which may include, for example, the first operation data). In one possible implementation, the weights may be calculated as shown in equation (1). When the data of another scene is more correlated with the target scene's data, G_i is larger and the degree of data sharing between the two tasks is higher; conversely, when the data of another scene is less correlated with the target scene's data, G_i is smaller and less data is shared between the two tasks. This flexible information-sharing mechanism performs both information selection and information isolation, and passes the information to be shared to the upper task-layer network.
G_i = softmax(X_1, X_2, ..., X_N)  (1);
Therefore, the fused output M_i of the expert layer for gating network Gate_i is:

M_i = Σ_{j=1}^{N} G_i,j · Expert_j(X)  (2);

where Expert_j(X) is the output of the j-th expert network and G_i,j is the j-th component of the gating weights produced by Gate_i.
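The expert/gating fusion described above can be sketched as follows, assuming N small MLP experts and a softmax gate over the shared context features X; all layer sizes and weight values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU, standing in for one expert network."""
    return np.maximum(x @ w1, 0.0) @ w2

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_experts, d_in, d_hidden, d_out = 3, 4, 8, 5
experts = [(rng.normal(size=(d_in, d_hidden)),
            rng.normal(size=(d_hidden, d_out))) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_in, n_experts))

x = rng.normal(size=d_in)        # context features of all scenes
g = softmax(x @ gate_w)          # gating weights G, as in equation (1)
m = sum(g[j] * mlp(x, *experts[j]) for j in range(n_experts))  # fused output M
```

Because the weights come from a softmax, a larger weight for one expert necessarily suppresses the others, which is the information-selection/isolation behavior described above.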
in a possible implementation, a third feature representation and a fourth feature representation may be obtained through the first feature extraction network and the second feature extraction network, respectively, according to the second operation data.
In a possible implementation, the obtaining, according to the second operation data, a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network respectively specifically includes: obtaining, according to the second operation data, a third initial feature representation and a fourth initial feature representation through the first feature extraction network and the second feature extraction network, respectively; obtaining, according to the second operation data, a weight for the third initial feature representation and a weight for the fourth initial feature representation through the second gating network; fusing the third initial feature representation and the fourth initial feature representation according to these weights to obtain the third feature representation; obtaining, according to the second operation data, a weight for the third initial feature representation and a weight for the fourth initial feature representation through the third gating network; and fusing the third initial feature representation and the fourth initial feature representation according to these weights to obtain the fourth feature representation.
403. Obtain first tendency information through a first task network according to the first feature representation, where the first tendency information is used to represent the influence of a recommendation position in a recommendation scene on the operation behavior of the user.
In one possible implementation, the first task network or the second task network may be a contextual position-based model (CPBM). When modeling position bias, the CPBM explicitly introduces context features, because the position bias is affected by context features (such as the query category); the model is then used to estimate the relationship between the position propensity score and the context features.
In one possible implementation, the first tendency information may include a position bias of each recommended position of the recommended scene, that is, an influence of the position on the user operation behavior. For example, the first recommendation scene may include a plurality of recommendation positions (for example, a plurality of recommendation objects may be displayed in the first recommendation scene, and a position where each recommendation object is located is a recommendation position).
404. Obtain second tendency information through a second task network according to the second feature representation, where the second tendency information is used to represent the influence of the recommendation position in the recommendation scene on the operation behavior of the user.
The first task network and the second task network may be different task networks, for example, may be different CPBMs.
In one possible implementation, the second tendency information may include a position bias of each recommended position of the recommended scene, that is, an influence of the position on the user operation behavior. For example, the first recommendation scene may include a plurality of recommendation positions (e.g., a plurality of recommendation objects may be displayed in the first recommendation scene, and a position where each recommendation object is located is a recommendation position).
In a possible implementation, for a third feature representation and a fourth feature representation of the second operation data, similarly, third tendency information may be obtained through the first task network according to the third feature representation, where the third tendency information is used to represent an influence of the recommended position in the recommended scene on the operation behavior of the user; and according to the fourth feature representation, obtaining fourth tendency information through the second task network, wherein the fourth tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user.
405. Obtain a first weight of the first tendency information and a second weight of the second tendency information, respectively, through a first gating network according to the first operation data.
In a possible implementation, the first tendency information and the second tendency information may be fused, and the fusion result may be used as the tendency information of the first recommended scene (e.g., the first target tendency information in this embodiment of the application).
In a possible implementation, in order to fuse the first tendency information and the second tendency information, a first weight of the first tendency information and a second weight of the second tendency information may be obtained through a first gating network according to the first operation data. Furthermore, the first tendency information and the second tendency information may be fused (for example, weighted and summed) according to the first weight and the second weight to obtain the first target tendency information.
Referring to fig. 5, the first task network and the second task network may belong to a task network layer (task layer), wherein each task network may be composed of a CPBM model for estimating a tendency score of each scene. Similar to the expert layer network, the task layer network also has a gating network for predicting the weight of each task network (CPBM) and capturing the similarity of the label layer. The input of the Gate network of the task layer network is also the context information of all search scenarios, where the weight calculation can be as shown in equation (3):
Q_i = softmax(X_1, X_2, ..., X_N)  (3);
The output Y_i of the task layer Task_i may be:

Y_i = Σ_{j=1}^{N} Q_i,j · Task_j(M_j)  (4);

where Task_j(M_j) is the output of the j-th task network and Q_i,j is the corresponding task-layer gating weight.
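Putting the expert layer and the task layer together, the two-level gating structure of FIG. 5 can be sketched as follows; every network here is a stand-in linear/MLP layer with illustrative shapes, not the real CPBM:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_experts = n_tasks = 2
d_in, d_feat, n_positions = 4, 6, 10   # 10 recommendation positions per scene

expert_w = [rng.normal(size=(d_in, d_feat)) for _ in range(n_experts)]
expert_gate_w = [rng.normal(size=(d_in, n_experts)) for _ in range(n_tasks)]
task_w = [rng.normal(size=(d_feat, n_positions)) for _ in range(n_tasks)]
task_gate_w = rng.normal(size=(d_in, n_tasks))

x = rng.normal(size=d_in)  # context features (e.g., the first operation data)

# Expert layer: one gated mixture of expert outputs per task network,
# as in equation (1) and the fused expert output above.
feats = []
for k in range(n_tasks):
    g = softmax(x @ expert_gate_w[k])
    feats.append(sum(g[j] * np.maximum(x @ expert_w[j], 0.0)
                     for j in range(n_experts)))

# Task layer: each CPBM stand-in maps its representation to per-position
# propensities in (0, 1) via a sigmoid.
props = [1.0 / (1.0 + np.exp(-(feats[k] @ task_w[k]))) for k in range(n_tasks)]

# Task-layer gate: fuse the task outputs into the target scene's
# propensities, as in equation (3) and the task-layer output above.
q = softmax(x @ task_gate_w)
scene_propensity = sum(q[k] * props[k] for k in range(n_tasks))
```

The final `scene_propensity` vector plays the role of the target tendency information for one recommendation scene: one propensity per recommendation position, obtained as a convex combination of the task-network outputs.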
406. Fuse the first tendency information and the second tendency information according to the first weight and the second weight to obtain first target tendency information, where the first target tendency information is used to represent the influence of a recommendation position in the first recommendation scene on the operation behavior of the user, and the first target tendency information is used to train a recommendation model.
In one possible implementation, the fusing includes weighted summation.
In a possible implementation, similarly, for second operation data of a second recommended scenario, a seventh weight of the third tendency information and an eighth weight of the fourth tendency information may be obtained through the first gating network according to the second operation data; and according to the seventh weight and the eighth weight, fusing the third tendency information and the fourth tendency information to obtain second target tendency information, wherein the second target tendency information is used for representing the influence of the recommended position in the second recommended scene on the operation behavior of the user, and the second target tendency information is used for training a recommendation model.
In one possible implementation, the first target tendency information is used for representing the influence of the recommended position in the first recommendation scene on the operation behavior of the user, and the first target tendency information is used for training a recommendation model.
In one possible implementation, the first target tendency information may be used as the position bias of the first recommendation scene when training the recommendation model. Specifically, when the recommendation model is trained, a second loss of the recommendation model may be obtained, where the second loss is obtained when the recommendation model performs a feed-forward pass on the first operation data; the second loss may then be adjusted according to the first target tendency information to obtain an adjusted second loss, and the adjusted second loss is used to update the parameters of the recommendation model.
For example, in the training process, each sample in the biased data set may be weighted by its corresponding propensity score:

L(W) = λ‖W‖² + Σ_l z_l · ℓ(y_l, ŷ_l)

The first term is a regularization term that constrains the learnable parameters W of the training model through the hyperparameter λ. The second term is the model loss term, where y_l denotes the true label of the l-th sample, ŷ_l denotes the model's prediction for that sample, and z_l denotes the inverse-propensity score corresponding to the position feature of the sample.
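A sketch of this propensity-weighted objective, assuming z_l = 1/p_l and a log-loss model term (the exact per-sample loss ℓ is not specified here, so log loss is an assumption):

```python
import numpy as np

def ips_loss(y_true, y_pred, propensity, params, lam=0.01):
    """lam * ||W||^2 + sum_l z_l * logloss(y_l, yhat_l), with z_l = 1/p_l."""
    z = 1.0 / np.asarray(propensity)          # inverse-propensity scores z_l
    eps = 1e-12                               # guard against log(0)
    per_sample = -(y_true * np.log(y_pred + eps)
                   + (1 - y_true) * np.log(1 - y_pred + eps))
    reg = lam * np.sum(params ** 2)           # the λ‖W‖² regularization term
    return reg + np.sum(z * per_sample)

y = np.array([1.0, 0.0, 1.0])
p_hat = np.array([0.9, 0.2, 0.6])             # model predictions ŷ_l
prop = np.array([1.0, 0.5, 0.2])              # position propensities p_l
W = np.zeros(4)                               # no regularization contribution
loss = ips_loss(y, p_hat, prop, W)
```

Clicks observed at low-propensity (rarely examined) positions receive a large weight, which compensates for the fact that such positions are under-represented in the biased log.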
In this embodiment of the application, the first gating network may obtain a set of weights for the first recommendation scene and fuse the outputs of the plurality of task networks based on these weights to obtain the position bias (first target tendency information) of the first recommendation scene. By controlling the weight values, the first gating network can identify and fuse the information related to the first recommendation scene in the output of each task network. On one hand, the relevance among different recommendation scenes can be learned; on the other hand, interference among different recommendation scenes can be reduced through dynamic weighting, thereby alleviating the reduced prediction accuracy caused when a single task model is affected by differently distributed data under multi-scene joint modeling.
It should be understood that the data processing method described above may be an inference process of the model.
The data processing method in the embodiment of the present application is described next from the training process of the model:
in a possible implementation, the data processing method corresponding to fig. 4 may be the feed-forward pass during model training. In a training process performed on the first operation data, a first true value (ground truth) of the tendency information corresponding to the first operation data may be obtained, a first loss may be determined according to the first tendency information and the first true value, and the parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network may be updated according to the first loss.
In a possible implementation, the data processing method corresponding to fig. 4 may be a feed-forward action during model training, and a second true value of the tendency information corresponding to the second operation data may be obtained in a training process performed on the second operation data; and determining a second loss according to the second tendency information and the second true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the second loss.
Illustratively, after obtaining the output of each task network, a loss function defined by equation (5) may be employed:
L = Σ_{i=1}^{N} α_i · loss_i  (5);
where α_i is the weight of the i-th task, and loss_i, the loss function of the i-th task, is calculated from equation (6):
[Equation (6), the per-task (per-scene) loss, is rendered as an image in the original.]
in one possible implementation, during each iterative update, the losses obtained for the recommendation scenes may be fused (for example, added), and the parameters may be updated based on the fused loss. Because the click-through rate and the observation propensity score of each position differ across scenes, the MCPBM converges at different speeds when trained with data from different scenes; the weights of the different tasks are therefore adjusted in an exponentially weighted manner.
In one possible implementation, a first convergence degree may be obtained; a higher first convergence degree indicates that the model may be closer to the converged state. The first loss is adjusted according to the first convergence degree to obtain an adjusted first loss, where the adjusted first loss is negatively correlated with the first convergence degree; and the parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network are updated according to the adjusted first loss.
Exemplarily, the adjustment of the loss can be seen in equations (7)-(9):

w_i(t) = loss_i(t) / loss_i(t-1)  (7);

σ_i(t+1) = N · exp(w_i(t)/s) / Σ_{j=1}^{N} exp(w_j(t)/s)  (8);

α_i(t+1) = γ·α_i(t) + (1-γ)·σ_i(t+1)  (9);
where s is a hyperparameter that controls the smoothness of the task weights, and γ is a hyperparameter that controls how much the task weight from round t contributes to the weight used in round t+1.
That is to say, a recommendation scene with a higher convergence degree may be given a smaller loss weight, and a recommendation scene with a lower convergence degree a larger loss weight, so that the convergence progress of the recommendation scenes remains substantially consistent and the prediction accuracy of the model is improved.
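The re-weighting described above can be sketched as follows, under a DWA-style reading of the image-rendered equations (7)-(8) (the loss ratio w_i(t) = loss_i(t)/loss_i(t-1) and the softmax with temperature s are assumptions); equation (9) is taken as written:

```python
import math

def update_task_weights(alpha, loss_now, loss_prev, s=1.0, gamma=0.5):
    """Tasks whose loss is still falling fast get small ratios w_i and hence
    small weights; slowly converging tasks are up-weighted."""
    n = len(alpha)
    w = [ln / lp for ln, lp in zip(loss_now, loss_prev)]               # eq (7)
    denom = sum(math.exp(wi / s) for wi in w)
    sigma = [n * math.exp(wi / s) / denom for wi in w]                 # eq (8)
    return [gamma * a + (1 - gamma) * sg for a, sg in zip(alpha, sigma)]  # eq (9)

alpha = [1.0, 1.0]
loss_prev = [1.0, 1.0]
loss_now = [0.5, 0.9]   # task 0 is converging faster than task 1
alpha = update_task_weights(alpha, loss_now, loss_prev)
```

After the update, the slower-converging task carries the larger weight, which keeps the convergence progress of the scenes roughly aligned.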
Illustratively, taking a search scene as the recommendation scene, the search page includes a search box displaying the user's current search term, such as "design game". The main content below is the ranking of relevant apps presented to the user by the application market's search system for that search term. The search recommendation system of the application market predicts the probability that the user will click each app in the candidate set according to the user, the candidate apps, and the context features, sorts the candidates in descending order of this probability, and places the applications most likely to be downloaded at the top.
After seeing the recommendation result of the application market, the user selects operations such as browsing, clicking or downloading according to personal interests, and the user behaviors are stored in the log. The application market trains a click rate prediction model by using the accumulated user behavior logs as training data, and the model can be trained by using the MCPBM framework in the embodiment of the application in this scenario.
Because of position bias in the search page, items ranked higher have a greater probability of being noticed by the user and are therefore more likely to be clicked. Specifically, taking the search term "design game" as an example, step one: design an intervention experiment strategy, which may be random traffic intervention or Intervention Harvest, and collect user click data under the intervention experiment to obtain intervention experiment data; step two: jointly model multiple search scenes using the collected intervention experiment data, and train a propensity score prediction model (MCPBM) based on multi-task learning; step three: correct the user click logs using the trained MCPBM propensity score model to obtain unbiased user click data; step four: train a downstream fine-ranking model using the corrected user click data as unbiased training data, obtaining an unbiased fine-ranking model for online inference and ranking.
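Step three above (correcting the click log with a trained propensity model) can be sketched as follows; the record fields and propensity values are illustrative, and attaching an inverse-propensity weight to each clicked record is one common way to obtain the unbiased training data used in step four:

```python
def debias_click_log(click_log, propensity_of):
    """Attach weight 1/p to every clicked record, given a callable that
    returns the predicted propensity for a record's position."""
    corrected = []
    for rec in click_log:  # rec: dict with "position" and "clicked" fields
        p = propensity_of(rec["position"])
        corrected.append({**rec, "weight": (1.0 / p) if rec["clicked"] else 1.0})
    return corrected

# Illustrative propensities: higher positions are observed more often.
propensities = {1: 1.0, 2: 0.5, 3: 0.25}
log = [{"position": 1, "clicked": 1},
       {"position": 3, "clicked": 1},
       {"position": 2, "clicked": 0}]
unbiased = debias_click_log(log, propensities.get)
```

A click at the rarely observed position 3 is up-weighted fourfold relative to a click at position 1, so the downstream fine-ranking model no longer systematically favors top positions.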
The beneficial effects of the embodiments of the present application are described below in conjunction with experiments. Complete offline experiments were performed on the MQ2007 million-query dataset and the Yahoo LTR dataset; on both datasets, MCPBM was compared with LE, PBM, and CPBM on two typical metrics, AvgRank and Error. The experimental results are shown in the following tables:
Table 1: Error metric comparison for each search scenario

[Table 1 is rendered as an image in the original.]
Table 2: Error metric comparison for each search scenario

[Table 2 is rendered as an image in the original.]
In Tables 1 and 2, LE (Local Estimators), PBM (Position-Based Model), and CPBM (Contextual Position-Based Model) are three baseline methods. The value θ ≥ 0 controls the degree to which context information influences the position bias in a search scene: θ = 0 indicates that the context information has no influence on the position bias, and the larger θ is, the greater the influence of the context information on the position bias.
From Tables 1 and 2, it can be seen that on both the Yahoo and MQ2007 datasets, the propensity score prediction accuracy of the MCPBM model in each scene is better than that of the three baseline models, the debiasing effect of MCPBM is better than that of the three baseline debiasing models, and the overall ranking metric AvgRank is improved by 1%-5%.
The embodiment of the application also provides a data processing method, which comprises the following steps:
obtaining an operation log, wherein the operation log comprises first operation data of a user in a first recommendation scene, and the first operation data comprises operation data of the user when the same recommendation object or recommendation objects with similarity higher than a threshold value are located at different recommendation positions in the first recommendation scene;
according to the first operation data, a first feature representation and a second feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first feature representation, obtaining first tendency information through a first task network, where the first tendency information is used to represent the influence of a recommendation position in a recommendation scene on the operation behavior of the user;
according to the second feature representation, obtaining second tendency information through a second task network, where the second tendency information is used to represent the influence of the recommendation position in the recommendation scene on the operation behavior of the user;
according to the first operation data, a first weight of the first tendency information and a second weight of the second tendency information are obtained through a first gating network;
according to the first weight and the second weight, the first tendency information and the second tendency information are fused to obtain first target tendency information, and the first target tendency information is used for representing the influence of a recommended position in the first recommended scene on the operation behavior of the user;
acquiring a first true value (ground truth) of the tendency information corresponding to the first operation data;
and determining a first loss according to the first tendency information and the first true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the first loss.
In a possible implementation, the obtaining, according to the first operation data, a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively, includes:
according to the first operation data, a first initial feature representation and a second initial feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first operation data, respectively obtaining a third weight for the first initial feature representation and a fourth weight for the second initial feature representation through a second gating network;
according to the third weight and the fourth weight, fusing the first initial feature representation and the second initial feature representation to obtain the first feature representation;
according to the first operation data, respectively obtaining a fifth weight for the first initial feature representation and a sixth weight for the second initial feature representation through a third gating network;
and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
In one possible implementation, the operation log further includes second operation data of the user in a second recommendation scene, where the second operation data includes operation data of the user when the same recommendation object or a recommendation object with a similarity higher than a threshold is located at a different recommendation position in the second recommendation scene;
the method further comprises the following steps:
according to the second operation data, respectively obtaining a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network;
according to the third feature representation, third tendency information is obtained through the first task network, and the third tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
according to the fourth feature representation, fourth tendency information is obtained through the second task network, and the fourth tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
according to the second operation data, respectively obtaining a seventh weight of the third tendency information and an eighth weight of the fourth tendency information through the first gating network;
and according to the seventh weight and the eighth weight, fusing the third tendency information and the fourth tendency information to obtain second target tendency information, wherein the second target tendency information is used for representing the influence of the recommended position in the second recommended scene on the operation behavior of the user, and the second target tendency information is used for training a recommendation model.
In one possible implementation, the fusing includes weighted summation.
In one possible implementation, the first task network or the second task network is a context-dependent location bias model (CPBM).
In one possible implementation, the first feature extraction network or the second feature extraction network is a network comprising a multi-layer perceptron (MLP).
In one possible implementation, the method further comprises:
acquiring a second true value of the tendency information corresponding to the second operation data;
and determining a second loss according to the second tendency information and the second true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the second loss.
In one possible implementation, the method further comprises:
acquiring a first convergence degree;
adjusting the first loss according to the first convergence degree to obtain an adjusted first loss, wherein the adjusted first loss is inversely related to the first convergence degree;
the updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the first loss includes:
and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the adjusted first loss.
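The embodiment does not fix a concrete adjustment formula; one minimal sketch consistent with "the adjusted first loss is inversely related to the first convergence degree" is a simple down-scaling (the functional form below is an assumption):

```python
def adjust_first_loss(first_loss, convergence):
    """Scale the first loss so that the adjusted loss decreases as the
    convergence degree increases: a nearly converged task contributes
    less to the joint parameter update."""
    assert 0.0 <= convergence < 1.0, "convergence degree assumed in [0, 1)"
    return first_loss * (1.0 - convergence)
```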
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus 600 may include:
an obtaining module 601, configured to obtain an operation log, where the operation log includes first operation data of a user in a first recommendation scene, and the first operation data includes operation data of the user when a same recommendation object or a recommendation object with a similarity higher than a threshold is located at different recommendation positions in the first recommendation scene;
for a specific description of the obtaining module 601, reference may be made to the description of step 401 in the foregoing embodiment, which is not described herein again.
A feature extraction module 602, configured to obtain a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively, according to the first operation data;
for a specific description of the feature extraction module 602, reference may be made to the description of step 402 in the foregoing embodiment, which is not described herein again.
The tendency information calculation module 603 is configured to obtain first tendency information through a first task network according to the first feature representation, where the first tendency information is used to represent an influence of a recommended position in a recommended scene on an operation behavior of a user; according to the second feature representation, second tendency information is obtained through a second task network, and the second tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
for a specific description of the tendency information calculation module 603, reference may be made to the description of step 403 and step 404 in the foregoing embodiment, which is not described herein again.
A weight determining module 604, configured to obtain a first weight of the first tendency information and a second weight of the second tendency information through a first gating network according to the first operation data;
for a detailed description of the weight determining module 604, reference may be made to the description of step 405 in the foregoing embodiment, which is not described herein again.
A fusion module 605, configured to fuse the first tendency information and the second tendency information according to the first weight and the second weight to obtain first target tendency information, where the first target tendency information is used to represent an influence of a recommended position in the first recommended scene on an operation behavior of a user, and the first target tendency information is used to train a recommendation model.
For a detailed description of the fusion module 605, reference may be made to the description of step 406 in the foregoing embodiment, which is not described herein again.
In a possible implementation, the feature extraction module is specifically configured to:
according to the first operation data, a first initial feature representation and a second initial feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first operation data, respectively obtaining a third weight represented by the first initial characteristic and a fourth weight represented by the second initial characteristic through a second gating network;
according to the third weight and the fourth weight, fusing the first initial feature representation and the second initial feature representation to obtain the first feature representation;
according to the first operation data, respectively obtaining a fifth weight represented by the first initial characteristic and a sixth weight represented by the second initial characteristic through a third gating network;
and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
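The two-gate structure above, in which each task network receives its own mixture of the two shared initial representations, can be sketched as follows (a multi-gate mixture-of-experts-style illustration; all parameters and names are hypothetical):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_feature_fusion(op_x, e1, e2, gate2, gate3):
    """Mix the two initial feature representations e1, e2 into one
    task-specific representation per task network, using a separate
    gating network (here a single linear layer) for each task."""
    w3, w4 = softmax(gate2 @ op_x)        # third and fourth weights
    w5, w6 = softmax(gate3 @ op_x)        # fifth and sixth weights
    first_feature = w3 * e1 + w4 * e2     # fed to the first task network
    second_feature = w5 * e1 + w6 * e2    # fed to the second task network
    return first_feature, second_feature
```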
In one possible implementation, the operation log further includes second operation data of the user in a second recommendation scene, where the second operation data includes operation data of the user when the same recommendation object or a recommendation object with a similarity higher than a threshold is located at a different recommendation position in the second recommendation scene;
the feature extraction module is specifically configured to:
according to the second operation data, respectively obtaining a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network;
the tendency information calculation module is specifically configured to:
according to the third feature representation, third tendency information is obtained through the first task network, and the third tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user; according to the fourth feature representation, fourth tendency information is obtained through the second task network, and the fourth tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
the weight determining module is specifically configured to: according to the second operation data, respectively obtaining a seventh weight of the third tendency information and an eighth weight of the fourth tendency information through the first gating network;
the fusion module is specifically configured to:
and according to the seventh weight and the eighth weight, fusing the third tendency information and the fourth tendency information to obtain second target tendency information, wherein the second target tendency information is used for representing the influence of the recommended position in the second recommended scene on the operation behavior of the user, and the second target tendency information is used for training a recommendation model.
In one possible implementation, the fusing includes weighted summation.
In one possible implementation, the first task network or the second task network is a context-dependent position bias model (CPBM).
In one possible implementation, the first feature extraction network or the second feature extraction network is a network comprising a multi-layer perceptron (MLP).
In one possible implementation, the obtaining module is further configured to:
acquiring a first true value (ground truth) of tendency information corresponding to the first operation data;
the device further comprises:
and the model training module is used for determining a first loss according to the first tendency information and the first true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the first loss.
In one possible implementation, the obtaining module is further configured to:
acquiring a second true value of the tendency information corresponding to the second operation data;
the model training module is further configured to determine a second loss according to the second tendency information and the second true value, and perform parameter update on the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the second loss.
In one possible implementation, the obtaining module is further configured to:
acquiring a first convergence degree;
the device further comprises:
a loss adjusting module, configured to adjust the first loss according to the first convergence degree to obtain an adjusted first loss, where the adjusted first loss is negatively correlated with the first convergence degree;
the model training module is specifically configured to:
and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the adjusted first loss.
In one possible implementation, the obtaining module is further configured to:
acquiring a second loss of the recommendation model, wherein the second loss is obtained through a feed-forward process performed by the recommendation model according to the first operation data;
the loss adjusting module is further configured to adjust the second loss according to the first tendency information to obtain an adjusted second loss, where the adjusted second loss is used to perform parameter update on the recommendation model.
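The adjustment formula is not specified here; a common choice consistent with this step, shown purely as an assumption, is inverse-propensity weighting of the recommendation model's per-sample loss by the estimated tendency information:

```python
def adjust_recommendation_loss(per_sample_loss, tendency, eps=1e-6):
    """Adjust the recommendation model's loss with the estimated
    tendency (propensity) information: feedback observed at positions
    the user rarely examines is up-weighted, counteracting position
    bias in the logged operation data."""
    p = max(tendency, eps)   # clip to keep the weight finite
    return per_sample_loss / p
```

The adjusted loss is then used for the recommendation model's parameter update, so samples from low-propensity positions contribute proportionally more.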
Referring to fig. 7, fig. 7 is a schematic structural diagram of an execution device provided in the embodiment of the present application, and the execution device 700 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, and the like, which is not limited herein. The data processing apparatus described in the embodiment corresponding to fig. 6 may be deployed on the execution device 700, and is used to implement the function of data processing in the embodiment corresponding to fig. 4. Specifically, the execution apparatus 700 includes: a receiver 701, a transmitter 702, a processor 703 and a memory 704 (where the number of processors 703 in the execution device 700 may be one or more), wherein the processor 703 may comprise an application processor 7031 and a communication processor 7032. In some embodiments of the present application, the receiver 701, the transmitter 702, the processor 703, and the memory 704 may be connected by a bus or other means.
The memory 704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 703. A portion of the memory 704 may also include a non-volatile random access memory (NVRAM). The memory 704 stores operating instructions, executable modules or data structures, or a subset thereof, or an expanded set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 703 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 703 or implemented by the processor 703. The processor 703 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method may be completed by hardware integrated logic circuits in the processor 703 or by instructions in the form of software. The processor 703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU), or another processor suitable for AI computation, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 703 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM, an EPROM, or a register. The storage medium is located in the memory 704, and the processor 703 reads information in the memory 704 and completes steps 401 to 406 in the foregoing embodiment in combination with its hardware.
The receiver 701 may be used to receive input numeric or character information and to generate signal inputs related to performing device related settings and function control. The transmitter 702 may be configured to output numeric or character information via the first interface; the transmitter 702 may also be configured to send instructions to the disk group via the first interface to modify data in the disk group; the transmitter 702 may also include a display device such as a display screen.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a training device provided in the embodiment of the present application. Specifically, the training device 800 is implemented by one or more servers, and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 88 (e.g., one or more processors), a memory 832, and one or more storage media 830 (e.g., one or more mass storage devices) storing an application 842 or data 844. The memory 832 and the storage medium 830 may be transitory or persistent storage. The program stored on the storage medium 830 may include one or more modules (not shown), and each module may include a series of instruction operations on the training device. Further, the central processing unit 88 may be configured to communicate with the storage medium 830, and execute, on the training device 800, the series of instruction operations in the storage medium 830.
The training device 800 may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, and one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
Specifically, the training device may perform the steps related to model training in the above embodiments.
Embodiments of the present application also provide a computer program product, which when executed on a computer causes the computer to execute the steps performed by the aforementioned execution device, or causes the computer to execute the steps performed by the aforementioned training device.
Also provided in an embodiment of the present application is a computer-readable storage medium, in which a program for signal processing is stored, and when the program is run on a computer, the program causes the computer to execute the steps executed by the aforementioned execution device, or causes the computer to execute the steps executed by the aforementioned training device.
The execution device, the training device, or the terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in the storage unit, so that the chip in the execution device performs the data processing method described in the above embodiments, or the chip in the training device performs the data processing method described in the above embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 9, fig. 9 is a schematic structural diagram of a chip provided in the embodiment of the present application. The chip may be embodied as a neural network processing unit (NPU) 900. The NPU 900 is mounted on a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core part of the NPU is the arithmetic circuit 903; the controller 904 controls the arithmetic circuit 903 to extract matrix data from memory and perform multiplication.
Through cooperation among its internal components, the NPU 900 may implement the information recommendation method provided in the embodiment described in fig. 4.
More specifically, in some implementations, the arithmetic circuit 903 in the NPU 900 includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 903 is a two-dimensional systolic array. The arithmetic circuit 903 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 903 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 902 and buffers the data on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of the matrix A from the input memory 901, performs a matrix operation on the matrix A and the matrix B, and stores a partial result or a final result of the obtained matrix in an accumulator 908.
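The computation in this example reduces to an ordinary matrix product with explicit accumulation of partial sums, which can be sketched as:

```python
import numpy as np

# Toy model of the data flow: B is held by the PEs (weight memory),
# A is streamed in (input memory), partial sums collect in an accumulator.
A = np.array([[1.0, 2.0], [3.0, 4.0]])   # input matrix A
B = np.array([[5.0, 6.0], [7.0, 8.0]])   # weight matrix B
C = np.zeros((2, 2))                     # output matrix C (accumulator)

for i in range(2):
    for j in range(2):
        for k in range(2):
            C[i, j] += A[i, k] * B[k, j]  # accumulate partial results
```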
The unified memory 906 is used to store input data as well as output data. Weight data is transferred to the weight memory 902 directly through a direct memory access controller (DMAC) 905. Input data is also transferred into the unified memory 906 by the DMAC.
A bus interface unit (BIU) 910 is used for interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 909. Specifically, the BIU 910 is configured to fetch instructions from an external memory for the instruction fetch memory 909, and is further configured to fetch the original data of the input matrix A or the weight matrix B from the external memory for the storage unit access controller 905.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 906 or to transfer weight data into the weight memory 902 or to transfer input data into the input memory 901.
The vector calculation unit 907 includes a plurality of operation processing units, and further processes the output of the arithmetic circuit 903 when necessary, for example, through vector multiplication, vector addition, an exponential operation, a logarithmic operation, or a magnitude comparison. The vector calculation unit 907 is mainly used for non-convolution/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector calculation unit 907 can store a processed output vector to the unified memory 906. For example, the vector calculation unit 907 may apply a linear function or a non-linear function to the output of the arithmetic circuit 903, for example, perform linear interpolation on a feature plane extracted by a convolutional layer, or apply a non-linear function to a vector of accumulated values to generate an activation value. In some implementations, the vector calculation unit 907 generates a normalized value, a pixel-level summed value, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 903, for example, for use in a subsequent layer of the neural network.
An instruction fetch buffer 909 is connected to the controller 904 and is configured to store instructions used by the controller 904.
the unified memory 906, the input memory 901, the weight memory 902, and the instruction fetch memory 909 are On-Chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above programs.
It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware including a dedicated integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, functions performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structures used to implement the same function may be various, such as an analog circuit, a digital circuit, or a dedicated circuit. However, for the present application, a software program implementation is preferable in most cases. Based on such an understanding, the technical solutions of the present application may be substantially embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to perform the method described in the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a training device or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Claims (23)

1. A method of data processing, the method comprising:
obtaining an operation log, wherein the operation log comprises first operation data of a user in a first recommendation scene, and the first operation data comprises operation data of the user when the same recommendation object or recommendation objects with similarity higher than a threshold value are located at different recommendation positions in the first recommendation scene;
according to the first operation data, a first feature representation and a second feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first characteristic representation, obtaining first tendency information through a first task network, wherein the first tendency information is used for representing the influence of a recommended position in a recommended scene on the operation behavior of a user;
according to the second feature representation, second tendency information is obtained through a second task network, and the second tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
according to the first operation data, respectively obtaining a first weight of the first tendency information and a second weight of the second tendency information through a first gating network;
and according to the first weight and the second weight, fusing the first tendency information and the second tendency information to obtain first target tendency information, wherein the first target tendency information is used for representing the influence of the recommended position in the first recommended scene on the operation behavior of the user.
2. The method of claim 1, wherein obtaining a first feature representation and a second feature representation from the first operational data through a first feature extraction network and a second feature extraction network, respectively, comprises:
according to the first operation data, a first initial feature representation and a second initial feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first operation data, respectively obtaining a third weight represented by the first initial characteristic and a fourth weight represented by the second initial characteristic through a second gating network;
according to the third weight and the fourth weight, fusing the first initial feature representation and the second initial feature representation to obtain the first feature representation;
according to the first operation data, respectively obtaining a fifth weight represented by the first initial characteristic and a sixth weight represented by the second initial characteristic through a third gating network;
and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
3. The method according to claim 1 or 2, wherein the operation log further comprises second operation data of the user in a second recommendation scene, and the second operation data comprises operation data of the user when the same recommendation object or a recommendation object with a similarity higher than a threshold value is in a different recommendation position in the second recommendation scene;
the method further comprises the following steps:
according to the second operation data, respectively obtaining a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network;
according to the third feature representation, third tendency information is obtained through the first task network, and the third tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
according to the fourth feature representation, fourth tendency information is obtained through the second task network, and the fourth tendency information is used for representing the influence of the recommended position in the recommended scene on the operation behavior of the user;
according to the second operation data, respectively obtaining a seventh weight of the third tendency information and an eighth weight of the fourth tendency information through the first gating network;
and according to the seventh weight and the eighth weight, fusing the third tendency information and the fourth tendency information to obtain second target tendency information, wherein the second target tendency information is used for representing the influence of the recommended position in the second recommended scene on the operation behavior of the user, and the second target tendency information is used for training a recommended model.
4. The method of any one of claims 1 to 3, wherein said fusing comprises weighted summation.
5. The method according to any of claims 1 to 4, wherein the first task network or the second task network is a context-dependent position bias model (CPBM).
6. The method according to any one of claims 1 to 5, wherein the first or second feature extraction network is a network comprising a multi-layer perceptron (MLP).
7. The method of any of claims 1 to 6, further comprising:
acquiring a first true value (ground truth) of tendency information corresponding to the first operation data;
and determining a first loss according to the first tendency information and the first true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the first loss.
8. The method of any of claims 3 to 7, further comprising:
acquiring a second true value of the tendency information corresponding to the second operation data;
and determining a second loss according to the second tendency information and the second true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the second loss.
9. The method according to claim 7 or 8, characterized in that the method further comprises:
acquiring a first convergence degree;
adjusting the first loss according to the first convergence degree to obtain an adjusted first loss, wherein the adjusted first loss is inversely related to the first convergence degree;
the updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the first loss includes:
and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the adjusted first loss.
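Claim 9 only requires the adjusted first loss to be negatively correlated with the first convergence degree; inverse scaling is one minimal way to realize that. This is a sketch of the idea, not the patent's prescribed formula:

```python
def adjust_first_loss(first_loss: float, convergence_degree: float, eps: float = 1e-8) -> float:
    """Scale the loss down as the convergence degree grows (negative correlation),
    so a task that has already converged contributes less to the parameter update."""
    return first_loss / (convergence_degree + eps)
```

For example, the same raw loss of 2.0 yields a smaller adjusted loss when the convergence degree is 4.0 than when it is 1.0.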
10. The method according to any one of claims 1 to 9, further comprising:
acquiring a second loss of the recommendation model, wherein the second loss is obtained when the recommendation model is fed forward according to the first operation data;
and adjusting the second loss according to the first tendency information to obtain an adjusted second loss, wherein the adjusted second loss is used for updating parameters of the recommendation model.
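Claim 10 adjusts the recommendation model's loss using the first tendency (propensity) information. A standard technique in this spirit is inverse propensity weighting: divide each sample's loss by the estimated propensity of its recommendation position, so clicks that highly exposed positions made likely are down-weighted. The claim does not fix the exact formula, and the per-sample values below are hypothetical:

```python
import numpy as np

per_sample_loss = np.array([0.4, 0.2, 0.6])   # from the recommendation model's feedforward
propensity = np.array([0.9, 0.5, 0.3])        # first tendency information, kept away from zero

# Adjusted second loss: inverse-propensity-weighted mean, used for the
# recommendation model's parameter update.
adjusted_second_loss = float(np.mean(per_sample_loss / propensity))
```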
11. A data processing apparatus, characterized in that the apparatus comprises:
the obtaining module is used for obtaining an operation log, wherein the operation log comprises first operation data of a user in a first recommendation scene, and the first operation data comprises operation data of the user when the same recommendation object, or recommendation objects with similarity higher than a threshold, are at different recommendation positions in the first recommendation scene;
the feature extraction module is used for obtaining a first feature representation and a second feature representation through a first feature extraction network and a second feature extraction network, respectively, according to the first operation data;
the tendency information calculation module is used for obtaining first tendency information through a first task network according to the first feature representation, wherein the first tendency information is used for representing the influence of a recommendation position in a recommendation scene on the operation behavior of the user; and for obtaining second tendency information through a second task network according to the second feature representation, wherein the second tendency information is used for representing the influence of the recommendation position in the recommendation scene on the operation behavior of the user;
a weight determining module, configured to obtain, according to the first operation data, a first weight of the first tendency information and a second weight of the second tendency information through a first gating network, respectively;
and the fusion module is used for fusing the first tendency information and the second tendency information according to the first weight and the second weight to obtain first target tendency information, wherein the first target tendency information is used for representing the influence of the recommendation position in the first recommendation scene on the operation behavior of the user.
12. The apparatus of claim 11, wherein the feature extraction module is specifically configured to:
according to the first operation data, a first initial feature representation and a second initial feature representation are obtained through a first feature extraction network and a second feature extraction network respectively;
according to the first operation data, respectively obtaining a third weight of the first initial feature representation and a fourth weight of the second initial feature representation through a second gating network;
according to the third weight and the fourth weight, fusing the first initial feature representation and the second initial feature representation to obtain the first feature representation;
according to the first operation data, obtaining a fifth weight of the first initial feature representation and a sixth weight of the second initial feature representation through a third gating network;
and fusing the first initial feature representation and the second initial feature representation according to the fifth weight and the sixth weight to obtain the second feature representation.
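The two-gate fusion of initial feature representations in claim 12 resembles a multi-gate mixture of experts: the two feature extraction networks act as shared experts, and each gate mixes their outputs into a task-specific representation. A minimal sketch, in which the vector size and gate logits are assumed for illustration:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
first_initial = rng.normal(size=8)    # output of the first feature extraction network
second_initial = rng.normal(size=8)   # output of the second feature extraction network

# Second gating network: third and fourth weights, mixed into the first
# feature representation (fed later to the first task network).
third_w, fourth_w = softmax(np.array([0.2, -0.1]))
first_feature = third_w * first_initial + fourth_w * second_initial

# Third gating network: fifth and sixth weights, mixed into the second
# feature representation (fed later to the second task network).
fifth_w, sixth_w = softmax(np.array([-0.3, 0.4]))
second_feature = fifth_w * first_initial + sixth_w * second_initial
```

Each gate produces its own convex combination of the same two expert outputs, which is what lets the two task networks share the extractors while still receiving different representations.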
13. The apparatus according to claim 11 or 12, wherein the operation log further comprises second operation data of the user in a second recommendation scene, and the second operation data comprises operation data of the user when the same recommendation object or recommendation objects with similarity higher than a threshold are in different recommendation positions in the second recommendation scene;
the feature extraction module is specifically configured to:
according to the second operation data, respectively obtaining a third feature representation and a fourth feature representation through the first feature extraction network and the second feature extraction network;
the tendency information calculation module is specifically configured to:
according to the third feature representation, third tendency information is obtained through the first task network, and the third tendency information is used for representing the influence of the recommendation position in the recommendation scene on the operation behavior of the user; according to the fourth feature representation, fourth tendency information is obtained through the second task network, and the fourth tendency information is used for representing the influence of the recommendation position in the recommendation scene on the operation behavior of the user;
the weight determining module is specifically configured to: according to the second operation data, respectively obtaining a seventh weight of the third tendency information and an eighth weight of the fourth tendency information through the first gating network;
the fusion module is specifically configured to:
and according to the seventh weight and the eighth weight, fusing the third tendency information and the fourth tendency information to obtain second target tendency information, wherein the second target tendency information is used for representing the influence of the recommendation position in the second recommendation scene on the operation behavior of the user, and the second target tendency information is used for training a recommendation model.
14. The apparatus according to any one of claims 11 to 13, wherein the fusing comprises: weighted summation.
15. The apparatus according to any one of claims 11 to 14, wherein the first task network or the second task network is a context-dependent position bias model (CPBM).
16. The apparatus of any one of claims 11 to 15, wherein the first or second feature extraction network is a network comprising a multi-layer perceptron (MLP).
17. The apparatus according to any one of claims 11 to 16, wherein the obtaining module is further configured to:
acquiring a first true value (ground truth) of tendency information corresponding to the first operation data;
the device further comprises:
and the model training module is used for determining a first loss according to the first tendency information and the first true value, and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the first loss.
18. The apparatus according to any one of claims 13 to 17, wherein the obtaining module is further configured to:
acquiring a second true value of the tendency information corresponding to the second operation data;
the model training module is further configured to determine a second loss according to the second tendency information and the second true value, and perform parameter update on the first feature extraction network, the second feature extraction network, the first task network, the second task network, and the first gating network according to the second loss.
19. The apparatus of claim 17 or 18, wherein the obtaining module is further configured to:
acquiring a first convergence degree;
the device further comprises:
a loss adjusting module, configured to adjust the first loss according to the first convergence degree to obtain an adjusted first loss, where the adjusted first loss is negatively correlated with the first convergence degree;
the model training module is specifically configured to:
and updating parameters of the first feature extraction network, the second feature extraction network, the first task network, the second task network and the first gating network according to the adjusted first loss.
20. The apparatus according to any one of claims 11 to 19, wherein the obtaining module is further configured to:
acquiring a second loss of the recommendation model, wherein the second loss is obtained when the recommendation model is fed forward according to the first operation data;
the loss adjusting module is further configured to adjust the second loss according to the first tendency information to obtain an adjusted second loss, where the adjusted second loss is used to perform parameter update on the recommendation model.
21. A computing device, characterized in that the computing device comprises a memory and a processor; the memory stores code, and the processor is configured to execute the code to perform the method of any one of claims 1 to 10.
22. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to implement the method of any of claims 1 to 10.
23. A computer program product comprising code which, when executed, implements the method of any one of claims 1 to 10.
CN202210326504.7A 2022-03-30 2022-03-30 Data processing method and related device Pending CN115048560A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210326504.7A CN115048560A (en) 2022-03-30 2022-03-30 Data processing method and related device
PCT/CN2023/084704 WO2023185925A1 (en) 2022-03-30 2023-03-29 Data processing method and related apparatus


Publications (1)

Publication Number Publication Date
CN115048560A 2022-09-13


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628236A (en) * 2023-07-25 2023-08-22 腾讯科技(深圳)有限公司 Method and device for delivering multimedia information, electronic equipment and storage medium
WO2023185925A1 (en) * 2022-03-30 2023-10-05 华为技术有限公司 Data processing method and related apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033228A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 A kind of determination method, apparatus, server and the storage medium of recommendation list
US20200401643A1 (en) * 2019-06-21 2020-12-24 Microsoft Technology Licensing, Llc Position debiasing using inverse propensity weight in machine-learned model
CN112487278A (en) * 2019-09-11 2021-03-12 华为技术有限公司 Training method of recommendation model, and method and device for predicting selection probability
CN112905905A (en) * 2021-01-22 2021-06-04 杭州电子科技大学 Interest point-area joint recommendation method in location social network
CN114168858A (en) * 2021-12-13 2022-03-11 北京字跳网络技术有限公司 Information pushing method, device, equipment and storage medium



Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANXIANG LING et al.: "Improved Recommendation System with Friends on SNS", 2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 29 January 2015, pages 1-19 *
ZHANG Qianqian: "Research on Link Prediction Algorithms Based on Heterogeneous Social Networks", China Master's Theses Full-text Database, Basic Sciences, 15 May 2020, pages 002-8 *
CAO Zelin et al.: "Position Propensity Score Prediction Algorithm Based on Multi-task Learning", Journal of Computer Research and Development, vol. 60, no. 1, 11 March 2022, pages 85-94 *



Similar Documents

Publication Publication Date Title
WO2021047593A1 (en) Method for training recommendation model, and method and apparatus for predicting selection probability
WO2021233199A1 (en) Search recommendation model training method, and search result sorting method and device
WO2023185925A1 (en) Data processing method and related apparatus
CN114997412A (en) Recommendation method, training method and device
WO2021136058A1 (en) Video processing method and device
WO2024002167A1 (en) Operation prediction method and related apparatus
WO2024041483A1 (en) Recommendation method and related device
CN115879508A (en) Data processing method and related device
CN117217284A (en) Data processing method and device
CN117009650A (en) Recommendation method and device
WO2023050143A1 (en) Recommendation model training method and apparatus
WO2024012360A1 (en) Data processing method and related apparatus
WO2023246735A1 (en) Item recommendation method and related device therefor
CN116843022A (en) Data processing method and related device
CN117251619A (en) Data processing method and related device
CN116910357A (en) Data processing method and related device
CN117056589A (en) Article recommendation method and related equipment thereof
CN116204709A (en) Data processing method and related device
CN115618950A (en) Data processing method and related device
CN115545738A (en) Recommendation method and related device
CN115795025A (en) Abstract generation method and related equipment thereof
CN114707070A (en) User behavior prediction method and related equipment thereof
US20240242127A1 (en) Recommendation method and related apparatus
CN115630680A (en) Data processing method and related device
CN116910358A (en) Data processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination