CN116957678A

CN116957678A - Data processing method and related device

Info

Publication number: CN116957678A
Application number: CN202310553911.6A
Authority: CN
Inventors: 石志林
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2023-05-16
Filing date: 2023-05-16
Publication date: 2023-10-27

Abstract

The embodiments of the application disclose a data processing method and a related device, which can be applied to fields such as artificial intelligence. Based on the click rate prediction model, multimedia information of interest can be recommended to an object, and this accurate recommendation improves the click rate of the multimedia data.

Description

Data processing method and related device

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data processing method and a related device.

Background

With the development of network and computer technology, different objects can see different multimedia data when browsing the same page, which realizes personalized display of the multimedia data. In general, by predicting the click rate of different objects on different multimedia data, the multimedia data that each object is interested in can be identified and accurately displayed to that object, which improves the click rate of the multimedia data, the delivery effect of the multimedia data, and the page traffic.

In the related art, the probability that an object clicks on a given piece of multimedia data is generally predicted by feeding input data, such as object data and multimedia data, into a click rate prediction model. The click rate prediction model generally adopts a deep learning model. The feature vector layer included in the deep learning model performs feature extraction on the input data; it is equivalent to a lookup table that maps the input data into feature vectors. The input data generally includes a plurality of features. Taking the age feature as an example, each feature value of the age feature corresponds to one row of numbers in the lookup table. As shown in FIG. 1, after the feature value "20 years old" of the age feature is input into the feature vector layer, the output "1.2, -0.12, 4.32, 3.2" is the feature vector corresponding to that feature value.
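The lookup-table behaviour described above can be illustrated with a minimal Python sketch. Only the "20 years old" row is taken from the text; the table name, the helper function, and the second row are assumptions added for illustration.

```python
# Hypothetical lookup table for the age feature. Only the "20 years old" row
# comes from the description; the other row holds made-up values.
AGE_LOOKUP_TABLE = {
    "20 years old": [1.2, -0.12, 4.32, 3.2],
    "21 years old": [0.5, 0.3, -1.1, 2.0],  # assumed values
}


def lookup_feature_vector(table, feature_value):
    """Return the row of numbers stored for `feature_value`."""
    return table[feature_value]


vector = lookup_feature_vector(AGE_LOOKUP_TABLE, "20 years old")
```

Each distinct feature value needs its own row, which is why the table grows with the number of feature values.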

As the input data includes more features, the feature vector layer grows. For example, when input data such as object data and multimedia data include hundreds of millions of features, the feature vector layer has hundreds of millions of rows and its parameters become very large. This not only increases the parameters of the click rate prediction model, and therefore the storage overhead, but also makes the click rate prediction model highly complex and inefficient to train.

Disclosure of Invention

In order to solve the technical problems, the application provides a data processing method and a related device, which are used for reducing the parameters of a click rate prediction model, reducing the complexity of the click rate prediction model and improving the training efficiency while reducing the storage cost.

The embodiment of the application discloses the following technical scheme:

in one aspect, an embodiment of the present application provides a data processing method, including:

acquiring a sample data set comprising a plurality of features, wherein each sample data in the sample data set has a real click result;

determining a plurality of characteristic value frequencies corresponding to target characteristics in a plurality of characteristics according to the sample data set, wherein the characteristic value frequencies are the frequency of occurrence of the plurality of characteristic values corresponding to the target characteristics in the sample data set respectively;

dividing a plurality of characteristic values corresponding to the target characteristic based on the characteristic value frequencies to obtain a plurality of characteristic groups, wherein the difference value between the characteristic value frequencies corresponding to the characteristic values in the same characteristic group is smaller than a first preset frequency threshold;

extracting features of a plurality of feature groups corresponding to the features respectively through a feature vector layer included in the initial click rate prediction model to obtain initial shared feature vectors corresponding to the feature groups respectively, wherein feature values in the same feature group correspond to the same initial shared feature vector;

predicting through an interaction layer included in the initial click rate prediction model according to a plurality of initial shared feature vectors to obtain a predicted click result;

and according to the difference between the predicted click result and the corresponding real click result, adjusting the model parameters of the initial click rate prediction model to obtain a click rate prediction model.

In another aspect, an embodiment of the present application provides a data processing apparatus, including: the device comprises an acquisition unit, a determination unit, a division unit, a feature extraction unit, a prediction unit and an adjustment unit;

the acquisition unit is used for acquiring a sample data set comprising a plurality of characteristics, and each sample data in the sample data set has a real click result;

The determining unit is configured to determine, according to the sample data set, a plurality of feature value frequencies corresponding to a target feature in a plurality of features, where the feature value frequencies are the number of times that a plurality of feature values corresponding to the target feature in the sample data set appear respectively;

the dividing unit is configured to divide a plurality of feature values corresponding to the target feature based on a plurality of feature value frequencies, so as to obtain a plurality of feature groups, where a difference value between feature value frequencies corresponding to feature values in the same feature group is smaller than a first preset frequency threshold;

the feature extraction unit is used for extracting features of feature groups corresponding to the features respectively through a feature vector layer included in the initial click rate prediction model to obtain initial shared feature vectors corresponding to the feature groups respectively, wherein the feature values in the same feature group correspond to the same initial shared feature vector;

the prediction unit is used for predicting through an interaction layer included in the initial click rate prediction model according to a plurality of initial shared feature vectors to obtain a predicted click result;

the adjusting unit is used for adjusting the model parameters of the initial click rate prediction model according to the difference between the predicted click result and the corresponding real click result to obtain a click rate prediction model.

In another aspect, an embodiment of the present application provides a computer device including a processor and a memory:

the memory is used for storing a computer program and transmitting the computer program to the processor;

the processor is configured to perform the method of the above aspect according to instructions in the computer program.

In another aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for executing the method described in the above aspect.

In another aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method described in the above aspect.

According to the above technical scheme, a sample data set comprising a plurality of features is obtained, and the feature value frequency corresponding to each feature value is obtained based on the number of times each of the feature values corresponding to the target feature occurs in the sample data set. The feature values corresponding to the target feature are divided into a plurality of feature groups based on the feature value frequencies, so that the difference between the feature value frequencies corresponding to the feature values in the same feature group is smaller than a first preset frequency threshold; that is, feature values with similar feature value frequencies are divided into the same feature group. The feature groups corresponding to the features are respectively input into an initial click rate prediction model, and feature extraction is performed by the feature vector layer included in the initial click rate prediction model to obtain the initial shared feature vectors corresponding to the feature groups, so that feature values in the same feature group correspond to the same initial shared feature vector. Compared with one feature vector per feature value, this reduces the number of feature vectors and the size of the search space. Based on the initial shared feature vectors, prediction is performed by the interaction layer included in the initial click rate prediction model to obtain a predicted click result, and the model parameters of the initial click rate prediction model are adjusted according to the difference between the predicted click result and the corresponding real click result to obtain the click rate prediction model.

Therefore, the feature values are divided into a plurality of feature groups based on the occurrence times of each feature value in the sample data set, namely the feature value frequency, and when feature extraction is carried out on each feature group through the feature vector layer, the feature values in the same feature group share an initial shared feature vector, so that compared with the feature vector corresponding to each feature value, the parameters of the feature vector layer are reduced, the storage cost is reduced, the number of the feature vectors is reduced, the size of a search space is reduced, the complexity of a click rate prediction model is reduced, and the training efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a diagram of a lookup table corresponding to a feature vector layer;

FIG. 2 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a click rate prediction model according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a click rate prediction model according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings.

The click rate prediction model comprises a feature vector layer which is equivalent to a lookup table, each row in the lookup table corresponds to a feature value of a feature, the feature vector layer is increased along with the increase of the number of the features and the feature values, and the parameters of the feature vector layer are also increased, so that the parameters of the click rate prediction model are also increased, and the storage cost is high. Moreover, as each feature value is mapped into one feature vector, the number of feature vectors is large, so that the search space is large, the complexity of the click rate prediction model is high, and the training efficiency is low.

Based on the above, the embodiment of the application provides a data processing method and related device, which divide feature values into a plurality of feature groups based on the occurrence times of each feature value in a sample data set, namely the feature value frequency, and when feature extraction is performed on each feature group through a feature vector layer, feature values in the same feature group share an initial shared feature vector, so that compared with each feature value corresponding to one feature vector, the method and the related device not only reduce the storage cost by reducing the parameters of the feature vector layer, but also reduce the number of the feature vectors, thereby reducing the size of a search space, reducing the complexity of a click rate prediction model and improving the training efficiency.

The data processing method provided by the embodiment of the application is realized based on artificial intelligence. Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation, and other directions.

In the embodiment of the application, the artificial intelligence technology mainly comprises the machine learning/deep learning and other directions.

The data processing method provided by the application can be applied to computer devices with data processing capability, such as terminal devices and servers. The terminal device may be, but is not limited to, a desktop computer, a notebook computer, a mobile phone, a tablet computer, an Internet of Things device, or a portable wearable device. The Internet of Things device may be a smart speaker, a smart television, a smart air conditioner, a smart vehicle-mounted device, or the like; the smart vehicle-mounted device may be a vehicle-mounted navigation terminal, a vehicle-mounted computer, or the like; and the portable wearable device may be a smart watch, a smart bracelet, a head-mounted device, or the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server or server cluster that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.

The computer device may also have machine learning capabilities. Machine Learning (ML) is a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

In the embodiment of the application, a click rate prediction model for predicting the click rate is obtained through training by a machine learning technology based on a sample data set such as object data, multimedia data and the like.

To facilitate understanding of the data processing method provided by the embodiment of the present application, an application scenario of the data processing method is described below, taking a server as the execution body of the data processing method as an example.

FIG. 2 shows the application scenario of the data processing method provided by the embodiment of the present application. As shown in FIG. 2, the application scenario includes a terminal device 210 and a server 220, where the terminal device 210 and the server 220 may communicate through a network.

The user may browse a page through the terminal device 210. In addition to the data required by the user, the page also displays multimedia data; advertisements are used as an example below. An advertisement slot on the page is used for displaying advertisements. To improve the click rate of the advertisements, the advertisement delivery effect, and the page traffic, the server 220 can determine the advertisements the user is interested in through a trained click rate prediction model, based on the object data related to the user and the advertisement data, and deliver those advertisements to the page being browsed, thereby realizing personalized display of advertisements.

The process by which the server 220 trains the click rate prediction model is described below.

A sample data set comprising a plurality of features is obtained, wherein the sample data set comprises a plurality of sample data, each sample data comprises object data and advertisement data, and each sample data has a corresponding real click result. For example, the sample data include "user 1, male, 30 years old, swimming" and "user 2, female, 20 years old, reading", where user 1 clicked on an advertisement recommending a natatorium and user 2 did not. Each sample data includes a plurality of features, so the sample data set includes a plurality of features, such as a gender feature, an age feature, a hobby feature, and the like.

For a target feature among the plurality of features, the feature value frequency corresponding to each feature value is obtained based on the number of times each of the feature values corresponding to the target feature occurs in the sample data set. Taking the age feature as an example, in the sample data set the age feature has 21 feature values in total, from 20 years old to 40 years old. The feature value frequency of 30 years old is 100, that of 35 years old is 90, and that of each of the remaining 19 feature values is more than 40.

Based on the feature value frequencies corresponding to the respective feature values, the feature values corresponding to the target feature are divided into a plurality of feature groups. For example, if the first preset frequency threshold is 50, 30 years old and 35 years old are divided into the same feature group, and the remaining 19 feature values are divided into another feature group. In this way, the difference between the feature value frequencies of the feature values in the same feature group is smaller than the first preset frequency threshold; that is, feature values with similar feature value frequencies are divided into the same feature group.

Each of the plurality of features is taken in turn as the target feature. The feature groups corresponding to the features are respectively input into an initial click rate prediction model, and feature extraction is performed by the feature vector layer included in the initial click rate prediction model to obtain the initial shared feature vectors corresponding to the feature groups, so that feature values in the same feature group correspond to the same initial shared feature vector. For example, 30 years old and 35 years old correspond to one feature vector and the remaining 19 feature values correspond to another. Compared with one feature vector per feature value, the number of feature vectors is reduced from 21 to 2.
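One possible way to carry out such a division is sketched below in Python. The greedy strategy, the function name `group_by_frequency`, and the count of 45 assigned to the remaining 19 feature values are all assumptions for illustration; the application does not limit how the feature groups are formed.

```python
def group_by_frequency(frequencies, threshold):
    """Greedily group feature values so that, within one group, any two
    feature value frequencies differ by less than `threshold`."""
    # Sort feature values from the highest frequency to the lowest.
    ordered = sorted(frequencies.items(), key=lambda item: -item[1])
    groups = []
    anchor = None  # highest frequency in the group currently being filled
    for value, freq in ordered:
        if anchor is None or anchor - freq >= threshold:
            groups.append([])  # start a new feature group
            anchor = freq
        groups[-1].append(value)
    return groups


# Age feature with 21 feature values from 20 to 40 years old.
frequencies = {age: 45 for age in range(20, 41)}  # 45 is an assumed count
frequencies[30] = 100
frequencies[35] = 90

groups = group_by_frequency(frequencies, threshold=50)
```

Because every member of a group lies within the threshold of that group's highest frequency, any two members of the same group also differ by less than the threshold.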
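The sharing of one feature vector per feature group can be sketched as a two-step lookup: feature value to group index, then group index to vector. This is a minimal illustration; the function names, the random initialisation, and the vector length of 4 are assumptions.

```python
import random


def build_shared_embeddings(groups, dim, seed=0):
    """Create one randomly initialised vector per feature group, plus a map
    from each feature value to the index of its group."""
    rng = random.Random(seed)
    vectors = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in groups]
    value_to_group = {
        value: index for index, members in enumerate(groups) for value in members
    }
    return vectors, value_to_group


# Two feature groups for the age feature: {30, 35} and the remaining 19 values.
groups = [[30, 35], [age for age in range(20, 41) if age not in (30, 35)]]
vectors, value_to_group = build_shared_embeddings(groups, dim=4)


def embed(value):
    """Return the shared feature vector of the group that `value` belongs to."""
    return vectors[value_to_group[value]]
```

Here `embed(30)` and `embed(35)` return the very same list object, so any update made during training applies to both feature values at once.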

Based on the initial shared feature vector, predicting through an interaction layer included in the initial click rate prediction model to obtain a predicted click result, and adjusting model parameters of the initial click rate prediction model according to the difference between the predicted click result and the real click result to obtain the click rate prediction model.
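The training loop described above can be sketched in plain Python. This is a simplified stand-in and not the model of the application: the interaction layer is replaced by a single linear layer followed by a sigmoid, only one feature (age) is used, and the samples, initial values, learning rate, and epoch count are all assumed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(value, value_to_group, vectors, weights, bias):
    """Predicted click probability for a feature value via its shared vector."""
    vec = vectors[value_to_group[value]]
    return sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)


def train(samples, value_to_group, vectors, weights, bias, lr=0.1, epochs=200):
    """Stochastic gradient descent on binary cross-entropy.
    `vectors` and `weights` are updated in place; the new bias is returned."""
    for _ in range(epochs):
        for value, clicked in samples:
            vec = vectors[value_to_group[value]]
            # Difference between predicted and real click result (dLoss/dz).
            error = score(value, value_to_group, vectors, weights, bias) - clicked
            for i in range(len(vec)):
                grad_w, grad_x = error * vec[i], error * weights[i]
                weights[i] -= lr * grad_w
                vec[i] -= lr * grad_x  # adjusts the shared feature vector
            bias -= lr * error
    return bias


# Assumed toy data: group 0 = {30, 35} clicked, group 1 = {20, 25} did not.
value_to_group = {30: 0, 35: 0, 20: 1, 25: 1}
vectors = [[0.1, -0.1], [-0.1, 0.1]]  # one initial shared vector per group
weights = [0.1, 0.1]
samples = [(30, 1), (35, 1), (20, 0), (25, 0)]

bias = train(samples, value_to_group, vectors, weights, bias=0.0)
p_35 = score(35, value_to_group, vectors, weights, bias)
p_20 = score(20, value_to_group, vectors, weights, bias)
```

Since 30 and 35 share one vector, they always receive the same prediction, and a click observed for either value adjusts the vector used by both.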

The data processing method provided by the embodiment of the application can be executed by a server. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to perform the data processing method provided in the embodiment of the present application, or the terminal device and the server may jointly perform the data processing method provided in the embodiment of the present application, which is not limited in this embodiment.

The data processing method provided by the application is described in detail below through method embodiments.

FIG. 3 is a flow chart of the data processing method provided by the embodiment of the application. For convenience of description, the following embodiments take a server as the execution body of the data processing method as an example. As shown in FIG. 3, the data processing method includes the following steps:

S301: Acquire a sample data set including a plurality of features.

The sample data set includes a plurality of sample data, each sample data including at least one feature, the features included in each sample data may not be identical, and thus the sample data set may include a plurality of features. Features generally refer to various attributes that describe factors such as objects, multimedia data, context, etc., which can be used to predict whether an object will click on certain multimedia data. For clarity, some common features are described below:

(1) Object characteristics: object basic attribute features such as gender features, age features, features based on geographical location information (Location Based Service, LBS), interest features, behavioral sequence features such as search records, purchase records, etc.

(2) Advertisement features: category of advertisement, brand, price, description, picture, historical click-through rate, etc.

(3) Contextual characteristics: context information such as the current search keyword of the object, search time, device type, operating system, network environment, and the like.

In order to train to obtain the click rate prediction model, each sample data in the sample data set also has a corresponding real click result, so as to identify whether an object in the current sample data actually clicks on a certain multimedia data.

It can be understood that, in the embodiments of the present application, obtaining the related context information, access time, and type of device used by the object requires a data capture technical scheme. When the embodiments are applied to a specific product or technology, the related data collection, use and processing should comply with the requirements of national laws and regulations and with the principles of legality, legitimacy and necessity, should not involve obtaining data types prohibited or restricted by laws and regulations, and should not hinder the normal operation of the target website. In addition, when the embodiments are applied to a specific product or technology and the technical scheme for collecting and processing sample data such as object data and multimedia data is implemented, the informed consent or separate consent of the personal information subject is obtained strictly in accordance with the requirements of relevant national laws and regulations, and subsequent data use and processing are carried out within the scope authorized by the laws and regulations and by the personal information subject.

S302: Determine, according to the sample data set, a plurality of feature value frequencies corresponding to a target feature among the plurality of features.

The sample data set includes a plurality of features, and the target feature may be one feature or a set of features of the plurality of features. The following description will be given separately.

(1) The target feature is one of a plurality of features: one feature can be selected from a plurality of features to serve as a target feature, a plurality of feature value frequencies corresponding to the target feature are determined, and each feature can be subsequently used as the target feature, so that a plurality of feature value frequencies corresponding to the features are determined.

The feature value frequency refers to the number of times that a plurality of feature values corresponding to the target feature in the sample data set respectively appear. Taking the age characteristic as an example, the range covered by the characteristic values corresponding to the age characteristic included in the different sample data sets may be different, for example, the characteristic value corresponding to the age characteristic in the sample data set a is 20 years old to 40 years old, and the characteristic value corresponding to the age characteristic in the sample data set B is 1 year old to 100 years old. The number of occurrences of each eigenvalue in the sample data set is also different, such as 20000 occurrences at age 30 and 1 occurrence at age 100 in sample data set B.
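Counting feature value frequencies can be sketched with Python's `collections.Counter`. The sample records and the function name below are hypothetical; samples that lack the target feature are skipped, since the features of different sample data need not be identical.

```python
from collections import Counter


def feature_value_frequencies(samples, target_feature):
    """Count how often each value of `target_feature` occurs in the samples."""
    return Counter(
        sample[target_feature] for sample in samples if target_feature in sample
    )


# Hypothetical sample data; not every sample carries every feature.
samples = [
    {"age": 30, "gender": "male", "hobby": "swimming"},
    {"age": 20, "gender": "female", "hobby": "reading"},
    {"age": 30, "gender": "female"},
    {"gender": "male", "hobby": "reading"},
]

age_frequencies = feature_value_frequencies(samples, "age")
```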

(2) The target feature is a set of features among the plurality of features: the plurality of features are divided into a plurality of attribute groups, where the features in the same attribute group have similar meanings and functions. The features of one attribute group are taken as the target feature, and the feature value frequencies corresponding to the target feature are determined. Each attribute group can subsequently be taken as the target feature in turn, thereby determining the feature value frequencies corresponding to each attribute group. This subsequently reduces the dimension of the features and the complexity of the model.

For example, feature values that are the same in frequency but belong to different features may be divided into a feature group if they have similar meaning and effect. For example, the category of advertisements and the placement of advertisements may have the same frequency of characteristic values, but they have different meanings and roles, and thus should be divided into different characteristic groups. However, the click rate, the exposure rate and the conversion rate of the advertisement are classified into one attribute group, so that the effect of the advertisement can be better reflected, and the advertiser can conveniently analyze and optimize the effect.

S303: Divide a plurality of feature values corresponding to the target feature based on the feature value frequencies to obtain a plurality of feature groups.

The feature values are grouped so that the difference between the feature value frequencies of the feature values in the same feature group is smaller than a first preset frequency threshold. For example, in sample data set B, 30 years old occurs 20000 times and 35 years old occurs 20001 times; with a first preset frequency threshold of 100, 30 years old and 35 years old are divided into the same feature group.

The embodiment of the application does not particularly limit the manner of dividing the feature groups. For example, the feature values corresponding to the target feature are sorted from high to low by feature value frequency, and a first preset frequency threshold is determined, so that feature values with similar feature value frequencies are divided into one feature group based on the first preset frequency threshold. For example, feature values with a feature value frequency of less than 10 are grouped together, feature values with a feature value frequency between 10 and 100 are grouped together, and feature values with a feature value frequency of greater than 100 are grouped together. It should be noted that some feature values may occur very rarely yet have a great influence on the prediction performance of the click rate prediction model; such feature values may need to be treated separately rather than being assigned to any feature group.
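The range-based example above (less than 10, between 10 and 100, greater than 100) can be sketched as follows. The function name, the `keep_separate` parameter for rare but influential feature values, and the sample frequencies other than those of 30 and 35 years old are assumptions for illustration.

```python
def bucket_by_frequency(frequencies, keep_separate=()):
    """Group feature values into low (<10), mid (10 to 100) and high (>100)
    frequency ranges; values in `keep_separate` each form their own group."""
    buckets = {"low": [], "mid": [], "high": []}
    singles = []
    for value, freq in frequencies.items():
        if value in keep_separate:
            singles.append([value])  # rare but influential: treated alone
        elif freq < 10:
            buckets["low"].append(value)
        elif freq <= 100:
            buckets["mid"].append(value)
        else:
            buckets["high"].append(value)
    # Drop empty buckets, then append the individually treated values.
    return [members for members in buckets.values() if members] + singles


# Frequencies loosely modelled on sample data set B; 60 and 99 are assumed.
frequencies = {30: 20000, 35: 20001, 60: 50, 99: 3, 100: 1}
groups = bucket_by_frequency(frequencies, keep_separate={100})
```

Fixed ranges are simple to apply, but an open-ended range such as "greater than 100" does not by itself bound the frequency difference within a group, so the range boundaries would need to be chosen with the first preset frequency threshold in mind.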

For another example, the high-frequency characteristic value and the low-frequency characteristic value may be determined based on the characteristic value frequency, and the high-frequency characteristic value is divided into a characteristic group, and the low-frequency characteristic value is divided into a characteristic group, so that the two characteristic groups are separately processed. The low frequency eigenvalues occur less frequently and may occur only in a small number of sample data, so their statistical information is not accurate enough and is susceptible to noise. The high-frequency characteristic values are more in occurrence frequency, the statistical information of the high-frequency characteristic values is more accurate, and the distribution condition of sample data can be reflected better. Therefore, the low-frequency characteristic value and the high-frequency characteristic value are respectively processed, so that the statistical information of the data can be better utilized, and the prediction performance of the click rate prediction model is improved.

Therefore, grouping the plurality of feature values corresponding to the target feature based on their frequencies not only makes better use of the statistical information of the data but also avoids redundancy among features. For example, many feature values are strongly correlated; if the feature values are not grouped, the click rate prediction model may learn these correlated feature values repeatedly, wasting computing resources. Grouping the feature values by frequency places related feature values in the same group, which avoids repeated learning and improves the generalization capability of the click rate prediction model.

In addition, grouping the feature values by feature value frequency can improve the interpretability of the click rate prediction model. A sample set generally includes a very large number of features, which makes the search space of the click rate prediction model complex and difficult to interpret. Grouping feature values with similar frequencies makes the search space more concise, clearer, and easier to interpret. If the search space has K feature values and the feature vector corresponding to each feature value has length N, the search space contains $2^{NK}$ possible choices; if the K feature values are divided into L feature groups, the size of the search space is reduced from $2^{NK}$ to $2^{LK}$. Moreover, many features are sparse, which can make training the click rate prediction model very difficult. Grouping by feature value frequency effectively reduces the feature dimension, which reduces the training difficulty of the click rate prediction model and improves the training efficiency of the model.

In deep learning, the cost of the search space generally refers to the number of parameters that can be used to train the click rate prediction model, and the larger the size of the search space, the higher the complexity of the click rate prediction model, thus requiring more sample data and computing resources to train. After the search space is reduced, the required computing resources are reduced, the click rate prediction model can be trained faster, and the training time is shortened.

S304: Perform feature extraction on the feature groups corresponding to the plurality of features, respectively, through the feature vector layer included in the initial click rate prediction model, to obtain the initial shared feature vector corresponding to each feature group.

After each of the plurality of features has been taken as the target feature, the feature groups corresponding to every feature are obtained. The feature groups are input, group by group, into the initial click rate prediction model, and feature extraction is performed through the feature vector layer included in the initial click rate prediction model to obtain the initial shared feature vector corresponding to each feature group. Feature values in the same feature group correspond to the same initial shared feature vector.

The advantages of having feature values with similar frequencies share one feature vector are as follows:

(1) Memory occupation is reduced: for large-scale click rate prediction models, the feature quantity is often very large, and if each feature value uses an independent feature vector, a large amount of memory space is occupied. And the characteristic values with similar characteristic value frequencies share the same initial shared characteristic vector, so that the occupation of the memory can be greatly reduced, and the training and prediction efficiency of the click rate prediction model can be improved.

(2) Reducing overfitting: if each eigenvalue uses independent eigenvectors, the number of parameters of the click rate prediction model is huge, and the overfitting phenomenon is easy to occur. And the characteristic values with similar characteristic value frequencies share the same initial shared characteristic vector, so that the parameter quantity of the click rate prediction model can be reduced, and the risk of overfitting is reduced.

(3) The generalization capability of the model is improved: sharing the same initial shared feature vector can classify similar features into one class, avoid repeated learning and improve the generalization capability of the click rate prediction model.

(4) Improving the interpretability of the model: sharing the same initial shared feature vector makes the search space of the click rate prediction model simpler and clearer, and therefore easier to interpret.
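As an illustration of the sharing described in S304 and of advantage (1), the following sketch maps feature values to groups and looks up one shared row per group. The names `value_to_group`, `shared_table` and `lookup`, the random initialization, and the sizes used are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical result of frequency-based grouping: five feature values, three groups.
value_to_group = {"v1": 0, "v2": 0, "v3": 1, "v4": 2, "v5": 2}
num_groups = 3
dim = 4  # length of each initial shared feature vector (assumed)

# Feature vector layer as a lookup table with one row per feature group.
shared_table = rng.normal(size=(num_groups, dim))

def lookup(value):
    """Return the initial shared feature vector of the group containing `value`."""
    return shared_table[value_to_group[value]]

# Parameter count with one vector per feature value versus one per group.
per_value_params = len(value_to_group) * dim
shared_params = num_groups * dim
```

Here "v1" and "v2" resolve to the same row, so the table holds 12 parameters instead of the 20 that independent vectors would need.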

S305: Perform prediction through the interaction layer included in the initial click rate prediction model according to the plurality of initial shared feature vectors, to obtain a predicted click result.

After the feature groups are respectively input into the click rate prediction model, feature extraction can be carried out on the feature vector layers included in the click rate prediction model, so that initial shared feature vectors corresponding to the feature groups are obtained, the initial shared feature vectors are transmitted to the interaction layers included in the click rate prediction model, prediction is carried out through the interaction layers, and a predicted click result is obtained.

S306: Adjust the model parameters of the initial click rate prediction model according to the difference between the predicted click result and the corresponding real click result, to obtain the click rate prediction model.

Based on the predicted click result and the difference of the corresponding real click result, model parameters of the initial click rate prediction model are continuously adjusted, so that the difference between the predicted click result and the corresponding real click result is continuously reduced until convergence or preset iteration times are met, and the click rate prediction model is obtained.

It should be noted that, the click rate prediction model is a trained model for predicting the click rate, and the initial click rate prediction model is a model which is not trained yet, and compared with the click rate prediction model, the accuracy of the predicted click result obtained based on the initial click rate prediction model is lower.

As a possible implementation manner, the initial click rate prediction model further comprises a feature selection layer located between the feature vector layer and the interaction layer. After the initial shared feature vectors are obtained from the feature vector layer, they can be input into the feature selection layer, which calculates an importance parameter for each initial shared feature vector. The importance parameter refers to the degree to which the initial shared feature vector influences the predicted click result output by the click rate prediction model. The initial shared feature vectors whose importance parameter is greater than a preset importance threshold are selected from the plurality of initial shared feature vectors and input into the interaction layer as shared feature vectors. The following is a description with reference to fig. 4.

The embodiment of the application does not particularly limit the way in which the importance parameter is calculated; for example, a tree-based model (e.g., decision tree, random forest, etc.), the Gini coefficient, or information gain may be used.

Referring to fig. 4, the structure of a click rate prediction model according to an embodiment of the present application is shown. In fig. 4, the initial click rate prediction model includes a feature selection layer in addition to a feature vector layer and an interaction layer. After the plurality of features are divided into a plurality of feature groups based on the feature value frequency, the feature value is input into a feature vector layer by taking the feature groups as units, the feature vector layer can perform feature extraction on the feature groups to obtain initial shared feature vectors, the initial shared feature vectors are input into a feature selection layer, the feature selection layer can calculate importance parameters of the initial shared feature vectors, at least one shared feature vector is selected from the plurality of initial shared feature vectors based on the importance parameters, and the shared feature vectors are input into an interaction layer for prediction, so that the prediction accuracy of a predicted click result is improved.

Therefore, the contribution degree of each initial shared feature vector to a batch of sample data set, namely the importance parameter, is calculated through the feature selection layer, then the shared feature vector helpful for prediction is selected based on the importance parameter, and the accuracy of a predicted click result can be improved based on the shared feature vector helpful for prediction, so that the prediction accuracy of the click rate prediction model is improved.

In addition, the feature selection layer may adjust the length of the feature vector. For example, if the importance of some initial shared feature vectors is low, the length of the feature vectors may be reduced by deleting these initial shared feature vectors. Or, the multiple initial shared feature vectors obtained by the feature vector layer form a multidimensional matrix, and the multidimensional matrix is input to the interaction layer for prediction, if the shared feature vectors are selected from the initial shared feature vectors to form a multidimensional matrix, the dimension of the multidimensional matrix can be reduced, and in this case, the feature selection layer adjusts the dimension of the multidimensional matrix by deleting some initial shared feature vectors, so as to improve the performance and efficiency of the click rate prediction model.
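The selection step can be sketched as a simple threshold filter over the rows of the matrix formed by the initial shared feature vectors. The importance scores below are made-up numbers, since the application leaves the way they are calculated open, and `select_shared_vectors` is a hypothetical helper name.

```python
import numpy as np

def select_shared_vectors(initial_vectors, importance, threshold):
    """Keep the rows whose importance parameter exceeds the preset threshold."""
    importance = np.asarray(importance, dtype=float)
    kept_rows = np.flatnonzero(importance > threshold)
    return initial_vectors[kept_rows], kept_rows

# Four initial shared feature vectors of length 3 (toy values).
initial_vectors = np.arange(12, dtype=float).reshape(4, 3)
importance = [0.9, 0.1, 0.6, 0.3]  # assumed importance parameters

shared, kept_rows = select_shared_vectors(initial_vectors, importance, threshold=0.5)
```

The matrix passed on to the interaction layer shrinks from 4 rows to 2, which is the dimension reduction described above.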

As can be seen from the foregoing, the feature vector layer included in the click rate prediction model is equivalent to a lookup table in which each row corresponds to one feature value, and the feature vector layer maps every feature value to a feature vector of the same length; for example, each feature vector has a length of 256, and each component of the feature vector is a floating-point (float) value. As the number of features increases, the feature vector layer grows and brings a huge storage cost. The parameters of the feature vector layer can account for more than 95% of the parameters of the whole click rate prediction model, so they have a great influence on the accuracy of the click rate prediction model and may lower its prediction accuracy.

Based on the above, the embodiment of the application can adjust the length of the initial shared feature vector according to a preset length threshold to obtain an adaptive feature vector whose length is less than or equal to that of the initial shared feature vector, and then perform prediction through the interaction layer included in the initial click rate prediction model based on the adaptive feature vector to obtain the predicted click result. In this way, the preset length threshold converts the initial shared feature vector into a shorter adaptive feature vector that is used for subsequent prediction, which reduces both the storage cost and the number of parameters of the click rate prediction model.

The embodiment of the application does not particularly limit the preset length threshold, which can be set by a person skilled in the art according to actual needs. For example, the preset length threshold can be adjusted according to the performance and the parameter count of the click rate prediction model: if the click rate prediction model has many parameters, the preset length threshold can be set larger, so that the adaptive feature vector becomes shorter and the parameters of the click rate prediction model are further reduced.

The embodiment of the application does not specifically limit the manner of obtaining the adaptive feature vector based on the preset length threshold; the following two modes are taken as examples for illustration.

Mode one: the initial shared feature vector comprises a plurality of feature components, and feature components smaller than a preset length threshold value in the plurality of feature components are deleted from the initial shared feature vector, so that the self-adaptive feature vector is obtained. Specifically, if the feature component is smaller than the preset length threshold, setting the feature component smaller than the preset length threshold as a preset value in the initial shared feature vector, for example, setting the feature component as 0; if the feature component is greater than or equal to the preset length threshold, keeping the feature component greater than or equal to the preset length threshold unchanged in the initial shared feature vector; and finally deleting a preset value from the adjusted initial shared feature vector to obtain the self-adaptive feature vector.

For example, the initial shared feature vector is 64 bits in length, each bit is a feature component, each bit is compared with a predetermined length threshold, and bits smaller than the predetermined length threshold are deleted from the initial shared feature vector to obtain the adaptive feature vector.

As a possible implementation, the position of each deleted feature component in the initial shared feature vector is recorded when the feature components smaller than the preset length threshold are deleted, so that when prediction is performed based on the adaptive feature vector, the preset value can be added to the adaptive feature vector at the recorded positions. For example, the initial shared feature vector has 32 bits in total and the 3rd bit and the 13th bit are deleted, so the resulting adaptive feature vector has 30 bits in total. Before the adaptive feature vector is input to the interaction layer, 0 is added after its 2nd bit and after the 12th bit, which restores it to a prediction feature vector of 32 bits in total; the interaction layer then performs prediction on the prediction feature vector to obtain the predicted click result.

It should be noted that, the embodiment of the present application is not particularly limited to a manner of increasing the preset value based on the position, for example, increasing the preset value by the same amount after the adaptive feature vector based on the number of positions of the deleted feature component. For another example, based on the positions of the deleted feature components, a preset value is added at the same position, so that the constitution of the predicted feature vector is the same as that of the initial shared feature vector as far as possible, and the prediction precision of the click rate prediction model is ensured while the storage cost is reduced.
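Mode one and the two padding variants just mentioned can be sketched as follows. The helper names are hypothetical, the comparison uses the raw component value exactly as the text states (comparing magnitudes would be a variation), and the preset value is assumed to be 0.

```python
import numpy as np

def compress(vector, threshold):
    """Delete components smaller than the threshold and record their positions."""
    vector = np.asarray(vector, dtype=float)
    keep_mask = vector >= threshold
    deleted_positions = np.flatnonzero(~keep_mask)
    return vector[keep_mask], deleted_positions, vector.size

def restore_in_place(adaptive, deleted_positions, length, fill=0.0):
    """Re-insert the preset value at the recorded positions."""
    restored = np.full(length, fill, dtype=float)
    keep_mask = np.ones(length, dtype=bool)
    keep_mask[deleted_positions] = False
    restored[keep_mask] = adaptive
    return restored

def restore_by_appending(adaptive, deleted_positions, fill=0.0):
    """Append as many preset values as components were deleted."""
    return np.concatenate([adaptive, np.full(len(deleted_positions), fill)])

initial = np.array([1.2, 0.9, 0.05, 3.2, 0.01, 2.0])
adaptive, deleted, length = compress(initial, threshold=0.1)
in_place = restore_in_place(adaptive, deleted, length)
appended = restore_by_appending(adaptive, deleted)
```

Only `adaptive` and `deleted` need to be stored; either restore function brings the vector back to the common length expected by the interaction layer.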

Thus, the lengths of the adaptive feature vectors are different from each other, which is disadvantageous for the interactive layer to use, so that the lengths of the adaptive feature vectors can be adjusted before prediction by the interactive layer. Since the lengths of the initial shared feature vectors are the same, the positions of the deleted feature components can be memorized while the feature components of the initial shared feature vectors are deleted, and the lengths of the adaptive vectors are changed into predictive feature vectors consistent with the lengths of the initial shared feature vectors based on the positions, so that the lengths of the predictive feature vectors are consistent, and the prediction can be performed through an interaction layer.

Mode two: Information related to users, multimedia data and the like generally belongs to discrete features, and the frequencies with which the feature values of each discrete feature occur in the sample data differ greatly. A high-frequency feature value may occur millions of times in the sample data, while a low-frequency feature value may occur only a few times. In the process of training the click rate prediction model, it is generally expected that the vector corresponding to a high-frequency feature value has a higher dimension, because a higher dimension can express richer information; for low-frequency feature values, an overly high vector dimension can easily cause the click rate prediction model to overfit and degrade the model. Therefore, assigning a feature vector of fixed length to all feature values reduces the ability of the feature vector layer to learn from the sample data and leaves the click rate prediction model in a suboptimal state, that is, the prediction accuracy of the click rate prediction model is lower.

Based on this, in the embodiment of the present application the preset length threshold is no longer fixed but is determined by the feature value frequency corresponding to the initial shared feature vector: the higher that feature value frequency, the smaller the preset length threshold corresponding to the initial shared feature vector. The adaptive feature vector is obtained by truncating the initial shared feature vector based on the preset length threshold, so a higher feature value frequency yields a longer adaptive feature vector, which improves the prediction accuracy of the click rate prediction model.

Specifically, a preset length threshold corresponding to an initial shared feature vector is obtained, and feature components with the length being the preset length threshold are continuously deleted from the preset component position in the initial shared feature vector, so that an adaptive feature vector is obtained. For example, the preset component position is the last feature component of each initial shared feature vector, and feature components with a length of a preset length threshold are continuously deleted from the back to the front. For another example, the preset component position is the first feature component of each initial shared feature vector, and feature components with a length equal to a preset length threshold are continuously deleted from front to back.

Therefore, when the adaptive feature vector is used for prediction, a preset value can be increased from a preset component position according to a preset length threshold value, for example, the length of the adaptive feature vector is complemented by 0, so that an interaction layer can predict based on the prediction feature vector with consistent length, and the prediction method is simple and convenient.
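A minimal sketch of mode two follows, with the preset component position assumed to be the last feature component. The rule in `deletion_length` that turns a frequency into a number of components to delete is purely hypothetical; the application only requires that higher frequencies give smaller thresholds.

```python
import numpy as np

def deletion_length(frequency, full_length, max_frequency):
    """Hypothetical rule: keep a share of components that grows with log-frequency."""
    keep_fraction = np.log1p(frequency) / np.log1p(max_frequency)
    keep = max(1, int(round(full_length * keep_fraction)))
    return full_length - keep  # preset length threshold = components to delete

def truncate_tail(vector, n_delete):
    """Delete n_delete consecutive components, starting from the last one."""
    return vector[: len(vector) - n_delete]

def pad_tail(adaptive, full_length, fill=0.0):
    """Pad with the preset value so all prediction feature vectors share one length."""
    return np.concatenate([adaptive, np.full(full_length - len(adaptive), fill)])

vector = np.arange(1.0, 9.0)  # initial shared feature vector of length 8
n_delete = deletion_length(frequency=1000, full_length=8, max_frequency=1_000_000)
adaptive = truncate_tail(vector, n_delete)
prediction_vector = pad_tail(adaptive, full_length=8)
```

Because the deleted components are always contiguous from a known position, no per-component positions need to be stored, only the adaptive feature vector itself.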

The embodiment of the application does not particularly limit how the preset length threshold is determined. For example, an Adaptively-Masked Twins-based Layer (AMTL) network can be used to flexibly calculate the length that needs to be retained for the feature vector corresponding to each feature value, which improves model accuracy while saving 60% of the storage cost, and also supports warm start of the model.

It should be noted that, in the two ways of obtaining the adaptive feature vector, the adaptive feature vector may be obtained based on the initial shared feature vector, and may also be obtained based on the shared feature vector, so as to further improve the prediction accuracy of the click rate prediction model.

Referring to fig. 5, the structure of a click rate prediction model according to an embodiment of the present application is shown. In fig. 5, the click rate prediction model includes a feature vector layer, a feature selection layer, and an interaction layer. After the feature groups are respectively input into the feature vector layer, feature extraction is carried out through the feature vector layer to obtain a plurality of initial shared feature vectors, the plurality of initial shared feature vectors are input into the feature selection layer, at least one shared feature vector is obtained through importance parameter selection, the length of the shared feature vector is adjusted based on a preset length threshold value, and an adaptive feature vector is obtained and stored. When the adaptive feature vectors are used for prediction, the lengths of the adaptive feature vectors are complemented based on a preset value to obtain the predicted feature vectors, so that the interaction layer predicts based on the predicted feature vectors with consistent lengths to obtain a predicted click result.

As a possible implementation manner, if the click-through rate prediction model further includes a feature selection layer, in the process of training the initial click-through rate prediction model, the model parameters of the initial click-through rate prediction model may include feature selection layer parameters and other trainable parameters, where the other trainable parameters are model parameters that need to be adjusted by the initial click-through rate prediction model when the feature selection layer is not present, and the feature selection layer parameters are parameters that the feature selection layer has.

Other trainable parameters and feature selection layer parameters are independent of each other, but there is a relationship between them. The feature selection layer selects the shared feature vector with higher importance parameter, so that the complexity and the calculated amount of the click rate prediction model can be reduced, and the efficiency of the click rate prediction model is improved. Meanwhile, other trainable parameters are optimized, so that the performance of the click rate prediction model can be further improved. Adjustment of model parameters of the initial click rate prediction model is described below.

An initial sample data set is obtained, the initial sample data set comprising a plurality of sample data, and each sample data having a true click result. It should be noted that the initial sample data set includes a larger data amount of sample data than the sample data set includes. Each sample data includes at least one feature, and the features included in each sample data may not be exactly the same, so that the initial sample data set may include a plurality of features.

The initial sample data set is divided into a training set and a validation set; for example, the initial sample data set is divided into 10 equal parts, of which 9 parts form the training set and the remaining part serves as the validation set. Taking the training set as the sample data set, steps S301-S305 are executed to obtain a first predicted click result based on the training set, and the other trainable parameters are adjusted based on the difference between the first predicted click result and the corresponding real click result. Taking the validation set as the sample data set, steps S301-S305 are executed to obtain a second predicted click result based on the validation set, and the feature selection layer parameters are adjusted based on the difference between the second predicted click result and the corresponding real click result. This realizes one iteration of the initial click rate prediction model, and the click rate prediction model is obtained through continued alternating iterative training.
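The alternating scheme can be illustrated on a toy model. Everything below is an assumption made for the sketch: a single feature with three feature groups, a sigmoid gate per group standing in for the feature selection layer parameters `alpha`, a shared table `E` standing in for the other trainable parameters, a plain sum as the interaction layer, and synthetic click data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(groups, E, alpha):
    gate = sigmoid(alpha)[groups]  # feature selection weight per sample
    return sigmoid(gate * E[groups].sum(axis=1))

def bce(p, y):
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def grads(groups, y, E, alpha):
    """Gradients of the mean cross-entropy loss with respect to E and alpha."""
    gate = sigmoid(alpha)[groups]
    err = (forward(groups, E, alpha) - y) / len(y)  # d(loss)/d(logit)
    g_E, g_alpha = np.zeros_like(E), np.zeros_like(alpha)
    np.add.at(g_E, groups, (err * gate)[:, None] * np.ones(E.shape[1]))
    np.add.at(g_alpha, groups, err * gate * (1 - gate) * E[groups].sum(axis=1))
    return g_E, g_alpha

rng = np.random.default_rng(0)
n = 1000
groups = rng.integers(0, 3, size=n)
y = (rng.random(n) < np.array([0.9, 0.1, 0.5])[groups]).astype(float)

split = int(0.9 * n)  # 9 parts for training, 1 part for validation
train = (groups[:split], y[:split])
val = (groups[split:], y[split:])

E = 0.01 * rng.normal(size=(3, 2))
alpha = np.zeros(3)
initial_loss = bce(forward(train[0], E, alpha), train[1])

for step in range(200):
    g_E, _ = grads(*train, E, alpha)    # training set updates the other parameters
    E -= 0.5 * g_E
    _, g_alpha = grads(*val, E, alpha)  # validation set updates the selection layer
    alpha -= 0.5 * g_alpha

final_loss = bce(forward(train[0], E, alpha), train[1])
```

Each loop iteration performs one update of each parameter set, so neither set is trained to convergence before the other is adjusted.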

As a possible implementation manner, an optimization algorithm such as gradient descent may also be used, in which the feature selection layer is trained first. That is, the validation set is taken as the sample data set and steps S301-S305 are executed to obtain a second predicted click result based on the validation set, and the feature selection layer parameters are adjusted based on the difference between the second predicted click result and the corresponding real click result. The importance parameters are then calculated with the trained feature selection layer, and the shared feature vectors are selected. Training of the initial click rate prediction model then continues based on the shared feature vectors: the training set is taken as the sample data set and steps S301-S305 are executed to obtain a first predicted click result based on the training set, and the other trainable parameters are adjusted based on the difference between the first predicted click result and the corresponding real click result.

As a possible implementation manner, the model parameters of the initial click rate prediction model are adjusted by gradient descent according to the difference between the predicted click result and the corresponding real click result, so as to obtain the click rate prediction model. The gradient refers to the vector of partial derivatives of the loss function with respect to the model parameters, and is used to guide the direction and step size of the model parameter update. The following is a detailed description.

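Assume that $\mathcal{L}_{train}$ is the training loss function of the click rate prediction model and $\mathcal{L}_{val}$ is the validation loss function of the click rate prediction model. Then the model parameters of the initial click rate prediction model adjusted based on the first predicted click result and the second predicted click result may be expressed as:

$$\min_{\alpha}\ \mathcal{L}_{val}\big(\Theta^{*}(\alpha),\ \alpha\big) \quad \text{s.t.} \quad \Theta^{*}(\alpha) = \arg\min_{\Theta}\ \mathcal{L}_{train}(\Theta,\ \alpha)$$

where $\alpha$ represents the feature selection layer parameters, $\Theta = \{\theta, E\}$ represents the other trainable parameters, $\Theta^{*}$ is the optimized $\Theta$, and $\alpha_{l,k}$ represents the parameter of the k-th feature in the l-th row. The above formula essentially defines a two-level optimization problem in which $\alpha$ and $\Theta$ are the parameter variables of the upper and lower levels of the model, respectively, and they are optimized in an alternating manner during training. The update of $\Theta^{*}(\alpha)$ may be expressed as:

$$\Theta^{*}(\alpha) \approx \Theta - \zeta\,\nabla_{\Theta}\mathcal{L}_{train}(\Theta,\ \alpha)$$

where $\zeta$ is the learning rate with which the model parameters $\Theta$ are updated in one step. $\alpha$ is then optimized by gradient descent, expressed as follows:

$$\nabla_{\alpha}\mathcal{L}_{val}\big(\Theta - \zeta\,\nabla_{\Theta}\mathcal{L}_{train}(\Theta, \alpha),\ \alpha\big) = \nabla_{\alpha}\mathcal{L}_{val}(\Theta', \alpha) - \zeta\,\nabla^{2}_{\alpha,\Theta}\mathcal{L}_{train}(\Theta, \alpha)\,\nabla_{\Theta'}\mathcal{L}_{val}(\Theta', \alpha)$$

where $\Theta' = \Theta - \zeta\,\nabla_{\Theta}\mathcal{L}_{train}(\Theta, \alpha)$ represents the model parameters after the one-step update.

As a possible implementation, considering that $\zeta$ is close to zero, the second derivative term in the above formula is omitted, so that the gradient becomes $\nabla_{\alpha}\mathcal{L}_{val}(\Theta, \alpha)$. This is known as a first-order approximation and further increases computational efficiency.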

As a possible implementation, the gradient may also be normalized. Gradient normalization (Gradient Normalization) is a method of optimizing a neural network that keeps the norm of the gradient stable during training. In particular, when α is optimized through the gradient, the gradient of α in each training batch can be normalized by gradient normalization, as follows:

where $g(\alpha_l)$ is the gradient before normalization, $g_{norm}(\alpha_l)$ is the normalized gradient, and $\epsilon_g$ is a small value close to 0 (e.g., 1e-7) used to avoid numerical overflow. $g(\alpha_l)$ represents the gradient of the feature selection layer parameter α in the l-th row, $g(\alpha_{l,k})$ represents the gradient of the parameter of the k-th feature in the l-th row of the feature selection layer parameters, and the summation term represents the sum of the gradients over all features. Row-wise gradient normalization can thus be used to handle the high variance of the gradient of α during back propagation.

From the above, through gradient normalization, (1) the norm of the gradient vector is kept in a small range, so as to reduce the oscillation and jitter of the gradient descent algorithm and improve the convergence speed and stability. (2) The problem that the gradient explosion and the gradient vanish due to the fact that the norms of the gradient vectors are too large or too small can be avoided, and the problem that the performance and the efficiency of the model are affected due to unstable parameters is avoided. (3) The dependence degree of the model on any feature can be reduced, so that the generalization capability and the interpretability of the model are improved.
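Row-wise gradient normalization can be sketched as below. The exact norm is not recoverable from the text, so the L2 norm of each row is used here as an assumption; `normalize_rows` is a hypothetical helper and `eps` plays the role of the small constant described above.

```python
import numpy as np

def normalize_rows(grad, eps=1e-7):
    """Divide each row of the gradient by its own L2 norm (assumed norm)."""
    norms = np.sqrt((grad ** 2).sum(axis=1, keepdims=True))
    return grad / (norms + eps)

# Gradient of the feature selection layer parameters: one row per feature group.
# The first two rows differ by six orders of magnitude; the third row is all zeros.
grad_alpha = np.array([[3000.0, -4000.0],
                       [0.003, 0.004],
                       [0.0, 0.0]])
normalized = normalize_rows(grad_alpha)
```

After normalization the first two rows have nearly unit norm regardless of their original scale, and the small constant keeps the all-zero row from producing a division by zero.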

Because the characteristic value with higher characteristic value frequency often appears in the sample data set, the weight of the characteristic value with higher characteristic value frequency occupies larger proportion in the click rate prediction model obtained by training, so that the parameter updating needs to be cautious to avoid excessively fitting training data, and meanwhile, the generalization capability of the model is ensured.

Based on the above, in the embodiment of the application, based on a plurality of characteristic value frequencies, the learning rate corresponding to each characteristic group is determined, so that the higher the characteristic value frequency is, the lower the learning rate corresponding to the characteristic group is, thereby playing a regularization role, namely controlling the complexity of the click rate prediction model by slowing down the weight update speed of the characteristic value with higher characteristic value frequency, and avoiding overfitting. Meanwhile, the click rate prediction model can be more stable due to the lower learning rate, and unstable conditions such as gradient explosion or gradient disappearance and the like in the training process are avoided.

The learning rate refers to the step size used each time the model parameters are updated, and it affects the convergence speed and performance of the model. When the learning rate is higher, each update of the model parameters takes a larger step and the model converges faster, but the model may oscillate back and forth near the optimal point of the objective function, or may even fail to converge. In addition, a high learning rate may cause the parameter update to step over a local optimal solution, thereby affecting the performance of the model. When the learning rate is lower, each update of the model parameters takes a smaller step and the model converges more slowly, but the optimal solution of the objective function can be found more accurately, which improves the performance and generalization capability of the model. However, if the learning rate is too low, the model may become stuck at a local optimal solution of the objective function before it finds the global optimal solution.

It should be noted that using a lower learning rate for feature values with higher frequency does not mean that their weights are not updated at all; rather, they are adjusted according to the actual situation. In general, the best learning rate can be found through methods such as experiments and cross-validation, and is adjusted according to the specific sample data set and click rate prediction model. In addition, regularization techniques such as L1 regularization and L2 regularization can be adopted to further control the complexity of the click rate prediction model and improve its generalization capability.

As a possible implementation manner, not only may a feature group with a higher feature value frequency be given a lower learning rate, but a feature group with a lower feature value frequency may also be given a higher learning rate. Because feature values with higher frequencies and feature values with lower frequencies occur in very different numbers in the sample data set, the gradient corresponding to a feature value with a higher frequency may be several orders of magnitude greater than the gradient of a feature value with a lower frequency. The influence of these orders of magnitude can be removed through gradient normalization, so that different learning rates can be applied during optimization to the rows of the feature selection layer that correspond to different feature groups. In this way, feature values with higher frequencies and feature values with lower frequencies are better distinguished.
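One way to realize frequency-dependent learning rates is sketched below. The inverse-square-root schedule in `group_learning_rates` is a hypothetical choice, since the application only requires that a higher feature value frequency gives a lower learning rate; the gradients are assumed to have been normalized row by row already.

```python
import numpy as np

def group_learning_rates(group_frequencies, base_lr=0.1):
    """Hypothetical schedule: learning rate shrinks with the square root of frequency.

    The rarest group keeps base_lr; frequencies must be positive.
    """
    freq = np.asarray(group_frequencies, dtype=float)
    return base_lr / np.sqrt(freq / freq.min())

# Total feature value frequency of three feature groups (toy numbers).
lrs = group_learning_rates([5, 50, 5000])

# One update of the feature selection layer parameters, one row per feature group,
# using gradients that were already normalized row by row.
alpha = np.zeros((3, 2))
normalized_grad = np.array([[0.6, -0.8], [0.6, 0.8], [0.6, 0.8]])
alpha -= lrs[:, None] * normalized_grad
```

With equal-norm gradients, the step taken for each row is governed entirely by the learning rate of its feature group, so the rarest group moves the most.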

As one possible implementation manner, since some features only correspond to two feature values, for example, the feature value corresponding to the gender feature is male or female, and the two feature values represent different meanings, if the frequency of the feature value corresponding to the male is similar to that of the feature value corresponding to the female, the two feature values are divided into one feature group, which may cause feature loss, thereby reducing the prediction accuracy of the click rate prediction model. One implementation of S303 is as follows:

The number of kinds of feature values that the target feature takes in the sample data set is identified. If this number is equal to 2, the two feature values are directly divided into two different feature groups, for example, male into one feature group and female into the other. If this number is greater than 2, the plurality of feature values corresponding to the target feature are divided based on the plurality of feature value frequencies to obtain a plurality of feature groups.

By determining the types of the feature values included in the sample data set, it is possible to further determine whether the feature needs to be divided based on the feature value frequency, and thus the prediction accuracy of the click rate prediction model can be further improved.
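The check on the number of kinds of feature values can be sketched in Python as follows. The greedy frequency grouping used for features with more than two values (sort by frequency and open a new group once the gap to the group's lowest frequency reaches the threshold) is an assumed strategy for illustration, and the name `divide_feature_values` is hypothetical.

```python
from collections import Counter

def divide_feature_values(values, freq_threshold):
    """Divide the values of one feature into feature groups.

    A feature with exactly two kinds of values keeps each value in its
    own group; otherwise values are grouped by frequency.
    """
    freqs = Counter(values)
    if len(freqs) == 2:
        return [[v] for v in freqs]
    # Assumed greedy grouping: sort by frequency and start a new group
    # once the gap to the group's lowest frequency reaches the threshold.
    groups = []
    group_min = None
    for value, freq in sorted(freqs.items(), key=lambda kv: kv[1]):
        if group_min is None or freq - group_min >= freq_threshold:
            groups.append([value])
            group_min = freq
        else:
            groups[-1].append(value)
    return groups

gender_groups = divide_feature_values(["male", "female", "male", "female"], 5)
age_groups = divide_feature_values([20] * 50 + [21] * 52 + [60] * 3 + [61] * 4, 5)
```

Here the two gender values have identical frequencies but are still kept in separate groups, whereas the four age values are merged into a low-frequency group and a high-frequency group.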

As can be seen from the foregoing, feature groups with higher feature value frequency are more important for the click rate prediction model. Therefore, in the process of training the click rate prediction model, the feature values can be divided based on preset frequency thresholds of different sizes, so that feature values with higher feature value frequency are divided into more feature groups and more initial shared feature vectors are obtained for them, thereby improving the accuracy of the click rate prediction model. The following is a detailed description.

Based on the first preset frequency threshold, the feature values whose feature value frequency is smaller than a preset high-frequency threshold are divided to obtain at least one low-frequency feature group, where the difference between the feature value frequencies corresponding to the feature values in the same low-frequency feature group is smaller than the first preset frequency threshold. Based on a second preset frequency threshold, the feature values whose feature value frequency is greater than or equal to the preset high-frequency threshold are divided to obtain at least one high-frequency feature group, where the difference between the feature value frequencies corresponding to the feature values in the same high-frequency feature group is smaller than the second preset frequency threshold.

The second preset frequency threshold value is smaller than the first preset frequency threshold value, so that feature values with feature value frequency larger than or equal to the preset high-frequency threshold value can be divided into more feature groups, more initial shared feature vectors are obtained based on the feature groups, an initial click rate prediction model is trained based on the initial shared feature vectors which are rich in representation, and the accuracy of the click rate prediction model is higher.
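A minimal Python sketch of the two-threshold division is given below. It assumes the feature value frequencies are already available as a dictionary and uses a greedy grouping strategy; the function names, the toy frequencies, and the threshold values are all hypothetical.

```python
def group_by_frequency(freqs, threshold):
    """Greedily group values so that frequencies within a group differ
    by less than `threshold` (an assumed grouping strategy)."""
    groups, group_min = [], None
    for value, freq in sorted(freqs.items(), key=lambda kv: kv[1]):
        if group_min is None or freq - group_min >= threshold:
            groups.append([value])
            group_min = freq
        else:
            groups[-1].append(value)
    return groups

def divide_with_two_thresholds(freqs, high_freq_threshold,
                               first_threshold, second_threshold):
    """Split values into low- and high-frequency parts, then group each
    part with its own threshold (second_threshold < first_threshold)."""
    low = {v: f for v, f in freqs.items() if f < high_freq_threshold}
    high = {v: f for v, f in freqs.items() if f >= high_freq_threshold}
    return (group_by_frequency(low, first_threshold),
            group_by_frequency(high, second_threshold))

freqs = {"a": 2, "b": 5, "c": 9, "d": 100, "e": 103, "f": 108, "g": 110}
low_groups, high_groups = divide_with_two_thresholds(
    freqs, high_freq_threshold=50, first_threshold=10, second_threshold=4)
```

With these toy values the three low-frequency values share one group, while the four high-frequency values are split into two groups because the smaller second threshold is applied to them.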

As a possible implementation manner, after the click rate prediction model is obtained through training, data to be predicted may be obtained. The data to be predicted includes object data, multimedia data, and the like, and includes features such as object features, advertisement features, and context features. The data to be predicted is input into the click rate prediction model, and prediction is performed through the click rate prediction model to obtain the probability that the object identified by the object data clicks the multimedia data, that is, the predicted click result.

It can be understood that the object data can be combined with each of a plurality of candidate multimedia data to obtain a plurality of pieces of data to be predicted, so that the probability of the object clicking each candidate multimedia data can be obtained based on the click rate prediction model. The candidate multimedia data can then be sorted by these probabilities, and the multimedia data to be recommended to the object can be determined based on the sorting.
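The sorting step can be sketched as follows. The function `rank_candidates`, the stand-in predictor `toy_model`, and its scores are hypothetical; in practice the predictor would be the trained click rate prediction model.

```python
def rank_candidates(object_data, candidates, predict_click_probability, top_k=3):
    """Score each (object, candidate) pair and return the top_k candidates
    sorted by predicted click probability, highest first."""
    scored = [(predict_click_probability(object_data, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Stand-in for a trained click rate prediction model (hypothetical scores).
def toy_model(object_data, candidate):
    toy_scores = {"video_a": 0.12, "video_b": 0.47, "video_c": 0.31}
    return toy_scores[candidate]

ranking = rank_candidates({"age": 20}, ["video_a", "video_b", "video_c"],
                          toy_model, top_k=2)
```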

It should be noted that, the computer device with the click rate prediction model may also have cloud computing capability. Cloud computing is realized by means of Cloud technology, wherein Cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.

Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied based on the cloud computing business model. These can form a resource pool that is used on demand, which is flexible and convenient. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each article may have its own identification mark in the future, which needs to be transmitted to a background system for logic processing; data of different levels will be processed separately, and all kinds of industry data require strong backend system support, which can only be realized through cloud computing.

In the embodiment of the application, a large amount of sample data can be processed by big data technology, and the click rate is predicted in real time based on the click rate prediction model, so that multimedia data of interest is recommended to the object. Big data refers to a data set that cannot be captured, managed, and processed by conventional software tools within a certain time range; it is a massive, fast-growing, and diversified information asset that requires new processing modes to provide stronger decision-making ability, insight discovery ability, and process optimization ability. With the advent of the cloud age, big data has attracted more and more attention, and special techniques are required to process such large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.

The application also provides a corresponding data processing device for the data processing method, so that the data processing method can be practically applied and realized.

Referring to fig. 6, the structure of a data processing apparatus according to an embodiment of the present application is shown. As shown in fig. 6, the data processing apparatus 600 includes: an acquisition unit 601, a determination unit 602, a division unit 603, a feature extraction unit 604, a prediction unit 605, and an adjustment unit 606;

the acquiring unit 601 is configured to acquire a sample data set including a plurality of features, where each sample data in the sample data set has a real click result;

the determining unit 602 is configured to determine, according to the sample data set, a plurality of feature value frequencies corresponding to a target feature in a plurality of features, where the feature value frequencies are the number of times that a plurality of feature values corresponding to the target feature in the sample data set appear respectively;

the dividing unit 603 is configured to divide, based on a plurality of the feature value frequencies, a plurality of feature values corresponding to the target feature to obtain a plurality of feature groups, where a difference value between feature value frequencies corresponding to feature values in the same feature group is smaller than a first preset frequency threshold;

The feature extraction unit 604 is configured to perform feature extraction on feature groups corresponding to a plurality of features respectively through feature vector layers included in an initial click rate prediction model, so as to obtain initial shared feature vectors corresponding to the feature groups respectively, where feature values in the same feature group correspond to the same initial shared feature vector;

the prediction unit 605 is configured to predict, according to a plurality of the initial shared feature vectors, through an interaction layer included in the initial click rate prediction model, to obtain a predicted click result;

the adjusting unit 606 is configured to adjust model parameters of the initial click rate prediction model according to the difference between the predicted click result and the corresponding real click result, so as to obtain a click rate prediction model.

As a possible implementation manner, the adjusting unit 606 is further configured to:

the length of the initial shared feature vector is adjusted according to a preset length threshold value, and an adaptive feature vector is obtained, wherein the length of the adaptive feature vector is smaller than or equal to the length of the initial shared feature vector;

the prediction unit 605 is specifically configured to:

and predicting through an interaction layer included in the initial click rate prediction model according to the plurality of adaptive feature vectors to obtain a predicted click result.

As a possible implementation manner, the initial shared feature vector includes a plurality of feature components, and the adjusting unit 606 is specifically configured to:

deleting, from the plurality of feature components, the feature components smaller than the preset length threshold to obtain the adaptive feature vector.

As a possible implementation manner, the data processing apparatus 600 further includes a recording unit, configured to:

recording the position of the feature component smaller than the preset length threshold in the initial shared feature vector;

the prediction unit 605 is specifically configured to:

adding, in the adaptive feature vector, a preset value at the recorded position to obtain a predicted feature vector;

and predicting through an interaction layer included in the initial click rate prediction model according to the plurality of prediction feature vectors to obtain a predicted click result.
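The deletion of small feature components, the recording of their positions, and the later re-insertion of a preset value can be sketched in Python as follows. Comparing the absolute value of each component against the threshold, using 0.0 as the preset value, and the function names are assumptions of this sketch; the example vector is illustrative only.

```python
import numpy as np

def prune_components(vector, threshold):
    """Delete components whose absolute value is below `threshold`.

    Returns the shortened (adaptive) vector and the positions removed.
    Using the absolute value is an assumption of this sketch.
    """
    vector = np.asarray(vector, dtype=float)
    removed = np.flatnonzero(np.abs(vector) < threshold)
    kept = np.delete(vector, removed)
    return kept, removed

def restore_components(adaptive, removed, preset_value=0.0):
    """Re-insert `preset_value` at the recorded positions so the vector
    regains its original length before the interaction layer."""
    full = np.full(len(adaptive) + len(removed), preset_value, dtype=float)
    mask = np.ones(len(full), dtype=bool)
    mask[removed] = False
    full[mask] = adaptive
    return full

initial = [1.2, -0.12, 4.32, 0.05]
adaptive, positions = prune_components(initial, threshold=0.5)
predicted = restore_components(adaptive, positions)
```

Only the shortened vector and the recorded positions need to be stored; the full-length vector is rebuilt when the interaction layer requires vectors of equal length.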

As a possible implementation manner, the initial shared feature vector includes a plurality of feature components, and the adjusting unit 606 is specifically configured to:

acquiring a preset length threshold corresponding to the initial shared feature vector, wherein the preset length threshold is determined based on the feature value frequency corresponding to the initial shared feature vector, and the higher the feature value frequency corresponding to the initial shared feature vector is, the smaller the preset length threshold corresponding to the initial shared feature vector is;

and deleting, starting from a preset component position in the initial shared feature vector, a run of consecutive feature components whose length equals the preset length threshold corresponding to the initial shared feature vector, to obtain the adaptive feature vector.
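The deletion of consecutive components can be sketched as below. The start position and the two deletion lengths are hypothetical values chosen so that the vector of the higher-frequency group loses fewer components than that of the lower-frequency group.

```python
import numpy as np

def truncate_contiguous(vector, start, length):
    """Delete `length` consecutive components beginning at index `start`."""
    vector = np.asarray(vector, dtype=float)
    return np.delete(vector, slice(start, start + length))

initial = [1.2, -0.12, 4.32, 3.2, 0.7, -2.1]
# Assumed thresholds: the high-frequency group gets the smaller threshold.
high_freq_vector = truncate_contiguous(initial, start=2, length=1)
low_freq_vector = truncate_contiguous(initial, start=2, length=4)
```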

As a possible implementation manner, the initial click rate prediction model further includes a feature selection layer, and the data processing apparatus 600 further includes a selection unit configured to:

selecting at least one shared feature vector from a plurality of initial shared feature vectors through the feature selection layer, wherein the importance parameter of the shared feature vector is larger than a preset importance threshold;

the prediction unit 605 is specifically configured to:

and predicting through an interaction layer included in the initial click rate prediction model according to at least one shared feature vector to obtain a predicted click result.
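The selection by importance threshold can be sketched as follows, assuming each initial shared feature vector has one scalar importance parameter. The function name, the importance values, and the threshold are hypothetical.

```python
import numpy as np

def select_shared_vectors(vectors, importance, threshold):
    """Keep only the shared feature vectors whose importance parameter
    exceeds `threshold`; returns the kept vectors and their indices."""
    importance = np.asarray(importance, dtype=float)
    kept_idx = np.flatnonzero(importance > threshold)
    return vectors[kept_idx], kept_idx

vectors = np.array([[1.2, -0.12], [0.3, 0.9], [2.0, 1.5]])
importance = [0.8, 0.05, 0.4]   # hypothetical learned importance parameters
selected, indices = select_shared_vectors(vectors, importance, threshold=0.1)
```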

As a possible implementation manner, the model parameters of the initial click rate prediction model include feature selection layer parameters and other trainable parameters, and the obtaining unit 601 is further configured to obtain an initial sample data set;

the dividing unit 603 is further configured to divide the initial sample data set into a training set and a verification set;

the training set and the verification set are each used as the sample data set: a sample data set including a plurality of features is acquired through the acquiring unit 601; a plurality of feature value frequencies corresponding to a target feature in the plurality of features are determined through the determining unit 602 according to the sample data set; a plurality of feature values corresponding to the target feature are divided through the dividing unit 603 based on the plurality of feature value frequencies to obtain a plurality of feature groups; feature extraction is performed through the feature extraction unit 604, using the feature vector layer included in the initial click rate prediction model, on the feature groups corresponding to the plurality of features respectively, to obtain initial shared feature vectors corresponding to the feature groups respectively, where feature values in the same feature group correspond to the same initial shared feature vector; and prediction is performed through the interaction layer included in the initial click rate prediction model according to the plurality of initial shared feature vectors to obtain a first predicted click result and a second predicted click result, where the first predicted click result is obtained based on the training set, and the second predicted click result is obtained based on the verification set;

Wherein, the adjusting unit 606 is further configured to:

and adjusting the other trainable parameters according to the difference between the first predicted click result and the corresponding real click result, and adjusting the characteristic selection layer parameters according to the difference between the second predicted click result and the corresponding real click result to obtain a click rate prediction model.
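The alternating adjustment can be illustrated with a deliberately simplified Python sketch. It assumes a toy logistic model in which `alpha` stands in for the feature selection layer parameters and `w` for the other trainable parameters; the model form, the plain gradient-descent updates, the learning rates, and the synthetic data are all assumptions and do not represent the click rate prediction model of the application.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_grad(x, y, w, alpha):
    """Gradient of the mean log loss w.r.t. the gated weights alpha * w."""
    p = sigmoid(x @ (alpha * w))
    return x.T @ (p - y) / len(y)

def train_step(train, val, w, alpha, lr_w=0.1, lr_alpha=0.1):
    """One alternating update: `w` (other trainable parameters) uses the
    training set; `alpha` (selection-layer parameters) uses the
    verification set."""
    x_tr, y_tr = train
    x_val, y_val = val
    w = w - lr_w * log_loss_grad(x_tr, y_tr, w, alpha) * alpha
    alpha = alpha - lr_alpha * log_loss_grad(x_val, y_val, w, alpha) * w
    return w, alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] > 0).astype(float)          # toy "real click results"
train, val = (x[:150], y[:150]), (x[150:], y[150:])

w, alpha = np.full(4, 0.1), np.ones(4)
for _ in range(100):
    w, alpha = train_step(train, val, w, alpha)
```

Updating the selection parameters on data that the other parameters were not fitted to is one way to keep the selection from simply favoring features that the model has memorized from the training set.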

As a possible implementation manner, the determining unit 602 is further configured to:

determining the learning rate corresponding to each feature group based on the plurality of feature value frequencies, wherein the higher the feature value frequency is, the lower the corresponding learning rate is;

the adjusting unit 606 is specifically configured to:

and according to the difference between the predicted click result and the corresponding real click result and a plurality of learning rates, adjusting model parameters of the initial click rate prediction model to obtain a click rate prediction model.

As a possible implementation manner, the dividing unit 603 is specifically configured to:

identifying the kinds of feature values that the target feature takes in the sample data set;

if the number of kinds of feature values that the target feature takes in the sample data set is equal to 2, dividing the two feature values corresponding to the target feature into two feature groups respectively;

and if the number of kinds of feature values that the target feature takes in the sample data set is greater than 2, dividing the plurality of feature values corresponding to the target feature based on the plurality of feature value frequencies to obtain a plurality of feature groups.

As a possible implementation manner, the dividing unit 603 is specifically configured to:

dividing the feature values whose feature value frequency is smaller than a preset high-frequency threshold based on the first preset frequency threshold to obtain at least one low-frequency feature group, wherein the difference between the feature value frequencies corresponding to the feature values in the same low-frequency feature group is smaller than the first preset frequency threshold;

dividing the feature values whose feature value frequency is greater than or equal to the preset high-frequency threshold based on a second preset frequency threshold to obtain at least one high-frequency feature group, wherein the difference between the feature value frequencies corresponding to the feature values in the same high-frequency feature group is smaller than the second preset frequency threshold, and the second preset frequency threshold is smaller than the first preset frequency threshold.

As a possible implementation manner, the data processing apparatus 600 further includes an application unit, configured to:

obtaining data to be predicted, wherein the data to be predicted comprises object data and multimedia data;

And predicting according to the data to be predicted through the click rate prediction model to obtain a click rate prediction result aiming at the data to be predicted, wherein the click rate prediction result is used for predicting the probability of clicking the multimedia data by the object identified by the object data.

The embodiment of the application also provides a computer device, which is the computer device introduced above. The computer device may be a server or a terminal device, and the data processing apparatus may be built into the server or the terminal device. The computer device provided by the embodiment of the application is described below from the perspective of hardware implementation. Fig. 7 is a schematic structural diagram of a server, and fig. 8 is a schematic structural diagram of a terminal device.

Referring to fig. 7, which is a schematic diagram of a server structure according to an embodiment of the present application, the server 1400 may vary considerably in configuration or performance, and may include one or more processors 1422, such as central processing units (CPU), a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transitory or persistent storage. The program stored in the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the processor 1422 may be configured to communicate with the storage medium 1430 and to execute, on the server 1400, the series of instruction operations in the storage medium 1430.

The server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 7.

Wherein, the CPU 1422 is configured to perform the following steps:

Optionally, the CPU 1422 may also perform method steps of any specific implementation of the data processing method in the embodiment of the present application.

Referring to fig. 8, the structure of a terminal device according to an embodiment of the present application is shown. Fig. 8 is a block diagram illustrating a part of a structure of a smart phone related to a terminal device provided by an embodiment of the present application, where the smart phone includes: a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, a power supply 1590, and the like. Those skilled in the art will appreciate that the smartphone structure shown in fig. 8 does not limit the smartphone, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

The following describes each component of the smart phone in detail with reference to fig. 8:

The RF circuit 1510 may be used for receiving and transmitting signals during messaging or a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 1580 for processing; in addition, it sends uplink data to the base station.

The memory 1520 may be used to store software programs and modules, and the processor 1580 implements various functional applications and data processing of the smartphone by running the software programs and modules stored in the memory 1520.

The input unit 1530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the smart phone. Specifically, the input unit 1530 may include a touch panel 1531 and other input devices 1532. The touch panel 1531, also referred to as a touch screen, may collect touch operations performed by the user on or near it and drive the corresponding connection device according to a preset program. The other input devices 1532 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 1540 may be used to display information input by a user or information provided to the user and various menus of the smart phone. The display unit 1540 may include a display panel 1541, and optionally, the display panel 1541 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.

The smartphone may also include at least one sensor 1550, such as a light sensor, a motion sensor, and other sensors. Other sensors that may also be configured in the smart phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail herein.

The audio circuit 1560, a speaker 1561, and a microphone 1562 may provide an audio interface between a user and the smart phone. The audio circuit 1560 may convert received audio data into an electrical signal and transmit it to the speaker 1561, which converts it into a sound signal for output; conversely, the microphone 1562 converts collected sound signals into electrical signals, which the audio circuit 1560 receives and converts into audio data; the audio data is then output to the processor 1580 for processing and sent, for example, to another smart phone via the RF circuit 1510, or output to the memory 1520 for further processing.

The processor 1580 is the control center of the smartphone. It connects the various parts of the entire smartphone through various interfaces and lines, and performs the various functions of the smartphone and processes data by running or executing the software programs and/or modules stored in the memory 1520 and invoking the data stored in the memory 1520. Optionally, the processor 1580 may include one or more processing units.

The smart phone also includes a power supply 1590 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically connected to the processor 1580 via a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.

Although not shown, the smart phone may further include a camera, a bluetooth module, etc., which will not be described herein.

In an embodiment of the present application, the memory 1520 included in the smart phone may store program codes and transmit the program codes to the processor.

The processor 1580 included in the smart phone may execute the data processing method provided in the foregoing embodiment according to the instructions in the program code.

The embodiment of the application also provides a computer readable storage medium for storing a computer program for executing the data processing method provided in the above embodiment.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the data processing methods provided in the various alternative implementations of the above aspects.

Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, where the program may be stored in a computer readable storage medium, and when the program is executed, the steps of the above method embodiments are performed; and the aforementioned storage medium may be at least one of the following media: read-only memory (ROM), RAM, magnetic disk, optical disk, or the like.

It should be noted that, in the present specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference may be made to the description of the method embodiments for the relevant parts. The apparatus and system embodiments described above are merely illustrative: units illustrated as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.

The foregoing is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be included in the scope of the present application. Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims

1. A method of data processing, the method comprising:

2. The method according to claim 1, wherein the method further comprises:

predicting through an interaction layer included in the initial click rate prediction model according to the plurality of initial shared feature vectors to obtain a predicted click result, wherein the method comprises the following steps:

3. The method according to claim 2, wherein the initial shared feature vector includes a plurality of feature components, and the adjusting the length of the initial shared feature vector according to a preset length threshold value, to obtain an adaptive feature vector, includes:

4. A method according to claim 3, characterized in that the method further comprises:

and predicting through an interaction layer included in the initial click rate prediction model according to the plurality of self-adaptive feature vectors to obtain a predicted click result, wherein the method comprises the following steps:

5. The method according to claim 2, wherein the initial shared feature vector includes a plurality of feature components, and the adjusting the length of the initial shared feature vector according to a preset length threshold value, to obtain an adaptive feature vector, includes:

6. The method of claim 1, wherein the initial click-through rate prediction model further comprises a feature selection layer, the method further comprising:

7. The method of claim 6, wherein the model parameters of the initial click-through rate prediction model include feature selection layer parameters and other trainable parameters, the method further comprising:

acquiring an initial sample data set;

dividing the initial sample data set into a training set and a verification set;

respectively taking the training set and the verification set as the sample data set, and executing the steps of determining a plurality of characteristic value frequencies corresponding to target characteristics in a plurality of characteristics according to the sample data set to obtain a first predicted click result and a second predicted click result, wherein the first predicted click result is obtained based on the training set, and the second predicted click result is obtained based on the verification set;

the step of adjusting the model parameters of the initial click rate prediction model according to the difference between the predicted click result and the corresponding real click result to obtain a click rate prediction model comprises the following steps:

8. The method according to claim 1, wherein the method further comprises:

and adjusting model parameters of the initial click rate prediction model according to the difference between the predicted click result and the corresponding real click result to obtain a click rate prediction model, wherein the method comprises the following steps:

9. The method according to claim 1, wherein the dividing the plurality of feature values corresponding to the target feature based on the plurality of feature value frequencies to obtain a plurality of feature groups includes:

10. The method according to claim 1, wherein the dividing the plurality of feature values corresponding to the target feature based on the plurality of feature value frequencies to obtain a plurality of feature groups includes:

11. The method according to any one of claims 1-10, wherein the method further comprises:

12. A data processing apparatus, the apparatus comprising: the device comprises an acquisition unit, a determination unit, a division unit, a feature extraction unit, a prediction unit and an adjustment unit;

13. A computer device, the computer device comprising a processor and a memory:

the processor is configured to perform the method of any of claims 1-11 according to the computer program.

14. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a computer program for executing the method of any one of claims 1-11.

15. A computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of claims 1-11.