WO2024104233A1 - Method and apparatus for predicting cyclic data, and device and medium - Google Patents

Method and apparatus for predicting cyclic data, and device and medium

Info

Publication number
WO2024104233A1
WO2024104233A1 · PCT/CN2023/130543
Authority
WO
WIPO (PCT)
Prior art keywords
time
sample
target
periodic
prediction
Prior art date
Application number
PCT/CN2023/130543
Other languages
French (fr)
Chinese (zh)
Inventor
杨迎翔
张思钧
Original Assignee
抖音视界有限公司
脸萌有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 抖音视界有限公司 and 脸萌有限公司
Publication of WO2024104233A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/9038 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • Example embodiments of the present disclosure generally relate to the field of computer technology, and more particularly to methods, devices, apparatuses, and computer-readable storage media for prediction of periodic data.
  • Periodic or cyclic data is often encountered in a wide range of machine learning scenarios.
  • In a recommendation system, it can be observed that users usually log in to an application within a relatively fixed time window every day (for example, before going to bed or after getting off work). As a result, the recommendations provided to users based on the recommendation strategy exhibit a strong periodic pattern.
  • As another example, asset prices may rise and fall cyclically every year, a phenomenon often referred to as "seasonality".
  • In search engines, the search popularity or click volume of certain keywords can also show periodic patterns. Machine learning models therefore need to be able to track and learn such periodic data and give correct prediction results.
  • In a first aspect of the present disclosure, a method for predicting periodic data is provided. The method comprises: obtaining a trained prediction model, the prediction model being configured to process input data having a target period, the prediction model being trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating a time at which the first annotation information is obtained within the target period; obtaining a target data sample and an associated target sample time, the target sample time indicating a time at which the target data sample is obtained within the target period; and determining, using the prediction model, a prediction result for the target data sample based on the target data sample and the target sample time.
  • In a third aspect of the present disclosure, an electronic device is provided. The device includes at least one processing unit and at least one memory, the at least one memory being coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
  • A computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the method of the first aspect is implemented.
  • FIG. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
  • FIG. 2 shows a comparison of example model training processes;
  • FIG. 3 illustrates an architecture for prediction of periodic data according to some embodiments of the present disclosure;
  • FIG. 4 illustrates a model training process according to some embodiments of the present disclosure;
  • FIG. 5 shows an example structure of a prediction model according to some embodiments of the present disclosure;
  • FIG. 6A shows prediction results using a periodic modeling part based on Fourier expansion under some example prediction tasks;
  • FIG. 6B shows an example pattern of a set of periodic kernel functions according to some embodiments of the present disclosure;
  • FIG. 6C shows an example pattern of a set of periodic kernel functions according to some other embodiments of the present disclosure;
  • FIG. 7A and FIG. 7B illustrate detailed structures of a prediction model according to some embodiments of the present disclosure;
  • FIG. 8 shows a flow chart of a process for prediction of periodic data according to some embodiments of the present disclosure;
  • FIG. 9 shows a block diagram of an apparatus for prediction of periodic data according to some embodiments of the present disclosure;
  • FIG. 10 illustrates a block diagram of an electronic device in which one or more embodiments of the present disclosure may be implemented.
  • A prompt message is sent to the user to clearly inform the user that the requested operation will require obtaining and using the user's personal information. Based on the prompt message, the user can then independently choose whether to provide personal information to the software or hardware, such as electronic devices, applications, servers, or storage media, that executes the operations of the technical solution of the present disclosure.
  • In response to receiving an active request from the user, the prompt information may be sent to the user in the form of a pop-up window, in which the prompt information can be presented in text form.
  • The pop-up window can also carry a selection control for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
  • A model can learn the association between corresponding inputs and outputs from training data, so that after training is completed, a corresponding output can be generated for a given input.
  • the generation of the model can be based on machine learning technology.
  • Deep learning is a machine learning algorithm that processes inputs and provides corresponding outputs by using multi-layer processing units.
  • a neural network model is an example of a model based on deep learning.
  • A model may also be referred to as a "machine learning model", a "learning model", a "machine learning network", or a "learning network"; these terms are used interchangeably herein.
  • a neural network is a machine learning network based on deep learning.
  • A neural network can process input and provide corresponding output. It usually includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. Networks used in deep learning applications usually include many hidden layers, thereby increasing the depth of the network.
  • the layers of the neural network are connected in sequence, so that the output of the previous layer is provided as the input of the next layer, where the input layer receives the input of the neural network and the output of the output layer serves as the final output of the neural network.
  • Each layer of the neural network includes one or more nodes (also called processing nodes or neurons), each of which processes the input from the previous layer.
  • Machine learning can be roughly divided into three stages, namely the training stage, the verification stage, and the application stage (also called the inference stage).
  • In the training stage, a given model can be trained using a large amount of training data, with its parameter values iteratively updated until the model can draw, from the training data, consistent inferences that meet the expected goals.
  • Through training, the model can be considered to be able to learn the association from input to output (also called an input-to-output mapping) from the training data.
  • After training, the parameter values of the trained model are determined.
  • In the verification stage, a verification input is applied to the trained model to verify whether the model can provide the correct output, thereby determining the performance of the model.
  • In the application stage, the model can be used to process an actual input based on the parameter values obtained from training to determine the corresponding output.
  • FIG1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.
  • the environment 100 includes a model training system 110 and a model application system 120.
  • the model training system 110 is configured to train a prediction model 105 using a training data set 112.
  • the model application system 120 can be configured to apply the trained prediction model 105.
  • the prediction model 105 can be configured to process input data and determine corresponding prediction results. At each prediction, the prediction model 105 receives an input data sample and generates a prediction result corresponding to the data sample.
  • A "data sample" refers to the unit granularity of input data that the prediction model 105 can process.
  • the prediction task to be performed by the prediction model 105 can be defined according to the actual application.
  • For example, in a recommendation system, the prediction task is to predict the conversion result of a recommended item and determine whether to recommend the item to a user based on the predicted conversion result.
  • the recommended item can be any content or resource to be recommended, and examples thereof may include applications, physical goods, virtual goods, audio and video content, etc.
  • The conversion result of a recommended item can be defined based on the specific item and actual needs. Some example conversion results may include clicks, downloads, registrations, additions to a shopping cart, payments, activations, or other resource demand behaviors.
  • the data sample input to the prediction model 105 may include at least information related to the recommended item. In some cases, the data sample may also include information related to the user to be recommended.
  • the prediction result output by the prediction model 105 may be the probability that the recommended item will be converted when recommended, or the probability that a specific user will be converted for the recommended item, etc.
  • the prediction task of the prediction model 105 may be to predict the sales volume of a product at a future time.
  • The data samples input to the prediction model 105 may include the future time, information related to the product and/or other related products, historical sales volumes of the product and/or other related products, information related to the target geographic area and target users of the product, etc.
  • the output of the prediction model 105 may include the predicted sales volume of the product at a certain time.
  • the prediction model 105 can be configured to implement any other prediction tasks.
  • the application scenario of the recommendation system is used as an example, but it should be understood that the embodiments of the present disclosure can be applied to other prediction tasks with similar characteristics.
  • The prediction model 105 may be constructed to process input data samples and generate corresponding prediction results as output.
  • the prediction model 105 may be configured with a set of parameters, the values of which are learned from the training data through a training process.
  • the training data set 112 used may include training data samples 114 provided to the prediction model 105, and annotation information 116 indicating the corresponding true prediction results of the training data samples 114.
  • Although FIG. 1 only shows one pair of a training data sample and its annotation information, a large number of training data samples and corresponding annotation information may be required during training.
  • an objective function is used to measure the error (or distance) between the output given by the prediction model 105 for the training data sample 114 and the annotation information 116.
  • This error is also called the loss of machine learning, and the objective function can also be called the loss function.
  • the loss function can be expressed as l(f(x), y), where x represents the training data sample, f() represents the machine learning model, f(x) represents the output of the prediction model, and y represents the annotation information of x, indicating the true prediction result of x.
  • the parameter values of the prediction model 105 are updated to reduce the error from the objective function.
  • the learning objective is achieved when the objective function is optimized, for example, the calculated error is minimized or reaches a desired threshold.
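The objective-function-driven update described above can be sketched as follows. This is a minimal illustration only: the one-parameter linear model f(x) = w·x, the squared-error loss, and the gradient-descent step are assumptions, not the patent's implementation.

```python
def loss(pred, y):
    # Squared-error objective l(f(x), y): the distance between the model
    # output and the annotation information y (the true prediction result).
    return (pred - y) ** 2

def train(samples, labels, lr=0.1, epochs=200):
    # A deliberately tiny prediction model f(x) = w * x with one parameter.
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = w * x
            # Update the parameter value to reduce the error from the
            # objective function (one gradient-descent step).
            w -= lr * 2 * (pred - y) * x
    return w

# The annotation information follows y = 2x, so the learned parameter
# should converge to 2.
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In practice, training stops when the calculated error is minimized or falls below a desired threshold, as described above.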
  • The trained prediction model 105, configured with the updated parameter values, may be provided to the model application system 120, which applies a target data sample 122 that actually needs to be predicted to the prediction model 105 to output a prediction result 124 for the target data sample 122.
  • the model training system 110 and the model application system 120 can be any system with computing capabilities, such as various computing devices/systems, terminal devices, servers, etc.
  • the terminal device can be any type of mobile terminal, fixed terminal or portable terminal, including a mobile phone, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a media computer, a multimedia tablet, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof.
  • Servers include but are not limited to mainframes, edge computing nodes, computing devices in cloud environments, and the like.
  • The components and arrangements in the environment shown in FIG. 1 are merely examples, and a computing system suitable for implementing the example embodiments described in the present disclosure may include one or more different components, other components, and/or different arrangements.
  • the model training system 110 and the model application system 120 may be integrated into the same system or device.
  • the embodiments of the present disclosure are not limited in this respect.
  • the input data processed by the machine learning model may have a certain periodicity.
  • This kind of data is called periodic or cyclical data.
  • the behavior of the user group is periodic over time.
  • For example, the users of an application may usually log in to the application in a relatively fixed time window every day (for example, before bed or after getting off work) or use the application frequently on weekends, and may show the same interests in the same time window on different days.
  • The impact of these regular behaviors on the prediction model is that the data to be predicted at the same time in two adjacent periods are often very similar.
  • If the prediction model is simply expressed as f(x), the prediction results will hardly reflect these periodic characteristics. Therefore, a method that can model periodic data is needed.
  • the prediction model for processing periodic data can be expressed as f(x,t), where x represents the data sample input to the model and t represents the time when the data sample x is obtained in a period. In this way, different prediction strategies can be implemented for data samples at different times.
  • the characteristics of the input data that need to be predicted may also change within a period. For example, during a period of time, users always show interest in certain content during the day, but during another period of time, users may show corresponding interests at night, or show interest in other content during the day.
  • This requires continuous updating of the prediction model so that the model can track the updated characteristics of the input data to be processed. Therefore, after one round of training is completed, a prediction model that has been put into use may need to be retrained using new training data. In this case, the training of the prediction model can be considered to have two stages. The first stage is model training using a large amount of historical training data, which is called batch training. After the trained model is put into use, the model continues to be updated based on subsequently arriving training data; this process is called streaming training.
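The two training stages can be sketched as follows. This is a toy illustration only: the one-parameter model, squared-error loss, and learning rate are assumptions that stand in for the actual prediction model.

```python
def sgd_step(w, x, y, lr=0.05):
    # One gradient step on squared error for the toy model f(x) = w * x.
    return w - lr * 2 * (w * x - y) * x

def batch_train(history, epochs=50):
    # Stage 1 (batch training): train on a large amount of historical data.
    w = 0.0
    for _ in range(epochs):
        for x, y in history:
            w = sgd_step(w, x, y)
    return w

def streaming_update(w, new_samples):
    # Stage 2 (streaming training): after the model is put into use,
    # keep updating it as new annotated samples arrive.
    for x, y in new_samples:
        w = sgd_step(w, x, y)
    return w

w = batch_train([(1.0, 3.0), (2.0, 6.0)])   # learns w close to 3
w = streaming_update(w, [(1.0, 3.1)])       # nudged by newly arriving data
```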
  • The problem of exploiting the periodicity of input data to train a better prediction model can be set up as follows. Given samples represented by triples (x, y, t), where x is a training data sample for model input, y is the true prediction result of the training data sample, and t is the time within a certain period at which the training data sample x is obtained, it is desired to learn a prediction model (denoted as f) that can give a prediction for any given time t.
  • the training data samples may also arrive at the model training system in a cyclic manner. More specifically, between two consecutive updates of the model, only samples that are available within the update interval can be used for training. This may lead to certain estimation errors in the learning process of the prediction model that models periodic data.
  • the conversion result corresponding to this recommendation will be fed back to the recommendation platform.
  • The process in which the conversion result is fed back to the recommendation platform is called "conversion return".
  • The time when the recommended item is provided is called the "sending time" of the recommended item, and the feedback time of the conversion result is called the "return time" of the recommended item.
  • Sometimes the conversion result corresponding to a recommendation may not be fed back in real time; this is referred to as conversion return delay.
  • There are many reasons for conversion return delay. For example, the conversion behavior for some recommended items occurs only some time after the items are received (for example, payment or activation behaviors), or the conversion results are deliberately delayed to achieve privacy protection.
  • The specific conversion return delay differs for different recommended items or different conversion results. For example, for a certain recommended item, if the user does not convert, the conversion result indicating no conversion may be fed back in real time; if the user converts after a period of time, the conversion result indicating a successful conversion is fed back only after that delay.
  • In the following, a more extreme example is used to illustrate the error caused by the delay of training samples.
  • One prediction model is expressed as f(x) and is used as a benchmark.
  • This model does not model the periodicity of the data, but is continuously updated with the latest annotated data.
  • The other prediction model is expressed as f(x, t), which models the periodicity of the data.
  • Both positive and negative samples are used together to update the prediction models.
  • When both kinds of samples are available in time, the sample mean learned by the prediction model 210 is 0.5.
  • If the delay of the annotation information of negative samples is large and the delay of the annotation information of positive samples is small, then updating the prediction model 220 will lead to an overestimation of the prediction result (for example, 1 > 0.5).
  • In the absence of such feedback delay, the training of the prediction model 220 will have the same effect as that of the model 210.
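The overestimation in this example can be reproduced numerically. The following toy sketch (with assumed values: positive labels of 1 return instantly, negative labels of 0 are delayed) shows how training on only the already-returned labels inflates the learned mean:

```python
# Toy label stream: half positive (label 1, instant return) and half
# negative (label 0, returned only after a delay).
samples = [1, 0] * 50  # true mean is 0.5

# At update time, only labels that have already returned are available:
# the negatives are still in flight, so the model sees only positives.
returned = [y for y in samples if y == 1]
biased_mean = sum(returned) / len(returned)   # 1.0 -> overestimate (1 > 0.5)

# Once all labels have returned, the unbiased mean is recovered.
full_mean = sum(samples) / len(samples)       # 0.5
```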
  • The inventors have found through extensive research experiments that if the acquisition time of a training data sample is used as its time feature, prediction errors appear in AB experiments. This is because, during an AB experiment, the currently trained prediction model must be applied immediately to estimate currently arriving data samples. As a result, samples with longer feedback delays are harder to use in time for model learning and updating, so the model estimation performance observed during the AB experiment is mainly determined by samples fed back within a short time. If there is a large difference between samples with longer feedback delays and samples with shorter feedback delays, there will be a difference between the estimation performance of the prediction model on online instantly arriving data and its estimation performance on offline data for which feedback is complete.
  • To this end, according to embodiments of the present disclosure, an improved solution is provided to solve the problem of updating a prediction model that processes periodic data.
  • In this solution, the prediction model is configured to process input data with a target period.
  • In the training process, the time at which the annotation information corresponding to a training data sample is obtained is used as the time feature of the training data sample.
  • That is, the input of the prediction model is the training data sample itself together with a sample time,
  • where the sample time indicates the time at which the annotation information is obtained within the target period. In this way, the update of the prediction model can be determined based on the obtained annotation information.
  • In the application process, the trained prediction model is applied to process a target data sample and determine the corresponding prediction result, and the time feature of the target data sample is the time at which the data sample is obtained within the target period.
  • FIG3 shows an architecture for prediction of periodic data according to some embodiments of the present disclosure.
  • the prediction model 310 is configured to process input data with a target period.
  • Such a prediction model will be constructed to take data samples and sample times as model inputs and output corresponding prediction results.
  • the target period can be set depending on the specific application, and the sample time refers to a time within the target period, which can be a time point or time period of any granularity within the target period.
  • the following first describes the training of the model, and then discusses the detailed architecture of the prediction model 310 with periodic modeling capabilities.
  • the training process for the prediction model 310 is performed using newly collected training data.
  • the training data of the prediction model 310 may include training data samples 302-1, 302-2, ... 302-N (N is an integer greater than or equal to 1), which are collectively or individually referred to as training data samples 302 for ease of discussion.
  • each training data sample also requires corresponding annotation information 304-1, 304-2, ... 304-N, which are collectively or individually referred to as annotation information 304 for ease of discussion.
  • The annotation information 304 indicates the actual prediction result of the corresponding training data sample 302. For example, in a recommendation scenario, if the training data sample 302 is information related to a recommended item, the annotation information 304 may indicate the actual conversion result of the recommended item.
  • the prediction model 310 processes the input training data sample 302 and the corresponding sample time based on the current model parameters and gives a prediction result.
  • the update module 312 can update the prediction model 310 based on the prediction result of the training data sample 302 and the error between the corresponding annotation information 304. Through iterative updates, the prediction model 310 can learn the characteristics exhibited by the training data sample 302, so that more accurate prediction results can be given in the future.
  • As discussed above, annotation information obtained at the current time t corresponds to a training data sample acquired at time t − τ, where τ represents the delay time.
  • If the acquisition time of the training data sample is used as the sample time input to the prediction model, the trained prediction model will have prediction errors on some data samples for which the annotation information has not yet been fully fed back.
  • Therefore, among the training data samples 302 used in training the prediction model 310, at least some training data samples 302 have sample times that indicate the acquisition time of the corresponding annotation information 304 within the target period.
  • the acquisition time of the annotation information that has been delayed can be used as the sample time of the training data sample, and used together to train the prediction model 310.
  • the prediction model 310 can learn the characteristics of the periodic input data from more and more comprehensive training data for the current time, thereby preventing the prediction model from overestimating or underestimating certain data samples during the model application process.
  • the acquisition time of these data samples 302 can be directly input into the prediction model 310 as the sample time. That is, the input of the prediction model 310 can be divided into two categories.
  • the first category of input includes the first training data sample and the first sample time, and the first sample time indicates the acquisition time of the annotation information of the first training data sample.
  • The second category of input includes the second training data sample and the second sample time, where the second sample time indicates the time when the second training data sample is obtained.
  • Alternatively, the input of the prediction model 310 can always use the training data sample together with the time when the annotation information of the training data sample is obtained (i.e., the first sample time).
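The two categories of model input can be illustrated with a hypothetical record layout (all field names and the 24-hour target period are illustrative assumptions, not from the disclosure):

```python
from dataclasses import dataclass

PERIOD_HOURS = 24  # target period T (assumed here to be one day)

@dataclass
class TrainingInput:
    features: list      # training data sample x
    label: float        # annotation information y
    sample_time: int    # time feature t within the target period

def first_category(x, y, label_return_hour):
    # First category: the sample time is the hour at which the annotation
    # information was obtained (the conversion return time).
    return TrainingInput(x, y, label_return_hour % PERIOD_HOURS)

def second_category(x, y, sample_hour):
    # Second category: the sample time is the hour at which the training
    # data sample itself was obtained (the sending time).
    return TrainingInput(x, y, sample_hour % PERIOD_HOURS)

# A conversion sent at hour 22 whose label returns 5 hours later is
# trained with sample time (22 + 5) % 24 = 3 under the first category.
rec = first_category([0.1, 0.9], 1.0, 22 + 5)
```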
  • the prediction model 310 is configured to predict the conversion result of the recommended item.
  • the first training data sample at least indicates the relevant information of the training recommendation item (and may also indicate the relevant information of the user), and the first sample time indicates the time when the real conversion result of the training recommendation item is obtained within the target period.
  • the real conversion result is the annotation information of the first training data sample, and its acquisition time is delayed relative to the time when the training recommendation item is recommended.
  • a plurality of first training data samples, their annotation information, and their respective first sample times can be used to train the prediction model 310.
  • a second training data sample can also be used in addition, wherein the second training data sample also at least indicates the relevant information of the training recommendation item (and may also indicate the relevant information of the user), and the associated second sample time indicates the time when the training recommendation item is obtained within the target period.
  • The example of FIG. 4 is set in a scenario similar to that of FIG. 2.
  • the negative sample 232 can also be obtained for model training.
  • FIG. 4 shows the case where the annotation information of positive samples has feedback delay;
  • in other cases, the annotation information of negative samples may also have feedback delay, and
  • the annotation information of positive and negative samples may have the same or different feedback delays.
  • the model training method proposed in the embodiment of the present disclosure can be applied to any scenario where there is a delay between the time when the input data is obtained and the time when its annotation information is obtained when learning periodic data.
  • the training of the prediction model 310 may be implemented by, for example, the model training system 110 in the environment 100 of FIG. 1 , and the update module 312 may be implemented as a part of the model training system 110 .
  • The trained prediction model 310 can then be put into application, for example by the model application system 120 of FIG. 1.
  • In the application process, a target data sample to be predicted and the corresponding target sample time are obtained as inputs of the prediction model 310, where the target sample time indicates the time when the target data sample is obtained.
  • the target data sample can at least indicate the relevant information of the target recommendation item to be recommended (and may also indicate the relevant information of the user), and the target sample time can indicate the time when the target recommendation item is to be recommended within the target period.
  • the prediction model 310 is used to determine the prediction result for the target data sample based on the target data sample and the target sample time.
  • the training process for the prediction model 310 may be repeatedly performed at a certain time interval or according to other conditions. Each time the training is performed, the corresponding training data may be obtained in a similar manner as discussed above to perform model updating.
  • the prediction model 310 may include at least a non-periodic modeling part, a periodic modeling part, and an output layer.
  • FIG5 shows an example structure of the prediction model 310, which includes a non-periodic modeling part 510, a periodic modeling part 520, and an output layer 530.
  • the non-periodic modeling part 510 is configured to extract intermediate feature representations from input data samples.
  • In the training phase, the input data samples are training data samples; in the application phase, the input data samples are target data samples.
  • the non-periodic modeling part 510 is used to learn the non-periodic part of the model input.
  • the periodic modeling part 520 is configured to process the intermediate feature representation based on the sample time corresponding to the data sample in the target period to obtain the periodic feature representation.
  • the periodic modeling part 520 is used to learn the periodic part of the model input.
  • In the training phase, for at least some training data samples, the sample time is the time when the annotation information of the training data sample is obtained; for other training data samples, the sample time is the time when the training data sample itself is obtained.
  • In the application phase, the sample time input to the periodic modeling part 520 is the time when the target data sample is obtained.
  • the output layer 530 in the prediction model 310 is configured to determine the prediction result for the data sample based on at least the periodic feature representation.
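The three-part structure can be sketched with plain functions. This is a simplified stand-in, not the disclosed model: the fixed linear map, the single-harmonic time modulation, and the sigmoid output layer are all illustrative assumptions.

```python
import math

def non_periodic_part(x):
    # Extract an intermediate feature representation from the data sample.
    # Here a fixed linear map stands in for a learned sub-network.
    return [2.0 * v + 1.0 for v in x]

def periodic_part(h, t, period=24.0):
    # Modulate the intermediate features with the sample time t within
    # the target period (simplified single-harmonic version).
    phase = 2.0 * math.pi * t / period
    return [v * math.cos(phase) for v in h]

def output_layer(z):
    # Map the periodic feature representation to a prediction in (0, 1).
    s = sum(z)
    return 1.0 / (1.0 + math.exp(-s))

def predict(x, t):
    return output_layer(periodic_part(non_periodic_part(x), t))

p = predict([0.5], 0.0)  # at t = 0, cos(0) = 1, so z = [2.0]
```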
  • the periodic modeling portion 520 can be configured to process the intermediate feature representation provided by the non-periodic modeling portion based on the sample time using a Fourier expansion function.
  • Such a periodic modeling part 520 can be referred to as a Fourier layer.
  • the model constructed based on Fourier learning can intuitively utilize the periodicity of the training data and can be expressed as a periodic function with periodicity. Therefore, Fourier learning can be applied to prediction models based on machine learning.
  • the periodic modeling part 520 based on Fourier expansion can be expressed as follows:

    f(x, t) = a_0(x)/2 + Σ_{n=1}^{N} [a_n(x) cos(2πnt/T) + b_n(x) sin(2πnt/T)]    (1)

  • N is a hyperparameter controlling the number of Fourier components
  • T represents the target period of the input data to be processed by the prediction model 310 (which is also a hyperparameter)
  • t is the sample time
  • x is the input of the periodic modeling part, that is, the intermediate feature representation obtained from the non-periodic modeling part; the coefficient functions a_n(x) and b_n(x) are learned from x
  • the periodic modeling part 520 can be constructed to implement the Fourier expansion shown in the above formula (1) to obtain a periodic feature representation.
  • the output of the periodic modeling part 520 is provided to the output layer 530, which maps it to the prediction result.
  • the introduction of the periodic modeling part 520 based on Fourier expansion can allow more accurate prediction results to be generated by considering the periodicity in the input data.
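The Fourier-expansion layer of formula (1) can be prototyped in a few lines of NumPy. This is an illustrative sketch, not the patented implementation: the coefficient functions a_n(x) and b_n(x) are assumed here to be linear maps of the intermediate representation x, and all dimensions and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_layer(x, t, T, A, B, a0):
    """Truncated Fourier expansion over the sample time t (cf. formula (1)).

    x is the intermediate feature representation; the coefficient functions
    a_n(x) = A[n] @ x and b_n(x) = B[n] @ x are assumed to be linear maps.
    """
    N = A.shape[0]
    n = np.arange(1, N + 1)
    a, b = A @ x, B @ x
    return (a0 @ x) / 2 + a @ np.cos(2 * np.pi * n * t / T) + b @ np.sin(2 * np.pi * n * t / T)

d, N, T = 8, 4, 24.0                         # arbitrary sizes; T = 24-hour target period
A = rng.normal(size=(N, d))
B = rng.normal(size=(N, d))
a0 = rng.normal(size=d)
x = rng.normal(size=d)

y0 = fourier_layer(x, 3.0, T, A, B, a0)
y1 = fourier_layer(x, 3.0 + T, T, A, B, a0)  # output repeats with period T
```

By construction the output is a periodic function of the sample time t with period T, which is exactly the property the periodic modeling part is meant to impose.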
  • however, in practice the periodic modeling part based on Fourier expansion may mainly learn the high-frequency components and noise in the input data, while the output of the original model still plays the main role in learning the periodic information in the data, which leads to prediction results that do not meet expectations in actual predictions.
  • the periodic Gaussian kernel function can also be used to directly model the periodicity of the input data.
  • the periodic modeling part 520 based on the periodic Gaussian kernel function can be expressed as follows:

    f(x, t) = Σ_{n=1}^{N} a_n(x) K(t, t_n)    (2)

    where K(t, t') = exp(-2 sin²(π(t - t')/(2p)) / l²)

  • K(·, ·) is the periodic Gaussian kernel function; p is half of the target period of the input data to be processed, that is, 2p represents the target period, and l is a hyperparameter
  • t represents the sample time
  • t_n is a hyperparameter specifying the center of the n-th kernel function within the target period
  • x is the input of the periodic modeling part, that is, the intermediate feature representation obtained from the non-periodic modeling part; the coefficients a_n(x) are learned from x. It can be seen that the periodic Gaussian kernel function only needs to model the component corresponding to the sin function in the Fourier expansion.
  • the periodic modeling part 520 based on the periodic Gaussian kernel function can well learn the periodicity of the input data.
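A minimal sketch of such a periodic Gaussian (exp-sine-squared) kernel layer, assuming 24 hourly centers t_n and a 24-hour target period (2p = 24); the coefficient map W, the linear form of a_n(x), and all sizes are hypothetical illustrations, not the patented implementation.

```python
import numpy as np

def periodic_gaussian_kernel(t, t_n, p, l):
    """Periodic (exp-sine-squared) Gaussian kernel with target period 2p and bandwidth l."""
    return np.exp(-2.0 * np.sin(np.pi * (t - t_n) / (2.0 * p)) ** 2 / l ** 2)

centers = np.arange(24.0)   # t_n: one kernel center per hour of a 24-hour target period
p, l = 12.0, 1.0            # 2p = 24 is the target period; l controls the bandwidth

def periodic_layer(x, t, W):
    # f(x, t) = sum_n a_n(x) * K(t, t_n), with a_n(x) = W[n] @ x as a linear map
    a = W @ x
    return a @ periodic_gaussian_kernel(t, centers, p, l)

x = np.ones(4)
W = np.full((24, 4), 0.25)   # arbitrary coefficient map: each a_n(x) = 1
y = periodic_layer(x, 12.0, W)
```

The kernel equals 1 exactly at its center t_n, decays for sample times away from the center, and repeats with period 2p, so the layer's output is automatically periodic in t.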
  • for sample times t_j far from t_i, K(t_j, t_i) is much smaller than K(t_i, t_i).
  • the model at each hour is represented by a Gaussian kernel function centered at t_n, and the correlation between kernel functions centered at different hours is small.
  • the width of each kernel function is determined by the parameter l.
  • FIG. 6C shows the output of each a_n(x)K(t, t_n) when the Gaussian kernel function is used in the recommendation-related prediction model. It can be seen that, compared with FIG. 6A, the output in FIG. 6C has obvious periodicity.
  • the coefficients a_n(x) determine the influence on the sample of the periodic Gaussian kernel functions centered at the 24 points for hours 0-23; the kernel function whose center t_n is closest to t has the greatest influence, while kernel functions whose centers are far from t have relatively small influence.
  • the overall effect is obtained by linearly combining all kernel functions according to the weights a_n(x).
  • different periodic Gaussian kernel functions will be responsible for estimating the sample at different times within a day, thereby realizing the model's periodic estimation of the same sample feature x at different sample times t.
  • the periodic Gaussian kernel function switches the meaning of the prediction model from the frequency domain back to the time domain; in particular, it switches the physical meaning expressed by a_n from the frequency domain back to the time domain.
  • in the Fourier expansion scheme, a_n represents the energy corresponding to the n-th frequency component
  • in the periodic Gaussian kernel scheme, a_n represents the weight of the kernel function centered at the n-th hour in the estimation
  • a periodic Gaussian kernel function can also be regarded as a linear combination of a set of Fourier functions according to a set of predetermined proportions.
  • the periodic Gaussian kernel function can effectively avoid the failure mode of the Fourier expansion scheme, in which periodicity is learned poorly because individual frequency components cannot be learned well.
  • the periodic Gaussian kernel function is smoother, and the introduction of high-frequency components and the occurrence of overfitting during modeling can be naturally prevented by controlling the bandwidth l of the kernel function. Therefore, after adding the periodic modeling part based on the periodic Gaussian kernel function, the prediction model stays closer to the original baseline model. In addition, this makes the model lighter, adding fewer parameters to the model.
  • the non-periodic modeling part and the periodic modeling part may have multiple deployment modes.
  • the non-periodic modeling part 510 may include multiple prediction parts, each of which may be configured to provide an intermediate prediction result.
  • the periodic modeling part 520 can be constructed as one of the prediction parts; the output of the periodic modeling part 520 is used as an intermediate prediction result and is aggregated with the intermediate prediction results of the other prediction parts to form the final prediction result.
  • FIG. 7A shows such an example of a prediction model 310.
  • the prediction model 310 includes one or more prediction parts 710-1, 710-2, ... 710-M (M is an integer greater than or equal to 1), collectively or individually referred to as a prediction part 710.
  • each prediction part 710 can be constructed based on a different machine learning modeling method; it processes a data sample as input to obtain an intermediate feature representation, which is provided to the output layer 530.
  • the prediction model 310 also includes a shared part 712, which is configured to extract an intermediate feature representation from the data sample and provide the intermediate feature representation to a non-shared part 714 and a periodic modeling part 520.
  • the shared part 712 and the non-shared part 714 can be constructed based on a deep learning model, for example.
  • the non-shared part 714 processes the intermediate feature representation and extracts a further intermediate feature representation to provide to the output layer 530.
  • the periodic modeling part 520 processes the intermediate feature representations from the shared part 712 and the sample time of the data samples and provides the periodic feature representations to the output layer 530.
  • the output layer 530 aggregates the feature representations from various parts and maps them to prediction results.
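The FIG. 7A layout described above can be sketched as a toy forward pass. All weight matrices, dimensions, the hourly kernel centers, and the sigmoid output are illustrative assumptions for the sketch, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

d_in, d_h, H = 6, 8, 24                        # input dim, hidden dim, hourly centers
W_shared = rng.normal(size=(d_h, d_in)) * 0.1  # shared part 712
W_ns = rng.normal(size=(d_h, d_h)) * 0.1       # non-shared part 714
W_pred = rng.normal(size=(d_h, d_in)) * 0.1    # one prediction part 710
W_per = rng.normal(size=(H, d_h)) * 0.1        # coefficients a_n(x) of periodic part 520
w_out = rng.normal(size=2 * d_h + H) * 0.1     # output layer 530
centers = np.arange(float(H))                  # kernel centers t_n (hours 0..23)
p, l = 12.0, 1.0                               # 2p = 24-hour target period, bandwidth l

def forward(sample, t):
    h_shared = relu(W_shared @ sample)         # shared part 712
    h_ns = relu(W_ns @ h_shared)               # non-shared part 714: further features
    h_pred = relu(W_pred @ sample)             # prediction part 710
    K = np.exp(-2 * np.sin(np.pi * (t - centers) / (2 * p)) ** 2 / l ** 2)
    h_per = (W_per @ h_shared) * K             # periodic feature representation (520)
    z = w_out @ np.concatenate([h_ns, h_pred, h_per])
    return 1.0 / (1.0 + np.exp(-z))            # output layer 530 maps to a prediction

x = rng.normal(size=d_in)
y = forward(x, t=8.5)
```

Only the periodic branch depends on the sample time t, so the whole model's output is periodic in t with the target period while the non-periodic branches are unaffected.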
  • multiple prediction parts in the non-periodic modeling part can each process the input data sample and provide multiple intermediate prediction results. These intermediate prediction results can be cascaded as an intermediate feature representation and input to the periodic modeling part 520.
  • Figure 7B shows such an example of a prediction model 310.
  • the prediction model 310 includes multiple prediction parts 710-1, 710-2,...710-M and a prediction part 716. These prediction parts process the input data samples respectively to obtain multiple intermediate prediction results, which are cascaded to obtain a cascade 720 of intermediate prediction results.
  • the cascade 720 of intermediate prediction results is input to the periodic modeling part 520.
  • the periodic modeling part 520 also receives the sample time of the data samples, so as to provide accurate feature extraction based on the learned periodicity of the input data, which helps produce accurate prediction results.
  • the periodic modeling part 520 determines the periodic feature representation and provides it to the output layer 530 for determining the prediction result.
  • the periodic modeling part 520 may be deployed in other ways in the prediction model, which is not limited in the present disclosure.
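The FIG. 7B variant can likewise be sketched: the intermediate prediction results of several prediction parts are cascaded and used as the representation x fed to the periodic modeling part 520. All weights, sizes, and the linear form of each prediction part are arbitrary stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, M, H = 6, 3, 24
part_ws = rng.normal(size=(M, d_in)) * 0.1   # M prediction parts (710-1..710-M, 716)
W_a = rng.normal(size=(H, M)) * 0.1          # maps the cascade 720 to coefficients a_n
centers = np.arange(float(H))
p, l = 12.0, 1.0                             # 2p = 24-hour target period

def forward_cascaded(sample, t):
    inter = part_ws @ sample                 # M intermediate prediction results
    # cascade 720: the concatenated intermediate results act as the intermediate
    # feature representation x that is fed to the periodic modeling part 520
    K = np.exp(-2 * np.sin(np.pi * (t - centers) / (2 * p)) ** 2 / l ** 2)
    return (W_a @ inter) @ K                 # periodic feature for the output layer 530

sample = np.ones(d_in)
y0 = forward_cascaded(sample, 5.0)
```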
  • FIG. 8 shows a flow chart of a process 800 for prediction of periodic data according to some embodiments of the present disclosure.
  • the process 800 may be implemented, for example, at the model application system 120 of FIG1 .
  • the model application system 120 obtains a trained prediction model, the prediction model being configured to process input data having a target period.
  • the prediction model is trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating a time at which the first annotation information is obtained within the target period.
  • the training of the prediction model can be implemented, for example, at the model training system 110.
  • the model application system 120 can obtain the trained prediction model from the model training system 110.
  • the model application system 120 obtains a target data sample and an associated target sample time, the target sample time indicating the time at which the target data sample is obtained within a target period.
  • the model application system 120 uses the prediction model to determine a prediction result for the target data sample based on the target data sample and the target sample time.
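The steps of process 800 can be outlined as a small helper, under the assumption that the target sample time is simply the wall-clock acquisition time folded into the target period; the model here is a trivial stand-in for the trained prediction model 310.

```python
# A stand-in "prediction model": any callable taking (sample, sample_time).
toy_model = lambda sample, t: (sum(sample), t)

def predict_with_time(model, sample, wall_clock_hours, period=24.0):
    # process 800: pair the target data sample with its target sample time,
    # i.e. the acquisition time folded into the target period, then predict
    t = wall_clock_hours % period
    return model(sample, t)

result = predict_with_time(toy_model, [1, 2], 26.0)  # sample obtained at hour 26
```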
  • the prediction model includes a non-periodic modeling part, a periodic modeling part, and an output layer.
  • the non-periodic modeling part is configured to extract an intermediate feature representation from an input data sample.
  • the periodic modeling part is configured to process the intermediate feature representation based on a sample time corresponding to the data sample within a target period to obtain a periodic feature representation.
  • the output layer is configured to determine a prediction result for the data sample based at least on the periodic feature representation.
  • the periodic modeling portion is configured to process the intermediate feature representation based on the sample time using a periodic Gaussian kernel function.
  • the periodic Gaussian kernel function is represented as a kernel function corresponding to the corresponding time in the target period.
  • the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a Fourier expansion function.
  • the non-periodic modeling part includes multiple prediction parts, which are configured to process input data samples and output multiple intermediate prediction results, and the multiple intermediate prediction results are cascaded as intermediate feature representations.
  • the prediction model is configured to predict the conversion result of the recommended item.
  • the first training data sample indicates at least the relevant information of the training recommended item, and the first sample time indicates the time when the real conversion result of the training recommended item is obtained within the target period.
  • the target data sample indicates at least the relevant information of the target recommended item to be recommended, and the target sample time indicates the time when the target recommended item is to be recommended within the target period.
  • the time when the actual conversion result is obtained is delayed relative to the time when the training recommendation item is recommended.
  • the prediction model is further trained during the first training process based on second training data samples, associated second sample times, and second annotation information, where the second sample time indicates a time at which the second annotation information is obtained within the target period.
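For the delayed-conversion case above, the first sample time can be derived by folding the (delayed) time at which the true conversion result arrives back into the target period. The helper below is a hypothetical illustration of that bookkeeping, not part of the disclosure.

```python
def label_sample_time(recommend_hour, conversion_delay_hours, period=24.0):
    # the true conversion result arrives with a delay after the recommendation;
    # the first sample time is where that annotation lands within the target period
    return (recommend_hour + conversion_delay_hours) % period
```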
  • FIG. 9 shows a schematic structural block diagram of an apparatus 900 for predicting periodic data according to some embodiments of the present disclosure.
  • the apparatus 900 may be implemented as or included in the model application system 120.
  • Each module/component in the apparatus 900 may be implemented by hardware, software, firmware, or any combination thereof.
  • the device 900 includes a model acquisition module 910, which is configured to acquire a trained prediction model, the prediction model is configured to process input data with a target period, and the prediction model is trained in a first training process based on at least a first training data sample, an associated first sample time, and a first annotation information, and the first sample time indicates the time when the first annotation information is obtained within the target period.
  • the device 900 also includes a target acquisition module 920, which is configured to acquire a target data sample and an associated target sample time, and the target sample time indicates the time when the target data sample is obtained within the target period.
  • the device 900 also includes a prediction execution module 930, which is configured to use the prediction model to determine a prediction result for the target data sample based on the target data sample and the target sample time.
  • the prediction model includes a non-periodic modeling part, a periodic modeling part, and an output layer.
  • the non-periodic modeling part is configured to extract intermediate feature representations from input data samples.
  • the periodic modeling part is configured to process the intermediate feature representation based on the sample time corresponding to the data sample within the target period, to obtain a periodic feature representation.
  • the output layer is configured to determine a prediction result for the data sample based at least on the periodic feature representation.
  • the periodic modeling portion is configured to process the intermediate feature representation based on the sample time using a periodic Gaussian kernel function.
  • the periodic Gaussian kernel function is represented as a kernel function corresponding to the corresponding time in the target period.
  • the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a Fourier expansion function.
  • the non-periodic modeling part includes multiple prediction parts, which are configured to process input data samples and output multiple intermediate prediction results, and the multiple intermediate prediction results are cascaded as intermediate feature representations.
  • the prediction model is configured to predict the conversion result of the recommended item.
  • the first training data sample indicates at least the relevant information of the training recommended item, and the first sample time indicates the time when the real conversion result of the training recommended item is obtained within the target period.
  • the target data sample indicates at least the relevant information of the target recommended item to be recommended, and the target sample time indicates the time when the target recommended item is to be recommended within the target period.
  • the time when the actual conversion result is obtained is delayed relative to the time when the training recommendation item is recommended.
  • the prediction model is further trained during the first training process based on second training data samples, associated second sample times, and second annotation information, where the second sample time indicates a time at which the second annotation information is obtained within the target period.
  • FIG. 10 shows a block diagram of an electronic device 1000 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 1000 shown in FIG. 10 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein.
  • the electronic device 1000 shown in FIG10 may be used to implement the model training system 110 and/or the model application system 120.
  • the electronic device 1000 may include or be implemented as the device 900 of FIG9 .
  • the electronic device 1000 is in the form of a general-purpose computing device.
  • the components of the electronic device 1000 may include, but are not limited to, one or more processors or processing units 1010, a memory 1020, a storage device 1030, one or more communication units 1040, one or more input devices 1050, and one or more output devices 1060.
  • Processing unit 1010 may be a real or virtual processor and is capable of performing various processes according to a program stored in memory 1020. In a multi-processor system, multiple processing units execute computer executable instructions in parallel to improve the parallel processing capability of electronic device 1000.
  • the electronic device 1000 typically includes a plurality of computer storage media. Such media may be any available media accessible to the electronic device 1000, including but not limited to volatile and non-volatile media, removable and non-removable media.
  • the memory 1020 may be a volatile memory (e.g., a register, a cache, a random access memory (RAM)), a non-volatile memory (e.g., a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof.
  • the storage device 1030 may be a removable or non-removable medium, and may include a machine-readable medium, such as a flash drive, a disk, or any other medium, which may be capable of being used to store information and/or data (e.g., training data for training) and may be accessed within the electronic device 1000.
  • the electronic device 1000 may further include additional removable/non-removable, volatile/non-volatile storage media.
  • a disk drive for reading or writing from a removable, non-volatile disk (e.g., a "floppy disk") and an optical drive for reading or writing from a removable, non-volatile optical disk may be provided.
  • each drive may be connected to a bus (not shown) by one or more data media interfaces.
  • the memory 1020 may include a computer program product 1025 having one or more program modules configured to perform various methods or actions of various embodiments of the present disclosure.
  • the communication unit 1040 implements communication with other electronic devices through a communication medium. Additionally, the functions of the components of the electronic device 1000 can be implemented in a single computing cluster or multiple computing machines that can communicate through a communication connection. Therefore, the electronic device 1000 can operate in a networked environment using a logical connection with one or more other servers, a network personal computer (PC), or another network node.
  • the input device 1050 may be one or more input devices, such as a mouse, a keyboard, a tracking ball, etc.
  • the output device 1060 may be one or more output devices, such as a display, speakers, a printer, etc.
  • the electronic device 1000 can also communicate with one or more external devices (not shown) through the communication unit 1040 as needed, such as storage devices, display devices, etc., communicate with one or more devices that allow users to interact with the electronic device 1000, or communicate with any device that allows the electronic device 1000 to communicate with one or more other electronic devices (e.g., a network card, a modem, etc.). Such communication can be performed via an input/output (I/O) interface (not shown).
  • a computer-readable storage medium on which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the method described above.
  • a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, and the computer-executable instructions are executed by a processor to implement the method described above.
  • These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, so that when these instructions are executed by the processing unit of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more boxes in the flowchart and/or block diagram is generated.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes a manufactured product, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
  • each block in the flowchart or block diagrams can represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.


Abstract

According to the embodiments of the present disclosure, a method and apparatus for predicting cyclic data, and a device and a medium are provided. The method comprises: acquiring a trained prediction model, wherein the prediction model is configured to process input data having a target cycle, the prediction model is trained during a first training process at least on the basis of a first training data sample, an associated first sample time, and first annotation information, and the first sample time indicates the time at which the first annotation information is obtained within the target cycle; acquiring a target data sample and an associated target sample time, wherein the target sample time indicates the time at which the target data sample is obtained within the target cycle; and using the prediction model to determine a prediction result for the target data sample on the basis of the target data sample and the target sample time.

Description

Method, Apparatus, Device and Medium for Prediction of Periodic Data
This application claims priority to Chinese invention patent application No. 202211447814.0, entitled "Method, apparatus, device and medium for prediction of periodic data", filed on November 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
Example embodiments of the present disclosure generally relate to the field of computer technology, and more particularly to methods, apparatuses, devices, and computer-readable storage media for prediction of periodic data.
Background
Periodic or cyclic data is often encountered in a wide range of machine learning scenarios. For example, in a recommendation system, it can be observed that users usually log in to an application within a relatively fixed time window each day (for example, before going to bed or after getting off work). Based on the recommendation strategy, the recommended content provided to users will exhibit a strong periodic pattern. In financial markets, asset prices may rise and fall cyclically every year, a phenomenon often referred to as "seasonality". In search engines, the search popularity or click volume of certain keywords can also show periodic patterns. It is therefore desirable that machine learning models be able to track and learn such periodic data and give correct prediction results.
Summary of the Invention
In a first aspect of the present disclosure, a method for prediction of periodic data is provided. The method comprises: obtaining a trained prediction model, the prediction model being configured to process input data having a target period, the prediction model being trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating the time at which the first annotation information is obtained within the target period; obtaining a target data sample and an associated target sample time, the target sample time indicating the time at which the target data sample is obtained within the target period; and determining, using the prediction model, a prediction result for the target data sample based on the target data sample and the target sample time.
In a second aspect of the present disclosure, an apparatus for prediction of periodic data is provided. The apparatus comprises: a model acquisition module configured to obtain a trained prediction model, the prediction model being configured to process input data having a target period, the prediction model being trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating the time at which the first annotation information is obtained within the target period; a target acquisition module configured to obtain a target data sample and an associated target sample time, the target sample time indicating the time at which the target data sample is obtained within the target period; and a prediction execution module configured to determine, using the prediction model, a prediction result for the target data sample based on the target data sample and the target sample time.
In a third aspect of the present disclosure, an electronic device is provided. The device comprises at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. A computer program is stored on the medium, and when executed by a processor, the computer program implements the method of the first aspect.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easily understood through the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. In the accompanying drawings, the same or similar reference numerals represent the same or similar elements, wherein:
FIG. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
FIG. 2 shows a comparison of example model training processes;
FIG. 3 shows an architecture for prediction of periodic data according to some embodiments of the present disclosure;
FIG. 4 shows a model training process according to some embodiments of the present disclosure;
FIG. 5 shows an example structure of a prediction model according to some embodiments of the present disclosure;
FIG. 6A shows prediction results using a periodic modeling part based on Fourier expansion under some example prediction tasks;
FIG. 6B shows an example pattern of a set of periodic kernel functions according to some embodiments of the present disclosure;
FIG. 6C shows an example pattern of a set of periodic kernel functions according to some other embodiments of the present disclosure;
FIGS. 7A and 7B show detailed structures of a prediction model according to some embodiments of the present disclosure;
FIG. 8 shows a flowchart of a process for prediction of periodic data according to some embodiments of the present disclosure;
FIG. 9 shows a block diagram of an apparatus for prediction of periodic data according to some embodiments of the present disclosure; and
FIG. 10 shows a block diagram of an electronic device in which one or more embodiments of the present disclosure may be implemented.
具体实施方式Detailed ways
下面将参照附图更详细地描述本公开的实施例。虽然附图中示出了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反,提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth herein. On the contrary, these embodiments are provided to provide a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes and are not intended to limit the scope of protection of the present disclosure.
在本公开的实施例的描述中，术语“包括”及其类似用语应当理解为开放性包含，即“包括但不限于”。术语“基于”应当理解为“至少部分地基于”。术语“一个实施例”或“该实施例”应当理解为“至少一个实施例”。术语“一些实施例”应当理解为“至少一些实施例”。下文还可能包括其他明确的和隐含的定义。In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions may also be included below.
可以理解的是,本技术方案所涉及的数据(包括但不限于数据本身、数据的获取或使用)应当遵循相应法律法规及相关规定的要求。It is understandable that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and relevant provisions.
可以理解的是,在使用本公开各实施例公开的技术方案之前,均应当根据相关法律法规通过适当的方式对本公开所涉及个人信息的类型、使用范围、使用场景等告知用户并获得用户的授权。It is understandable that before using the technical solutions disclosed in the embodiments of the present disclosure, the types, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure should be informed to the user and the user's authorization should be obtained in an appropriate manner in accordance with relevant laws and regulations.
例如,在响应于接收到用户的主动请求时,向用户发送提示信息,以明确地提示用户,其请求执行的操作将需要获取和使用到用户的个人信息,从而使得用户可以根据提示信息来自主地选择是否向执行本公开技术方案的操作的电子设备、应用程序、服务器或存储介质等软件或硬件提供个人信息。For example, in response to receiving an active request from a user, a prompt message is sent to the user to clearly prompt the user that the operation requested to be performed will require obtaining and using the user's personal information, so that the user can independently choose whether to provide personal information to software or hardware such as electronic devices, applications, servers or storage media that execute operations of the technical solution of the present disclosure based on the prompt message.
作为一种可选的但非限制性的实现方式,响应于接收到用户的主动请求,向用户发送提示信息的方式,例如可以是弹窗的方式,弹窗中可以以文字的方式呈现提示信息。此外,弹窗中还可以承载供用户选择“同意”或“不同意”向电子设备提供个人信息的选择控件。As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information is sent to the user in the form of a pop-up window, in which the prompt information can be presented in text form. In addition, the pop-up window can also carry a selection control for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
可以理解的是,上述通知和获取用户授权过程仅是示意性的,不对本公开的实现方式构成限定,其他满足相关法律法规的方式也可应用于本公开的实现方式中。It is understandable that the above notification and the process of obtaining user authorization are merely illustrative and do not constitute a limitation on the implementation of the present disclosure. Other methods that meet relevant laws and regulations may also be applied to the implementation of the present disclosure.
如本文中所使用的,术语“模型”可以从训练数据中学习到相应的输入与输出之间的关联关系,从而在训练完成后可以针对给定的输入,生成对应的输出。模型的生成可以基于机器学习技术。深度学习是一种机器学习算法,通过使用多层处理单元来处理输入和提供相应输出。神经网络模型是基于深度学习的模型的一个示例。在本文中,“模型”也可以被称为“机器学习模型”、“学习模型”、“机器学习网络”或“学习网络”,这些术语在本文中可互换地使用。As used herein, the term "model" can learn the association between the corresponding input and output from the training data, so that after the training is completed, the corresponding output can be generated for a given input. The generation of the model can be based on machine learning technology. Deep learning is a machine learning algorithm that processes inputs and provides corresponding outputs by using multi-layer processing units. A neural network model is an example of a model based on deep learning. In this article, "model" may also be referred to as "machine learning model", "learning model", "machine learning network" or "learning network", and these terms are used interchangeably in this article.
“神经网络”是一种基于深度学习的机器学习网络。神经网络能够处理输入并且提供相应输出，其通常包括输入层和输出层以及在输入层与输出层之间的一个或多个隐藏层。在深度学习应用中使用的神经网络通常包括许多隐藏层，从而增加网络的深度。神经网络的各个层按顺序相连，从而前一层的输出被提供作为后一层的输入，其中输入层接收神经网络的输入，而输出层的输出作为神经网络的最终输出。神经网络的每个层包括一个或多个节点（也称为处理节点或神经元），每个节点处理来自上一层的输入。A "neural network" is a machine learning network based on deep learning. A neural network can process an input and provide a corresponding output, and typically includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. Neural networks used in deep learning applications usually include many hidden layers, thereby increasing the depth of the network. The layers of a neural network are connected in sequence, so that the output of a previous layer is provided as the input of the next layer, where the input layer receives the input of the neural network and the output of the output layer serves as the final output of the neural network. Each layer of a neural network includes one or more nodes (also called processing nodes or neurons), each of which processes the input from the previous layer.
通常,机器学习大致可以包括三个阶段,即训练阶段、验证阶段和应用阶段(也称为推理阶段)。在训练阶段,给定的模型可以使用大量的训练数据进行训练,不断迭代更新参数值,直到模型能够从训练数据中获得一致的满足预期目标的推理。通过训练,模型可以被认为能够从训练数据中学习从输入到输出之间的关联(也称为输入到输出的映射)。训练后的模型的参数值被确定。在验证阶段,将验证输入应用到训练后的模型,验证模型是否能够提供正确的输出,从而确定模型的性能。在应用阶段,模型可以被用于基于训练得到的参数值,对实际的输入进行处理,确定对应的输出。Generally, machine learning can be roughly divided into three stages, namely the training stage, the verification stage, and the application stage (also called the inference stage). In the training stage, a given model can be trained using a large amount of training data, and the parameter values are continuously updated iteratively until the model can obtain consistent inferences that meet the expected goals from the training data. Through training, the model can be considered to be able to learn the association from input to output (also called input-to-output mapping) from the training data. The parameter values of the trained model are determined. In the verification stage, the verification input is applied to the trained model to verify whether the model can provide the correct output, thereby determining the performance of the model. In the application stage, the model can be used to process the actual input based on the parameter values obtained from the training to determine the corresponding output.
图1示出了能够在其中实现本公开的实施例的示例环境100的示意图。如图1所示，环境100包括模型训练系统110和模型应用系统120。在图1的示例实施例中，模型训练系统110被配置为利用训练数据集112来训练预测模型105。模型应用系统120可以被配置为应用训练后的预测模型105。FIG. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. As shown in FIG. 1, the environment 100 includes a model training system 110 and a model application system 120. In the example embodiment of FIG. 1, the model training system 110 is configured to train a prediction model 105 using a training data set 112. The model application system 120 can be configured to apply the trained prediction model 105.
在实际***中,预测模型105可以被配置为处理输入数据,并确定对应的预测结果。在每次预测时,预测模型105接收输入的数据样本并生成数据样本对应的预测结果。在此,“数据样本”指的是预测模型105可处理的输入数据的单位粒度。In an actual system, the prediction model 105 can be configured to process input data and determine corresponding prediction results. At each prediction, the prediction model 105 receives an input data sample and generates a prediction result corresponding to the data sample. Here, "data sample" refers to the unit granularity of input data that the prediction model 105 can process.
可以根据实际应用来定义预测模型105所要执行的预测任务。例如，在推荐系统中，预测任务是预测推荐项目的转化结果，并基于预测的转化结果来确定是否要向用户推荐该项目。在此，推荐项目可以是任何要被推荐的内容或资源，其示例可以包括应用、实体商品、虚拟商品、音视频内容，等等。推荐项目的转化结果可以基于具体项目以及实际需要来定义，一些示例转化结果例如可以包括点击、下载、注册、加入购物车、付费、激活、或其他资源需求行为。The prediction task to be performed by the prediction model 105 can be defined according to the actual application. For example, in a recommendation system, the prediction task is to predict the conversion result of a recommended item and determine whether to recommend the item to the user based on the predicted conversion result. Here, the recommended item can be any content or resource to be recommended, and examples thereof may include applications, physical goods, virtual goods, audio and video content, etc. The conversion result of the recommended item can be defined based on the specific item and actual needs. Some example conversion results may include clicks, downloads, registration, adding to a shopping cart, payment, activation, or other resource demand behaviors.
在与推荐相关的预测任务中,输入到预测模型105的数据样本可以至少包括与推荐项目相关的信息。在一些情况下,数据样本还可以包括与要推荐的用户相关的信息。预测模型105输出的预测结果可以是推荐项目在被推荐的情况下会发生转化的概率,或者特定用户针对该推荐项目发生转化的概率,等等。In the prediction task related to recommendation, the data sample input to the prediction model 105 may include at least information related to the recommended item. In some cases, the data sample may also include information related to the user to be recommended. The prediction result output by the prediction model 105 may be the probability that the recommended item will be converted when recommended, or the probability that a specific user will be converted for the recommended item, etc.
作为另一个例子，在金融应用中，预测模型105的预测任务可以是预测产品在未来时间的销售量。在该示例中，输入到预测模型105的数据样本可以包括未来时间、与产品和/或其他相关产品相关的信息、产品和/或者其他相关产品的历史销售量、与产品的目标地理区域和目标用户相关的信息等等。预测模型105的输出可以包括产品在某个时间的预测销售量。As another example, in a financial application, the prediction task of the prediction model 105 may be to predict the sales volume of a product at a future time. In this example, the data samples input to the prediction model 105 may include the future time, information related to the product and/or other related products, the historical sales volume of the product and/or other related products, information related to the target geographic area and target users of the product, and so on. The output of the prediction model 105 may include the predicted sales volume of the product at a certain time.
应当理解，上面仅列出了若干可能的示例，并且预测模型105可以被配置为实现任何其他预测任务。在下文中，出于解释说明的目的，以推荐系统的应用场景为例进行说明，但应当理解本公开的实施例可以被应用到其他具有类似特性的预测任务中。It should be understood that only several possible examples are listed above, and the prediction model 105 can be configured to implement any other prediction task. In the following, for the purpose of explanation, the application scenario of a recommendation system is used as an example, but it should be understood that the embodiments of the present disclosure can be applied to other prediction tasks with similar characteristics.
预测模型105可以被构造为能够处理输入的数据样本并生成输出作为预测结果的函数。预测模型105可以配置有一组参数，这些参数的值将通过训练过程从训练数据中学习。The prediction model 105 may be constructed as a function that processes an input data sample and generates an output as the prediction result. The prediction model 105 may be configured with a set of parameters, the values of which are learned from the training data through a training process.
在训练时,所使用的训练数据集112可以包括提供给预测模型105的训练数据样本114,以及指示训练数据样本114的对应真实预测结果的标注信息116。虽然图1仅示出了一对训练数据样本及其标注信息,但在训练时可能需要一定数量的训练数据样本以及标注信息。During training, the training data set 112 used may include training data samples 114 provided to the prediction model 105, and annotation information 116 indicating the corresponding true prediction results of the training data samples 114. Although FIG. 1 only shows a pair of training data samples and their annotation information, a certain number of training data samples and annotation information may be required during training.
在一些实施例中，使用目标函数来衡量预测模型105针对训练数据样本114给出的输出与标注信息116之间的误差（或距离）。这种误差也称为机器学习的损失，目标函数也可以称为损失函数。损失函数可以表示为l(f(x),y)，其中x表示训练数据样本，f()表示机器学习模型，f(x)表示预测模型的输出，y表示x的标注信息，指示x的真实预测结果。在训练期间，预测模型105的参数值被更新以减少从目标函数计算的误差。在目标函数被优化，例如，所计算的误差被最小化或达到期望的阈值时，学习目标完成。In some embodiments, an objective function is used to measure the error (or distance) between the output given by the prediction model 105 for the training data sample 114 and the annotation information 116. This error is also called the loss of machine learning, and the objective function may also be called a loss function. The loss function can be expressed as l(f(x), y), where x represents the training data sample, f() represents the machine learning model, f(x) represents the output of the prediction model, and y represents the annotation information of x, indicating the true prediction result of x. During training, the parameter values of the prediction model 105 are updated to reduce the error calculated from the objective function. The learning objective is accomplished when the objective function is optimized, for example, when the calculated error is minimized or reaches a desired threshold.
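As a concrete illustration of the loss-driven parameter update described above, the following sketch trains a one-parameter model by gradient descent. The linear model f(x) = w·x, the squared-error loss, and the learning rate are illustrative assumptions only, not the prediction model of the present disclosure.

```python
# Illustrative sketch (assumption: a one-parameter linear model and a
# squared-error loss stand in for the prediction model f and the loss l(f(x), y)).

def predict(w, x):
    # f(x) = w * x, the model output for input x.
    return w * x

def loss(pred, y):
    # l(f(x), y): squared error between prediction and annotation.
    return (pred - y) ** 2

def sgd_step(w, x, y, lr=0.1):
    # Gradient of (w*x - y)^2 w.r.t. w is 2*(w*x - y)*x; step against it.
    grad = 2 * (predict(w, x) - y) * x
    return w - lr * grad

w = 0.0
sample, label = 1.0, 2.0          # training data sample x and its annotation y
before = loss(predict(w, sample), label)
for _ in range(50):               # iteratively update the parameter value
    w = sgd_step(w, sample, label)
after = loss(predict(w, sample), label)
assert after < before             # the objective decreases during training
```

The loop here simply exhausts a fixed iteration budget; as the text notes, training stops when the computed error is minimized or falls below a desired threshold.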
在训练过程之后，可以将用更新的参数值配置的经训练的预测模型105提供给模型应用系统120，该模型应用系统120将实际需要预测的目标数据样本122应用于预测模型105，以输出目标数据样本122的预测结果124。After the training process, the trained prediction model 105 configured with the updated parameter values may be provided to the model application system 120, which applies the target data sample 122 that actually needs to be predicted to the prediction model 105 to output a prediction result 124 of the target data sample 122.
在图1中，模型训练系统110和模型应用系统120可以是任何具有计算能力的系统，例如各种计算设备/系统、终端设备、服务器等。终端设备可以是任意类型的移动终端、固定终端或便携式终端，包括移动手机、台式计算机、膝上型计算机、笔记本计算机、上网本计算机、平板计算机、媒体计算机、多媒体平板、或者前述各项的任意组合，包括这些设备的配件和外设或者其任意组合。服务器包括但不限于大型机、边缘计算节点、云环境中的计算设备，等等。In FIG. 1, the model training system 110 and the model application system 120 can be any system with computing capabilities, such as various computing devices/systems, terminal devices, servers, etc. The terminal device can be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile phone, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a media computer, a multimedia tablet, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof. Servers include but are not limited to mainframes, edge computing nodes, computing devices in cloud environments, and the like.
应当理解，图1示出的环境中的部件和布置仅是示例，适于用于实现本公开所描述的示例实施例的计算系统可以包括一个或多个不同的部件、其他部件和/或不同的布置方式。例如，虽然被示出为是分离的，但模型训练系统110和模型应用系统120可以集成在相同系统或设备中。本公开的实施例在此方面不受限制。It should be understood that the components and arrangements in the environment shown in FIG. 1 are merely examples, and a computing system suitable for implementing the example embodiments described in the present disclosure may include one or more different components, other components, and/or different arrangements. For example, although shown as separate, the model training system 110 and the model application system 120 may be integrated into the same system or device. The embodiments of the present disclosure are not limited in this respect.
应当理解,仅出于示例性的目的描述环境100中各个元素的结构和功能,而不暗示对于本公开的范围的任何限制。It should be understood that the structure and function of the various elements in the environment 100 are described for exemplary purposes only and do not imply any limitation on the scope of the present disclosure.
在某些情况下,由机器学习模型处理的输入数据可能具有一定的周期性。这种数据称为周期性或周期性数据。例如,在推荐场景中,用户群体的行为随着时间有周期性。应用程序的用户可能通常在每天相对固定的时间窗口内(例如,睡前和下班后)登录应用程序或者在周末才会频繁使用应用程序,并在不同日子在同一时间窗口显示相同的兴趣。这些规律行为对预测模型造成的影响就是预测模型在相邻两个周期的同一时刻往往非常相似。In some cases, the input data processed by the machine learning model may have a certain periodicity. This kind of data is called periodic or cyclical data. For example, in the recommendation scenario, the behavior of the user group is periodic over time. The users of the application may usually log in to the application in a relatively fixed time window every day (for example, before bed and after get off work) or use the application frequently on weekends, and show the same interests in the same time window on different days. The impact of these regular behaviors on the prediction model is that the prediction model is often very similar at the same time in two adjacent cycles.
如果构建普通预测模型f(x)，不考虑数据样本的时间特征，那么预测结果难以反映周期性特点。因此，需要利用能够建模周期性数据的预测模型。对于处理周期性数据的预测模型，将会引入与输入的数据样本相关联的时间，作为时间特征。用于处理周期性数据的预测模型可以被表示为f(x,t)，其中x表示输入到模型的数据样本，t表示数据样本x在一个周期内的获得时间。这样，对于在不同时间的数据样本，可以实施不同的预测策略。If an ordinary prediction model f(x) is constructed without considering the time feature of the data samples, the prediction results can hardly reflect the periodic characteristics. Therefore, a prediction model capable of modeling periodic data is needed. For a prediction model that processes periodic data, the time associated with the input data sample is introduced as a time feature. A prediction model for processing periodic data can be expressed as f(x, t), where x represents the data sample input to the model and t represents the time within a period at which the data sample x was obtained. In this way, different prediction strategies can be implemented for data samples at different times.
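One minimal way to realize a time-aware predictor f(x, t) of the kind described above is to condition the prediction on the position of t within the period. The hour-of-day bucketing below is an illustrative assumption, not the periodic modeling structure of the present disclosure.

```python
# Illustrative sketch: f(x, t) conditioned on hour-of-day buckets
# (assumption: a one-day period split into 24 buckets; a per-bucket
# label mean stands in for a real model over the sample features x).

PERIOD_HOURS = 24

class TimeAwarePredictor:
    def __init__(self):
        self.sums = [0.0] * PERIOD_HOURS   # running label sums per bucket
        self.counts = [0] * PERIOD_HOURS   # sample counts per bucket

    def update(self, x, t, y):
        # Train only the part of the model associated with time t.
        bucket = t % PERIOD_HOURS
        self.sums[bucket] += y
        self.counts[bucket] += 1

    def predict(self, x, t):
        bucket = t % PERIOD_HOURS
        if self.counts[bucket] == 0:
            return 0.5                     # prior when nothing seen at this time
        return self.sums[bucket] / self.counts[bucket]

m = TimeAwarePredictor()
m.update(x=None, t=9, y=1.0)    # daytime sample converted
m.update(x=None, t=21, y=0.0)   # nighttime sample did not convert
assert m.predict(None, 9) == 1.0 and m.predict(None, 21) == 0.0
```

Because the parameters are partitioned by t, different prediction strategies apply at different times within the period, as stated above.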
另一方面，除周期性之外，需要执行预测的输入数据在一个周期内表现出的特性可能也会发生改变。例如，在一段时间内用户总是在白天展示出对某些内容的兴趣，但在另一段时间用户可能在夜晚展示出对应的兴趣，或者在白天展示出对另一些内容的兴趣。这就需要不断更新预测模型，以使模型能够追踪到将要处理的输入数据的更新特性。因此，在一次训练完成后，已被投入使用的预测模型可能还会需要利用新的训练数据来重新训练。在这种情况下，预测模型的训练其实可以被认为有两个阶段。第一阶段是利用大量历史训练数据进行模型训练的阶段，称为批式训练。在训练后的模型投入使用后，还根据后续到来的训练数据继续更新模型。这个过程叫做流式训练。On the other hand, in addition to periodicity, the characteristics exhibited within one period by the input data to be predicted may also change. For example, during one period of time users may always show interest in certain content during the day, while during another period of time they may show the corresponding interest at night, or show interest in other content during the day. This requires continuously updating the prediction model so that it can track the updated characteristics of the input data to be processed. Therefore, after one round of training is completed, a prediction model that has already been put into use may still need to be retrained with new training data. In this case, the training of the prediction model can be considered to have two stages. The first stage is model training on a large amount of historical training data, called batch training. After the trained model is put into use, the model continues to be updated with subsequently arriving training data. This process is called streaming training.
利用输入数据的周期性来训练更好的预测模型的问题可以如下设置。给定由三元组(x,y,t)表示的样本，其中x是用于模型输入的训练数据样本，y是训练数据样本的预测结果，并且t是在一定周期内训练数据样本x的获得时间，期望学习可以预测任意给定时间t的预测模型（表示为f）。训练数据样本可能也是以循环方式到达模型训练系统。更具体地说，在模型的两次连续更新之间，只有在更新间隔内可获得的样本才能够被用于训练。这可能会导致对周期性数据进行建模的预测模型在学习过程中的一定预估误差。The problem of exploiting the periodicity of input data to train a better prediction model can be set up as follows. Given samples represented by triples (x, y, t), where x is a training data sample for model input, y is the prediction result of the training data sample, and t is the time within a certain period at which the training data sample x was obtained, it is desired to learn a prediction model (denoted as f) that can make predictions at any given time t. The training data samples may also arrive at the model training system in a cyclic manner. More specifically, between two consecutive updates of the model, only samples that become available within the update interval can be used for training. This may lead to certain estimation errors in the learning process of a prediction model that models periodic data.
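The two-stage training and the update-interval constraint described above can be sketched as follows. The triple layout (x, y, t) follows the text, while the window filtering and the stand-in counting model are assumptions for illustration.

```python
# Illustrative sketch of batch training over historical triples (x, y, t)
# followed by a streaming update that may only use samples that became
# available within the current update interval (interval bounds assumed).

class CountingModel:
    """Stand-in model that only records how many samples trained it."""
    def __init__(self):
        self.seen = 0
    def update(self, x, y, t):
        self.seen += 1

def batch_train(model, history):
    # Stage 1: batch training on a large amount of historical data.
    for x, y, t in history:
        model.update(x, y, t)
    return model

def streaming_update(model, stream, interval_start, interval_end):
    # Stage 2: between two consecutive model updates, only triples whose
    # availability time falls inside the interval can be used for training.
    window = [s for s in stream if interval_start <= s[2] < interval_end]
    for x, y, t in window:
        model.update(x, y, t)
    return len(window)

m = batch_train(CountingModel(), [("a", 1, 0), ("b", 0, 5)])
used = streaming_update(m, [("c", 1, 10), ("d", 0, 30)], 0, 24)
assert m.seen == 3 and used == 1   # the t=30 sample waits for a later interval
```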
下面将举一个具体示例来说明这个问题。The following is a specific example to illustrate this problem.
在推荐场景中,在推荐平台中提供推荐项目后(例如推荐给用户后),与本次推荐对应的转化结果会被反馈至推荐平台。转化结果被反馈给推荐平台,称为转化回流。推荐项目被提供的时间称为推荐项目的“发送时间”,转化结果的反馈时间称为推荐项目的“回流时间”。In the recommendation scenario, after providing the recommended item in the recommendation platform (for example, after recommending it to the user), the conversion result corresponding to this recommendation will be fed back to the recommendation platform. The conversion result is fed back to the recommendation platform, which is called conversion return. The time when the recommended item is provided is called the "sending time" of the recommended item, and the feedback time of the conversion result is called the "return time" of the recommended item.
有时候与一次推荐对应的转化结果可能并不总是被实时反馈的。转化结果的非实时可获得性称为“转化回流延迟”，或反馈延迟。造成转化回流延迟的原因有多种，例如某些推荐项目的转化行为是在接收到推荐项目一段时间后才发生（例如，付费行为、激活行为等），或者转化结果被故意延迟以实现隐私保护，等等。此外，对于不同推荐项目或者不同转化结果，具体的转化回流延迟也不同。例如，对于某个推荐项目，如果用户未发生转化，那么指示未转化的转化结果可能会被实时反馈。如果用户在一段时间后执行了转化，那么指示成功转化的转化结果可能会在延迟一段时间后才被反馈。Sometimes the conversion result corresponding to a recommendation may not always be fed back in real time. The non-real-time availability of conversion results is called "conversion return delay", or feedback delay. There are many reasons for conversion return delay; for example, the conversion behavior for some recommended items occurs only some time after the recommended item is received (for example, payment behavior, activation behavior, etc.), or the conversion result is deliberately delayed for privacy protection, and so on. In addition, the specific conversion return delay differs for different recommended items or different conversion results. For example, for a certain recommended item, if the user does not convert, a conversion result indicating no conversion may be fed back in real time. If the user converts after a period of time, a conversion result indicating a successful conversion may be fed back only after a delay.
一般情况下,在推荐项目被发送时,由于只知道推荐项目相关的信息(以及用户的信息)而不知道该用户是否执行转化,即只知道模型输入的数据样本,而不知道该数据样本的标注信息,所以该推荐项目无法被应用于训练预测模型。只有在转化回流后,才能构造三元组(x,y,t)用于预测模型的训练。这个原因将会导致训练得到的预测模型的预测结果不准确。Generally, when a recommendation item is sent, only the information related to the recommendation item (and the user's information) is known, but it is unknown whether the user has performed the conversion, that is, only the data sample input to the model is known, but the annotation information of the data sample is unknown, so the recommendation item cannot be used to train the prediction model. Only after the conversion flow is returned can the triple (x, y, t) be constructed for the training of the prediction model. This reason will lead to inaccurate prediction results of the trained prediction model.
下面结合图2，通过一个较为极端的例子来证明由于训练样本的延迟导致的误差。假设存在作为基准的预测模型210，f(x)，该模型不对数据的周期性进行建模，但会持续用最新获得标注的数据进行更新。此外还存在对数据的周期性进行建模的预测模型220，f(x,t)。假设在一个周期（假设为一天）内时间t只有两个取值，t=0或1，其中f(x,t=0)表示预测模型在白天对数据样本x执行的预测，f(x,t=1)表示预测模型在夜晚对数据样本x执行的预测。如果数据样本x对应的标注信息y=1，表示数据样本的真实预测结果（例如，指示发生转化），其具有反馈延迟1（假设，延迟12小时）。反之，如果数据样本x对应的标注信息y=0，表示数据样本的真实预测结果（例如，指示未发生转化），其具有反馈延迟0（假设，延迟0小时）。具有标注信息y=1的数据样本x可以称为正样本，具有标注信息y=0的数据样本x可以称为负样本。Below, a rather extreme example is used in conjunction with FIG. 2 to illustrate the error caused by the delay of training samples. Assume there is a baseline prediction model 210, f(x), which does not model the periodicity of the data but is continuously updated with the most recently labeled data. There is also a prediction model 220, f(x, t), which models the periodicity of the data. Assume that within one period (here, one day) the time t takes only two values, t=0 or t=1, where f(x, t=0) denotes the prediction the model performs on data sample x during the daytime, and f(x, t=1) denotes the prediction performed at night. If the annotation information corresponding to data sample x is y=1, indicating the true prediction result of the data sample (for example, that a conversion occurred), it has a feedback delay of 1 (here assumed to be a 12-hour delay). Conversely, if the annotation information corresponding to data sample x is y=0, indicating the true prediction result (for example, that no conversion occurred), it has a feedback delay of 0 (that is, a 0-hour delay). A data sample x with annotation information y=1 may be referred to as a positive sample, and a data sample x with annotation information y=0 may be referred to as a negative sample.
假设在前一周期（例如，前一天）的夜晚t=1提供数据样本211，并且在当前周期（例如，今天）的白天t=0提供数据样本212。在当前周期t=0的时间处，例如今天的白天，如果要对预测模型执行训练，那么此时可以获得的是前一周期t=1被发送的数据样本及其标注信息，即正样本231，以及当前周期t=0被发送的数据样本及其标注信息，即负样本232。对于预测模型210，由于没有时间特征t的存在，所以这两类样本被共同用于更新预测模型。假设这两类样本数量为1:1，那么预测模型210学到的样本均值为0.5。对于预测模型220，正样本211会被用于更新预测模型220中与t=1相关的部分，即f(x,t=1)，而负样本212会被用于更新预测模型220中与t=0相关的部分，即f(x,t=0)。这样，f(x,t=1)学习到的样本均值为1，而f(x,t=0)学习到的样本均值为0。Assume that data sample 211 is provided at night (t=1) in the previous cycle (e.g., the previous day), and data sample 212 is provided in the daytime (t=0) of the current cycle (e.g., today). At time t=0 of the current cycle, for example during today's daytime, if the prediction model is to be trained, then what is available at this time is the data sample sent at t=1 of the previous cycle together with its annotation information, namely the positive sample 231, and the data sample sent at t=0 of the current cycle together with its annotation information, namely the negative sample 232. For the prediction model 210, since there is no time feature t, these two types of samples are used together to update the prediction model. Assuming the two types of samples are in a 1:1 ratio, the sample mean learned by the prediction model 210 is 0.5. For the prediction model 220, the positive sample 211 is used to update the part of the prediction model 220 related to t=1, i.e., f(x, t=1), while the negative sample 212 is used to update the part related to t=0, i.e., f(x, t=0). Thus, the sample mean learned by f(x, t=1) is 1, while the sample mean learned by f(x, t=0) is 0.
在更新过后,由于当前时间是白天t=0,那么预测模型220将会需要用训练后的f(x,t=0)来执行预测,这就导致对于当前时间获得的目标数据样本做出错误估计,即预测结果是低估(0<0.5)。当然,如果在图2的示例中,负样本的标注信息的延迟较大而正样本的标注信息的延迟较小,那么在更新预测模型220后将会导致预测结果的高估(例如,1>0.5)。After the update, since the current time is daytime t=0, the prediction model 220 will need to use the trained f(x, t=0) to perform prediction, which will lead to an incorrect estimate of the target data sample obtained at the current time, that is, the prediction result is underestimated (0<0.5). Of course, if in the example of FIG. 2, the delay of the labeling information of the negative sample is large and the delay of the labeling information of the positive sample is small, then updating the prediction model 220 will lead to an overestimation of the prediction result (for example, 1>0.5).
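The day/night toy example above can be checked numerically. The sample counts and delay pattern below follow the text (positives sent at t=1 with delayed labels, negatives sent at t=0 with immediate labels, in a 1:1 ratio); everything else is an illustrative assumption.

```python
# Illustrative check of the biased update: at the daytime training step,
# the labeled data consists of delayed positives from t=1 and immediate
# negatives from t=0, in the 1:1 ratio assumed in the text.

positives = [(1, 1.0)] * 10   # (send time t, label y): night samples, labels delayed 12h
negatives = [(0, 0.0)] * 10   # day samples, labels returned immediately

# Time-agnostic baseline f(x): pools everything it has seen.
pooled = [y for _, y in positives + negatives]
baseline_mean = sum(pooled) / len(pooled)

# Time-conditioned model f(x, t): routes each sample by its send time.
by_time = {0: [], 1: []}
for t, y in positives + negatives:
    by_time[t].append(y)
mean_t0 = sum(by_time[0]) / len(by_time[0])
mean_t1 = sum(by_time[1]) / len(by_time[1])

assert baseline_mean == 0.5   # what f(x) learns
assert mean_t0 == 0.0         # f(x, t=0) under-estimates at the current time
assert mean_t1 == 1.0         # f(x, t=1) sees only the delayed positives
```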
如果数据样本的标注信息的反馈时间没有被延迟，那么对于预测模型220的训练将会有与模型210的相同效果。然而，在针对具有周期性的输入数据的很多预测场景中，训练数据样本的标注信息的获得可能总是存在延迟。发明人通过大量研究实验发现，如果使用训练数据样本的获得时间作为时间特征，则在AB实验中会存在预测误差。这是因为，在AB实验的过程中，当前训练的预测模型需要被即时应用到对当前到来的数据样本进行预估，这样就会导致反馈延迟较长的样本更难被及时用于模型的学习与更新，从而导致在AB实验过程中观察到的模型预估性能主要由短时间内反馈的样本决定。如果反馈延迟较长的样本与反馈延迟较短的样本之间有较大差异，则会出现预测模型在线上即时到来数据上的预估性能与在已反馈完全的离线数据上的预估性能的差异。If the feedback time of the annotation information of the data samples were not delayed, training the prediction model 220 would have the same effect as training the model 210. However, in many prediction scenarios for periodic input data, there may always be a delay in obtaining the annotation information of training data samples. Through extensive research experiments, the inventors found that if the acquisition time of the training data sample is used as the time feature, prediction errors appear in A/B experiments. This is because, during an A/B experiment, the currently trained prediction model needs to be applied immediately to estimate the currently arriving data samples, so samples with longer feedback delays are harder to use in time for model learning and updating; as a result, the model estimation performance observed during the A/B experiment is mainly determined by samples fed back within a short time. If there is a large difference between samples with longer feedback delays and samples with shorter feedback delays, there will be a discrepancy between the prediction model's estimation performance on online, immediately arriving data and its performance on offline data for which feedback is complete.
根据本公开的实施例，提供了一种改进的方案，以解决处理周期性数据的预测模型的更新问题。在该方案中，对于被配置为处理具有目标周期的输入数据的预测模型，在训练过程中，如果训练数据样本的标注信息的反馈有延迟，将训练数据样本对应的标注信息的获得时间作为该训练数据样本的时间特征来进行模型训练。对于这一类训练数据样本，预测模型的输入是训练数据样本本身以及样本时间，该样本时间指示在目标周期内标注信息的获得时间。这样，可以基于已获得的标注信息来确定对预测模型的更新。在训练完成后，经训练的预测模型被应用于处理目标数据样本并确定对应的预测结果，该目标数据样本的时间特征为在目标周期内该数据样本的获得时间。通过在训练时利用标注信息的获得时间来代替输入的数据样本的获得时间，能够有效改善标注信息反馈延迟导致的模型学习效果差（尤其是在AB实验过程中对线上数据进行预估时的表现差）以及预估平均值不准确的情况，从而能够在实际预测中输出更准确的预测结果。According to an embodiment of the present disclosure, an improved solution is provided to solve the problem of updating a prediction model that processes periodic data. In this solution, for a prediction model configured to process input data with a target period, during the training process, if the feedback of the annotation information of a training data sample is delayed, the time at which the annotation information corresponding to the training data sample was obtained is used as the time feature of that training data sample for model training. For this type of training data sample, the input of the prediction model is the training data sample itself and a sample time, which indicates the time within the target period at which the annotation information was obtained. In this way, the update of the prediction model can be determined based on the annotation information already obtained. After training is completed, the trained prediction model is applied to process a target data sample and determine the corresponding prediction result, where the time feature of the target data sample is the time within the target period at which the data sample was obtained. By using the time at which the annotation information was obtained, instead of the time at which the input data sample was obtained, during training, the poor model learning effect caused by delayed feedback of annotation information (in particular, poor performance when predicting on online data during A/B experiments) and the resulting inaccurate prediction averages can be effectively mitigated, so that more accurate prediction results can be output in actual prediction.
以下将继续参考附图描述本公开的一些示例实施例。Some example embodiments of the present disclosure will be described below with continued reference to the accompanying drawings.
图3示出了根据本公开的一些实施例的用于周期性数据的预测的架构。假设预测模型310被配置为处理具有目标周期的输入数据。这样的预测模型会被构建为以数据样本和样本时间作为模型输入，并且输出对应的预测结果。目标周期可以取决于具体应用来设置，样本时间指的是目标周期内的一个时间，其可以是目标周期内任意粒度的时间点或时间段。下文中首先描述模型的训练，然后再讨论具有周期性建模能力的预测模型310的详细架构。FIG. 3 shows an architecture for prediction of periodic data according to some embodiments of the present disclosure. Assume that the prediction model 310 is configured to process input data with a target period. Such a prediction model is constructed to take a data sample and a sample time as model inputs and to output a corresponding prediction result. The target period can be set depending on the specific application, and the sample time refers to a time within the target period, which can be a time point or time segment of any granularity within the target period. The following first describes the training of the model and then discusses the detailed architecture of the prediction model 310 with periodic modeling capability.
根据本公开的实施例，期望对预测模型310执行多次迭代更新。在每次更新时，利用新采集到的训练数据执行针对预测模型310的训练过程。在每次迭代中，在训练阶段，预测模型310的训练数据可以包括训练数据样本302-1、302-2、……302-N（N为大于等于1的整数），为便于讨论，统称为或单独称为训练数据样本302。为了完成训练，每个训练数据样本还需要对应的标注信息304-1、304-2、……304-N，为便于讨论，统称为或单独称为标注信息304。标注信息304指示对应的训练数据样本302的真实预测结果。例如，在推荐场景下，如果训练数据样本302是与推荐项目相关的信息，标注信息304可以指示推荐项目的真实转化结果。According to an embodiment of the present disclosure, multiple iterative updates are expected to be performed on the prediction model 310. At each update, the training process for the prediction model 310 is performed using newly collected training data. In each iteration, during the training phase, the training data of the prediction model 310 may include training data samples 302-1, 302-2, ..., 302-N (N being an integer greater than or equal to 1), which are collectively or individually referred to as training data samples 302 for ease of discussion. To complete the training, each training data sample also requires corresponding annotation information 304-1, 304-2, ..., 304-N, which are collectively or individually referred to as annotation information 304 for ease of discussion. The annotation information 304 indicates the true prediction result of the corresponding training data sample 302. For example, in a recommendation scenario, if a training data sample 302 is information related to a recommended item, the annotation information 304 may indicate the true conversion result of the recommended item.
During training, the prediction model 310 processes an input training data sample 302 and the corresponding sample time based on the current model parameters and produces a prediction result. The update module 312 may update the prediction model 310 based on the error between the prediction result for the training data sample 302 and the corresponding annotation information 304. Through iterative updates, the prediction model 310 learns the characteristics exhibited by the training data samples 302 and can thus produce more accurate prediction results subsequently.
In some cases, because the feedback of the annotation information for training data samples is delayed, the annotation information obtained at the current time t may correspond to a training data sample obtained at time t-Δ, where Δ denotes the delay. As discussed above in conjunction with FIG. 2, if the acquisition time of the training data sample is used as the sample time input to the prediction model, the trained prediction model will exhibit prediction errors on those data samples whose annotation information has not yet been fully fed back. In embodiments of the present disclosure, among the training data samples 302 used in the training process of the prediction model 310, at least some training data samples 302 have sample times that indicate the acquisition time, within the target period, of the corresponding annotation information 304. This means that, despite the feedback delay of the annotation information, at the current time the acquisition time of the delayed annotation information can be used as the sample time of the training data sample and fed jointly into the training of the prediction model 310. In this way, the prediction model 310 can learn the characteristics of the periodic input data for the current time from more comprehensive training data, thereby preventing the prediction model from overestimating or underestimating certain data samples during model application.
In some embodiments, among all the training data samples 302 of the prediction model 310, if the annotation information of some data samples has no feedback delay, the acquisition time of those data samples 302 can be used directly as the sample time input to the prediction model 310 at the current time. That is, the inputs of the prediction model 310 can be divided into two categories. The first category includes a first training data sample and a first sample time, where the first sample time indicates the acquisition time of the annotation information of the first training data sample. The second category includes a second training data sample and a second sample time, where the second sample time indicates the acquisition time of the second training data sample. In some embodiments, during training, the inputs of the prediction model 310 may also always use training data samples together with the acquisition times of their annotation information (i.e., first sample times).
In a recommendation application, the prediction model 310 is configured to predict the conversion result of a recommended item. Accordingly, the first training data sample indicates at least information about a training recommended item (and possibly also information about a user), and the first sample time indicates the time, within the target period, at which the actual conversion result of the training recommended item was obtained. The actual conversion result is the annotation information of the first training data sample, and its acquisition time is delayed relative to the time at which the training recommended item was recommended. In some embodiments, a plurality of first training data samples, their annotation information, and their respective first sample times may be used to train the prediction model 310. In some embodiments, second training data samples may additionally be used, where a second training data sample likewise indicates at least information about a training recommended item (and possibly also information about a user), and the associated second sample time indicates the time, within the target period, at which the training recommended item was obtained.
The training of the prediction model according to some embodiments of the present disclosure can be understood more clearly in conjunction with FIG. 4. The example of FIG. 4 takes place in a scenario similar to that of FIG. 2. For the positive sample 231, although the acquisition time of the data sample is t=1 (for example, the recommended item was sent at time t=1 of the previous period), when the prediction model 310 is trained at time t=0 of the current period, the acquisition time tf=0 of the annotation information can be used as its sample time. In this way, the positive sample 231 and its sample time t=tf=0 can be used to train the portion f(x, t=0) of the prediction model 310 responsible for processing t=0 within a period. Moreover, at time t=0 of the current period, since the feedback of the annotation information is not delayed, the negative sample 232 can likewise be obtained for model training. The sample time of the negative sample 232 is the acquisition time t=0 of the data sample. As a result, f(x, t=0) in the trained prediction model 310 can be used to make accurate predictions on data samples at time t=0 within the period.
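The sample-time assignment described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the disclosed implementation: the `Event` fields, the 24-hour period, and the `training_inputs` helper are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

PERIOD = 24.0  # assumed target period T, e.g. one day measured in hours

@dataclass
class Event:
    features: list                      # data sample x
    obtained_at: float                  # absolute time the sample was obtained
    label: Optional[int] = None         # annotation; None while feedback pends
    labeled_at: Optional[float] = None  # absolute time the annotation arrived

def training_inputs(events):
    """Build (features, sample_time, label) triples for one training pass.

    For a delayed-feedback sample the sample time is the *annotation*
    acquisition time; a sample labeled without delay keeps its own
    acquisition time. Times are reduced into the target period.
    """
    batch = []
    for e in events:
        if e.label is None:
            continue                    # feedback still pending: skip for now
        t = e.labeled_at if e.labeled_at > e.obtained_at else e.obtained_at
        batch.append((e.features, t % PERIOD, e.label))
    return batch
```

With this pairing, a positive sample sent at absolute hour 1 but labeled at hour 24 is trained with sample time t=0, matching the FIG. 4 example.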
Note that although FIG. 4 shows a feedback delay in the annotation information of positive samples, in other cases the annotation information of negative samples may be delayed, or the annotation information of both positive and negative samples may be subject to the same or different feedback delays. Note also that although the description is given in terms of prediction in a recommendation scenario, any other prediction scenario is equally applicable. The model training approach proposed in the embodiments of the present disclosure can be applied to any scenario in which, when learning from periodic data, there is a delay between the time the input data is obtained and the time its annotation information is obtained.
Referring back to FIG. 3, in the training phase, the training of the prediction model 310 may be implemented, for example, by the model training system 110 in the environment 100 of FIG. 1, and the update module 312 may be implemented as part of the model training system 110.
After being trained through a training process, the trained prediction model 310 can be put into application, for example by the model application system 120 of FIG. 1. As shown in FIG. 3, in the application phase, a target data sample 304 to be predicted and the corresponding target sample time are obtained as inputs to the prediction model 310, where the target sample time indicates the acquisition time of the target data sample 304. For example, in a recommendation scenario, the target data sample may indicate at least information about a target recommended item to be recommended (and possibly also information about a user), and the target sample time may indicate the time, within the target period, at which the target recommended item is to be recommended. The prediction model 310 is used to determine a prediction result for the target data sample based on the target data sample and the target sample time.
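In the application phase the flow reduces to the following small sketch, where `model` stands in for the trained prediction model 310 and the 24-hour period is an assumed example:

```python
PERIOD = 24.0   # assumed target period T, in hours

def predict(model, target_sample, absolute_time):
    """Serve a prediction: the target sample time is the sample's own
    acquisition time, reduced into the target period."""
    target_sample_time = absolute_time % PERIOD
    return model(target_sample, target_sample_time)
```

For instance, `predict(trained_model, x, 49.0)` would invoke the model with sample time t=1, the position of hour 49 within the one-day period.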
In some embodiments, the training process for the prediction model 310 may be repeated at certain time intervals or according to other conditions. Each time training is performed, corresponding training data may be obtained in a manner similar to that discussed above to update the model.
In some embodiments, in order to model the periodicity of the input data, the prediction model 310 may include at least a non-periodic modeling part, a periodic modeling part, and an output layer. FIG. 5 shows an example structure of the prediction model 310, which includes a non-periodic modeling part 510, a periodic modeling part 520, and an output layer 530.
The non-periodic modeling part 510 is configured to extract an intermediate feature representation from an input data sample. In the training phase, the input data samples are training data samples; in the application phase, the input data sample is a target data sample. The non-periodic modeling part 510 learns the non-periodic portion of the model input. The periodic modeling part 520 is configured to process the intermediate feature representation based on the sample time corresponding to the data sample within the target period, so as to obtain a periodic feature representation. The periodic modeling part 520 learns the periodic portion of the model input. In the training phase, for some training data samples the sample time is the acquisition time of the sample's annotation information, while for other training data samples it is the acquisition time of the training data sample itself. In the application phase, the sample time input to the periodic modeling part 520 is the acquisition time of the target data sample. The output layer 530 of the prediction model 310 is configured to determine the prediction result for the data sample based at least on the periodic feature representation.
In some embodiments, the periodic modeling part 520 may be configured to use a Fourier expansion to process, based on the sample time, the intermediate feature representation provided by the non-periodic modeling part. Such a periodic modeling part 520 may be referred to as a Fourier layer. A model built on Fourier learning can directly exploit the periodicity of the training data and can be expressed as a periodic function. Fourier learning is therefore well suited to machine-learning-based prediction models. The periodic modeling part 520 based on the Fourier expansion can be expressed as follows:

f_N(x, t) = Σ_{n=0}^{N} [ a_n(x)·cos(2πnt/T) + b_n(x)·sin(2πnt/T) ]    (1)

where N is a hyperparameter, T denotes the target period of the input data to be processed by the prediction model 310 (also a hyperparameter), t is the sample time, and x is the input of the periodic modeling part, i.e., the intermediate feature representation obtained from the non-periodic modeling part. The periodic modeling part 520 can be constructed to implement the Fourier expansion of equation (1) above to obtain the periodic feature representation. The output of the periodic modeling part 520 is provided to the output layer 530 to be mapped to the prediction result. Introducing the Fourier-expansion-based periodic modeling part 520 allows more accurate prediction results to be generated by taking the periodicity within the input data into account.
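The Fourier layer described above can be sketched in NumPy as follows. The linear maps producing the coefficients a_n(x) and b_n(x) are an illustrative assumption (in practice they would be learned parts of the network), and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, T = 8, 3, 24.0                     # feature dim, harmonics, target period
Wa = 0.1 * rng.normal(size=(N + 1, D))   # produces a_n(x), n = 0..N
Wb = 0.1 * rng.normal(size=(N + 1, D))   # produces b_n(x), n = 0..N

def fourier_layer(x, t):
    """f_N(x,t) = sum_n a_n(x) cos(2*pi*n*t/T) + b_n(x) sin(2*pi*n*t/T)."""
    phase = 2.0 * np.pi * np.arange(N + 1) * t / T
    a, b = Wa @ x, Wb @ x                # coefficients depend on the sample x
    return float(a @ np.cos(phase) + b @ np.sin(phase))
```

Because every term is T-periodic in t, the layer's output repeats exactly with the target period.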
In some prediction tasks, if the Fourier-expansion-based periodic modeling part 520 is used, the learned frequency components may fail to exhibit obvious periodicity because the model update learning rate is fixed, so that the summed f_N(x, t) also shows little periodicity. FIG. 6A shows the prediction results of using the Fourier-expansion-based periodic modeling part 520 on some example prediction tasks, where the horizontal axis represents time and the vertical axis represents the prediction result. It can be seen that in some prediction tasks the periodicity of the prediction results is not obvious. If the energy in each frequency band of the Fourier expansion is analyzed further, it may be found that the energy of every frequency band is small and the signal periodicity learned by the fundamental frequency components (n <= 3) is not obvious. This means that the Fourier-expansion-based periodic modeling part may mainly learn the high-frequency components and noise in the input data, while the output of the original model still bears the main burden of learning the periodic information in the data, which leads to prediction results that do not meet expectations in actual prediction.
To address this phenomenon, in some embodiments, depending on the specific prediction task, a periodic Gaussian kernel function can instead be used to model the periodicity of the input data directly. The periodic modeling part 520 based on the periodic Gaussian kernel function can be expressed as follows:

f(x, t) = Σ_{n=1}^{N} a_n(x)·K(t, t_n)    (2)

where K(x, y) is the periodic Gaussian kernel

K(x, y) = exp( −2·sin²( π(x − y) / (2p) ) / l² ),

p is half the target period of the input data to be processed (i.e., 2p denotes the target period), and l is a hyperparameter. In equation (2), t denotes the sample time, the t_n are hyperparameters, and x is the input of the periodic modeling part, i.e., the intermediate feature representation obtained from the non-periodic modeling part. It can be seen that the periodic Gaussian kernel only needs to model the component of the Fourier expansion corresponding to the sin function.
The periodic modeling part 520 based on the periodic Gaussian kernel function can learn the periodicity of the input data well. The periodic Gaussian kernels can be expressed as kernel functions each corresponding to a respective time within the target period. For example, setting p=1 and varying the parameter l yields the family of periodic kernel functions K(x, y) shown in FIG. 6B. As l changes, the lowest point of the periodic Gaussian kernel changes. When the lowest point is closer to 0, the kernels centered at t_i and t_j influence each other less, so that more kernel functions need to be distributed within one period to cover all periodic function expressions.
Assuming, for example, that the period is one day and t_n denotes the n-th hour of the day, a set of kernel functions K(t, t_0), ..., K(t, t_23) is obtained that can effectively express any periodic function with a period of one day. Moreover, when i and j differ substantially, K(t_j, t_i) << K(t_i, t_i). In other words, the model for each hour is represented by a Gaussian kernel centered at t_n, the kernels have little influence on each other, and the width of each kernel is determined by the parameter l. FIG. 6C shows the output of each a_n(x)·K(t, t_n) when the Gaussian kernel function is used in a recommendation-related prediction model. It can be seen that, compared with FIG. 6A, the outputs in FIG. 6C exhibit obvious periodicity.
Combining FIGS. 6B and 6C, it can be noted that the periodic Gaussian kernel attains its maximum at t=t_n, so t_n determines the location of the highest point of a periodic Gaussian kernel. By using different t_n, multiple periodic Gaussian kernels can be laid out within one period, enabling the prediction model to express any periodic function with target period T. For example, with a period of one day, one can let N=24 and t_n=0, 1, 2, ..., 23 be the relative hours within the period; K(t, t_i) then represents the estimated influence of the i-th kernel, centered at the i-th hour, on a model with sample time t. In that case, for a fixed x and sample time t, the coefficients a_n(x) determine the influence on the sample of the 24 periodic Gaussian kernels centered at hours 0-23, where the kernel whose t_n is closest to t exerts the greatest influence and kernels far from t contribute relatively little. The combined effect is obtained by linearly combining all the kernels with weights a_n(x). Conversely, if the same sample x arrives at different times t, the variation of K(t, t_n) means that different periodic Gaussian kernels take primary responsibility for estimating the sample at different times of the day, thereby enabling the model to make periodic estimates for the same sample features x at different sample times t.
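Under the hourly layout just described (one kernel centered on each of the 24 hours), a minimal sketch of the kernel layer might look as follows. The exp-sine-squared form of K, the bandwidth value, and the linear map producing a_n(x) are assumptions for illustration.

```python
import numpy as np

p, l = 12.0, 0.3            # half-period (2p = 24 hours) and bandwidth l
centers = np.arange(24.0)   # t_n: one kernel centred on each hour n = 0..23

def K(x, y):
    """Periodic Gaussian kernel: K(x,y) = exp(-2 sin^2(pi (x-y)/(2p)) / l^2)."""
    return np.exp(-2.0 * np.sin(np.pi * (x - y) / (2.0 * p)) ** 2 / l ** 2)

rng = np.random.default_rng(0)
D = 8
Wa = 0.1 * rng.normal(size=(24, D))      # produces the 24 weights a_n(x)

def kernel_layer(x, t):
    """f(x,t) = sum_n a_n(x) K(t, t_n)."""
    return float((Wa @ x) @ K(t, centers))
```

Note that K(t_i, t_i) = 1 while K(t_j, t_i) is nearly 0 for distant hours (with a small l), so the kernel centered nearest to t dominates the estimate, and the whole layer repeats with a 24-hour period.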
Compared with the Fourier expansion, the periodic Gaussian kernel shifts the meaning of the prediction model from the frequency domain back to the time domain; in particular, the physical meaning of a_n shifts from the frequency domain back to the time domain. In the Fourier expansion, a_n represents the energy of the n-th frequency component, whereas in the periodic-Gaussian-kernel modeling, a_n represents the weight, in the estimate, of the kernel centered at the n-th hour. At the same time, since the periodic Gaussian kernel is itself a periodic function, one periodic Gaussian kernel can also be viewed as a linear combination, in fixed proportions, of a set of Fourier functions. Because the frequency components are thus combined in advance, the periodic Gaussian kernel effectively avoids the failure mode of the Fourier-expansion scheme, in which periodicity cannot be learned because individual frequency components are learned poorly. Moreover, compared with the Fourier transform, the periodic Gaussian kernel is smoother: by controlling the kernel bandwidth l, the introduction of high-frequency components and overfitting during modeling can be naturally prevented. Therefore, after the periodic modeling part based on the periodic Gaussian kernel is added, the prediction model stays closer to the original baseline model. In addition, the model becomes more lightweight, with fewer parameters added.
In the prediction model 310, the non-periodic modeling part and the periodic modeling part can be deployed in a variety of ways. In some embodiments, the non-periodic modeling part 510 may include a plurality of prediction parts, each configured to provide an intermediate prediction result.
In some embodiments, the periodic modeling part 520 may be constructed as one of the prediction parts, with its output serving as one intermediate prediction result that is aggregated with the intermediate prediction results of the other prediction parts into the final prediction result. FIG. 7A shows such an example of the prediction model 310. The prediction model 310 includes one or more prediction parts 710-1, 710-2, ..., 710-M (M is an integer greater than or equal to 1), collectively or individually referred to as prediction parts 710. Each prediction part 710 may be built on a different machine-learning modeling approach and may process the data sample as input to obtain an intermediate feature representation, which is provided to the output layer 530. The prediction model 310 further includes a shared part 712 configured to extract an intermediate feature representation from the data sample and provide it to a non-shared part 714 and to the periodic modeling part 520. The shared part 712 and the non-shared part 714 may, for example, be built on deep learning models. The non-shared part 714 processes the intermediate feature representation and extracts a further intermediate feature representation for the output layer 530. The periodic modeling part 520 processes the intermediate feature representation from the shared part 712 together with the sample time of the data sample and provides a periodic feature representation to the output layer 530. The output layer 530 aggregates the feature representations from the various parts and maps them to the prediction result.
In some embodiments, to further simplify the structure of the prediction model 310 and reduce the number of model parameters, the plurality of prediction parts in the non-periodic modeling part may each process the input data sample and provide a plurality of intermediate prediction results. These intermediate prediction results can be concatenated as the intermediate feature representation and input to the periodic modeling part 520. FIG. 7B shows such an example of the prediction model 310. As shown in FIG. 7B, the prediction model 310 includes a plurality of prediction parts 710-1, 710-2, ..., 710-M as well as a prediction part 716. These prediction parts each process the input data sample to obtain a plurality of intermediate prediction results, which are concatenated into a concatenation 720 of intermediate prediction results. The concatenation 720 is input to the periodic modeling part 520. The periodic modeling part 520 also receives the sample time of the data sample, so as to provide accurate feature extraction based on the learned periodicity of the input data and thereby support accurate result prediction. The periodic modeling part 520 determines a periodic feature representation, which is provided to the output layer 530 for determining the prediction result.
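The wiring of FIG. 7B can be sketched end to end as follows. The linear prediction parts, the hourly kernel layer, and the sigmoid output layer are all illustrative stand-ins for the learned components, with arbitrary dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, T = 16, 4, 24.0                       # feature dim, prediction parts, period

parts = [0.1 * rng.normal(size=D) for _ in range(M)]   # prediction parts 710/716
centers = np.arange(0.0, T)                            # hourly kernel centres t_n
Wa = 0.1 * rng.normal(size=(len(centers), M))          # a_n(.) over the cascade
w_out = 0.1 * rng.normal()                             # output-layer weight

def K(x, y, p=T / 2.0, l=0.5):
    """Periodic Gaussian kernel with period 2p = T."""
    return np.exp(-2.0 * np.sin(np.pi * (x - y) / (2.0 * p)) ** 2 / l ** 2)

def forward(sample, t):
    cascade = np.array([w @ sample for w in parts])    # intermediate predictions (720)
    periodic = (Wa @ cascade) @ K(t, centers)          # periodic feature representation
    return 1.0 / (1.0 + np.exp(-w_out * periodic))     # output layer 530 (sigmoid)
```

The cascade of intermediate predictions plays the role of the intermediate feature representation, so only the small kernel layer and output layer sit on top of the existing prediction parts.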
Of course, besides the examples of FIGS. 7A and 7B, the periodic modeling part 520 can be deployed in the prediction model in other ways, which is not limited herein.
FIG. 8 shows a flowchart of a process 800 for the prediction of periodic data according to some embodiments of the present disclosure. The process 800 may be implemented, for example, at the model application system 120 of FIG. 1.
At block 810, the model application system 120 obtains a trained prediction model, the prediction model being configured to process input data having a target period. The prediction model is trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating the time at which the first annotation information is obtained within the target period. The training of the prediction model may be implemented, for example, at the model training system 110. The model application system 120 may obtain the trained prediction model from the model training system 110.
At block 820, the model application system 120 obtains a target data sample and an associated target sample time, the target sample time indicating the time at which the target data sample is obtained within the target period. At block 830, the model application system 120 uses the prediction model to determine a prediction result for the target data sample based on the target data sample and the target sample time.
In some embodiments, the prediction model includes a non-periodic modeling part, a periodic modeling part, and an output layer. The non-periodic modeling part is configured to extract an intermediate feature representation from an input data sample. The periodic modeling part is configured to process the intermediate feature representation based on the sample time corresponding to the data sample within the target period, so as to obtain a periodic feature representation. The output layer is configured to determine a prediction result for the data sample based at least on the periodic feature representation.
In some embodiments, the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a periodic Gaussian kernel function. In some embodiments, the periodic Gaussian kernel function is expressed as kernel functions each corresponding to a respective time within the target period.
In some embodiments, the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a Fourier expansion function.
In some embodiments, the non-periodic modeling part includes a plurality of prediction parts configured to process the input data sample and output a plurality of intermediate prediction results, and the plurality of intermediate prediction results are concatenated as the intermediate feature representation.
In some embodiments, the prediction model is configured to predict the conversion result of a recommended item. The first training data sample indicates at least information about a training recommended item, and the first sample time indicates the time at which the actual conversion result of the training recommended item is obtained within the target period. The target data sample indicates at least information about a target recommended item to be recommended, and the target sample time indicates the time at which the target recommended item is to be recommended within the target period.
In some embodiments, the time at which the actual conversion result is obtained is delayed relative to the time at which the training recommended item is recommended.
In some embodiments, the prediction model is further trained in the first training process based on a second training data sample, an associated second sample time, and second annotation information, the second sample time indicating the time at which the second annotation information is obtained within the target period.
FIG. 9 shows a schematic structural block diagram of an apparatus 900 for the prediction of periodic data according to some embodiments of the present disclosure. The apparatus 900 may be implemented as or included in the model application system 120. The modules/components in the apparatus 900 may be implemented by hardware, software, firmware, or any combination thereof.
如图所示,装置900包括模型获取模块910,被配置为获取经训练的预测模型,预测模型被配置为处理具有目标周期的输入数据,预测模型在第一训练过程中至少基于第一训练数据样本、相关联的第一样本时间以及第一标注信息而被训练,第一样本时间指示在目标周期内第一标注信息的获得时间。装置900还包括目标获取模块920,被配置为获取目标数据样本和相关联的目标样本时间,目标样本时间指示在目标周期内目标数据样本的获得时间。装置900还包括预测执行模块930,被配置为利用预测模型,基于目标数据样本和目标样本时间来确定针对目标数据样本的预测结果。As shown in the figure, the device 900 includes a model acquisition module 910, which is configured to acquire a trained prediction model, the prediction model is configured to process input data with a target period, and the prediction model is trained in a first training process based on at least a first training data sample, an associated first sample time, and a first annotation information, and the first sample time indicates the time when the first annotation information is obtained within the target period. The device 900 also includes a target acquisition module 920, which is configured to acquire a target data sample and an associated target sample time, and the target sample time indicates the time when the target data sample is obtained within the target period. The device 900 also includes a prediction execution module 930, which is configured to use the prediction model to determine a prediction result for the target data sample based on the target data sample and the target sample time.
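For illustration only, the interplay of the three modules above might be sketched as follows. This is a hypothetical stand-in, not the claimed implementation: the class and method names, the dummy model, and the 24-hour period are all assumptions.

```python
# Illustrative sketch of apparatus 900's three modules (hypothetical names,
# not the patent's implementation).
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PeriodicPredictor:
    model: Callable[[Any, float], float]  # trained prediction model (assumed interface)
    period: float                         # target period, e.g. 24.0 hours

    def acquire_target(self, sample: Any, absolute_time: float):
        """Target acquisition: map an absolute time to a time within the period."""
        return sample, absolute_time % self.period

    def predict(self, sample: Any, absolute_time: float) -> float:
        """Prediction execution: apply the model to the sample and sample time."""
        s, t = self.acquire_target(sample, absolute_time)
        return self.model(s, t)

# Usage with a dummy model that simply echoes the in-period time:
p = PeriodicPredictor(model=lambda s, t: t, period=24.0)
assert p.predict(sample=None, absolute_time=50.0) == 2.0  # 50 mod 24
```

The point of the sketch is only the data flow: the sample time handed to the model is always expressed relative to the target period.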
In some embodiments, the prediction model includes a non-periodic modeling part, a periodic modeling part, and an output layer. The non-periodic modeling part is configured to extract an intermediate feature representation from an input data sample. The periodic modeling part is configured to process the intermediate feature representation based on the sample time corresponding to the data sample within the target period, to obtain a periodic feature representation. The output layer is configured to determine a prediction result for the data sample based at least on the periodic feature representation.
In some embodiments, the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a periodic Gaussian kernel function. In some embodiments, the periodic Gaussian kernel function is expressed as kernel functions respectively corresponding to respective times within the target period.
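One possible reading of such a periodic Gaussian kernel is sketched below: kernels are anchored at respective times within the period, and a sample time is scored against each anchor using a wrap-around distance. The hourly anchors, bandwidth `sigma`, and 24-hour period are illustrative assumptions, not the patent's formulation.

```python
import numpy as np

def periodic_gaussian_weights(t, anchors, period=24.0, sigma=2.0):
    """Weights of Gaussian kernels anchored at respective times within a period.

    Uses the circular (wrap-around) distance so that t = 23 h is close to
    t = 1 h when the period is 24 hours.
    """
    diff = np.abs(np.asarray(anchors) - t)
    circ = np.minimum(diff, period - diff)         # wrap-around distance
    w = np.exp(-(circ ** 2) / (2.0 * sigma ** 2))  # one Gaussian kernel per anchor
    return w / w.sum()                             # normalize to sum to 1

# Example: 24 hourly anchors; a sample observed at hour 23 leans mostly on
# the nearby hours 22, 23, and 0 (across the period boundary).
anchors = np.arange(24.0)
w = periodic_gaussian_weights(23.0, anchors)
```

The weighted combination of per-anchor transformations of the intermediate feature representation would then yield a representation that varies smoothly and periodically with the sample time.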
In some embodiments, the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a Fourier expansion function.
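A Fourier expansion of the sample time can be sketched as follows; the expansion order and the 24-hour period are assumptions for illustration. By construction the resulting features are periodic in the sample time, so times one full period apart map to identical features.

```python
import numpy as np

def fourier_time_features(t, period=24.0, order=3):
    """Fourier expansion of a sample time t within a period.

    Returns [sin(2*pi*k*t/T), cos(2*pi*k*t/T)] for k = 1..order; the result
    is periodic in t with period T by construction.
    """
    k = np.arange(1, order + 1)
    angle = 2.0 * np.pi * k * t / period
    return np.concatenate([np.sin(angle), np.cos(angle)])

f0 = fourier_time_features(0.0)
f24 = fourier_time_features(24.0)  # one full period later: identical features
```

These time features could, for instance, be combined with the intermediate feature representation so that the model's output depends smoothly on where in the target period the sample falls.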
In some embodiments, the non-periodic modeling part includes a plurality of prediction parts configured to process the input data sample and to output a plurality of intermediate prediction results, and the plurality of intermediate prediction results are concatenated as the intermediate feature representation.
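The concatenation of multiple intermediate prediction results can be sketched minimally as below. The linear prediction parts and the dimensions are illustrative assumptions; in practice each prediction part could be an arbitrary sub-network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three illustrative prediction parts, each mapping an 8-d sample to one
# intermediate prediction result (plain linear maps for brevity).
weights = [rng.normal(size=8) for _ in range(3)]

def intermediate_feature(x):
    """Run every prediction part and concatenate the intermediate results."""
    preds = [w @ x for w in weights]  # one scalar per prediction part
    return np.array(preds)            # concatenated intermediate representation

x = rng.normal(size=8)
z = intermediate_feature(x)           # shape (3,): one entry per prediction part
```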
In some embodiments, the prediction model is configured to predict a conversion result of a recommended item. The first training data sample indicates at least information related to a training recommended item, and the first sample time indicates the time within the target period at which the actual conversion result of the training recommended item was obtained. The target data sample indicates at least information related to a target recommended item to be recommended, and the target sample time indicates the time within the target period at which the target recommended item is to be recommended.
In some embodiments, the time at which the actual conversion result is obtained is delayed relative to the time at which the training recommended item is recommended.
In some embodiments, the prediction model is further trained in the first training process based on a second training data sample, an associated second sample time, and second annotation information, where the second sample time indicates the time within the target period at which the second annotation information was obtained.
FIG. 10 shows a block diagram of an electronic device 1000 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 1000 shown in FIG. 10 is merely exemplary and should not be construed as limiting the functionality or scope of the embodiments described herein in any way. The electronic device 1000 shown in FIG. 10 may be used to implement the model training system 110 and/or the model application system 120, and may include or be implemented as the apparatus 900 of FIG. 9.
As shown in FIG. 10, the electronic device 1000 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors or processing units 1010, a memory 1020, a storage device 1030, one or more communication units 1040, one or more input devices 1050, and one or more output devices 1060. The processing unit 1010 may be a physical or virtual processor and is capable of performing various processes according to programs stored in the memory 1020. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the electronic device 1000.
The electronic device 1000 typically includes multiple computer storage media. Such media may be any available media accessible to the electronic device 1000, including volatile and non-volatile media, and removable and non-removable media. The memory 1020 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 1030 may be a removable or non-removable medium, and may include a machine-readable medium such as a flash drive, a magnetic disk, or any other medium that can be used to store information and/or data (e.g., training data) and that can be accessed within the electronic device 1000.
The electronic device 1000 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in FIG. 10, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) via one or more data media interfaces. The memory 1020 may include a computer program product 1025 having one or more program modules configured to perform the various methods or acts of the various embodiments of the present disclosure.
The communication unit 1040 enables communication with other electronic devices over a communication medium. Additionally, the functions of the components of the electronic device 1000 may be implemented as a single computing cluster or as multiple computing machines capable of communicating over a communication connection. Thus, the electronic device 1000 may operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
The input device 1050 may be one or more input devices, such as a mouse, a keyboard, or a trackball. The output device 1060 may be one or more output devices, such as a display, speakers, or a printer. The electronic device 1000 may also, as needed, communicate via the communication unit 1040 with one or more external devices (not shown), such as storage devices or display devices, with one or more devices that enable a user to interact with the electronic device 1000, or with any device (e.g., a network card or a modem) that enables the electronic device 1000 to communicate with one or more other electronic devices. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, having computer-executable instructions stored thereon, where the computer-executable instructions are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, a computer program product is also provided, tangibly stored on a non-transitory computer-readable medium and including computer-executable instructions that are executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses, devices, and computer program products implemented according to the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functionality, and operation of systems, methods, and computer program products according to multiple implementations of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or acts, or by a combination of special-purpose hardware and computer instructions.
The implementations of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed implementations. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terminology used herein was chosen to best explain the principles of the implementations, their practical applications, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims (18)

  1. A method for prediction of periodic data, comprising:
    obtaining a trained prediction model, the prediction model being configured to process input data having a target period, the prediction model being trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating a time within the target period at which the first annotation information was obtained;
    obtaining a target data sample and an associated target sample time, the target sample time indicating a time within the target period at which the target data sample is obtained; and
    determining, using the prediction model, a prediction result for the target data sample based on the target data sample and the target sample time.
  2. The method of claim 1, wherein the prediction model comprises a non-periodic modeling part, a periodic modeling part, and an output layer,
    the non-periodic modeling part being configured to extract an intermediate feature representation from an input data sample,
    the periodic modeling part being configured to process the intermediate feature representation based on a sample time corresponding to the data sample within the target period, to obtain a periodic feature representation, and
    the output layer being configured to determine a prediction result for the data sample based at least on the periodic feature representation.
  3. The method of claim 2, wherein the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a periodic Gaussian kernel function, and
    wherein the periodic Gaussian kernel function is expressed as kernel functions respectively corresponding to respective times within the target period.
  4. The method of claim 2, wherein the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a Fourier expansion function.
  5. The method of claim 2, wherein the non-periodic modeling part comprises a plurality of prediction parts, the plurality of prediction parts being configured to process the input data sample and to output a plurality of intermediate prediction results, the plurality of intermediate prediction results being concatenated as the intermediate feature representation.
  6. The method of claim 1, wherein the prediction model is configured to predict a conversion result of a recommended item,
    the first training data sample indicates at least information related to a training recommended item, the first sample time indicating a time within the target period at which an actual conversion result of the training recommended item was obtained, and
    the target data sample indicates at least information related to a target recommended item to be recommended, the target sample time indicating a time within the target period at which the target recommended item is to be recommended.
  7. The method of claim 6, wherein the time at which the actual conversion result was obtained is delayed relative to a time at which the training recommended item was recommended.
  8. The method of claim 1, wherein the prediction model is further trained in the first training process based on a second training data sample, an associated second sample time, and second annotation information, the second sample time indicating a time within the target period at which the second annotation information was obtained.
  9. An apparatus for prediction of periodic data, comprising:
    a model acquisition module configured to obtain a trained prediction model, the prediction model being configured to process input data having a target period, the prediction model being trained in a first training process based on at least a first training data sample, an associated first sample time, and first annotation information, the first sample time indicating a time within the target period at which the first annotation information was obtained;
    a target acquisition module configured to obtain a target data sample and an associated target sample time, the target sample time indicating a time within the target period at which the target data sample is obtained; and
    a prediction execution module configured to determine, using the prediction model, a prediction result for the target data sample based on the target data sample and the target sample time.
  10. The apparatus of claim 9, wherein the prediction model comprises a non-periodic modeling part, a periodic modeling part, and an output layer,
    the non-periodic modeling part being configured to extract an intermediate feature representation from an input data sample,
    the periodic modeling part being configured to process the intermediate feature representation based on a sample time corresponding to the data sample within the target period, to obtain a periodic feature representation, and
    the output layer being configured to determine a prediction result for the data sample based at least on the periodic feature representation.
  11. The apparatus of claim 10, wherein the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a periodic Gaussian kernel function, and
    wherein the periodic Gaussian kernel function is expressed as kernel functions respectively corresponding to respective times within the target period.
  12. The apparatus of claim 10, wherein the periodic modeling part is configured to process the intermediate feature representation based on the sample time using a Fourier expansion function.
  13. The apparatus of claim 10, wherein the non-periodic modeling part comprises a plurality of prediction parts, the plurality of prediction parts being configured to process the input data sample and to output a plurality of intermediate prediction results, the plurality of intermediate prediction results being concatenated as the intermediate feature representation.
  14. The apparatus of claim 9, wherein the prediction model is configured to predict a conversion result of a recommended item,
    the first training data sample indicates at least information related to a training recommended item, the first sample time indicating a time within the target period at which an actual conversion result of the training recommended item was obtained, and
    the target data sample indicates at least information related to a target recommended item to be recommended, the target sample time indicating a time within the target period at which the target recommended item is to be recommended.
  15. The apparatus of claim 14, wherein the time at which the actual conversion result was obtained is delayed relative to a time at which the training recommended item was recommended.
  16. The apparatus of claim 9, wherein the prediction model is further trained in the first training process based on a second training data sample, an associated second sample time, and second annotation information, the second sample time indicating a time within the target period at which the second annotation information was obtained.
  17. An electronic device, comprising:
    at least one processing unit; and
    at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the device to perform the method of any one of claims 1 to 8.
  18. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.
PCT/CN2023/130543 2022-11-18 2023-11-08 Method and apparatus for predicting cyclic data, and device and medium WO2024104233A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211447814.0A CN115718822A (en) 2022-11-18 2022-11-18 Method, apparatus, device and medium for prediction of periodic data
CN202211447814.0 2022-11-18

Publications (1)

Publication Number Publication Date
WO2024104233A1

Family

ID=85255600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/130543 WO2024104233A1 (en) 2022-11-18 2023-11-08 Method and apparatus for predicting cyclic data, and device and medium

Country Status (2)

Country Link
CN (1) CN115718822A (en)
WO (1) WO2024104233A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115718822A (en) * 2022-11-18 2023-02-28 抖音视界有限公司 Method, apparatus, device and medium for prediction of periodic data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10824940B1 (en) * 2016-11-30 2020-11-03 Amazon Technologies, Inc. Temporal ensemble of machine learning models trained during different time intervals
CN112380449A (en) * 2020-12-03 2021-02-19 腾讯科技(深圳)有限公司 Information recommendation method, model training method and related device
CN112954066A (en) * 2021-02-26 2021-06-11 北京三快在线科技有限公司 Information pushing method and device, electronic equipment and readable storage medium
CN115718822A (en) * 2022-11-18 2023-02-28 抖音视界有限公司 Method, apparatus, device and medium for prediction of periodic data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUANG CHENG, GONG JIAN: "Seasonal Neural Network Model on Internet Traffic Behavior", JOURNAL OF CHINESE COMPUTER SYSTEMS. MINI-MICRO SYSTEMS, vol. 23, no. 11, 21 November 2002 (2002-11-21), pages 1321 - 1324, XP093170134 *

Also Published As

Publication number Publication date
CN115718822A (en) 2023-02-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23890669

Country of ref document: EP

Kind code of ref document: A1