WO2018205845A1

WO2018205845A1 - Data processing method, server, and computer storage medium

Info

Publication number: WO2018205845A1
Application number: PCT/CN2018/084664
Authority: WO
Inventors: 钟沛珉
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2017-05-10
Filing date: 2018-04-26
Publication date: 2018-11-15
Also published as: CN108874812B; US11816172B2; US20190266206A1; CN108874812A

Abstract

Disclosed are a data processing method, a server, and a computer storage medium, wherein the method comprises: collecting a real-time data flow, wherein the real-time data flow includes a first type of data representing user behaviour and a second type of data representing data of interest to a user; according to the first type of data and the second type of data, establishing a user state queue; according to the user state queue and information about the time the first type of data is triggered, tracking, in real time, a change in a user state, and obtaining a user state characteristic; acquiring candidate data to be processed and an operational model; using the user state characteristic and the candidate data to be processed as input parameters for the operational model, and obtaining output parameters after the operational model carries out an operation; and obtaining recommendation information according to the output parameters, and sending the recommendation information.

Description

Data processing method, server, and computer storage medium

Cross-reference to related applications

This application is based on, and claims priority to, Chinese patent application No. 201710326633.5, filed on May 10, 2017, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to communication technologies, and in particular to a data processing method, a server, and a computer storage medium.

Background

Mining the information a user cares about from information interactions, and providing the user with more relevant services based on that information, is the current trend in information mining. For example, during information mining, the user state (such as the user's current interests or preferences) can be analyzed.

Currently, the user state is captured for analysis using a selection mechanism based on a fixed time period. Because the user state changes quickly and is somewhat random, it is difficult to capture precisely, so recommendation information cannot be provided to the user accurately.

How to obtain the user state accurately and describe it is the technical problem to be solved. However, the related art offers no effective solution.

Summary

In view of this, embodiments of the present application provide a data processing method, a server, and a computer storage medium that solve at least the problems existing in the prior art.

A data processing method according to an embodiment of the present application includes:

collecting a real-time data stream, the real-time data stream including a first type of data characterizing user behavior and a second type of data characterizing the data of interest to the user itself;

establishing a user state queue according to the first type of data and the second type of data;

tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data was triggered, to obtain a user state feature;

obtaining candidate data to be processed and an operation model;

using the user state feature and the candidate data to be processed as input parameters of the operation model, and obtaining output parameters after the operation model performs its operation; and

obtaining recommendation information according to the output parameters, and sending the recommendation information.

A server according to an embodiment of the present application includes:

a collecting unit configured to collect a real-time data stream, the real-time data stream including a first type of data characterizing user behavior and a second type of data characterizing the data of interest to the user itself;

a queue establishing unit configured to establish a user state queue according to the first type of data and the second type of data;

a state change tracking unit configured to track changes in the user state in real time according to the user state queue and the time information at which the first type of data was triggered, to obtain a user state feature;

an obtaining unit configured to obtain candidate data to be processed and an operation model;

an operation unit configured to use the user state feature and the candidate data to be processed as input parameters of the operation model, and to obtain output parameters after the operation model performs its operation; and

a sending unit configured to obtain recommendation information according to the output parameters, and to send the recommendation information.

A computer-readable storage medium according to an embodiment of the present application stores a computer program that, when executed by a processor, implements the steps of any of the methods described above.

a memory configured to store a computer program capable of running on a processor; and

a processor configured to perform the steps of the method of any of the above solutions when running the computer program.

A data processing method according to an embodiment of the present application is performed by a server. The server includes one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions. The method includes:

The information security processing method of the embodiment of the present application includes: collecting a real-time data stream, the real-time data stream including a first type of data characterizing user behavior and a second type of data characterizing the data of interest to the user itself; establishing a user state queue according to the first type of data and the second type of data; tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data was triggered, to obtain a user state feature; obtaining candidate data to be processed and an operation model; using the user state feature and the candidate data to be processed as input parameters of the operation model, and obtaining output parameters after the operation model performs its operation; and obtaining recommendation information according to the output parameters, and sending the recommendation information.

With the embodiments of the present application, a real-time data stream is collected, a user state queue is established from the real-time data stream, and changes in the user state are tracked in real time according to the user state queue and the time information at which the first type of data was triggered, yielding a user state feature. User state changes can thus be grasped dynamically (for example, through a dynamic time window and by tracking each user state in the queue) and in real time (for example, based on the real-time data stream). The user state at a given moment, or the change in user state over a relatively short period, can be located precisely. Even though the user state changes quickly and is somewhat random, it can still be captured precisely, so that recommendation information is provided accurately and the user receives more precise related services.

Brief description of the drawings

FIG. 1 is a schematic diagram of the hardware entities of the parties performing information interaction in an embodiment of the present application;

FIG. 2 is a schematic flowchart of the implementation of the method of Embodiment 1 of the present application;

FIG. 3 is a schematic diagram of the system architecture of Embodiment 1 of the present application;

FIG. 4 is a schematic diagram of the hardware architecture of the server of Embodiment 1 of the present application;

FIG. 5 is a schematic diagram of a click-through rate estimation process to which an embodiment of the present application is applied;

FIGS. 6-10 are schematic diagrams of multiple user state queues and their updates, to which an embodiment of the present application is applied;

FIG. 11 is a flowchart of user state storage to which an embodiment of the present application is applied.

Detailed description

The implementation of the technical solution is described in further detail below with reference to the accompanying drawings.

A mobile terminal implementing the various embodiments of the present application will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the embodiments and have no specific meaning in themselves. Therefore, "module" and "component" may be used interchangeably.

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, it will be apparent to those of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks are not described in detail, to avoid unnecessarily obscuring aspects of the embodiments.

In addition, although the terms "first", "second", and so on are used repeatedly herein to describe various elements (or thresholds, applications, instructions, or operations), these elements (or thresholds, applications, instructions, or operations) should not be limited by these terms. The terms are used only to distinguish one element (or threshold, application, instruction, or operation) from another. For example, a first operation may be called a second operation, and a second operation may be called a first operation, without departing from the scope of the present application. The first operation and the second operation are both operations; they are simply not the same operation.

The steps in the embodiments of the present application are not necessarily processed in the order described. Steps may be selectively reordered as required, steps may be deleted from an embodiment, or steps may be added to an embodiment. The step descriptions in the embodiments are only one optional ordering and do not represent every possible ordering of the steps; the order of steps in the embodiments is not to be construed as limiting the present application.

The term "and/or" in the embodiments of the present application refers to any and all possible combinations of one or more of the associated listed items. It should also be noted that, when used in this specification, "including/comprising" specifies the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The intelligent terminal (such as a mobile terminal) of the embodiments of the present application can be implemented in various forms. For example, the mobile terminal described in the embodiments may include mobile terminals such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (PAD), a portable multimedia player (PMP, Portable Media Player), and a navigation device, as well as fixed terminals such as a digital TV and a desktop computer. In the following, the terminal is assumed to be a mobile terminal. However, those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed terminals.

With the spread of the Internet, users can easily exchange all kinds of information online. The information a user cares about needs to be mined from these interactions so that more relevant services can be provided on that basis. For example, during information mining, information such as the user state can be analyzed. The user state refers to the user's current interests and preferences, which usually change quickly and are somewhat random. If the user state can be captured precisely, the user can be provided with precise services.

Because the user state usually changes quickly and is somewhat random, it is difficult to capture precisely with the current related art, so information cannot be recommended to the user accurately. One approach is to build a user portrait from historical user behavior data over a fixed length of time. However, a user portrait reflects the user's long-term interests and preferences and can only describe tendencies that are stable over the long term. Whether the fixed period chosen is long or short, it cannot precisely locate the user state at a given moment or over a relatively short period. This is inconsistent with the original goal of capturing changes in the user state and describing them.

FIG. 1 is a schematic diagram of the hardware entities of the parties performing information interaction in an embodiment of the present application. FIG. 1 includes a terminal 1 and a server 2. The terminal 1 may consist of multiple terminals 11-13, which exchange information with the server 2 over a wireless or wired connection. The number of servers shown in FIG. 1 is only illustrative and does not limit the number of servers.

During information mining, information such as the user state can be analyzed. The user state refers to the user's current interests and preferences, which usually change quickly and are somewhat random. If the user state can be captured precisely, the user can be provided with precise services, which is especially useful in information recommendation scenarios. The user state usually changes quickly and is somewhat random. For example, when the Tiangong-1 space station is launched, people who are not normally interested in aerospace will follow China's aerospace program at that moment; during a US presidential election, people who do not normally follow international politics will follow the election results. It is therefore very difficult both to capture changes in the user state sensitively and to describe the user state accurately. In other words, it is difficult to capture and describe the user state precisely with the related art, so information cannot be recommended to the user accurately. In an information recommendation scenario the goal is to obtain the user state, but because the related art cannot achieve this, the fallback is to obtain and analyze a user portrait, which reflects the user's long-term interests and preferences and tends to be stable over the long term. The result of such analysis is neither the data processing result that is actually wanted nor the best possible result. The aim is always to obtain the user state and to describe it more precisely.

In scenarios such as information recommendation or advertisement delivery, capturing the user state accurately greatly improves, from a technical perspective, the precision of recommendation and advertisement delivery, and also improves processing efficiency. From a product perspective, it helps considerably both in raising the estimated click-through rate of recommended information and in giving users more relevant information. In the related art, however, user portraits are built from user behavior over different time periods, and the period is often hard to choose. If the period is too long, for example when a user portrait is built from behavior over a month or half a month, changes in the user state cannot be detected sensitively. If the period is too short, for example when a user portrait is built from behavior over a few hours, there may be too little data to describe the user's current state accurately. With the embodiments of the present application, the server executes processing logic 10, as shown in FIG. 1. Processing logic 10 includes: S1, establishing a user state queue according to the first type of data and the second type of data, and identifying each piece of user state information in the queue with time information; S2, obtaining a user state feature according to the user state queue; S3, inputting the user state feature and the candidate data into the operation model, outputting a data processing result, and sending the data processing result to the terminal. The embodiments of the present application represent the user state with a user state queue based on a dynamic time window, which allows the user state to be captured and described accurately and thereby solves the above problems that the related art cannot solve.

In the embodiments of the present application, the time window over which user behavior is collected can also be adjusted dynamically according to how frequently user behavior occurs, balancing sensitivity to changes in the user against the accuracy of the user state representation.
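The text does not say how the window is adjusted. The sketch below is one possible rule, offered only as an illustration: start from a short window and double it until it contains enough behaviors or reaches an upper bound. The function name `dynamic_window` and the values of `min_span`, `max_span`, and `min_events` are assumptions, not values from the application.

```python
def dynamic_window(timestamps, now, min_span=1800.0, max_span=86400.0, min_events=20):
    """Pick a window length (in seconds) for one user.

    Hypothetical rule: begin with min_span and double the window until it
    holds at least min_events behaviors or max_span is reached. Frequent
    users get a short, sensitive window; infrequent users get a longer one
    so that enough data is available.
    """
    span = min_span
    while span < max_span:
        count = sum(1 for t in timestamps if now - span <= t <= now)
        if count >= min_events:
            break
        span = min(span * 2, max_span)
    return span
```

A user with many recent behaviors keeps the shortest window, while a user with only a handful of behaviors ends up at `max_span`.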

The example of FIG. 1 above is only one example of a system architecture for implementing the embodiments of the present application, and the embodiments are not limited to the system structure described in FIG. 1. The method embodiments of the present application are proposed on the basis of the system architecture described in FIG. 1.

The data processing method of the embodiment of the present application, as shown in FIG. 2, includes: collecting a real-time data stream, the real-time data stream including a first type of data characterizing user behavior and a second type of data characterizing the data of interest to the user itself (101). The first type of data may be the various user behaviors performed on the data of interest, including clicking to browse news, commenting, favoriting, forwarding, and so on. The second type of data may be the data the user is interested in, such as a news item, a video, or a novel. In this document, the second type of data is referred to collectively as "exposure data".

In the embodiments of the present application, the real-time data stream may be collected using a distributed stream processing framework such as Spark Streaming. The benefit of Spark Streaming is that, as a real-time computing framework built on Spark, it extends Spark's ability to process large-scale streaming data. Through its rich application programming interface (API) and in-memory high-speed execution engine, users can combine streaming, batch processing, and interactive queries for data processing, which suits services with strict real-time requirements such as real-time recommendation and user behavior analysis. Because the framework can iterate over a data set in memory many times quickly, it supports complex data mining algorithms and graph computation algorithms. Spark Streaming is only one choice for the stream processing framework in the embodiments; other stream processing frameworks can achieve the same function and also fall within the scope of protection of the present application.

The data processing method of the embodiment of the present application includes: establishing a user state queue according to the first type of data and the second type of data (102); and tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data was triggered, to obtain a user state feature (103). Specifically, each piece of user state information in the user state queue may be identified by the time information at which the first type of data was triggered, yielding each user state in the queue as represented by a dynamic time window (103a). In the structure of the user state queue, each piece of user state information is identified by the time at which the behavior occurred. Besides the occurrence time of the behavior, the structure also includes the behavior content. The behavior content records the id of the news item the user saw and exactly what behavior occurred. For example, if the user only saw the title of news 1 and did not click through to the detail page, the record is "News 1: exposure". The occurrence time records the specific time at which the behavior took place.

In the embodiments of the present application, the user state queue may be implemented as a structure with a dynamic time window, in which case it may also be called a dynamic-time-window user state queue. The queue contains two kinds of data: 1) the occurrence time of the behavior; and 2) the behavior content. The occurrence time can be derived from the first type of data characterizing user behavior, recording when the behavior was triggered. The behavior content can be derived from both the first type of data characterizing user behavior and the second type of data characterizing the data of interest itself, recording which user behavior occurred on that data, for example what the user did after seeing a video (liking, commenting, forwarding, and so on).
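The queue structure described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation from the application: the names `StateEntry` and `UserStateQueue`, the field names, and the use of seconds for timestamps are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class StateEntry:
    """One piece of user state information in the queue."""
    timestamp: float  # occurrence time of the behavior (seconds; assumed unit)
    item_id: str      # id of the data of interest, e.g. a news id
    action: str       # what happened: "exposure", "click", "forward", ...


class UserStateQueue:
    """Keeps one queue of StateEntry records per user, in arrival order."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[StateEntry]] = {}

    def record(self, user_id: str, entry: StateEntry) -> None:
        """Append a new behavior record to the user's queue."""
        self._queues.setdefault(user_id, deque()).append(entry)

    def window(self, user_id: str, now: float, span: float) -> List[StateEntry]:
        """Return the entries whose timestamp lies in [now - span, now]."""
        queue = self._queues.get(user_id, deque())
        return [e for e in queue if now - span <= e.timestamp <= now]
```

Recording `StateEntry(100.0, "news1", "exposure")` corresponds to the "News 1: exposure" record in the text; calling `window` with different `span` values reads the same queue through time windows of different lengths.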

A user state feature is obtained according to the user state queue (103b). Analyzing the user state yields user state features, which are signals or variables describing the user's current interests, preferences, and subjective impressions. In one example, the user state features are generated from the user state queue and fall into two classes: 1) attribute statistics features; and 2) feedback features.

For attribute statistics features, the first-level category, second-level category, keywords, tags, topics, and title word segments of every news id in the user state queue are extracted and accumulated with a weight that depends on the behavior. For example, a click has weight 1, a favorite has weight 1.5, a forward has weight 2, and no behavior has weight 0. The top 5 values by accumulated weight of the first-level category, second-level category, keywords, tags, and title word segments are then taken as the user preference features.

Feedback features are further divided into positive feedback features and negative feedback features. An example of a positive feedback feature: from the last half hour, take the 20 news items whose click time is closest to the present (20 if there are more than 20, all of them if there are fewer), and from their first-level categories, second-level categories, keywords, titles, topics, and title word segments take the top 5 sorted by time as the user's positive feedback features. An example of a negative feedback feature: from the last half hour, take all news items that were exposed but not clicked, accumulate the number of occurrences of their first-level categories, second-level categories, keywords, tags, topics, title word segments, and so on, and take the top 10 as features.
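The three feature computations can be sketched as below. This is an illustrative reading of the paragraph, not a definitive implementation: the tuple layout `(timestamp, item_id, action)`, the `item_attrs` lookup table, and all function names are assumptions, and "exposed but not clicked" is interpreted as items with an exposure and no click inside the window. The weights, the half-hour window, and the top-5, 20-item, and top-10 limits come from the text.

```python
from collections import Counter, defaultdict

# Behavior weights from the text: click 1, favorite 1.5, forward 2, none 0.
ACTION_WEIGHTS = {"click": 1.0, "favorite": 1.5, "forward": 2.0, "exposure": 0.0}


def attribute_features(entries, item_attrs, top_n=5):
    """Accumulate behavior weights per attribute value; keep the top_n per field."""
    scores = defaultdict(Counter)
    for _, item_id, action in entries:
        weight = ACTION_WEIGHTS.get(action, 0.0)
        for field, values in item_attrs.get(item_id, {}).items():
            for value in values:
                scores[field][value] += weight
    return {f: [v for v, s in c.most_common(top_n) if s > 0]
            for f, c in scores.items()}


def positive_feedback(entries, item_attrs, now, span=1800.0, max_items=20, top_n=5):
    """Attributes of the most recently clicked items, newest first."""
    clicks = sorted((e for e in entries
                     if e[2] == "click" and now - span <= e[0] <= now),
                    key=lambda e: e[0], reverse=True)[:max_items]
    result = defaultdict(list)
    for _, item_id, _ in clicks:
        for field, values in item_attrs.get(item_id, {}).items():
            for value in values:
                if value not in result[field] and len(result[field]) < top_n:
                    result[field].append(value)
    return dict(result)


def negative_feedback(entries, item_attrs, now, span=1800.0, top_n=10):
    """Most frequent attributes of items exposed but not clicked in the window."""
    recent = [e for e in entries if now - span <= e[0] <= now]
    clicked = {e[1] for e in recent if e[2] == "click"}
    counts = defaultdict(Counter)
    for _, item_id, action in recent:
        if action == "exposure" and item_id not in clicked:
            for field, values in item_attrs.get(item_id, {}).items():
                counts[field].update(values)
    return {f: [v for v, _ in c.most_common(top_n)] for f, c in counts.items()}
```

Each function returns a dictionary from attribute field (for example `"category"`) to an ordered list of values, which can then be encoded as model inputs.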

In other words, after the statistics are computed for the news categories using the different behavior weights (click, share, forward), the news is sorted by priority and the top-ranked items are obtained; these top-ranked items are the news with high sensitivity. Sorting the top-5 high-sensitivity items by time yields the positive feedback features; sorting them by number of occurrences yields the negative feedback features. The embodiments of the present application may further include filtering, sorting, and so on.

The data processing method of the embodiment of the present application includes: obtaining candidate data to be processed and an operation model (104); and using the user state feature and the candidate data to be processed as input parameters of the operation model, and obtaining output parameters after the operation model performs its operation (105). Taking the information recommendation scenario as an example, the operation model may be a click-through rate estimation model: feeding the obtained user state features through the model yields a click-through rate. This embodiment is not limited to information recommendation; it also applies to advertisement delivery, search placement scenarios, and so on. A data processing result, such as recommended news, videos, or novels, is obtained according to the output parameters. Recommendation information is obtained according to the output parameters and sent to the terminal (106). The recommendation information is one kind of data processing result computed by the operation model of the embodiments of the present application.

With the embodiment of the present application, each piece of user state information in the user state queue is identified by time information, so the user state can be represented by a user state queue with a dynamic time window. This makes it possible to locate precisely the user state at a given moment, or the change in user state over a relatively short period, in line with the user's actual intent. Based on the user state feature obtained from the user state queue, the operation model produces an accurate data processing result, which is sent to the terminal to provide the user with more accurate related services.

In one example of the embodiment of the present application, real-time news exposure data (which news the user has seen) and real-time user behavior data (the user clicking to read, commenting on, favoriting, or forwarding news) can be ingested through a distributed stream processing framework (for example, Spark Streaming or Storm). This real-time data is processed to build a user state information queue for each user, for later use in computing user features. The user state feature is then computed from the collected user state queue. Finally, signals such as the news candidate set and the user state feature are input into the operation model, for example a CTR estimation model. That is, the computed user state feature, together with the information on the news to be recommended and the user's basic attributes, is input into the CTR estimation model (for example, logistic regression or a factorization machine), which outputs the estimated click-through rate of each news item to be recommended, so that the accuracy of CTR estimation is greatly improved. Based on the estimated click-through rate of each news item, the news the user is most likely to like is determined, the final personalized news recommendation result is generated, and the recommended news is presented to the user.
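The scoring and ranking step described above can be illustrated with a minimal sketch. It assumes a plain logistic regression with a single hand-made feature, `category_match`, indicating whether a candidate's category appears in the user's preferred categories; the function names, the feature, and the weights are illustrative assumptions and do not come from the application.

```python
import math
from typing import Dict, List, Set, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def estimate_ctr(weights: Dict[str, float], bias: float,
                 features: Dict[str, float]) -> float:
    """Logistic-regression CTR estimate: sigmoid(bias + sum(w_i * x_i))."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sigmoid(z)


def rank_candidates(weights: Dict[str, float], bias: float,
                    preferred_categories: Set[str],
                    candidates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each candidate (news_id -> category) and sort by estimated CTR.

    The single feature used here (hypothetical) is 1.0 when the candidate's
    category is among the user's preferred categories, else 0.0.
    """
    scored = []
    for news_id, category in candidates.items():
        features = {
            "category_match": 1.0 if category in preferred_categories else 0.0
        }
        scored.append((news_id, estimate_ctr(weights, bias, features)))
    # Highest estimated click-through rate first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A real model would combine many user state features, item attributes, and basic user attributes, and its weights would be learned from logged exposures and clicks rather than set by hand.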

In the embodiment of the present application, when newly added first-type data (for example, new user behavior data) is obtained while collecting the real-time data stream, the current user state queue is extracted, and the second-type data corresponding to the newly added first-type data is looked up in the current user state queue. (The second-type data is, for example, exposure data. The newly added first-type data and its matching second-type data may concern the same news item; for example, the first-type data is a click on News 1 and the second-type data is the exposure of News 1.) The user state information containing that second-type data is deleted from the current user state queue, all user state information located after the deleted entry is shifted forward by one position in turn, and the current user state queue is thereby updated. The newly added first-type data is then appended to the tail of the updated user state queue. In one example of the embodiment of the present application, the queue is continuously filled with real-time news exposure data and real-time user behavior data. For user behavior data, the queue is updated as follows: when a new piece of user behavior data arrives, its corresponding exposure data is first found in the user state queue and removed from the queue, the entries after that element are shifted forward in turn, and finally the user behavior data is inserted at the end of the user state queue. For example, the exposure data for News 1 in the user state queue, that is, the exposure data corresponding to the new user behavior data, is found and deleted. The exposure data corresponding to new user behavior data is identified in two ways. First, by the data itself: for example, both entries in the user state queue concern "News 1". Second, by the time at which the behavior occurred, which must follow the exposure in time: for example, the new user behavior data for "News 1" occurred at 13:45:20 on October 21, 2015, and the exposure data for "News 1" occurred at 13:45:11 on October 21, 2015. All elements after the deleted element are then shifted forward in turn; the user state queue does not distinguish entries by content and may be ordered by the time of the user's actions. Finally, the new user behavior data is inserted at the tail of the new user state queue. Herein, the "elements" of the user state queue are the multiple pieces of user state information that make up the queue.

In the embodiment of the present application, when newly added second-type data (for example, new exposure data) is obtained while collecting the real-time data stream, the current user state queue is extracted, and the newly added second-type data is appended directly to the tail of the current user state queue, thereby updating it. In one example of the embodiment of the present application, the queue is continuously filled with real-time news exposure data and real-time user behavior data. When the queue is updated for real-time news exposure data, the original user state queue (the current user state queue) has already been built from earlier first-type and second-type data before the data stream delivers the new exposure data; during the update, the new exposure data is simply written to the tail of the current user state queue.
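The two update rules described above (exposure data appended at the tail; behavior data replacing its matching exposure and then appended at the tail) can be sketched as follows. This is a minimal illustration, not the application's implementation: the `StateEntry` type and function names are hypothetical, and two details are assumptions because the text does not specify them, namely that the most recent matching exposure is chosen when several exist, and that behavior data with no matching exposure is still appended.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StateEntry:
    news_id: str      # which news item the entry concerns
    action: str       # "exposure", "click", "comment", "forward", ...
    timestamp: float  # when the exposure or behavior occurred


def add_exposure(queue: List[StateEntry], entry: StateEntry) -> None:
    """Second-type (exposure) data is appended directly at the queue tail."""
    queue.append(entry)


def add_behavior(queue: List[StateEntry], entry: StateEntry) -> None:
    """First-type (behavior) data replaces its matching exposure entry.

    The matching exposure concerns the same news item and happened no later
    than the behavior. Assumption: search from the tail so the most recent
    matching exposure is removed.
    """
    for index in range(len(queue) - 1, -1, -1):
        candidate = queue[index]
        if (candidate.action == "exposure"
                and candidate.news_id == entry.news_id
                and candidate.timestamp <= entry.timestamp):
            # Deleting from a list shifts all later entries forward by one.
            del queue[index]
            break
    # Assumption: append even when no matching exposure was found.
    queue.append(entry)
```

With exposures for `n1`, `n2`, `n3` in the queue, a click on `n2` leaves the queue as `n1` exposure, `n3` exposure, `n2` click.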

In the embodiment of the present application, a real-time streaming data connection can be established to obtain, in real time, exposure data on which news the user has seen, as well as user behavior data on which news detail pages the user has clicked into and read, which news the user has commented on, and which news the user has forwarded. The user state is filled continuously from the real-time news exposure data and the real-time user behavior data.

In the embodiment of the present application, for an existing user, the user state queue is already stored in a storage medium (which here includes various types of databases) and is read directly from it. For a new user, a new user state queue (also called a dynamic time window user state representation queue) is created. The user state queue is updated according to the current data and the changes to the queue. Finally, the updated user state queue is written back to the storage medium.

In the embodiment of the present application, a queue update policy is configured. On the one hand, regarding queue length: when the first-type data (new user behavior data) and/or the second-type data (new exposure data) is updated in real time, an update of the current user state queue is triggered. When the length of the current user state queue reaches a first threshold, the user state information at the front of the queue is deleted from the queue in turn, the user state information further back is shifted forward in turn, and the newly arrived data is inserted at the tail of the current user state queue. On the other hand, regarding queue validity: when the first-type data (new user behavior data) and/or the second-type data (new exposure data) is updated in real time, an update of the current user state queue is triggered. First time information corresponding to the user state information at the tail of the current user state queue is obtained, and the validity of the queue content is judged from the difference between the first time information and the current time. If the difference is greater than a second threshold, all user state information in the current user state queue is cleared.

In the embodiment of the present application, the queue update policy of the user state queue must remain sensitive to changes in the user's interests and preferences while also remaining accurate. Both requirements are met through two controls: queue length and queue validity time. That is, limits can be set on both the queue length and the validity period of the queue content, or on only one of the two.

For example, 1) the limit on queue length ensures that the user state is captured both accurately and in a timely way. When the user refreshes frequently, the queue holds the user's 100 most recent browsing records over a short period; when the user refreshes infrequently, it holds the 100 most recent records over a medium-to-long period. This yields a dynamic time window, in which the representation of the user state changes with the frequency of the user's behavior. Specifically, based on statistics of user behavior data, 100 can be chosen as the queue length. When new data arrives and the queue already contains 100 elements, the earliest element is removed from the queue, the remaining elements are shifted forward in turn, and the new element is inserted at the end of the queue.

For example, 2) the queue content has a time limit. When more than 24 hours separate the last entry in the queue from the current time, there has been a gap in the user's usage, and the user's interests and preferences during that gap are considered unobservable. Therefore, so that past data does not affect the prediction of the user's future interests and preferences, the queue can be cleared to keep the user state accurate. Specifically, the timeliness of the queue content is limited: when the occurrence time of the last element in the queue differs from the current time by more than 24 hours, the queue content is cleared.
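The two limits above (a length cap of 100 and a 24-hour validity check on the last element) can be sketched as follows. This is an illustrative sketch under stated assumptions: entries are simplified to `(content, timestamp)` tuples, the function and constant names are hypothetical, and the validity check is applied before the length check, an ordering the text does not specify.

```python
from typing import List, Tuple

MAX_QUEUE_LENGTH = 100            # first threshold: queue length
MAX_IDLE_SECONDS = 24 * 60 * 60   # second threshold: 24 hours

Entry = Tuple[str, float]         # (behavior content, occurrence time in seconds)


def append_with_policy(queue: List[Entry], entry: Entry, now: float,
                       max_len: int = MAX_QUEUE_LENGTH,
                       max_idle: float = MAX_IDLE_SECONDS) -> None:
    """Apply the validity and length limits, then append the new entry."""
    # Validity: if the last element is older than the idle limit,
    # the whole queue is considered stale and is cleared.
    if queue and now - queue[-1][1] > max_idle:
        queue.clear()
    # Length: drop the earliest elements until there is room for the new one.
    while len(queue) >= max_len:
        queue.pop(0)
    queue.append(entry)
```

Because the cap is on the number of entries rather than on elapsed time, the time span covered by the queue shrinks for active users and grows for less active ones, which is the dynamic time window behavior described above.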

It should be noted that the embodiments of the present application apply to many scenarios, such as news recommendation, click-through rate estimation, advertisement delivery, and search ranking; the steps, ideas, and processing logic of the foregoing embodiments apply to all of them.

The data processing system of the embodiment of the present application, as shown in FIG. 3, includes a terminal 41 and a server 42; the terminal 41 exchanges information with the server 42 over a wireless or wired connection. After collecting the real-time data stream from the terminal 41, the server 42 builds a user state queue from the first-type data and the second-type data in the stream and, according to the user state queue and the time information at which the first-type data was triggered, tracks changes in the user state in real time to obtain the user state feature. Specifically, each piece of user state information in the queue is identified by time information. The user state feature is obtained from the user state queue; the user state feature and the candidate data are input into the operation model, which outputs a data processing result (such as recommendation information); and the data processing result is sent to the terminal 41. The recommendation information is one kind of data processing result computed by the operation model of the embodiment of the present application.

In the embodiment of the present application, the server 42 includes:

A collecting unit 421, configured to collect a real-time data stream, where the real-time data stream contains first-type data characterizing user behavior and second-type data characterizing the data the user attends to. A queue establishing unit 422, configured to build a user state queue according to the first-type data and the second-type data. A state change tracking unit 423, configured to track changes in the user state in real time according to the user state queue and the time information at which the first-type data was triggered, and to obtain the user state feature. Specifically, the state change tracking unit 423 further includes: a user state description subunit, configured to identify each piece of user state information in the user state queue according to the time information at which the first-type data was triggered, so as to obtain each user state in the queue as represented by a dynamic time window; and a feature determining subunit, configured to obtain the user state feature from the user state queue. An obtaining unit 424, configured to acquire the candidate data to be processed and the operation model. An operation unit 425, configured to use the user state feature and the candidate data to be processed as input parameters of the operation model and to obtain output parameters after the operation model performs its operation. A sending unit 426, configured to obtain a data processing result according to the output parameters and to send the data processing result to the terminal.

In one implementation of the embodiment of the present application, the collecting unit is further configured to obtain newly added second-type data while collecting the real-time data stream. The server further includes: an extracting unit, configured to extract the current user state queue; and an updating unit, configured to append the newly added second-type data directly to the tail of the current user state queue, thereby updating the current user state queue.

In one implementation of the embodiment of the present application, the collecting unit is further configured to obtain newly added first-type data while collecting the real-time data stream. The server further includes: an extracting unit, configured to extract the current user state queue; and an updating unit, configured to: look up, in the current user state queue, the second-type data corresponding to the newly added first-type data; delete from the current user state queue the user state information containing that second-type data; shift forward in turn all user state information located after the deleted entry, thereby updating the current user state queue; and append the newly added first-type data to the tail of the updated user state queue.

In one implementation of the embodiment of the present application, the server further includes: a triggering unit, configured to trigger an update of the current user state queue when the first-type data and/or the second-type data is updated in real time; and a first check-and-update unit, configured to, when the length of the current user state queue reaches a first threshold, delete in turn the user state information at the front of the current user state queue, shift forward in turn the user state information further back, and insert the newly arrived data at the tail of the current user state queue.

In one implementation of the embodiment of the present application, the server further includes: a triggering unit, configured to trigger an update of the current user state queue when the first-type data and/or the second-type data is updated in real time; and a first check-and-update unit, configured to: obtain first time information corresponding to the user state information at the tail of the current user state queue; and judge the validity of the queue content in the current user state queue according to the difference between the first time information and the current time, clearing all user state information in the current user state queue if the difference is greater than a second threshold.

A computer-readable storage medium of the embodiment of the present application stores a computer program which, when executed by a processor, implements the steps of the data processing method described in the foregoing embodiments.

A server of the embodiment of the present application, as shown in FIG. 4, includes: a memory 61, configured to store a computer program capable of running on a processor; and a processor 62, configured to perform the steps of the data processing method of the foregoing embodiments when running the computer program. The server may further include an external communication interface 63, used for exchanging information with peripheral devices such as terminals. For example, the server receives the real-time data stream sent by a terminal, builds a user state queue from the first-type data and the second-type data in the stream, and identifies each piece of user state information in the queue by time information. The server 42 obtains the user state feature from the user state queue, inputs the user state feature and the candidate data into the operation model, outputs the data processing result, and sends the data processing result to the terminal. The server may further include an internal communication interface 64, which may specifically be a bus interface such as a PCI bus.

The embodiment of the present application is described below using a real application scenario as an example:

The embodiment of the present application is described using click-through rate estimation in information recommendation as the example scenario.

In the related art, a user portrait is built from user behavior to describe the user's interests, preferences, and so on. Specifically, the user portrait is built from historical user behavior data over a fixed length of time. However, if the period is too long (for example, one month), the portrait is not sensitive enough to changes in the user; if the period is too short (for example, a few hours or one day), the statistics of the user's behavior are not accurate enough. To achieve both sensitivity to and accuracy about the user's current interests and preferences, the embodiment of the present application represents the user state with a dynamic time window user state queue, which accurately describes the user's current state and yields the user's interests, preferences, and so on.

A processing flow of the embodiment of the present application, as shown in FIG. 5, includes:

Step 301: Collect real-time stream data.

Step 302: Generate a user state queue from the real-time stream data.

Step 303: Compute the user state feature from the user state queue.

Step 304: Input the user state feature and the candidate data into the CTR estimation model, and compute the click-through rate with the model.

Step 305: Obtain the recommendation result according to the click-through rate.

When the user state is described by a user portrait over a fixed time period, a period that is too long makes the portrait insensitive to changes in the user's current interests and preferences, while a period that is too short makes the representation of the user's current interests inaccurate. To achieve both sensitivity to and accuracy about the user's current interests and preferences, the embodiment of the present application represents the user state with a dynamic time window user state representation queue, as follows:

I. Structure and update policy of the dynamic time window.

1.1) Structure of the dynamic time window user state queue.

Each element of the queue contains two parts: the behavior content and the occurrence time. The behavior content records the identifier (ID) of the news the user saw and what behavior occurred. For example, if the user only saw the title of News 1 and did not click into its detail page, the entry is recorded as "News 1: exposure". The occurrence time records the specific time at which the behavior occurred. FIG. 6 shows an example of a dynamic time window user state queue.
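As a minimal sketch of one such queue element, the following assumes a small immutable record with the two parts named above. The class name, field names, and the `content()` format are illustrative assumptions rather than the application's data layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StateEntry:
    """One element of the user state queue (hypothetical layout)."""
    news_id: str           # identifier of the news item the user saw
    action: str            # what happened: "exposure", "click", "forward", ...
    occurred_at: datetime  # when the exposure or behavior occurred

    def content(self) -> str:
        """Behavior content, e.g. 'news1:exposure'."""
        return f"{self.news_id}:{self.action}"


# Example: the user saw the title of news1 but did not open the detail page.
entry = StateEntry("news1", "exposure", datetime(2015, 10, 21, 13, 45, 11))
```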

The queue is continuously filled with real-time news exposure data and real-time user behavior data. Exposure data is written directly to the tail of the queue, as shown in FIG. 7, which illustrates the insertion of exposure data into the dynamic time window user state queue: the original queue, labeled A1, holds three entries, and when a new piece of exposure data, labeled A2, arrives, it is appended directly to the tail of the queue. When a new piece of user behavior data arrives, its corresponding exposure data is first found in the queue and removed, the entries after that element are shifted forward in turn, and finally the user behavior data is inserted at the end of the queue, as shown in FIG. 8, which illustrates the insertion of behavior data into the dynamic time window user state queue. As shown in FIG. 8, the original queue, labeled A3, holds four entries. When a new piece of user behavior data, labeled A4, arrives, the exposure data for News 3 in the queue is first found and deleted; that is, the user state information corresponding to News 3 is deleted (removed, cleared, or the like) from the user state queue. Then all elements after the deleted element are shifted forward in turn (the queue does not distinguish entries by content; they are arranged in the order of the user's actions, that is, sorted by occurrence time), as shown in FIG. 9. Finally, the user behavior data is inserted at the end of the queue, as shown in FIG. 10.

1.2) Update policy of the dynamic time window user state representation queue.

The update policy must remain sensitive to changes in the user's interests and preferences while also remaining accurate. Both requirements are met through queue length and queue validity time. 1) Based on statistics of users' reading behavior on Tencent News and Tiantian Kuaibao, 100 is chosen as the queue length. When new data arrives and the queue already contains 100 elements, the earliest element is removed from the queue, the remaining elements are shifted forward in turn, and the new element is inserted at the end of the queue. 2) The timeliness of the queue content is limited: when the occurrence time of the last element in the queue differs from the current time by more than 24 hours, the queue content is cleared.

The advantage of the dynamic time window user state representation queue is that both its length and the validity period of its content are limited. The length limit ensures that the user state is captured both accurately and in a timely way: when the user refreshes frequently, the queue holds the user's 100 most recent browsing records over a short period; when the user refreshes infrequently, it holds the 100 most recent records over a medium-to-long period. This yields a dynamic time window, in which the representation of the user state changes with the frequency of the user's behavior. In addition, the queue content has a time limit: when more than 24 hours separate the last entry in the queue from the current time, there has been a gap in the user's usage, and the user's interests and preferences during that gap are considered unobservable. Therefore, so that past data does not affect the prediction of the user's future interests and preferences, the queue is cleared to keep the user state accurate.

1.3) Generation of the user state.

Step 1: Establish a real-time streaming data connection, so that the system can learn in real time which news the user has seen, which news detail pages the user has clicked into and read, which news the user has commented on, and which news the user has forwarded. The real-time news exposure data and real-time user behavior data are used to fill the user state. Step 2: For an existing user, read the user's state representation queue from the storage medium (which here includes various types of databases); for a new user, create a new dynamic time window user state representation queue. Step 3: Update the queue according to the current data and the state of the queue. Step 4: Write the updated queue to the storage medium.

The flow of user state generation, as shown in FIG. 11, includes:

Step 501: Ingest streaming data using Spark Streaming; the streaming data includes a stream of news exposure data and a stream of user behavior data.

Step 502: Read the user's historical state from the storage medium.

Step 503: Compute the user's current state.

Step 504: Write the user's current state to the storage medium.

In the system implementation of this flow, Spark Streaming is used to ingest the real-time data streams, and Redis is used as the storage medium. Redis is a high-performance in-memory key-value database. It offers fast queries, large storage capacity, and support for high concurrency, and in some settings it is a good complement to relational databases. Redis supports a relatively rich set of value types, including strings, lists, sets, and hashes, and these data types support a wide range of operations. On this basis, Redis supports sorting in various ways. To ensure efficiency, data is cached in memory, and Redis periodically writes updated data to disk or appends modification operations to a log file.
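Steps 502 to 504 (read the stored state, compute the current state, write it back) can be sketched as follows. To keep the example self-contained, a plain Python dict stands in for the Redis store and the stream ingestion is omitted; the key format `user_state:<user_id>`, the JSON serialization, and all function names are assumptions made for illustration. The queue update is reduced to a simple append here.

```python
import json
from typing import Dict, List


def load_queue(store: Dict[str, str], user_id: str) -> List[dict]:
    """Step 502: read the stored queue, or start a new one for a new user."""
    raw = store.get(f"user_state:{user_id}")
    return json.loads(raw) if raw is not None else []


def save_queue(store: Dict[str, str], user_id: str, queue: List[dict]) -> None:
    """Step 504: write the updated queue back to the store."""
    store[f"user_state:{user_id}"] = json.dumps(queue)


def process_event(store: Dict[str, str], user_id: str, event: dict) -> List[dict]:
    """Steps 502-504 for one incoming event.

    Step 503 is simplified to an append; a fuller version would also apply
    the exposure-matching and queue update policy rules.
    """
    queue = load_queue(store, user_id)
    queue.append(event)
    save_queue(store, user_id, queue)
    return queue


# Usage: an in-memory dict plays the role of the key-value store.
store: Dict[str, str] = {}
process_event(store, "u1", {"news_id": "n1", "action": "exposure", "ts": 1.0})
process_event(store, "u1", {"news_id": "n2", "action": "exposure", "ts": 2.0})
```

Serializing the whole queue under one key keeps the read-modify-write cycle simple; with concurrent writers for the same user, some form of locking or an atomic server-side update would be needed.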

II. User state feature generation.

User state features are generated from the user state queue and currently fall into two main classes: attribute statistics features and feedback features.

1) Attribute statistics features. For every news id in the user state queue, the corresponding first-level category, second-level category, keywords, tags, topics, and title word segments are extracted and accumulated with a weight that depends on the behavior: for example, a click has a weight of 1, a favorite 1.5, a forward 2, and no behavior 0. The first-level categories, second-level categories, keywords, tags, and title word segments with the top 5 accumulated weights are then taken as the user preference features.

2) Feedback features. Feedback features are divided into positive feedback features and negative feedback features. For the negative feedback features, the first-level categories, second-level categories, keywords, tags, topics, title word segments, and so on of all news items that were exposed to the user within the last half hour but not clicked are counted by number of occurrences, and the top 10 are taken as features. For the positive feedback features, the 20 news items whose click time is closest to the current time are taken from the user's clicks within the last half hour (at most 20 are taken; if there are fewer than 20, all are taken); their first-level categories, second-level categories, keywords, titles, topics, and title word segments are sorted by time, and the top 5 are taken as the user's positive feedback features.
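
The two feature classes can be sketched for a single attribute dimension as follows. The behavior weights, the half-hour window, the 20-item limit, and the top 5 and top 10 cut-offs come from the text; everything else is an assumption made for illustration, including the tuple layout of a queue entry, the function names, the choice to skip zero-weight entries, and the de-duplication of values in the positive feedback list. Each function would be called once per dimension (first-level category, keywords, and so on).

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Weights from the text; any behavior not listed (including None) weighs 0.
BEHAVIOR_WEIGHT = {"click": 1.0, "favorite": 1.5, "forward": 2.0}
HALF_HOUR = 1800.0

Entry = Tuple[str, Optional[str], float]  # (news_id, behavior, timestamp)
Attrs = Dict[str, List[str]]              # news_id -> values of one dimension


def attribute_features(queue: List[Entry], attrs: Attrs, k: int = 5) -> List[str]:
    """Top-k attribute values by accumulated behavior weight."""
    scores: Counter = Counter()
    for news_id, behavior, _ in queue:
        weight = BEHAVIOR_WEIGHT.get(behavior, 0.0)
        if weight > 0:  # assumption: zero-weight entries contribute nothing
            for value in attrs[news_id]:
                scores[value] += weight
    return [value for value, _ in scores.most_common(k)]


def negative_feedback(queue: List[Entry], attrs: Attrs, now: float, k: int = 10) -> List[str]:
    """Top-k attribute values of news exposed in the last half hour but not acted on."""
    counts: Counter = Counter()
    for news_id, behavior, ts in queue:
        if behavior is None and now - ts <= HALF_HOUR:
            counts.update(attrs[news_id])
    return [value for value, _ in counts.most_common(k)]


def positive_feedback(queue: List[Entry], attrs: Attrs, now: float,
                      k: int = 5, max_news: int = 20) -> List[str]:
    """Attribute values of the most recently clicked news, newest first."""
    clicked = [(ts, news_id) for news_id, behavior, ts in queue
               if behavior == "click" and now - ts <= HALF_HOUR]
    clicked.sort(reverse=True)  # most recent click first
    ordered: List[str] = []
    for _, news_id in clicked[:max_news]:
        for value in attrs[news_id]:
            if value not in ordered:  # assumption: keep each value once
                ordered.append(value)
    return ordered[:k]
```

For example, with clicks on two sports items, a favorite on a tech item, and two unclicked finance exposures, `attribute_features` ranks sports ahead of tech, while `negative_feedback` returns finance.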

III. Estimating the click-through rate with a click model.

The user state features calculated above, together with the information on the news items to be recommended and the user's basic attributes, are input into a click-through rate estimation model (for example, logistic regression or a factorization machine), which outputs the estimated click-through rate of each news item to be recommended.
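
As one possible reading of the logistic regression option, the sketch below scores a candidate from two binary features derived from the user state. The feature names, the weights in `WEIGHTS`, and the bias are hypothetical values chosen for illustration; in practice they would be learned from logged exposures and clicks, and the feature set would be far larger.

```python
import math
from typing import Dict, List

# Hypothetical learned parameters; real values would come from training.
WEIGHTS: Dict[str, float] = {"category_preferred": 1.2, "category_ignored": -0.8}
BIAS = -2.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_features(category: str, preferred: List[str], ignored: List[str]) -> Dict[str, float]:
    """Cross one candidate's category with the user's state features."""
    return {
        "category_preferred": 1.0 if category in preferred else 0.0,
        "category_ignored": 1.0 if category in ignored else 0.0,
    }


def predict_ctr(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Logistic regression: sigmoid of the weighted feature sum plus bias."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


if __name__ == "__main__":
    preferred, ignored = ["sports", "tech"], ["finance"]
    for category in ("sports", "finance", "travel"):
        ctr = predict_ctr(build_features(category, preferred, ignored), WEIGHTS, BIAS)
        print(category, round(ctr, 4))
```

A factorization machine would add pairwise feature interactions to the same weighted sum; the scoring interface would stay the same.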

IV. Outputting the news recommendation results:

Taking the estimated click-through rate of each news item from the previous step as the main basis, and combining it with business rules, the final recommended news is presented to the user.
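
The final selection can be sketched as a sort by estimated click-through rate followed by rule-based filtering. The text does not say which business rules are used, so the two rules here (skip news the user has already been shown, and cap the number of items per category) are purely hypothetical examples, as are the function name and parameters.

```python
from typing import Dict, List, Set, Tuple

Scored = Tuple[str, str, float]  # (news_id, category, estimated CTR)


def recommend(scored: List[Scored], already_seen: Set[str],
              per_category_cap: int = 2, limit: int = 3) -> List[str]:
    """Pick up to `limit` news ids by descending CTR, subject to example rules."""
    picked: List[str] = []
    per_category: Dict[str, int] = {}
    for news_id, category, _ in sorted(scored, key=lambda item: item[2], reverse=True):
        if news_id in already_seen:
            continue  # hypothetical rule: do not repeat exposed news
        if per_category.get(category, 0) >= per_category_cap:
            continue  # hypothetical rule: keep the list diverse
        picked.append(news_id)
        per_category[category] = per_category.get(category, 0) + 1
        if len(picked) == limit:
            break
    return picked


if __name__ == "__main__":
    candidates = [
        ("a", "sports", 0.9), ("b", "sports", 0.8), ("c", "sports", 0.7),
        ("d", "tech", 0.6), ("e", "tech", 0.5),
    ]
    print(recommend(candidates, already_seen={"a"}))
```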

In the several embodiments provided by the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, each unit may stand alone as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.

A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Alternatively, if the integrated unit of the present application is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.

The above are only specific embodiments of the present application, and the scope of protection of the present application is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.

Industrial applicability

Claims

1. A data processing method, the method comprising:

collecting a real-time data stream, the real-time data stream including a first type of data representing user behavior and a second type of data representing the data of interest to the user;

establishing a user state queue according to the first type of data and the second type of data;

tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data is triggered, to obtain a user state feature;

acquiring candidate data to be processed and an operational model;

using the user state feature and the candidate data to be processed as input parameters of the operational model, and obtaining output parameters after the operational model performs its operation; and

obtaining recommendation information according to the output parameters, and sending the recommendation information.
2. The method according to claim 1, wherein tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data is triggered, to obtain the user state feature, comprises:

identifying each piece of user state information in the user state queue according to the time information at which the first type of data is triggered, to obtain each user state in the user state queue as represented by a dynamic time window; and

obtaining the user state feature according to the user state queue.
3. The method according to claim 1 or 2, wherein the method further comprises:

obtaining newly added data of the second type when collecting the real-time data stream;

extracting the current user state queue; and

adding the newly added data of the second type directly to the tail of the current user state queue, thereby updating the current user state queue.
4. The method according to claim 1 or 2, wherein the method further comprises:

obtaining newly added data of the first type when collecting the real-time data stream;

extracting the current user state queue;

finding, in the current user state queue, the data of the second type that corresponds to the newly added data of the first type;

deleting, from the current user state queue, the user state information in which that data of the second type is located;

shifting forward, in order, all user state information located after the deleted user state information, thereby updating the current user state queue; and

adding the newly added data of the first type to the tail of the updated user state queue.
5. The method according to claim 1 or 2, wherein the method further comprises:

triggering an update of the current user state queue when the first type of data and/or the second type of data is updated in real time; and

when the queue length of the current user state queue reaches a first threshold, deleting from the current user state queue, in order, the user state information located toward the front of the queue, shifting forward, in order, the user state information located toward the back of the queue, and inserting the data updated in real time at the tail of the current user state queue.
6. The method according to claim 1 or 2, wherein the method further comprises:

triggering an update of the current user state queue when the first type of data and/or the second type of data is updated in real time;

acquiring first time information corresponding to the user state information at the tail of the current user state queue; and

judging the validity of the queue contents of the current user state queue according to the difference between the first time information and the current time information, and, if the difference is greater than a second threshold, clearing all user state information in the current user state queue.
7. A server, the server comprising:

a collection unit configured to collect a real-time data stream, the real-time data stream including a first type of data representing user behavior and a second type of data representing the data of interest to the user;

a queue establishing unit configured to establish a user state queue according to the first type of data and the second type of data;

a state change tracking unit configured to track changes in the user state in real time according to the user state queue and the time information at which the first type of data is triggered, to obtain a user state feature;

an acquisition unit configured to acquire candidate data to be processed and an operational model;

an operation unit configured to use the user state feature and the candidate data to be processed as input parameters of the operational model, and to obtain output parameters after the operational model performs its operation; and

a sending unit configured to obtain recommendation information according to the output parameters and to send the recommendation information.
8. The server according to claim 7, wherein the state change tracking unit further comprises:

a user state description subunit configured to identify each piece of user state information in the user state queue according to the time information at which the first type of data is triggered, to obtain each user state in the user state queue as represented by a dynamic time window; and

a feature determination subunit configured to obtain the user state feature according to the user state queue.
9. The server according to claim 7 or 8, wherein the collection unit is further configured to:

obtain newly added data of the second type when collecting the real-time data stream;

and the server further comprises:

an extraction unit configured to extract the current user state queue; and

an update unit configured to add the newly added data of the second type directly to the tail of the current user state queue, thereby updating the current user state queue.
10. The server according to claim 7 or 8, wherein the collection unit is further configured to:

obtain newly added data of the first type when collecting the real-time data stream;

and the server further comprises:

an extraction unit configured to extract the current user state queue; and

an update unit configured to:

find, in the current user state queue, the data of the second type that corresponds to the newly added data of the first type;

delete, from the current user state queue, the user state information in which that data of the second type is located;

shift forward, in order, all user state information located after the deleted user state information, thereby updating the current user state queue; and

add the newly added data of the first type to the tail of the updated user state queue.
11. The server according to claim 7 or 8, wherein the server further comprises:

a trigger unit configured to trigger an update of the current user state queue when the first type of data and/or the second type of data is updated in real time; and

a first check-and-update unit configured to, when the queue length of the current user state queue reaches a first threshold, delete from the current user state queue, in order, the user state information located toward the front of the queue, shift forward, in order, the user state information located toward the back of the queue, and insert the data updated in real time at the tail of the current user state queue.
12. The server according to claim 7 or 8, wherein the server further comprises:

a trigger unit configured to trigger an update of the current user state queue when the first type of data and/or the second type of data is updated in real time; and

a first check-and-update unit configured to:

acquire first time information corresponding to the user state information at the tail of the current user state queue; and

judge the validity of the queue contents of the current user state queue according to the difference between the first time information and the current time information, and, if the difference is greater than a second threshold, clear all user state information in the current user state queue.
13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
14. A server, the server comprising:

a memory configured to store a computer program capable of running on a processor; and

a processor configured to perform the steps of the method according to any one of claims 1 to 6 when running the computer program.
15. A data processing method, the method being performed by a server, the server including one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory, each program may include one or more units each corresponding to a set of instructions, and the one or more processors are configured to execute the instructions; the method comprising:

collecting a real-time data stream, the real-time data stream including a first type of data representing user behavior and a second type of data representing the data of interest to the user;

establishing a user state queue according to the first type of data and the second type of data;

tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data is triggered, to obtain a user state feature;

acquiring candidate data to be processed and an operational model;

using the user state feature and the candidate data to be processed as input parameters of the operational model, and obtaining output parameters after the operational model performs its operation; and

obtaining recommendation information according to the output parameters, and sending the recommendation information.
16. The method according to claim 15, wherein tracking changes in the user state in real time according to the user state queue and the time information at which the first type of data is triggered, to obtain the user state feature, comprises:

identifying each piece of user state information in the user state queue according to the time information at which the first type of data is triggered, to obtain each user state in the user state queue as represented by a dynamic time window; and

obtaining the user state feature according to the user state queue.
17. The method according to claim 15 or 16, wherein the method further comprises:

obtaining newly added data of the second type when collecting the real-time data stream;

extracting the current user state queue; and

adding the newly added data of the second type directly to the tail of the current user state queue, thereby updating the current user state queue.
18. The method according to claim 15 or 16, wherein the method further comprises:

obtaining newly added data of the first type when collecting the real-time data stream;

extracting the current user state queue;

finding, in the current user state queue, the data of the second type that corresponds to the newly added data of the first type;

deleting, from the current user state queue, the user state information in which that data of the second type is located;

shifting forward, in order, all user state information located after the deleted user state information, thereby updating the current user state queue; and

adding the newly added data of the first type to the tail of the updated user state queue.
19. The method according to claim 15 or 16, wherein the method further comprises:

triggering an update of the current user state queue when the first type of data and/or the second type of data is updated in real time; and

when the queue length of the current user state queue reaches a first threshold, deleting from the current user state queue, in order, the user state information located toward the front of the queue, shifting forward, in order, the user state information located toward the back of the queue, and inserting the data updated in real time at the tail of the current user state queue.
20. The method according to claim 15 or 16, wherein the method further comprises:

triggering an update of the current user state queue when the first type of data and/or the second type of data is updated in real time;

acquiring first time information corresponding to the user state information at the tail of the current user state queue; and

judging the validity of the queue contents of the current user state queue according to the difference between the first time information and the current time information, and, if the difference is greater than a second threshold, clearing all user state information in the current user state queue.