CN113988225A - Method and device for establishing representation extraction model, representation extraction and type identification - Google Patents


Info

Publication number
CN113988225A
Authority
CN
China
Prior art keywords
user
terminal
model
server
extraction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111597741.9A
Other languages
Chinese (zh)
Other versions
CN113988225B (en)
Inventor
吕乐
周璟
刘佳
范东云
傅幸
王宁涛
杨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111597741.9A
Publication of CN113988225A
Application granted
Publication of CN113988225B
Status: Active

Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/2155: Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F18/2451: Classification techniques relating to the decision surface: linear, e.g. hyperplane
    • G06N3/044: Neural networks; Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Neural networks; Combinations of networks
    • G06N3/088: Learning methods; Non-supervised learning, e.g. competitive learning

All of the above fall under G (PHYSICS), G06 (COMPUTING; CALCULATING OR COUNTING), in subclasses G06F (ELECTRIC DIGITAL DATA PROCESSING) and G06N (COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide methods and devices for establishing a representation extraction model, for representation extraction, and for type identification. According to the method of the embodiments, first training data containing more than one sample pair is obtained, where the sample pairs include positive sample pairs and negative sample pairs; a first representation extraction model and a second representation extraction model are then trained with the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs; finally, the trained first representation extraction model is deployed in the terminal device.

Description

Method and device for establishing representation extraction model, representation extraction and type identification
Technical Field
One or more embodiments of the present specification relate to the technical field of artificial intelligence, and in particular, to methods and apparatuses for establishing a representation extraction model, representation extraction, and type identification.
Background
With the rapid development of smartphones in recent years, terminal devices and edge computing have advanced accordingly. Many tasks can therefore be performed on the terminal device itself rather than being handed over to the cloud for processing, which reduces cloud load, responds to the user more quickly, and better protects user privacy. However, although a large amount of feature data is collected and retained on the terminal device, realizing a specific task on the device requires that a model capable of extracting representation vectors be deployed on the terminal device.
Disclosure of Invention
In view of the above, one or more embodiments of the present specification describe methods and apparatuses for establishing a representation extraction model, representation extraction, and type identification.
According to a first aspect, there is provided a method of establishing a representation extraction model, the method comprising:
acquiring first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs, a positive sample pair comprising the terminal-side features and server-side features of the same user, and a negative sample pair comprising the terminal-side features and server-side features of different users;
training a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs;
and deploying the trained first representation extraction model to the terminal device.
In one embodiment, the terminal-side feature comprises at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device; and/or
The server-side characteristics include at least one of attribute characteristics and behavior statistical characteristics of the user collected by the server side.
In another embodiment, the first representation extraction model comprises a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); and/or
the second representation extraction model comprises a multilayer perceptron (MLP), a convolutional neural network (CNN), or a residual network (ResNet).
In one embodiment, deploying the trained first representation extraction model to the terminal device includes:
acquiring second training data comprising more than one labeled sample, where a labeled sample comprises the terminal-side features of a user and a labeled type label;
training, with the second training data, a recognition model comprising the trained first representation extraction model and a linear regression model; the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, the linear regression model predicts a type label from the user's terminal-side representation vector, and the training objective is to minimize the difference between the prediction result and the type label in the labeled sample;
and deploying the recognition model to the terminal device.
In another embodiment, during training of the recognition model, the model parameters of the first representation extraction model are kept unchanged and only the parameters of the linear regression model are updated.
In another embodiment, the terminal-side features comprise a sequence of page features browsed by the user, and the server-side features comprise the user's attribute features and transaction statistical features;
the type label comprises a risk type label, a label indicating whether a specific type of risk exists, or the risk level of a specific type of risk.
According to a second aspect, there is provided a representation extraction method, performed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a first representation extraction model to obtain the terminal-side representation vector of the user, where the first representation extraction model is pre-established using the method of any one of the above.
According to a third aspect, there is also provided a type identification method, performed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a recognition model to obtain the type label predicted by the recognition model, where the recognition model is pre-established using the method described above.
In one embodiment, further comprising:
sending the type label to a server, so that the server executes a decision for the user or the terminal device using the type label.
In another embodiment, the terminal-side features of the user comprise a sequence of page features browsed by the user;
the type label comprises a risk type label or a label indicating whether a specific type of risk exists;
and the decision comprises a risk control policy.
According to a fourth aspect, there is provided an apparatus for establishing a representation extraction model, comprising:
a first obtaining unit configured to obtain first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs, a positive sample pair comprising the terminal-side features and server-side features of the same user, and a negative sample pair comprising the terminal-side features and server-side features of different users;
a first training unit configured to train a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs;
and a model deployment unit configured to deploy the trained first representation extraction model to the terminal device.
In one embodiment, the terminal-side feature comprises at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device; and/or
The server-side characteristics include at least one of attribute characteristics and behavior statistical characteristics of the user collected by the server side.
In another embodiment, the first representation extraction model comprises a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); and/or
the second representation extraction model comprises a multilayer perceptron (MLP), a convolutional neural network (CNN), or a residual network (ResNet).
In one embodiment, further comprising:
the second acquisition unit is configured to acquire second training data comprising more than one labeled sample, wherein the labeled sample comprises a terminal side characteristic of a user and a labeled type label;
a second training unit configured to train a recognition model including the trained first feature extraction model and a linear regression model using the second training data; the first representation extraction model is used for extracting a terminal side feature vector of a user by using a terminal side feature of the user, the linear regression model is used for predicting a type label by using the terminal side feature vector of the user, and a training target is the difference between a minimum prediction result and the type label in a labeling sample;
the model deployment unit is specifically configured to deploy the recognition model to the terminal device.
In another embodiment, the second training unit is specifically configured to, during training of the recognition model, keep the model parameters of the first representation extraction model unchanged and update only the parameters of the linear regression model.
In one embodiment, the terminal-side features comprise a sequence of page features browsed by the user, and/or the server-side features comprise the user's attribute features and transaction statistical features; and/or
the type label comprises a risk type label, a label indicating whether a specific type of risk exists, or the risk level of a specific type of risk.
According to a fifth aspect, there is further provided a representation extraction apparatus, disposed in a terminal device, the apparatus comprising:
a third obtaining unit configured to obtain the terminal-side features of a user;
and a representation extraction unit configured to input the terminal-side features of the user into a first representation extraction model to obtain the terminal-side representation vector of the user, where the first representation extraction model is pre-established by an apparatus as described in any one of the above.
According to a sixth aspect, there is provided a type identification apparatus, disposed in a terminal device, the apparatus comprising:
a fourth obtaining unit configured to obtain the terminal-side features of a user;
and a type identification unit configured to input the terminal-side features of the user into a recognition model to obtain the type label predicted by the recognition model, where the recognition model is pre-established by an apparatus as described in any one of the above.
In one embodiment, further comprising:
a label sending unit configured to send the type label to a server, so that the server executes a decision for the user or the terminal device using the type label.
In another embodiment, the terminal-side features of the user comprise a sequence of page features browsed by the user;
the type label comprises a risk type label or a label indicating whether a specific type of risk exists;
and the decision comprises a risk control policy.
According to a seventh aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor which, when executing the executable code, implements the method of the first aspect.
In the embodiments of this specification, a self-supervised contrastive learning approach is used to deploy a representation extraction model on the terminal device. On one hand, self-supervised contrastive learning does not depend on manually labeled data, and a single unified representation extraction model can serve different types of tasks, reducing the computing resources consumed; on the other hand, modeling is not limited by the number of labeled samples and can effectively utilize the full volume of sample data, improving the generalization capability of the model and the universality of the representations.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 illustrates an exemplary system architecture to which embodiments of the invention may be applied;
FIG. 2 illustrates a flow diagram of a method of establishing a representation extraction model, according to one embodiment;
FIG. 3 illustrates a schematic diagram of establishing a representation extraction model, according to one embodiment;
FIG. 4 illustrates a schematic diagram of establishing a recognition model, according to one embodiment;
FIG. 5 shows a flow diagram of a representation extraction method, according to one embodiment;
FIG. 6 illustrates a flow diagram of a type identification method, according to one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus for establishing a representation extraction model, according to one embodiment;
FIG. 8 shows a schematic block diagram of a representation extraction apparatus, according to one embodiment;
FIG. 9 shows a schematic block diagram of a type identification apparatus, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Most existing representation extraction models are trained in a supervised manner, that is, using feature data together with labels annotated in advance for that feature data. Such an approach must be trained separately for each type of task, yielding multiple representation extraction models for the different task types, which inevitably consumes more computing resources. Moreover, in a supervised modeling process, if labeled sample data is scarce (for example, in the risk identification field, risky samples are rare), the generalization capability and universality of the representation extraction model are greatly affected.
The idea of the present specification is to establish the representation extraction model using a self-supervised contrastive learning approach. Specific implementations of this idea are described below.
For convenience of understanding, a system architecture to which the technical solution provided in the present specification is applicable will be briefly described first. FIG. 1 illustrates an exemplary system architecture to which embodiments of the invention may be applied.
As shown in fig. 1, the system architecture may include a terminal device 101 and a terminal device 102, a network 103 and a server 104. Network 103 is the medium used to provide communication links between terminal device 101, terminal device 102, and server 104. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may interact with server 104 through network 103 using terminal device 101 and terminal device 102. Various applications, such as a web browser application, a communication-type application, a multimedia application, a game-type application, and the like, may be installed on the terminal device 101 and the terminal device 102.
The terminal device 101 and the terminal device 102 may be, but are not limited to, smart mobile terminals, smart home devices, wearable devices, PCs (personal computers), and the like. Smart mobile devices may include cell phones, tablet computers, notebook computers, PDAs (personal digital assistants), internet-connected vehicles, and so on. Smart home devices may include smart television sets, smart refrigerators, and so forth. Wearable devices may include smart watches, smart glasses, virtual reality devices, augmented reality devices, mixed reality devices (i.e., devices that support both virtual reality and augmented reality), and so forth.
The server 104 may be a single server or a server group comprising a plurality of servers. The server 104 may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service extensibility in traditional physical host and VPS (Virtual Private Server) services. The server 104 may also be a server of a distributed system, or a server combined with a blockchain.
The device for establishing the representation extraction model provided in the present specification may be configured and run in the server 104, and the server 104 deploys the established model in the terminal device 101 or the terminal device 102. It may be implemented as a plurality of software or software modules (for example, for providing distributed services), or as a single software or software module, which is not specifically limited herein.
The representation extraction apparatus and the type identification apparatus provided in this specification may be configured and run in the terminal device 101 or the terminal device 102, to extract the user's terminal-side representation vector and to predict type labels, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The method provided in this specification is described in detail below with reference to embodiments. FIG. 2 illustrates a flow diagram of a method of establishing a representation extraction model according to one embodiment. It will be appreciated that the method may be performed by the server side, but it is not excluded that it may be performed by another computer device or platform with strong computing power. As shown in fig. 2, the method includes:
Step 201, obtaining first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs; a positive sample pair comprises the terminal-side features and server-side features of the same user, and a negative sample pair comprises the terminal-side features and server-side features of different users, that is, one negative sample pair comprises the terminal-side features of one user and the server-side features of another user.
Step 203, training a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs.
Step 205, deploying the trained first representation extraction model to the terminal device.
In the method shown in fig. 2, a self-supervised contrastive learning approach is adopted to deploy the representation extraction model on the terminal device. On one hand, self-supervised contrastive learning does not depend on manually labeled data, and a single unified representation extraction model can serve different types of tasks, reducing the computing resources consumed; on the other hand, modeling is not limited by the number of labeled samples and can effectively utilize the full volume of sample data, improving the generalization capability of the model and the universality of the representations.
It should be noted that the expressions "first", "second", "third", "fourth", and the like in the embodiments of the present specification are not limited to the size, order, and number, and are merely used for name differentiation. For example, the "first preset time period" and the "second preset time period" are merely used to distinguish the two preset time periods in terms of names. As another example, the "first training data" and the "second training data" are merely used to distinguish the two training data in terms of names.
The manner in which the various steps shown in fig. 2 are performed is described below. The above step 201, i.e. "acquiring first training data comprising more than one sample pair", is first described in detail.
Usually, the terminal device records user-related features, and the server side also collects and records user-related features; these two sets of features differ.
The terminal device generally records behavior characteristics of the user on the terminal device, status characteristics of the terminal device, and the like, and generally relates to the operation of the user on the terminal device, and changes with time. That is, the terminal-side feature may include at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device.
The behavior feature sequence of the user on the terminal device is composed of behavior features of N continuous time points in a time window with preset first duration. The state feature sequence of the terminal device is composed of the state features of the terminal device at N consecutive time points in the time window with the preset first duration. N is a positive integer greater than 1.
Since the behavior data of the user on the terminal device is usually embodied as page information browsed in the application program, a page feature sequence browsed by the user at N consecutive time points can be obtained. The Page related to this embodiment may also be referred to as a Web Page, and may be a Web Page (Web Page) written based on HTML (HyperText Markup Language), that is, an HTML Page, or may also be a Web Page written based on HTML and Java languages, that is, a Java Server Page (JSP), or may also be a Web Page written in other languages, which is not particularly limited in this embodiment.
The page features may in fact be the encoded information of the pages. Suppose $p_j^u$ denotes the encoded information of the $j$-th page visited by user $u$, and $t_j^u$ denotes the time information of the $j$-th page. The sequence of page features browsed by user $u$ at $N$ consecutive time points can then be represented as

$$S^u = \{(p_1^u, t_1^u), (p_2^u, t_2^u), \dots, (p_N^u, t_N^u)\}.$$

For example, the encoded information of the pages viewed every minute by a user over one hour constitutes that user's page feature sequence.
The state features of the terminal device may include the device's battery level, screen brightness, attitude, and the like, likewise organized as a sequence of state features of the terminal device at $N$ consecutive time points.
As one realizable way, the page feature sequence browsed by user $u$ at $N$ consecutive time points can be used directly as that user's terminal-side feature $x^u$, i.e., $x^u = S^u$.
The server side usually records attribute features or behavior statistical features of the user; these are typically enumerated or numerical data. In this embodiment, the user-related features recorded by the server side are referred to as server-side features. That is, the server-side features include at least one of the attribute features and the behavior statistical features of the user collected by the server side.

The attribute features of the user may be, for example, the user's gender, age, occupation, etc. The behavior statistical features of the user may be statistics such as the transaction amount and the number of transactions accumulated by the user within a time window of a second preset duration. These features constitute the server-side features of the user. Still taking user $u$ as an example, the server-side features can be represented as a vector $v^u \in \mathbb{R}^d$, where $d$, a positive integer, is the vector dimension of the server-side features and is generally consistent with the number of server-side feature types employed.
In the embodiments of the present specification, the terminal-side features of a plurality of users may be grouped into a set $X = \{x^u\}$, and their server-side feature data into a set $V = \{v^u\}$. Positive and negative sample pairs can then be constructed based on $X$ and $V$ to constitute the first training data.
The positive sample pairs include the terminal-side features and server-side features of the same user. That is, for the terminal-side feature $x^u$ of any given user, the server-side feature $v^u$ of the same user is taken to form a positive sample pair $(x^u, v^u)$. It should be noted that, for a positive sample pair, the time windows used for $x^u$ and $v^u$ correspond to each other, i.e., the time window of the first preset duration adopted for $x^u$ corresponds to the time window of the second preset duration adopted for $v^u$. The two time windows may be identical, may have different lengths but the same starting point, or may partially overlap.
The case where the two time windows are completely identical is the easiest to understand. For example, the sequence of web page features browsed by the user within 1 hour starting from time $t_0$ constitutes $x^u$, and statistics such as the transaction amount and the number of transactions of that user within 1 hour starting from $t_0$ constitute the server-side feature $v^u$.

However, in some special cases there may be an association between the terminal-side feature and the server-side feature over time windows of different durations. For example, the user browses a series of web pages using terminal device 1 within 1 hour starting from time $t_0$, yielding the page feature sequence $x^u$, and then browses a series of web pages using terminal device 2 within 1 hour starting from time $t_1$; the two stages of browsing may jointly complete a transaction, so the server-side feature $v^u$ corresponding to $x^u$ may be collected by the server side within a 2-hour time window starting from $t_0$.
The negative sample pairs include the terminal-side features and server-side features of different users. For example, for the terminal-side feature $x^u$ of a certain user, the server-side feature $v^w$ of another, randomly chosen user $w \neq u$ is taken to form a negative sample pair $(x^u, v^w)$. The time windows adopted for the terminal-side feature $x^u$ and the server-side feature $v^w$ in a negative sample pair need not be restricted: the two corresponding time windows may be identical, may partially overlap, or may be completely non-overlapping and unrelated.
It can be seen that this way of obtaining the first training data is not constrained by whether the data has been type-labeled, and it is not affected by the scarcity of samples for any particular label; it can make full use of the full volume of data.
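To make the pair construction concrete, the following is a minimal Python sketch of how the first training data could be assembled. It assumes page sequences have already been encoded as integer IDs and server-side features as $d$-dimensional vectors; all names (build_pairs, num_negatives_per_user, etc.) are illustrative and not taken from the patent.

```python
# A minimal sketch of constructing the first training data (step 201).
import random
from typing import Dict, List, Tuple

def build_pairs(
    terminal_feats: Dict[str, List[int]],   # user id -> page-encoding sequence x_u
    server_feats: Dict[str, List[float]],   # user id -> d-dimensional vector v_u
    num_negatives_per_user: int = 1,
) -> Tuple[List[tuple], List[tuple]]:
    """Return (positive_pairs, negative_pairs)."""
    users = [u for u in terminal_feats if u in server_feats]
    positives, negatives = [], []
    for u in users:
        # Positive pair: terminal- and server-side features of the SAME user,
        # taken from corresponding time windows.
        positives.append((terminal_feats[u], server_feats[u]))
        for _ in range(num_negatives_per_user):
            # Negative pair: terminal-side features of user u paired with the
            # server-side features of a randomly drawn DIFFERENT user.
            w = random.choice([x for x in users if x != u])
            negatives.append((terminal_feats[u], server_feats[w]))
    return positives, negatives

# Toy usage: three users, page sequences of length N=4, server vectors with d=3.
term = {"u1": [5, 2, 9, 1], "u2": [7, 7, 3, 0], "u3": [4, 8, 8, 2]}
serv = {"u1": [120.0, 3.0, 0.5], "u2": [40.0, 1.0, 0.9], "u3": [999.0, 12.0, 0.1]}
pos, neg = build_pairs(term, serv)
print(len(pos), len(neg))  # 3 3
```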
Step 203 above, i.e., training the first and second representation extraction models using the first training data, is described in detail below with reference to the embodiments.
FIG. 3 illustrates a schematic diagram of establishing a representation extraction model according to one embodiment. As shown in fig. 3, this embodiment involves two representation extraction models: a first representation extraction model and a second representation extraction model.

The first representation extraction model extracts the terminal-side representation vector of a user: its input is the user's terminal-side feature $x^u$, and its output is the user's terminal-side representation vector $h^u$. The second representation extraction model extracts the server-side representation vector of a user: its input is the server-side feature $v^u$ (or $v^w$), and its output is the corresponding server-side representation vector, denoted $z^u$ for the positive pair's $v^u$ and $z^w$ for the negative pair's $v^w$.
The first representation extraction model is a target model, that is, a model to be deployed in the terminal device, and the second representation extraction model is only used in a training process to assist in obtaining the first representation extraction model.
The first representation extraction model may be, for example, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), or the like; RNN is taken as the example in fig. 3. The RNN extracts $h^u$ from $x^u$; in fig. 3, $h^u = \mathrm{RNN}(x^u)$, where $\mathrm{RNN}(\cdot)$ is the transformation function adopted by the RNN in the above extraction process.
The second representation extraction model may be, for example, an MLP (Multilayer Perceptron), a CNN (Convolutional Neural Network), or a ResNet (Residual Network); MLP is taken as the example in fig. 3. The MLP extracts $z^u$ from $v^u$ and $z^w$ from $v^w$; in fig. 3, $z^u = \mathrm{MLP}(v^u)$ and $z^w = \mathrm{MLP}(v^w)$, where $\mathrm{MLP}(\cdot)$ is the transformation function adopted by the MLP in the above extraction process.
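As a concrete illustration of the two models in fig. 3, here is a minimal PyTorch sketch, assuming page IDs are embedded and encoded by a GRU (one of the options named above) and the server-side vector is mapped by an MLP. The class names and hyperparameters are illustrative assumptions, not identifiers from the patent.

```python
# A minimal sketch of the two encoders in fig. 3, assuming PyTorch.
import torch
import torch.nn as nn

class TerminalEncoder(nn.Module):
    """First representation extraction model: encodes the page-feature
    sequence x_u into a terminal-side representation vector h_u.
    The patent names RNN/LSTM/GRU as options; a GRU is used here."""
    def __init__(self, vocab_size: int, embed_dim: int = 32, rep_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, rep_dim, batch_first=True)
    def forward(self, pages: torch.Tensor) -> torch.Tensor:  # (B, N) int ids
        _, h_last = self.gru(self.embed(pages))               # (1, B, rep_dim)
        return h_last.squeeze(0)                              # h_u: (B, rep_dim)

class ServerEncoder(nn.Module):
    """Second representation extraction model: maps the d-dimensional
    server-side feature vector v_u to z_u; an MLP, as in fig. 3."""
    def __init__(self, d: int, rep_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, rep_dim))
    def forward(self, v: torch.Tensor) -> torch.Tensor:       # (B, d) float
        return self.mlp(v)                                    # z_u: (B, rep_dim)
```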
The model training in this step adopts a self-supervised contrastive learning task. The training goal is to maximize the similarity between the terminal-side and server-side representation vectors of the same user and to minimize the similarity between the terminal-side and server-side representation vectors of different users (i.e., between one user's terminal-side representation vector and another user's server-side representation vector). In other words, the similarity of the two features of a positive sample pair in the mapped vector space is maximized, and the similarity of the two features of a negative sample pair in the mapped vector space is minimized.
In this embodiment of the present specification, a loss function may be constructed according to the training target, and model parameters of the first representation extraction model and the second representation extraction model are updated in a manner such as gradient descent by using a value of the loss function in each iteration until a preset training end condition is satisfied. The training end condition may include, for example, that a value of the loss function is less than or equal to a preset loss function threshold, the number of iterations reaches a preset number threshold, and the like.
As one realizable way, the loss function constructed above may be an NCE (Noise-Contrastive Estimation) loss function. For example, the NCE loss function $L$ can be expressed as:

$$L = -\,\mathbb{E}\left[\log \frac{\exp\big(\mathrm{sim}(h^u, z^u)\big)}{\exp\big(\mathrm{sim}(h^u, z^u)\big) + \sum_{w \neq u} \exp\big(\mathrm{sim}(h^u, z^w)\big)}\right] \qquad (1)$$

where $\mathrm{sim}(\cdot,\cdot)$ is the similarity calculation function, for which a method such as cosine similarity may be employed, and $\mathbb{E}[\cdot]$ denotes the expected value computed over the samples $x^u$, $v^u$ and $v^w$.
As can be seen from the NCE loss function, the above self-supervised contrastive learning process in effect updates the model parameters so as to shorten, as much as possible, the distance in the representation space between the terminal-side and server-side features of the same user, and to lengthen, as much as possible, that distance for different users. That is, the terminal-side representation vectors extracted by the finally trained first representation extraction model are highly similar in the representation space to the server-side features. Since the server side typically has rich feature dimensions describing implicit information in user behavior, and can easily incorporate a large amount of expert experience and complex logical data, the server-side features effectively guide the terminal-side representations, enabling them to carry more comprehensive and universal information and to generalize better.
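The following is a minimal PyTorch sketch of one training iteration under this objective, assuming the TerminalEncoder and ServerEncoder sketched earlier. It scores every user in a batch against every other user's server-side representation, so the diagonal entries act as the positive pairs and the off-diagonal entries as the negative pairs, in the spirit of formula (1); the temperature parameter is an illustrative addition commonly used with this kind of loss, not something the patent specifies.

```python
# A minimal sketch of the contrastive training step, assuming PyTorch.
import torch
import torch.nn.functional as F

def nce_loss(h: torch.Tensor, z: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """h, z: (B, rep_dim); row i of h and row i of z belong to the same user."""
    h = F.normalize(h, dim=1)
    z = F.normalize(z, dim=1)
    sim = h @ z.t() / temperature            # cosine sim(h_u, z_w) for all pairs
    labels = torch.arange(h.size(0))         # the diagonal holds the positives
    return F.cross_entropy(sim, labels)      # -log softmax: positives vs negatives

# One gradient-descent iteration (toy shapes: batch of 8 users, N=4, d=3).
term_enc, serv_enc = TerminalEncoder(vocab_size=100), ServerEncoder(d=3)
opt = torch.optim.Adam(list(term_enc.parameters()) + list(serv_enc.parameters()), lr=1e-3)
pages = torch.randint(0, 100, (8, 4))        # terminal-side features x_u
v = torch.randn(8, 3)                        # server-side features v_u
loss = nce_loss(term_enc(pages), serv_enc(v))
loss.backward()
opt.step()
```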
Step 205 above, i.e., deploying the trained first representation extraction model to the terminal device, is described in detail below with reference to the embodiments.
If the process of establishing the model is executed by the server side, the server side may send the first representation extraction model obtained by the training in step 203 to the terminal device. If the process is executed by another computer device or platform with strong computing power, that device or platform may provide the trained first representation extraction model to the server side, which then sends it to the terminal device; the device or platform may also send the trained first representation extraction model directly to the terminal device. The following takes the case where the server side sends the model to the terminal device as an example.
The modeling process may be performed once, or the model may be updated periodically or in response to a triggering event. For example, the first representation extraction model may be re-established every month or every year in the above manner, and the newly established representation extraction model pushed to the terminal device.
The server side can adopt an active push mode. For example, after the server side obtains the first representation extraction model through training, the server side actively pushes data of the first representation extraction model to the terminal device, and the terminal device stores the data locally or updates the locally stored first representation extraction model.
The server side can also push in response to a request from the terminal device. For example, after the server side obtains the first representation extraction model through training, it sends a model-acquisition or model-update notification message to the terminal device. The terminal device may decide, based on the user's response to the notification message, whether to send a model acquisition request to the server side. If the user triggers model acquisition based on the notification message, the terminal device sends a model acquisition request to the server side, and the server side, in response, sends the data of the established first representation extraction model to the terminal device, which stores it locally or uses it to update the locally stored first representation extraction model.
In practical applications, the first representation extraction model deployed on the terminal device extracts the user's terminal-side representation vector, and this vector is usually consumed by downstream tasks. Such downstream tasks often require identification based on type labels. Therefore, the server side may further attach a linear regression model on top of the trained first representation extraction model and perform fine-tuning training to obtain a recognition model, and then deploy the recognition model on the terminal device, that is, send the data of the recognition model to the terminal device.
FIG. 4 illustrates a schematic diagram of establishing a recognition model according to one embodiment. As shown in fig. 4, the recognition model mainly comprises the previously trained first representation extraction model and a linear regression model.
The training process of the recognition model mainly adopts supervised learning. Therefore, when establishing the recognition model, second training data comprising more than one labeled sample is obtained first. A labeled sample comprises the terminal-side features of a user and a labeled type label.
Obtaining the terminal-side features of the user is similar to the description of step 201 in the above embodiment and is not repeated here. The labeled type label is associated with a specific recognition task. For example, in a risk control scenario, if the recognition task is to identify the risk type, the type label may be a risk type label. If the recognition task is to identify whether a specific type of risk exists, the type label may be a label indicating whether that type of risk exists. For example, assuming the recognition task is to identify whether a fraud risk exists, the labeled type label may be 0 or 1, where 0 indicates no fraud risk and 1 indicates a fraud risk. Other recognition tasks are of course possible, such as identifying the risk level of a specific type of risk, and are not enumerated exhaustively here.
As shown in fig. 4, suppose a labeled sample has terminal-side features $x^u$ and a labeled type label $y^u$. The input of the first representation extraction model is still the user's terminal-side feature $x^u$, and the output is the user's terminal-side representation vector $h^u$. The linear regression model maps $h^u$ onto the label of the specific type by means of linear regression, i.e., it uses $h^u$ to predict the type label $\hat{y}^u$. The linear regression model may be implemented with a simple linear classifier. In the actual prediction process, the linear regression model may further combine features other than $h^u$ for prediction, which is not limited in this embodiment.

The training goal of the recognition model is to minimize the difference between the prediction result $\hat{y}^u$ and the label $y^u$, i.e., to make $\hat{y}^u$ agree with $y^u$ as far as possible. Specifically, a loss function may be constructed from this training goal, and the model parameters updated in each iteration using the value of the loss function, in a manner such as gradient descent, until a preset training end condition is satisfied. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset count threshold, and the like.
Since the first representation extraction model has already been trained, its initial parameters are the model parameters obtained in the manner shown in fig. 2. The recognition model built on it can therefore converge quickly; only fine-tuning is required.
During the training of the recognition model, the model parameters of the first representation extraction model may be kept unchanged and only the model parameters of the linear regression model updated; alternatively, the parameters of the first representation extraction model and the linear regression model may be updated simultaneously. The former is preferable, because the recognition models trained for multiple recognition tasks then share the same first representation extraction model, which simplifies computation and saves storage space.
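A minimal PyTorch sketch of this fine-tuning stage follows, assuming the TerminalEncoder sketched earlier has already been trained contrastively; it implements the preferred option of freezing the first representation extraction model and updating only the linear head. Names and shapes are illustrative assumptions.

```python
# A minimal sketch of the fine-tuning stage in fig. 4, assuming PyTorch.
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    def __init__(self, terminal_encoder: nn.Module, rep_dim: int = 64, num_types: int = 2):
        super().__init__()
        self.encoder = terminal_encoder
        for p in self.encoder.parameters():       # keep encoder parameters unchanged
            p.requires_grad = False
        self.head = nn.Linear(rep_dim, num_types)  # simple linear classifier
    def forward(self, pages: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            h = self.encoder(pages)                # terminal-side representation h_u
        return self.head(h)                        # predicted type-label logits

# Supervised fine-tuning on the second training data (toy example):
model = RecognitionModel(term_enc)                 # term_enc trained as sketched above
opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
pages = torch.randint(0, 100, (8, 4))              # terminal-side features x_u
labels = torch.randint(0, 2, (8,))                 # e.g. 1 = fraud risk, 0 = none
loss = nn.functional.cross_entropy(model(pages), labels)
loss.backward()
opt.step()                                         # only the linear head is updated
```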
FIG. 5 shows a flowchart of a representation extraction method according to one embodiment; the method is performed by a terminal device on which a first representation extraction model, pre-established in the manner shown in fig. 2, has been deployed. As shown in fig. 5, the method may include the following steps:
Step 501, acquiring the terminal-side features of the user.
Step 503, inputting the terminal-side features of the user into the first representation extraction model to obtain the terminal-side representation vector of the user.
For a terminal device on which the first representation extraction model has been deployed, the user's terminal-side representation vector can be obtained directly from the user's terminal-side features. This representation vector is highly similar in the representation space to the server-side features, so the terminal device can use it for a variety of downstream tasks, the task type being determined by the specific application scenario.
If the task is a recognition task, a recognition model trained by the server side can be deployed on the terminal device in advance; this recognition model is obtained through further fine-tuning training by the server side on the basis of a first representation extraction model pre-established in the manner shown in fig. 2. The flow of the type identification method executed on the terminal device may be as shown in fig. 6, including the following steps:
Step 601, acquiring the terminal-side features of the user.
Step 603, inputting the terminal-side features of the user into the recognition model to obtain the type label predicted by the recognition model.
The terminal-side features of the user obtained in steps 501 and 601 may be at least one of the user's behavior feature sequence and the terminal device's state feature sequence, as collected by the terminal device.
The structure of the recognition model may be as shown in fig. 4, comprising the first representation extraction model and a linear regression model. The first representation extraction model extracts the user's terminal-side representation vector from the user's terminal-side features, and the linear regression model predicts the type label from that representation vector. After obtaining the type label, the terminal device may process it further; preferably, however, the terminal device sends the predicted type label to the server, and the server executes a decision for the user or the terminal device using the type label.
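A minimal sketch of this on-device flow follows, assuming the RecognitionModel sketched earlier has been deployed to the terminal device. send_tag_to_server is a hypothetical placeholder for the device's actual reporting channel, since only the predicted label, never the raw terminal-side features, leaves the device.

```python
# A minimal sketch of on-device type identification (fig. 6), assuming PyTorch.
import torch

def identify_and_report(model: torch.nn.Module, pages: torch.Tensor) -> int:
    model.eval()
    with torch.no_grad():
        logits = model(pages.unsqueeze(0))        # add batch dimension
        tag = int(logits.argmax(dim=1).item())    # predicted type label
    send_tag_to_server(tag)                       # raw features stay on device
    return tag

def send_tag_to_server(tag: int) -> None:
    print(f"reporting type tag {tag} to server")  # stand-in for a network call

# Toy usage with the RecognitionModel from the previous sketch:
# identify_and_report(model, torch.randint(0, 100, (4,)))
```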
The recognition model can be applied in various scenarios; take identifying fraud risk in a risk control scenario as an example. The terminal device collects the sequence of page features browsed by the user, and the recognition model outputs a prediction based on this page feature sequence as input. The prediction indicates whether a fraud risk exists. Assuming a fraud risk exists, after the terminal device sends the prediction result to the server, the server may execute a corresponding risk control policy for the user or the terminal device, for example rejecting the terminal device's service request, adding the user to a blacklist, and so on.
Scenarios other than the above risk control scenario are also possible. For example, the predicted type label may be a user type label; after the terminal device sends the prediction result to the server, the server may use the user type label to recommend content to the user.
It can be seen that throughout the type identification process, the user's terminal-side features never need to be sent to the server side; only the predicted label is sent. This saves traffic on the one hand and protects user privacy on the other.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
According to an embodiment of another aspect, an apparatus for establishing a representation extraction model is provided. FIG. 7 shows a schematic block diagram of the apparatus for establishing a representation extraction model according to one embodiment. It is understood that the apparatus may be disposed on the server side, or on another computer device or platform with strong computing power. As shown in fig. 7, the apparatus 700 includes a first obtaining unit 701, a first training unit 702, and a model deployment unit 703, and may further include a second obtaining unit 704 and a second training unit 705. The main functions of each component unit are as follows:

A first obtaining unit 701 configured to obtain first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs; a positive sample pair comprises the terminal-side features and server-side features of the same user, and a negative sample pair comprises the terminal-side features and server-side features of different users.
The terminal-side features may include at least one of a sequence of the user's behavior features on the terminal device and a sequence of the terminal device's state features. The server-side features may include at least one of the user's attribute features and behavior statistical features collected by the server side.

A first training unit 702 configured to train a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs.

The first representation extraction model may include a sequence encoding network such as an RNN, LSTM, or GRU. The second representation extraction model may include an MLP, CNN, ResNet, or the like.

The first training unit 702 adopts a self-supervised contrastive learning task when performing model training. In this embodiment of the present disclosure, a loss function may be constructed from the training objective, and the first training unit 702 updates the model parameters of the first and second representation extraction models in each iteration, for example by gradient descent, until a preset training end condition is satisfied. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset count threshold, and the like.
As one realizable way, the loss function constructed above may be the NCE loss function, such as the one shown in formula (1).
The model deployment unit 703 is configured to deploy the trained first representation extraction model to the terminal device.
The specific deployment may be to actively push the data of the first representation extraction model to the terminal device, or to push it in response to a request from the terminal device. The terminal device stores the data locally, or uses it to update the locally stored first representation extraction model.
In practical applications, the first representation extraction model deployed on the terminal device extracts the user's terminal-side representation vector, which is usually consumed by downstream tasks that require identification based on type labels. Therefore, the server side may further attach a linear regression model on top of the trained first representation extraction model, perform fine-tuning training to obtain a recognition model, and deploy the recognition model on the terminal device, that is, send the data of the recognition model to the terminal device.
In this case, the second obtaining unit 704 is configured to obtain second training data including one or more labeled samples, where a labeled sample includes a user's terminal-side features and an annotated type label.
A second training unit 705 configured to train a recognition model including the trained first representation extraction model and a linear regression model using the second training data; the first representation extraction model is used to extract the user's terminal-side representation vector from the user's terminal-side features, the linear regression model is used to predict the type label from the user's terminal-side representation vector, and the training objective is to minimize the difference between the prediction result and the type label in the labeled sample.
The second training unit 705 performs fine-tuning training of the recognition model in a supervised manner. As a preferred embodiment, during training of the recognition model, the second training unit 705 may update only the parameters of the linear regression model while keeping the model parameters of the first representation extraction model unchanged.
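A minimal sketch of this fine-tuning arrangement, reusing the hypothetical TerminalEncoder above: the first representation extraction model's parameters are frozen and only the linear head is optimized, as the preferred embodiment describes. The class name, dimensions, and optimizer choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    def __init__(self, encoder, rep_dim, num_types):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():         # freeze the first model's parameters
            p.requires_grad = False
        self.head = nn.Linear(rep_dim, num_types)   # linear prediction head

    def forward(self, terminal_seq):
        with torch.no_grad():
            rep = self.encoder(terminal_seq)        # terminal-side representation vector
        return self.head(rep)                       # predicted type-label scores

# Only the head's parameters are handed to the optimizer:
# model = RecognitionModel(trained_terminal_encoder, rep_dim=64, num_types=2)
# optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
```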
The model deployment unit 703 is specifically configured to deploy the recognition model to the terminal device.
As a typical application scenario, the terminal-side features may include the sequence of page features browsed by the user, and the server-side features may include the user's attribute features and transaction statistical features. The type labels may include a risk type label, a label indicating whether a particular type of risk exists, or a risk level of a particular type of risk.
According to an embodiment of another aspect, a representation extraction apparatus is also provided. FIG. 8 shows a schematic block diagram of a representation extraction apparatus according to one embodiment. It is understood that the apparatus may be provided in a terminal device. As shown in fig. 8, the apparatus 800 includes a third obtaining unit 801 and a representation extraction unit 802. The main functions of each component unit are as follows:
a third obtaining unit 801 configured to obtain a terminal-side feature of the user.
A representation extraction unit 802 configured to input the terminal-side features of the user into the first representation extraction model to obtain a terminal-side representation vector of the user; wherein the first representation extraction model is pre-established by the apparatus shown in fig. 7.
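On the terminal side, representation extraction then reduces to a single forward pass through the deployed model. The snippet below is a sketch assuming the hypothetical TerminalEncoder class above and an invented local checkpoint file name.

```python
import torch

# Placeholder input: one user's terminal-side feature sequence (20 steps, 32 features).
terminal_feature_seq = torch.randn(1, 20, 32)

encoder = TerminalEncoder(feat_dim=32, hidden_dim=64, out_dim=64)
encoder.load_state_dict(torch.load("first_extraction_model.pt"))  # assumed local file
encoder.eval()

with torch.no_grad():
    terminal_rep = encoder(terminal_feature_seq)  # [1, 64] representation vector
```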
FIG. 9 shows a schematic block diagram of a type identification apparatus according to one embodiment. It is understood that the apparatus may be provided in a terminal device. As shown in fig. 9, the apparatus 900 includes a fourth obtaining unit 901 and a type identification unit 902, and may further include a label sending unit 903. The main functions of each component unit are as follows:
a fourth obtaining unit 901 configured to obtain a terminal-side feature of the user.
A type identification unit 902 configured to input the terminal-side features of the user into a recognition model, which is pre-established using the apparatus shown in fig. 7, to obtain a type label predicted by the recognition model.
A label sending unit 903 configured to send the type label to the server, so that the server executes a decision for the user or the terminal device using the type label.
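Putting units 901 to 903 together, the on-device flow might be sketched as follows; the server endpoint, payload schema, and function name are invented for illustration and are not specified by the disclosure.

```python
import json
import urllib.request

import torch

def identify_and_report(model, terminal_feature_seq, server_url):
    """Predict the type label on-device, then send it to the server (unit 903)."""
    model.eval()
    with torch.no_grad():
        logits = model(terminal_feature_seq)
        type_label = int(logits.argmax(dim=1).item())
    payload = json.dumps({"type_label": type_label}).encode("utf-8")  # assumed schema
    req = urllib.request.Request(server_url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # server applies its own decision policy
        return resp.status
```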
As a typical application scenario, the terminal-side features of the user include the sequence of page features browsed by the user; the type labels include risk type labels or labels indicating whether a particular type of risk exists; and the decision executed by the server side includes a risk control policy.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 2, 5 or 6.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method of fig. 2, 5 or 6.
As time and technology develop, computer-readable storage media are used more and more widely, and the distribution of computer programs is no longer limited to tangible media; programs may also be downloaded directly from a network, among other channels. Any combination of one or more computer-readable storage media may be employed. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present specification, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The processors described above may include one or more single-core or multi-core processors. The processor may comprise any combination of general-purpose and dedicated processors (e.g., image processors, application processors, baseband processors, etc.).
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall be included within the scope of the present invention.

Claims (15)

1. A method of establishing a representation extraction model, comprising:
acquiring first training data containing one or more sample pairs, wherein the sample pairs comprise positive sample pairs and negative sample pairs, a positive sample pair comprises the terminal-side features and server-side features of the same user, and a negative sample pair comprises the terminal-side features of one user and the server-side features of a different user;
training a first representation extraction model and a second representation extraction model by using the first training data, wherein the first representation extraction model is used for extracting a terminal-side representation vector of a user by using the terminal-side features of the user, and the second representation extraction model is used for extracting a server-side representation vector of the user by using the server-side features of the user; the training objective is to maximize the similarity between the terminal-side representation vector and the server-side representation vector in the positive sample pairs, and to minimize the similarity between the terminal-side representation vector and the server-side representation vector in the negative sample pairs;
and deploying the trained first representation extraction model to the terminal device.
2. The method of claim 1, wherein the terminal-side feature comprises at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device; and/or
The server-side features comprise at least one of attribute features and behavior statistical features of the user collected by the server side.
3. The method of claim 1, wherein the first representation extraction model comprises a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); and/or
The second representation extraction model comprises a multilayer perceptron (MLP), a convolutional neural network (CNN), or a residual network (ResNet).
4. The method of claim 1, wherein deploying the trained first representation extraction model to the terminal device comprises:
acquiring second training data comprising one or more labeled samples, wherein a labeled sample comprises the terminal-side features of a user and an annotated type label;
training a recognition model comprising the trained first representation extraction model and a linear regression model by using the second training data; the first representation extraction model is used for extracting a terminal-side representation vector of a user by using the terminal-side features of the user, the linear regression model is used for predicting a type label by using the terminal-side representation vector of the user, and the training objective is to minimize the difference between the prediction result and the type label in the labeled sample;
and deploying the recognition model to the terminal device.
5. The method of claim 4, wherein, during training of the recognition model, only the parameters of the linear regression model are updated, keeping the model parameters of the first representation extraction model unchanged.
6. The method of claim 4, wherein,
the terminal side features comprise a page feature sequence browsed by a user; and/or
The server side characteristics comprise attribute characteristics and transaction statistical characteristics of the user; and/or
The type labels comprise a risk type label, a label indicating whether a particular type of risk exists, or a risk level of a particular type of risk.
7. A representation extraction method, executed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a first representation extraction model to obtain a terminal-side representation vector of the user; wherein the first representation extraction model is pre-established using the method of any one of claims 1 to 3.
8. A type identification method, executed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a recognition model to obtain a type label predicted by the recognition model, wherein the recognition model is pre-established using the method of any one of claims 4 to 6.
9. The method of claim 8, further comprising:
and sending the type label to a server, so that the server executes a decision for the user or the terminal device by using the type label.
10. The method of claim 9, wherein the terminal-side features of the user comprise a sequence of page features browsed by the user;
The type labels comprise risk type labels or labels indicating whether a particular type of risk exists;
The decision comprises a risk control policy.
11. An apparatus for establishing a representation extraction model, comprising:
a first obtaining unit configured to obtain first training data including one or more sample pairs, the sample pairs including positive sample pairs and negative sample pairs, a positive sample pair including the terminal-side features and server-side features of the same user, and a negative sample pair including the terminal-side features of one user and the server-side features of a different user;
a first training unit configured to train a first representation extraction model and a second representation extraction model using the first training data, wherein the first representation extraction model is used to extract a terminal-side representation vector of a user using the terminal-side features of the user, and the second representation extraction model is used to extract a server-side representation vector of the user using the server-side features of the user; the training objective is to maximize the similarity between the terminal-side representation vector and the server-side representation vector in the positive sample pairs, and to minimize the similarity between the terminal-side representation vector and the server-side representation vector in the negative sample pairs;
and the model deployment unit is configured to deploy the trained first representation extraction model to the terminal equipment.
12. The apparatus of claim 11, further comprising:
the second acquisition unit is configured to acquire second training data comprising more than one labeled sample, wherein the labeled sample comprises a terminal side characteristic of a user and a labeled type label;
a second training unit configured to train a recognition model including the trained first feature extraction model and a linear regression model using the second training data; the first representation extraction model is used for extracting a terminal side feature vector of a user by using a terminal side feature of the user, the linear regression model is used for predicting a type label by using the terminal side feature vector of the user, and a training target is the difference between a minimum prediction result and the type label in a labeling sample;
the model deployment unit is specifically configured to deploy the recognition model to the terminal device.
13. A representation extraction apparatus, provided on a terminal device, the apparatus comprising:
a third obtaining unit configured to obtain the terminal-side features of a user;
a representation extraction unit configured to input the terminal-side features of the user into a first representation extraction model to obtain a terminal-side representation vector of the user; wherein the first representation extraction model is pre-established by the apparatus of claim 11.
14. A type identification apparatus, provided on a terminal device, comprising:
a fourth obtaining unit configured to obtain the terminal-side features of a user;
a type identification unit configured to input the terminal-side features of the user into a recognition model to obtain a type label predicted by the recognition model, wherein the recognition model is pre-established by the apparatus of claim 12.
15. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
CN202111597741.9A 2021-12-24 2021-12-24 Method and device for establishing representation extraction model, representation extraction and type identification Active CN113988225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597741.9A CN113988225B (en) 2021-12-24 2021-12-24 Method and device for establishing representation extraction model, representation extraction and type identification

Publications (2)

Publication Number Publication Date
CN113988225A true CN113988225A (en) 2022-01-28
CN113988225B CN113988225B (en) 2022-05-06

Family

ID=79734276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597741.9A Active CN113988225B (en) 2021-12-24 2021-12-24 Method and device for establishing representation extraction model, representation extraction and type identification

Country Status (1)

Country Link
CN (1) CN113988225B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210327029A1 (en) * 2020-04-13 2021-10-21 Google Llc Systems and Methods for Contrastive Learning of Visual Representations
CN111553488A (en) * 2020-07-10 2020-08-18 支付宝(杭州)信息技术有限公司 Risk recognition model training method and system for user behaviors
CN112215238A (en) * 2020-10-29 2021-01-12 支付宝(杭州)信息技术有限公司 Method, system and device for constructing general feature extraction model
CN113269232A (en) * 2021-04-25 2021-08-17 北京沃东天骏信息技术有限公司 Model training method, vectorization recall method, related device and storage medium
CN113344131A (en) * 2021-06-30 2021-09-03 商汤国际私人有限公司 Network training method and device, electronic equipment and storage medium
CN113505896A (en) * 2021-07-28 2021-10-15 深圳前海微众银行股份有限公司 Longitudinal federated learning modeling optimization method, apparatus, medium, and program product
CN113516255A (en) * 2021-07-28 2021-10-19 深圳前海微众银行股份有限公司 Federal learning modeling optimization method, apparatus, readable storage medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WU YW ET AL: "Federated Contrastive Learning for Dermatological Disease Diagnosis via On-device Learning", IEEE *
JIANG Qisheng: "Research on Facial Expression Recognition Based on Representation Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170304A (en) * 2022-06-22 2022-10-11 支付宝(杭州)信息技术有限公司 Method and device for extracting risk feature description
CN115545720A (en) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 Model training method, business wind control method and business wind control device
CN115545720B (en) * 2022-11-29 2023-03-10 支付宝(杭州)信息技术有限公司 Model training method, business wind control method and business wind control device

Also Published As

Publication number Publication date
CN113988225B (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant