CN113988225A - Method and device for establishing representation extraction model, representation extraction and type identification - Google Patents


Info

Publication number
CN113988225A
Authority
CN
China
Prior art keywords
user
terminal
model
server
extraction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111597741.9A
Other languages
Chinese (zh)
Other versions
CN113988225B (en)
Inventor
吕乐
周璟
刘佳
范东云
傅幸
王宁涛
杨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111597741.9A
Publication of CN113988225A
Application granted
Publication of CN113988225B
Status: Active

Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/2155: Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F18/2451: Classification techniques relating to the decision surface: linear, e.g. hyperplane
    • G06N3/044: Neural networks; Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Neural networks; Combinations of networks
    • G06N3/088: Learning methods; Non-supervised learning, e.g. competitive learning

All of the above fall under G (PHYSICS), G06 (COMPUTING; CALCULATING OR COUNTING), in subclasses G06F (ELECTRIC DIGITAL DATA PROCESSING) and G06N (COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification provide methods and devices for establishing a representation extraction model, for representation extraction, and for type identification. According to the method of the embodiments, first training data containing more than one sample pair is obtained, where the sample pairs include positive sample pairs and negative sample pairs; a first representation extraction model and a second representation extraction model are then trained with the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs; finally, the trained first representation extraction model is deployed in the terminal device.

Description

Method and device for establishing representation extraction model, representation extraction and type identification
Technical Field
One or more embodiments of the present specification relate to the technical field of artificial intelligence, and in particular, to methods and apparatuses for establishing a representation extraction model, representation extraction, and type identification.
Background
With the rapid development of smartphones in recent years, terminal devices and edge computing have advanced accordingly. Many tasks can therefore be performed on the terminal device itself rather than being handed over to the cloud for processing, which reduces cloud load, responds to the user more quickly, and better protects user privacy. However, although a large amount of feature data is collected and retained on the terminal device, realizing a specific task on the device requires that a model capable of extracting representation vectors be deployed on the terminal device.
Disclosure of Invention
In view of the above, one or more embodiments of the present specification describe methods and apparatuses for establishing a representation extraction model, representation extraction, and type identification.
According to a first aspect, there is provided a method of establishing a representation extraction model, the method comprising:
acquiring first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs, a positive sample pair comprising the terminal-side features and server-side features of the same user, and a negative sample pair comprising the terminal-side features and server-side features of different users;
training a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs;
and deploying the trained first representation extraction model to the terminal device.
In one embodiment, the terminal-side feature comprises at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device; and/or
The server-side characteristics include at least one of attribute characteristics and behavior statistical characteristics of the user collected by the server side.
In another embodiment, the first representation extraction model comprises a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); and/or
the second representation extraction model comprises a multilayer perceptron (MLP), a convolutional neural network (CNN), or a residual network (ResNet).
In one embodiment, deploying the trained first representation extraction model to the terminal device includes:
acquiring second training data comprising more than one labeled sample, where a labeled sample comprises the terminal-side features of a user and a labeled type label;
training, with the second training data, a recognition model comprising the trained first representation extraction model and a linear regression model; the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, the linear regression model predicts a type label from the user's terminal-side representation vector, and the training objective is to minimize the difference between the prediction result and the type label in the labeled sample;
and deploying the recognition model to the terminal device.
In another embodiment, during training of the recognition model, the model parameters of the first representation extraction model are kept unchanged and only the parameters of the linear regression model are updated.
In another embodiment, the terminal-side features comprise a sequence of page features browsed by the user, and the server-side features comprise the user's attribute features and transaction statistical features;
the type label comprises a risk type label, a label indicating whether a specific type of risk exists, or the risk level of a specific type of risk.
According to a second aspect, there is provided a representation extraction method, performed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a first representation extraction model to obtain the terminal-side representation vector of the user, where the first representation extraction model is pre-established using the method of any one of the above.
According to a third aspect, there is also provided a type identification method, performed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a recognition model to obtain the type label predicted by the recognition model, where the recognition model is pre-established using the method described above.
In one embodiment, further comprising:
sending the type label to a server, so that the server executes a decision for the user or the terminal device using the type label.
In another embodiment, the terminal-side features of the user comprise a sequence of page features browsed by the user;
the type label comprises a risk type label or a label indicating whether a specific type of risk exists;
and the decision comprises a risk control policy.
According to a fourth aspect, there is provided an apparatus for establishing a representation extraction model, comprising:
a first obtaining unit configured to obtain first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs, a positive sample pair comprising the terminal-side features and server-side features of the same user, and a negative sample pair comprising the terminal-side features and server-side features of different users;
a first training unit configured to train a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs;
and a model deployment unit configured to deploy the trained first representation extraction model to the terminal device.
In one embodiment, the terminal-side feature comprises at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device; and/or
The server-side characteristics include at least one of attribute characteristics and behavior statistical characteristics of the user collected by the server side.
In another embodiment, the first representation extraction model comprises a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); and/or
the second representation extraction model comprises a multilayer perceptron (MLP), a convolutional neural network (CNN), or a residual network (ResNet).
In one embodiment, further comprising:
the second acquisition unit is configured to acquire second training data comprising more than one labeled sample, wherein the labeled sample comprises a terminal side characteristic of a user and a labeled type label;
a second training unit configured to train a recognition model including the trained first feature extraction model and a linear regression model using the second training data; the first representation extraction model is used for extracting a terminal side feature vector of a user by using a terminal side feature of the user, the linear regression model is used for predicting a type label by using the terminal side feature vector of the user, and a training target is the difference between a minimum prediction result and the type label in a labeling sample;
the model deployment unit is specifically configured to deploy the recognition model to the terminal device.
In another embodiment, the second training unit is specifically configured to, during training of the recognition model, keep the model parameters of the first representation extraction model unchanged and update only the parameters of the linear regression model.
In one embodiment, the terminal-side features comprise a sequence of page features browsed by the user, and/or the server-side features comprise the user's attribute features and transaction statistical features; and/or
the type label comprises a risk type label, a label indicating whether a specific type of risk exists, or the risk level of a specific type of risk.
According to a fifth aspect, there is further provided a representation extraction apparatus, disposed in a terminal device, the apparatus comprising:
a third obtaining unit configured to obtain the terminal-side features of a user;
and a representation extraction unit configured to input the terminal-side features of the user into a first representation extraction model to obtain the terminal-side representation vector of the user, where the first representation extraction model is pre-established by an apparatus as described in any one of the above.
According to a sixth aspect, there is provided a type identification apparatus, disposed in a terminal device, the apparatus comprising:
a fourth obtaining unit configured to obtain the terminal-side features of a user;
and a type identification unit configured to input the terminal-side features of the user into a recognition model to obtain the type label predicted by the recognition model, where the recognition model is pre-established by an apparatus as described in any one of the above.
In one embodiment, further comprising:
a label sending unit configured to send the type label to a server, so that the server executes a decision for the user or the terminal device using the type label.
In another embodiment, the terminal-side features of the user comprise a sequence of page features browsed by the user;
the type label comprises a risk type label or a label indicating whether a specific type of risk exists;
and the decision comprises a risk control policy.
According to a seventh aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor which, when executing the executable code, implements the method of the first aspect.
In the embodiments of this specification, a self-supervised contrastive learning approach is used to deploy a representation extraction model on the terminal device. On one hand, self-supervised contrastive learning does not depend on manually labeled data, and a single unified representation extraction model can serve different types of tasks, reducing the computing resources consumed; on the other hand, modeling is not limited by the number of labeled samples and can effectively utilize the full volume of sample data, improving the generalization capability of the model and the universality of the representations.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 illustrates an exemplary system architecture to which embodiments of the invention may be applied;
FIG. 2 illustrates a flow diagram of a method of establishing a representation extraction model, according to one embodiment;
FIG. 3 illustrates a schematic diagram of establishing a representation extraction model, according to one embodiment;
FIG. 4 illustrates a schematic diagram of establishing a recognition model, according to one embodiment;
FIG. 5 shows a flow diagram of a representation extraction method, according to one embodiment;
FIG. 6 illustrates a flow diagram of a type identification method, according to one embodiment;
FIG. 7 shows a schematic block diagram of an apparatus for establishing a representation extraction model, according to one embodiment;
FIG. 8 shows a schematic block diagram of a representation extraction apparatus, according to one embodiment;
FIG. 9 shows a schematic block diagram of a type identification apparatus, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Most existing representation extraction models are trained in a supervised manner, that is, using feature data together with labels annotated in advance for that feature data. Such an approach must be trained separately for each type of task, yielding multiple representation extraction models for the different task types, which inevitably consumes more computing resources. Moreover, in a supervised modeling process, if labeled sample data is scarce (for example, in the risk identification field, risky samples are rare), the generalization capability and universality of the representation extraction model are greatly affected.
The idea of the present specification is to establish the representation extraction model using a self-supervised contrastive learning approach. Specific implementations of this idea are described below.
For convenience of understanding, a system architecture to which the technical solution provided in the present specification is applicable will be briefly described first. FIG. 1 illustrates an exemplary system architecture to which embodiments of the invention may be applied.
As shown in fig. 1, the system architecture may include a terminal device 101 and a terminal device 102, a network 103 and a server 104. Network 103 is the medium used to provide communication links between terminal device 101, terminal device 102, and server 104. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may interact with server 104 through network 103 using terminal device 101 and terminal device 102. Various applications, such as a web browser application, a communication-type application, a multimedia application, a game-type application, and the like, may be installed on the terminal device 101 and the terminal device 102.
The terminal device 101 and the terminal device 102 may be, but are not limited to, smart mobile terminals, smart home devices, wearable devices, PCs (personal computers), and the like. Smart mobile devices may include cell phones, tablet computers, notebook computers, PDAs (personal digital assistants), internet-connected vehicles, and so on. Smart home devices may include smart television sets, smart refrigerators, and so forth. Wearable devices may include smart watches, smart glasses, virtual reality devices, augmented reality devices, mixed reality devices (i.e., devices that support both virtual reality and augmented reality), and so forth.
The server 104 may be a single server or a server group comprising a plurality of servers. The server 104 may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service extensibility in traditional physical host and VPS (Virtual Private Server) services. The server 104 may also be a server of a distributed system, or a server combined with a blockchain.
The device for establishing the representation extraction model provided in the present specification may be configured and run in the server 104, and the server 104 deploys the established model in the terminal device 101 or the terminal device 102. It may be implemented as a plurality of software or software modules (for example, for providing distributed services), or as a single software or software module, which is not specifically limited herein.
The representation extraction apparatus and the type identification apparatus provided in this specification may be configured and run in the terminal device 101 or the terminal device 102, to extract the user's terminal-side representation vector and to predict type labels, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The method provided in this specification is described in detail below with reference to embodiments. FIG. 2 illustrates a flow diagram of a method of establishing a representation extraction model according to one embodiment. It will be appreciated that the method may be performed by the server side, but it is not excluded that it may be performed by another computer device or platform with strong computing power. As shown in fig. 2, the method includes:
Step 201, obtaining first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs; a positive sample pair comprises the terminal-side features and server-side features of the same user, and a negative sample pair comprises the terminal-side features and server-side features of different users, that is, one negative sample pair comprises the terminal-side features of one user and the server-side features of another user.
Step 203, training a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs.
Step 205, deploying the trained first representation extraction model to the terminal device.
In the method shown in fig. 2, a self-supervised contrastive learning approach is adopted to deploy the representation extraction model on the terminal device. On one hand, self-supervised contrastive learning does not depend on manually labeled data, and a single unified representation extraction model can serve different types of tasks, reducing the computing resources consumed; on the other hand, modeling is not limited by the number of labeled samples and can effectively utilize the full volume of sample data, improving the generalization capability of the model and the universality of the representations.
It should be noted that the expressions "first", "second", "third", "fourth", and the like in the embodiments of the present specification are not limited to the size, order, and number, and are merely used for name differentiation. For example, the "first preset time period" and the "second preset time period" are merely used to distinguish the two preset time periods in terms of names. As another example, the "first training data" and the "second training data" are merely used to distinguish the two training data in terms of names.
The manner in which the various steps shown in fig. 2 are performed is described below. The above step 201, i.e. "acquiring first training data comprising more than one sample pair", is first described in detail.
Usually, the terminal device records user-related features, and the server side also collects and records user-related features; these two sets of features differ.
The terminal device generally records behavior characteristics of the user on the terminal device, status characteristics of the terminal device, and the like, and generally relates to the operation of the user on the terminal device, and changes with time. That is, the terminal-side feature may include at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device.
The behavior feature sequence of the user on the terminal device is composed of behavior features of N continuous time points in a time window with preset first duration. The state feature sequence of the terminal device is composed of the state features of the terminal device at N consecutive time points in the time window with the preset first duration. N is a positive integer greater than 1.
Since the behavior data of the user on the terminal device is usually embodied as page information browsed in the application program, a page feature sequence browsed by the user at N consecutive time points can be obtained. The Page related to this embodiment may also be referred to as a Web Page, and may be a Web Page (Web Page) written based on HTML (HyperText Markup Language), that is, an HTML Page, or may also be a Web Page written based on HTML and Java languages, that is, a Java Server Page (JSP), or may also be a Web Page written in other languages, which is not particularly limited in this embodiment.
The page features may in fact be the encoded information of the pages. Suppose $p_j^u$ denotes the encoded information of the $j$-th page visited by user $u$, and $t_j^u$ denotes the time information of the $j$-th page. The sequence of page features browsed by user $u$ at $N$ consecutive time points can then be represented as

$$S^u = \{(p_1^u, t_1^u), (p_2^u, t_2^u), \dots, (p_N^u, t_N^u)\}.$$

For example, the encoded information of the pages viewed every minute by a user over one hour constitutes that user's page feature sequence.
The state features of the terminal device may include the device's battery level, screen brightness, attitude, and the like, likewise organized as a sequence of state features of the terminal device at $N$ consecutive time points.
As one realizable way, the page feature sequence browsed by user $u$ at $N$ consecutive time points can be used directly as that user's terminal-side feature $x^u$, i.e., $x^u = S^u$.
The server side usually records attribute features or behavior statistical features of the user; these are typically enumerated or numerical data. In this embodiment, the user-related features recorded by the server side are referred to as server-side features. That is, the server-side features include at least one of the attribute features and the behavior statistical features of the user collected by the server side.

The attribute features of the user may be, for example, the user's gender, age, occupation, etc. The behavior statistical features of the user may be statistics such as the transaction amount and the number of transactions accumulated by the user within a time window of a second preset duration. These features constitute the server-side features of the user. Still taking user $u$ as an example, the server-side features can be represented as a vector $v^u \in \mathbb{R}^d$, where $d$, a positive integer, is the vector dimension of the server-side features and is generally consistent with the number of server-side feature types employed.
In the embodiments of the present specification, the terminal-side features of a plurality of users may be grouped into a set $X = \{x^u\}$, and their server-side feature data into a set $V = \{v^u\}$. Positive and negative sample pairs can then be constructed based on $X$ and $V$ to constitute the first training data.
The positive sample pairs include the terminal-side features and server-side features of the same user. That is, for the terminal-side feature $x^u$ of any given user, the server-side feature $v^u$ of the same user is taken to form a positive sample pair $(x^u, v^u)$. It should be noted that, for a positive sample pair, the time windows used for $x^u$ and $v^u$ correspond to each other, i.e., the time window of the first preset duration adopted for $x^u$ corresponds to the time window of the second preset duration adopted for $v^u$. The two time windows may be identical, may have different lengths but the same starting point, or may partially overlap.
The case where the two time windows are completely identical is the easiest to understand. For example, the sequence of web page features browsed by the user within 1 hour starting from time $t_0$ constitutes $x^u$, and statistics such as the transaction amount and the number of transactions of that user within 1 hour starting from $t_0$ constitute the server-side feature $v^u$.

However, in some special cases there may be an association between the terminal-side feature and the server-side feature over time windows of different durations. For example, the user browses a series of web pages using terminal device 1 within 1 hour starting from time $t_0$, yielding the page feature sequence $x^u$, and then browses a series of web pages using terminal device 2 within 1 hour starting from time $t_1$; the two stages of browsing may jointly complete a transaction, so the server-side feature $v^u$ corresponding to $x^u$ may be collected by the server side within a 2-hour time window starting from $t_0$.
The negative sample pairs include the terminal-side features and server-side features of different users. For example, for the terminal-side feature $x^u$ of a certain user, the server-side feature $v^w$ of another, randomly chosen user $w \neq u$ is taken to form a negative sample pair $(x^u, v^w)$. The time windows adopted for the terminal-side feature $x^u$ and the server-side feature $v^w$ in a negative sample pair need not be restricted: the two corresponding time windows may be identical, may partially overlap, or may be completely non-overlapping and unrelated.
It can be seen that this way of obtaining the first training data is not constrained by whether the data has been type-labeled, and it is not affected by the scarcity of samples for any particular label; it can make full use of the full volume of data.
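To make the pair construction concrete, the following is a minimal Python sketch of how the first training data could be assembled. It assumes page sequences have already been encoded as integer IDs and server-side features as $d$-dimensional vectors; all names (build_pairs, num_negatives_per_user, etc.) are illustrative and not taken from the patent.

```python
# A minimal sketch of constructing the first training data (step 201).
import random
from typing import Dict, List, Tuple

def build_pairs(
    terminal_feats: Dict[str, List[int]],   # user id -> page-encoding sequence x_u
    server_feats: Dict[str, List[float]],   # user id -> d-dimensional vector v_u
    num_negatives_per_user: int = 1,
) -> Tuple[List[tuple], List[tuple]]:
    """Return (positive_pairs, negative_pairs)."""
    users = [u for u in terminal_feats if u in server_feats]
    positives, negatives = [], []
    for u in users:
        # Positive pair: terminal- and server-side features of the SAME user,
        # taken from corresponding time windows.
        positives.append((terminal_feats[u], server_feats[u]))
        for _ in range(num_negatives_per_user):
            # Negative pair: terminal-side features of user u paired with the
            # server-side features of a randomly drawn DIFFERENT user.
            w = random.choice([x for x in users if x != u])
            negatives.append((terminal_feats[u], server_feats[w]))
    return positives, negatives

# Toy usage: three users, page sequences of length N=4, server vectors with d=3.
term = {"u1": [5, 2, 9, 1], "u2": [7, 7, 3, 0], "u3": [4, 8, 8, 2]}
serv = {"u1": [120.0, 3.0, 0.5], "u2": [40.0, 1.0, 0.9], "u3": [999.0, 12.0, 0.1]}
pos, neg = build_pairs(term, serv)
print(len(pos), len(neg))  # 3 3
```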
Step 203 above, i.e., training the first and second representation extraction models using the first training data, is described in detail below with reference to the embodiments.
FIG. 3 illustrates a schematic diagram of establishing a representation extraction model according to one embodiment. As shown in fig. 3, this embodiment involves two representation extraction models: a first representation extraction model and a second representation extraction model.

The first representation extraction model extracts the terminal-side representation vector of a user: its input is the user's terminal-side feature $x^u$, and its output is the user's terminal-side representation vector $h^u$. The second representation extraction model extracts the server-side representation vector of a user: its input is the server-side feature $v^u$ (or $v^w$), and its output is the corresponding server-side representation vector, denoted $z^u$ for the positive pair's $v^u$ and $z^w$ for the negative pair's $v^w$.
The first representation extraction model is a target model, that is, a model to be deployed in the terminal device, and the second representation extraction model is only used in a training process to assist in obtaining the first representation extraction model.
The first representation extraction model may be, for example, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), a GRU (Gated Recurrent Unit), or the like; RNN is taken as the example in fig. 3. The RNN extracts $h^u$ from $x^u$; in fig. 3, $h^u = \mathrm{RNN}(x^u)$, where $\mathrm{RNN}(\cdot)$ is the transformation function adopted by the RNN in the above extraction process.
The second representation extraction model may be, for example, an MLP (Multilayer Perceptron), a CNN (Convolutional Neural Network), or a ResNet (Residual Network); MLP is taken as the example in fig. 3. The MLP extracts $z^u$ from $v^u$ and $z^w$ from $v^w$; in fig. 3, $z^u = \mathrm{MLP}(v^u)$ and $z^w = \mathrm{MLP}(v^w)$, where $\mathrm{MLP}(\cdot)$ is the transformation function adopted by the MLP in the above extraction process.
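As a concrete illustration of the two models in fig. 3, here is a minimal PyTorch sketch, assuming page IDs are embedded and encoded by a GRU (one of the options named above) and the server-side vector is mapped by an MLP. The class names and hyperparameters are illustrative assumptions, not identifiers from the patent.

```python
# A minimal sketch of the two encoders in fig. 3, assuming PyTorch.
import torch
import torch.nn as nn

class TerminalEncoder(nn.Module):
    """First representation extraction model: encodes the page-feature
    sequence x_u into a terminal-side representation vector h_u.
    The patent names RNN/LSTM/GRU as options; a GRU is used here."""
    def __init__(self, vocab_size: int, embed_dim: int = 32, rep_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, rep_dim, batch_first=True)
    def forward(self, pages: torch.Tensor) -> torch.Tensor:  # (B, N) int ids
        _, h_last = self.gru(self.embed(pages))               # (1, B, rep_dim)
        return h_last.squeeze(0)                              # h_u: (B, rep_dim)

class ServerEncoder(nn.Module):
    """Second representation extraction model: maps the d-dimensional
    server-side feature vector v_u to z_u; an MLP, as in fig. 3."""
    def __init__(self, d: int, rep_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, rep_dim))
    def forward(self, v: torch.Tensor) -> torch.Tensor:       # (B, d) float
        return self.mlp(v)                                    # z_u: (B, rep_dim)
```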
The model training in this step adopts a self-supervised contrastive learning task. The training goal is to maximize the similarity between the terminal-side and server-side representation vectors of the same user and to minimize the similarity between the terminal-side and server-side representation vectors of different users (i.e., between one user's terminal-side representation vector and another user's server-side representation vector). In other words, the similarity of the two features of a positive sample pair in the mapped vector space is maximized, and the similarity of the two features of a negative sample pair in the mapped vector space is minimized.
In this embodiment of the present specification, a loss function may be constructed according to the training target, and model parameters of the first representation extraction model and the second representation extraction model are updated in a manner such as gradient descent by using a value of the loss function in each iteration until a preset training end condition is satisfied. The training end condition may include, for example, that a value of the loss function is less than or equal to a preset loss function threshold, the number of iterations reaches a preset number threshold, and the like.
As one realizable way, the loss function constructed above may be an NCE (Noise-Contrastive Estimation) loss function. For example, the NCE loss function $L$ can be expressed as:

$$L = -\,\mathbb{E}\left[\log \frac{\exp\big(\mathrm{sim}(h^u, z^u)\big)}{\exp\big(\mathrm{sim}(h^u, z^u)\big) + \sum_{w \neq u} \exp\big(\mathrm{sim}(h^u, z^w)\big)}\right] \qquad (1)$$

where $\mathrm{sim}(\cdot,\cdot)$ is the similarity calculation function, for which a method such as cosine similarity may be employed, and $\mathbb{E}[\cdot]$ denotes the expected value computed over the samples $x^u$, $v^u$ and $v^w$.
As can be seen from the NCE loss function, the above self-supervised contrastive learning process in effect updates the model parameters so as to shorten, as much as possible, the distance in the representation space between the terminal-side and server-side features of the same user, and to lengthen, as much as possible, that distance for different users. That is, the terminal-side representation vectors extracted by the finally trained first representation extraction model are highly similar in the representation space to the server-side features. Since the server side typically has rich feature dimensions describing implicit information in user behavior, and can easily incorporate a large amount of expert experience and complex logical data, the server-side features effectively guide the terminal-side representations, enabling them to carry more comprehensive and universal information and to generalize better.
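The following is a minimal PyTorch sketch of one training iteration under this objective, assuming the TerminalEncoder and ServerEncoder sketched earlier. It scores every user in a batch against every other user's server-side representation, so the diagonal entries act as the positive pairs and the off-diagonal entries as the negative pairs, in the spirit of formula (1); the temperature parameter is an illustrative addition commonly used with this kind of loss, not something the patent specifies.

```python
# A minimal sketch of the contrastive training step, assuming PyTorch.
import torch
import torch.nn.functional as F

def nce_loss(h: torch.Tensor, z: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """h, z: (B, rep_dim); row i of h and row i of z belong to the same user."""
    h = F.normalize(h, dim=1)
    z = F.normalize(z, dim=1)
    sim = h @ z.t() / temperature            # cosine sim(h_u, z_w) for all pairs
    labels = torch.arange(h.size(0))         # the diagonal holds the positives
    return F.cross_entropy(sim, labels)      # -log softmax: positives vs negatives

# One gradient-descent iteration (toy shapes: batch of 8 users, N=4, d=3).
term_enc, serv_enc = TerminalEncoder(vocab_size=100), ServerEncoder(d=3)
opt = torch.optim.Adam(list(term_enc.parameters()) + list(serv_enc.parameters()), lr=1e-3)
pages = torch.randint(0, 100, (8, 4))        # terminal-side features x_u
v = torch.randn(8, 3)                        # server-side features v_u
loss = nce_loss(term_enc(pages), serv_enc(v))
loss.backward()
opt.step()
```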
Step 205 above, i.e., deploying the trained first representation extraction model to the terminal device, is described in detail below with reference to the embodiments.
If the process of establishing the model is executed by the server side, the server side may send the first representation extraction model obtained by the training in step 203 to the terminal device. If the process is executed by another computer device or platform with strong computing power, that device or platform may provide the trained first representation extraction model to the server side, which then sends it to the terminal device; the device or platform may also send the trained first representation extraction model directly to the terminal device. The following takes the case where the server side sends the model to the terminal device as an example.
The modeling process may be performed once, or the model may be updated periodically or in response to a triggering event. For example, the first representation extraction model may be re-established every month or every year in the above manner, and the newly established representation extraction model pushed to the terminal device.
The server side can adopt an active push mode. For example, after the server side obtains the first representation extraction model through training, the server side actively pushes data of the first representation extraction model to the terminal device, and the terminal device stores the data locally or updates the locally stored first representation extraction model.
The server side can also push in response to a request from the terminal device. For example, after the server side obtains the first representation extraction model through training, it sends a model-acquisition or model-update notification message to the terminal device. The terminal device may decide, based on the user's response to the notification message, whether to send a model acquisition request to the server side. If the user triggers model acquisition based on the notification message, the terminal device sends a model acquisition request to the server side, and the server side, in response, sends the data of the established first representation extraction model to the terminal device, which stores it locally or uses it to update the locally stored first representation extraction model.
In practical applications, the first representation extraction model deployed on the terminal device extracts the user's terminal-side representation vector, and this vector is usually consumed by downstream tasks. Such downstream tasks often require identification based on type labels. Therefore, the server side may further attach a linear regression model on top of the trained first representation extraction model and perform fine-tuning training to obtain a recognition model, and then deploy the recognition model on the terminal device, that is, send the data of the recognition model to the terminal device.
FIG. 4 illustrates a schematic diagram of establishing a recognition model according to one embodiment. As shown in fig. 4, the recognition model mainly comprises the previously trained first representation extraction model and a linear regression model.
The training process of the recognition model mainly adopts supervised learning. Therefore, when establishing the recognition model, second training data comprising more than one labeled sample is obtained first. A labeled sample comprises the terminal-side features of a user and a labeled type label.
Obtaining the terminal-side features of the user is similar to the description of step 201 in the above embodiment and is not repeated here. The labeled type label is associated with a specific recognition task. For example, in a risk control scenario, if the recognition task is to identify the risk type, the type label may be a risk type label. If the recognition task is to identify whether a specific type of risk exists, the type label may be a label indicating whether that type of risk exists. For example, assuming the recognition task is to identify whether a fraud risk exists, the labeled type label may be 0 or 1, where 0 indicates no fraud risk and 1 indicates a fraud risk. Other recognition tasks are of course possible, such as identifying the risk level of a specific type of risk, and are not enumerated exhaustively here.
As shown in fig. 4, suppose a labeled sample has terminal-side features $x^u$ and a labeled type label $y^u$. The input of the first representation extraction model is still the user's terminal-side feature $x^u$, and the output is the user's terminal-side representation vector $h^u$. The linear regression model maps $h^u$ onto the label of the specific type by means of linear regression, i.e., it uses $h^u$ to predict the type label $\hat{y}^u$. The linear regression model may be implemented with a simple linear classifier. In the actual prediction process, the linear regression model may further combine features other than $h^u$ for prediction, which is not limited in this embodiment.

The training goal of the recognition model is to minimize the difference between the prediction result $\hat{y}^u$ and the label $y^u$, i.e., to make $\hat{y}^u$ agree with $y^u$ as far as possible. Specifically, a loss function may be constructed from this training goal, and the model parameters updated in each iteration using the value of the loss function, in a manner such as gradient descent, until a preset training end condition is satisfied. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset count threshold, and the like.
Since the first representation extraction model has already been trained, its initial parameters are the model parameters obtained in the manner shown in fig. 2. The recognition model built on it can therefore converge quickly; only fine-tuning is required.
During the training of the recognition model, the model parameters of the first representation extraction model may be kept unchanged and only the model parameters of the linear regression model updated; alternatively, the parameters of the first representation extraction model and the linear regression model may be updated simultaneously. The former is preferable, because the recognition models trained for multiple recognition tasks then share the same first representation extraction model, which simplifies computation and saves storage space.
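A minimal PyTorch sketch of this fine-tuning stage follows, assuming the TerminalEncoder sketched earlier has already been trained contrastively; it implements the preferred option of freezing the first representation extraction model and updating only the linear head. Names and shapes are illustrative assumptions.

```python
# A minimal sketch of the fine-tuning stage in fig. 4, assuming PyTorch.
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    def __init__(self, terminal_encoder: nn.Module, rep_dim: int = 64, num_types: int = 2):
        super().__init__()
        self.encoder = terminal_encoder
        for p in self.encoder.parameters():       # keep encoder parameters unchanged
            p.requires_grad = False
        self.head = nn.Linear(rep_dim, num_types)  # simple linear classifier
    def forward(self, pages: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            h = self.encoder(pages)                # terminal-side representation h_u
        return self.head(h)                        # predicted type-label logits

# Supervised fine-tuning on the second training data (toy example):
model = RecognitionModel(term_enc)                 # term_enc trained as sketched above
opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
pages = torch.randint(0, 100, (8, 4))              # terminal-side features x_u
labels = torch.randint(0, 2, (8,))                 # e.g. 1 = fraud risk, 0 = none
loss = nn.functional.cross_entropy(model(pages), labels)
loss.backward()
opt.step()                                         # only the linear head is updated
```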
FIG. 5 shows a flowchart of a representation extraction method according to one embodiment; the method is performed by a terminal device on which a first representation extraction model, pre-established in the manner shown in fig. 2, has been deployed. As shown in fig. 5, the method may include the following steps:
Step 501, acquiring the terminal-side features of the user.
Step 503, inputting the terminal-side features of the user into the first representation extraction model to obtain the terminal-side representation vector of the user.
For a terminal device on which the first representation extraction model has been deployed, the user's terminal-side representation vector can be obtained directly from the user's terminal-side features. This representation vector is highly similar in the representation space to the server-side features, so the terminal device can use it for a variety of downstream tasks, the task type being determined by the specific application scenario.
If the task is a recognition task, a recognition model trained by the server side can be deployed on the terminal device in advance; this recognition model is obtained through further fine-tuning training by the server side on the basis of a first representation extraction model pre-established in the manner shown in fig. 2. The flow of the type identification method executed on the terminal device may be as shown in fig. 6, including the following steps:
Step 601, acquiring the terminal-side features of the user.
Step 603, inputting the terminal-side features of the user into the recognition model to obtain the type label predicted by the recognition model.
The terminal-side features of the user obtained in steps 501 and 601 may be at least one of the user's behavior feature sequence and the terminal device's state feature sequence, as collected by the terminal device.
The structure of the recognition model may be as shown in fig. 4, comprising the first representation extraction model and a linear regression model. The first representation extraction model extracts the user's terminal-side representation vector from the user's terminal-side features, and the linear regression model predicts the type label from that representation vector. After obtaining the type label, the terminal device may process it further; preferably, however, the terminal device sends the predicted type label to the server, and the server executes a decision for the user or the terminal device using the type label.
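A minimal sketch of this on-device flow follows, assuming the RecognitionModel sketched earlier has been deployed to the terminal device. send_tag_to_server is a hypothetical placeholder for the device's actual reporting channel, since only the predicted label, never the raw terminal-side features, leaves the device.

```python
# A minimal sketch of on-device type identification (fig. 6), assuming PyTorch.
import torch

def identify_and_report(model: torch.nn.Module, pages: torch.Tensor) -> int:
    model.eval()
    with torch.no_grad():
        logits = model(pages.unsqueeze(0))        # add batch dimension
        tag = int(logits.argmax(dim=1).item())    # predicted type label
    send_tag_to_server(tag)                       # raw features stay on device
    return tag

def send_tag_to_server(tag: int) -> None:
    print(f"reporting type tag {tag} to server")  # stand-in for a network call

# Toy usage with the RecognitionModel from the previous sketch:
# identify_and_report(model, torch.randint(0, 100, (4,)))
```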
The recognition model can be applied in various scenarios; take identifying fraud risk in a risk control scenario as an example. The terminal device collects the sequence of page features browsed by the user, and the recognition model outputs a prediction based on this page feature sequence as input. The prediction indicates whether a fraud risk exists. Assuming a fraud risk exists, after the terminal device sends the prediction result to the server, the server may execute a corresponding risk control policy for the user or the terminal device, for example rejecting the terminal device's service request, adding the user to a blacklist, and so on.
Scenarios other than the above risk control scenario are also possible. For example, the predicted type label may be a user type label; after the terminal device sends the prediction result to the server, the server may use the user type label to recommend content to the user.
It can be seen that throughout the type identification process, the user's terminal-side features never need to be sent to the server side; only the predicted label is sent. This saves traffic on the one hand and protects user privacy on the other.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
According to an embodiment of another aspect, an apparatus for establishing a representation extraction model is provided. FIG. 7 shows a schematic block diagram of the apparatus for establishing a representation extraction model according to one embodiment. It is understood that the apparatus may be disposed on the server side, or on another computer device or platform with strong computing power. As shown in fig. 7, the apparatus 700 includes a first obtaining unit 701, a first training unit 702, and a model deployment unit 703, and may further include a second obtaining unit 704 and a second training unit 705. The main functions of each component unit are as follows:

A first obtaining unit 701 configured to obtain first training data containing more than one sample pair, where the sample pairs include positive sample pairs and negative sample pairs; a positive sample pair comprises the terminal-side features and server-side features of the same user, and a negative sample pair comprises the terminal-side features and server-side features of different users.
The terminal-side features may include at least one of a sequence of the user's behavior features on the terminal device and a sequence of the terminal device's state features. The server-side features may include at least one of the user's attribute features and behavior statistical features collected by the server side.

A first training unit 702 configured to train a first representation extraction model and a second representation extraction model using the first training data, where the first representation extraction model extracts a user's terminal-side representation vector from the user's terminal-side features, and the second representation extraction model extracts a user's server-side representation vector from the user's server-side features; the training objective is to maximize the similarity between the terminal-side and server-side representation vectors of the positive sample pairs and to minimize the similarity between the terminal-side and server-side representation vectors of the negative sample pairs.

The first representation extraction model may include a sequence encoding network such as an RNN, LSTM, or GRU. The second representation extraction model may include an MLP, CNN, ResNet, or the like.

The first training unit 702 adopts a self-supervised contrastive learning task when performing model training. In this embodiment of the present disclosure, a loss function may be constructed from the training objective, and the first training unit 702 updates the model parameters of the first and second representation extraction models in each iteration, for example by gradient descent, until a preset training end condition is satisfied. The training end condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset count threshold, and the like.
As one realizable way, the loss function constructed above may be the NCE loss function, such as the one shown in formula (1).
The model deployment unit 703 is configured to deploy the trained first representation extraction model to the terminal device.
The specific deployment may be to actively push the data of the first representation extraction model to the terminal device, or to push it in response to a request from the terminal device. The terminal device stores the data locally, or uses it to update the locally stored first representation extraction model.
In practical applications, the first representation extraction model deployed on the terminal device extracts the user's terminal-side representation vector, which is usually consumed by downstream tasks that require identification based on type labels. Therefore, the server side may further attach a linear regression model on top of the trained first representation extraction model, perform fine-tuning training to obtain a recognition model, and deploy the recognition model on the terminal device, that is, send the data of the recognition model to the terminal device.
In this case, the second obtaining unit 704 is configured to obtain second training data including one or more labeled samples, where a labeled sample includes a user's terminal-side features and an annotated type label.
A second training unit 705 configured to train a recognition model including the trained first representation extraction model and a linear regression model using the second training data; the first representation extraction model is used to extract the user's terminal-side representation vector from the user's terminal-side features, the linear regression model is used to predict the type label from the user's terminal-side representation vector, and the training objective is to minimize the difference between the prediction result and the type label in the labeled sample.
The second training unit 705 performs fine-tuning training of the recognition model in a supervised manner. As a preferred embodiment, during training of the recognition model, the second training unit 705 may update only the parameters of the linear regression model while keeping the model parameters of the first representation extraction model unchanged.
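A minimal sketch of this fine-tuning arrangement, reusing the hypothetical TerminalEncoder above: the first representation extraction model's parameters are frozen and only the linear head is optimized, as the preferred embodiment describes. The class name, dimensions, and optimizer choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    def __init__(self, encoder, rep_dim, num_types):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():         # freeze the first model's parameters
            p.requires_grad = False
        self.head = nn.Linear(rep_dim, num_types)   # linear prediction head

    def forward(self, terminal_seq):
        with torch.no_grad():
            rep = self.encoder(terminal_seq)        # terminal-side representation vector
        return self.head(rep)                       # predicted type-label scores

# Only the head's parameters are handed to the optimizer:
# model = RecognitionModel(trained_terminal_encoder, rep_dim=64, num_types=2)
# optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
```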
The model deployment unit 703 is specifically configured to deploy the recognition model to the terminal device.
As a typical application scenario, the terminal-side features may include the sequence of page features browsed by the user, and the server-side features may include the user's attribute features and transaction statistical features. The type labels may include a risk type label, a label indicating whether a particular type of risk exists, or a risk level of a particular type of risk.
According to an embodiment of another aspect, a representation extraction apparatus is also provided. FIG. 8 shows a schematic block diagram of a representation extraction apparatus according to one embodiment. It is understood that the apparatus may be provided in a terminal device. As shown in fig. 8, the apparatus 800 includes a third obtaining unit 801 and a representation extraction unit 802. The main functions of each component unit are as follows:
a third obtaining unit 801 configured to obtain a terminal-side feature of the user.
A representation extraction unit 802 configured to input the terminal-side features of the user into the first representation extraction model to obtain a terminal-side representation vector of the user; wherein the first representation extraction model is pre-established by the apparatus shown in fig. 7.
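On the terminal side, representation extraction then reduces to a single forward pass through the deployed model. The snippet below is a sketch assuming the hypothetical TerminalEncoder class above and an invented local checkpoint file name.

```python
import torch

# Placeholder input: one user's terminal-side feature sequence (20 steps, 32 features).
terminal_feature_seq = torch.randn(1, 20, 32)

encoder = TerminalEncoder(feat_dim=32, hidden_dim=64, out_dim=64)
encoder.load_state_dict(torch.load("first_extraction_model.pt"))  # assumed local file
encoder.eval()

with torch.no_grad():
    terminal_rep = encoder(terminal_feature_seq)  # [1, 64] representation vector
```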
FIG. 9 shows a schematic block diagram of a type identification apparatus according to one embodiment. It is understood that the apparatus may be provided in a terminal device. As shown in fig. 9, the apparatus 900 includes a fourth obtaining unit 901 and a type identification unit 902, and may further include a label sending unit 903. The main functions of each component unit are as follows:
a fourth obtaining unit 901 configured to obtain a terminal-side feature of the user.
A type identification unit 902 configured to input the terminal-side features of the user into a recognition model, which is pre-established using the apparatus shown in fig. 7, to obtain a type label predicted by the recognition model.
A label sending unit 903 configured to send the type label to the server, so that the server executes a decision for the user or the terminal device using the type label.
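Putting units 901 to 903 together, the on-device flow might be sketched as follows; the server endpoint, payload schema, and function name are invented for illustration and are not specified by the disclosure.

```python
import json
import urllib.request

import torch

def identify_and_report(model, terminal_feature_seq, server_url):
    """Predict the type label on-device, then send it to the server (unit 903)."""
    model.eval()
    with torch.no_grad():
        logits = model(terminal_feature_seq)
        type_label = int(logits.argmax(dim=1).item())
    payload = json.dumps({"type_label": type_label}).encode("utf-8")  # assumed schema
    req = urllib.request.Request(server_url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # server applies its own decision policy
        return resp.status
```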
As a typical application scenario, the terminal-side features of the user include the sequence of page features browsed by the user; the type labels include risk type labels or labels indicating whether a particular type of risk exists; and the decision executed by the server side includes a risk control policy.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 2, 5 or 6.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method of fig. 2, 5 or 6.
As time and technology develop, computer-readable storage media are used more and more widely, and the distribution of computer programs is no longer limited to tangible media; programs may also be downloaded directly from a network, among other channels. Any combination of one or more computer-readable storage media may be employed. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present specification, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The processors described above may include one or more single-core or multi-core processors. The processor may comprise any combination of general-purpose and dedicated processors (e.g., image processors, application processors, baseband processors, etc.).
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall be included within the scope of the present invention.

Claims (15)

1. A method of establishing a representation extraction model, comprising:
acquiring first training data containing one or more sample pairs, wherein the sample pairs comprise positive sample pairs and negative sample pairs, a positive sample pair comprises the terminal-side features and server-side features of the same user, and a negative sample pair comprises the terminal-side features of one user and the server-side features of a different user;
training a first representation extraction model and a second representation extraction model by using the first training data, wherein the first representation extraction model is used for extracting a terminal-side representation vector of a user by using the terminal-side features of the user, and the second representation extraction model is used for extracting a server-side representation vector of the user by using the server-side features of the user; the training objective is to maximize the similarity between the terminal-side representation vector and the server-side representation vector in the positive sample pairs, and to minimize the similarity between the terminal-side representation vector and the server-side representation vector in the negative sample pairs;
and deploying the trained first representation extraction model to the terminal device.
2. The method of claim 1, wherein the terminal-side feature comprises at least one of a sequence of behavior features of the user on the terminal device and a sequence of status features of the terminal device; and/or
The server-side features comprise at least one of attribute features and behavior statistical features of the user collected by the server side.
3. The method of claim 1, wherein the first representation extraction model comprises a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU); and/or
The second representation extraction model comprises a multilayer perceptron (MLP), a convolutional neural network (CNN), or a residual network (ResNet).
4. The method of claim 1, wherein deploying the trained first representation extraction model to the terminal device comprises:
acquiring second training data comprising one or more labeled samples, wherein a labeled sample comprises the terminal-side features of a user and an annotated type label;
training a recognition model comprising the trained first representation extraction model and a linear regression model by using the second training data; the first representation extraction model is used for extracting a terminal-side representation vector of a user by using the terminal-side features of the user, the linear regression model is used for predicting a type label by using the terminal-side representation vector of the user, and the training objective is to minimize the difference between the prediction result and the type label in the labeled sample;
and deploying the recognition model to the terminal device.
5. The method of claim 4, wherein, during training of the recognition model, only the parameters of the linear regression model are updated, keeping the model parameters of the first representation extraction model unchanged.
6. The method of claim 4, wherein,
the terminal side features comprise a page feature sequence browsed by a user; and/or
The server side characteristics comprise attribute characteristics and transaction statistical characteristics of the user; and/or
The type labels comprise a risk type label, a label indicating whether a particular type of risk exists, or a risk level of a particular type of risk.
7. A representation extraction method, executed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a first representation extraction model to obtain a terminal-side representation vector of the user; wherein the first representation extraction model is pre-established using the method of any one of claims 1 to 3.
8. A type identification method, executed by a terminal device, comprising:
acquiring the terminal-side features of a user;
inputting the terminal-side features of the user into a recognition model to obtain a type label predicted by the recognition model, wherein the recognition model is pre-established using the method of any one of claims 4 to 6.
9. The method of claim 8, further comprising:
and sending the type label to a server, so that the server executes a decision for the user or the terminal device by using the type label.
10. The method of claim 9, wherein the terminal-side features of the user comprise a sequence of page features browsed by the user;
The type labels comprise risk type labels or labels indicating whether a particular type of risk exists;
The decision comprises a risk control policy.
11. An apparatus for establishing a representation extraction model, comprising:
a first obtaining unit configured to obtain first training data including one or more sample pairs, the sample pairs including positive sample pairs and negative sample pairs, a positive sample pair including the terminal-side features and server-side features of the same user, and a negative sample pair including the terminal-side features of one user and the server-side features of a different user;
a first training unit configured to train a first representation extraction model and a second representation extraction model using the first training data, wherein the first representation extraction model is used to extract a terminal-side representation vector of a user using the terminal-side features of the user, and the second representation extraction model is used to extract a server-side representation vector of the user using the server-side features of the user; the training objective is to maximize the similarity between the terminal-side representation vector and the server-side representation vector in the positive sample pairs, and to minimize the similarity between the terminal-side representation vector and the server-side representation vector in the negative sample pairs;
and the model deployment unit is configured to deploy the trained first representation extraction model to the terminal equipment.
12. The apparatus of claim 11, further comprising:
the second acquisition unit is configured to acquire second training data comprising more than one labeled sample, wherein the labeled sample comprises a terminal side characteristic of a user and a labeled type label;
a second training unit configured to train a recognition model including the trained first feature extraction model and a linear regression model using the second training data; the first representation extraction model is used for extracting a terminal side feature vector of a user by using a terminal side feature of the user, the linear regression model is used for predicting a type label by using the terminal side feature vector of the user, and a training target is the difference between a minimum prediction result and the type label in a labeling sample;
the model deployment unit is specifically configured to deploy the recognition model to the terminal device.
13. A representation extraction apparatus, provided on a terminal device, the apparatus comprising:
a third obtaining unit configured to obtain the terminal-side features of a user;
a representation extraction unit configured to input the terminal-side features of the user into a first representation extraction model to obtain a terminal-side representation vector of the user; wherein the first representation extraction model is pre-established by the apparatus of claim 11.
14. A type identification apparatus, provided on a terminal device, comprising:
a fourth obtaining unit configured to obtain the terminal-side features of a user;
a type identification unit configured to input the terminal-side features of the user into a recognition model to obtain a type label predicted by the recognition model, wherein the recognition model is pre-established by the apparatus of claim 12.
15. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-10.
CN202111597741.9A 2021-12-24 2021-12-24 Method and device for establishing representation extraction model, representation extraction and type identification Active CN113988225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597741.9A CN113988225B (en) 2021-12-24 2021-12-24 Method and device for establishing representation extraction model, representation extraction and type identification

Publications (2)

Publication Number Publication Date
CN113988225A true CN113988225A (en) 2022-01-28
CN113988225B CN113988225B (en) 2022-05-06

Family

ID=79734276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597741.9A Active CN113988225B (en) 2021-12-24 2021-12-24 Method and device for establishing representation extraction model, representation extraction and type identification

Country Status (1)

Country Link
CN (1) CN113988225B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210327029A1 (en) * 2020-04-13 2021-10-21 Google Llc Systems and Methods for Contrastive Learning of Visual Representations
CN111553488A (en) * 2020-07-10 2020-08-18 支付宝(杭州)信息技术有限公司 Risk recognition model training method and system for user behaviors
CN112215238A (en) * 2020-10-29 2021-01-12 支付宝(杭州)信息技术有限公司 Method, system and device for constructing general feature extraction model
CN113269232A (en) * 2021-04-25 2021-08-17 北京沃东天骏信息技术有限公司 Model training method, vectorization recall method, related device and storage medium
CN113344131A (en) * 2021-06-30 2021-09-03 商汤国际私人有限公司 Network training method and device, electronic equipment and storage medium
CN113505896A (en) * 2021-07-28 2021-10-15 深圳前海微众银行股份有限公司 Longitudinal federated learning modeling optimization method, apparatus, medium, and program product
CN113516255A (en) * 2021-07-28 2021-10-19 深圳前海微众银行股份有限公司 Federal learning modeling optimization method, apparatus, readable storage medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WU YW ET AL: "Federated Contrastive Learning for Dermatological Disease Diagnosis via On-device Learning", IEEE *
JIANG Qisheng: "Research on Facial Expression Recognition Based on Representation Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170304A (en) * 2022-06-22 2022-10-11 支付宝(杭州)信息技术有限公司 Method and device for extracting risk feature description
CN115545720A (en) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 Model training method, business wind control method and business wind control device
CN115545720B (en) * 2022-11-29 2023-03-10 支付宝(杭州)信息技术有限公司 Model training method, business wind control method and business wind control device

Also Published As

Publication number Publication date
CN113988225B (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant