CN112559872A - Method, system, computer device and storage medium for identifying user between devices - Google Patents


Info

Publication number
CN112559872A
Authority
CN
China
Prior art keywords
similarity
candidate
privacy
calculating
training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011517605.XA
Other languages
Chinese (zh)
Inventor
付金伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202011517605.XA
Publication of CN112559872A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a system, a computer device and a storage medium for identifying users across devices. The method comprises the following steps: constructing a training set and a test set, where the training set is constructed from matched pairs of device IDs and cookies of known users and the test set is constructed from a set of log records; calculating IP privacy, where an IP privacy density is obtained from the training set and the test set; obtaining candidate-pair similarity, where an XGBoost model is trained on the training set and the trained model produces candidate pairs and their similarities from the similarity vectors; and clustering nodes, where a similarity graph is created from the candidate-pair similarities and a graph clustering algorithm clusters the nodes of the graph, the nodes in one cluster belonging to the same user. The invention does not depend on login accounts, which reduces the restriction on the application scope of cross-screen technology.

Description

Method, system, computer device and storage medium for identifying user between devices
Technical Field
The invention belongs to the field of cross-device user identification, and in particular relates to a method, a system, a computer device and a storage medium for identifying users across devices.
Background
Cross-screen marketing refers to delivering targeted, personalized advertising to an advertiser's audience by integrating channel terminals such as mobile phones, tablets, computers and televisions, and achieving brand-marketing goals through interaction with consumers. A key difficulty in cross-screen marketing is accurate cross-device user identification, that is, determining which devices (for example, a mobile phone and a computer) belong to the same user. Two practical approaches currently exist on the market: exact matching and estimated (probabilistic) matching. However, both have limitations of varying degrees.
The existing approaches are as follows:
1. Binding a unified account:
A strong account system stitches together the behavior of the same user on multiple devices (PC and mobile) through an ID shared across channels, relying mainly on the user logging in to the same account on different devices or on ID-mapping support provided by large Internet platforms. Such exact matching achieves very high linking precision, and a user's platform account does not change frequently.
2. Using specific rules:
Cross-device user identification can rely on user data such as Wi-Fi addresses and IP addresses, judging from the Wi-Fi or IP address whether the devices under the same address belong to the same user.
3. Interpretable machine learning algorithms:
A data model is used for inference, that is, to compute the likelihood that multiple devices in different channels actually belong to the same user. A user's behavior on different platforms carries identifying characteristics, including technical parameters, behavior tags and encrypted behavioral data from third parties. The advantage of this approach is that it does not depend on login accounts, which reduces the restriction on the application scope of cross-screen technology, covers almost all users in the Internet ecosystem and enlarges the pool of potential customers.
4. Deep learning algorithms:
A neural-network-based deep learning model is constructed to predict whether a cookie and a device belong to the same person.
Disclosure of Invention
The embodiments of the present application provide a method, a system, a computer device and a storage medium for identifying users across devices, so as to at least solve the problem in the related art that identification is influenced by subjective factors.
The invention provides a method for identifying users across devices, comprising the following steps:
constructing a training set and a test set: constructing a training set from matched pairs of device IDs and cookies of known users, and constructing a test set from a set of log records;
calculating IP privacy: obtaining the IP privacy density from the training set and the test set by an iterative, semi-supervised learning method;
generating a candidate set: obtaining a candidate set from the training set, the test set and the IP privacy;
obtaining similarity vectors: calculating the attribute similarity of the candidate set to form multi-dimensional similarity vectors;
obtaining candidate-pair similarity: training an XGBoost model on the training set, and applying the trained model to the similarity vectors to obtain the candidate pairs and their similarities;
clustering nodes: creating a similarity graph from the candidate-pair similarities and clustering its nodes with a graph clustering algorithm, wherein the nodes in one cluster belong to the same user.
In the above inter-device user identification method, the step of calculating IP privacy comprises:
initializing the linked result set: initializing the linked result set with the data of the training set;
constructing an IP inverted index: constructing an inverted index of the IPs in the prediction set;
calculating the IP privacy: calculating the IP privacy density according to the formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, where $IP_i$ is the inverted-index entry of the i-th IP, $\mathrm{sum\_max}(IP_i, m)$ is the sum of the m largest per-user usage counts in $IP_i$, and $\sum c_i$ is the total usage count of $IP_i$;
calculating IP-set similarity: integrating the user, device and cookie information and then calculating the similarity between IP sets;
updating the IP privacy: updating the IP privacy according to the IP-set similarity.
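The IP privacy formula can be illustrated with a short Python sketch. It assumes the linked result set is available as hypothetical (user_id, ip) records, one per observed use, so the inverted index maps each IP to per-user usage counts; m is left as a parameter because the description does not fix its value. This is an illustration under those assumptions, not the applicant's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_ip_inverted_index(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Map each IP to a Counter {user_id: times that user used the IP}.

    `records` are hypothetical (user_id, ip) pairs, one per observed use.
    """
    index: Dict[str, Counter] = defaultdict(Counter)
    for user_id, ip in records:
        index[ip][user_id] += 1
    return dict(index)


def ip_privacy(index: Dict[str, Counter], ip: str, m: int) -> float:
    """pri(IP_i) = sum of the m largest per-user counts / total count of IP_i.

    IPs absent from the index get privacy 0, as in the description.
    """
    counts = index.get(ip)
    if not counts:
        return 0.0
    top_m = sum(count for _, count in counts.most_common(m))
    return top_m / sum(counts.values())


# Example: IP "A" is dominated by u1, IP "B" is shared evenly by four users.
records = [("u1", "A")] * 8 + [("u2", "A"), ("u3", "A")]
records += [(u, "B") for u in ("u1", "u2", "u3", "u4")]
index = build_ip_inverted_index(records)
print(ip_privacy(index, "A", m=1))  # 0.8
print(ip_privacy(index, "B", m=1))  # 0.25
print(ip_privacy(index, "C", m=1))  # 0.0
```

With m = 1, an IP used 8 times by one user and once each by two others scores 0.8, while an IP shared evenly by four users scores 0.25, so a high value marks an IP concentrated on few users.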
In the above inter-device user identification method, the step of generating a candidate set comprises:
integrating ID and cookie information: integrating the user information in the training set with the device ID and cookie information in the test set to obtain integrated information;
generating the candidate set: evaluating the integrated information and then constructing the candidate set.
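A minimal sketch of candidate generation, following the rule given in step 3 of the first embodiment: keep the IPs whose privacy exceeds δ as IP_usable, then pair a cookie with a user or device, or a device with a user, when their shared IPs include at least one usable IP. The dictionaries mapping each entity to its IP set, and the function names, are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def usable_ips(privacy: Dict[str, float], delta: float) -> Set[str]:
    """IP_usable: IPs whose privacy is strictly greater than the threshold delta."""
    return {ip for ip, p in privacy.items() if p > delta}


def generate_candidates(
    cookies: Dict[str, Set[str]],
    devices: Dict[str, Set[str]],
    users: Dict[str, Set[str]],
    ip_usable: Set[str],
) -> List[Tuple[str, str]]:
    """Return (cookie, user), (cookie, device) and (device, user) candidate pairs.

    A pair qualifies when its shared IP set intersects IP_usable, which also
    implies that the shared IP set is non-empty.
    """
    def shares_usable_ip(a: Set[str], b: Set[str]) -> bool:
        return bool(a & b & ip_usable)

    candidates: List[Tuple[str, str]] = []
    for cookie, cookie_ips in cookies.items():
        for user, user_ips in users.items():
            if shares_usable_ip(cookie_ips, user_ips):
                candidates.append((cookie, user))
        for dev, dev_ips in devices.items():
            if shares_usable_ip(cookie_ips, dev_ips):
                candidates.append((cookie, dev))
    for dev, dev_ips in devices.items():
        for user, user_ips in users.items():
            if shares_usable_ip(dev_ips, user_ips):
                candidates.append((dev, user))
    return candidates


ip_usable = usable_ips({"A": 0.9, "B": 0.2, "C": 0.8}, delta=0.5)
pairs = generate_candidates(
    cookies={"c1": {"A", "B"}, "c2": {"B"}},
    devices={"d1": {"A"}},
    users={"u1": {"A", "C"}},
    ip_usable=ip_usable,
)
print(pairs)  # [('c1', 'u1'), ('c1', 'd1'), ('d1', 'u1')]
```

The nested loops compare every pair for clarity; at scale, an inverted index over IP_usable would enumerate only the entities that actually share a usable IP.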
In the above inter-device user identification method, the step of obtaining similarity vectors comprises: calculating the attribute similarity of the candidate set according to device features, IP features, multi-level time periods and media behavior features, to form multi-dimensional similarity vectors.
The invention also provides a system for identifying users across devices, comprising:
a training set and test set construction module, which constructs a training set from matched pairs of device IDs and cookies of known users and constructs a test set from a set of log records;
an IP privacy calculation module, which obtains the IP privacy density from the training set and the test set by an iterative, semi-supervised learning method;
a candidate set generation module, which obtains a candidate set from the training set, the test set and the IP privacy;
a similarity vector obtaining module, which calculates the attribute similarity of the candidate set and forms multi-dimensional similarity vectors;
a candidate-pair similarity obtaining module, which trains an XGBoost model on the training set and applies the trained model to the similarity vectors to obtain the candidate pairs and their similarities;
and a node clustering module, which creates a similarity graph from the candidate-pair similarities and clusters its nodes with a graph clustering algorithm, such that the nodes in one cluster belong to the same user.
In the above inter-device user identification system, the IP privacy calculation module comprises:
a linked result set initialization unit, which initializes the linked result set with the data of the training set;
an IP inverted index construction unit, which constructs an inverted index of the IPs in the prediction set;
an IP privacy calculation unit, which calculates the IP privacy density according to the formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, where $IP_i$ is the inverted-index entry of the i-th IP, $\mathrm{sum\_max}(IP_i, m)$ is the sum of the m largest per-user usage counts in $IP_i$, and $\sum c_i$ is the total usage count of $IP_i$;
an IP-set similarity calculation unit, which integrates the user, device and cookie information and then calculates the similarity between IP sets;
and an IP privacy updating unit, which updates the IP privacy according to the IP-set similarity.
In the above inter-device user identification system, the candidate set generation module comprises:
an ID and cookie information integration unit, which integrates the user information in the training set with the device ID and cookie information in the test set to obtain integrated information;
and a candidate set generation unit, which evaluates the integrated information and then constructs the candidate set.
In the above inter-device user identification system, the similarity vector obtaining module calculates the attribute similarity of the candidate set according to device features, IP features, multi-level time periods and media behavior features, and forms multi-dimensional similarity vectors.
The invention also includes a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the inter-device user identification method as described in any of the above when executing the computer program.
The invention also comprises a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements an inter-device user identification method as defined in any one of the above.
The invention has the beneficial effects that:
The method is based on IP addresses and does not depend on login accounts, which reduces the restriction on the application scope of cross-screen technology, covers almost all users in the Internet ecosystem, and enlarges the pool of potential customers.
Because the candidate set for linking is generated based on IP privacy, the method avoids the loss of accuracy caused by unstable IPs and improves the efficiency of cross-screen identification.
The method uses device features, IP features, multi-level time periods and media behavior features, which makes the identification result more interpretable.
The method considers user identification not only between devices of different types but also between devices of the same type, for example cookie-to-cookie and device-to-device identification, which addresses the low recall of existing methods.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application.
In the drawings:
FIG. 1 is a flow chart of a method of user identification between devices;
FIG. 2 is a flow chart illustrating the substeps of step S2 in FIG. 1;
FIG. 3 is a flow chart illustrating the substeps of step S3 in FIG. 1;
FIG. 4 is a schematic diagram of the structure of the inter-device user identification system of the present invention;
FIG. 5 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words throughout this application do not limit number and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first," "second," "third," and the like are used herein merely to distinguish similar objects and do not denote a particular order of the objects.
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
Before the embodiments of the present invention are described in detail, the core inventive concept of the present invention is summarized below and then described in detail through the following embodiments.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for identifying users across devices. As shown in FIG. 1, the inter-device user identification method of the present invention comprises:
step S1 of constructing a training set and a test set: constructing a training set from matched pairs of device IDs and cookies of known users, and constructing a test set from a set of log records;
step S2 of calculating the IP privacy: obtaining the IP privacy density from the training set and the test set by an iterative, semi-supervised learning method;
step S3 of generating a candidate set: obtaining a candidate set from the training set, the test set and the IP privacy;
step S4 of obtaining similarity vectors: calculating the attribute similarity of the candidate set to form multi-dimensional similarity vectors;
step S5 of obtaining candidate-pair similarity: training an XGBoost model on the training set, and applying the trained model to the similarity vectors to obtain the candidate pairs and their similarities;
step S6 of clustering nodes: creating a similarity graph from the candidate-pair similarities and clustering its nodes with a graph clustering algorithm, wherein the nodes in one cluster belong to the same user.
Referring to FIG. 2, FIG. 2 is a flowchart illustrating the sub-steps of step S2 in FIG. 1. As shown in FIG. 2, step S2 of calculating the IP privacy comprises:
step S21 of initializing the linked result set: initializing the linked result set with the data of the training set;
step S22 of constructing an IP inverted index: constructing an inverted index of the IPs in the prediction set;
step S23 of calculating the IP privacy: calculating the IP privacy density according to the formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, where $IP_i$ is the inverted-index entry of the i-th IP, $\mathrm{sum\_max}(IP_i, m)$ is the sum of the m largest per-user usage counts in $IP_i$, and $\sum c_i$ is the total usage count of $IP_i$;
step S24 of calculating IP-set similarity: integrating the user, device and cookie information and then calculating the similarity between IP sets;
step S25 of updating the IP privacy: updating the IP privacy according to the IP-set similarity.
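The matching part of this iteration, assigning each test-set cookie or device to its most similar known ID when the similarity exceeds θ and to a new user otherwise, can be sketched as follows. The original IP-set similarity and normalization formulas survive only as images, so this sketch substitutes a privacy-weighted Jaccard similarity as an explicit assumption; the identifiers and the owner mapping are hypothetical. After each pass, the privacy would be recomputed from the relabeled records as in step S23.

```python
from typing import Dict, Optional, Set, Tuple


def weighted_jaccard(a: Set[str], b: Set[str], pri: Dict[str, float]) -> float:
    """Assumed IP-set similarity: privacy-weighted Jaccard (not the source formula)."""
    denom = sum(pri.get(ip, 0.0) for ip in a | b)
    if denom == 0:
        return 0.0
    return sum(pri.get(ip, 0.0) for ip in a & b) / denom


def assign_entity(
    entity_ips: Set[str],
    known_ips: Dict[str, Set[str]],
    owner: Dict[str, str],
    pri: Dict[str, float],
    theta: float,
    new_user: str,
) -> Tuple[str, float]:
    """Label one test-set cookie or device with a user.

    known_ips maps each training-set user or device ID to its IP set, and
    owner maps that ID to its user label. Returns (user label, best similarity).
    """
    best_id: Optional[str] = None
    best_s = 0.0
    for known_id, ips in known_ips.items():
        s = weighted_jaccard(entity_ips, ips, pri)
        if s > best_s:
            best_id, best_s = known_id, s
    if best_id is not None and best_s > theta:
        return owner[best_id], best_s  # same user as the best-matching ID
    return new_user, best_s  # otherwise the entity starts a new user


pri = {"A": 0.9, "B": 0.1, "C": 0.8}
known_ips = {"u1": {"A", "C"}, "d9": {"B"}}
owner = {"u1": "u1", "d9": "u2"}
print(assign_entity({"A", "B"}, known_ips, owner, pri, theta=0.4, new_user="u_new"))
print(assign_entity({"A", "B"}, known_ips, owner, pri, theta=0.6, new_user="u_new"))
```

Weighting by privacy means that sharing a highly private IP counts for much more than sharing a public one, which matches the role the description gives to IP privacy.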
Referring to FIG. 3, FIG. 3 is a flowchart illustrating the sub-steps of step S3 in FIG. 1. As shown in FIG. 3, step S3 of generating a candidate set comprises:
step S31 of integrating ID and cookie information: integrating the user information in the training set with the device ID and cookie information in the test set to obtain integrated information;
step S32 of generating the candidate set: evaluating the integrated information and then constructing the candidate set.
Step S4 of obtaining similarity vectors comprises: calculating the attribute similarity of the candidate set according to device features, IP features, multi-level time periods and media behavior features, to form multi-dimensional similarity vectors.
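The description does not define the multi-level time period feature precisely. One possible reading, sketched below purely as an assumption, counts the (period, date, IP) slots in which both entities of a candidate pair were active during the morning, noon or evening; the hour boundaries in PERIODS and the event format are hypothetical.

```python
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical hour ranges for the three periods; the source does not give them.
PERIODS = {"morning": range(6, 11), "noon": range(11, 14), "evening": range(18, 24)}


def period_of(ts: datetime) -> Optional[str]:
    """Return the period name for a timestamp, or None outside all periods."""
    for name, hours in PERIODS.items():
        if ts.hour in hours:
            return name
    return None


def period_slots(events: Iterable[Tuple[datetime, str]]) -> Set[Tuple[str, date, str]]:
    """Convert (timestamp, ip) events into distinct (period, date, ip) slots."""
    slots = set()
    for ts, ip in events:
        name = period_of(ts)
        if name is not None:
            slots.add((name, ts.date(), ip))
    return slots


def period_cooccurrence(
    a_events: Iterable[Tuple[datetime, str]], b_events: Iterable[Tuple[datetime, str]]
) -> Dict[str, int]:
    """Count, per period, the slots in which both entities were active."""
    counts = {name: 0 for name in PERIODS}
    for name, _, _ in period_slots(a_events) & period_slots(b_events):
        counts[name] += 1
    return counts


cookie_events = [(datetime(2020, 12, 1, 8), "A"), (datetime(2020, 12, 1, 12, 30), "B"),
                 (datetime(2020, 12, 1, 20), "A")]
device_events = [(datetime(2020, 12, 1, 9, 15), "A"), (datetime(2020, 12, 1, 21), "A"),
                 (datetime(2020, 12, 2, 8), "A")]
print(period_cooccurrence(cookie_events, device_events))
# {'morning': 1, 'noon': 0, 'evening': 1}
```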
Hereinafter, the inter-device user identification method according to the present invention will be described in detail with reference to the following examples.
The first embodiment is as follows:
This embodiment addresses today's complex and changeable marketing environment, characterized by shifting audience viewing habits, diversified connected devices, fragmented media and content, and data silos. It helps advertisers provide personalized services, evaluate cross-media exposure and purchase-behavior monitoring more comprehensively, and improve the overall effect of cross-screen marketing and of interaction with consumers. To address the shortcomings of the prior art, such as low linking efficiency, excessive dependence on IP addresses, uninterpretable models and low recall, the cross-screen linking method based on IP privacy density provided by this scheme jointly considers user identification among devices, cookies and known users. It first calculates the privacy of all IPs in the data set with an iterative, semi-supervised method; then generates a candidate set based on IP privacy; and then calculates the attribute similarity of the candidate set, using the following four types of information to form a multi-dimensional similarity vector:
1. Device features: geographic information, number of bound IPs, number of media types accessed
2. IP features: number of IPs shared between devices, importance of the IPs shared between devices
3. Multi-level time periods: linking counts in the morning, noon and evening periods
4. Media behavior features: number of media shared between devices, number of shared media types, Jaccard similarity of the media shared between devices, and Jaccard similarity of the shared media types.
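A sketch of how the device, IP and media behavior features listed above might be turned into part of a similarity vector for one candidate pair. The profile layout (keys city, ips, media, media_types) is hypothetical, same-city equality stands in for the unspecified geographic feature, and summing the privacy of shared IPs is only one assumed way to express IP importance; time-period counts are omitted here.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(a: dict, b: dict, pri: Dict[str, float]) -> List[float]:
    """Build part of a similarity vector for one candidate pair.

    a and b are hypothetical profiles with keys "city", "ips", "media", "media_types".
    """
    shared_ips = a["ips"] & b["ips"]
    return [
        float(a["city"] == b["city"]),                # device: same city (assumed geo feature)
        float(len(a["ips"])), float(len(b["ips"])),   # device: number of bound IPs
        float(len(a["media_types"])), float(len(b["media_types"])),  # device: media types accessed
        float(len(shared_ips)),                       # IP: number of shared IPs
        sum(pri.get(ip, 0.0) for ip in shared_ips),   # IP: importance (assumed: summed privacy)
        float(len(a["media"] & b["media"])),          # media: number of shared media
        float(len(a["media_types"] & b["media_types"])),  # media: number of shared media types
        jaccard(a["media"], b["media"]),              # media: Jaccard of media
        jaccard(a["media_types"], b["media_types"]),  # media: Jaccard of media types
    ]


cookie = {"city": "Shanghai", "ips": {"A", "B"}, "media": {"m1", "m2", "m3"},
          "media_types": {"video", "news"}}
device = {"city": "Shanghai", "ips": {"B", "C"}, "media": {"m2", "m3", "m4"},
          "media_types": {"video"}}
print(pair_features(cookie, device, {"B": 0.7}))
# [1.0, 2.0, 2.0, 2.0, 1.0, 1.0, 0.7, 2.0, 1.0, 0.5, 0.5]
```

Each candidate pair yields one fixed-length vector, which is the form a gradient-boosted classifier such as XGBoost expects as input.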
Then, the similarity of each candidate pair is predicted with an XGBoost model; finally, a similarity graph is generated from the obtained similarities and users are clustered with a graph clustering algorithm, so that the nodes in one cluster belong to the same user.
The method comprises the following steps:
1. Constructing a training set and a test set: the training set uses matched pairs of device IDs and cookies of known users, and the test set uses a set of log records.
2. Calculating the IP privacy density over the training set and the test set.
a. Initialize the linked result set R with the training-set data of known users.
b. Construct the inverted index of IPs in the predicted result set R; the entry for IP_i records each user u_j who used IP_i together with the number of times u_j used IP_i.
c. Calculate the IP privacy: $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, where $\mathrm{sum\_max}(IP_i, m)$ is the sum of the m largest per-user usage counts of $IP_i$ and $\sum c_i$ is the total usage count of $IP_i$. IPs that do not appear in the predicted result set have their privacy density initialized to 0.
d. Calculate IP-set similarity based on the IP privacy: integrate the user, device and cookie information into IP usage records, and normalize these records as follows:
[Formula images: integrated IP usage records and their normalization]
e. For each cookie in the test set, calculate the similarity between its IP set and the IP sets of all users and all device IDs in the training set. Select the user or device ID with the highest similarity to cookie_i, denoted id'_i, with similarity s_i. If s_i is greater than the threshold θ, cookie_i and id'_i are considered to belong to the same user u'_i, and all records containing cookie_i are labeled with user u'_i; otherwise, cookie_i belongs to a new user u''_i, and all records containing cookie_i are labeled with user u''_i. Then update the IP privacy density.
f. For each device in the test set, calculate the similarity between its IP set and the IP sets of all users in the training set. Select the user with the highest similarity to dev_j, denoted id'_j, with similarity s_j. If s_j is greater than the threshold θ, dev_j and id'_j are considered to belong to the same user u'_j, and all records containing dev_j are labeled with user u'_j; otherwise, dev_j belongs to a new user u''_j, and all records containing dev_j are labeled with user u''_j. Then update the IP privacy density.
g. Return the final IP privacy information.
3. Generate the candidate set of matching pairs for cross-screen linking. Generating the candidate set in this way improves the processing efficiency of cross-screen linking.
a. Integrate the user information in the training data set with the device ID and cookie information in the test set. In the integrated records, each id_j represents a device ID, a cookie or a user, together with the number of times id_j used each IP_i.
b. For each IP_i, if its privacy is greater than a threshold δ, IP_i can be used to generate candidates and is added to the IP set IP_usable.
c. Generate the candidate set can. For each cookie_i in the test set and each user u_j in the training data set, take the set of IPs shared by u_j and cookie_i; if it is non-empty and its intersection with IP_usable is non-empty, add (cookie_i, u_j) to can as a candidate pair, otherwise do not add it. For each device dev_j in the test set, take the set of IPs shared by dev_j and cookie_i; if it is non-empty and its intersection with IP_usable is non-empty, add (cookie_i, dev_j) to can, otherwise do not add it. For each device dev_i in the test set and each user u_j, take the set of IPs shared by dev_i and u_j; if it is non-empty and its intersection with IP_usable is non-empty, add (dev_i, u_j) to can, otherwise do not add it.
4. Calculate the attribute similarity of the candidate set, using the following four types of information to form a multi-dimensional similarity vector:
Device features: geographic information, number of bound IPs, number of media types accessed
IP features: number of IPs shared between devices, importance of the IPs shared between devices
Multi-level time periods: linking counts in the morning, noon and evening periods
Media behavior features: number of media shared between devices, number of shared media types, Jaccard similarity of the media shared between devices, and Jaccard similarity of the shared media types.
5. Train the XGBoost model on the training data set, and apply it to the similarity vectors of the candidate pairs to obtain the similarity of each candidate pair.
6. Create a similarity graph from the candidate-pair similarities and cluster its nodes with a graph clustering algorithm, so that the nodes in one cluster belong to the same user.
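The description does not name the graph clustering algorithm. As a simple stand-in, the sketch below keeps only edges whose predicted similarity reaches a hypothetical threshold and returns the connected components, computed with a union-find structure; the scored pairs are assumed to be the XGBoost output of step 5, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_similarity_graph(
    scored_pairs: Iterable[Tuple[str, str, float]], threshold: float
) -> List[List[str]]:
    """Group nodes into users: connected components over edges with score >= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, score in scored_pairs:
        find(a)  # register both nodes even if the edge is dropped
        find(b)
        if score >= threshold:
            union(a, b)

    clusters: Dict[str, List[str]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())


pairs = [("c1", "u1", 0.9), ("d1", "u1", 0.8), ("c2", "d2", 0.3), ("c3", "d2", 0.95)]
print(cluster_similarity_graph(pairs, threshold=0.5))
# [['c1', 'd1', 'u1'], ['c2'], ['c3', 'd2']]
```

Connected components can over-merge when a single spurious high-score edge bridges two users, which is one reason a dedicated graph clustering algorithm may be preferred in practice.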
Example two:
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of the inter-device user identification system according to the present invention. As shown in FIG. 4, the inter-device user identification system of the present invention comprises:
a training set and test set construction module, which constructs a training set from matched pairs of device IDs and cookies of known users and constructs a test set from a set of log records;
an IP privacy calculation module, which obtains the IP privacy density from the training set and the test set by an iterative, semi-supervised learning method;
a candidate set generation module, which obtains a candidate set from the training set, the test set and the IP privacy;
a similarity vector obtaining module, which calculates the attribute similarity of the candidate set and forms multi-dimensional similarity vectors;
a candidate-pair similarity obtaining module, which trains an XGBoost model on the training set and applies the trained model to the similarity vectors to obtain the candidate pairs and their similarities;
and a node clustering module, which creates a similarity graph from the candidate-pair similarities and clusters its nodes with a graph clustering algorithm, such that the nodes in one cluster belong to the same user.
The ip privacy calculation module comprises:
a linking result set initialization unit, which initializes the linking result set with the data of the training set;
an ip inverted index construction unit, which constructs an inverted index of ips over the prediction set;
an ip privacy degree calculation unit, which calculates the ip privacy degree according to the formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, wherein $IP_i$ is an entry of the inverted index and $\mathrm{sum\_max}(IP_i, m)$ is the sum of the $m$ largest [term shown only as an image in the original] corresponding to $IP_i$;
an ip set similarity calculation unit, which integrates the user, device and cookie information and then calculates the ip set similarity; and
an ip privacy degree updating unit, which updates the ip privacy degree according to the ip set similarity.
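The formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$ can be read as the share of an ip's traffic that comes from its $m$ most active ids. Because the summed term is only given as an image, the sketch below assumes that $c_i$ denotes the number of log records of each id (device id or cookie) observed under $IP_i$ in the inverted index. Under that assumption, a home ip used by a few devices scores close to 1, while a public ip shared by many ids scores low. The function names and the default $m = 2$ are hypothetical, and the iterative semi-supervised update of the privacy degree is not shown.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_ip_index(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Inverted index from (ip, id) log records: ip -> Counter of records per id."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for ip, uid in records:
        index[ip][uid] += 1
    return dict(index)


def ip_privacy(id_counts: Counter, m: int = 2) -> float:
    """Assumed reading: (sum of the m largest per-id counts) / (sum of all counts)."""
    total = sum(id_counts.values())
    if total == 0:
        return 0.0
    return sum(heapq.nlargest(m, id_counts.values())) / total


if __name__ == "__main__":
    home = [("203.0.113.7", "devA")] * 5 + [("203.0.113.7", "cookieA")] * 4 \
        + [("203.0.113.7", "cookieX")]
    public = [("198.51.100.1", f"id{k}") for k in range(5) for _ in range(2)]
    index = build_ip_index(home + public)
    for ip, counts in index.items():
        print(ip, ip_privacy(counts))
```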
The candidate set generation module comprises:
an id and cookie information integration unit, which integrates the user information in the training set with the device id and cookie information in the test set to obtain integrated information; and
a candidate set generation unit, which makes a judgment on the integrated information and then constructs the candidate set.
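The patent does not specify how the integrated information is judged. One plausible reading, sketched below under that assumption, is to pair only ids that appear together under an ip whose privacy degree reaches a threshold, so that unstable or public ips do not generate candidates. The threshold value and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def candidate_pairs(ip_to_ids: Dict[str, Set[str]],
                    privacy: Dict[str, float],
                    min_privacy: float = 0.8) -> Set[Tuple[str, str]]:
    """Pair ids that co-occur under a sufficiently private ip (hypothetical rule)."""
    pairs: Set[Tuple[str, str]] = set()
    for ip, ids in ip_to_ids.items():
        if privacy.get(ip, 0.0) < min_privacy:
            continue  # public or unstable ip: too many unrelated ids share it
        # Sort so each unordered pair is stored once, as (smaller_id, larger_id).
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    ip_to_ids = {"ip1": {"devA", "cookieA"}, "ip2": {"devB", "cookieB", "cookieC"}}
    print(candidate_pairs(ip_to_ids, {"ip1": 0.9, "ip2": 0.4}))
```

Filtering on privacy degree before pairing also bounds the candidate set: a public ip with $n$ ids would otherwise contribute $n(n-1)/2$ pairs on its own.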
The similarity vector obtaining module calculates the attribute similarities of the candidate set from the device features, ip features, multi-level time-period features and media behavior features, and forms a multi-dimensional similarity vector.
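The sketch below assembles a small similarity vector for one candidate pair from two of the four feature groups: an ip feature (a Jaccard similarity of the two ip sets in which each ip is weighted by its privacy degree) and multi-level time-period features (the overlap of active hours at a 1-hour and a 6-hour granularity). These feature definitions, bucket sizes and names are assumptions for illustration; the patent only names the feature groups.

```python
from typing import Dict, List, Set


def weighted_jaccard(a: Set[str], b: Set[str], weight: Dict[str, float]) -> float:
    """Jaccard over ip sets where each ip counts with its privacy degree as weight."""
    denom = sum(weight.get(ip, 0.0) for ip in a | b)
    if denom == 0:
        return 0.0
    return sum(weight.get(ip, 0.0) for ip in a & b) / denom


def period_overlap(hours_a: Set[int], hours_b: Set[int], bucket: int) -> float:
    """Jaccard of active periods after grouping hours 0-23 into buckets of `bucket` hours."""
    pa = {h // bucket for h in hours_a}
    pb = {h // bucket for h in hours_b}
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def similarity_vector(ips_a: Set[str], ips_b: Set[str],
                      hours_a: Set[int], hours_b: Set[int],
                      privacy: Dict[str, float]) -> List[float]:
    """Concatenate ip and multi-level time-period features for one candidate pair."""
    return [
        weighted_jaccard(ips_a, ips_b, privacy),
        period_overlap(hours_a, hours_b, 1),  # hour level
        period_overlap(hours_a, hours_b, 6),  # part-of-day level (4 periods)
    ]


if __name__ == "__main__":
    privacy = {"ip1": 0.9, "ip2": 0.5, "ip3": 0.1}
    print(similarity_vector({"ip1", "ip2"}, {"ip1", "ip3"}, {8, 20}, {9, 21}, privacy))
```

The coarser time level lets two devices that are active in the same part of the day still score as similar even when their exact active hours differ by one. In the method, vectors of this kind, extended with the device and media features, would be the input rows for the XGBoost model.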
Example three:
Referring to Fig. 5, this embodiment discloses a computer device. The computer device may comprise a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 82 may include mass storage for data or instructions. By way of example and not limitation, the memory 82 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 82 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, the memory 82 is a non-volatile memory. In particular embodiments, the memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. The RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data output DRAM (EDODRAM), synchronous DRAM (SDRAM), and the like.
The memory 82 may be used to store or cache various data files to be processed and/or communicated, as well as the computer program instructions executed by the processor 81.
The processor 81 implements any of the inter-device user identification methods in the above embodiments by reading and executing computer program instructions stored in the memory 82.
In some embodiments, the computer device may further include a communication interface 83 and a bus 80. As shown in Fig. 5, the processor 81, the memory 82 and the communication interface 83 are connected through the bus 80 and communicate with one another over it.
The communication interface 83 implements communication between the modules, apparatuses, units and/or devices in the embodiments of the present application. The communication interface 83 may also carry data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 80 includes hardware, software, or both, and couples the components of the computer device to one another. The bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, the bus 80 may include one or more buses. Although specific buses are described and shown in the embodiments of the present application, the present application contemplates any suitable bus or interconnect.
The computer device can execute the inter-device user identification method, thereby implementing the methods described in connection with Figs. 1 to 3.
In addition, in combination with the inter-device user identification method of the foregoing embodiments, an embodiment of the present application may provide a computer-readable storage medium for its implementation. The computer-readable storage medium stores computer program instructions which, when executed by a processor, implement any of the inter-device user identification methods of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it involves no contradiction.
In summary, the inter-device user identification method provided by the invention has the following beneficial effects. It is based on ip rather than on login accounts, which relaxes the restrictions on where cross-screen technology can be applied, covers as many users of the internet ecosystem as possible, and enlarges the pool of potential customers. It generates the linkable candidate set based on the ip privacy degree, which prevents unstable ips from degrading accuracy and improves the efficiency of cross-screen identification. It uses device features, ip features, multi-level time-period features and media behavior features, which makes the identification results more explainable. Finally, it identifies users not only across different types of devices but also across devices of the same type (for example cookie-to-cookie and device-to-device), which overcomes the low recall of previous methods.
The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An inter-device user identification method, comprising:
constructing a training set and a test set: constructing a training set from the matching pairs of device id and cookie of known users, and constructing a test set from a log record set;
calculating the ip privacy degree: obtaining the ip privacy degree from the training set and the test set by an iterative method based on semi-supervised learning;
generating a candidate set: obtaining a candidate set from the training set, the test set and the ip privacy degree;
obtaining a similarity vector: calculating the attribute similarities of the candidate set and forming a multi-dimensional similarity vector;
obtaining candidate pair similarity: training an XGBoost model with the training set, and obtaining, through the trained XGBoost model and according to the similarity vectors, candidate pairs and their similarities; and
clustering nodes: creating a similarity graph from the similarities of the candidate pairs and clustering the nodes of the similarity graph with a graph clustering algorithm, wherein the nodes in one cluster belong to the same user.
2. The inter-device user identification method of claim 1, wherein the step of calculating the ip privacy degree comprises:
initializing a linking result set: initializing the linking result set with the data of the training set;
constructing an ip inverted index: constructing an inverted index of ips in the prediction set;
calculating the ip privacy degree: calculating the ip privacy degree according to the formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, wherein $IP_i$ is an entry of the inverted index and $\mathrm{sum\_max}(IP_i, m)$ is the sum of the $m$ largest [term shown only as an image in the original] corresponding to $IP_i$;
calculating the ip set similarity: integrating the user, device and cookie information and then calculating the ip set similarity; and
updating the ip privacy degree: updating the ip privacy degree according to the ip set similarity.
3. The inter-device user identification method of claim 1, wherein the step of generating a candidate set comprises:
integrating id and cookie information: integrating the user information in the training set with the device id and cookie information in the test set to obtain integrated information; and
generating the candidate set: constructing the candidate set after making a judgment on the integrated information.
4. The inter-device user identification method of claim 1, wherein the step of obtaining a similarity vector comprises: calculating the attribute similarities of the candidate set according to device features, ip features, multi-level time periods and media behavior features, and forming a multi-dimensional similarity vector.
5. An inter-device user identification system, comprising:
a training set and test set construction module, which constructs a training set from the matching pairs of device id and cookie of known users and constructs a test set from the log record set;
an ip privacy calculation module, which obtains the ip privacy degree from the training set and the test set by an iterative method based on semi-supervised learning;
a candidate set generation module, which obtains a candidate set from the training set, the test set and the ip privacy degree;
a similarity vector obtaining module, which calculates the attribute similarities of the candidate set and forms a multi-dimensional similarity vector;
a candidate pair similarity obtaining module, which trains an XGBoost model with the training set and uses the trained XGBoost model to obtain, from the similarity vectors, candidate pairs and their similarities; and
a node clustering module, which creates a similarity graph from the similarities of the candidate pairs and clusters its nodes with a graph clustering algorithm, so that the nodes in one cluster belong to the same user.
6. The inter-device user identification system of claim 5, wherein the ip privacy calculation module comprises:
a linking result set initialization unit, which initializes the linking result set with the data of the training set;
an ip inverted index construction unit, which constructs an inverted index of ips in the prediction set;
an ip privacy degree calculation unit, which calculates the ip privacy degree according to the formula $\mathrm{pri}(IP_i) = \mathrm{sum\_max}(IP_i, m) / \sum c_i$, wherein $IP_i$ is an entry of the inverted index and $\mathrm{sum\_max}(IP_i, m)$ is the sum of the $m$ largest [term shown only as an image in the original] corresponding to $IP_i$;
an ip set similarity calculation unit, which integrates the user, device and cookie information and then calculates the ip set similarity; and
an ip privacy degree updating unit, which updates the ip privacy degree according to the ip set similarity.
7. The inter-device user identification system of claim 5, wherein the candidate set generation module comprises:
an id and cookie information integration unit, which integrates the user information in the training set with the device id and cookie information in the test set to obtain integrated information; and
a candidate set generation unit, which makes a judgment on the integrated information and then constructs the candidate set.
8. The inter-device user identification system of claim 5, wherein the similarity vector obtaining module calculates the attribute similarities of the candidate set according to device features, ip features, multi-level time periods and media behavior features, and forms a multi-dimensional similarity vector.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the inter-device user identification method according to any one of claims 1 to 4 when executing the computer program.
10. A storage medium on which a computer program is stored, which program, when executed by a processor, implements the inter-device user identification method according to any one of claims 1 to 4.
CN202011517605.XA 2020-12-21 2020-12-21 Method, system, computer device and storage medium for identifying user between devices Pending CN112559872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011517605.XA CN112559872A (en) 2020-12-21 2020-12-21 Method, system, computer device and storage medium for identifying user between devices

Publications (1)

Publication Number Publication Date
CN112559872A true CN112559872A (en) 2021-03-26

Family

ID=75031207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011517605.XA Pending CN112559872A (en) 2020-12-21 2020-12-21 Method, system, computer device and storage medium for identifying user between devices

Country Status (1)

Country Link
CN (1) CN112559872A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224606A (en) * 2015-09-02 2016-01-06 新浪网技术(中国)有限公司 A kind of disposal route of user ID and device
CN108924246A (en) * 2018-07-25 2018-11-30 东北大学 It is a kind of support user's private ip find across screen method for tracing
CN111080349A (en) * 2019-12-04 2020-04-28 北京悠易网际科技发展有限公司 Method, apparatus, server and medium for identifying multiple devices of same user
CN111090807A (en) * 2019-12-16 2020-05-01 秒针信息技术有限公司 Knowledge graph-based user identification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129054A (en) * 2021-03-30 2021-07-16 广州博冠信息科技有限公司 User identification method and device
CN113129054B (en) * 2021-03-30 2024-05-31 广州博冠信息科技有限公司 User identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination