CN110533085B - Same-person identification method and device, storage medium and computer equipment - Google Patents

Same-person identification method and device, storage medium and computer equipment

Info

Publication number
CN110533085B
Authority
CN
China
Prior art keywords
user
sample
identified
same
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910740557.1A
Other languages
Chinese (zh)
Other versions
CN110533085A (en)
Inventor
刘逸哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dazhu Hangzhou Technology Co ltd
Original Assignee
Dazhu Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dazhu Hangzhou Technology Co ltd
Priority to CN201910740557.1A
Publication of CN110533085A
Application granted
Publication of CN110533085B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a same-person identification method and device, a storage medium and computer equipment. The method comprises the following steps: clustering sample users based on characteristic information of the sample users to obtain at least one sample user cluster; extracting at least one group of training sample users from each sample user cluster, and acquiring same-person marking information of the training sample users; training a same-person recognition model by using the training sample users and the corresponding same-person marking information; and performing same-person recognition on a user to be recognized according to the trained same-person recognition model. By clustering the sample users, the application reduces the amount of training required for the same-person recognition model, thereby optimizing training and improving training efficiency.

Description

Same-person identification method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of data analysis technologies, and in particular, to a same-person identification method and apparatus, a storage medium, and a computer device.
Background
The Internet has developed rapidly and has given rise to many e-commerce and online financial service companies. Because e-commerce companies offer various new-user subsidies and financial service companies lend directly to users, many users change their mobile phone numbers, re-register, or use similar means to obtain these benefits. Determining whether registered website users or service recipients are in fact the same person has therefore become key to reducing the operating costs and risks of e-commerce and Internet financial service companies.
In the field of same-person recognition, the construction of training samples is very important for training a same-person recognition model. How to quickly determine, from a large number of sample users, which pairs of users are the same person, and then construct a training sample set, is an important problem in the field.
Disclosure of Invention
In view of this, the present application provides a same-person identification method and apparatus, a storage medium, and a computer device, which reduce the amount of training required for the same-person recognition model by clustering sample users, thereby optimizing training and improving training efficiency.
According to an aspect of the present application, there is provided a same-person identification method, including:
clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster;
respectively extracting at least one group of training sample users from each sample user cluster, and acquiring the same-person marking information of the training sample users;
training a same-person recognition model by using the training sample user and the corresponding same-person marking information;
and carrying out the same-person recognition on the user to be recognized according to the trained same-person recognition model.
Specifically, before clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster, the method further includes:
acquiring basic data of a sample user;
and counting the characteristic information of the sample user according to a preset characteristic category based on the basic data of the sample user.
Specifically, the clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster specifically includes:
determining the clustering quantity according to the quantity of the sample users, and generating initial clustering centers of the corresponding clustering quantity;
and performing K-means clustering on the characteristic information of the sample users according to the initial clustering centers to obtain the sample user clusters with corresponding clustering quantity and the clustering centers corresponding to the sample user clusters.
Specifically, any group of training sample users includes the sample user corresponding to the clustering center in any sample user cluster and any other sample user in the same sample cluster, and the same-person labeling information includes the same-person label or non-same-person label.
Specifically, the identifying the same person for the user to be identified according to the trained same person identification model specifically includes:
according to the basic data of the user to be identified, counting the characteristic information of the user to be identified according to the preset characteristic category;
clustering the users to be identified based on the characteristic information of the users to be identified to obtain at least one user cluster to be identified and a clustering center corresponding to the user cluster to be identified;
acquiring a central user and a comparison user corresponding to any user cluster to be identified, wherein the central user is the user to be identified corresponding to the clustering center of the user cluster to be identified, and the comparison user is all the users to be identified except the clustering center in the user cluster to be identified;
and inputting the characteristic information corresponding to the central user and the characteristic information corresponding to any comparison user in any user cluster to be identified into the trained same-person identification model to obtain a result of whether the central user and any comparison user are the same user.
Specifically, the method further comprises:
if the central user and any comparison user are the same user, establishing a same-person set corresponding to the central user according to any comparison user.
Specifically, the basic data includes, but is not limited to, at least one of, or a combination of, communication data, operator service data, and e-commerce operation data of the sample user.
According to another aspect of the present application, there is provided a same person identification apparatus including:
the sample clustering module is used for clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster;
the training sample acquisition module is used for respectively extracting at least one group of training sample users from each sample user cluster and acquiring the same-person marking information of the training sample users;
the training module is used for training the same-person recognition model by utilizing the training sample user and the corresponding same-person marking information;
and the recognition module is used for carrying out the same-person recognition on the user to be recognized according to the trained same-person recognition model.
Specifically, the apparatus further comprises:
the basic data acquisition module is used for acquiring basic data of the sample users before the sample users are clustered based on the characteristic information of the sample users to obtain at least one sample user cluster;
and the characteristic information counting module is used for counting the characteristic information of the sample user according to a preset characteristic category based on the basic data of the sample user.
Specifically, the sample clustering module specifically includes:
the cluster center generating unit is used for determining the cluster number according to the number of the sample users and generating initial cluster centers of corresponding cluster numbers;
and the clustering unit is used for carrying out K-means clustering on the characteristic information of the sample users according to the initial clustering center to obtain the sample user clusters with corresponding clustering quantity and the clustering centers corresponding to each sample user cluster.
Specifically, any group of training sample users includes the sample user corresponding to the clustering center in any sample user cluster and any other sample user in the same sample cluster, and the same-person labeling information includes the same-person label or non-same-person label.
Specifically, the identification module specifically includes:
the characteristic information counting unit is used for counting the characteristic information of the user to be identified according to the basic data of the user to be identified and the preset characteristic category;
the clustering unit is used for clustering the users to be identified based on the characteristic information of the users to be identified to obtain at least one user cluster to be identified and a clustering center corresponding to the user cluster to be identified;
an identified user obtaining unit, configured to obtain a central user and a comparison user corresponding to any one of the to-be-identified user clusters, where the central user is the to-be-identified user corresponding to a clustering center of the to-be-identified user cluster, and the comparison user is all the to-be-identified users in the to-be-identified user cluster except the clustering center;
and the identification unit is used for inputting the characteristic information corresponding to the central user and the characteristic information corresponding to any comparison user in any user cluster to be identified into the trained same-person identification model to obtain a result of whether the central user and any comparison user are the same user.
Specifically, the apparatus further comprises:
and the result output module is used for establishing a same-person set corresponding to the central user according to any comparison user if the central user and any comparison user are the same user.
Specifically, the basic data includes, but is not limited to, at least one of, or a combination of, communication data, operator service data, and e-commerce operation data of the sample user.
According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described same-person identification method.
According to yet another aspect of the present application, there is provided a computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above-described same-person identification method when executing the program.
By means of the technical scheme, the same-person identification method and device, the storage medium and the computer equipment provided by the application cluster the sample users by utilizing the characteristic information of the sample users, place the sample users with high possibility of belonging to the same person into the same cluster, select training sample users in each cluster, mark the same-person information of the training sample users, train the same-person identification model by utilizing the characteristic information corresponding to the training sample users and the same-person information mark, and finally identify the same person of the users to be identified by utilizing the same-person identification model. According to the method and the device, through clustering of sample users, the training amount of the same-person recognition model is reduced, training optimization is achieved, and training efficiency is improved.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features, and advantages of the present application are more readily apparent, specific embodiments of the present application are described below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart of a same-person identification method provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of another same-person identification method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a same-person identification device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another same-person identification device provided by an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
This embodiment provides a same-person identification method. As shown in FIG. 1, the method includes:
Step 101, clustering sample users based on characteristic information of the sample users to obtain at least one sample user cluster.
In the embodiment of the present application, the same-person recognition model must be trained before same-person recognition is performed. Conventionally, the samples required for training, that is, samples belonging to the same user, are selected from the sample data by manual screening. However, selecting the sample data that belongs to the same user from a large amount of sample data involves a very heavy workload. In the embodiment of the present application, the sample users are therefore clustered first, so that sample users who may share the same identity are placed in one cluster, which greatly reduces the labeling workload.
It should be noted that, in this embodiment, the sample users are clustered based on their characteristic information, which may include communication data, operator service data, e-commerce operation data, and the like. The communication data may include incoming telephone numbers, outgoing telephone numbers, and the like. The operator service data may include short messages sent by various operating organizations, for example credit card transaction information sent by banks. The e-commerce operation data may include the sample user's purchase, collection, and flow data on an e-commerce platform.
Step 102, extracting at least one group of training sample users from each sample user cluster, and acquiring the same-person marking information of the training sample users.
After the sample users are divided into one or more sample user clusters, that is, after sample users who are likely to be the same person are placed in one cluster, the sample users in each cluster are divided into multiple groups of training sample users, each group consisting of two sample users. Once every sample user cluster has been divided into its groups of training sample users, each group is examined, by manual labeling or other means, to determine whether its two users are the same user, and is labeled accordingly.
Step 103, training the same-person recognition model by using the training sample users and the corresponding same-person marking information.
After the training sample users are labeled, the same-person recognition model can be trained on the characteristic information of the training samples and the corresponding same-person labeling information; the same-person recognition model is a binary classification model. The embodiment of the present application does not limit the same-person recognition model or the training method. For example, a basic logistic regression model may be adopted, a loss function set, and the model parameters trained by stochastic gradient descent to obtain the final same-person recognition model. The trained same-person recognition model is then used to perform same-person recognition on the users to be recognized.
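As an illustration of the logistic regression example mentioned above, the following is a minimal sketch rather than the patented implementation. It assumes that each pair of users is represented by the element-wise absolute difference of their feature vectors; that pair representation, the learning rate, the epoch count, the function names, and the toy data are all hypothetical choices not specified in the source.

```python
import math
import random

def pair_features(a, b):
    """Assumed pair representation: element-wise absolute difference."""
    return [abs(x - y) for x, y in zip(a, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_sgd(samples, labels, lr=0.5, epochs=200, seed=0):
    """Fit a logistic regression with log loss, one sample per update (SGD)."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(w * x for w, x in zip(weights, samples[i])) + bias
            error = sigmoid(z) - labels[i]  # gradient of log loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, samples[i])]
            bias -= lr * error
    return weights, bias

def predict_same_person(weights, bias, a, b):
    z = sum(w * x for w, x in zip(weights, pair_features(a, b))) + bias
    return sigmoid(z) >= 0.5

# Toy labeled pairs (hypothetical): label 1 = same person, 0 = not the same person.
labeled_pairs = [
    ([1, 1, 0, 1, 0], [1, 1, 0, 1, 0], 1),
    ([1, 1, 0, 1, 0], [1, 1, 0, 0, 0], 1),
    ([1, 1, 0, 1, 0], [0, 0, 1, 0, 1], 0),
    ([1, 0, 0, 0, 0], [0, 1, 1, 1, 0], 0),
]
samples = [pair_features(a, b) for a, b, _ in labeled_pairs]
labels = [label for _, _, label in labeled_pairs]
weights, bias = train_logistic_sgd(samples, labels)
```

Any other binary classifier could be substituted, since the embodiment does not limit the model or the training method.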
Step 104, performing same-person recognition on the user to be recognized according to the trained same-person recognition model.
When the same-person recognition model is used to perform same-person identification on the users to be identified, the users to be identified may first be clustered into several clusters using the same clustering method as for the sample users. The characteristic information of each pair of users within the same cluster is then input into the same-person recognition model. This avoids pairing up all of the users to be identified, which would waste identification time, and thus improves identification efficiency.
Of course, to improve identification accuracy, all of the users to be identified may instead be combined in pairs and input into the same-person recognition model.
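The efficiency trade-off described above can be made concrete by counting model evaluations. The sketch below is illustrative only; the figure of 1000 users split evenly into 10 clusters is a hypothetical example. It compares pairing all users, pairing users only within each cluster, and pairing only the cluster center with the other members of its cluster (the variant used in the embodiment of FIG. 2).

```python
from math import comb

def all_pairs(n_users):
    """Number of pairs when every two users are compared."""
    return comb(n_users, 2)

def within_cluster_pairs(cluster_sizes):
    """Number of pairs when users are only compared inside their own cluster."""
    return sum(comb(size, 2) for size in cluster_sizes)

def center_pairs(cluster_sizes):
    """Number of pairs when only the cluster center is compared with each other member."""
    return sum(max(size - 1, 0) for size in cluster_sizes)

sizes = [100] * 10  # hypothetical: 1000 users in 10 equal clusters
print(all_pairs(sum(sizes)))        # 499500
print(within_cluster_pairs(sizes))  # 49500
print(center_pairs(sizes))          # 990
```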
By applying the technical solution of this embodiment, the sample users are first clustered using their characteristic information, so that sample users who are likely to be the same person are placed in the same cluster. Training sample users are then selected from each cluster and labeled with same-person information, the same-person recognition model is trained using the characteristic information of the training sample users together with the same-person labels, and finally the same-person recognition model is used to perform same-person identification on the users to be identified. By clustering the sample users, the application reduces the amount of training required for the same-person recognition model, thereby optimizing training and improving training efficiency.
Further, as a refinement and extension of the above embodiment, and to fully illustrate its specific implementation, another same-person identification method is provided. As shown in FIG. 2, the method includes:
Step 201, acquiring basic data of a sample user.
The basic data includes, but is not limited to, at least one of, or a combination of, communication data, operator service data, and e-commerce operation data of the sample user.
Step 202, counting the characteristic information of the sample user according to a preset characteristic category based on the basic data of the sample user.
The basic data of the sample user is summarized into the characteristic information of the sample user according to preset characteristic categories. For example, suppose the preset characteristic categories include incoming telephone numbers, the incoming telephone numbers in the basic data of the sample user are A, B, and D, and the database of all telephone numbers contains the five numbers A, B, C, D, and E. The incoming-telephone-number characteristic information of the sample user is then extracted as (1, 1, 0, 1, 0). Examples for other characteristic information are omitted.
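The incoming-telephone-number example above amounts to a multi-hot encoding over the full number database. A minimal sketch follows; the function and variable names are hypothetical.

```python
def multi_hot(observed, vocabulary):
    """Return a 0/1 vector marking which vocabulary entries were observed."""
    seen = set(observed)
    return [1 if item in seen else 0 for item in vocabulary]

all_numbers = ["A", "B", "C", "D", "E"]  # the database of all telephone numbers
incoming = ["A", "B", "D"]               # incoming numbers of one sample user
vector = multi_hot(incoming, all_numbers)
print(vector)  # [1, 1, 0, 1, 0]
```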
Step 203, determining the clustering number according to the number of the sample users, and generating an initial clustering center of the corresponding clustering number.
Step 204, performing K-means clustering on the characteristic information of the sample users according to the initial clustering centers to obtain sample user clusters of the corresponding clustering number and a clustering center corresponding to each sample user cluster.
In steps 203 and 204, after the characteristic information of the sample users is extracted, cluster analysis is performed on it to divide the sample users into sample user clusters. K-means clustering may be used for this purpose, although other clustering methods may also be adopted; the embodiment of the present application is explained using K-means. First, the number of clusters K is determined according to the number of sample users, for example one cluster for every 100 sample users. Then K initial clustering centers are generated, either randomly or in some other agreed manner. Finally, starting from the K initial clustering centers, the distance between each sample and each clustering center is calculated and each sample is assigned to its nearest clustering center, yielding K sample user clusters and K clustering centers.
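Steps 203 and 204 can be sketched as follows. This is a plain K-means loop written for illustration, not the patented implementation. Rounding the cluster count up with `ceil`, drawing the initial centers from the samples themselves, using squared Euclidean distance, and fixing the iteration count are all assumptions; the source only says that K follows from the number of sample users (for example one cluster per 100 users) and that the initial centers may be random.

```python
import math
import random

def choose_k(n_users, users_per_cluster=100):
    """Assumed rule: one cluster per 100 users, rounded up, at least one cluster."""
    return max(1, math.ceil(n_users / users_per_cluster))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=20, seed=0):
    """Return (assignment, centers) after a fixed number of assign/update rounds."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each sample goes to its nearest center.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2)
```

On this toy data the two well-separated groups end up in different clusters; on real feature vectors the result depends on the initial centers, as with any K-means run.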
Step 205, at least one group of training sample users are extracted from each sample user cluster, and the same-person labeling information of the training sample users is obtained.
Specifically, any group of training sample users includes sample users corresponding to a clustering center in any sample user cluster and any other sample users in the same sample cluster, and the same-person labeling information includes same-person labeling or non-same-person labeling.
In the above embodiment, the training sample users are extracted in a way that reduces the number of training samples. Because clustering assigns samples according to their distance from the clustering center, the clustering-center sample of each cluster is likely to be the same person as the other samples in that cluster. The clustering-center sample of each cluster can therefore be paired with each of the other samples to obtain the training samples, and each group of training sample users is then labeled with same-person information, by manual labeling or other means, to indicate whether its two users are the same person.
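The pairing of the clustering-center sample with every other sample in its cluster can be sketched as below. The source does not say how the "sample user corresponding to the clustering center" is chosen; this sketch assumes it is the member closest to the centroid, and the user identifiers and data are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def center_user(members, centroid):
    """Assumed rule: the center user is the member nearest to the centroid."""
    return min(members, key=lambda uid: squared_distance(members[uid], centroid))

def training_pairs(members, centroid):
    """Pair the center user with every other user in the same cluster."""
    center = center_user(members, centroid)
    return [(center, uid) for uid in members if uid != center]

cluster = {"u1": [0.0, 0.0], "u2": [1.0, 0.0], "u3": [0.0, 3.0]}
centroid = [1 / 3, 1.0]  # mean of the three feature vectors
pairs = training_pairs(cluster, centroid)
print(pairs)  # [('u1', 'u2'), ('u1', 'u3')]
```

A cluster of n users thus yields n - 1 pairs to label instead of n(n - 1)/2.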
Step 206, training the same-person recognition model by using the training sample users and the corresponding same-person marking information.
For the method of training the same-person recognition model, refer to the description of step 103; it is not repeated here.
Step 207, counting the characteristic information of the user to be identified according to the preset characteristic category based on the basic data of the user to be identified.
Step 208, clustering the users to be identified based on the characteristic information of the users to be identified to obtain at least one user cluster to be identified and a clustering center corresponding to the user cluster to be identified.
Step 209, a central user and a comparison user corresponding to any user cluster to be identified are obtained, wherein the central user is the user to be identified corresponding to the clustering center of the user cluster to be identified, and the comparison user is all the users to be identified except the clustering center in the user cluster to be identified.
Step 210, inputting the feature information corresponding to the central user in any user cluster to be identified and the feature information corresponding to any comparison user into the trained same-person identification model, and obtaining a result of whether the central user and any comparison user are the same user.
In steps 207 to 210, to improve identification efficiency when performing same-person identification on the users to be identified, the users to be identified may be clustered to obtain user clusters to be identified, and the same-person recognition model trained in step 206 is then used to determine whether the cluster-center user and each of the other users in each user cluster to be identified are the same user. The clustering method may be the same as that used for the sample users. Because clustering assigns users according to their distance from the clustering center, the cluster-center user of each cluster is likely to be the same person as the other users in that cluster. During identification, therefore, the characteristic information of the central user corresponding to the clustering center and of each other user in the same cluster is input into the same-person recognition model.
Step 211, if the central user and any comparison user are the same user, establishing a same-person set corresponding to the central user according to that comparison user.
Once it has been determined, for each comparison user in any user cluster to be identified, whether that user is the same user as the central user, a same-person set is established from the comparison users found to be the same user as the central user; the users in the set and the central user are the same person.
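Steps 209 to 211 can be sketched as a loop over the comparison users of one cluster. In the sketch below, `placeholder_model` is a hypothetical stand-in for the trained same-person recognition model (it simply treats two users as the same person when their feature vectors differ in at most one position); the user identifiers and feature vectors are likewise illustrative.

```python
def hamming(a, b):
    """Number of positions at which two feature vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def placeholder_model(center_features, other_features):
    # Hypothetical stand-in for the trained same-person recognition model.
    return hamming(center_features, other_features) <= 1

def same_person_set(center_id, cluster_features, is_same_person):
    """Collect the comparison users that the model judges to be the central user."""
    center = cluster_features[center_id]
    return {
        uid
        for uid, features in cluster_features.items()
        if uid != center_id and is_same_person(center, features)
    }

cluster_features = {
    "center": [1, 1, 0, 1, 0],
    "u2": [1, 1, 0, 1, 1],
    "u3": [0, 0, 1, 0, 1],
}
result = same_person_set("center", cluster_features, placeholder_model)
print(result)  # {'u2'}
```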
By applying the technical solution of this embodiment: first, user characteristic information is summarized by fusing multiple data sources such as user communication relations, operator service data, and e-commerce operation data, so that the same-person judgment is more accurate than in the prior art, where whether users are the same person is judged by simple rule matching; second, cluster analysis of the users serves as a pre-processing operation for the same-person recognition model, which reduces the data volume and optimizes computational performance; third, the same-person recognition model serves as a post-processing operation after cluster analysis, which improves the discrimination accuracy of the same-person recognition model.
Further, as a specific implementation of the method in FIG. 1, an embodiment of the present application provides a same-person identification device. As shown in FIG. 3, the device includes: a sample clustering module 31, a training sample obtaining module 32, a training module 33 and an identification module 34.
The sample clustering module 31 is configured to cluster sample users based on feature information of the sample users to obtain at least one sample user cluster;
a training sample obtaining module 32, configured to extract at least one group of training sample users from each sample user cluster, and obtain the same-person labeling information of the training sample users;
the training module 33 is used for training the same-person recognition model by using the training sample users and the corresponding same-person marking information;
and the recognition module 34 is configured to perform the same-person recognition on the user to be recognized according to the trained same-person recognition model.
In a specific application scenario, the apparatus further includes: a basic data acquisition module 35 and a characteristic information counting module 36.
The basic data acquisition module 35 is configured to acquire basic data of the sample users before the sample users are clustered based on the feature information of the sample users to obtain at least one sample user cluster;
and the characteristic information counting module 36 is configured to count the characteristic information of the sample user according to a preset characteristic category based on the basic data of the sample user.
In a specific application scenario, the sample clustering module 31 specifically includes a cluster center generating unit 311 and a clustering unit 312.
The cluster center generating unit 311 is configured to determine the number of clusters according to the number of sample users and to generate that number of initial cluster centers;
the clustering unit 312 is configured to perform K-means clustering on the feature information of the sample users, starting from the initial cluster centers, to obtain the corresponding number of sample user clusters and the cluster center of each sample user cluster.
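The two units can be illustrated with a plain K-means implementation. This is a sketch under stated assumptions: the rule that derives the cluster count from the user count (`users_per_cluster`) and the evenly spaced initial centers are hypothetical choices, since the patent specifies neither.

```python
import math

def cluster_count(num_users, users_per_cluster=3):
    """Hypothetical rule: about one cluster per `users_per_cluster` users."""
    return max(1, math.ceil(num_users / users_per_cluster))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100):
    """Plain K-means. Initial centers are k evenly spaced input points, a
    deterministic stand-in for whatever initialisation is actually used."""
    step = len(points) / k
    centers = [list(points[int(i * step)]) for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:    # converged: no point changed cluster
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                     # keep the old center if a cluster empties
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centers

points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
k = cluster_count(len(points))
assignment, centers = k_means(points, k)
# assignment == [0, 0, 0, 1, 1, 1]
```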
In a specific application scenario, each group of training sample users includes the sample user corresponding to the cluster center of a sample user cluster and any other sample user in the same sample user cluster, and the same-person labeling information is either a same-person label or a not-same-person label.
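Constructing the training groups can be sketched as follows. Because a K-means center is a mean rather than an actual user, this sketch assumes that the "sample user corresponding to the cluster center" is the user nearest to that center; `same_person_labels` is a hypothetical lookup of manually labeled pairs.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def center_user(cluster, center):
    """Pick the user whose feature vector is nearest to the cluster center
    (an assumption about how a user is mapped to the center)."""
    return min(cluster, key=lambda uid: squared_distance(cluster[uid], center))

def training_groups(clusters, centers, same_person_labels):
    """Return (center_user, other_user, label) triples for every cluster.
    Label 1 means same person, 0 means not the same person."""
    groups = []
    for cluster, center in zip(clusters, centers):
        hub = center_user(cluster, center)
        for uid in cluster:
            if uid != hub:
                label = 1 if frozenset((hub, uid)) in same_person_labels else 0
                groups.append((hub, uid, label))
    return groups

# Hypothetical clusters (user id -> feature vector) and their mean centers.
clusters = [
    {"u1": [0.0, 0.0], "u2": [0.2, 0.0], "u3": [1.0, 1.0]},
    {"u4": [9.0, 9.0], "u5": [9.0, 10.0], "u6": [9.0, 9.2]},
]
centers = [[0.4, 1 / 3], [9.0, 9.4]]
labeled_same = {frozenset(("u1", "u2")), frozenset(("u4", "u6"))}
groups = training_groups(clusters, centers, labeled_same)
# groups == [("u2", "u1", 1), ("u2", "u3", 0), ("u6", "u4", 1), ("u6", "u5", 0)]
```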
In a specific application scenario, the identification module 34 specifically includes a feature information statistics unit 341, a clustering unit 342, an identified user acquiring unit 343, and an identifying unit 344.
The feature information statistics unit 341 is configured to compile the feature information of the users to be identified according to the preset feature categories, based on the basic data of the users to be identified;
the clustering unit 342 is configured to cluster the users to be identified based on their feature information to obtain at least one to-be-identified user cluster and the cluster center of each to-be-identified user cluster;
the identified user acquiring unit 343 is configured to acquire the center user and the comparison users of any to-be-identified user cluster, where the center user is the user to be identified that corresponds to the cluster center of that cluster, and the comparison users are all the users to be identified in that cluster other than the center user;
the identifying unit 344 is configured to input the feature information of the center user of any to-be-identified user cluster and the feature information of any comparison user into the trained same-person recognition model, to obtain a result indicating whether the center user and that comparison user are the same user.
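The pairing logic of units 343 and 344 can be sketched as below. The `model` argument is a hypothetical callable standing in for the trained same-person recognition model, `toy_model` is a demonstration stand-in only, and choosing the user nearest to the center as the center user is an assumption.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify_cluster(cluster, center, model):
    """Compare the center user of one to-be-identified cluster with every
    comparison user. `model(features_a, features_b)` returns True for
    'same person' (a hypothetical interface)."""
    hub = min(cluster, key=lambda uid: squared_distance(cluster[uid], center))
    results = {
        uid: model(cluster[hub], feats)
        for uid, feats in cluster.items() if uid != hub
    }
    return hub, results

def toy_model(a, b, threshold=0.5):
    """Demonstration stand-in: 'same person' when the vectors are close."""
    return squared_distance(a, b) < threshold

cluster = {"a": [1.0, 1.0], "b": [1.1, 1.0], "c": [3.0, 3.0]}
center = [1.7, 5.0 / 3.0]   # mean of the three vectors, as K-means would return
hub, results = identify_cluster(cluster, center, toy_model)
# hub == "b", results == {"a": True, "c": False}
```

For a cluster of n users this makes n - 1 model calls rather than the n(n - 1)/2 calls needed to compare every pair.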
In a specific application scenario, the apparatus further includes a result output module 37.
The result output module 37 is configured to, if the center user and any comparison user are the same user, establish a same-person set corresponding to the center user based on that comparison user.
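A minimal sketch of the result output step, assuming the pairwise results are held in a mapping from each center user to its comparison results. The input layout, and the choice to include the center user in its own set, are assumptions for illustration.

```python
def same_person_sets(cluster_results):
    """Build one same-person set per center user.
    `cluster_results` maps center user -> {comparison user: bool}."""
    sets = {}
    for hub, results in cluster_results.items():
        matched = {uid for uid, is_same in results.items() if is_same}
        if matched:                     # create a set only when a match exists
            sets[hub] = {hub} | matched
    return sets

cluster_results = {
    "b": {"a": True, "c": False},
    "x": {"y": False},
}
person_sets = same_person_sets(cluster_results)
# person_sets == {"b": {"a", "b"}}
```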
Specifically, the basic data includes, but is not limited to, at least one of communication data, carrier service data, and e-commerce operation data of the sample user, or a combination thereof.
It should be noted that other corresponding descriptions of the functional units related to the same-person identification device provided in the embodiment of the present application may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not described again here.
Based on the methods shown in fig. 1 and fig. 2, correspondingly, the embodiment of the present application further provides a storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the method for identifying the same person as shown in fig. 1 and fig. 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions for enabling a computer device (such as a personal computer, a server, or a network device) to execute the methods of the implementation scenarios of the present application.
Based on the method shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3 and fig. 4, in order to achieve the above object, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, and the like, where the computer device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the same person identification method as described above with reference to fig. 1 and 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and so forth. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface), and the like.
Those skilled in the art will appreciate that the computer device structure provided in this embodiment does not limit the computer device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages and maintains the hardware and software resources of the computer device and supports the operation of the information processing program as well as other software and/or programs. The network communication module is used for communication among the components within the storage medium and with other hardware and software in the physical device.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. The sample users are first clustered using their feature information, so that sample users likely to be the same person fall into the same cluster. Training sample users are then selected from each cluster and, once their same-person information has been labeled, the corresponding feature information and same-person labels are used to train the same-person recognition model. Finally, the trained model is used to perform same-person identification on the users to be identified. By clustering the sample users, the present application reduces the amount of training required for the same-person recognition model, which optimizes training and improves training efficiency.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers used above are for description only and do not indicate the relative merits of the implementation scenarios. The above discloses only a few specific implementation scenarios of the present application; the present application is not limited thereto, and any variation that those skilled in the art can conceive is intended to fall within the scope of the present application.

Claims (9)

1. A same-person identification method, comprising:
clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster;
respectively extracting at least one group of training sample users from each sample user cluster, and acquiring the same-person marking information of the training sample users;
training a same-person recognition model by using the training sample user and the corresponding same-person marking information;
according to basic data of a user to be identified, counting characteristic information of the user to be identified according to a preset characteristic category; clustering the users to be identified based on the characteristic information of the users to be identified to obtain at least one user cluster to be identified and a clustering center corresponding to the user cluster to be identified; acquiring a central user and a comparison user corresponding to any user cluster to be identified, wherein the central user is the user to be identified corresponding to the clustering center of the user cluster to be identified, and the comparison user is all the users to be identified except the clustering center in the user cluster to be identified; and inputting the characteristic information corresponding to the central user and the characteristic information corresponding to any comparison user in any user cluster to be identified into the trained same-person identification model to obtain a result of whether the central user and any comparison user are the same user.
2. The method of claim 1, wherein before clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster, the method further comprises:
acquiring basic data of a sample user;
and counting the characteristic information of the sample user according to a preset characteristic category based on the basic data of the sample user.
3. The method according to claim 2, wherein the clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster specifically comprises:
determining the clustering quantity according to the quantity of the sample users, and generating initial clustering centers of the corresponding clustering quantity;
and performing K-means clustering on the characteristic information of the sample users according to the initial clustering centers to obtain the sample user clusters with corresponding clustering quantity and the clustering centers corresponding to the sample user clusters.
4. The method according to claim 3, wherein any group of the training sample users includes the sample user corresponding to the clustering center of any sample user cluster and any other sample user in the same sample user cluster, and the same-person marking information includes a same-person mark or a not-same-person mark.
5. The method of claim 1, further comprising:
if the central user and any comparison user are the same user, establishing a same-person set corresponding to the central user according to any comparison user.
6. The method according to any one of claims 2 to 5, wherein the basic data includes, but is not limited to, at least one of communication data, carrier service data, and e-commerce operation data of the sample user, or a combination thereof.
7. A same-person identification device, comprising:
the sample clustering module is used for clustering the sample users based on the characteristic information of the sample users to obtain at least one sample user cluster;
the training sample acquisition module is used for respectively extracting at least one group of training sample users from each sample user cluster and acquiring the same-person marking information of the training sample users;
the training module is used for training the same-person recognition model by utilizing the training sample user and the corresponding same-person marking information;
the identification module is used for counting the characteristic information of the user to be identified according to the basic data of the user to be identified and the preset characteristic category; clustering the users to be identified based on the characteristic information of the users to be identified to obtain at least one user cluster to be identified and a clustering center corresponding to the user cluster to be identified; acquiring a central user and a comparison user corresponding to any user cluster to be identified, wherein the central user is the user to be identified corresponding to the clustering center of the user cluster to be identified, and the comparison user is all the users to be identified except the clustering center in the user cluster to be identified; and inputting the characteristic information corresponding to the central user and the characteristic information corresponding to any comparison user in any user cluster to be identified into the trained same-person identification model to obtain a result of whether the central user and any comparison user are the same user.
8. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the same-person identification method of any one of claims 1 to 6.
9. A computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the same-person identification method of any one of claims 1 to 6 when executing the program.
CN201910740557.1A 2019-08-12 2019-08-12 Same-person identification method and device, storage medium and computer equipment Active CN110533085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910740557.1A CN110533085B (en) 2019-08-12 2019-08-12 Same-person identification method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910740557.1A CN110533085B (en) 2019-08-12 2019-08-12 Same-person identification method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN110533085A CN110533085A (en) 2019-12-03
CN110533085B true CN110533085B (en) 2022-04-01

Family

ID=68663021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910740557.1A Active CN110533085B (en) 2019-08-12 2019-08-12 Same-person identification method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN110533085B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159243B (en) * 2019-12-30 2023-08-04 ***通信集团江苏有限公司 User type identification method, device, equipment and storage medium
CN111625817B (en) * 2020-05-12 2023-05-02 咪咕文化科技有限公司 Abnormal user identification method, device, electronic equipment and storage medium
CN111598360A (en) * 2020-07-24 2020-08-28 北京淇瑀信息科技有限公司 Service policy determination method and device and electronic equipment
CN112085114A (en) * 2020-09-14 2020-12-15 杭州中奥科技有限公司 Online and offline identity matching method, device, equipment and storage medium
CN112148981A (en) * 2020-09-29 2020-12-29 广州小鹏自动驾驶科技有限公司 Method, device, equipment and storage medium for identifying same
CN112819106B (en) * 2021-04-16 2021-07-13 江西博微新技术有限公司 IFC component type identification method, device, storage medium and equipment
CN113139005A (en) * 2021-04-22 2021-07-20 康键信息技术(深圳)有限公司 Same-person identification method based on same-person identification model and related equipment
CN113361603B (en) * 2021-06-04 2024-05-10 北京百度网讯科技有限公司 Training method, category identification device, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355170A (en) * 2016-11-22 2017-01-25 Tcl集团股份有限公司 Photo classifying method and device
CN107358945A (en) * 2017-07-26 2017-11-17 谢兵 A kind of more people's conversation audio recognition methods and system based on machine learning
CN108229321A (en) * 2017-11-30 2018-06-29 北京市商汤科技开发有限公司 Human face recognition model and its training method and device, equipment, program and medium
CN109816043A (en) * 2019-02-02 2019-05-28 拉扎斯网络科技(上海)有限公司 Determination method, apparatus, electronic equipment and the storage medium of user's identification model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9025864B2 (en) * 2010-06-01 2015-05-05 Hewlett-Packard Development Company, L.P. Image clustering using a personal clothing model
KR102450374B1 (en) * 2016-11-17 2022-10-04 삼성전자주식회사 Method and device to train and recognize data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Centralized and Clustered Features for Person Re-Identification";Jian Lu 等;《 IEEE Signal Processing Letters》;20190630;第26卷(第6期);933-937 *
"视频中的人脸聚类***的设计与实现";胡易;《中国优秀博硕士学位论文全文数据库(硕士)-信息科技辑》;20180415;第2018年卷(第4期);I138-2619 *

Also Published As

Publication number Publication date
CN110533085A (en) 2019-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant