CN105721629A - User identifier matching method and device - Google Patents

User identifier matching method and device

Info

Publication number
CN105721629A
Authority
CN
China
Prior art keywords
matched
location
user
address
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610172168.XA
Other languages
Chinese (zh)
Other versions
CN105721629B (en)
Inventor
程允胜
吴海山
周景博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610172168.XA
Publication of CN105721629A
Application granted
Publication of CN105721629B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/69 Types of network addresses using geographic information, e.g. room number
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/52 Network services specially adapted for the location of the user terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a user identifier matching method and device. In one specific implementation, the method comprises: analyzing a pre-stored user operation information set to obtain at least one positioning area over which each Internet Protocol (IP) address recorded in the set is distributed, and the weight of each positioning area, wherein each piece of user operation information in the set includes a user identifier, an IP address and positioning point coordinates; obtaining the location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set, according to the positioning areas over which the IP addresses associated with the user identifiers are distributed and the weights of those positioning areas; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched. This implementation allows user identifiers to be matched accurately and reliably.

Description

User ID matching method and device
Technical field
The present application relates to the field of computer technology, specifically to the field of user portrait (user profiling) technology, and in particular to a user ID matching method and device.
Background
With the rapid development of the Internet, the need to accurately analyze the attributes of users, and the relationships between them, by means of user portrait data has become increasingly clear. A user portrait is a virtual representation of a real user: a target-user model built on a body of real data. Users are first understood through user research and divided into types according to differences in their goals, behaviors and viewpoints; typical characteristics are then extracted from each type and described with a name, a photo, some demographic elements, usage scenarios and so on, which together form the user portrait data. User portraits let an enterprise conveniently obtain broader user feedback over the Internet, and provide a sufficient data foundation for analyzing important business information, such as users' behavioral habits and consumption habits, accurately and quickly.
At present, many large Internet enterprises operate several product lines, each with its own user information. To extract more accurate user portrait data, the user IDs of the different product lines must be matched to determine whether they belong to the same user. Existing user ID matching methods usually match user IDs based solely on the IP (Internet Protocol) addresses associated with them, or based solely on the positioning information associated with them.
However, because each telecom operator allocates IP addresses differently, and allocation is usually random, schemes that match user IDs by IP address alone are of relatively low reliability. Meanwhile, because users often block unnecessary location requests when accessing Internet services, their positioning information is frequently incomplete, and it is difficult to match user IDs accurately from such partially missing positioning information.
Summary of the invention
The purpose of the present application is to propose an improved user ID matching method and device that solve the technical problems mentioned in the Background section above.
In a first aspect, the present application provides a user ID matching method, the method comprising: analyzing a pre-stored user operation information set to obtain at least one positioning area over which each Internet Protocol (IP) address recorded in the set is distributed, and the weight of each positioning area, wherein each piece of user operation information in the set includes a user ID, an IP address and positioning point coordinates; obtaining, according to the positioning areas over which the IP addresses associated with the user IDs are distributed and the weights of those positioning areas, the location information similarity between a user ID to be matched and each other user ID recorded in the user operation information set; and determining, according to the location information similarity, the other user IDs that match the user ID to be matched.
In some embodiments, analyzing the pre-stored user operation information set to obtain the positioning areas of each IP address and the weight of each positioning area comprises: obtaining the set of positioning point coordinates associated with each IP address recorded in the user operation information set; for each IP address, performing cluster analysis on its associated set of positioning point coordinates to obtain at least one cluster, which serves as a positioning area over which the IP address is distributed; and, for each IP address, determining the weight of each of its positioning areas.
In some embodiments, determining, for each IP address, the weight of each of its positioning areas comprises: deleting every IP address whose number of positioning areas exceeds a preset number threshold, or for which the mean distance between the positioning point coordinates in a positioning area and that area's center point coordinates exceeds a preset distance threshold; and, for each remaining IP address, determining the weight of each of its positioning areas.
In some embodiments, determining the weight of each positioning area of an IP address comprises: determining an initial weight for each positioning area according to the number and the spread of the positioning point coordinates in each positioning area of the IP address; taking the center point coordinates of the positioning areas of the IP addresses associated with a user ID as the center point coordinates corresponding to that user ID, and gridding the center point coordinates corresponding to the user IDs recorded in the user operation information set according to geographic layout, generating at least two grid cells; for each user ID recorded in the set, obtaining the sum of the initial weights of the positioning areas whose center points, corresponding to that user ID, fall in each grid cell, as the frequency of that grid cell for that user ID, and obtaining the sum of the initial weights of all positioning areas whose center points fall in each grid cell, as the total user frequency of that grid cell; and, based on these frequencies, computing the weight of each positioning area with the TF-IDF algorithm.
In some embodiments, the method further comprises computing the IP address similarity between the user ID to be matched and each other user ID; and determining the matching user IDs according to the location information similarity then comprises determining the other user IDs that match the user ID to be matched according to both the location information similarity and the IP address similarity between the user ID to be matched and each other user ID.
In some embodiments, determining the matching user IDs according to the location information similarity and the IP address similarity comprises: obtaining feature information for each pair formed by the user ID to be matched and another user ID, the feature information including the IP address similarity and the location information similarity between the two; feeding the feature information of each pair into a pre-trained ranking model to obtain the probability that the user ID to be matched matches each other user ID; and determining that the other user IDs whose probability exceeds a preset threshold match the user ID to be matched.
In some embodiments, the user operation information in the user operation information set further includes terminal type information and operating system information; and the feature information further includes at least one of the following: the number of identical IP addresses shared by the user ID to be matched and the other user ID, the number of coinciding center point coordinates corresponding to the two, and the terminal type information and operating system information associated with the user ID to be matched and with the other user ID.
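The feature information listed above can be assembled per candidate pair roughly as follows. This is a minimal Python sketch, not the patent's implementation: the source does not say how the IP address similarity is computed, so a Jaccard index over the two IP sets is assumed, and the names `UserProfile`, `ip_jaccard` and `pair_features` are hypothetical. Terminal type and operating system are encoded as same/different indicators, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical per-ID summary derived from the user operation information set.
    user_id: str
    ips: set[str]
    centers: set[tuple[float, float]]  # rounded center points of the ID's positioning areas
    terminal_type: str
    os_name: str


def ip_jaccard(a: set[str], b: set[str]) -> float:
    """Assumed IP address similarity: Jaccard index of the two IP sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(target: UserProfile, other: UserProfile, location_sim: float) -> list[float]:
    """Feature vector for one (ID to be matched, other ID) pair."""
    return [
        ip_jaccard(target.ips, other.ips),           # IP address similarity
        location_sim,                                # location information similarity
        float(len(target.ips & other.ips)),          # number of identical IP addresses
        float(len(target.centers & other.centers)),  # number of coinciding center points
        1.0 if target.terminal_type == other.terminal_type else 0.0,
        1.0 if target.os_name == other.os_name else 0.0,
    ]
```

The location information similarity is passed in rather than recomputed, since it comes from the weighted positioning areas described elsewhere in this document.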
In some embodiments, the user IDs recorded in the user operation information set include first user IDs and second user IDs, and the user ID to be matched and each other user ID belong to the first user IDs and the second user IDs, respectively.
In some embodiments, after the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set is obtained, the method further comprises: selecting, in descending order of location information similarity to the user ID to be matched, a predetermined number of second user IDs from the second user IDs recorded in the set, to obtain a candidate second user ID set; and determining the matching user IDs according to the location information similarity comprises: determining the second user ID that matches the first user ID to be matched according to the location information similarity between the user ID to be matched and each second user ID in the candidate second user ID set.
In some embodiments, before the second user ID matching the first user ID to be matched is determined from the location information similarity between the user ID to be matched and each second user ID in the candidate second user ID set, the method further comprises: for each second user ID in the candidate set, obtaining the location information similarity between that second user ID and each first user ID; selecting, in descending order of location information similarity to that second user ID, a predetermined number of first user IDs to obtain a candidate first user ID set; and, if the user ID to be matched is not in the candidate first user ID set, deleting that second user ID from the candidate second user ID set.
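The reverse check above acts as a mutual top-k filter: a candidate second user ID survives only if the ID to be matched is also among that second ID's own top-k first user IDs. The sketch below assumes the location information similarities are already available in a dictionary keyed by (first ID, second ID) pairs; `reverse_filter` and that data layout are hypothetical.

```python
import heapq


def reverse_filter(target: str, candidates: list[str], first_ids: list[str],
                   sim: dict[tuple[str, str], float], k: int) -> list[str]:
    """Keep candidate second IDs whose own top-k first IDs include `target`."""
    kept = []
    for s in candidates:
        # Top-k first user IDs by location information similarity to this second ID.
        reverse_top = heapq.nlargest(k, first_ids, key=lambda f: sim[(f, s)])
        if target in reverse_top:
            kept.append(s)
    return kept
```

A second ID that is very similar to many first IDs (for example, one tied to a busy shared location) tends to fail this check for most of them, which is the point of the filter.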
In a second aspect, the present application provides a user ID matching device, the device comprising: a positioning information acquisition unit, configured to analyze a pre-stored user operation information set to obtain at least one positioning area over which each IP address recorded in the set is distributed and the weight of each positioning area, wherein each piece of user operation information in the set includes a user ID, an IP address and positioning point coordinates; a location information similarity acquisition unit, configured to obtain, according to the positioning areas of the IP addresses associated with the user IDs and the weights of those positioning areas, the location information similarity between a user ID to be matched and each other user ID recorded in the user operation information set; and a matching unit, configured to determine, according to the location information similarity, the other user IDs that match the user ID to be matched.
In some embodiments, the positioning information acquisition unit comprises: a coordinate set acquisition subunit, configured to obtain the set of positioning point coordinates associated with each IP address recorded in the user operation information set; a clustering subunit, configured to perform, for each IP address, cluster analysis on its associated set of positioning point coordinates to obtain at least one cluster serving as a positioning area of the IP address; and a weight determination subunit, configured to determine, for each IP address, the weight of each of its positioning areas.
In some embodiments, the weight determination subunit comprises: a broad-IP removal module, configured to delete every IP address whose number of positioning areas exceeds a preset number threshold, or for which the mean distance between the positioning point coordinates in a positioning area and that area's center point coordinates exceeds a preset distance threshold; and a weight determination module, configured to determine, for each remaining IP address, the weight of each of its positioning areas.
In some embodiments, the weight determination subunit comprises: an initial weight determination module, configured to determine the initial weight of each positioning area according to the number and spread of the positioning point coordinates in each positioning area of the IP address; a gridding module, configured to take the center point coordinates of the positioning areas of the IP addresses associated with a user ID as the center point coordinates corresponding to that user ID, and to grid the center point coordinates corresponding to the user IDs recorded in the set according to geographic layout, generating at least two grid cells; a frequency acquisition module, configured to obtain, for each user ID recorded in the set, the sum of the initial weights of the positioning areas whose center points, corresponding to that user ID, fall in each grid cell, as the frequency of that grid cell for that user ID, and to obtain the sum of the initial weights of all positioning areas whose center points fall in each grid cell, as the total user frequency of that grid cell; and a weight computation module, configured to compute the weight of each cluster (positioning area) with the TF-IDF algorithm based on these frequencies.
In some embodiments, the device further comprises an IP similarity computation unit, configured to compute the IP address similarity between the user ID to be matched and each other user ID; and the matching unit is further configured to determine the other user IDs that match the user ID to be matched according to both the location information similarity and the IP address similarity between the user ID to be matched and each other user ID.
In some embodiments, the matching unit comprises: a feature information acquisition subunit, configured to obtain the feature information for the user ID to be matched paired with each other user ID, the feature information including the IP address similarity and the location information similarity between the two; a ranking subunit, configured to feed the feature information of each pair into a pre-trained ranking model to obtain the probability that the user ID to be matched matches each other user ID; and a matching subunit, configured to determine that the other user IDs whose probability exceeds a preset threshold match the user ID to be matched.
In some embodiments, the user operation information in the user operation information set further includes terminal type information and operating system information; and the feature information further includes at least one of the following: the number of identical IP addresses shared by the user ID to be matched and the other user ID, the number of coinciding center point coordinates corresponding to the two, and the terminal type information and operating system information associated with the user ID to be matched and with the other user ID.
In some embodiments, the user IDs recorded in the user operation information set include first user IDs and second user IDs, and the user ID to be matched and each other user ID belong to the first user IDs and the second user IDs, respectively.
In some embodiments, the device further comprises a first selection unit, configured to select, after the location information similarity acquisition unit has obtained the location information similarity between the user ID to be matched and each other user ID recorded in the set, a predetermined number of second user IDs from the second user IDs recorded in the set, in descending order of location information similarity to the user ID to be matched, to obtain a candidate second user ID set; and the matching unit is further configured to determine the second user ID that matches the first user ID to be matched according to the location information similarity between the user ID to be matched and each second user ID in the candidate second user ID set.
In some embodiments, the location information similarity acquisition unit is further configured to obtain, for each second user ID in the candidate second user ID set, the location information similarity between that second user ID and each first user ID, before the matching unit determines the matching second user ID from the location information similarity between the user ID to be matched and each second user ID in the candidate set. The device further comprises: a second selection unit, configured to select, in descending order of location information similarity to that second user ID, a predetermined number of first user IDs to obtain a candidate first user ID set; and a candidate filtering unit, configured to delete that second user ID from the candidate second user ID set when the user ID to be matched is not in the candidate first user ID set, likewise before the matching unit determines the matching second user ID.
The user ID matching method and device provided by the present application supplement and refine the positioning information of each user ID by obtaining the at least one positioning area over which each IP address recorded in the user operation information set is distributed, together with the weight of each positioning area. They then obtain the location information similarity between the user ID to be matched and each other user ID recorded in the set from the positioning areas of the associated IP addresses and their weights, and determine the matching user IDs from that similarity, so that user IDs are matched accurately and reliably.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application may be applied;
Fig. 2 is a flowchart of an embodiment of the user ID matching method according to the present application;
Fig. 3A is a schematic diagram of some of the data processing in an embodiment of the user ID matching method according to the present application;
Fig. 3B is a schematic diagram of further data processing in an embodiment of the user ID matching method according to the present application;
Fig. 4 is a comparison of matching results for an embodiment of the user ID matching method according to the present application;
Fig. 5 is a flowchart of another embodiment of the user ID matching method according to the present application;
Fig. 6 is a comparison of matching results for another embodiment of the user ID matching method according to the present application;
Fig. 7 is a schematic structural diagram of an embodiment of the user ID matching device according to the present application;
Fig. 8 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description of the invention
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the user ID matching method or the user ID matching device of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 provides the medium for communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various client applications may be installed on the terminal devices 101, 102, 103, such as browser applications, search applications, shopping applications and social platform software.
The terminal devices 101, 102, 103 may be any electronic devices that support browser applications, search applications and the like, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 105 may be a server providing various services, such as a database server or cloud server that supports the browser applications, search applications and the like on the terminal devices 101, 102, 103. The server may store, integrate, analyze and otherwise process the user operation information it receives in order to match user IDs.
It should be noted that the user ID matching method provided by the embodiments of the present application is generally executed by the server 105; accordingly, the user ID matching device is generally disposed in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative; there may be any number of each, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of an embodiment of the user ID matching method according to the present application is shown.
As shown in Fig. 2, the user ID matching method of this embodiment includes the following steps:
Step 201: analyze the pre-stored user operation information set to obtain the at least one positioning area over which each IP address recorded in the set is distributed, and the weight of each positioning area.
Each piece of user operation information in the set includes a user ID, an IP address and positioning point coordinates. In this embodiment, the electronic device on which the user ID matching method runs (for example, the server shown in Fig. 1) may obtain the user operation information set locally or remotely and, for each IP address recorded in it, collect the positioning point coordinates that carry that IP address. The positioning points carrying the same IP address may then be divided, according to the distances between them, into at least one positioning area, each containing at least one positioning point. The weight of each positioning area may then be determined from the number of positioning points in the area, the area's spread and/or the time the user stays in the area, or by another weight computation algorithm (such as TF-IDF).
In some optional implementations of this embodiment, the electronic device may obtain the set of positioning point coordinates associated with each IP address recorded in the user operation information set; for each IP address, perform cluster analysis on that set to obtain at least one cluster, which serves as a positioning area over which the IP address is distributed; and, for each IP address, determine the weight of each of its positioning areas. The electronic device may, for example, perform the cluster analysis with the K-means algorithm to obtain the at least one cluster.
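As a rough illustration of the clustering step, the sketch below runs plain K-means (Lloyd's algorithm) over the positioning points of a single IP address. It is a simplification under stated assumptions: distances are Euclidean on raw latitude/longitude degrees (tolerable only over small areas), the number of clusters `k` is supplied by the caller because the source does not say how it is chosen, and the first `k` points serve as initial centers. The function name `kmeans` is hypothetical.

```python
import math

Point = tuple[float, float]  # (latitude, longitude) in degrees


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[list[Point]]:
    """Plain K-means on raw lat/lon; returns the non-empty clusters (positioning areas)."""
    centers = points[:k]  # deterministic init; k-means++ would be a more robust choice
    clusters: list[list[Point]] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points (keep it if empty).
        new_centers = [
            (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]
```

Each returned cluster corresponds to one positioning area of the IP address, and its mean coordinate can serve as the area's center point.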
Step 202: obtain, according to the positioning areas of the IP addresses associated with the user IDs and the weights of those areas, the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set.
In this embodiment, for each pair formed by the user ID to be matched and another user ID, the electronic device may generate two vectors from the positioning areas of the IP addresses associated with the two user IDs and the weights of those areas, and compute the similarity of the two vectors, for example with cosine similarity, Jaccard similarity or another similarity algorithm, as the location information similarity between the two user IDs.
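A minimal sketch of the vector comparison, assuming each user ID's location profile is a sparse mapping from a positioning-area (or grid-cell) key to that area's weight; the key scheme and the function names are assumptions. Both measures named above are shown: cosine similarity, and a weighted Jaccard (Ruzicka) similarity, which is one common way to apply Jaccard similarity to weighted vectors.

```python
import math
from collections.abc import Hashable, Mapping


def cosine_similarity(u: Mapping[Hashable, float], v: Mapping[Hashable, float]) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v[key] for key, w in u.items() if key in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_jaccard(u: Mapping[Hashable, float], v: Mapping[Hashable, float]) -> float:
    """Sum of element-wise minima over sum of element-wise maxima."""
    keys = set(u) | set(v)
    num = sum(min(u.get(key, 0.0), v.get(key, 0.0)) for key in keys)
    den = sum(max(u.get(key, 0.0), v.get(key, 0.0)) for key in keys)
    return num / den if den else 0.0
```

For example, profiles {"a": 1, "b": 1} and {"a": 1} give a cosine similarity of about 0.707 and a weighted Jaccard similarity of 0.5.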
Step 203: determine, according to the location information similarity, the other user IDs that match the user ID to be matched.
In this embodiment, the electronic device may determine that the other user IDs whose location information similarity to the user ID to be matched exceeds a predetermined similarity threshold match the user ID to be matched. Alternatively, the electronic device may feed the location information similarity, together with other feature information (for example, the terminal type information and operating system information associated with the user IDs), into a pre-trained ranking model to compute the probability that the user ID to be matched matches each other user ID, and determine that the other user IDs whose probability exceeds a predetermined threshold match the user ID to be matched.
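The second variant can be pictured with a logistic scorer standing in for the pre-trained ranking model. The source does not identify the model type, so the logistic form, the weights, the bias and the function names here are purely hypothetical; in practice the parameters would be learned from labeled matching pairs. The first variant (a plain similarity threshold) is the same filter applied directly to the location information similarity.

```python
import math


def match_probability(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic score standing in for the pre-trained ranking model (hypothetical)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def matched_ids(candidates: dict[str, list[float]], weights: list[float],
                bias: float, threshold: float = 0.5) -> list[str]:
    """Other IDs whose match probability exceeds the threshold, most probable first."""
    probs = {uid: match_probability(f, weights, bias) for uid, f in candidates.items()}
    return sorted((uid for uid, p in probs.items() if p > threshold),
                  key=lambda uid: probs[uid], reverse=True)
```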
In some optional implementations of this embodiment, determining in step 201 the weight of each positioning area of each IP address may include: deleting every IP address whose number of positioning areas exceeds a preset number threshold (for example, 5), or for which the mean distance between the positioning point coordinates in a positioning area and that area's center point coordinates exceeds a preset distance threshold (for example, 3000 meters); and, for each remaining IP address, determining the weight of each of its positioning areas. This removes IP addresses whose user population is not fixed, whose geographic coverage is wide and whose coverage area is not fixed (such as mobile cellular IP addresses), so that only IP addresses with relatively stable users (such as the egress IP addresses of homes or companies) are analyzed, which improves the accuracy of user ID matching.
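The broad-IP filter can be sketched as follows, using the example thresholds from the text (5 areas, 3000 meters). Assumptions not stated in the source: distances are great-circle (haversine) distances on a spherical Earth of radius 6,371 km, an area's center point is the arithmetic mean of its coordinates, and an IP address is discarded if any one of its areas exceeds the distance threshold. The function names are hypothetical.

```python
import math

Point = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def center(area: list[Point]) -> Point:
    """Arithmetic mean of the area's positioning points."""
    return (sum(p[0] for p in area) / len(area), sum(p[1] for p in area) / len(area))


def mean_spread_m(area: list[Point]) -> float:
    """Mean distance from the area's points to its center point."""
    c = center(area)
    return sum(haversine_m(p, c) for p in area) / len(area)


def keep_ip(areas: list[list[Point]], max_areas: int = 5, max_spread_m: float = 3000.0) -> bool:
    """False for 'broad' IPs (e.g. mobile cellular egress) that should be discarded."""
    if len(areas) > max_areas:
        return False
    return all(mean_spread_m(a) <= max_spread_m for a in areas)
```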
In addition, in some optional implementations of this embodiment, determining in step 201 the weight of each positioning area of an IP address may include: determining the initial weight of each positioning area according to the number and spread of the positioning point coordinates in each positioning area of the IP address; taking the center point coordinates of the positioning areas of the IP addresses associated with a user ID as the center point coordinates corresponding to that user ID, and gridding the center point coordinates corresponding to the user IDs recorded in the user operation information set according to geographic layout, generating at least two grid cells; for each user ID recorded in the set, obtaining the sum of the initial weights of the positioning areas whose center points, corresponding to that user ID, fall in each grid cell, as the frequency of that grid cell for that user ID, and obtaining the sum of the initial weights of all positioning areas whose center points fall in each grid cell, as the total user frequency of that grid cell; and, based on these frequencies, computing the weight of each positioning area with the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.
The spread of the positioning points in a positioning area may be expressed as the mean distance between those points and the area's center point. The electronic device may determine the initial weight of a positioning area from two quantities: the ratio of the number of positioning points in that area to the number of positioning points in all positioning areas of the IP address, and the area's spread. The larger the ratio, the higher the weight; the smaller the spread, the higher the weight. The TF-IDF algorithm is commonly used to assess how important a word is to one document within a document set or corpus. Its main idea is that if a word or phrase occurs frequently in one article (high term frequency, TF) but rarely in other articles, it discriminates well between classes and is suitable for classification. In this embodiment, the electronic device may treat grid cells as words and user IDs as documents, and compute the weight of each positioning area with the TF-IDF algorithm. Consequently, the higher a grid cell's frequency for a given user ID and the lower that grid cell's total user frequency, the higher the weight of the positioning areas located in it.
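The weighting scheme described above can be sketched as below. The source fixes only the monotonic relationships, so the initial-weight formula (the area's share of the IP's points divided by 1 + spread in kilometers), the 0.01-degree grid cell size and the TF-IDF variant (frequency multiplied by log(1 + number of IDs / total user frequency)) are hypothetical choices that merely respect those relationships. A positioning area's final weight is read off as the weight of the grid cell containing its center point, which is also an assumption; the function names are illustrative.

```python
import math
from collections import defaultdict

Point = tuple[float, float]  # (latitude, longitude) in degrees
Cell = tuple[int, int]


def initial_weight(n_in_area: int, n_for_ip: int, spread_m: float) -> float:
    """Hypothetical: the area's share of the IP's points, discounted by its spread."""
    return (n_in_area / n_for_ip) / (1.0 + spread_m / 1000.0)


def grid_cell(p: Point, cell_deg: float = 0.01) -> Cell:
    """Map a center point to a grid cell (0.01 degree is a hypothetical cell size)."""
    return (math.floor(p[0] / cell_deg), math.floor(p[1] / cell_deg))


def tfidf_weights(user_areas: dict[str, list[tuple[Point, float]]]) -> dict[str, dict[Cell, float]]:
    """user_areas maps user ID -> [(area center point, initial weight), ...]."""
    freq: dict[str, dict[Cell, float]] = {}
    total: dict[Cell, float] = defaultdict(float)
    for uid, areas in user_areas.items():
        per_cell: dict[Cell, float] = defaultdict(float)
        for c, w in areas:
            per_cell[grid_cell(c)] += w  # frequency of the cell for this ID (the "TF")
        freq[uid] = per_cell
        for cell, f in per_cell.items():
            total[cell] += f  # total user frequency of the cell (plays the role of "DF")
    n_ids = len(user_areas)
    return {
        uid: {
            cell: f * math.log(1.0 + n_ids / total[cell]) if total[cell] > 0 else 0.0
            for cell, f in cells.items()
        }
        for uid, cells in freq.items()
    }
```

In this toy form, a cell shared by every ID receives a smaller multiplier than a cell visited by only one ID, which matches the behavior the text describes.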
In this implementation, the initial weight of each location region is first determined from the number and range of the positioning coordinates in each location region over which the IP address is distributed, and the weight of each location region is then calculated from this initial weight by the TF-IDF algorithm. This takes into account the distinctiveness, activity level, and activity range of each region, and yields a more reasonable weight.
In some optional implementations of the present embodiment, the user IDs recorded in the user operation information set include first user IDs and second user IDs, and the user ID to be matched and the other user IDs belong to the first user IDs and the second user IDs, respectively. The electronic device can distinguish first user IDs from second user IDs by a flag bit. The first and second user IDs may be assigned by two different product lines, for example, the user ID used when searching web pages through a browser and the user ID used when searching through a search app. If the user ID to be matched is a first user ID, only the second user IDs are compared with it in similarity computation and related processing, which reduces the amount of computation and speeds up matching.
Building on the previous implementation, in some optional implementations of the present embodiment, the user ID matching method of this embodiment may further include, after step 202: selecting, from the second user IDs recorded in the user operation information set, a predetermined number (for example, 50) of second user IDs in descending order of their location information similarity to the user ID to be matched, to obtain a candidate second user ID set. Step 203 may then include: determining the second user ID that matches the first user ID to be matched according to the location information similarity between the user ID to be matched and each second user ID in the candidate second user ID set. This implementation reduces the amount of computation in step 203 and improves the efficiency of user ID matching.
Building on the previous implementation, in some optional implementations of the present embodiment, the user ID matching method of this embodiment may further include, before step 203: for each second user ID in the candidate second user ID set, obtaining the location information similarity between that second user ID and each first user ID; selecting a predetermined number (for example, 50) of first user IDs in descending order of their location information similarity to that second user ID, to obtain a candidate first user ID set; and, if the user ID to be matched is not in that candidate first user ID set, deleting that second user ID from the candidate second user ID set. This implementation ensures that every other user ID taking part in step 203 ranks within the predetermined top positions by location information similarity to the user ID to be matched, and that the user ID to be matched likewise ranks within the predetermined top positions by location information similarity to that other user ID. This removes some extreme, unrelated user IDs, reduces noisy data, and improves matching efficiency, accuracy, and recall.
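The forward top-K selection and the backward (mutual) filtering described in the two implementations above can be sketched as follows. This is an illustrative sketch only: `sim` is a hypothetical callable standing in for the location information similarity of step 202, and `k` corresponds to the predetermined number (50 in the example above).

```python
import heapq


def top_k(scores, k):
    """Keys of the k largest scores, in descending order of score."""
    return [key for key, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]


def mutual_top_k_candidates(query_id, first_ids, second_ids, sim, k=50):
    """Candidate second user IDs for a first user ID to be matched.

    sim(first_id, second_id) is a hypothetical callable returning the
    location information similarity between the two IDs.
    """
    # Forward pass: the k second user IDs most similar to the query ID.
    candidates = top_k({s: sim(query_id, s) for s in second_ids}, k)
    kept = []
    for s in candidates:
        # Backward pass: keep s only if the query ID is among s's k most similar first IDs.
        back = top_k({f: sim(f, s) for f in first_ids}, k)
        if query_id in back:
            kept.append(s)
    return kept
```

The backward pass costs one extra similarity scan per candidate, which is why it is applied only to the already-truncated candidate set rather than to all second user IDs.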
Some example data-processing steps of this embodiment are described below with reference to FIG. 3A and FIG. 3B. The user ID matching method of this embodiment can first obtain the positioning coordinate set associated with each IP address recorded in the user operation information set. The positioning coordinate set associated with a given IP address may be as shown in FIG. 3A, where the dots represent the positioning coordinates in the set. Then, clustering or another analysis algorithm is used to obtain the at least one location region over which each IP address recorded in the user operation information set is distributed, together with the weight of each location region. The result may be as shown in FIG. 3B, where the four marked points represent the center coordinates of the location regions over which this IP address is distributed, and the number beside each marked point represents the weight of that location region. The location information similarity between user IDs can then be computed from the location regions and weights shown in FIG. 3B, and the other user IDs that match the user ID to be matched can be determined from the location information similarity.
FIG. 4 compares the matching performance of an embodiment of the user ID matching method according to the present application. It shows the accuracy and recall of matching based solely on IP addresses in the prior art and of matching by the user ID matching method of this embodiment. As can be seen from FIG. 4, the user ID matching method of this embodiment improves both accuracy and recall to some extent.
The user ID matching method provided by this embodiment obtains the at least one location region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, together with the weight of each location region, thereby supplementing and refining the location information corresponding to each user ID. It then obtains, from the location regions over which the IP addresses associated with each user ID are distributed and the weight of each location region, the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set, and determines the other user IDs that match the user ID to be matched according to that similarity, thereby achieving accurate and reliable user ID matching.
Referring further to FIG. 5, a flow 500 of another embodiment of the user ID matching method according to the present application is shown.
As shown in FIG. 5, the user ID matching method of this embodiment includes the following steps:
Step 501: analyze the prestored user operation information set to obtain the at least one location region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and the weight of each location region.
The user operation information in the user operation information set includes the following information: user ID, IP address, and positioning coordinates.
In this embodiment, for the specific processing of step 501, refer to the description of step 201 in the embodiment corresponding to FIG. 2; the details are not repeated here.
Step 502: from the location regions over which the IP addresses associated with each user ID are distributed and the weight of each location region, obtain the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set.
In this embodiment, for the specific processing of step 502, refer to the description of step 202 in the embodiment corresponding to FIG. 2; the details are not repeated here.
Step 503: calculate the IP address similarity between the user ID to be matched and each other user ID.
In this embodiment, the electronic device on which the user ID matching method runs (for example, the server shown in FIG. 1) can first obtain the at least one IP address associated with each user ID, then calculate the weight of each IP address by the TF-IDF algorithm or another weighting method, and finally calculate the IP address similarity between the user ID to be matched and each other user ID according to the weights of the IP addresses associated with each of them.
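One possible form of the IP address similarity of step 503 is sketched below. The text leaves the exact weighting and similarity measure open, so this sketch makes two labeled assumptions: each IP address is weighted by an IDF-style score `log(1 + N / df)`, where `N` is the number of user IDs and `df` the number of user IDs associated with that IP address, and two user IDs are compared by the cosine similarity of their weighted IP vectors. The helper names `ip_idf` and `ip_similarity` are hypothetical.

```python
import math
from collections import Counter


def ip_idf(user_ips):
    """user_ips: {user_id: [ip, ...]}. Assumed IDF weight per IP address:
    IPs shared by many user IDs get lower weight."""
    n = len(user_ips)
    df = Counter(ip for ips in user_ips.values() for ip in set(ips))
    return {ip: math.log(1.0 + n / d) for ip, d in df.items()}


def ip_similarity(a, b, user_ips, idf):
    """Cosine similarity of the IDF-weighted IP vectors of user IDs a and b."""
    va = {ip: idf[ip] for ip in set(user_ips[a])}
    vb = {ip: idf[ip] for ip in set(user_ips[b])}
    dot = sum(w * vb[ip] for ip, w in va.items() if ip in vb)
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Down-weighting widely shared IP addresses (for example, carrier gateways) keeps them from inflating the similarity between unrelated user IDs.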
It should be noted that step 503 may be performed concurrently with step 501 or step 502, before step 501, or after step 502; this embodiment does not limit its execution order.
Step 504: determine the other user IDs that match the user ID to be matched according to the location information similarity and the IP address similarity between the user ID to be matched and each other user ID.
In this embodiment, the electronic device can calculate a combined relevance between the user ID to be matched and each other user ID from preset weights for the location information similarity and the IP address similarity, and determine the other user IDs that match the user ID to be matched according to the combined relevance. The weights for the location information similarity and the IP address similarity can be preset manually based on experience and the actual situation.
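The preset-weight combination can be sketched as a weighted sum. The default weights 0.6 and 0.4 and the cut-off `threshold` are hypothetical placeholders for the manually chosen values the text mentions; the text itself does not specify how the combined relevance is turned into a match decision, so the threshold rule is an assumption.

```python
def combined_relevance(loc_sim, ip_sim, w_loc=0.6, w_ip=0.4):
    """Weighted sum of the two similarities; the weights are hypothetical presets."""
    return w_loc * loc_sim + w_ip * ip_sim


def match_by_relevance(candidates, threshold=0.5, w_loc=0.6, w_ip=0.4):
    """candidates: {other_id: (loc_sim, ip_sim)}.

    Returns the IDs whose combined relevance exceeds the (assumed) threshold,
    most relevant first.
    """
    scored = {uid: combined_relevance(l, i, w_loc, w_ip)
              for uid, (l, i) in candidates.items()}
    return sorted((uid for uid, s in scored.items() if s > threshold),
                  key=lambda uid: -scored[uid])
```

Hand-tuned weights are simple to deploy but do not adapt to the data, which is the motivation for the trained ranking model in the next implementation.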
In some optional implementations of the present embodiment, step 504 may include: obtaining the feature information corresponding to the user ID to be matched and each other user ID, the feature information including the IP address similarity and the location information similarity between the user ID to be matched and that other user ID; inputting the feature information corresponding to the user ID to be matched and each other user ID into a pre-trained ranking model to obtain the probability that the user ID to be matched matches each other user ID; and determining that the other user IDs whose probability exceeds a preset threshold match the user ID to be matched. The ranking model can be trained on a labeled training sample set by a learning-to-rank (LTR) method such as a pairwise method. The feature information of two user IDs known to belong to the same user can be used as a positive sample, and the feature information of two user IDs known to belong to unrelated users can be used as a negative sample. Compared with computing the combined relevance between the user ID to be matched and each other user ID with manually set weights, this implementation, which uses a pre-trained ranking model to compute the matching probability from the IP address similarity and the location information similarity, is more accurate and reliable.
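As a stand-in for the pre-trained ranking model, the sketch below trains a plain logistic regression on labeled pairs (1 for two IDs of the same user, 0 for unrelated users) and keeps candidates whose predicted probability exceeds a threshold. This swaps in pointwise logistic regression for the pairwise learning-to-rank method named in the text, purely to illustrate the train, score, and threshold flow; the learning rate, epoch count, and threshold are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Plain SGD logistic regression; labels are 1 (same user) or 0 (unrelated)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def match_probability(model, features):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)


def select_matches(model, candidate_features, threshold=0.5):
    """IDs whose predicted match probability exceeds the preset threshold."""
    return [uid for uid, f in candidate_features.items()
            if match_probability(model, f) > threshold]
```

Here each feature vector is `[ip_similarity, location_similarity]`; a pairwise LTR model would instead learn from score differences between a matching and a non-matching candidate of the same query ID.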
Building on the previous implementation, in some optional implementations of the present embodiment, the user operation information in the user operation information set may further include terminal type information and operating system information. The feature information then further includes at least one of the following: the number of identical IP addresses shared by the user ID to be matched and the other user ID, the number of coinciding center-point coordinates corresponding to them, and the terminal type information and operating system information associated with the user ID to be matched and the other user ID. This implementation takes more influencing factors into account when obtaining the probability that the user ID to be matched matches each other user ID, making the matching more accurate.
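A feature vector covering the additional factors above might be assembled as follows. `UserProfile` is a hypothetical per-ID summary, and encoding the terminal type and operating system as "same or different" indicator features is an assumption; the text only says that this information is included.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical per-user-ID summary built from the operation records."""
    ips: set = field(default_factory=set)
    centers: set = field(default_factory=set)  # e.g. grid cells of center-point coordinates
    terminal_type: str = ""
    os_name: str = ""


def build_features(a, b, ip_sim, loc_sim):
    """Feature vector for a (user ID to be matched, other user ID) pair."""
    return [
        ip_sim,
        loc_sim,
        float(len(a.ips & b.ips)),          # number of identical IP addresses
        float(len(a.centers & b.centers)),  # number of coinciding center points
        1.0 if a.terminal_type == b.terminal_type else 0.0,  # assumed indicator encoding
        1.0 if a.os_name == b.os_name else 0.0,              # assumed indicator encoding
    ]
```

The resulting list can be fed directly to a model such as the logistic stand-in sketched for the previous implementation, once that model is trained on vectors of the same length.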
FIG. 6 compares the matching performance of another embodiment of the user ID matching method according to the present application. It shows the accuracy and recall of matching based solely on IP addresses in the prior art, of matching by the user ID matching method of the embodiment corresponding to FIG. 2, and of matching by the user ID matching method of this embodiment. As can be seen from FIG. 6, compared with the method of the embodiment corresponding to FIG. 2, the method of this embodiment further improves accuracy and recall to some extent.
As can be seen from FIG. 5 and FIG. 6, compared with the embodiment corresponding to FIG. 2, flow 500 of the user ID matching method in this embodiment adds IP address similarity as a factor in user ID matching. The solution described in this embodiment therefore considers more comprehensive influencing factors and improves matching accuracy.
With further reference to FIG. 7, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a user ID matching apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 7, the user ID matching apparatus 700 of this embodiment includes a location information acquisition unit 701, a location information similarity acquisition unit 702, and a matching unit 703. The location information acquisition unit 701 is configured to analyze the prestored user operation information set and obtain the at least one location region over which each Internet Protocol (IP) address recorded in the set is distributed and the weight of each location region, where the user operation information in the set includes the following information: user ID, IP address, and positioning coordinates. The location information similarity acquisition unit 702 is configured to obtain, from the location regions over which the IP addresses associated with each user ID are distributed and the weight of each location region, the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set. The matching unit 703 is configured to determine, according to the location information similarity, the other user IDs that match the user ID to be matched.
In this embodiment, for the specific processing of the location information acquisition unit 701, the location information similarity acquisition unit 702, and the matching unit 703, refer to steps 201, 202, and 203, respectively, in the embodiment corresponding to FIG. 2; the details are not repeated here.
In some optional implementations of the present embodiment, the location information acquisition unit 701 may include: a coordinate set acquisition subunit 7011, configured to obtain the positioning coordinate set associated with each IP address recorded in the user operation information set; a clustering subunit 7012, configured to perform, for each IP address, cluster analysis on the positioning coordinate set associated with that IP address to obtain at least one corresponding cluster as the location regions over which that IP address is distributed; and a weight determination subunit 7013, configured to determine, for each IP address, the weight of each location region over which that IP address is distributed.
The clustering subunit 7012 can perform cluster analysis on the positioning coordinate set associated with an IP address using the K-means algorithm to obtain at least one corresponding cluster.
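A compact K-means over one IP address's positioning coordinates could look like the sketch below. It treats coordinates as planar `(lon, lat)` tuples with Euclidean distance (an approximation that ignores Earth curvature), uses a deterministic farthest-first initialization instead of random seeding, and assumes the number of clusters `k` is given; none of these choices is specified in the text. `region_stats` then yields the count and mean distance (range) per cluster, which are the inputs to the initial weight described earlier.

```python
import math


def _assign(points, centers):
    """Group each point with its nearest center (Euclidean distance)."""
    groups = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[idx].append(p)
    return groups


def kmeans(points, k, iters=50):
    """Lloyd's K-means on (lon, lat) tuples with farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = _assign(points, centers)
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, _assign(points, centers)


def region_stats(centers, groups):
    """(center, number of coordinates, mean distance to center) per non-empty cluster."""
    return [(c, len(g), sum(math.dist(p, c) for p in g) / len(g))
            for c, g in zip(centers, groups) if g]
```

`math.dist` requires Python 3.8 or later. For real geographic data a haversine distance or a projected coordinate system would be more appropriate.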
In some optional implementations of the present embodiment, the weight determination subunit 7013 may include: a broad-IP removal module (not shown), configured to delete IP addresses whose number of location regions exceeds a preset quantity threshold, or for which the mean distance between the positioning coordinates in a location region and its center-point coordinates exceeds a preset distance threshold; and a weight determination module (not shown), configured to determine, for each remaining IP address, the weight of each location region over which that IP address is distributed. For the specific processing of the broad-IP removal module and the weight determination module and the technical effects they bring, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 2; the details are not repeated here.
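The broad-IP removal rule can be sketched as a simple filter over per-IP cluster summaries, such as the `(center, count, mean_dist)` tuples produced by a clustering step. Reading the distance condition as "any location region of the IP exceeds the distance threshold" is an interpretation of the text, and the threshold values are left to the caller.

```python
def remove_broad_ips(ip_regions, max_regions, max_mean_dist):
    """Drop IP addresses that appear too widely distributed to be informative.

    ip_regions: {ip: [(center, count, mean_dist), ...]}. An IP is removed if it
    has more than max_regions location regions, or (assumed reading) if any of
    its regions has a mean distance to its center above max_mean_dist.
    """
    return {
        ip: regions
        for ip, regions in ip_regions.items()
        if len(regions) <= max_regions
        and all(mean_dist <= max_mean_dist for _, _, mean_dist in regions)
    }
```

IP addresses that fail either test are typically shared infrastructure (for example, mobile carrier gateways), whose positioning coordinates say little about any single user.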
In addition, in some optional implementations of the present embodiment, the weight determination subunit 7013 may include: an initial weight determination module (not shown), configured to determine the initial weight of each location region according to the number and range of the positioning coordinates in each location region over which the IP address is distributed; a gridding module (not shown), configured to take the center-point coordinates of the location regions over which the IP addresses associated with a user ID are distributed as the center-point coordinates corresponding to that user ID, and to grid the center-point coordinates corresponding to the user IDs recorded in the user operation information set according to geographic layout, generating at least two grid cells; a frequency acquisition module (not shown), configured to obtain, for each user ID recorded in the user operation information set, the sum of the initial weights of the location regions whose center-point coordinates corresponding to that user ID fall in each grid cell, as the frequency of that grid cell for that user ID, and to obtain the sum of the initial weights of the location regions whose center-point coordinates fall in each grid cell, as the total user frequency of that grid cell; and a weight computation module (not shown), configured to calculate the weight of each cluster (location region) by the TF-IDF algorithm based on these frequencies. For the specific processing of the initial weight determination module, the gridding module, the frequency acquisition module, and the weight computation module and the technical effects they bring, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 2; the details are not repeated here.
In some optional implementations of the present embodiment, the user ID matching apparatus of this embodiment may further include an IP similarity calculation unit 704, configured to calculate the IP address similarity between the user ID to be matched and each other user ID. The matching unit 703 may further be configured to determine the other user IDs that match the user ID to be matched according to the location information similarity and the IP address similarity between the user ID to be matched and each other user ID. For the specific processing of this implementation and the technical effects it brings, refer to the descriptions of steps 503 and 504 in the embodiment corresponding to FIG. 5; the details are not repeated here.
Building on the previous implementation, in some optional implementations of the present embodiment, the matching unit 703 may include: a feature information acquisition subunit 7031, configured to obtain the feature information corresponding to the user ID to be matched and each other user ID, the feature information including the IP address similarity and the location information similarity between them; a ranking subunit 7032, configured to input the feature information corresponding to the user ID to be matched and each other user ID into a pre-trained ranking model to obtain the probability that the user ID to be matched matches each other user ID; and a matching subunit 7033, configured to determine that the other user IDs whose probability exceeds a preset threshold match the user ID to be matched. For the specific processing of the feature information acquisition subunit 7031, the ranking subunit 7032, and the matching subunit 7033 and the technical effects they bring, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 5; the details are not repeated here.
Building on the previous implementation, in some optional implementations of the present embodiment, the user operation information in the user operation information set may further include terminal type information and operating system information. The feature information then further includes at least one of the following: the number of identical IP addresses shared by the user ID to be matched and the other user ID, the number of coinciding center-point coordinates corresponding to them, and the terminal type information and operating system information associated with the user ID to be matched and the other user ID. For the specific processing of this implementation and the technical effects it brings, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 5; the details are not repeated here.
In some optional implementations of the present embodiment, the user IDs recorded in the user operation information set include first user IDs and second user IDs, and the user ID to be matched and the other user IDs belong to the first user IDs and the second user IDs, respectively. For the specific processing of this implementation and the technical effects it brings, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 2; the details are not repeated here.
Building on the previous implementation, in some optional implementations of the present embodiment, the user ID matching apparatus of this embodiment may further include a first selection unit (not shown), configured to, after the location information similarity acquisition unit obtains the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set, select, from the second user IDs recorded in the user operation information set, a predetermined number of second user IDs in descending order of their location information similarity to the user ID to be matched, to obtain a candidate second user ID set. The matching unit 703 may further be configured to determine the second user ID that matches the first user ID to be matched according to the location information similarity between the user ID to be matched and each second user ID in the candidate second user ID set. For the specific processing of this implementation and the technical effects it brings, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 2; the details are not repeated here.
Building on the previous implementation, in some optional implementations of the present embodiment, the location information similarity acquisition unit 702 may further be configured to obtain, for each second user ID in the candidate second user ID set, the location information similarity between that second user ID and each first user ID, before the matching unit determines the second user ID that matches the first user ID to be matched according to the location information similarity between the user ID to be matched and each second user ID in the candidate second user ID set. The user ID matching apparatus of this embodiment may further include: a second selection unit (not shown), configured to select a predetermined number of first user IDs in descending order of their location information similarity to that second user ID, to obtain a candidate first user ID set; and a candidate filtering unit (not shown), configured to delete that second user ID from the candidate second user ID set when the user ID to be matched is not in the candidate first user ID set, before the matching unit determines the second user ID that matches the first user ID to be matched. For the specific processing of this implementation and the technical effects it brings, refer to the description of the corresponding implementation in the embodiment corresponding to FIG. 2; the details are not repeated here.
The user ID matching apparatus provided by this embodiment uses the location information acquisition unit 701 to obtain the at least one location region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed and the weight of each location region, supplementing and refining the location information corresponding to each user ID. It uses the location information similarity acquisition unit 702 to obtain, from the location regions over which the IP addresses associated with each user ID are distributed and the weight of each location region, the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set, and uses the matching unit 703 to determine, according to the location information similarity, the other user IDs that match the user ID to be matched, thereby achieving accurate and reliable user ID matching.
Referring now to FIG. 8, a schematic structural diagram of a computer system 800 suitable for implementing the server of the embodiments of the present application is shown.
As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 806 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: a storage section 806 including a hard disk and the like; and a communication section 807 including a network interface card such as a LAN card or a modem. The communication section 807 performs communication processing via a network such as the Internet. A drive 808 is also connected to the I/O interface 805 as needed. A removable medium 809, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 808 as needed, so that a computer program read from it can be installed into the storage section 806 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 807, and/or installed from the removable medium 809. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the methods of the present application are performed.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a location information acquisition unit, a location information similarity acquisition unit, and a matching unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the matching unit may also be described as "a unit that determines, according to the location information similarity, other user IDs matching the user ID to be matched".
In another aspect, the present application further provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus described in the above embodiments, or may exist separately without being installed in a terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: analyze a prestored user operation information set to obtain the at least one location region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed and the weight of each location region, where the user operation information in the user operation information set includes the following information: user ID, IP address, and positioning coordinates; obtain, from the location regions over which the IP addresses associated with each user ID are distributed and the weight of each location region, the location information similarity between the user ID to be matched and each other user ID recorded in the user operation information set; and determine, according to the location information similarity, the other user IDs that match the user ID to be matched.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (20)

1. A user ID matching method, characterized in that the method comprises:
analyzing a prestored user operation information set to obtain at least one location region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed and a weight of each location region, wherein the user operation information in the user operation information set includes the following information: a user ID, an IP address, and positioning coordinates;
obtaining, according to the location regions over which the IP addresses associated with each user ID are distributed and the weight of each location region, a location information similarity between a user ID to be matched and each other user ID recorded in the user operation information set; and
determining, according to the location information similarity, other user IDs that match the user ID to be matched.
2. The method according to claim 1, characterized in that analyzing the pre-stored user operation information set to obtain the at least one positioning region over which each IP address recorded in the user operation information set is distributed, and the weight of each positioning region, comprises:
obtaining the set of positioning-point coordinates associated with each IP address recorded in the user operation information set;
for each IP address, performing cluster analysis on the set of positioning-point coordinates associated with the IP address to obtain at least one corresponding cluster, which serves as a positioning region over which the IP address is distributed; and
for each IP address, determining the weight of each positioning region over which the IP address is distributed.
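Claim 2 leaves the clustering algorithm open. The following is a minimal sketch that assumes simple greedy "leader" clustering with a fixed radius and haversine distance. A point joins the first cluster whose running-mean centre lies within `radius_m`; otherwise it starts a new cluster. The function names and the 500 m default are illustrative assumptions, and DBSCAN or k-means would be equally valid substitutes.

```python
import math

Coord = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def cluster_ip_coords(coords: list[Coord], radius_m: float = 500.0) -> list[list[Coord]]:
    """Greedy leader clustering of one IP address's positioning-point coordinates."""
    clusters: list[list[Coord]] = []
    centroids: list[Coord] = []
    for p in coords:
        for i, c in enumerate(centroids):
            if haversine_m(p, c) <= radius_m:
                clusters[i].append(p)
                n = len(clusters[i])
                # Running mean of lat/lon; a fair approximation for small clusters.
                centroids[i] = (c[0] + (p[0] - c[0]) / n, c[1] + (p[1] - c[1]) / n)
                break
        else:
            clusters.append([p])  # no nearby cluster: p seeds a new one
            centroids.append(p)
    return clusters
```

Applied per IP address, for example `{ip: cluster_ip_coords(pts) for ip, pts in coords_by_ip.items()}`, each returned cluster plays the role of one positioning region.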
3. The method according to claim 2, characterized in that determining, for each IP address, the weight of each positioning region over which the IP address is distributed comprises:
deleting any IP address for which the number of positioning regions over which it is distributed is greater than a preset number threshold, or for which the average distance between the positioning-point coordinates in a positioning region and the center-point coordinates of that region is greater than a preset distance threshold; and
for each remaining IP address, determining the weight of each positioning region over which the IP address is distributed.
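The filtering step in claim 3 can be sketched as below. The sketch assumes that an IP address is "extensive" if it has too many regions or if any one of its regions is too spread out. The thresholds (5 regions, 2 km) and the equirectangular distance approximation are illustrative assumptions, not values from the patent.

```python
import math

Coord = tuple[float, float]  # (latitude, longitude) in degrees


def centre(points: list[Coord]) -> Coord:
    """Arithmetic mean of the points, used as the region's center-point coordinates."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def approx_dist_m(a: Coord, b: Coord) -> float:
    """Equirectangular distance approximation in metres; adequate at city scale."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    dy = math.radians(b[0] - a[0])
    return 6_371_000 * math.hypot(dx, dy)


def is_extensive_ip(regions: list[list[Coord]], max_regions: int, max_mean_dist_m: float) -> bool:
    """True if the IP spans too many regions or any region is too spread out."""
    if len(regions) > max_regions:
        return True
    for pts in regions:
        c = centre(pts)
        if sum(approx_dist_m(p, c) for p in pts) / len(pts) > max_mean_dist_m:
            return True
    return False


def drop_extensive_ips(ip_regions: dict[str, list[list[Coord]]], max_regions: int = 5,
                       max_mean_dist_m: float = 2_000.0) -> dict[str, list[list[Coord]]]:
    """Keep only IPs that look tied to a few compact places (e.g. a home or office)."""
    return {ip: regs for ip, regs in ip_regions.items()
            if not is_extensive_ip(regs, max_regions, max_mean_dist_m)}
```

Removing such IP addresses, for example mobile-carrier gateways shared across a city, keeps them from making unrelated users look co-located.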
4. The method according to claim 2, characterized in that determining the weight of each positioning region over which the IP address is distributed comprises:
determining an initial weight of each positioning region according to the number and the extent of the positioning-point coordinates in each positioning region over which the IP address is distributed;
taking the center-point coordinates of each positioning region over which the IP addresses associated with a user identifier are distributed as the center-point coordinates corresponding to that user identifier, and gridding the center-point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout, to generate at least two grids;
obtaining, for each user identifier recorded in the user operation information set, the sum of the initial weights of the positioning regions whose center-point coordinates for that user identifier fall in each grid, as the frequency of that grid for that user identifier; and obtaining the sum of the initial weights of all positioning regions whose center-point coordinates fall in each grid, as the total user frequency of that grid; and
calculating the weight of each positioning region from the frequencies by means of the TF-IDF algorithm.
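Claim 4 names TF-IDF but gives no formula. The sketch below is one plausible reading, labelled as an assumption throughout. The term frequency of a grid for a user is that user's frequency in the grid divided by the user's total. The inverse document frequency is `log(total frequency / grid's total user frequency)`, so busy grids shared by many users count less. The `initial_weight` heuristic, the 0.01° grid cell, and the record layout are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical record: (user_id, region_id, centre_lat, centre_lon, initial_weight).
# region_id is assumed unique across all users; weights are assumed positive.
Region = tuple[str, str, float, float, float]


def initial_weight(n_points: int, spread_m: float) -> float:
    """Assumed heuristic: more positioning points and a tighter region weigh more."""
    return n_points / (1.0 + spread_m / 1000.0)


def grid_of(lat: float, lon: float, cell_deg: float = 0.01) -> tuple[int, int]:
    """Map a center point to a square grid cell of cell_deg degrees (about 1 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def tfidf_region_weights(regions: list[Region], cell_deg: float = 0.01) -> dict[str, float]:
    freq = defaultdict(float)        # (user, grid) -> frequency of the grid for the user
    grid_total = defaultdict(float)  # grid -> total user frequency of the grid
    user_total = defaultdict(float)  # user -> frequency summed over all grids
    for user, _, lat, lon, w in regions:
        g = grid_of(lat, lon, cell_deg)
        freq[(user, g)] += w
        grid_total[g] += w
        user_total[user] += w
    total = sum(grid_total.values())
    weights = {}
    for user, region, lat, lon, _ in regions:
        g = grid_of(lat, lon, cell_deg)
        tf = freq[(user, g)] / user_total[user]   # how central this grid is to the user
        idf = math.log(total / grid_total[g])     # how rare this grid is overall
        weights[region] = tf * idf
    return weights
```

With this reading, a grid in which every user appears gets an IDF of zero, so it contributes nothing to distinguishing one user from another.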
5. The method according to any one of claims 1-4, characterized in that the method further comprises:
calculating an IP address similarity between the user identifier to be matched and each other user identifier; and
the determining, according to the location information similarity, of the other user identifier that matches the user identifier to be matched comprises:
determining the other user identifier that matches the user identifier to be matched according to both the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier.
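The claims do not define the IP address similarity in claim 5. A minimal sketch, assuming Jaccard overlap of the two identifiers' IP address sets, is shown below. The function name is hypothetical, and an IDF-style weighting of rare IP addresses would be a reasonable alternative.

```python
def ip_similarity(ips_a: set[str], ips_b: set[str]) -> float:
    """Jaccard overlap of two users' IP address sets (an assumed choice of metric)."""
    if not ips_a or not ips_b:
        return 0.0  # no IP history on either side: treat as dissimilar
    return len(ips_a & ips_b) / len(ips_a | ips_b)
```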
6. The method according to claim 5, characterized in that determining the other user identifier that matches the user identifier to be matched according to the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier comprises:
obtaining feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including: the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier;
obtaining, based on the feature information corresponding to the user identifier to be matched and each other user identifier, the probability that the user identifier to be matched matches each other user identifier by means of a pre-trained ranking model; and
determining that each other user identifier whose corresponding probability is greater than a preset threshold matches the user identifier to be matched.
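Claim 6 does not identify the pre-trained ranking model. As a stand-in, the sketch below assumes a logistic-regression scorer over the two similarity features, followed by the threshold test. The coefficients `WEIGHTS` and `BIAS` are placeholders, not trained values, and the names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    user_id: str
    ip_sim: float   # IP address similarity to the ID being matched
    loc_sim: float  # location information similarity to the ID being matched


# Placeholder coefficients standing in for a pre-trained model; not from the patent.
WEIGHTS = {"ip_sim": 3.0, "loc_sim": 4.0}
BIAS = -3.5


def match_probability(c: Candidate) -> float:
    """Logistic score over the two similarity features."""
    z = BIAS + WEIGHTS["ip_sim"] * c.ip_sim + WEIGHTS["loc_sim"] * c.loc_sim
    return 1.0 / (1.0 + math.exp(-z))


def matched_ids(candidates: list[Candidate], threshold: float = 0.5) -> list[str]:
    """Return candidates whose probability exceeds the threshold, best first."""
    scored = sorted(((match_probability(c), c.user_id) for c in candidates), reverse=True)
    return [uid for p, uid in scored if p > threshold]
```

A real system would fit the coefficients on labelled pairs of identifiers known to belong to the same person. A gradient-boosted or learning-to-rank model could replace the logistic function without changing the threshold step.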
7. The method according to claim 6, characterized in that the user operation information in the user operation information set further includes: terminal type information and operating system information; and
the feature information further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding corresponding center-point coordinates, and the terminal type information and operating system information associated with the user identifier to be matched and with the other user identifier.
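The feature information in claims 6 and 7 could be assembled per pair as sketched below. The `UserRecord` summary is hypothetical. The terminal-type and operating-system features are encoded here as simple equality flags, which is an assumption; a real system might instead one-hot encode the pair of values. Center points are compared by exact equality, so real coordinates would likely need rounding or gridding first.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    # Hypothetical per-user summary built from the user operation information set.
    ips: set[str] = field(default_factory=set)
    centre_points: set[tuple[float, float]] = field(default_factory=set)
    terminal_type: str = ""
    os_name: str = ""


def pair_features(a: UserRecord, b: UserRecord, ip_sim: float, loc_sim: float) -> list[float]:
    """Feature vector for one (to-be-matched, candidate) pair of user identifiers."""
    return [
        ip_sim,
        loc_sim,
        float(len(a.ips & b.ips)),                      # identical IP addresses
        float(len(a.centre_points & b.centre_points)),  # coinciding center points
        1.0 if a.terminal_type == b.terminal_type else 0.0,
        1.0 if a.os_name == b.os_name else 0.0,
    ]
```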
8. The method according to any one of claims 1-4, characterized in that the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched is a first user identifier while each of the other user identifiers is a second user identifier.
9. The method according to claim 8, characterized in that, after obtaining the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, the method further comprises:
selecting, in descending order of location information similarity to the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set; and
the determining, according to the location information similarity, of the other user identifier that matches the user identifier to be matched comprises:
determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
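The candidate-selection step of claim 9 is a top-k selection. A minimal sketch follows. It assumes that the similarity function is passed in as a callable and that k is the "predetermined quantity"; both the function name and this interface are hypothetical.

```python
import heapq
from typing import Callable


def candidate_second_ids(first_id: str, second_ids: list[str],
                         loc_sim: Callable[[str, str], float], k: int) -> list[str]:
    """Keep the k second-type IDs most similar in location to the first-type ID."""
    # heapq.nlargest returns the k best items in descending order of the key.
    return heapq.nlargest(k, second_ids, key=lambda s: loc_sim(first_id, s))
```

Restricting the later ranking model to k candidates bounds its cost, rather than scoring every second identifier in the set.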
10. The method according to claim 9, characterized in that, before determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the method further comprises:
for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between that second user identifier and each first user identifier;
selecting, in descending order of location information similarity to that second user identifier, a predetermined number of first user identifiers, to obtain a candidate first user identifier set; and
if the user identifier to be matched is not in the candidate first user identifier set, deleting that second user identifier from the candidate second user identifier set.
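Claim 10 amounts to a mutual top-k check. A candidate second identifier survives only if the identifier to be matched is also among that candidate's own top-k first identifiers. The sketch below assumes a symmetric similarity callable; the names are hypothetical.

```python
import heapq
from typing import Callable


def mutual_filter(first_id: str, candidates: list[str], first_ids: list[str],
                  loc_sim: Callable[[str, str], float], k: int) -> list[str]:
    """Drop a candidate second ID unless first_id is among ITS top-k first IDs."""
    kept = []
    for s in candidates:
        # loc_sim(f, s): similarity of first-type ID f to second-type ID s.
        top_firsts = heapq.nlargest(k, first_ids, key=lambda f: loc_sim(f, s))
        if first_id in top_firsts:
            kept.append(s)
    return kept
```

The reverse check removes popular second identifiers, for example one that appears near everyone, which would otherwise land in many users' candidate sets.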
11. A user identifier matching device, characterized in that the device comprises:
a location information acquisition unit, configured to analyze a pre-stored user operation information set to obtain at least one positioning region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and a weight for each positioning region, wherein each item of user operation information in the user operation information set includes the following information: a user identifier, an IP address, and positioning-point coordinates;
a location information similarity acquisition unit, configured to obtain, according to the positioning regions over which the IP addresses associated with each user identifier are distributed and the weight of each positioning region, a location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set; and
a matching unit, configured to determine, according to the location information similarity, another user identifier that matches the user identifier to be matched.
12. The device according to claim 11, characterized in that the location information acquisition unit comprises:
a coordinate set acquisition subunit, configured to obtain the set of positioning-point coordinates associated with each IP address recorded in the user operation information set;
a clustering subunit, configured to perform, for each IP address, cluster analysis on the set of positioning-point coordinates associated with the IP address to obtain at least one corresponding cluster, which serves as a positioning region over which the IP address is distributed; and
a weight determination subunit, configured to determine, for each IP address, the weight of each positioning region over which the IP address is distributed.
13. The device according to claim 12, characterized in that the weight determination subunit comprises:
an extensive-IP removal module, configured to delete any IP address for which the number of positioning regions over which it is distributed is greater than a preset number threshold, or for which the average distance between the positioning-point coordinates in a positioning region and the center-point coordinates of that region is greater than a preset distance threshold; and
a weight determination module, configured to determine, for each remaining IP address, the weight of each positioning region over which the IP address is distributed.
14. The device according to claim 12, characterized in that the weight determination subunit comprises:
an initial weight determination module, configured to determine an initial weight of each positioning region according to the number and the extent of the positioning-point coordinates in each positioning region over which the IP address is distributed;
a gridding module, configured to take the center-point coordinates of each positioning region over which the IP addresses associated with a user identifier are distributed as the center-point coordinates corresponding to that user identifier, and to grid the center-point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout, to generate at least two grids;
a frequency acquisition module, configured to obtain, for each user identifier recorded in the user operation information set, the sum of the initial weights of the positioning regions whose center-point coordinates for that user identifier fall in each grid, as the frequency of that grid for that user identifier, and to obtain the sum of the initial weights of all positioning regions whose center-point coordinates fall in each grid, as the total user frequency of that grid; and
a weight calculation module, configured to calculate the weight of each cluster from the frequencies by means of the TF-IDF algorithm.
15. The device according to any one of claims 11-14, characterized in that the device further comprises:
an IP similarity calculation unit, configured to calculate an IP address similarity between the user identifier to be matched and each other user identifier; and
the matching unit is further configured to determine the other user identifier that matches the user identifier to be matched according to both the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier.
16. The device according to claim 15, characterized in that the matching unit comprises:
a feature information acquisition subunit, configured to obtain feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including: the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier;
a ranking subunit, configured to obtain, based on the feature information corresponding to the user identifier to be matched and each other user identifier, the probability that the user identifier to be matched matches each other user identifier by means of a pre-trained ranking model; and
a matching subunit, configured to determine that each other user identifier whose corresponding probability is greater than a preset threshold matches the user identifier to be matched.
17. The device according to claim 16, characterized in that the user operation information in the user operation information set further includes: terminal type information and operating system information; and
the feature information further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding corresponding center-point coordinates, and the terminal type information and operating system information associated with the user identifier to be matched and with the other user identifier.
18. The device according to any one of claims 11-14, characterized in that the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched is a first user identifier while each of the other user identifiers is a second user identifier.
19. The device according to claim 18, characterized in that the device further comprises:
a first selection unit, configured to act after the location information similarity acquisition unit obtains the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set. It selects, in descending order of location information similarity to the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set; and
the matching unit is further configured to determine the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
20. The device according to claim 19, characterized in that the location information similarity acquisition unit is further configured to obtain, for each second user identifier in the candidate second user identifier set, the location information similarity between that second user identifier and each first user identifier. It does so before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set; and
the device further comprises:
a second selection unit, configured to select, in descending order of location information similarity to that second user identifier, a predetermined number of first user identifiers, to obtain a candidate first user identifier set; and
a candidate filtering unit, configured to delete that second user identifier from the candidate second user identifier set when the user identifier to be matched is not in the candidate first user identifier set. It does so before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
CN201610172168.XA 2016-03-24 2016-03-24 User identifier matching method and device Active CN105721629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610172168.XA CN105721629B (en) 2016-03-24 2016-03-24 User identifier matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610172168.XA CN105721629B (en) 2016-03-24 2016-03-24 User identifier matching method and device

Publications (2)

Publication Number Publication Date
CN105721629A true CN105721629A (en) 2016-06-29
CN105721629B CN105721629B (en) 2019-04-26

Family

ID=56159077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610172168.XA Active CN105721629B (en) 2016-03-24 2016-03-24 User identifier matching method and device

Country Status (1)

Country Link
CN (1) CN105721629B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409868A (en) * 2008-12-01 2009-04-15 腾讯科技(深圳)有限公司 Method, system and equipment for matching object in mobile terminal
CN102056079A (en) * 2009-10-30 2011-05-11 ***通信集团上海有限公司 Method, device and system for determining information to be pushed
US20120174205A1 (en) * 2010-12-31 2012-07-05 International Business Machines Corporation User profile and usage pattern based user identification prediction
CN105187237A (en) * 2015-08-12 2015-12-23 百度在线网络技术(北京)有限公司 Method and device for searching associated user identifications

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228187A (en) * 2016-07-21 2016-12-14 贵州力创科技发展有限公司 Individual recognition algorithm model and processing technology based on multi-user detail data
CN106789411B (en) * 2016-12-07 2020-01-21 北京亚鸿世纪科技发展有限公司 Method and device for acquiring active IP data in machine room
CN106789411A (en) * 2016-12-07 2017-05-31 北京亚鸿世纪科技发展有限公司 Method and device for acquiring active IP data in a machine room
US11394728B2 (en) 2017-01-05 2022-07-19 Cisco Technology, Inc. Associating a user identifier detected from web traffic with a client address
US10348745B2 (en) 2017-01-05 2019-07-09 Cisco Technology, Inc. Associating a user identifier detected from web traffic with a client address
CN109104506A (en) * 2017-06-20 2018-12-28 腾讯科技(深圳)有限公司 Method and apparatus for determining domain name mapping rules, and computer-readable storage medium
CN109005513B (en) * 2018-06-26 2021-03-19 北京酷云互动科技有限公司 Mobile phone terminal association method and mobile phone terminal association system
CN109005513A (en) * 2018-06-26 2018-12-14 北京酷云互动科技有限公司 Mobile phone terminal association method and mobile phone terminal association system
CN109447114A (en) * 2018-09-25 2019-03-08 北京酷云互动科技有限公司 Method and system for evaluating the degree of association between places
CN110493368A (en) * 2019-08-21 2019-11-22 北京明略软件***有限公司 Device identifier matching method and device
WO2021093308A1 (en) * 2019-11-13 2021-05-20 百度在线网络技术(北京)有限公司 Method and apparatus for extracting poi name, device, and computer storage medium
US11768892B2 (en) 2019-11-13 2023-09-26 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for extracting name of POI, device and computer storage medium
CN111127094A (en) * 2019-12-19 2020-05-08 秒针信息技术有限公司 Account matching method and device, electronic equipment and storage medium
CN111127094B (en) * 2019-12-19 2023-08-25 秒针信息技术有限公司 Account matching method and device, electronic equipment and storage medium
CN117172792A (en) * 2023-11-02 2023-12-05 赞塔(杭州)科技有限公司 Customer information management method and device

Also Published As

Publication number Publication date
CN105721629B (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN105721629A (en) User identifier matching method and device
CN106446228B (en) Method and device for collecting and analyzing WEB page data
CN111046237B (en) User behavior data processing method and device, electronic equipment and readable medium
CN104077723B (en) Social network recommendation system and method
CN107515915A (en) User identification association method based on user behavior data
CN110516173B (en) Illegal website identification method, device, equipment and medium
CN111815169B (en) Service approval parameter configuration method and device
CN109740129B (en) Report generation method, device and equipment based on blockchain and readable storage medium
CN108255706A (en) Automated test script editing method, device, terminal device and storage medium
CN111177481B (en) User identifier mapping method and device
CN110414613B (en) Method, device and equipment for clustering regions and computer readable storage medium
CN104965846B (en) Method for building virtual persons on a MapReduce platform
CN111612085A (en) Method and device for detecting abnormal point in peer-to-peer group
CN111626767A (en) Resource data distribution method, device and equipment
CN107948312B (en) Information classification and release method and system with position points as information access ports
CN105988998B (en) Relational network construction method and device
CN113850669A (en) User grouping method and device, computer equipment and computer readable storage medium
CN108959289B (en) Website category acquisition method and device
CN107368407A (en) Information processing method and device
CN110532254A (en) Method and apparatus for fusing data tables
CN115936758A (en) Intelligent customer-extending method based on big data and related device
CN105450678B (en) Information determining method and device
CN110019531A (en) Method and apparatus for obtaining a set of similar objects
CN114781517A (en) Risk identification method and device and terminal equipment
CN111860655A (en) User processing method, device and equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant