Summary of the invention
This specification aims to provide a unique visitor statistics method, apparatus, and electronic device that reduce the volume of data transmitted for unique visitor statistics and improve the data-processing efficiency and the computation upper limit of unique visitor statistics.
In a first aspect, an embodiment of this specification provides a unique visitor statistics method, including:
At least one identifier acquisition node acquires the user identifiers of users accessing a website to be counted, and each node deduplicates the user identifiers it has acquired to obtain deduplicated user identifiers;
The at least one identifier acquisition node sends its deduplicated user identifiers to an identifier merge node;
The identifier merge node merges the received deduplicated user identifiers and determines the unique visitor information of the website to be counted.
Further, in another embodiment of the method, the deduplicating of the user identifiers by the at least one identifier acquisition node to obtain deduplicated user identifiers includes:
The at least one identifier acquisition node deduplicates the user identifiers using a set cardinality method to obtain the deduplicated user identifiers.
Further, in another embodiment of the method, the deduplicating of the user identifiers to obtain deduplicated user identifiers includes:
Deduplicating the user identifiers using a Bloom filter to obtain the deduplicated user identifiers.
Further, in another embodiment of the method, the method also includes:
Performing unique visitor statistics in real time or once every preset time period, and updating the unique visitor information.
Further, in another embodiment of the method, the acquiring of the user identifiers by the at least one identifier acquisition node includes:
The at least one identifier acquisition node acquires page view data of the website to be counted and obtains the user identifiers from the page view data.
Further, in another embodiment of the method, the user identifier includes: the device identifier of a client accessing the website to be counted.
In a second aspect, an embodiment of this specification also provides a unique visitor statistics apparatus, including:
An identifier acquisition module, configured to acquire the user identifiers of users accessing a website to be counted;
An identifier deduplication module, configured to deduplicate the user identifiers to obtain deduplicated user identifiers;
An identifier sending module, configured to send the deduplicated user identifiers to an identifier merge node;
An identifier merging module, configured to merge the received deduplicated user identifiers and determine the unique visitor information of the website to be counted.
Further, in another embodiment of the apparatus, the identifier deduplication module is specifically configured to:
Deduplicate the user identifiers using a set cardinality method to obtain the deduplicated user identifiers.
Further, in another embodiment of the apparatus, the identifier deduplication module is specifically configured to:
Deduplicate the user identifiers using a Bloom filter to obtain the deduplicated user identifiers.
Further, in another embodiment of the apparatus, the identifier acquisition node and the identifier merge node are also configured to:
Perform unique visitor statistics in real time or once every preset time period, and update the unique visitor information.
Further, in another embodiment of the apparatus, the identifier acquisition module is specifically configured to:
Acquire page view data of the website to be counted and obtain the user identifiers from the page view data.
Further, in another embodiment of the apparatus, the user identifier includes: the device identifier of a client accessing the website to be counted.
In a third aspect, an embodiment of this specification also provides a unique visitor statistics system, including an identifier merge node and at least one identifier acquisition node. The identifier merge node and the identifier acquisition node each include at least one processor and a memory for storing processor-executable instructions, wherein:
When the processor in the identifier acquisition node executes the instructions, the following steps are implemented:
The identifier acquisition node acquires the user identifiers of users accessing a website to be counted and deduplicates the user identifiers to obtain deduplicated user identifiers;
The deduplicated user identifiers are sent to the identifier merge node;
When the processor in the identifier merge node executes the instructions, the following steps are implemented:
The received deduplicated user identifiers are merged, and the unique visitor information of the website to be counted is determined.
In a fourth aspect, an embodiment of this specification also provides a unique visitor statistics method, including:
Acquiring, by identifier acquisition nodes, the user identifiers of users accessing a website to be counted, and deduplicating the user identifiers in each identifier acquisition node to obtain deduplicated user identifiers;
Sending the deduplicated user identifiers in each identifier acquisition node to an identifier merge node;
Merging the deduplicated user identifiers in the identifier merge node and determining the unique visitor information of the website to be counted.
In a fifth aspect, an embodiment of this specification also provides a unique visitor statistics processing device, including:
At least one processor and a memory for storing processor-executable instructions, wherein the following steps are implemented when the processor executes the instructions:
Acquiring, by identifier acquisition nodes, the user identifiers of users accessing a website to be counted, and deduplicating the user identifiers in each identifier acquisition node to obtain deduplicated user identifiers;
Sending the deduplicated user identifiers in each identifier acquisition node to an identifier merge node;
Merging the deduplicated user identifiers in the identifier merge node and determining the unique visitor information of the website to be counted.
With the unique visitor statistics method, apparatus, processing device, and system provided in this specification, identifier acquisition nodes acquire the user identifiers of users accessing a website to be counted and deduplicate the acquired user identifiers to obtain deduplicated user identifiers. The identifier merge node receives the deduplicated user identifiers from the upstream identifier acquisition nodes, merges them, and determines the unique visitor information of the website to be counted. Because each identifier acquisition node deduplicates the user identifiers before transmitting them, the transmission of the full set of detailed user identifier records to the downstream node is replaced by the transmission of deduplicated user identifier data. This reduces the volume of transmitted data, raises the computation upper limit of unique visitor statistics, and improves data transmission efficiency. In addition, no external storage module is needed and the existing unique visitor statistics system need not be changed, which reduces the cost of unique visitor statistics.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
A unique visitor (UV) may refer to a distinct natural person or client that accesses and browses a web page over the Internet. For example, a local area network presents a single external IP (Internet Protocol Address), but if 10 people access the website from it at the same time, the number of unique visitors is 10 while the number of unique IPs is only 1. Conversely, if a user's connection drops frequently and the user dials up 10 times, opening the counted website each time, the user is counted as only 1 unique visitor while the IP count is 10.
Counting the unique visitors of a website gives the number of users browsing the website and provides a data basis for big data analysis of the website.
In the embodiments of this specification, identifier acquisition nodes acquire user identifiers and deduplicate them before sending them to the next node. The deduplicated user identifiers are sent to the next node, namely the identifier merge node, which merges them and determines the unique visitor information. The transmission of the full set of detailed user identifier data is thereby replaced by the transmission of deduplicated user identifier data, which reduces the volume of transmitted data, improves data transmission efficiency, and raises the computation upper limit of unique visitor statistics. Moreover, the unique visitor statistics method provided in the embodiments of this specification needs neither an external storage device nor any additional apparatus, which reduces the cost of unique visitor statistics.
Specifically, Fig. 1 is a schematic flowchart of the unique visitor statistics method in one embodiment of this specification. As shown in Fig. 1, the unique visitor statistics method provided in one embodiment of this specification may include:
Step 102: at least one identifier acquisition node acquires the user identifiers of users accessing the website to be counted, and each node deduplicates the user identifiers it has acquired to obtain deduplicated user identifiers.
Fig. 2 is a schematic framework diagram of unique visitor statistics in an embodiment of this specification. As shown in Fig. 2, in the unique visitor statistics method provided in the embodiments of this specification, identifier acquisition nodes acquire the user identifiers of users accessing the website to be counted. A user identifier may include the device identifier of the client accessing the website to be counted (for example, a machine number), the user's registration information, and so on. As shown in Fig. 2, there are usually two or more identifier acquisition nodes, not counting failed nodes and nodes that do not participate in unique visitor statistics. A single identifier acquisition node may also be used if required. The user identifiers acquired by different identifier acquisition nodes may differ. After acquiring user identifiers, each identifier acquisition node may deduplicate them, removing the repeated user identifiers, to obtain deduplicated user identifiers.
For example, Fig. 2 shows 2 identifier acquisition nodes, each of which acquires user identifiers. Some users may browse pages of the website to be counted several times from one client, so the user identifiers acquired by an identifier acquisition node may contain repeats. Suppose one identifier acquisition node acquires the user identifiers (u0, u2, u5, u1, u2, u6, u4, u0, u6, u3, u7). The user identifiers u0, u2, and u6 are repeated. The node may deduplicate the acquired identifiers, keeping one copy of each repeated user identifier and deleting the extra copies, which yields the deduplicated user identifiers u0, u2, u5, u1, u6, u4, u3, u7. The other identifier acquisition node may perform the same operation to obtain its deduplicated user identifiers. Note that user identifiers such as u0 and u2 in the embodiments of this specification are only schematic; the actual representation of a user identifier can be chosen as required, for example the machine number or another device identifier of the client browsing the website, and is not specifically limited in the embodiments of this specification.
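The deduplication in the example above can be sketched as follows. This is a minimal illustration that assumes the identifiers fit in memory and are plain strings; the function name `deduplicate` is hypothetical and not taken from the specification.

```python
def deduplicate(user_ids):
    """Keep the first occurrence of each identifier, preserving arrival order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so repeated identifiers collapse onto their first occurrence.
    return list(dict.fromkeys(user_ids))


# Identifiers acquired by one identifier acquisition node (from the example above).
acquired = ["u0", "u2", "u5", "u1", "u2", "u6", "u4", "u0", "u6", "u3", "u7"]
deduplicated = deduplicate(acquired)
print(deduplicated)  # ['u0', 'u2', 'u5', 'u1', 'u6', 'u4', 'u3', 'u7']
```

An exact, in-memory approach like this only stands in for the step; the set cardinality and Bloom filter embodiments described later trade exactness for a much smaller footprint.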
In one or more embodiments of this specification, acquiring the user identifiers may include: the identifier acquisition node acquires page view data of the website to be counted and obtains the user identifiers from the page view data. Page view data can represent the page views (PV) of the website to be counted: every access by a user to any page of the website is recorded once, and repeated accesses by a user to the same page accumulate. From the page view data of the website to be counted, an identifier acquisition node can obtain the user identifier associated with each access to the website. Different identifier acquisition nodes may acquire the page view data of different pages of the website to be counted and then collect the user identifiers of the users who accessed those pages. Multiple identifier acquisition nodes can be set up, each acquiring the user identifiers for different pages of the website to be counted. In that case, the identifier acquisition nodes may acquire the user identifiers for their pages at the same time or at different times, which is not specifically limited in the embodiments of this specification. Acquiring user identifiers from the page view data of the website to be counted is simple and easy to operate.
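As a rough illustration of obtaining user identifiers from page view data, the sketch below assumes each page view record is a dictionary with `url` and `device_id` fields. That record layout, the field names, and the function name are assumptions made for the example; the specification does not define a log format.

```python
from collections import Counter

# Hypothetical page view log: one record per page access.
page_views = [
    {"url": "/home", "device_id": "u0"},
    {"url": "/home", "device_id": "u2"},
    {"url": "/cart", "device_id": "u0"},
    {"url": "/home", "device_id": "u0"},
]


def user_ids_from_page_views(records):
    """Return the user identifier attached to every recorded access."""
    return [record["device_id"] for record in records]


# PV accumulates per access, so repeated visits to the same page all count.
pv_per_page = Counter(record["url"] for record in page_views)
user_ids = user_ids_from_page_views(page_views)

print(pv_per_page["/home"])   # page views of /home
print(len(set(user_ids)))     # distinct visitors seen by this node
```

The gap between the two printed numbers is exactly the redundancy that deduplication at the acquisition node removes before transmission.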
Different identifier acquisition nodes may also acquire the user identifiers of users who access the same page, or all pages, of the website to be counted during different time periods. For example, identifier acquisition node 1 may acquire the user identifiers of users accessing the website to be counted between 8:00 and 9:00, and identifier acquisition node 2 may acquire those of users accessing it between 9:00 and 10:00. The rule by which identifier acquisition nodes acquire user identifiers can be set as required and is not specifically limited in the embodiments of this specification.
Step 104: the at least one identifier acquisition node sends its deduplicated user identifiers to the identifier merge node.
As shown in Fig. 2, after deduplicating the acquired user identifiers, each identifier acquisition node may send its deduplicated user identifiers to the downstream identifier merge node. In one or more embodiments of this specification, there may be only one identifier merge node, to which every identifier acquisition node sends its deduplicated user identifiers. The identifier acquisition nodes may send their deduplicated user identifiers to the identifier merge node at the same time or at different points in time. An identifier acquisition node may send the deduplicated user identifiers to the identifier merge node immediately after obtaining them, or after a certain delay, which is not specifically limited in the embodiments of this specification.
Step 106: the identifier merge node merges the received deduplicated user identifiers and determines the unique visitor information.
The deduplicated user identifiers received by the identifier merge node are sent by multiple identifier acquisition nodes, so they may still contain repeats across nodes. For example, after identifier acquisition node 1 deduplicates the user identifiers it acquired, its deduplicated user identifiers are (u0, u2, u5, u1, u6, u4, u3, u7); after identifier acquisition node 2 deduplicates the user identifiers it acquired, its deduplicated user identifiers are (u2, u4, u0, u8, u3, u1, u7, u3). When merging the received deduplicated user identifiers, the identifier merge node may deduplicate them again across nodes, keeping one copy of each repeated user identifier and deleting the extra copies. Merging the deduplicated user identifiers (u0, u2, u5, u1, u6, u4, u3, u7) and (u2, u4, u0, u8, u3, u1, u7, u3) received in the above example gives (u0, u2, u5, u1, u6, u4, u3, u7, u8). Once the deduplicated user identifiers are merged, the unique visitor information of the website to be counted can be determined. The unique visitor information may include the user identifiers of the unique visitors who accessed the website to be counted and/or the number of unique visitors. The determined unique visitor information can be saved; as shown in Fig. 2, it can be saved to a database to provide a data basis for subsequent data analysis of the website to be counted.
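The merge step can be sketched as a union across the lists sent by the acquisition nodes. The function name `merge_deduplicated` and the dictionary layout of the unique visitor information are illustrative assumptions; the input lists are the ones from the example above.

```python
def merge_deduplicated(*node_lists):
    """Union the identifier lists sent by the acquisition nodes, keeping first-seen order."""
    merged = {}
    for ids in node_lists:
        for uid in ids:
            merged.setdefault(uid, None)  # repeats across nodes are kept only once
    return list(merged)


node1 = ["u0", "u2", "u5", "u1", "u6", "u4", "u3", "u7"]
node2 = ["u2", "u4", "u0", "u8", "u3", "u1", "u7", "u3"]

merged = merge_deduplicated(node1, node2)
unique_visitor_info = {"user_ids": merged, "count": len(merged)}
print(unique_visitor_info["count"])  # 9
```

Only `u8` is new in the second list, so the merged result has 9 identifiers, matching the merged list given in the text.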
In one or more embodiments of this specification, the user identifiers of the website to be counted can be acquired in real time so that unique visitors are counted in real time. Real-time statistics can be understood as updating the unique visitor statistics whenever a user access to the website to be counted is detected, or as monitoring access to the website to be counted in real time and, when a user access is detected, performing unique visitor statistics over a specified time period. In one embodiment of this specification, access to the website to be counted may instead be checked once every preset time period, with unique visitor statistics performed each time. For example, once every preset time period the identifier acquisition nodes acquire the user identifiers of the website to be counted, deduplicate them to obtain deduplicated user identifiers, and send these to the identifier merge node; the identifier merge node merges the received deduplicated user identifiers and determines the unique visitor information of the website to be counted, yielding updated unique visitor information. The preset time period can be chosen as required and is not specifically limited in the embodiments of this specification. The unique visitor information obtained in each round of statistics can be saved to provide a data basis for subsequent data analysis of the website to be counted.
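The periodic variant can be sketched as a loop that runs one acquire, deduplicate, merge round per interval and stores each result. Everything here is an assumption made for illustration: the `fetch` callables stand in for acquisition nodes, and the `sleep` parameter is injectable so the example runs without actually waiting.

```python
import time


def run_statistics_round(fetch_fns):
    """One round: each node deduplicates its identifiers, then the results are merged."""
    merged = set()
    for fetch in fetch_fns:
        merged |= set(fetch())  # set(...) is the per-node deduplication
    return {"timestamp": time.time(), "uv_count": len(merged)}


def run_periodically(fetch_fns, interval_seconds, rounds, sleep=time.sleep):
    """Repeat the statistics round every interval and keep the history of results."""
    history = []
    for i in range(rounds):
        history.append(run_statistics_round(fetch_fns))
        if i + 1 < rounds:
            sleep(interval_seconds)
    return history


# Stub acquisition nodes; a no-op sleep keeps the demonstration instant.
nodes = [lambda: ["u0", "u1", "u0"], lambda: ["u1", "u2"]]
history = run_periodically(nodes, interval_seconds=60, rounds=2, sleep=lambda s: None)
print([entry["uv_count"] for entry in history])
```

Keeping every round in `history` mirrors the statement that each round's unique visitor information can be saved for later analysis.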
In addition, in one or more embodiments of this specification, the unique visitor information of the website to be counted can be computed in real time, or computed for a specified time period so that access to the website to be counted can be analyzed across different time periods.
In the unique visitor statistics method provided in the embodiments of this specification, identifier acquisition nodes acquire the user identifiers of users accessing the website to be counted and deduplicate them to obtain deduplicated user identifiers. The identifier merge node receives the deduplicated user identifiers from the upstream identifier acquisition nodes, merges them, and determines the unique visitor information of the website to be counted. Because each identifier acquisition node deduplicates the user identifiers before transmitting them, the transmission of the full set of detailed user identifier records to the downstream node is replaced by the transmission of deduplicated user identifier data. This reduces the volume of transmitted data, raises the computation upper limit of unique visitor statistics, and improves data transmission efficiency. In addition, no external storage module is needed and the existing unique visitor statistics system need not be changed, which reduces the cost of unique visitor statistics.
On the basis of the above embodiments, in one embodiment of this specification, deduplicating the user identifiers to obtain deduplicated user identifiers may include:
Deduplicating the user identifiers using a set cardinality method to obtain the deduplicated user identifiers.
A set cardinality method estimates the number of distinct elements in a set. It neither estimates the total data volume nor computes the cardinality exactly; instead it uses a probabilistic algorithm to estimate the cardinality of the data with low space and time cost and a very low error. The cardinality is the number of distinct elements in a set. Each identifier acquisition node can apply the set cardinality method to the user identifiers it has acquired and generate a cardinality data block, which may contain the deduplicated user identifiers. Deduplicating the acquired user identifiers with a set cardinality method can be understood as compressing the information in the full set of detailed user identifier records (for example, 1 billion detailed user identifier records, about 50GB, can be compressed to 64KB), which reduces the volume of data subsequently transmitted.
In one embodiment of this specification, when the user identifiers are deduplicated using a set cardinality method, the identifier merge node can use the mergeability of set cardinality data to merge the deduplicated user identifiers it receives.
For example, suppose the deduplicated user identifiers obtained by identifier acquisition node 1 after set cardinality deduplication are (u0, u2, u5, u1, u6, u4, u3, u7), and those obtained by identifier acquisition node 2 are (u2, u4, u0, u8, u3, u1, u7, u3). Using the mergeability of the set cardinality method, the identifier merge node obtains the merged user identifiers (u0, u2, u5, u1, u6, u4, u3, u7, u8), so the number of unique visitors is 9.
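The specification does not name a particular set cardinality algorithm. As one possible illustration, the sketch below uses HyperLogLog, a widely used probabilistic cardinality estimator whose sketches can be merged by taking the register-wise maximum. The class, its parameter `p`, and the use of SHA-1 as the hash are all assumptions of this example, and the result is an estimate rather than an exact count.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative only, not tuned for production)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 well-mixed bits from SHA-1 of the identifier.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        index = h >> (64 - self.p)                  # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Mergeability: the union's sketch is the register-wise maximum."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


node1, node2 = HyperLogLog(), HyperLogLog()
for uid in ["u0", "u2", "u5", "u1", "u6", "u4", "u3", "u7"]:
    node1.add(uid)
for uid in ["u2", "u4", "u0", "u8", "u3", "u1", "u7", "u3"]:
    node2.add(uid)

# Only the fixed-size registers travel to the merge node, never the identifier list.
print(round(node1.merge(node2).count()))  # estimated number of unique visitors
```

The sketch size depends only on the number of registers, not on how many identifiers were added, which is what makes the transmitted volume independent of traffic.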
In another embodiment of this specification, a Bloom filter can also be used to deduplicate the user identifiers and obtain the deduplicated user identifiers. A Bloom filter consists of a very long binary vector and a series of random mapping functions, and can be used to test whether an element is in a set. An identifier acquisition node can use a Bloom filter to check, one by one, whether each user identifier is already in the set of deduplicated user identifiers, thereby deduplicating the user identifiers. Other methods can also be chosen to deduplicate the user identifiers as required, which is not specifically limited in the embodiments of this specification.
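A Bloom filter based deduplication pass might look like the following sketch. The bit-array size, the number of hash functions, and the double-hashing scheme built on SHA-256 are assumptions chosen for the example. A Bloom filter never misses an identifier it has already seen, but it can occasionally report an unseen identifier as present, so the output may omit a small fraction of genuinely new identifiers.

```python
import hashlib


class BloomFilter:
    """Bit vector plus k hash positions per item (illustrative parameters)."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def deduplicate_with_bloom(user_ids):
    """Emit each identifier the first time the filter has not seen it."""
    seen = BloomFilter()
    unique = []
    for uid in user_ids:
        if uid not in seen:
            seen.add(uid)
            unique.append(uid)
    return unique


acquired = ["u0", "u2", "u5", "u1", "u2", "u6", "u4", "u0", "u6", "u3", "u7"]
print(deduplicate_with_bloom(acquired))
```

The filter's memory is fixed by `num_bits` regardless of how many identifiers pass through, at the cost of a tunable false-positive rate.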
By deduplicating the acquired user identifiers with a set cardinality method or a Bloom filter, the embodiments of this specification compress the full user identifier information, reduce the volume of data subsequently transmitted, and raise the computation upper limit and data-processing efficiency of unique visitor statistics.
Fig. 3 is a schematic framework diagram of unique visitor statistics in the prior art. As shown in Fig. 3, in a prior-art unique visitor statistics method that deduplicates using external storage, mapper nodes (for example, the mapper0 node and the mapper1 node) acquire user identifiers and, among all the acquired user identifiers, send identical user identifiers to the same downstream merge node (for example, the merge0 node or the merge1 node). For example, the mapper0 node sends the acquired user identifiers (u0, u4, u2, u6 ...) to the merge0 node and the acquired user identifiers (u1, u3, u7, u5 ...) to the merge1 node. The mapper1 node sends the acquired user identifiers (u2, u4, u0, u8 ...) to the merge0 node and the acquired user identifiers (u3, u1, u7, u3 ...) to the merge1 node. The merge0 node and the merge1 node deduplicate the repeated user identifiers by means of an external storage device and then send the deduplicated user identifiers to a statistics node, which adds them up, determines the unique visitor information, and stores it in a database. Deduplicating user identifiers through an external storage device puts too much pressure on the external storage. When the number of users is very large, both the computation response (the response time for reading stored user IDs) and the storage cost are too high (the user identifiers of all users must be stored, and 1 billion detailed user identifier records need about 50GB of space). Such a system can support unique visitor computation at about 1 million identifiers per second (100w/s), and the main bottleneck is the response speed of the storage node. Furthermore, when the mapper nodes send user identifier data to the merge nodes, they transmit the full user identifier data, so the volume of transmitted data is large and data transmission efficiency suffers.
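For contrast, the prior-art flow of Fig. 3 can be sketched as below. The routing rule (numeric suffix modulo the number of merge nodes) is a hypothetical choice that happens to reproduce the split shown in the example; the in-memory sets stand in for the external storage device; and only the identifiers explicitly listed in the text are used, since the elided ones are unknown.

```python
from collections import defaultdict


def route_to_merge_node(user_id, num_merge_nodes):
    """Hypothetical routing rule: the same identifier always reaches the same merge node."""
    return int(user_id[1:]) % num_merge_nodes


# Full (not deduplicated) identifiers held by each mapper node, as listed in the text.
mapper_outputs = {
    "mapper0": ["u0", "u4", "u2", "u6", "u1", "u3", "u7", "u5"],
    "mapper1": ["u2", "u4", "u0", "u8", "u3", "u1", "u7", "u3"],
}

external_storage = defaultdict(set)  # stands in for the external storage device
transmitted = 0
for ids in mapper_outputs.values():
    for uid in ids:
        # Every record crosses the network, repeats included.
        external_storage[route_to_merge_node(uid, 2)].add(uid)
        transmitted += 1

# The statistics node adds up the per-merge-node distinct counts.
uv_count = sum(len(ids) for ids in external_storage.values())
print(transmitted, uv_count)  # 16 9
```

All 16 records are transmitted and every distinct identifier must be kept in storage, which is the cost the deduplicate-before-send approach avoids.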
The unique visitor statistics method provided in the embodiments of this specification optimizes and transforms the scheme of transmitting the full user identifier data, replacing it with the transmission of deduplicated user identifier data (for example, the transmission of set cardinality data), which reduces the volume of transmitted data. For example, unique visitor computation at 10 million identifiers per second (1000w/s) would be expected to need 500MB/s of bandwidth, whereas the embodiments of this specification can use 64KB set cardinality data: assuming 100 concurrent nodes, only 6.4MB/s of bandwidth is needed to complete the unique visitor computation. In testing, one embodiment of this specification reached unique visitor computation at 50 million identifiers per second (5000w/s).
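The bandwidth figures above can be checked with a short calculation. Two assumptions are made explicit here: the roughly 50 bytes per identifier is inferred from the earlier statement that 1 billion records take about 50GB, and each of the 100 nodes is assumed to send one 64KB block per second.

```python
BYTES_PER_ID = 50_000_000_000 // 1_000_000_000   # 50 GB / 1e9 records = 50 bytes (inferred)

ids_per_second = 10_000_000                       # 1000w/s, i.e. 10 million per second
full_bandwidth = ids_per_second * BYTES_PER_ID    # bytes/s when sending every record

block_bytes = 64 * 1000                           # one 64 KB set cardinality block
concurrent_nodes = 100
sketch_bandwidth = block_bytes * concurrent_nodes  # bytes/s, one block per node per second (assumed)

print(full_bandwidth / 1e6)     # 500.0 MB/s
print(sketch_bandwidth / 1e6)   # 6.4 MB/s
print(full_bandwidth / sketch_bandwidth)  # reduction factor
```

Under these assumptions the transmitted volume drops by a factor of about 78, and it no longer grows with the number of identifiers per second.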
The method embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. For related details, refer to the description of the method embodiments.
Based on the unique visitor statistics method described above, one or more embodiments of this specification also provide a unique visitor statistics apparatus. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the method described in the embodiments of this specification, combined with the necessary hardware. Based on the same inventive concept, the apparatus in one or more embodiments of this specification is described in the following embodiments. Because the apparatus solves the problem in a way similar to the method, the implementation of the apparatus may refer to the implementation of the foregoing method, and repeated details are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated.
Specifically, Fig. 4 is a schematic diagram of the module structure of one embodiment of the unique visitor statistics apparatus provided in this specification. As shown in Fig. 4, the unique visitor statistics apparatus provided in this specification includes an identifier acquisition module 41, an identifier deduplication module 42, an identifier sending module 43, and an identifier merging module 44, wherein:
The identifier acquisition module 41 may be configured to acquire the user identifiers of users accessing the website to be counted;
The identifier deduplication module 42 may be configured to deduplicate the user identifiers to obtain deduplicated user identifiers;
The identifier sending module 43 may be configured to send the deduplicated user identifiers to the identifier merge node;
The identifier merging module 44 may be configured to merge the received deduplicated user identifiers and determine the unique visitor information of the website to be counted.
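One way to picture how the four modules fit together is the single-process sketch below. The class and method names, the `device_id` field, and the in-memory `merge_inbox` list standing in for the link to the identifier merge node are all illustrative assumptions, not the structure defined by Fig. 4.

```python
class UniqueVisitorApparatus:
    """Toy composition of modules 41 to 44 in one process (illustrative only)."""

    def __init__(self):
        self.merge_inbox = []  # stands in for the channel to the identifier merge node

    def acquire(self, page_views):
        """Identifier acquisition module 41: read identifiers from page view records."""
        return [record["device_id"] for record in page_views]

    def deduplicate(self, user_ids):
        """Identifier deduplication module 42: keep one copy of each identifier."""
        return list(dict.fromkeys(user_ids))

    def send(self, deduplicated_ids):
        """Identifier sending module 43: hand the deduplicated batch downstream."""
        self.merge_inbox.append(deduplicated_ids)

    def merge(self):
        """Identifier merging module 44: union all batches and report the result."""
        merged = list(dict.fromkeys(uid for batch in self.merge_inbox for uid in batch))
        return {"user_ids": merged, "count": len(merged)}


apparatus = UniqueVisitorApparatus()
batches = [
    [{"device_id": "u0"}, {"device_id": "u1"}, {"device_id": "u0"}],
    [{"device_id": "u1"}, {"device_id": "u2"}],
]
for batch in batches:
    apparatus.send(apparatus.deduplicate(apparatus.acquire(batch)))

info = apparatus.merge()
print(info["count"])  # 3
```

In a distributed deployment the first three methods would run on each acquisition node and `merge` on the merge node; they are combined here only to keep the example self-contained.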
In the unique visitor statistics apparatus provided in the embodiments of this specification, identifier acquisition nodes acquire the user identifiers of users accessing the website to be counted and deduplicate them to obtain deduplicated user identifiers. The identifier merge node receives the deduplicated user identifiers from the upstream identifier acquisition nodes, merges them, and determines the unique visitor information of the website to be counted. Because each identifier acquisition node deduplicates the user identifiers before transmitting them, the transmission of the full set of detailed user identifier records to the downstream node is replaced by the transmission of deduplicated user identifier data. This reduces the volume of transmitted data, raises the computation upper limit of unique visitor statistics, and improves data transmission efficiency. In addition, no external storage module is needed and the existing unique visitor statistics system need not be changed, which reduces the cost of unique visitor statistics.
On the basis of the above embodiments, the identifier deduplication module is specifically configured to:
Deduplicate the user identifiers using a set cardinality method to obtain the deduplicated user identifiers.
On the basis of the above embodiments, the identifier deduplication module is specifically configured to:
Deduplicate the user identifiers using a Bloom filter to obtain the deduplicated user identifiers.
The unique visitor statistics device provided by the embodiments of this specification can use a
set cardinality method or a Bloom filter to deduplicate the acquired user identifiers. This
compresses the full set of user identifier information, reduces the volume of data transmitted
afterwards, and improves the upper limit of the computation and the data processing efficiency of
unique visitor statistics.
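As an illustration of the Bloom filter option, the following is a minimal sketch of how an
identifier acquisition node might deduplicate identifiers. The class name BloomFilter, the function
deduplicate, the bit-array size, and the number of hash functions are all assumptions made for this
example and are not specified by the embodiments. A Bloom filter can report false positives, so a
small fraction of genuinely new identifiers may be dropped when the filter is undersized.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; the size and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def deduplicate(user_ids, bloom=None):
    """Return identifiers in first-seen order, skipping any the filter has seen."""
    if bloom is None:
        bloom = BloomFilter()
    unique = []
    for uid in user_ids:
        if uid not in bloom:
            bloom.add(uid)
            unique.append(uid)
    return unique
```

Passing the same BloomFilter instance to successive calls lets a node keep deduplicating across
batches without storing the identifiers themselves.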
On the basis of the above embodiments, the identifier acquisition node and the identifier merging
node are further configured to:
perform unique visitor statistics in real time or once every preset time period, and update the
unique visitor information.
The embodiments of this specification can thus perform unique visitor statistics in real time or
periodically, providing a data basis for subsequent data analysis of the website to be counted.
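A periodic update of this kind can be sketched as a simple loop. The function name
run_periodic_statistics, its parameters, and the injectable sleep callable are hypothetical and
serve only to illustrate updating the unique visitor information once per preset interval.

```python
import time


def run_periodic_statistics(fetch_new_ids, interval_seconds, rounds, sleep=time.sleep):
    """Update the unique visitor set once per interval for a fixed number of rounds.

    fetch_new_ids is a hypothetical callable returning the identifiers
    collected since the previous round.
    """
    unique_visitors = set()
    history = []  # unique visitor count recorded after each round
    for round_index in range(rounds):
        unique_visitors.update(fetch_new_ids())
        history.append(len(unique_visitors))
        if round_index < rounds - 1:
            sleep(interval_seconds)  # wait for the next preset interval
    return unique_visitors, history
```

Setting a very small interval approximates the real-time case; a production system would more
likely rely on a scheduler than on a blocking loop.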
On the basis of the above embodiments, the identifier acquisition module is specifically
configured to:
acquire the page view data of the website to be counted, and obtain the user identifiers from the
page view data.
This provides a data basis for subsequent data analysis of the website to be counted.
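The extraction of user identifiers from page view data might look like the sketch below. The record
layout (dictionaries with "site" and "device_id" keys) and the function name extract_user_ids are
assumptions for illustration; the embodiments do not prescribe a page view data format.

```python
def extract_user_ids(page_views, site):
    """Collect the identifier from every page view record of the given site.

    Each record is assumed to be a dict with hypothetical "site" and
    "device_id" keys; records without an identifier are skipped.
    """
    user_ids = []
    for record in page_views:
        if record.get("site") != site:
            continue
        device_id = record.get("device_id")
        if device_id:
            user_ids.append(device_id)
    return user_ids
```

The returned list still contains repeats, one entry per page view, which is why a deduplication
step follows before transmission.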
On the basis of the above embodiments, the user identifier includes: the device identifier of a
client that accesses the website to be counted.
By using the device identifiers of the clients that access the website to be counted, the
embodiments of this specification can accurately count the number of visitors to the website and
provide a data basis for subsequent data analysis of the website to be counted.
It should be noted that the device described above may also include other implementations according
to the description of the method embodiments. For the specific implementations, refer to the
description of the related method embodiments; they are not repeated here one by one.
Fig. 5 is a schematic structural diagram of the unique visitor statistics system provided by an
embodiment of this specification. As shown in Fig. 5, the unique visitor statistics system in one
embodiment of this specification may include an identifier merging node and at least one identifier
acquisition node; there is usually one identifier merging node. The identifier merging node and the
identifier acquisition node each include at least one processor and a memory for storing
processor-executable instructions, and the processors implement the following steps when executing
the instructions:
When executing the instructions, the processor in the identifier acquisition node implements the
following steps:
acquiring the user identifiers that access the website to be counted, and deduplicating the user
identifiers to obtain deduplicated user identifiers;
sending the deduplicated user identifiers to the identifier merging node.
When executing the instructions, the processor in the identifier merging node implements the
following step:
merging the received deduplicated user identifiers to determine the unique visitor information of
the website to be counted.
In the unique visitor statistics system provided by the embodiments of this specification, the
identifier acquisition nodes acquire the user identifiers that access the website to be counted and
deduplicate the acquired user identifiers to obtain deduplicated user identifiers. The identifier
merging node receives the deduplicated user identifiers from the upstream identifier acquisition
nodes and merges the received deduplication results to determine the unique visitor information of
the website to be counted. Because each identifier acquisition node deduplicates the user
identifiers it has acquired before transmitting them, what the node sends to the downstream node
changes from the full detailed list of user identifiers to the deduplicated user identifiers. This
reduces the volume of transmitted data, raises the upper limit of the unique visitor computation,
and improves data transmission efficiency.
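The division of work between the two kinds of node can be sketched with exact sets. The function
names acquisition_node and merging_node are hypothetical, and Python sets stand in for whatever
deduplication structure an implementation actually uses; the point is only that each node forwards
its distinct identifiers rather than its full page view detail.

```python
def acquisition_node(raw_user_ids):
    """Deduplicate locally so only distinct identifiers are transmitted."""
    return set(raw_user_ids)


def merging_node(deduplicated_sets):
    """Merge the per-node results; the same visitor may appear on several nodes."""
    merged = set()
    for ids in deduplicated_sets:
        merged |= ids
    return merged


# Two nodes observe seven page views in total but transmit only four identifiers.
node_a = acquisition_node(["u1", "u2", "u1", "u1"])
node_b = acquisition_node(["u2", "u3", "u3"])
unique_visitors = merging_node([node_a, node_b])
```

Here the merging node still has to remove "u2", which both nodes reported, so the second
deduplication pass at the merging node is necessary even though every input is already
deduplicated.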
Fig. 6 is a schematic flowchart of the unique visitor statistics method provided by another
embodiment of this specification. As shown in Fig. 6, the unique visitor statistics method provided
by the embodiments of this specification may include:
Step 602: Use the identifier acquisition nodes to acquire the user identifiers that access the
website to be counted, and deduplicate the user identifiers in each identifier acquisition node to
obtain deduplicated user identifiers.
For the definition of the user identifier, refer to the description of the above embodiments; it is
not repeated here. The embodiments of this specification may use the identifier acquisition nodes to
acquire the user identifiers that access the website to be counted. For the specific way of
acquiring the user identifiers, refer to the description of the above embodiments, for example,
obtaining them from page view data; this is not repeated here. After the user identifiers are
acquired, the user identifiers in each identifier acquisition node may be deduplicated, for example
by a set cardinality method, a Bloom filter, or a similar method, to obtain the deduplicated user
identifiers corresponding to each identifier acquisition node. The number of identifier acquisition
nodes may be two or more; of course, it may also be one, depending on actual needs.
Step 604: Send the deduplicated user identifiers in each identifier acquisition node to the
identifier merging node.
After the deduplicated user identifiers are obtained, the deduplicated user identifiers in each
identifier acquisition node may be sent to the identifier merging node. Here, "each identifier
acquisition node" may mean all identifier acquisition nodes that participate in the unique visitor
statistics, or may mean the identifier acquisition nodes that remain after removing nodes in
special circumstances, such as nodes that acquired no user identifiers, nodes that did not perform
deduplication, or nodes that failed.
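The exclusion of nodes in special circumstances can be sketched as a filter over per-node reports.
The NodeReport structure, its field names, and the function nodes_to_forward are hypothetical; the
embodiments only state which kinds of node may be left out.

```python
from dataclasses import dataclass, field


@dataclass
class NodeReport:
    """Hypothetical status record for one identifier acquisition node."""
    node_id: str
    deduplicated_ids: set = field(default_factory=set)
    deduplicated: bool = True   # False if the node did not perform deduplication
    failed: bool = False        # True if the node failed


def nodes_to_forward(reports):
    """Keep nodes that are healthy, have deduplicated, and hold identifiers."""
    return [
        report
        for report in reports
        if not report.failed and report.deduplicated and report.deduplicated_ids
    ]
```

Only the reports returned by nodes_to_forward would then have their identifiers sent on to the
identifier merging node.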
Step 606: Merge the deduplicated user identifiers in the identifier merging node to determine the
unique visitor information of the website to be counted.
Merging the deduplicated user identifiers in the identifier merging node may mean further
deduplicating the deduplicated user identifiers sent by the identifier acquisition nodes. For
example, the mergeability of set cardinality may be used to delete the user identifiers that are
repeated among the deduplicated user identifiers, so as to obtain the unique visitor information of
the website to be counted. The unique visitor information may include the device identifiers of the
clients of the unique visitors who accessed the website to be counted, the registration information
of the users, and so on. For details, refer to the description of the above embodiments; they are
not repeated here.
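The embodiments do not name a specific set cardinality algorithm. As one possible illustration, the
sketch below uses HyperLogLog, a widely known cardinality estimator whose per-node sketches can be
merged by taking register-wise maxima. The choice of HyperLogLog, the class and method names, and
the precision value are all assumptions of this example. Note that such a sketch yields an
estimated visitor count rather than the list of identifiers.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch; precision 12 gives 4096 registers."""

    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision
        self.registers = [0] * self.m

    def add(self, item):
        # Use the first 64 bits of SHA-256 as the hash value.
        h = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        index = h >> (64 - self.p)               # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Fold another sketch into this one; equivalent to sketching the union."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw
```

In this arrangement each identifier acquisition node would send only its registers, whose size is
fixed by the precision regardless of how many identifiers the node has seen, and the identifier
merging node would call merge on the received sketches before calling estimate.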
In the unique visitor statistics method provided by the embodiments of this specification, the
identifier acquisition nodes acquire the user identifiers that access the website to be counted and
deduplicate the acquired user identifiers to obtain deduplicated user identifiers. The identifier
merging node receives the deduplicated user identifiers from the upstream identifier acquisition
nodes and merges the received deduplication results to determine the unique visitor information of
the website to be counted. Because each identifier acquisition node deduplicates the user
identifiers it has acquired before transmitting them, what the node sends to the downstream node
changes from the full detailed list of user identifiers to the deduplicated user identifiers. This
reduces the volume of transmitted data, raises the upper limit of the unique visitor computation,
and improves data transmission efficiency. In addition, no external storage module is required and
the original unique visitor statistics system does not need to be changed, which reduces the cost
of unique visitor statistics.
The method or device described in the above embodiments of this specification may implement its
business logic through a computer program recorded on a storage medium. The storage medium can be
read and executed by a computer to achieve the effects of the solutions described in the
embodiments of this specification. Therefore, this specification also provides a unique visitor
statistics processing device, including a processor and a memory storing processor-executable
instructions, where the instructions, when executed by the processor, implement the following
steps:
using the identifier acquisition nodes to acquire the user identifiers that access the website to
be counted, and deduplicating the user identifiers in each identifier acquisition node to obtain
deduplicated user identifiers;
sending the deduplicated user identifiers in each identifier acquisition node to the identifier
merging node;
merging the deduplicated user identifiers in the identifier merging node to determine the unique
visitor information of the website to be counted.
The storage medium may include a physical device for storing information, which usually digitizes
the information and then stores it in a medium that uses electrical, magnetic, or optical means.
The storage medium may include: devices that store information by electrical means, such as
various memories, for example RAM, ROM, and USB flash drives; devices that store information by
magnetic means, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, and
magnetic bubble memories; and devices that store information by optical means, such as CDs or DVDs.
Of course, there are also other kinds of readable storage media, such as quantum memories and
graphene memories.
It should be noted that the processing device described above may also include other
implementations according to the description of the method embodiments. For the specific
implementations, refer to the description of the related method embodiments; they are not repeated
here one by one.
The method embodiments provided by the embodiments of this specification may be executed in a
mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on
a server as an example, Fig. 7 is a hardware block diagram of a unique visitor statistics server to
which an embodiment of the present invention is applied. As shown in Fig. 7, the server 10 may
include one or more processors 100 (only one is shown in the figure; the processor 100 may include,
but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic
device (FPGA)), a memory 200 for storing data, and a transmission module 300 for communication
functions. Those of ordinary skill in the art will understand that the structure shown in Fig. 7 is
only illustrative and does not limit the structure of the above electronic device. For example, the
server 10 may include more or fewer components than shown in Fig. 7, may include other processing
hardware such as a database, a multi-level cache, or a GPU, or may have a configuration different
from that shown in Fig. 7.
The memory 200 may be used to store software programs and modules of application software, such as
the program instructions/modules corresponding to the unique visitor statistics method in the
embodiments of the present invention. The processor 100 executes various functional applications
and data processing by running the software programs and modules stored in the memory 200. The
memory 200 may include a high-speed random access memory and may also include a non-volatile
memory, such as one or more magnetic storage devices, flash memories, or other non-volatile
solid-state memories. In some examples, the memory 200 may further include memories located
remotely from the processor 100, and these remote memories may be connected to the server 10
through a network. Examples of such networks include, but are not limited to, the Internet,
intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 300 is used to receive or send data via a network. A specific example of
the above network may include a wireless network provided by the communication provider of the
server 10. In one example, the transmission module 300 includes a network adapter (Network
Interface Controller, NIC), which can be connected to other network devices through a base station
so as to communicate with the Internet. In one example, the transmission module 300 may be a radio
frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
This specification also provides a unique visitor statistics system. The system may be a standalone
unique visitor statistics system, or may be applied in a variety of unique visitor statistics
processing systems. The system may be a single server, or may include a server cluster, a system
(including a distributed system), software (an application), an actual operating device, a logic
gate circuit device, a quantum computer, or the like that uses one or more of the methods or one or
more of the embodiment devices of this specification, combined with terminal devices that have the
necessary implementation hardware. The unique visitor statistics system may include at least one
processor and a memory storing computer-executable instructions. When executing the instructions,
the processor implements the steps of the method described in any one or more of the above
embodiments; for example, the following steps may be implemented:
using the identifier acquisition nodes to acquire the user identifiers that access the website to
be counted, and deduplicating the user identifiers in each identifier acquisition node to obtain
deduplicated user identifiers;
sending the deduplicated user identifiers in each identifier acquisition node to the identifier
merging node;
merging the deduplicated user identifiers in the identifier merging node to determine the unique
visitor information of the website to be counted.
It should be noted that the system described above may also include other implementations
according to the description of the method or device embodiments. For the specific implementations,
refer to the description of the related method embodiments; they are not repeated here one by one.
It should be noted that the devices or systems described above in this specification may also
include other implementations according to the description of the related method embodiments. For
the specific implementations, refer to the description of the method embodiments; they are not
repeated here one by one. The embodiments in this specification are described in a progressive
manner. For the same or similar parts among the embodiments, reference may be made to one another,
and each embodiment focuses on its differences from the other embodiments. In particular, the
hardware-plus-program and storage-medium-plus-program embodiments are described relatively briefly
because they are substantially similar to the method embodiments; for the relevant parts, refer to
the corresponding description of the method embodiments.
Although the content of the embodiments of this specification mentions operations and data
descriptions such as calculating a reference discount using the median within a certain range and
acquiring, defining, interacting with, calculating, and judging transaction objects and transaction
information, the embodiments of this specification are not limited to cases that must conform to a
standard data model/template or to what is described in the embodiments of this specification.
Implementations that are slightly modified from certain industry standards, from a customized
approach, or from the practice described in the embodiments can also achieve implementation effects
that are the same as, equivalent to, similar to, or predictably varied from those of the above
embodiments. Embodiments obtained by applying such modified or varied ways of data acquisition,
storage, judgment, and processing may still fall within the scope of the optional implementations
of this specification.
Specific embodiments of this specification have been described above. Other embodiments are within
the scope of the appended claims. In some cases, the actions or steps recited in the claims may be
performed in an order different from that in the embodiments and still achieve the desired results.
In addition, the processes depicted in the drawings do not necessarily require the particular order
shown, or a sequential order, to achieve the desired results. In some implementations,
multitasking and parallel processing are also possible or may be advantageous.
The systems, devices, modules, or units described in the above embodiments may be implemented by a
computer chip or an entity, or by a product having a certain function. A typical implementation
device is a computer. Specifically, the computer may be, for example, a personal computer, a
laptop computer, an in-vehicle human-computer interaction device, a cellular phone, a camera phone,
a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device,
a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described by dividing it into various modules
according to function. Of course, when implementing one or more embodiments of this specification,
the functions of the modules may be implemented in one or more pieces of software and/or hardware,
or a module that implements a given function may be implemented by a combination of multiple
submodules or subunits. The device embodiments described above are only illustrative. For example,
the division of the units is only a division by logical function, and there may be other ways of
division in actual implementation; for example, multiple units or components may be combined or
integrated into another system, or some features may be omitted or not executed. In addition, the
mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect
coupling or communication connection between devices or units through some interfaces, and may be
electrical, mechanical, or in other forms.
Those skilled in the art also know that, in addition to implementing a controller purely as
computer-readable program code, it is entirely possible to logically program the method steps so
that the controller achieves the same functions in the form of logic gates, switches,
application-specific integrated circuits, programmable logic controllers, embedded
microcontrollers, and the like. Such a controller can therefore be regarded as a hardware
component, and the devices included in it for implementing various functions can also be regarded
as structures within the hardware component. Alternatively, the devices for implementing various
functions can even be regarded both as software modules that implement the method and as
structures within the hardware component.
The present invention is described with reference to flowcharts and/or block diagrams of methods,
devices (systems), and computer program products according to embodiments of the present invention.
It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and
combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by
computer program instructions. These computer program instructions may be provided to a processor
of a general-purpose computer, a special-purpose computer, an embedded processor, or another
programmable data processing device to produce a machine, so that the instructions executed by the
processor of the computer or other programmable data processing device produce a device for
implementing the functions specified in one or more flows of the flowcharts and/or one or more
blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of
directing a computer or other programmable data processing device to operate in a particular
manner, so that the instructions stored in the computer-readable memory produce an article of
manufacture that includes an instruction device, and the instruction device implements the
functions specified in one or more flows of the flowcharts and/or one or more blocks of the block
diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data
processing device, so that a series of operation steps are executed on the computer or other
programmable device to produce computer-implemented processing. The instructions executed on the
computer or other programmable device thereby provide steps for implementing the functions
specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an
input/output interface, a network interface, and a memory.
It should also be noted that the terms "include", "comprise", or any other variants thereof are
intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that
includes a series of elements includes not only those elements but also other elements that are
not explicitly listed, or elements that are inherent to such a process, method, commodity, or
device. Without further limitation, an element defined by the phrase "including a ..." does not
exclude the existence of other identical elements in the process, method, or device that includes
the element.
Those skilled in the art will understand that one or more embodiments of this specification may be
provided as a method, a system, or a computer program product. Therefore, one or more embodiments
of this specification may take the form of an entirely hardware embodiment, an entirely software
embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more
embodiments of this specification may take the form of a computer program product implemented on
one or more computer-usable storage media (including but not limited to magnetic disk storage,
CD-ROM, and optical storage) that contain computer-usable program code.
One or more embodiments of this specification may be described in the general context of
computer-executable instructions executed by a computer, such as program modules. Generally,
program modules include routines, programs, objects, components, data structures, and the like
that perform particular tasks or implement particular abstract data types. One or more embodiments
of this specification may also be practiced in distributed computing environments, in which tasks
are performed by remote processing devices connected through a communication network. In a
distributed computing environment, program modules may be located in both local and remote
computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For the same or
similar parts among the embodiments, reference may be made to one another, and each embodiment
focuses on its differences from the other embodiments. In particular, the system embodiments are
described relatively briefly because they are substantially similar to the method embodiments; for
the relevant parts, refer to the corresponding description of the method embodiments. In the
description of this specification, descriptions referring to the terms "one embodiment", "some
embodiments", "example", "specific example", or "some examples" mean that the specific features,
structures, materials, or characteristics described in connection with that embodiment or example
are included in at least one embodiment or example of this specification. In this specification,
such terms do not necessarily refer to the same embodiment or example. Moreover, the specific
features, structures, materials, or characteristics described may be combined in a suitable manner
in any one or more embodiments or examples. In addition, provided they do not contradict one
another, those skilled in the art may combine the different embodiments or examples described in
this specification and the features of the different embodiments or examples.
The foregoing is merely the embodiments of this specification and is not intended to limit this
specification. For those skilled in the art, this specification may have various modifications and
variations. Any modification, equivalent replacement, improvement, or the like made within the
spirit and principle of this specification shall be included within the scope of the claims of
this specification.