CN106610991A - Data processing method and device - Google Patents

Info

Publication number
CN106610991A
Authority
CN
China
Prior art keywords
user
network access
network
identification model
behavioral data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510696540.2A
Other languages
Chinese (zh)
Inventor
胡立芳
唐珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510696540.2A
Publication of CN106610991A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

The invention discloses a data processing method and device. The method comprises the steps of: acquiring a network access recognition model of each user in a user group using the same network, wherein the network access recognition model at least comprises a network access recognition factor determined after the user carries out page access via the same network within a preset time period; receiving network access behavior data of the current user also using the same network; and analyzing the network access behavior data of the current user by using the network access recognition model of each user to determine the user with the highest matching degree with the current user in the user group. The method and the device solve the technical problem that the recognition precision of the network access behavior data is low because only the network access behavior data of the same network can be obtained in the prior art.

Description

Data processing method and device
Technical Field
The present invention relates to the field of the Internet, and in particular to a data processing method and device.
Background
With the accelerating development of the Internet, online behavior has become an important channel for studying consumers. At present, the main sources of user network access behavior data are: 1) network sample monitoring data; 2) Internet service provider data; 3) website server log data; 4) data obtained by third-party service providers through page-tagging technology; 5) other means.
Sources 1) and 2) are two particularly important ones. Their data have the following characteristics: 1) they reflect Internet users' access paths more completely, which is more valuable for understanding the overall characteristics of current Internet users; 2) the data are obtained with the household as the unit; 3) they are easy to combine with subjective methods such as questionnaire surveys, and can be analyzed together with multiple data sources.
For such data sources to yield higher data value, however, the most important requirement is that the household-level data be further separated down to each individual.
The prior art describes a method for statistical analysis of Internet user access. That scheme extracts and records information about the user machine represented by each visiting user object, and merges user objects according to the similarity of that information. Through a learning process, it records the user machine information represented by different cookies and the browsing behavior associated with each cookie, judges the degree of similarity between users, merges cookies that may have been produced by the same user, and statistically analyzes visit frequency and browsing habits based on the browsing records of the merged cookies.
From this it can be determined that the prior-art scheme, which merges cookies according to the degree of association between user machine information and browsing behavior, has at least the following defects: 1) the applicable data sources are limited; 2) the separation accuracy is limited; 3) systematic errors keep growing, with no opportunity for correction.
No effective solution has yet been proposed for the problem that, in the prior art, only the network access behavior data of the same network can be obtained, so that the recognition precision of network access behavior data is low.
Summary of the Invention
Embodiments of the present invention provide a data processing method and device, to at least solve the technical problem that, in the prior art, only the network access behavior data of the same network can be obtained, so that the recognition precision of network access behavior data is low.
According to one aspect of the embodiments of the present invention, a data processing method is provided, including: acquiring a network access recognition model of each user in a user group using the same network, wherein the network access recognition model includes at least a network access recognition factor determined after the user performs page access via the same network within a preset time period; receiving network access behavior data of a current user who also uses the same network; and analyzing the network access behavior data of the current user with the network access recognition model of each user, to determine the user in the user group with the highest matching degree with the current user.
According to another aspect of the embodiments of the present invention, a data processing device is also provided, including: an acquisition module, configured to acquire a network access recognition model of each user in a user group using the same network, wherein the network access recognition model includes at least a network access recognition factor determined after the user performs page access via the same network within a preset time period; a receiving module, configured to receive network access behavior data of a current user who also uses the same network; and a determining module, configured to analyze the network access behavior data of the current user with the network access recognition model of each user, to determine the user in the user group with the highest matching degree with the current user.
In the embodiments of the present invention, the network access recognition model of each user in a user group using the same network is acquired, the network access behavior data of a current user who also uses the same network is received, and the network access behavior data of the current user is analyzed with the network access recognition model of each user to determine the user in the user group with the highest matching degree with the current user. The above scheme can therefore identify the network access behavior data of different users. Because the users belong to a user group using the same network, it can further separate network access behavior data collected at the level of the user group down to each user in the group, thereby solving the technical problem that, in the prior art, only the network access behavior data of the same network can be obtained, so that the recognition precision of network access behavior data is low. It follows that the scheme provided by the embodiments of the present application can further separate household-level data down to individuals, improving the recognition precision of network access behavior data and yielding higher data value.
Brief Description of the Drawings
The accompanying drawings described here provide a further understanding of the present invention and form a part of the present application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a data processing device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional data processing device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an optional data processing device according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an optional data processing device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional data processing device according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of optional data processing according to an embodiment of the present invention; and
Fig. 9 is a schematic diagram of optional data processing according to an embodiment of the present invention.
Detailed Description
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Some of the nouns and terms that appear in the description of the embodiments of the present invention are explained as follows:
Network sample monitoring: this mainly refers to selecting a certain sample and continuously collecting data on its network access behavior. Collection may rely on hardware such as a router, or on a monitoring client installed in the Internet access device. A sample of sufficient scale makes it possible to monitor and analyze the behavior of a very large population of Internet users.
Internet service provider: ISP (short for Internet Service Provider), literally a provider of Internet services. Types of Internet service provider include physical network operators (NP), Internet access providers (IAP), Web servers, and so on.
Embodiment 1
According to an embodiment of the present invention, a method embodiment of a data processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: acquire the network access recognition model of each user in a user group using the same network, wherein the network access recognition model includes at least a network access recognition factor determined after the user performs page access via the same network within a preset time period.
Specifically, the network access recognition model in the above step reflects the user's page-access habits, such as the types of web pages the user frequently visits and the duration and frequency of visits to pages of each type.
In an optional scheme, the network access behavior data of all users in a household is extracted from the household's router and analyzed to determine the network access recognition model of each user.
Step S104: receive the network access behavior data of a current user who also uses the same network.
Specifically, the network access behavior data in the above step may be the data generated when a user accesses pages, and may be stored in a router or in an Internet service provider's server. The current user may be any user in the user group using the same network, or a user outside the user group. For example, the user group consists of three family members A, B, and C, and the current user is a guest D.
It should be noted here that the execution order of steps S102 and S104 may be exchanged. That is, in another scenario provided by the present invention, after it is detected that the current user has accessed the network via the current network, the network access recognition model of each user in the stored user group is then determined according to the different network data, wherein the network used by each user in the user group is the same as the network used by the current user. Other optional embodiments of the present invention are not described further here.
Step S106: analyze the network access behavior data of the current user with the network access recognition model of each user, and determine the user in the user group with the highest matching degree with the current user.
Specifically, the network access behavior data of the current user is analyzed with the network access recognition model of each user to obtain the matching degree between the current user and each user in the user group. The current user may then be taken to be the user with the highest matching degree in the user group.
In an optional scheme, the network access recognition models of three family members A, B, and C who use the same router are acquired, and the network access behavior data of a user D using the same router is received. The three models of A, B, and C are each used to analyze the network access behavior data of the current user, giving a matching degree of 20% for A, 75% for B, and 48% for C. The user with the highest matching degree is B, so the current user can be determined to be user B; that is, the current network access behavior data was generated by user B accessing pages.
In the above embodiments of the present application, the network access recognition model of each user in a user group using the same network is acquired, the network access behavior data of a current user who also uses the same network is received, and the network access behavior data of the current user is analyzed with the network access recognition model of each user to determine the user in the user group with the highest matching degree with the current user. The above scheme can therefore identify the network access behavior data of different users. Because the users belong to a user group using the same network, it can further separate network access behavior data collected at the level of the user group down to each user in the group, thereby solving the technical problem that, in the prior art, only the network access behavior data of the same network can be obtained, so that the recognition precision of network access behavior data is low. It follows that the scheme provided by the embodiments of the present application can further separate household-level data down to individuals, improving the recognition precision of network access behavior data and yielding higher data value.
Optionally, in the above embodiments of the present application, the network access recognition factor includes any one or more of the following parameters determined from the user's page accesses: number of page accesses, access duration, bounce rate, access frequency, and access depth.
Specifically, in the above scheme the number of page accesses may be the number of times the user accesses a specific website within a specific time period, which may be one day or one week. The access duration may be the time the user spends on a specific website per access; for example, each access to a video website lasts 2 hours. The bounce rate may be the probability that the user leaves a website after viewing only one page. The access frequency may be the time interval between the user's accesses to a specific web page; for example, the user accesses a video website every other day. The access depth may be the number of consecutive pages the user views on a specific website; for example, the user views 10 consecutive pages on a video website.
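The five factors described above can be sketched as simple aggregates over one user's visits to one website. This is only an illustrative sketch: the `Visit` record and its field names are assumptions made for the example, and the source does not prescribe how each factor is aggregated (means are used here).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Visit:
    """Hypothetical record of one visit to a website (not defined in the source)."""
    start: float     # visit start time, in seconds since an arbitrary epoch
    duration: float  # time spent on the site during this visit, in seconds
    pages: int       # number of pages viewed consecutively in this visit


def recognition_factors(visits):
    """Summarise one user's visits to one site into the five factors."""
    if not visits:
        raise ValueError("at least one visit is required")
    ordered = sorted(visits, key=lambda v: v.start)
    # Gaps between the start times of consecutive visits.
    gaps = [b.start - a.start for a, b in zip(ordered, ordered[1:])]
    return {
        "access_count": len(ordered),
        "access_duration": mean(v.duration for v in ordered),
        # Share of visits that left after viewing a single page.
        "bounce_rate": sum(1 for v in ordered if v.pages == 1) / len(ordered),
        # The text describes access frequency as the interval between visits;
        # it is undefined (None) when there is only one visit.
        "access_interval": mean(gaps) if gaps else None,
        "access_depth": mean(v.pages for v in ordered),
    }
```

For instance, two visits two days apart, one of 2 hours over 10 pages and one of 1 hour on a single page, would give an access count of 2, a bounce rate of 0.5, and an access interval of 172800 seconds.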
With the above scheme, because different users' page accesses yield different network access recognition factors, a recognition model containing those factors can be generated and used to analyze network access behavior data, thereby identifying the network access behavior data of different users.
Optionally, in the above embodiments of the present application, step S102, acquiring the network access recognition model of each user in the user group using the same network, includes the following steps:
Step S1022: acquire the network access behavior data of each user using the same network within the preset time period.
Specifically, the preset time period in the above step may be a time period, set by the user, that precedes receipt of the current user's network access behavior data. The network access behavior data of each user using the same network within the preset time period may be used as sample data to obtain the network access recognition model of each user.
Step S1024: according to the network access behavior data of each user within the preset time period, determine the network access recognition factors and corresponding weights of each user in the user group.
Specifically, the weights in the above step may be obtained from the access habits of each user: the same network access recognition factor has different weights for different users, and different network access recognition factors have different weights for the same user. The network access recognition factors in the above step may be access frequency, access duration, and access depth, but are not limited to these; network access recognition factors that include other parameters can also achieve the purpose of this embodiment.
In an optional scheme, the page types accessed by each user may be determined according to the similarity between the users' network access behavior data. For example, user A accesses sports and news pages, user B accesses shopping and video pages, and user C accesses web-game pages. After the page types accessed by each user have been classified, the network access recognition factors and corresponding weights of each user for each page type are determined. For example, the network access recognition factors for the video type are access duration and access frequency; for these factors, user A's weights are the lowest and user B's weights are the highest.
Step S1026: according to the network access recognition factors x_i of each user and the corresponding weights k_i, generate the network access recognition model y_i of each user according to the formula y_i = k_1 x_1 + k_2 x_2 + … + k_n x_n, where i is a natural number.
Specifically, n in the above step may be the number of parameters included in the network access recognition factors. For example, if the network access recognition factors are access frequency, access duration, and access depth, then n is 3.
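The formula in step S1026 is a plain weighted sum, which can be sketched as follows. The function name and the numeric values are illustrative assumptions, not taken from the source.

```python
def model_output(weights, factors):
    """Evaluate y = k_1*x_1 + k_2*x_2 + ... + k_n*x_n for one user's model."""
    if len(weights) != len(factors):
        raise ValueError("weights and factors must both have length n")
    return sum(k * x for k, x in zip(weights, factors))


# n = 3: access frequency, access duration, access depth (made-up values).
weights_b = [0.5, 0.25, 0.25]
factors = [2.0, 4.0, 8.0]
print(model_output(weights_b, factors))  # 0.5*2 + 0.25*4 + 0.25*8 = 4.0
```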
Through steps S1022 to S1026, the network access behavior data of each user within the preset time period is acquired, the network access recognition factors and corresponding weights of each user in the user group are determined, and the network access recognition model of each user is generated according to the formula, so that the network access recognition model of each user in the user group using the same network is obtained.
Optionally, in the above embodiments of the present application, step S106, analyzing the network access behavior data of the current user with the network access recognition model of each user and determining the user in the user group with the highest matching degree with the current user, includes the following steps:
Step S1062: extract, from the network access behavior data of the current user, the network access recognition factors resulting from the current user's page accesses.
In an optional scheme, the page type accessed by the user is extracted from the network access behavior data of the current user. For example, if the page type accessed by the current user is video, the network access recognition factors of the current user are access duration and access frequency.
Step S1064: process the network access recognition factors of the current user with the network access recognition model y_i of each user, and calculate the predicted value of the current user under the network access recognition model y_i of each user.
Specifically, the predicted value in the above step may be the predicted probability that the current user is a given user.
In an optional scheme, the network access recognition factors of the current user are substituted into the network access recognition model y_i of each user to obtain the predicted value of the current user under each user's model. For example, the network access recognition factors of the current user are access duration and access frequency; substituting them into each user's network access recognition model for the video page type gives a predicted value of 25% for user A, 80% for user B, and 65% for user C.
Step S1066: determine the user with the highest predicted value to be the user with the highest matching degree with the current user.
In an optional scheme, the predicted values of the current user under each user's network access recognition model y_i are compared, and the user with the highest predicted value is determined to be the user with the highest matching degree with the current user. For example, the predicted value for user B, 80%, is the highest, so user B is determined to be the user with the highest matching degree with the current user.
Through steps S1062 to S1066, the network access recognition factors of the current user are extracted, the predicted value of the current user with respect to each user is calculated, and the user with the highest predicted value is determined to be the user with the highest matching degree with the current user, so that users' network access behavior data can be identified stably and efficiently.
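Steps S1062 to S1066 can be sketched as scoring the current user's factors with every stored model and taking the maximum. This is a sketch under assumptions: the text says the predicted value may be a probability but does not say how the weighted sum is mapped to one, so a logistic (sigmoid) function is assumed here; the weights are made up.

```python
import math


def predicted_value(weights, factors, bias=0.0):
    """Map the weighted sum of factors to a value in (0, 1).

    The sigmoid mapping is an assumption; the source only states that the
    predicted value may be a probability.
    """
    z = bias + sum(k * x for k, x in zip(weights, factors))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable form for negative z
    return e / (1.0 + e)


def match_user(models, factors):
    """Score the factors with each user's weights; return (best_user, scores)."""
    scores = {user: predicted_value(w, factors) for user, w in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical weights for (access duration, access frequency) on video pages.
models = {"A": [-1.0, -0.5], "B": [1.0, 0.5], "C": [0.5, 0.0]}
best, scores = match_user(models, [1.0, 1.0])
print(best)
```

With these made-up weights the weighted sums are -1.5, 1.5, and 0.5 for A, B, and C, so user B receives the highest predicted value.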
Optionally, in the above embodiments of the present application, a regression algorithm is used to determine the weight k_i corresponding to each network access recognition factor x_i of each user.
Specifically, the regression algorithm may be a logistic regression algorithm, but is not limited to this; other regression algorithms can also achieve the purpose of this embodiment. The purpose of this embodiment can also be achieved with data analysis or data mining software, such as SPSS, Stat, SAS, R, RapidMiner, and Python.
In an optional scheme, according to the network access behavior data of each user, a logistic regression algorithm is used to obtain the weights of all the network access recognition factors and to determine the relationship between each user and that user's network access behavior. A well-fitting regression equation (that is, the network access recognition model) is thereby established and used to predict which user in the user group corresponds to the received network access behavior data of the current user.
Optionally, the logistic regression algorithm selects its parameters by maximum likelihood estimation over the observed samples. The specific steps are as follows:
First, in the RapidMiner system, import the network access behavior data of each user to be analyzed as sample data.
Second, build a logistic regression model from the imported sample data. An example of the output, giving the weight of each factor, is shown in Table 1:
Table 1
Factor                     Attribute    Weight
Number of page accesses    x_1          k_1
Access duration            x_2          k_2
Bounce rate                x_3          k_3
Access frequency           x_4          k_4
Access depth               x_5          k_5
With the above scheme, a regression algorithm can be used to obtain more accurate weights for the network access recognition factors.
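The source obtains the weights with RapidMiner. As an illustration of what such a tool computes, the following is a minimal pure-Python sketch of logistic regression fitted by gradient ascent on the log-likelihood. It is not the tool's implementation; the learning rate, epoch count, one-vs-rest labelling (1 for the target user, 0 for other users), and sample values are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(samples, labels, lr=0.1, epochs=2000):
    """Estimate a bias and one weight k_j per factor x_j by maximum likelihood.

    samples: list of factor vectors [x_1, ..., x_n]
    labels:  1 if the sample belongs to the target user, otherwise 0
    """
    n = len(samples[0])
    bias, weights = 0.0, [0.0] * n
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n
        for x, y in zip(samples, labels):
            # Gradient of the log-likelihood: (label - predicted probability) * x
            err = y - sigmoid(bias + sum(k * xj for k, xj in zip(weights, x)))
            grad_b += err
            for j in range(n):
                grad_w[j] += err * x[j]
        bias += lr * grad_b / len(samples)
        weights = [k + lr * g / len(samples) for k, g in zip(weights, grad_w)]
    return bias, weights


# Made-up samples: [access duration in hours, bounce rate]; label 1 = user B.
samples = [[2.0, 0.1], [1.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
labels = [1, 1, 0, 0]
bias, weights = fit_logistic(samples, labels)
```

In this toy data set, long visits with a low bounce rate belong to the target user, so the fitted weight for duration comes out positive and the weight for bounce rate negative.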
Optionally, in the above embodiments of the present application, after step S1066 determines the user with the highest predicted value to be the user with the highest matching degree with the current user, the method further includes the following steps:
Step S110: judge whether the predicted value of any user is less than a preset threshold.
Specifically, the predicted value of any user in the above step may be the predicted value of the user with the highest matching degree with the current user. The preset threshold may be 70%, but is not limited to this; other preset thresholds can also satisfy the purpose of this embodiment.
In an optional scheme, the predicted value of user A is 25%, that of user B is 80%, and that of user C is 65%, so the user with the highest matching degree with the current user is user B. Because 80% is greater than 70%, the predicted value of user B is judged to be greater than the preset threshold.
Step S112: when the predicted value is less than the preset threshold, correct the network access recognition model of the user whose predicted value is less than the preset threshold, until the predicted value is greater than or equal to the preset threshold.
In an optional scheme, the predicted value of user A is 25%, that of user B is 65%, and that of user C is 45%, so the user with the highest matching degree with the current user is user B. Because 65% is less than 70%, the predicted value of user B is judged to be less than the preset threshold, and the network access recognition model of user B is corrected until the predicted value of user B is greater than or equal to the preset threshold.
Through steps S110 to S112, it is judged whether the predicted value of any user is less than the preset threshold, and when it is, the network access recognition model of that user is corrected until the predicted value is greater than or equal to the preset threshold. By continually correcting the network access recognition model of each user, the above scheme avoids the influence of low-probability events on the network access recognition model and ensures higher recognition precision.
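The threshold check in step S110 can be sketched with the two numeric examples above. The function name is an assumption; the 70% default follows the example threshold in the text.

```python
def check_best_match(scores, threshold=0.70):
    """Return (best_user, needs_correction) for a dict of user -> predicted value.

    needs_correction is True when the best match's predicted value is below
    the preset threshold, which is the condition that triggers step S112.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores, key=scores.get)
    return best, scores[best] < threshold


print(check_best_match({"A": 0.25, "B": 0.80, "C": 0.65}))  # ('B', False)
print(check_best_match({"A": 0.25, "B": 0.65, "C": 0.45}))  # ('B', True)
```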
Alternatively, in the above embodiments of the present application, the value that predicts the outcome is corrected in step S112 and is comprised the steps less than the network access identification model of the user of predetermined threshold value:
Step S1122, according to the network access behavioral data of active user, network access identification factor and corresponding weight, obtain the network access identification model of revised each user determined by amendment preset time period.
In a kind of optional scheme, in the case where the value that predicts the outcome of user B is less than predetermined threshold value, only according to the network access behavioral data of active user, weight corresponding to the network access identification factor of the user B determined in amendment preset time period, further obtains the network access identification model of revised user B.
By above-mentioned steps S1122, according to the network access behavioral data of active user, network access identification factor and corresponding weight determined by amendment preset time period, the network access identification model of revised each user is obtained, so as to realize correcting the purpose of network access identification model of the value less than the user of predetermined threshold value that predict the outcome.
Optionally, in the above embodiments of the present application, correcting, in step S112, the network access identification model of the user whose prediction result value is less than the preset threshold includes the following step:
Step S1124: taking the time at which the current user performs page access as the reference, determine the corrected network access identification factors and corresponding weights within the preset time period according to the network access behavior data of each user and the network access behavior data of the current user, and generate the corrected network access identification model of each user.
In an optional scheme, when the prediction result value of user B is less than the preset threshold, the network access behavior data of the current user is added to the network access behavior data of each user to obtain new sample data; the network access identification factors and corresponding weights of user B in the new preset time period are determined from the new sample data, yielding the corrected network access identification model of user B.
Through step S1124 above, taking the time at which the current user performs page access as the reference, the corrected network access identification factors and corresponding weights within the preset time period are determined according to the network access behavior data of each user and of the current user, and the corrected network access identification model of each user is generated, thereby correcting the network access identification model of the user whose prediction result value is less than the preset threshold.
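The sample-rebuilding part of step S1124 can be sketched as follows. This is only an illustration under stated assumptions: the "new preset time period" is interpreted here as a fixed-length window ending at the current user's most recent page access, and the names `corrected_sample` and `window_days` and the `(day, factors)` record layout are hypothetical, not taken from the patent.

```python
def corrected_sample(history, current, window_days):
    """Merge historical and current records, keeping those inside the new window.

    history, current: lists of (day, factors) tuples (hypothetical layout).
    The window ends at the current user's latest access time (assumption).
    """
    reference = max(day for day, _ in current)  # time of the current user's page access
    merged = history + current                  # add current data to the existing samples
    return [(day, f) for day, f in merged
            if reference - window_days <= day <= reference]

history = [(1, [0.2]), (10, [0.3]), (20, [0.4])]
current = [(30, [0.9])]
new_sample = corrected_sample(history, current, window_days=15)
```

The factors and weights would then be re-estimated from `new_sample` in the same way as for the original sample data.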
Fig. 2 is a flowchart of an optional data processing method according to an embodiment of the present invention. As shown in Fig. 2, the detailed steps of an optional application scenario are:
S21: Input a data source containing the key fields.
Specifically, the network access behavior data of a certain number of home users is obtained from routers. The key fields include the accessed page and any one or more of the following parameters: the number of visits to the accessed page, access duration, bounce rate, access frequency, and access depth.
S22: Distinguish the data belonging to different households and to the multiple devices within a household.
Specifically, the network access behavior data of the home users can be distinguished by the IP address of the router, yielding the network access behavior data of the multiple devices in the same household.
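Step S22 amounts to a two-level grouping, which might be sketched as below. The record layout `(router_ip, device_id, page, duration_seconds)` and all sample values are hypothetical; the patent only states that the router IP address is used to separate households.

```python
from collections import defaultdict

# Hypothetical records: (router_ip, device_id, page, duration_seconds)
records = [
    ("10.0.0.1", "phone-1", "video.example/a", 120),
    ("10.0.0.1", "tablet-1", "news.example/b", 30),
    ("10.0.0.2", "phone-9", "shop.example/c", 45),
    ("10.0.0.1", "phone-1", "video.example/d", 300),
]

def group_by_household(rows):
    """Group records by router IP (household), then by device within it."""
    households = defaultdict(lambda: defaultdict(list))
    for router_ip, device_id, page, duration in rows:
        households[router_ip][device_id].append((page, duration))
    return households

grouped = group_by_household(records)
```

Here `grouped["10.0.0.1"]` holds the per-device data of one household, which is the input to the user-modelling step S23.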
S23: Build the network access identification model of each user in the household.
Specifically, a segment of sample data is extracted. For smartphone data, each phone can be regarded as an independent user, from which the number of family members can be estimated. For tablet data, the data can be merged into a phone user according to its similarity to the smartphone data; data that cannot be matched can by default be treated as a separate user, typically a child in the household. For PC data, the data is split and merged into each user according to its similarity to the tablet and smartphone data. The network access identification factors that distinguish users, and their weights, are then found from the sample data, and the network access identification model of each user is built.
S24: Use the network access identification models to predict on subsequently collected data.
Specifically, this step is implemented in the same way as step S106 in the above embodiment: the network access behavior data of the current user is analyzed using the network access identification model of each user, the prediction result value of the current user with respect to the network access identification model of each user is calculated, and the user with the highest prediction result value is determined to be the user with the highest matching degree with the current user.
S25: Judge whether the prediction probability is less than a certain threshold.
Specifically, this step is implemented similarly to step S110 in the above embodiment. If the prediction probability is less than the threshold, proceed to step S26. If the prediction probability is greater than or equal to the threshold, proceed to step S27.
S26: Correct the model.
Specifically, this step is implemented similarly to step S112 in the above embodiment, and is not repeated here.
S27: End.
Specifically, when the prediction probability is greater than or equal to the threshold, identification of the user's network access behavior data is complete.
Embodiment 2
According to an embodiment of the present invention, an embodiment of a data processing device is provided.
Fig. 3 is a schematic diagram of a data processing device according to an embodiment of the present invention. As shown in Fig. 3, the device includes an acquisition module 31, a receiving module 33, and a determining module 35, wherein:
The acquisition module 31 is configured to obtain the network access identification model of each user in a user group using the same network, wherein the network access identification model includes at least the network access identification factors determined after the user performs page access through the same network within a preset time period.
Specifically, the network access identification model in the acquisition module 31 reflects the user's page access habits, such as the types of web pages the user frequently visits and the duration and frequency of visits to those types of pages.
In an optional scheme, the acquisition module 31 extracts the network access behavior data of all users in a household from the household's router, analyzes the data, and determines the network access identification model of each user.
The receiving module 33 is configured to receive the network access behavior data of a current user who also uses the same network.
Specifically, the network access behavior data in the receiving module 33 can be data generated when a user performs page access, and can be stored in the router or on a server of the network service provider. The current user can be any user in the user group using the same network, or any user outside the user group; for example, the user group is the three members A, B, and C of a household, and the current user is guest D.
The determining module 35 is configured to analyze the network access behavior data of the current user using the network access identification model of each user respectively, and to determine the user in the user group with the highest matching degree with the current user.
Specifically, the determining module 35 analyzes the network access behavior data of the current user using the network access identification model of each user respectively, obtaining the matching degree between each user in the user group and the current user; the current user can be taken to be the user with the highest matching degree in the user group.
In an optional scheme, the acquisition module 31 obtains the network access identification models of three members A, B, and C of a household using the same router, and the receiving module 33 receives the network access behavior data of a user D using the same router. The determining module 35 analyzes the current user's network access behavior data using the three models of A, B, and C respectively, and determines that the matching degree with the current user is 20% for A, 75% for B, and 48% for C. The user with the highest matching degree is B, so the current user can be determined to be user B; that is, the current network access behavior data was generated by user B accessing pages.
In the above embodiments of the present application, the acquisition module obtains the network access identification model of each user in the user group using the same network, the receiving module receives the network access behavior data of a current user who also uses the same network, and the determining module analyzes the current user's data using the model of each user respectively to determine the user in the user group with the highest matching degree with the current user. The scheme can therefore identify the network access behavior data of different users. Because the users belong to a user group using the same network, network access behavior data collected per user group can further be separated out to each user in the group, which solves the technical problem in the prior art that only the network access behavior data of the network as a whole can be obtained and the identification precision is therefore low. The scheme provided by the embodiments of the present application can thus further separate household-level data down to individuals, improving the identification precision of network access behavior data and giving the data higher value.
Optionally, in the above embodiments of the present application, the network access identification factors include any one or more of the following parameters determined when a user accesses pages: the number of visits to the accessed page, access duration, bounce rate, access frequency, and access depth.
Specifically, in the above scheme the number of visits can be the number of times the user visits a specific website within a specific time period, which can be one day or one week. The access duration can be the time the user spends on each visit to a specific website; for example, each visit to a video website lasts 2 hours. The bounce rate can be the probability that the user leaves a website after visiting only one page. The access frequency can be the interval between the user's visits to a specific web page; for example, the user visits a video website once every other day. The access depth can be the number of consecutive pages the user views on a specific website; for example, the user views 10 consecutive pages on a video website.
Through the above scheme, because the network access identification factors determined from page access differ between users, network access behavior data can be analyzed by generating identification models containing these factors, so that the network access behavior data of different users can be identified.
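As an illustration of how the five factors described above might be computed from raw visit records, consider the following sketch. The `Visit` record and its fields are hypothetical, and averaging over visits is an assumption; the patent defines the factors only informally.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Visit:
    start_hour: float    # hours since the start of the observation window
    duration_min: float  # length of this visit to the website, in minutes
    pages: int           # consecutive pages viewed during this visit

def identification_factors(visits):
    """Compute the five factors for one user and one website (non-empty input)."""
    visits = sorted(visits, key=lambda v: v.start_hour)
    gaps = [b.start_hour - a.start_hour for a, b in zip(visits, visits[1:])]
    return {
        "access_count": len(visits),
        "access_duration": mean(v.duration_min for v in visits),
        # share of visits that left after a single page
        "bounce_rate": sum(v.pages == 1 for v in visits) / len(visits),
        # the text describes access frequency as the interval between visits
        "access_interval": mean(gaps) if gaps else 0.0,
        "access_depth": mean(v.pages for v in visits),
    }

visits = [Visit(0, 120, 10), Visit(48, 60, 1), Visit(96, 90, 4)]
factors = identification_factors(visits)
```

For the three sample visits, the user returns every 48 hours, stays 90 minutes on average, and views 5 pages per visit on average.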
Optionally, as shown in Fig. 4, in the above embodiments of the present application, the acquisition module 31 includes:
An acquisition submodule 311, configured to obtain the network access behavior data of each user using the same network within the preset time period.
Specifically, the preset time period in the acquisition submodule 311 can be a time period set by the user before the network access behavior data of the current user is received. The network access behavior data of each user using the same network within the preset time period can be used as sample data to obtain the network access identification model of each user.
A first determination submodule 313, configured to determine the network access identification factors and corresponding weights of each user in the user group according to the network access behavior data of each user within the preset time period.
Specifically, the weights in the first determination submodule 313 can be obtained from the access habits of each user: the weights corresponding to the same network access identification factor differ between users, and the weights corresponding to different network access identification factors differ for the same user. The network access identification factors can be access frequency, access duration, and access depth, but are not limited to these; factors including other parameters can also achieve the purpose of this embodiment.
In an optional scheme, the first determination submodule can determine the page types each user accesses according to the similarity between the users' network access behavior data. For example, user A accesses sports and news pages, user B accesses shopping and video pages, and user C accesses online game pages. After the page types accessed by each user are classified, the network access identification factors and corresponding weights of each user for each page type are determined. For example, the network access identification factors for the video type are access duration and access frequency; the corresponding weights are lowest for user A and highest for user B.
A generation submodule 315, configured to generate the network access identification model $y_i$ of each user from the user's network access identification factors $x_i$ and corresponding weights $k_i$ according to the formula $y_i = k_1 x_1 + k_2 x_2 + \dots + k_n x_n$, wherein i is a natural number.
Specifically, n in the generation submodule 315 can be the number of parameters included in the network access identification factors; for example, if the network access identification factors are access frequency, access duration, and access depth, then n is 3.
Through the above scheme, the acquisition submodule obtains the network access behavior data of each user within the preset time period, the first determination submodule determines the network access identification factors and corresponding weights of each user in the user group, and the generation submodule generates the network access identification model of each user according to the formula, thereby obtaining the network access identification model of each user in the user group using the same network.
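The weighted-sum formula used by the generation submodule can be written directly in code. The weights and factor values below are made up for illustration, with n = 3 as in the example of access frequency, access duration, and access depth.

```python
def model_score(weights, factors):
    """Evaluate y = k1*x1 + k2*x2 + ... + kn*xn for one user's model."""
    if len(weights) != len(factors):
        raise ValueError("weights and factors must have the same length n")
    return sum(k * x for k, x in zip(weights, factors))

# Hypothetical weights k1..k3 and normalised factor values x1..x3
weights = [0.5, 0.3, 0.2]
factors = [0.8, 0.5, 0.1]
score = model_score(weights, factors)
print(round(score, 2))  # 0.57
```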
Optionally, as shown in Fig. 5, in the above embodiments of the present application, the determining module 35 includes:
An extraction submodule 351, configured to extract, from the network access behavior data of the current user, the network access identification factors of the current user after page access.
In an optional scheme, the extraction submodule extracts the page types accessed from the network access behavior data of the current user; for example, if the page type accessed by the current user is video, the network access identification factors of the current user are access duration and access frequency.
A calculation submodule 353, configured to process the network access identification factors of the current user using the network access identification model $y_i$ of each user, and to calculate the prediction result value of the current user with respect to the network access identification model $y_i$ of each user.
Specifically, the prediction result value in the calculation submodule 353 can be the predicted probability that the current user is a given user.
In an optional scheme, the calculation submodule substitutes the network access identification factors of the current user into the network access identification model $y_i$ of each user to obtain the prediction result value of the current user with respect to each user's model $y_i$. For example, the network access identification factors of the current user are access duration and access frequency; substituting them into each user's network access identification model for the video page type gives a prediction result value of 25% for user A, 80% for user B, and 65% for user C.
A second determination submodule 355, configured to determine the user with the highest prediction result value as the user with the highest matching degree with the current user.
In an optional scheme, the second determination submodule compares the prediction result values of the current user with respect to each user's network access identification model $y_i$ and determines the user with the highest prediction result value as the user with the highest matching degree with the current user. For example, the prediction result value for user B, 80%, is the highest, so user B is determined to be the user with the highest matching degree with the current user.
Through the above scheme, the extraction submodule extracts the network access identification factors of the current user, the calculation submodule calculates the prediction result value of the current user with respect to each user, and the second determination submodule determines the user with the highest prediction result value as the user with the highest matching degree with the current user, thereby achieving stable and efficient data processing.
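The calculation and selection steps might be sketched as follows. Mapping the weighted sum to a probability with a logistic (sigmoid) function is an assumption made here because the text describes the prediction result value as a probability and mentions logistic regression; the bias term, the model parameters, and the function names are hypothetical.

```python
import math

def predict_probability(weights, bias, factors):
    """Map the weighted sum of factors to a value in (0, 1) (assumed sigmoid link)."""
    z = bias + sum(k * x for k, x in zip(weights, factors))
    return 1.0 / (1.0 + math.exp(-z))

def best_match(models, factors):
    """Score the current user's factors against every user's model, pick the highest."""
    scores = {user: predict_probability(w, b, factors)
              for user, (w, b) in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical per-user models for the video page type: (weights, bias)
models = {
    "A": ([-1.0, -0.5], 0.0),
    "B": ([1.2, 0.8], -0.5),
    "C": ([0.4, 0.2], 0.0),
}
current_factors = [1.5, 1.0]  # access duration, access frequency (made-up units)
best_user, scores = best_match(models, current_factors)
```

With these made-up parameters user B receives the highest score, mirroring the A/B/C example in the text.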
Optionally, as shown in Fig. 6, in the above embodiments of the present application, the device further includes:
A processing module 37, configured to determine the weights $k_i$ corresponding to each user's network access identification factors $x_i$ using a regression algorithm.
Specifically, the regression algorithm can be a logistic regression algorithm, but is not limited to this; other regression algorithms can also achieve the purpose of this embodiment. Data analysis or data mining software, such as SPSS, Stat, SAS, R, RapidMiner, and Python, can also be used to achieve the purpose of this embodiment.
In an optional scheme, the weights of all network access identification factors are obtained from each user's network access behavior data using a logistic regression algorithm, and the relationship between each user and that user's network access behavior is determined, so as to build a well-correlated regression equation (i.e., the network access identification model) for predicting which user in the user group the received network access behavior data of the current user corresponds to.
Optionally, the logistic regression algorithm selects its parameters by maximum likelihood estimation over the observed samples. The specific steps are as follows:
Step 1: in the RapidMiner system, import the network access behavior data of each user to be analyzed as sample data.
Step 2: build a logistic regression model from the imported sample data; an example of the output weight value of each factor is shown in Table 1.
Through the above scheme, the processing module can use a regression algorithm to obtain more accurate weights for the network access identification factors.
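The patent performs this fit in RapidMiner. Purely to illustrate what maximum likelihood estimation of the weights involves, the sketch below fits a one-vs-rest logistic regression for a single user with plain gradient ascent on the log-likelihood. It is a swapped-in minimal implementation, not the RapidMiner procedure, and the sample data, learning rate, and iteration count are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(samples, labels, lr=0.1, epochs=2000):
    """Estimate weights and bias by gradient ascent on the mean log-likelihood."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(k * v for k, v in zip(weights, x)))
            err = y - p  # gradient of the log-likelihood with respect to z
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [k + lr * g / len(samples) for k, g in zip(weights, grad_w)]
        bias += lr * grad_b / len(samples)
    return weights, bias

# Hypothetical samples: [access duration in hours, visits per day]
samples = [[2.0, 1.0], [1.8, 1.2], [2.2, 0.9],   # records of user B
           [0.3, 0.2], [0.5, 0.1], [0.4, 0.3]]   # records of other users
labels = [1, 1, 1, 0, 0, 0]                      # 1 means "this record is user B"
weights, bias = fit_logistic(samples, labels)

def prob_is_b(x):
    return sigmoid(bias + sum(k * v for k, v in zip(weights, x)))
```

Repeating the fit once per user, each time labelling that user's records as 1, would give one set of weights per user.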
Optionally, as shown in Fig. 7, in the above embodiments of the present application, the device further includes:
A judging module 32, configured to judge whether the prediction result value of any user is less than a preset threshold.
Specifically, the prediction result value of any user in the judging module 32 can be the prediction result value of the user with the highest matching degree with the current user. The preset threshold can be 70%, but is not limited to this; other preset thresholds can also serve the purpose of this embodiment.
In an optional scheme, the prediction result value is 25% for user A, 80% for user B, and 65% for user C. The second determination submodule determines that the user with the highest matching degree with the current user is user B, and because 80% is greater than 70%, the judging module judges that the prediction result value of user B is greater than the preset threshold.
A correction module 34, configured to, when the prediction result value is less than the preset threshold, correct the network access identification model of the user whose prediction result value is less than the preset threshold, until the prediction result value is greater than or equal to the preset threshold.
In an optional scheme, the prediction result value is 25% for user A, 65% for user B, and 45% for user C. The second determination submodule determines that the user with the highest matching degree with the current user is user B, and because 65% is less than 70%, the judging module judges that the prediction result value of user B is less than the preset threshold. The correction module then corrects the network access identification model of user B until the prediction result value of user B is greater than or equal to the preset threshold.
Through the above scheme, the judging module judges whether the prediction result value of any user is less than the preset threshold, and when it is, the correction module corrects the network access identification model of the user whose prediction result value is less than the preset threshold, until the prediction result value is greater than or equal to the preset threshold. Therefore, by continually correcting each user's network access identification model, the scheme avoids the influence of low-probability events on the models and ensures high identification precision.
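The judgment made by the judging module can be expressed in a few lines, using the two numeric examples from the text. The function name `needs_correction` is hypothetical; the 70% threshold is the example value given above.

```python
THRESHOLD = 0.70  # example preset threshold from the text

def needs_correction(scores, threshold=THRESHOLD):
    """Return (best_user, flag); flag is True when the best score is below the threshold."""
    best = max(scores, key=scores.get)
    return best, scores[best] < threshold

print(needs_correction({"A": 0.25, "B": 0.80, "C": 0.65}))  # ('B', False)
print(needs_correction({"A": 0.25, "B": 0.65, "C": 0.45}))  # ('B', True)
```

In the second case the correction module would be invoked for user B's model and the judgment repeated afterwards.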
Optionally, as shown in Fig. 8, in an optional embodiment, the correction module 34 includes:
A first correction submodule 341, configured to correct, according to the network access behavior data of the current user, the network access identification factors and corresponding weights determined in the preset time period, to obtain the corrected network access identification model of each user.
In an optional scheme, when the prediction result value of user B is less than the preset threshold, the first correction submodule corrects the weights corresponding to the network access identification factors of user B determined in the preset time period based only on the network access behavior data of the current user, yielding the corrected network access identification model of user B.
Through the above scheme, the first correction submodule corrects the network access identification factors and corresponding weights determined in the preset time period according to the network access behavior data of the current user, and obtains the corrected network access identification model of each user, thereby correcting the network access identification model of the user whose prediction result value is less than the preset threshold.
Optionally, as shown in Fig. 9, in an optional embodiment, the correction module 34 includes:
A second correction submodule 343, configured to, taking the time at which the current user performs page access as the reference, determine the corrected network access identification factors and corresponding weights within the preset time period according to the network access behavior data of each user and the network access behavior data of the current user, and generate the corrected network access identification model of each user.
In an optional scheme, when the prediction result value of user B is less than the preset threshold, the second correction submodule adds the network access behavior data of the current user to the network access behavior data of each user to obtain new sample data; the network access identification factors and corresponding weights of user B in the new preset time period are determined from the new sample data, yielding the corrected network access identification model of user B.
Through the above scheme, the second correction submodule, taking the time at which the current user performs page access as the reference, determines the corrected network access identification factors and corresponding weights within the preset time period according to the network access behavior data of each user and of the current user, and generates the corrected network access identification model of each user, thereby correcting the network access identification model of the user whose prediction result value is less than the preset threshold.
The data processing device includes a processor and a memory. The acquisition module, receiving module, determining module, processing module, judging module, correction module, and so on are stored in the memory as program units, and the processor executes the program units stored in the memory. The first preset rule and the second preset rule may also be stored in the memory.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the text content is parsed by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, in the form of random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: obtaining the network access identification model of each user in a user group using the same network, wherein the network access identification model includes at least the network access identification factors determined after the user performs page access through the same network within a preset time period; receiving the network access behavior data of a current user who also uses the same network; and analyzing the network access behavior data of the current user using the network access identification model of each user respectively, to determine the user in the user group with the highest matching degree with the current user.
The above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining the network access identification model of each user in a user group using the same network, wherein the network access identification model includes at least the network access identification factors determined after the user performs page access through the same network within a preset time period;
receiving the network access behavior data of a current user who also uses the same network; and
analyzing the network access behavior data of the current user using the network access identification model of each user respectively, to determine the user in the user group with the highest matching degree with the current user.
2. The method according to claim 1, characterized in that the network access identification factors include any one or more of the following parameters determined when a user accesses pages: the number of visits to the accessed page, access duration, bounce rate, access frequency, and access depth.
3. The method according to claim 2, characterized in that the step of obtaining the network access identification model of each user in the user group using the same network comprises:
obtaining the network access behavior data of each user using the same network within the preset time period;
determining the network access identification factors and corresponding weights of each user in the user group according to the network access behavior data of each user within the preset time period; and
generating the network access identification model $y_i$ of each user from the user's network access identification factors $x_i$ and corresponding weights $k_i$ according to the formula $y_i = k_1 x_1 + k_2 x_2 + \dots + k_n x_n$, wherein i is a natural number.
4. method according to claim 3, it is characterized in that, the network access identification model using each user is analyzed respectively to the network access behavioral data of the active user, includes the step of determine in the customer group matching degree highest user with the active user:
From the network access behavioral data of the active user, extraction obtains the active user and carries out the identification factor of the network access after page access;
Using network access identification model y of each useriThe network access identification factor of the active user is processed, network access identification model y of any one user of active user's correspondence is calculatediThe value that predicts the outcome;And
The value highest user that predicts the outcome is defined as and active user's matching degree highest user.
5. the method according to any one in claim 3 to 4, it is characterised in that determine network access identification factor x of each user using regression algorithmiCorresponding weight ki
6. method according to claim 5, it is characterised in that the value highest user that predicts the outcome is defined as with after active user's matching degree highest user, methods described also includes:
Judge the value that predicts the outcome of any one user whether less than predetermined threshold value;Wherein,
In the case where the value that predicts the outcome is less than the predetermined threshold value, the value that predicts the outcome described in amendment is less than the network access identification model of the user of the predetermined threshold value, until the value that predicts the outcome is more than or equal to the predetermined threshold value.
7. The method according to claim 6, characterized in that the step of correcting the network access identification model of the user whose prediction result value is less than the predetermined threshold comprises:
Correcting, according to the network access behavioral data of the current user, the network access identification factors and corresponding weights determined in the preset time period, to obtain the corrected network access identification model of each user.
8. The method according to claim 6, characterized in that the step of correcting the network access identification model of the user whose prediction result value is less than the predetermined threshold comprises:
Taking the time at which the current user performs the page access as a reference, determining the corrected network access identification factors and corresponding weights within the preset time period according to the network access behavioral data of each user and the network access behavioral data of the current user, and generating the corrected network access identification model of each user.
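One way to read claim 8 is that the preset time period is re-anchored at the current user's access time, and the model is refitted on the historical records inside that window together with the current record. The sketch below shows only the record selection; the `Record` layout, the function `window_records`, and the seven-day window are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# (access time, identification factors) for one observation; hypothetical layout.
Record = Tuple[datetime, List[float]]


def window_records(history: Sequence[Record],
                   current: Record,
                   window: timedelta) -> List[Record]:
    """Keep records within `window` before the current access, plus the current one."""
    anchor = current[0]
    start = anchor - window
    kept = [rec for rec in history if start <= rec[0] <= anchor]
    return kept + [current]


history = [
    (datetime(2015, 10, 1, 9, 0), [1.0, 2.0]),   # older than the window, dropped
    (datetime(2015, 10, 20, 9, 0), [2.0, 1.0]),  # inside the window, kept
]
current_record = (datetime(2015, 10, 23, 9, 0), [3.0, 1.0])
recent = window_records(history, current_record, timedelta(days=7))
```

The selected records would then be used to recompute the factors and weights of the corrected model.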
9. A data processing device, characterized by comprising:
An acquisition module, configured to obtain the network access identification model of each user in a user group using the same network, wherein the network access identification model at least comprises network access identification factors determined after the user performs page access through the same network within a preset time period;
A receiving module, configured to receive the network access behavioral data of a current user who also uses the same network; and
A determining module, configured to analyze the network access behavioral data of the current user using the network access identification model of each user respectively, to determine the user in the user group with the highest matching degree with the current user.
10. The device according to claim 9, characterized in that the network access identification factors include any one or more of the following parameters determined from the user's page access: the number of visits to pages, visit duration, bounce rate, visit frequency, and visit depth.
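The five factors listed in claim 10 can be derived from per-session records. The sketch below uses common web analytics conventions (bounce rate as the share of single-page sessions, depth as pages per session); these definitions, the `Session` type, and the function `identification_factors` are assumptions, since the claim itself does not define the parameters.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Session:
    pages_viewed: int
    duration_seconds: float


def identification_factors(sessions: Sequence[Session],
                           period_days: float) -> Dict[str, float]:
    """Aggregate sessions from one preset time period into the five factors."""
    if not sessions or period_days <= 0:
        raise ValueError("need at least one session and a positive period")
    visits = len(sessions)
    bounces = sum(1 for s in sessions if s.pages_viewed == 1)
    return {
        "visits": float(visits),
        "avg_duration": sum(s.duration_seconds for s in sessions) / visits,
        "bounce_rate": bounces / visits,
        "frequency": visits / period_days,  # visits per day
        "depth": sum(s.pages_viewed for s in sessions) / visits,
    }


# Hypothetical sessions observed over a two-day period.
demo_sessions = [Session(1, 10.0), Session(3, 50.0), Session(4, 60.0), Session(1, 0.0)]
demo_factors = identification_factors(demo_sessions, period_days=2.0)
```

The resulting values can be ordered into the factor vector that a user's model is evaluated on.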
CN201510696540.2A 2015-10-23 2015-10-23 Data processing method and device Pending CN106610991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510696540.2A CN106610991A (en) 2015-10-23 2015-10-23 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510696540.2A CN106610991A (en) 2015-10-23 2015-10-23 Data processing method and device

Publications (1)

Publication Number Publication Date
CN106610991A true CN106610991A (en) 2017-05-03

Family

ID=58613575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510696540.2A Pending CN106610991A (en) 2015-10-23 2015-10-23 Data processing method and device

Country Status (1)

Country Link
CN (1) CN106610991A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456199A (en) * 2010-10-18 2012-05-16 北京学之途网络科技有限公司 Method and device for expanding internet user sample set and acquiring attribute parameter
US20140207518A1 (en) * 2013-01-23 2014-07-24 24/7 Customer, Inc. Method and Apparatus for Building a User Profile, for Personalization Using Interaction Data, and for Generating, Identifying, and Capturing User Data Across Interactions Using Unique User Identification
CN103248955A (en) * 2013-04-22 2013-08-14 深圳Tcl新技术有限公司 Identity recognition method and device based on intelligent remote control system
CN104318138A (en) * 2014-09-30 2015-01-28 杭州同盾科技有限公司 Method and device for verifying identity of user

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107612707A (en) * 2017-08-04 2018-01-19 上海斐讯数据通信技术有限公司 The preprocess method and system of the homologous sample data classification storage in Industry-oriented field
CN107612707B (en) * 2017-08-04 2021-04-09 深圳市其乐游戏科技有限公司 Preprocessing method and system for classified storage of homologous sample data in industry field
CN110347959A (en) * 2019-06-27 2019-10-18 杭州数跑科技有限公司 Anonymous recognition methods, device, computer equipment and storage medium
CN112182023A (en) * 2020-09-25 2021-01-05 中国科学院信息工程研究所 Big data access control method and device, electronic equipment and storage medium
CN112182023B (en) * 2020-09-25 2023-04-11 中国科学院信息工程研究所 Big data access control method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10789311B2 (en) Method and device for selecting data content to be pushed to terminal, and non-transitory computer storage medium
CN110909205B (en) Video cover determination method and device, electronic equipment and readable storage medium
US10298705B2 (en) Recommendation method and device
US9864951B1 (en) Randomized latent feature learning
KR101620748B1 (en) Item recommendation method and apparatus
CN104317959A (en) Data mining method and device based on social platform
EP4198775A1 (en) Abnormal user auditing method and apparatus, electronic device, and storage medium
CN105005582A (en) Recommendation method and device for multimedia information
CN108304426B (en) Identification obtaining method and device
CN105847127A (en) User attribute information determination method and server
WO2007127957A2 (en) System and method for flagging information content
CN105590240A (en) Discrete calculating method of brand advertisement effect optimization
CN113383362B (en) User identification method and related product
CN106874165B (en) Webpage detection method and device
WO2017052953A1 (en) Client-side web usage data collection
CN110856037A (en) Video cover determination method and device, electronic equipment and readable storage medium
CN112612826B (en) Data processing method and device
CN112801155B (en) Business big data analysis method based on artificial intelligence and server
JP2011227721A (en) Interest extraction device, interest extraction method, and interest extraction program
CN106610991A (en) Data processing method and device
CN110708360A (en) Information processing method and system and electronic equipment
CN103544150A (en) Method and system for providing recommendation information for mobile terminal browser
CN113792212A (en) Multimedia resource recommendation method, device, equipment and storage medium
CN116629937A (en) Marketing strategy recommendation method and device
CN106933905B (en) Method and device for monitoring webpage access data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20170503
