CN104750752B - Method and apparatus for determining an online preference user group - Google Patents

Method and apparatus for determining an online preference user group

Info

Publication number
CN104750752B
Authority
CN
China
Prior art keywords
url
user
keyword
user group
inverted index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310752439.5A
Other languages
Chinese (zh)
Other versions
CN104750752A (en
Inventor
徐萌
何鸿凌
王彦峰
钱岭
孙少凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201310752439.5A
Publication of CN104750752A
Application granted
Publication of CN104750752B
Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the present invention disclose a method and an apparatus for determining an online preference user group. With the proposed technical solution, when an online preference user group needs to be determined, the corresponding target URLs are determined from the keywords corresponding to that group, and, with reference to the inverted index information corresponding to the target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user screening conditions are determined to form the online preference user group. This makes full use of the high performance and high flexibility of inverted index information, allows the online preference user group to be obtained quickly, avoids the system resource consumption caused by recording and matching massive amounts of data, and improves both the processing efficiency and the screening accuracy of the determination process.

Description

Method and apparatus for determining an online preference user group
Technical field
The present invention relates to the field of network technologies, and in particular to a method and an apparatus for determining an online preference user group.
Background
In existing technical solutions, user behavior analysis is usually based on web page content. Whenever a user browses web pages while online, the system can analyze the network addresses the user accesses through a mobile phone or a broadband access network, match and classify them in depth against a URL library, and summarize the user's interest attributes, so that a website can present content of value to the user in a personalized way according to those interests.
A concrete implementation example is as follows:
Step A: select one or more topic words, such as x86, BMW, or schoolmate, and enter them into a search engine as search keywords to obtain a list of web page addresses related to the keywords;
Step B: match the address list from Step A against the users' access log records, and find the user group that accessed these addresses according to a certain rule.
Such a user group is the group of users interested in the selected topic words.
In the course of implementing the present invention, the inventors found at least the following problems in the prior art:
The data volume is large. At current user numbers, the log data is already very large and is growing rapidly. Matching it against the list of web page addresses related to a keyword, especially when a certain rule must also be matched, raises the following further problems:
A) Performing the association (join) directly performs very poorly. On the one hand, the log data is very large. On the other hand, the number of web page addresses to be associated with it fluctuates sharply with the selected keywords and the search rules, so the data scale is unstable, and the two data sets also differ greatly in size. Taking the traffic of one province as an example, 17 billion log records are generated every day; over a calculation period such as one week or one month, the table becomes huge. The number of web page addresses to be associated, by contrast, may be only about 2 billion. Every user group extraction requires a join of these two large tables.
B) The result of the association is stored with high redundancy. With the same data, the redundancy is more than 8 times the capacity of the 2-billion-row table (17 billion / 2 billion ≈ 8.5). Moreover, users' log data is updated constantly, so extracting user behavior groups over a given period requires keeping a large volume of logs, which consumes a large amount of storage space.
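The scale gap described in A) and B) can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above; the constant names and the 30-day month are assumptions made for illustration.

```python
# Figures quoted in the text above.
DAILY_LOG_RECORDS = 17_000_000_000
URL_TABLE_ROWS = 2_000_000_000


def log_volume(days):
    """Number of log records accumulated over a calculation period of `days` days."""
    return DAILY_LOG_RECORDS * days


size_ratio = DAILY_LOG_RECORDS / URL_TABLE_ROWS  # one day of logs vs. the URL table
weekly_records = log_volume(7)
monthly_records = log_volume(30)  # assumption: a 30-day month
```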
Summary of the invention
An object of the embodiments of the present invention is to provide a method and an apparatus for determining an online preference user group, so that the online preference user group can be determined more accurately and quickly.
To achieve the above object, an embodiment of the present invention provides a method for determining an online preference user group, including:
traversing the user Internet log records to be analyzed, and generating inverted index information for each URL included in the user Internet log records, where the inverted index information corresponding to a URL includes the user identifiers that accessed the URL and the access feature information of each user identifier for the URL;
when an online preference user group needs to be determined, selecting one or more keywords corresponding to the online preference user group, and determining the corresponding target URLs according to the selected keywords;
according to the inverted index information corresponding to the determined target URLs, determining that the users corresponding to the user identifiers whose access feature information for the target URLs satisfies the user screening conditions form the online preference user group.
Preferably, selecting one or more keywords corresponding to the online preference user group when the group needs to be determined, and determining the corresponding target URLs according to the selected keywords, specifically includes:
according to the inverted index information corresponding to a selected keyword, determining the URLs for which the occurrence count of the keyword satisfies a first URL screening condition as the target URLs corresponding to the keyword, where the inverted index information corresponding to a keyword includes the URLs of the web pages containing the keyword and the number of times the keyword occurs in each web page; or
according to the web page search results for the selected keyword in a search engine, determining the URLs of the web pages that satisfy a second URL screening condition as the target URLs corresponding to the keyword.
Preferably, the step of selecting the keywords and determining the corresponding target URLs further includes:
screening the determined target URLs according to the service feature information corresponding to the selected keyword.
Preferably, traversing the user Internet log records to be analyzed and generating inverted index information for each URL included in them further includes:
according to the needs of different analysis periods, generating for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
Preferably, determining the online preference user group according to the inverted index information corresponding to the determined target URLs specifically includes:
according to the inverted index information corresponding to the determined target URLs and the timestamp information it carries, determining that the users corresponding to the user identifiers whose access counts and access periods for the target URLs satisfy the user screening conditions form the online preference user group.
Further, an embodiment of the present invention also provides a network device, including:
a generation module, configured to traverse the user Internet log records to be analyzed and generate inverted index information for each URL included in the user Internet log records, where the inverted index information corresponding to a URL includes the user identifiers that accessed the URL and the access feature information of each user identifier for the URL;
a URL screening module, configured to, when an online preference user group needs to be determined, select one or more keywords corresponding to the online preference user group and determine the corresponding target URLs according to the selected keywords;
a user screening module, configured to determine, according to the inverted index information generated by the generation module for the target URLs determined by the URL screening module, that the users corresponding to the user identifiers whose access feature information for the target URLs satisfies the user screening conditions form the online preference user group.
Preferably, the URL screening module is specifically configured to:
according to the inverted index information corresponding to a selected keyword, determine the URLs for which the occurrence count of the keyword satisfies a first URL screening condition as the target URLs corresponding to the keyword, where the inverted index information corresponding to a keyword includes the URLs of the web pages containing the keyword and the number of times the keyword occurs in each web page; or
according to the web page search results for the selected keyword in a search engine, determine the URLs of the web pages that satisfy a second URL screening condition as the target URLs corresponding to the keyword.
Preferably, the URL screening module is further configured to:
screen the determined target URLs according to the service feature information corresponding to the selected keyword.
Preferably, the generation module is further configured to:
according to the needs of different analysis periods, generate for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
Preferably, the user screening module is specifically configured to:
according to the inverted index information generated by the generation module for the target URLs determined by the URL screening module, and the timestamp information it carries, determine that the users corresponding to the user identifiers whose access counts and access periods for the target URLs satisfy the user screening conditions form the online preference user group.
Compared with the prior art, the technical solution proposed by the embodiments of the present invention has the following advantages:
When an online preference user group needs to be determined, the corresponding target URLs are determined from the keywords corresponding to that group, and, with reference to the inverted index information corresponding to the target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user screening conditions are determined to form the online preference user group. This makes full use of the high performance and high flexibility of inverted index information, allows the online preference user group to be obtained quickly, avoids the system resource consumption caused by recording and matching massive amounts of data, and improves both the processing efficiency and the screening accuracy of the determination process.
Description of the drawings
Fig. 1 is a schematic flowchart of a method for determining an online preference user group provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for determining an online preference user group in a specific application scenario provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a network device proposed by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the method for determining an online preference user group provided by an embodiment of the present invention. The method specifically includes:
Step S101: traverse the user Internet log records to be analyzed, and generate inverted index information for each URL included in the user Internet log records.
The inverted index information corresponding to a URL includes the user identifiers that accessed the URL and the access feature information of each user identifier for the URL.
In a specific application scenario, the access feature information of a user identifier for a URL may include the access count, the time of the last access, the time of the first access, and any other information that characterizes access to the URL. The specific content can be adjusted as needed, and such variations do not affect the protection scope of the present invention.
It should be noted that this step prepares the basis for the user screening performed in subsequent processing, and therefore must be completed before the subsequent steps are executed.
To meet this requirement, a statistical processing period can be set in a specific application scenario, so that the user Internet log records are analyzed periodically and the corresponding inverted index information is obtained.
With this arrangement, on the one hand, the inverted index information is updated at a certain period, which gives the subsequent steps a more timely and accurate basis for analysis; on the other hand, the system load caused by processing a sudden burst of massive data in one pass is avoided, and the subsequent steps do not have to wait for the result of this step, which avoids system delay.
Once this step has been completed, step S102 is performed when an online preference user group needs to be determined.
Step S102: select one or more keywords corresponding to the online preference user group, and determine the corresponding target URLs according to the selected keywords.
It should be noted that, depending on how the target URLs are determined, this step can be implemented in either of the following two modes:
Mode 1: according to the inverted index information corresponding to a selected keyword, determine the URLs for which the occurrence count of the keyword satisfies the first URL screening condition as the target URLs corresponding to the keyword.
The inverted index information corresponding to a keyword includes the URLs of the web pages containing the keyword and the number of times the keyword occurs in each web page.
This processing uses an inverted index technique similar to that of step S101: the number of times a keyword occurs in the web page corresponding to a URL is counted, and the results are grouped by keyword. Compared with the prior art, this reduces the heavy data processing load and the volume of associated data caused by working on the full set of log records.
It should further be noted that the first URL screening condition in this mode is a threshold condition set in order to reject interfering information. Specifically, it may be a minimum occurrence count of the keyword (to reject web page records with too low a word frequency), web page type information (to reject web page types that should not be included in the statistics), or another data filtering condition, so that records of web pages only weakly related to the keyword do not distort the statistical result.
In a specific application scenario, the content of the first URL screening condition can be configured as needed, and such variations do not affect the protection scope of the present invention.
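Mode 1 can be sketched as a lookup followed by a filter. The following is a minimal illustration, assuming the keyword inverted index is held in memory as a dictionary of (URL, occurrence count, page type) postings; the function name, parameters, and sample data are hypothetical and are not taken from the patent.

```python
def select_target_urls(keyword_index, keyword, min_count=1, allowed_types=None):
    """Return URLs whose postings satisfy a first URL screening condition.

    keyword_index: {keyword: [(url, occurrence_count, page_type), ...]}
    min_count: minimum occurrence count of the keyword in the page
    allowed_types: optional set of page types to keep (None keeps every type)
    """
    postings = keyword_index.get(keyword, [])
    return [
        url
        for url, count, page_type in postings
        if count >= min_count
        and (allowed_types is None or page_type in allowed_types)
    ]


# Hypothetical sample data.
keyword_index = {
    "key1": [("page_a", 5, "entertainment"), ("page_c", 2, "sports")],
    "key2": [("page_b", 1, "news"), ("page_c", 4, "entertainment")],
}
frequent = select_target_urls(keyword_index, "key1", min_count=3)
sports_only = select_target_urls(keyword_index, "key1", allowed_types={"sports"})
```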
Mode 2: according to the web page search results for the selected keyword in a search engine, determine the URLs of the web pages that satisfy the second URL screening condition as the target URLs corresponding to the keyword.
Unlike Mode 1, this mode does not rely on the occurrence count of a specific keyword to screen URLs; instead it uses the search function of a search engine and screens URLs from the perspective of the relevance between web pages and the keyword.
It should further be noted that the second URL screening condition in this mode is a threshold condition set in order to reject interfering information. Specifically, it may be the rank of a URL in the search results (search results are generally ordered by relevance to the query or by access popularity, so this rejects records of web pages with too low a relevance or popularity), web page type information (to reject web page types that should not be included in the statistics), or another data filtering condition, so that records of web pages only weakly related to the keyword do not distort the statistical result.
In a specific application scenario, the content of the second URL screening condition can be configured as needed, and such variations do not affect the protection scope of the present invention.
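A second URL screening condition based on rank and page type might look like the sketch below. It assumes the search results have already been fetched and are available as an ordered list; no search engine API is called, and the function name, parameters, and sample data are hypothetical.

```python
def screen_search_results(results, max_rank=100, excluded_types=frozenset()):
    """Keep URLs ranked within max_rank whose page type is not excluded.

    results: list of (url, page_type) in the order returned by the search engine,
             so the list position is the rank (1-based).
    """
    return [
        url
        for rank, (url, page_type) in enumerate(results, start=1)
        if rank <= max_rank and page_type not in excluded_types
    ]


# Hypothetical, already-fetched search results.
results = [("u1", "news"), ("u2", "forum"), ("u3", "news")]
top_two = screen_search_results(results, max_rank=2)
no_forums = screen_search_results(results, excluded_types={"forum"})
```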
Further, in view of the service features of the keyword itself, the determined target URLs can be screened according to the service feature information corresponding to the selected keyword, which further refines the finally determined target URLs.
For example, other keywords can be determined from other information associated with the keyword, so that URLs are further screened by a combination of keywords; alternatively, the web page type associated with the keyword can be determined from the nature of the keyword itself, so that URLs are further screened by web page type.
The service feature information is not limited to the content listed above. Any processing that screens URLs more precisely and thereby improves the accuracy of the finally determined target URLs can be applied in the technical solution proposed by the embodiments of the present invention, and such variations do not affect the protection scope of the present invention.
Step S103: according to the inverted index information corresponding to the determined target URLs, determine that the users corresponding to the user identifiers whose access feature information for the target URLs satisfies the user screening conditions form the online preference user group.
It should be noted that the user screening conditions in this step are threshold conditions set in order to reject interfering information. Specifically, they may be an access count (to reject access records with fewer accesses than a given value), an access time interval (to reject access records with too long an access time interval), or another data filtering condition, so that access records that do not reflect a user's preference for the website, such as occasional or accidental visits, do not distort the statistical result.
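The user screening conditions can be sketched as a filter over the postings of one target URL. The text does not define the access time interval precisely; the sketch below assumes one possible reading, namely the time elapsed since the user's last access. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def screen_users(postings, min_count, now, max_idle):
    """Return user IDs that pass both screening conditions, sorted for stable output.

    postings: {user_id: (access_count, last_access)} for one target URL
    min_count: reject users with fewer accesses than this value
    max_idle: reject users whose last access is older than this (assumed reading
              of "access time interval")
    """
    return sorted(
        user_id
        for user_id, (count, last_access) in postings.items()
        if count >= min_count and now - last_access <= max_idle
    )


# Hypothetical sample data.
postings = {
    "user_a": (4, datetime(2013, 7, 7)),
    "user_b": (1, datetime(2013, 7, 7)),   # too few accesses
    "user_c": (6, datetime(2013, 5, 1)),   # last access too long ago
}
group = screen_users(postings, min_count=3, now=datetime(2013, 7, 8),
                     max_idle=timedelta(days=7))
```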
It should further be noted that, because the length of the statistical period affects the statistical result, step S101 can generate for the same URL separate inverted index information for different time intervals according to the needs of different analysis periods, each carrying its own timestamp information.
On this basis, the processing of step S103 can be adjusted as follows:
According to the inverted index information corresponding to the determined target URLs and the timestamp information it carries, determine that the users corresponding to the user identifiers whose access counts and access periods for the target URLs satisfy the user screening conditions form the online preference user group.
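One way to picture the timestamped variant is to keep one index entry per URL per day and to sum the access counts that fall inside the analysis period. This is a sketch under that assumption; the per-day granularity, the (URL, day) key layout, and all names are illustrative choices, not the patent's storage format.

```python
from collections import Counter
from datetime import date


def users_in_period(dated_index, url, start, end, min_count):
    """Sum per-day access counts for `url` within [start, end] and screen by count.

    dated_index: {(url, day): {user_id: access_count}}
    """
    totals = Counter()
    for (indexed_url, day), postings in dated_index.items():
        if indexed_url == url and start <= day <= end:
            totals.update(postings)  # Counter.update adds counts from a mapping
    return sorted(user for user, count in totals.items() if count >= min_count)


# Hypothetical sample data.
dated_index = {
    ("u", date(2013, 7, 1)): {"user_a": 2, "user_b": 1},
    ("u", date(2013, 7, 3)): {"user_a": 2},
    ("u", date(2013, 6, 1)): {"user_b": 5},   # outside the analysis period
}
weekly_group = users_in_period(dated_index, "u", date(2013, 7, 1), date(2013, 7, 7), 4)
```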
The processing of the above technical solution is described in detail below with a specific embodiment, but is not limited to this embodiment.
Fig. 2 is a schematic flowchart of the method for determining an online preference user group in a specific application scenario provided by an embodiment of the present invention. Of the two modes described for step S102, this embodiment uses Mode 1 to describe the processing, but this does not affect the protection scope of the present invention.
The method specifically includes:
Step S201: generate the inverted index information corresponding to keywords according to web page information.
In a specific application scenario, this step can be carried out using pre-selected keywords and web page information obtained over the network.
The web page information may be acquired by targeted collection over a specified range of web pages, or by general collection over all web pages. The specific means of acquisition can be chosen as needed, and such variations do not affect the protection scope of the present invention.
This embodiment uses specified web pages to explain how this step is implemented.
First, three web pages are specified (in practical applications the number of specified web pages is far larger; this number is used here only for ease of explanation and does not affect the protection scope):
Web page A, web page B, web page C.
Then the keywords to be counted are determined, that is, the keywords key1, key2, and key3 that may be contained in the above web pages.
For fast lookup, the web pages are first segmented into words and the word frequencies are counted. The inverted index information then has the following form:
Keyword: (web page address 1: word frequency, category, etc.); (web page address 2: word frequency); (web page address 3: word frequency).
For example:
key1: (web page A: 5, entertainment); (web page C: 2, sports)
key2: (web page B: 1, news); (web page C: 4, entertainment)
key3: (web page A: 1, finance); (web page B: 2, finance)
For physical storage, the information is stored as the pointer information following the key, so that when information is associated, the string of information following key1 can be obtained quickly through key1.
A concrete example:
Universiade:
(http://www.sz2011.org/: 5, sports); (http://zhidao.***.com/question/4602235: 7, sports).
This represents the inverted index information of the keyword "Universiade": in the web page at the URL http://www.sz2011.org/, the word "Universiade" occurs 5 times, and in the web page at the URL http://zhidao.***.com/question/4602235, it occurs 7 times.
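Step S201 can be sketched as follows. This is a minimal illustration, assuming each page is available as plain text with a single category; splitting on whitespace stands in for real word segmentation, and the function name and sample pages are hypothetical.

```python
from collections import Counter, defaultdict


def build_keyword_index(pages, keywords):
    """Build keyword -> [(url, word_frequency, category), ...].

    pages: {url: (text, category)}
    keywords: lowercase keywords to count
    """
    index = defaultdict(list)
    for url, (text, category) in pages.items():
        # Whitespace splitting is a stand-in for real word segmentation.
        counts = Counter(text.lower().split())
        for keyword in keywords:
            if counts[keyword] > 0:
                index[keyword].append((url, counts[keyword], category))
    return dict(index)


# Hypothetical sample pages.
pages = {
    "page_a": ("key1 key1 key3 other", "entertainment"),
    "page_b": ("key2 key3 key3", "finance"),
}
keyword_index = build_keyword_index(pages, ["key1", "key2", "key3"])
```

Storing the result as a dictionary keyed by keyword mirrors the key-based storage described above: one lookup by key returns the whole posting list.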
Step S202: generate the inverted index information of user accesses to URLs according to the user Internet log records.
Because subsequent processing extracts user information by URL, this step traverses the users' Internet log records once and builds an inverted index for the relationship between URLs and users in the log records.
In a specific application scenario, a log record has roughly the following form:
Field                 Example
Time                  2013-7-112:00.987
End time              2013-7-112:01.876
userID                User A
Access URL            http://www.soopat.com/Home/Result?Sort=&
Uplink or downlink
Traffic
Application type      Application (WeChat, Weibo, QQ), web page, etc.
By traversing the above log information, a user-oriented inverted index is built, in the following form:
URL: (user ID: access count, last access time, first access time, access duration).
As with the inverted index information for keywords described above, the inverted index information of user accesses to URLs is physically stored in key-value form, where the key is the URL and the value is a list of (user ID: access count, duration) entries.
For example:
http://www.soopat.com/Home/Result?Sort=&: (User A: 4, 1s); (User B: 2, 10s).
http://www.chinanews.com/shipin/2013/08-13/news2771.shtml: (User C: 5).
These represent the inverted index information of "http://www.soopat.com/Home/Result?Sort=&" and of "http://www.chinanews.com/shipin/2013/08-13/news2771.shtml" respectively.
For the web page at the URL http://www.soopat.com/Home/Result?Sort=&, user A accessed it 4 times and user B accessed it 2 times; for the web page at the URL http://www.chinanews.com/shipin/2013/08-13/news2771.shtml, user C accessed it 5 times.
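Step S202 can be sketched as a single pass over the log records. The sketch assumes each record is a dictionary with `user_id`, `url`, `start`, and `end` fields; these field names, the entry layout, and the sample data are hypothetical, chosen to mirror the (user ID: access count, last access time, first access time, access duration) form above.

```python
from datetime import datetime


def build_url_user_index(log_records):
    """Build url -> {user_id: {'count', 'first', 'last', 'seconds'}} in one traversal."""
    index = {}
    for rec in log_records:
        users = index.setdefault(rec["url"], {})
        entry = users.setdefault(
            rec["user_id"],
            {"count": 0, "first": rec["start"], "last": rec["start"], "seconds": 0.0},
        )
        entry["count"] += 1
        entry["first"] = min(entry["first"], rec["start"])
        entry["last"] = max(entry["last"], rec["start"])
        entry["seconds"] += (rec["end"] - rec["start"]).total_seconds()
    return index


# Hypothetical sample log records.
URL = "http://example.com/p"
logs = [
    {"user_id": "user_a", "url": URL,
     "start": datetime(2013, 7, 1, 12, 0, 0), "end": datetime(2013, 7, 1, 12, 0, 1)},
    {"user_id": "user_a", "url": URL,
     "start": datetime(2013, 7, 2, 9, 0, 0), "end": datetime(2013, 7, 2, 9, 0, 3)},
    {"user_id": "user_b", "url": URL,
     "start": datetime(2013, 7, 1, 13, 0, 0), "end": datetime(2013, 7, 1, 13, 0, 10)},
]
url_user_index = build_url_user_index(logs)
```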
It should be noted that in this step the inverted index information of URLs and users stores the association between URLs and the users who accessed them, and this relationship differs considerably from the keyword-to-web-page inverted index of step S201. The inverted index information of keywords and web pages is relatively stable (once a web page has been generated, its body content does not change much); it is generally updated at a certain period, multiple versions need not be kept, and the latest data simply prevails. In the inverted index information of URLs and users, by contrast, the content of the corresponding user Internet log records changes considerably as time passes and users' online activity accumulates, so multiple versions of the index information need to be kept.
In a specific application scenario, the inverted index information of URLs and users can be updated from daily data, with a timestamp indicating the date of the data. For example, the data for the current day and for one week can be kept at the same time; there will then be two data records with the same key but different timestamps.
When data is read, the results are first obtained by key and then filtered by timestamp.
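The read path just described (fetch by key, then filter by timestamp) can be sketched as below. The sketch assumes each version's timestamp is represented as a (first day, last day) window so that a one-day version and a one-week version can share a key; this representation and all names are hypothetical.

```python
from datetime import date

URL = "http://example.com/p"

# Hypothetical store: one URL key holds several versions, each tagged with a
# timestamp expressed here as a (first_day, last_day) window.
index_store = {
    URL: [
        ((date(2013, 7, 8), date(2013, 7, 8)), {"user_a": 1}),               # same-day version
        ((date(2013, 7, 2), date(2013, 7, 8)), {"user_a": 4, "user_b": 2}),  # one-week version
    ]
}


def read_postings(store, url, window):
    """Fetch every version stored under the key, then keep the one matching the timestamp."""
    versions = store.get(url, [])  # step 1: look up by key
    # step 2: filter by timestamp; an empty dict means no matching version
    return next((postings for stamp, postings in versions if stamp == window), {})


day_view = read_postings(index_store, URL, (date(2013, 7, 8), date(2013, 7, 8)))
week_view = read_postings(index_store, URL, (date(2013, 7, 2), date(2013, 7, 8)))
```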
With the two inverted indexes above in place, the online preference user group can be extracted quickly.
Step S203: select keywords according to the online preference user group to be determined.
Step S204: determine the target URLs to be selected according to the inverted index information generated in step S201.
Step S205: according to the inverted index information generated in step S202, determine the users corresponding to the target URLs who satisfy the system rules.
Steps S203 to S205 are background operations. From the perspective of the foreground, the corresponding behavior is as follows:
System input: keywords.
System rules: for example, the related web pages were accessed more than 3 times in one week, or the websites are restricted to a certain category, such as novels.
System output: the user group that recently accessed the web pages related to the keywords.
A concrete example of this processing is as follows:
Input keyword: BMW x5;
Output: the list of users who prefer the information corresponding to the keyword.
The internal flow of the system is as follows:
Step A: after a keyword is received, locate all URLs corresponding to the keyword in the web page inverted index information generated in step S201.
Step B: narrow the URL range as appropriate according to preset screening rules, for example by taking the top 100 or top 1000 as the list of URLs with related content.
Step C: look up the URLs in the list determined in Step B, one by one, in the inverted index information of URLs and users generated in step S202.
Step D: screen according to the system rules, such as access count and period.
Step E: the user list remaining after screening is the required user group.
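Steps A to E above can be sketched end to end as follows. This is an illustration under simplifying assumptions: both indexes are in-memory dictionaries, Step B ranks URLs by word frequency, and the system rule is "more than `visit_threshold` accesses in total". The function name, parameters, and sample data are hypothetical.

```python
def find_preference_users(keyword, keyword_index, url_user_index,
                          top_n=100, visit_threshold=3):
    """Return the sorted user IDs that form the group for `keyword`.

    keyword_index: {keyword: {url: word_frequency}}
    url_user_index: {url: {user_id: access_count}}
    """
    # Step A: locate all URLs for the keyword in the keyword inverted index.
    postings = keyword_index.get(keyword, {})
    # Step B: narrow the URL range, here by keeping the top_n URLs by word frequency.
    target_urls = sorted(postings, key=postings.get, reverse=True)[:top_n]
    # Step C: look up each target URL in the URL-to-user inverted index.
    visits = {}
    for url in target_urls:
        for user_id, count in url_user_index.get(url, {}).items():
            visits[user_id] = visits.get(user_id, 0) + count
    # Step D: apply the system rule (more than visit_threshold accesses).
    # Step E: the users that remain form the required group.
    return sorted(user for user, count in visits.items() if count > visit_threshold)


# Hypothetical sample data.
keyword_index = {"BMW x5": {"url1": 7, "url2": 5, "url3": 1}}
url_user_index = {
    "url1": {"user_a": 3, "user_b": 1},
    "url2": {"user_a": 2, "user_c": 4},
    "url3": {"user_b": 9},
}
group = find_preference_users("BMW x5", keyword_index, url_user_index, top_n=2)
```

In this sample, limiting Step B to the top two URLs drops url3, so user_b's nine visits to that page are not counted; widening `top_n` to 3 would bring user_b into the group.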
Direct the group of user is carried out it is further noted that above-mentioned as basic function according to keyword Disclosure satisfy that the requirement of the overwhelming majority, still, for service-oriented personalized user group further extracted it is excellent Change, following optimal screening processing can be carried out.
For example:
The keyword "I Am a Singer" is a label required by a service feature; it can actually correspond to a series of keywords such as "Hunan Satellite TV" & "I Am a Singer" & "Saturday evening 10 o'clock".
As another example:
"Swordsman class" is a label required by a service feature; it can actually correspond to the keyword "swordsman", with the website category defined as the novel class in the system rule.
With this mapping definition method, configuration rules can be set flexibly in the system in a service-oriented way, which is more convenient.
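The mapping from a service label to keywords and a system rule could be held in a small configuration table. The following Python sketch is only one possible layout: `LabelRule`, `LABEL_RULES`, and `resolve_label` are hypothetical names, and the fallback for unmapped labels is an assumption rather than something the description specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LabelRule:
    """Hypothetical configuration entry for one service label."""
    keywords: Tuple[str, ...]            # keywords the label expands to
    site_category: Optional[str] = None  # optional website-category restriction


# Configuration table built from the two examples in the description.
LABEL_RULES = {
    "I Am a Singer": LabelRule(
        keywords=("Hunan Satellite TV", "I Am a Singer", "Saturday evening 10 o'clock"),
    ),
    "swordsman class": LabelRule(keywords=("swordsman",), site_category="novel"),
}


def resolve_label(label):
    """Expand a label into its rule; an unmapped label is treated as a plain keyword."""
    return LABEL_RULES.get(label, LabelRule(keywords=(label,)))


rule = resolve_label("swordsman class")
print(rule.keywords, rule.site_category)
```

Keeping the mapping in data rather than in code means a new service label can be added by editing the table alone.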
Compared with the prior art, the technical solution proposed by the embodiments of the present invention has the following advantages:
With the technical solution proposed by the embodiments of the present invention, when an online-preference user group needs to be determined, the corresponding target URLs are determined according to the keywords corresponding to the online-preference user group. Then, with reference to the inverted index information corresponding to the target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user screening condition are determined to form the online-preference user group. This makes full use of the high performance and high flexibility of inverted index information, achieves fast acquisition of the online-preference user group, avoids the system resource consumption caused by recording and matching massive amounts of data, and improves both the processing efficiency and the screening accuracy of the process of determining the online-preference user group.
Further, to implement the above technical solution, an embodiment of the present invention also provides a network device, whose structure is shown schematically in Fig. 3 and which specifically includes:
Generation module 31, configured to traverse the user Internet log records to be analyzed and to generate, for each URL contained in the user Internet log records, the corresponding inverted index information, where the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL;
URL screening module 32, configured to, when an online-preference user group needs to be determined, select one or more keywords corresponding to the online-preference user group and determine the corresponding target URLs according to the selected keywords;
User screening module 33, configured to determine, according to the inverted index information generated by the generation module 31 that corresponds to the target URLs determined by the URL screening module 32, that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online-preference user group.
Preferably, the URL screening module 32 is specifically configured to:
determine, according to the inverted index information corresponding to the selected keyword, that the URLs for which the number of occurrences of the keyword satisfies a first URL screening condition are the target URLs corresponding to the keyword, where the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the number of occurrences of the keyword in those web pages; or,
determine, according to the web search results for the selected keyword in a search engine, that the URLs of the web pages satisfying a second URL screening condition are the target URLs corresponding to the keyword.
Preferably, the URL screening module 32 is further configured to:
screen the determined target URLs according to the service feature information corresponding to the selected keyword.
Preferably, the generation module 31 is further configured to:
generate, according to the needs of different analysis periods, the inverted index information corresponding to different time intervals for the same URL, each carrying different timestamp information.
Preferably, the user screening module 33 is specifically configured to:
determine, according to the inverted index information generated by the generation module 31 that corresponds to the target URLs determined by the URL screening module 32, together with the timestamp information it carries, that the users corresponding to the user identifiers whose access counts and access periods for the target URLs satisfy the user screening condition form the online-preference user group.
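The time-stamped inverted index produced by the generation module and consumed by the user screening module can be sketched in Python as follows. The log records, the choice of the ISO calendar week as the time interval, and the function names are all assumptions made for illustration; the description does not fix the interval granularity or the record layout.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical user Internet log records: (user_id, url, access date).
LOG = [
    ("u1", "http://auto.example/x5", date(2013, 12, 2)),
    ("u1", "http://auto.example/x5", date(2013, 12, 3)),
    ("u1", "http://auto.example/x5", date(2013, 12, 5)),
    ("u1", "http://auto.example/x5", date(2013, 12, 6)),
    ("u2", "http://auto.example/x5", date(2013, 12, 4)),
    ("u2", "http://auto.example/x5", date(2013, 12, 10)),
]


def week_stamp(day):
    """Timestamp for the interval containing `day` (assumed: ISO year and week)."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def build_index(log):
    """Traverse the log once and build url -> week stamp -> {user_id: access count}."""
    index = defaultdict(lambda: defaultdict(Counter))
    for user_id, url, day in log:
        index[url][week_stamp(day)][user_id] += 1
    return index


def users_in_period(index, url, stamp, more_than):
    """Users whose access count for `url` in the interval `stamp` exceeds `more_than`."""
    bucket = index.get(url, {}).get(stamp, {})
    return sorted(user for user, count in bucket.items() if count > more_than)


index = build_index(LOG)
first_week = week_stamp(date(2013, 12, 2))
print(users_in_period(index, "http://auto.example/x5", first_week, 3))
```

Keeping one bucket per URL and interval lets a rule such as "accessed more than 3 times within one week" be answered from a single bucket, without rescanning the log.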
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
Those skilled in the art will understand that the modules of the device in an embodiment may be distributed in the device of the embodiment as described, or may be changed accordingly and located in one or more devices different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple submodules.
The above embodiments of the present invention are described for illustration only and do not indicate the relative merits of the embodiments.
What is disclosed above is only several specific embodiments of the present invention. However, the present invention is not limited thereto, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (8)

  1. A method for determining an online-preference user group, characterized by comprising:
    traversing user Internet log records to be analyzed, and generating, for each URL contained in the user Internet log records, the corresponding inverted index information, wherein the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL; and, according to the needs of different analysis periods, generating for the same URL the inverted index information corresponding to different time intervals, each carrying different timestamp information;
    when an online-preference user group needs to be determined, selecting one or more keywords corresponding to the online-preference user group, and determining the corresponding target URLs according to the selected keywords;
    according to the inverted index information corresponding to the determined target URLs, determining that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online-preference user group.
  2. The method according to claim 1, characterized in that said selecting, when an online-preference user group needs to be determined, one or more keywords corresponding to the online-preference user group, and determining the corresponding target URLs according to the selected keywords, specifically comprises:
    according to the inverted index information corresponding to the selected keyword, determining that the URLs for which the number of occurrences of the keyword satisfies a first URL screening condition are the target URLs corresponding to the keyword, wherein the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the number of occurrences of the keyword in those web pages; or,
    according to the web search results for the selected keyword in a search engine, determining that the URLs of the web pages satisfying a second URL screening condition are the target URLs corresponding to the keyword.
  3. The method according to claim 2, characterized in that said selecting, when an online-preference user group needs to be determined, one or more keywords corresponding to the online-preference user group, and determining the corresponding target URLs according to the selected keywords, further comprises:
    screening the determined target URLs according to the service feature information corresponding to the selected keyword.
  4. The method according to claim 1, characterized in that said determining, according to the inverted index information corresponding to the determined target URLs, that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online-preference user group, specifically comprises:
    according to the inverted index information corresponding to the determined target URLs and the timestamp information it carries, determining that the users corresponding to the user identifiers whose access counts and access periods for the target URLs satisfy the user screening condition form the online-preference user group.
  5. A network device, characterized by comprising:
    a generation module, configured to traverse user Internet log records to be analyzed and to generate, for each URL contained in the user Internet log records, the corresponding inverted index information, wherein the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL; the generation module being further configured to: according to the needs of different analysis periods, generate for the same URL the inverted index information corresponding to different time intervals, each carrying different timestamp information;
    a URL screening module, configured to, when an online-preference user group needs to be determined, select one or more keywords corresponding to the online-preference user group, and determine the corresponding target URLs according to the selected keywords;
    a user screening module, configured to determine, according to the inverted index information generated by the generation module that corresponds to the target URLs determined by the URL screening module, that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online-preference user group.
  6. The network device according to claim 5, characterized in that the URL screening module is specifically configured to:
    according to the inverted index information corresponding to the selected keyword, determine that the URLs for which the number of occurrences of the keyword satisfies a first URL screening condition are the target URLs corresponding to the keyword, wherein the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the number of occurrences of the keyword in those web pages; or,
    according to the web search results for the selected keyword in a search engine, determine that the URLs of the web pages satisfying a second URL screening condition are the target URLs corresponding to the keyword.
  7. The network device according to claim 6, characterized in that the URL screening module is further configured to:
    screen the determined target URLs according to the service feature information corresponding to the selected keyword.
  8. The network device according to claim 5, characterized in that the user screening module is specifically configured to:
    according to the inverted index information generated by the generation module that corresponds to the target URLs determined by the URL screening module, together with the timestamp information it carries, determine that the users corresponding to the user identifiers whose access counts and access periods for the target URLs satisfy the user screening condition form the online-preference user group.
CN201310752439.5A 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet Active CN104750752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310752439.5A CN104750752B (en) 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet

Publications (2)

Publication Number Publication Date
CN104750752A CN104750752A (en) 2015-07-01
CN104750752B true CN104750752B (en) 2018-06-15

Family

ID=53590447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310752439.5A Active CN104750752B (en) 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet

Country Status (1)

Country Link
CN (1) CN104750752B (en)

Also Published As

Publication number Publication date
CN104750752A (en) 2015-07-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant