CN104750752B - Method and device for determining an online preference user group - Google Patents
- Publication number
- CN104750752B (application number CN201310752439.5A)
- Authority
- CN
- China
- Prior art keywords
- url
- user
- keyword
- user group
- inverted index
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
- Information Transfer Between Computers (AREA)
Abstract
An embodiment of the present invention discloses a method and device for determining an online preference user group. With the technical solution proposed by the embodiment, when an online preference user group needs to be determined, the corresponding target URLs are determined according to the keywords corresponding to that user group, and, with reference to the inverted index information corresponding to the target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user screening condition are determined to form the online preference user group. The high performance and high flexibility of inverted index information are thereby fully exploited, the online preference user group is obtained quickly, the system resources otherwise consumed by recording and matching massive data are saved, and both the processing efficiency and the screening accuracy of the determination process are improved.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a method and device for determining an online preference user group.
Background technology
In existing technical solutions, user behavior analysis is usually based on web page content. Whenever a user browses web pages while online, the system can analyze the addresses the user accesses over a mobile phone or a broadband network, perform in-depth matching and classification against a URL library, and summarize the user's interests and preferences, so that a website can present content of value to the user in a personalized way according to those preferences.
A specific implementation example is as follows:
Step A: select one or more topic words, such as x86, BMW, or schoolmate, and enter them into a search engine as search keywords to obtain a list of web page addresses related to the keywords;
Step B: match the address list from step A against users' access log records to find the user group that accesses the addresses in the list according to a certain rule.
Such a user group is the user group interested in the selected topic words.
In the course of implementing the present invention, the inventor found that the prior art has at least the following problem:
The data volume is large. At the current user scale, the volume of log data is already enormous and is growing rapidly. Matching it against the list of web page addresses related to the keywords, especially when certain rules must also be matched, gives rise to the following further problems:
a) Performing the association operation directly performs very poorly. On the one hand, the log data is enormous. On the other hand, the number of web page addresses to be associated with it fluctuates sharply as the selected keywords and search rules change, so the data scale is very unstable, and the difference in scale between the two data sets is also very large. Taking the traffic of one province as an example, 17 billion log records are generated every day; over a calculation cycle such as one week or one month, the table becomes huge. The number of associated web page addresses, by contrast, may be only about 2 billion. Every acquisition of a user group requires an association operation between these two large tables.
b) The result of the association is stored with high redundancy. With the same figures, the stored result is roughly 8 times the size of the 2-billion-row table (17 billion / 2 billion ≈ 8.5). Moreover, users' log data is updated constantly, so obtaining user behavior groups over a given cycle requires retaining a large volume of logs, which consumes a large amount of storage space.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and device for determining an online preference user group, so that an online preference user group can be determined more accurately and quickly.
To achieve the above purpose, an embodiment of the present invention provides a method for determining an online preference user group, comprising:
traversing the user Internet access log records to be analyzed, and generating inverted index information for each URL included in the log records, wherein the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL;
when an online preference user group needs to be determined, selecting one or more keywords corresponding to the online preference user group, and determining the corresponding target URLs according to the selected keywords;
according to the inverted index information corresponding to the determined target URLs, determining that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online preference user group.
Preferably, selecting one or more keywords corresponding to the online preference user group when an online preference user group needs to be determined, and determining the corresponding target URLs according to the selected keywords, specifically includes:
according to the inverted index information corresponding to the selected keyword, determining the URLs for which the occurrence count of the keyword satisfies the first URL screening condition as the target URLs corresponding to the keyword, wherein the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the occurrence count of the keyword in each web page;
or,
according to the web page search results for the selected keyword in a search engine, determining the URLs of the web pages that satisfy the second URL screening condition as the target URLs corresponding to the keyword.
Preferably, selecting one or more keywords corresponding to the online preference user group when an online preference user group needs to be determined, and determining the corresponding target URLs according to the selected keywords, further includes:
screening the determined target URLs according to the service characteristic information corresponding to the selected keyword.
Preferably, traversing the user Internet access log records to be analyzed and generating inverted index information for each URL included in the log records further includes:
according to the needs of different analysis cycles, generating for the same URL separate inverted index information for different time intervals, each carrying a different timestamp.
Preferably, determining, according to the inverted index information corresponding to the determined target URLs, that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online preference user group specifically includes:
according to the inverted index information corresponding to the determined target URLs and the timestamps it carries, determining that the users corresponding to the user identifiers whose access counts and access cycles for the target URLs satisfy the user screening condition form the online preference user group.
Further, an embodiment of the present invention also proposes a network device, comprising:
a generation module, configured to traverse the user Internet access log records to be analyzed and generate inverted index information for each URL included in the log records, wherein the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL;
a URL screening module, configured to select, when an online preference user group needs to be determined, one or more keywords corresponding to the online preference user group, and to determine the corresponding target URLs according to the selected keywords;
a user screening module, configured to determine, according to the inverted index information that was generated by the generation module and that corresponds to the target URLs determined by the URL screening module, that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online preference user group.
Preferably, the URL screening module is specifically configured to:
according to the inverted index information corresponding to the selected keyword, determine the URLs for which the occurrence count of the keyword satisfies the first URL screening condition as the target URLs corresponding to the keyword, wherein the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the occurrence count of the keyword in each web page;
or,
according to the web page search results for the selected keyword in a search engine, determine the URLs of the web pages that satisfy the second URL screening condition as the target URLs corresponding to the keyword.
Preferably, the URL screening module is further configured to:
screen the determined target URLs according to the service characteristic information corresponding to the selected keyword.
Preferably, the generation module is further configured to:
according to the needs of different analysis cycles, generate for the same URL separate inverted index information for different time intervals, each carrying a different timestamp.
Preferably, the user screening module is specifically configured to:
according to the inverted index information that was generated by the generation module and that corresponds to the target URLs determined by the URL screening module, together with the timestamps it carries, determine that the users corresponding to the user identifiers whose access counts and access cycles for the target URLs satisfy the user screening condition form the online preference user group.
Compared with the prior art, the technical solution proposed by the embodiments of the present invention has the following advantages:
With the technical solution proposed by the embodiments of the present invention, when an online preference user group needs to be determined, the corresponding target URLs are determined according to the keywords corresponding to that user group, and, with reference to the inverted index information corresponding to the target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user screening condition are determined to form the online preference user group. The high performance and high flexibility of inverted index information are thereby fully exploited, the online preference user group is obtained quickly, the system resources otherwise consumed by recording and matching massive data are saved, and both the processing efficiency and the screening accuracy of the determination process are improved.
Description of the drawings
Fig. 1 is a schematic flowchart of a method for determining an online preference user group provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the method for determining an online preference user group in a specific application scenario provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a network device proposed by an embodiment of the present invention.
Specific embodiments
The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the method for determining an online preference user group provided by an embodiment of the present invention. The method specifically includes:
Step S101: traverse the user Internet access log records to be analyzed, and generate inverted index information for each URL included in the log records.
The inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL.
In a specific application scenario, the access characteristic information of a user identifier for the URL may include the access count, the time of the last access, the time of the first access, and other information that characterizes access to the URL. The specific content can be adjusted according to actual needs, and such variations do not affect the protection scope of the present invention.
It should be noted that this step prepares the basis for the user screening performed in subsequent processing, and must therefore be completed before the subsequent steps are executed.
To meet this requirement, in a specific application scenario a statistical processing cycle can be set, and the user Internet access log records are analyzed and processed periodically to obtain the corresponding inverted index information.
With this arrangement, on the one hand, the inverted index information is updated at a fixed cycle, giving subsequent steps a more timely and accurate basis for analysis; on the other hand, the processing load caused by a sudden, centralized processing of massive data is avoided, and subsequent steps do not have to wait for the result of this step, which would otherwise introduce system delay.
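The periodic processing described above can be sketched in Python as follows. This is a minimal illustration, assuming each statistical cycle delivers a batch of (user identifier, URL, access time) tuples; the names `AccessInfo` and `merge_batch`, the `example.com` URL, and the choice of access count plus first and last access times as the access characteristic information are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AccessInfo:
    """Access characteristic information of one user for one URL (assumed fields)."""
    count: int
    first_access: datetime
    last_access: datetime


def merge_batch(index, batch):
    """Fold one cycle's log records into the URL -> {user: AccessInfo} index."""
    for user_id, url, ts in batch:
        users = index.setdefault(url, {})
        info = users.get(user_id)
        if info is None:
            users[user_id] = AccessInfo(1, ts, ts)
        else:
            info.count += 1
            info.first_access = min(info.first_access, ts)
            info.last_access = max(info.last_access, ts)
    return index


url = "http://example.com/a"  # hypothetical URL
index = {}
# First cycle
merge_batch(index, [("userA", url, datetime(2013, 7, 1, 12, 0)),
                    ("userB", url, datetime(2013, 7, 1, 12, 5))])
# Second cycle: only the new records are processed
merge_batch(index, [("userA", url, datetime(2013, 7, 2, 9, 0))])
```

Because each cycle only folds new records into the existing index, later screening steps read a ready-made structure instead of rescanning the raw logs.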
Once this step has been completed, step S102 is performed when an online preference user group needs to be determined.
Step S102: select one or more keywords corresponding to the online preference user group, and determine the corresponding target URLs according to the selected keywords.
It should be noted that, depending on how the target URLs are determined, this step can be implemented in either of the following two modes:
Mode one: according to the inverted index information corresponding to the selected keyword, determine the URLs for which the occurrence count of the keyword satisfies the first URL screening condition as the target URLs corresponding to the keyword.
The inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the occurrence count of the keyword in each web page.
This processing uses the same kind of inverted index technique as step S101: the number of times a keyword occurs in the web page corresponding to a URL is counted, and the results are grouped by keyword. Compared with the prior art, this reduces the heavy data processing load and the volume of associated data that would result from working on the complete log records.
It should further be noted that the first URL screening condition in this mode is a threshold condition set to reject interfering information. Specifically, it may be a minimum occurrence count of the keyword (to reject web page records in which the word frequency is too low), web page type information (to reject web page types that are not intended to be included in the statistics), or another data filtering condition, so that web page records with too low a relevance to the keyword do not distort the statistical results.
In a specific application scenario, the content of the first URL screening condition can be configured as needed, and such variations do not affect the protection scope of the present invention.
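A minimal Python sketch of mode one, under stated assumptions: the keyword inverted index is held in a dictionary, and the first URL screening condition is modeled as a minimum occurrence count plus a set of excluded web page types. The function name `target_urls`, its parameters, and the sample postings are illustrative only.

```python
# keyword -> list of (URL, occurrence count, web page type); sample values are illustrative
keyword_index = {
    "key1": [("pageA", 5, "entertainment"), ("pageC", 2, "sports")],
    "key2": [("pageB", 1, "news"), ("pageC", 4, "entertainment")],
}


def target_urls(keyword, min_count=1, excluded_types=frozenset()):
    """Return the URLs whose postings pass the (assumed) first URL screening condition."""
    return [url
            for url, count, page_type in keyword_index.get(keyword, [])
            if count >= min_count and page_type not in excluded_types]


frequent = target_urls("key1", min_count=3)            # rejects low word frequency
no_news = target_urls("key2", excluded_types={"news"})  # rejects an unwanted page type
```

Only the postings of the selected keyword are read, so the cost depends on the number of pages containing that keyword rather than on the size of the log.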
Mode two: according to the web page search results for the selected keyword in a search engine, determine the URLs of the web pages that satisfy the second URL screening condition as the target URLs corresponding to the keyword.
Unlike mode one, this mode does not rely on the occurrence count of a specific keyword to screen URLs; instead, it uses the search function of a search engine and screens URLs from the perspective of the relevance between web pages and the keyword.
It should further be noted that the second URL screening condition in this mode is a threshold condition set to reject interfering information. Specifically, it may be the rank position of the URL in the search results (search results are generally ordered by relevance to the search terms or by access popularity, so this rejects web page records whose relevance or popularity is too low), web page type information (to reject web page types that are not intended to be included in the statistics), or another data filtering condition, so that web page records with too low a relevance to the keyword do not distort the statistical results.
In a specific application scenario, the content of the second URL screening condition can be configured as needed, and such variations do not affect the protection scope of the present invention.
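Mode two can be sketched as a filter over an already ranked result list. The sketch below assumes the search results are available as a list of (URL, web page type) pairs ordered best match first; how they are fetched from a search engine is outside its scope. The function name `screen_search_results`, the `top_n` cutoff, and the `example.com` URLs are hypothetical.

```python
def screen_search_results(ranked_results, top_n=100, excluded_types=frozenset()):
    """Keep URLs ranked within top_n whose web page type is not excluded.

    ranked_results: list of (url, page_type), best match first (assumed input shape).
    """
    kept = []
    for position, (url, page_type) in enumerate(ranked_results, start=1):
        if position > top_n:
            break  # everything below the cutoff is treated as too weakly related
        if page_type not in excluded_types:
            kept.append(url)
    return kept


results = [
    ("http://example.com/1", "auto"),
    ("http://example.com/2", "forum"),
    ("http://example.com/3", "auto"),
]
targets = screen_search_results(results, top_n=2, excluded_types={"forum"})
```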
Further, considering the service characteristics of the keyword itself, the determined target URLs can be screened according to the service characteristic information corresponding to the selected keyword, so as to further improve the accuracy of the finally determined target URLs.
For example, other keywords can be determined from other associated information corresponding to the keyword, so that URLs are further screened according to the keyword combination; or the web page type associated with the keyword can be determined from the nature of the keyword itself, so that URLs are further screened by web page type.
The service characteristic information is not limited to the content listed above. Any processing that can screen URLs more precisely, and thereby improve the accuracy of the finally determined target URLs, can be applied in the technical solution proposed by the embodiments of the present invention, and such variations do not affect the protection scope of the present invention.
Step S103: according to the inverted index information corresponding to the determined target URLs, determine that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user screening condition form the online preference user group.
It should be noted that the user screening condition in this step is a threshold condition set to reject interfering information. Specifically, it may be an access count (to reject access records whose access count is below a certain value), an access time interval (to reject access records whose access time interval is too large), or another data filtering condition, so that access records that do not reflect a user's preference for the corresponding website, such as occasional or accidental accesses, do not distort the statistical results.
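The user screening condition can be sketched as a simple predicate over the postings of one target URL. In the Python sketch below, the "access time interval" is interpreted as the time elapsed since the user's last access, which is an assumption; the function name `screen_users`, the thresholds, and the sample data are illustrative.

```python
from datetime import datetime, timedelta


def screen_users(postings, now, min_count=3, max_idle=timedelta(days=7)):
    """Return the users that pass the (assumed) user screening condition.

    postings: {user_id: (access_count, last_access_time)} for one target URL.
    """
    return {user
            for user, (count, last_access) in postings.items()
            if count >= min_count and now - last_access <= max_idle}


postings = {
    "userA": (4, datetime(2013, 7, 30)),
    "userB": (1, datetime(2013, 7, 31)),  # too few accesses
    "userC": (9, datetime(2013, 6, 1)),   # last access too long ago
}
group = screen_users(postings, now=datetime(2013, 8, 1))
```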
It should further be noted that, considering the effect of the length of the statistical period on the statistical results, in step S101 separate inverted index information for different time intervals can be generated for the same URL according to the needs of different analysis cycles, each carrying a different timestamp.
On this basis, the processing of step S103 can be adjusted as follows:
according to the inverted index information corresponding to the determined target URLs and the timestamps it carries, determine that the users corresponding to the user identifiers whose access counts and access cycles for the target URLs satisfy the user screening condition form the online preference user group.
The processing of the above technical solution is described in detail below with a specific embodiment, but is not limited to the following embodiment.
Fig. 2 is a schematic flowchart of the method for determining an online preference user group in a specific application scenario provided by an embodiment of the present invention. Of the two modes mentioned for step S102 above, this embodiment describes the processing using mode one as an example, but this does not affect the protection scope of the present invention.
Specifically, the method includes:
Step S201: generate the inverted index information corresponding to keywords according to web page information.
In a specific application scenario, this step can be implemented with pre-selected keywords and web page information obtained over the network.
The web page information can be acquired by targeted collection over a specified range of web pages, or by general collection over all web pages. The specific means of acquisition can be selected according to actual needs, and such variations do not affect the protection scope of the present invention.
In this embodiment, the implementation of this step is explained using specified web pages as an example.
First, three web pages are specified (in practice the number of specified web pages is far larger; this number is used here only for ease of explanation and does not affect the protection scope):
Web page A, web page B, web page C.
Then the keywords to be counted are determined, namely the keywords key1, key2, and key3 that may be contained in the above web pages.
For fast lookup, the web pages are first segmented into words and the word frequencies are counted. The inverted index information is built in the following form:
Keyword: (web page address 1: word frequency, category, etc.); (web page address 2: word frequency); (web page address 3: word frequency).
For example:
Key1: (web page A: 5, entertainment); (web page C: 2, sports)
Key2: (web page B: 1, news); (web page C: 4, entertainment)
Key3: (web page A: 1, finance); (web page B: 2, finance)
Specifically, in physical storage the index is stored as the key followed by pointer information, so that when information is associated, the string of information that follows can be obtained rapidly through key1.
A concrete example is as follows:
Universiade:
(http://www.sz2011.org/: 5, sports); (http://zhidao.***.com/question/4602235: 7, sports).
This information represents the inverted index information of the keyword "Universiade": in the web page whose URL is http://www.sz2011.org/, the word "Universiade" occurs 5 times, and in the web page whose URL is http://zhidao.***.com/question/4602235, it occurs 7 times.
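Step S201 can be sketched in Python as follows. The sketch assumes web page text is already available as strings and uses a simple regular-expression tokenizer as a stand-in for real word segmentation (Chinese text would need a dedicated segmenter, and multi-word keywords are not handled). The function name `build_keyword_index`, the page texts, and the single category per page are illustrative assumptions.

```python
import re
from collections import Counter, defaultdict


def build_keyword_index(pages, keywords):
    """Build keyword -> [(url, word frequency, category)] from page text.

    pages: {url: (text, category)}; keywords: single-token keywords to count.
    """
    index = defaultdict(list)
    for url, (text, category) in pages.items():
        # Segment the page into tokens and count word frequencies
        freq = Counter(re.findall(r"\w+", text.lower()))
        for kw in keywords:
            if freq[kw] > 0:
                index[kw].append((url, freq[kw], category))
    return dict(index)


pages = {
    "pageA": ("key1 key1 key3 other", "entertainment"),
    "pageB": ("key2 key3 key3", "finance"),
}
keyword_index = build_keyword_index(pages, ["key1", "key2", "key3"])
```

Looking up `keyword_index["key3"]` then returns the full posting list for that keyword in one step, which matches the "key followed by its string of information" storage described above.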
Step S202: generate the inverted index information of user accesses to URLs according to the user Internet access log records.
Because user information is extracted by URL in subsequent processing, this step needs to traverse the users' Internet access log records once and build an inverted index for the relationship between the URLs in the log records and the users.
In a specific application scenario, the general format of a log record is as follows:
Field | Example |
Time | 2013-7-112:00.987 |
End time | 2013-7-112:01.876 |
userID | User A |
Accessed URL | http://www.soopat.com/Home/Result?Sort=& |
Uplink or downlink | |
Traffic | |
Application type | Application (WeChat, Weibo, QQ), web page, etc. |
By traversing the above log information, a user-oriented inverted index is established, in the following form:
URL: (user ID: access count, last access time, first access time, access duration).
As with the inverted index information for keywords described above, in physical storage the inverted index information of user accesses to URLs is also stored in key-value form, where the key is the URL and the value is a list of (user ID: access count, duration) entries.
For example:
http://www.soopat.com/Home/Result?Sort=&: (User A: 4, 1s); (User B: 2, 10s).
http://www.chinanews.com/shipin/2013/08-13/news2771.shtml: (User C: 5).
These entries represent the inverted index information of "http://www.soopat.com/Home/Result?Sort=&" and of "http://www.chinanews.com/shipin/2013/08-13/news2771.shtml" respectively.
For the web page whose URL is http://www.soopat.com/Home/Result?Sort=&, user A accessed it 4 times and user B accessed it 2 times; for the web page whose URL is http://www.chinanews.com/shipin/2013/08-13/news2771.shtml, user C accessed it 5 times.
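The key-value form described above can be sketched in Python as follows. The sketch assumes each log row has been parsed into (start time, end time, user ID, URL), and that the duration stored per user is the sum over that user's visits; the source does not specify how the duration is aggregated. The names `build_url_index` and `to_key_value`, and the `example.com` URL, are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_url_index(log_rows):
    """Build url -> {user_id: [access count, total duration in seconds]}."""
    index = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for start, end, user_id, url in log_rows:
        entry = index[url][user_id]
        entry[0] += 1                              # one more access
        entry[1] += (end - start).total_seconds()  # accumulate duration (assumption)
    return index


def to_key_value(url, users):
    """Render one index entry as a key (the URL) and a value (posting list string)."""
    value = ";".join(f"({u}:{c},{int(d)}s)" for u, (c, d) in users.items())
    return url, value


url = "http://example.com/x"  # hypothetical URL
rows = [
    (datetime(2013, 7, 1, 12, 0, 0), datetime(2013, 7, 1, 12, 0, 1), "User A", url),
    (datetime(2013, 7, 1, 12, 5, 0), datetime(2013, 7, 1, 12, 5, 2), "User A", url),
    (datetime(2013, 7, 1, 13, 0, 0), datetime(2013, 7, 1, 13, 0, 10), "User B", url),
]
url_index = build_url_index(rows)
key, value = to_key_value(url, url_index[url])
```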
It should be noted that in this step the inverted index information of URLs and users preserves the association between URLs and user accesses, and this relationship differs considerably from the inverted index of keywords and web pages in step S201. The inverted index information of keywords and web pages is relatively stable (once a web page is generated, its body content does not change much); it is generally updated at a fixed cycle, multiple versions need not be kept, and the latest data simply takes precedence. In the inverted index information of URLs and users, by contrast, the content of the corresponding user Internet access log records changes considerably as users' online behavior accumulates over time, so multiple versions of the index information need to be kept.
In a specific application scenario, the inverted index information of URLs and users can be updated with daily data, and a timestamp indicates the period the data covers. For example, the data for the current day and for one week can be kept at the same time; correspondingly, two data records with the same key but different timestamps exist simultaneously.
When data is read, the results are first obtained by key and then filtered by timestamp.
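Reading a versioned entry "by key first, then by timestamp" can be sketched as follows. The storage layout (a list of versions per URL, each tagged with a period label and a date stamp), the function name `read_version`, and the sample data are assumptions made for illustration.

```python
from datetime import date

url = "http://example.com/x"  # hypothetical URL

# key -> list of versions; the same key appears with different timestamps
store = {
    url: [
        {"period": "day", "stamp": date(2013, 8, 13), "users": {"User A": 1}},
        {"period": "week", "stamp": date(2013, 8, 13), "users": {"User A": 4, "User B": 2}},
    ]
}


def read_version(store, url, period, stamp):
    """Fetch all versions by key, then filter them by timestamp."""
    versions = store.get(url, [])
    for version in versions:
        if version["period"] == period and version["stamp"] == stamp:
            return version["users"]
    return {}


weekly = read_version(store, url, "week", date(2013, 8, 13))
daily = read_version(store, url, "day", date(2013, 8, 13))
```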
With the above two inverted indexes in place, the goal of extracting an online preference user group can be achieved quickly.
Step S203: select keywords according to the requirements of the online preference user group to be determined.
Step S204: determine the target URLs to be selected according to the inverted index information generated in step S201.
Step S205: according to the inverted index information generated in step S202, determine the users corresponding to the target URLs who satisfy the system rules.
Steps S203 to S205 are back-end operations. As presented in the foreground, the corresponding implementation is as follows:
System input: keywords.
System rules: for example, related web pages were accessed more than 3 times within one week; or the websites are limited to a certain category, such as novels.
System output: the group of users who recently accessed web pages related to the keywords.
A concrete example of this process is as follows:
Input keyword: BMW x5;
Output result: the list of users who prefer the information corresponding to the keyword.
The internal processing flow of the system is as follows:
Step A: after the keyword is received, locate all URLs corresponding to the keyword in the web page inverted index information generated in step S201.
Step B: according to a preset screening rule, narrow the URL range appropriately, for example by taking the top 100 or top 1000 as the URL list of related content.
Step C: look up each URL in the list determined in step B in the URL-user inverted index information generated in step S202.
Step D: screen according to the system rules, such as access count and time period.
Step E: determine the user list remaining after screening as the required user group.
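Steps A to E can be sketched end to end in Python as follows. The sketch assumes both inverted indexes are in-memory dictionaries, that "top N" means the N URLs with the highest word frequency, and that a user's visits are summed across all target URLs before the rule "more than 3 times" is applied; none of these details is fixed by the source. The function name `find_preference_group` and the sample data are illustrative.

```python
def find_preference_group(keyword, keyword_index, url_index, top_n=100, min_visits=3):
    """Return the sorted list of users who visited keyword-related URLs often enough."""
    # Step A: locate all URLs for the keyword in the keyword inverted index
    postings = keyword_index.get(keyword, [])
    # Step B: narrow the range to the top_n URLs by word frequency (assumed ranking)
    ranked = sorted(postings, key=lambda p: p[1], reverse=True)[:top_n]
    urls = [url for url, _ in ranked]
    # Step C: look up each URL in the URL-user inverted index
    visits = {}
    for url in urls:
        for user, count in url_index.get(url, {}).items():
            visits[user] = visits.get(user, 0) + count
    # Steps D and E: apply the system rule and return the remaining users
    return sorted(user for user, n in visits.items() if n > min_visits)


keyword_index = {"BMW x5": [("u1", 7), ("u2", 5), ("u3", 1)]}
url_index = {
    "u1": {"User A": 3, "User B": 1},
    "u2": {"User A": 2, "User C": 4},
    "u3": {"User B": 9},
}
group = find_preference_group("BMW x5", keyword_index, url_index, top_n=2)
```

With `top_n=2` the low-frequency URL `u3` is dropped in step B, so User B's visits to it never enter the count.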
It should further be noted that, as a basic function, extracting the user group directly by keyword as described above can satisfy the vast majority of requirements. To further optimize the extraction of service-oriented, personalized user groups, the following optimized screening can be performed.
For example:
The keyword "I Am a Singer" is a label required by a service characteristic; it can actually correspond to a series of keywords such as "Hunan Satellite TV" & "I Am a Singer" & "10 p.m. on Saturday".
As another example:
"Wuxia (martial arts)" is a label required by a service characteristic; it can actually correspond to the keyword "wuxia", with the website category defined as novels in the system rules.
With this mapping definition approach, the configuration rules in the system can be set flexibly and in a service-oriented way, which is more convenient.
It is convenient.
Compared with prior art, the technical solution that the embodiment of the present invention is proposed has the following advantages:
By the technical solution proposed using the embodiment of the present invention, it needs to be determined that during online preferences user group, root
Corresponding target URL, and the row of falling with reference to corresponding to target URL are determined according to the keyword corresponding to online preferences user group
Index information determines to meet the access times of the target URL user group corresponding to each user identifier of user's screening conditions
Into the online preferences user group, so as to make full use of inverted index information high-performance, the feature of high flexibility ratio, realize online
The quick obtaining of preferences user group avoids the consumption of system resource caused by mass data record and matching, improves
The treatment effeciency for preferences user group determination process of surfing the Internet and screening accuracy.
Further, to implement the above technical solution, the embodiments of the present invention also provide a network device, whose structure is shown schematically in Fig. 3 and which specifically includes:
A generation module 31, configured to traverse the user Internet-access log records to be analyzed and to generate, for each URL contained in those records, the corresponding inverted index information, where the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each such user identifier for the URL;
A URL screening module 32, configured to select, when an Internet-access preference user group needs to be determined, one or more keywords corresponding to that user group, and to determine the corresponding target URL according to the selected keywords;
A user screening module 33, configured to determine, according to the inverted index information generated by the generation module 31 that corresponds to the target URL determined by the URL screening module 32, that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening conditions form the Internet-access preference user group.
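The work of the generation module 31 can be pictured as a single pass over the log that groups accesses by URL. The sketch below is a minimal illustration under stated assumptions: each log record is reduced to a hypothetical (user_id, url) pair, and the access count is the only access characteristic recorded.

```python
from collections import defaultdict


def build_url_user_index(log_records):
    """Build url -> {user_id: access count} from (user_id, url) pairs.

    Each pair represents one access; the record layout is an assumption
    made for this sketch.
    """
    index = defaultdict(lambda: defaultdict(int))
    for user_id, url in log_records:
        index[url][user_id] += 1
    # Convert to plain dicts so lookups of unknown URLs do not create entries.
    return {url: dict(users) for url, users in index.items()}
```

For example, the records [("u1", "http://a"), ("u2", "http://a"), ("u1", "http://a")] yield {"http://a": {"u1": 2, "u2": 1}}, so the users of a target URL can later be read with one lookup.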
Preferably, the URL screening module 32 is specifically configured to:
According to the inverted index information corresponding to a selected keyword, determine that a URL for which the number of occurrences of the keyword meets a first URL screening condition is a target URL corresponding to the keyword, where the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each such web page;
Or,
According to the web-page search results for the selected keyword in a search engine, determine that the URL of a web page meeting a second URL screening condition is a target URL corresponding to the keyword.
Preferably, the URL screening module 32 is further configured to:
Screen the determined target URLs according to the service feature information corresponding to the selected keyword.
Preferably, the generation module 31 is further configured to:
According to the needs of different analysis periods, generate for the same URL the inverted index information corresponding to different time intervals, each carrying a different timestamp.
Preferably, the user screening module 33 is specifically configured to:
According to the inverted index information generated by the generation module 31 that corresponds to the target URL determined by the URL screening module 32, together with the timestamp information it carries, determine that the users corresponding to the user identifiers whose access counts and access periods for the target URL meet the user screening conditions form the Internet-access preference user group.
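The timestamped variant can be sketched by keeping one sub-index per time interval for each URL and then screening on both total visits and the number of intervals in which a user was active. This is an illustrative sketch only: the period labels, the record layout (user_id, url, period), and the two thresholds are assumptions, not details given in the patent.

```python
from collections import defaultdict


def build_periodic_index(log_records):
    """Build url -> period -> {user_id: access count}.

    log_records is an iterable of (user_id, url, period) tuples, where
    period is a timestamp label such as "2013-12-30" (assumed format).
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for user_id, url, period in log_records:
        index[url][period][user_id] += 1
    return index


def screen_users(index, target_urls, periods, min_visits, min_active_periods):
    """Return sorted user IDs meeting both the visit and the period thresholds."""
    total = defaultdict(int)   # user_id -> visits over the chosen periods
    active = defaultdict(set)  # user_id -> periods with at least one visit
    for url in target_urls:
        by_period = index.get(url, {})
        for period in periods:
            for user_id, count in by_period.get(period, {}).items():
                total[user_id] += count
                active[user_id].add(period)
    return sorted(
        u for u in total
        if total[u] >= min_visits and len(active[u]) >= min_active_periods
    )
```

Storing one sub-index per interval means a different analysis period can be served by choosing which timestamps to read, without regenerating the index.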
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the present invention.
Those skilled in the art will understand that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple submodules.
The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
What is disclosed above is only several specific embodiments of the present invention; however, the present invention is not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.
Claims (8)
- 1. A method for determining an Internet-access preference user group, characterized by comprising: traversing the user Internet-access log records to be analyzed, and generating, for each URL contained in the user Internet-access log records, the corresponding inverted index information, wherein the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each such user identifier for the URL; and, according to the needs of different analysis periods, generating for the same URL the inverted index information corresponding to different time intervals, each carrying a different timestamp; when an Internet-access preference user group needs to be determined, selecting one or more keywords corresponding to the Internet-access preference user group, and determining the corresponding target URL according to the selected keywords; and, according to the inverted index information corresponding to the determined target URL, determining that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening conditions form the Internet-access preference user group.
- 2. The method according to claim 1, characterized in that selecting, when an Internet-access preference user group needs to be determined, one or more keywords corresponding to the Internet-access preference user group, and determining the corresponding target URL according to the selected keywords, specifically comprises: according to the inverted index information corresponding to a selected keyword, determining that a URL for which the number of occurrences of the keyword meets a first URL screening condition is a target URL corresponding to the keyword, wherein the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each such web page; or, according to the web-page search results for the selected keyword in a search engine, determining that the URL of a web page meeting a second URL screening condition is a target URL corresponding to the keyword.
- 3. The method according to claim 2, characterized in that selecting, when an Internet-access preference user group needs to be determined, one or more keywords corresponding to the Internet-access preference user group, and determining the corresponding target URL according to the selected keywords, further comprises: screening the determined target URLs according to the service feature information corresponding to the selected keyword.
- 4. The method according to claim 1, characterized in that determining, according to the inverted index information corresponding to the determined target URL, that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening conditions form the Internet-access preference user group specifically comprises: according to the inverted index information corresponding to the determined target URL and the timestamp information it carries, determining that the users corresponding to the user identifiers whose access counts and access periods for the target URL meet the user screening conditions form the Internet-access preference user group.
- 5. A network device, characterized by comprising: a generation module, configured to traverse the user Internet-access log records to be analyzed and to generate, for each URL contained in the user Internet-access log records, the corresponding inverted index information, wherein the inverted index information corresponding to a URL specifically includes the user identifiers that accessed the URL and the access characteristic information of each such user identifier for the URL; the generation module being further configured to generate, according to the needs of different analysis periods, for the same URL the inverted index information corresponding to different time intervals, each carrying a different timestamp; a URL screening module, configured to select, when an Internet-access preference user group needs to be determined, one or more keywords corresponding to the Internet-access preference user group, and to determine the corresponding target URL according to the selected keywords; and a user screening module, configured to determine, according to the inverted index information generated by the generation module that corresponds to the target URL determined by the URL screening module, that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening conditions form the Internet-access preference user group.
- 6. The network device according to claim 5, characterized in that the URL screening module is specifically configured to: according to the inverted index information corresponding to a selected keyword, determine that a URL for which the number of occurrences of the keyword meets a first URL screening condition is a target URL corresponding to the keyword, wherein the inverted index information corresponding to a keyword specifically includes the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each such web page; or, according to the web-page search results for the selected keyword in a search engine, determine that the URL of a web page meeting a second URL screening condition is a target URL corresponding to the keyword.
- 7. The network device according to claim 6, characterized in that the URL screening module is further configured to: screen the determined target URLs according to the service feature information corresponding to the selected keyword.
- 8. The network device according to claim 5, characterized in that the user screening module is specifically configured to: according to the inverted index information generated by the generation module that corresponds to the target URL determined by the URL screening module, together with the timestamp information it carries, determine that the users corresponding to the user identifiers whose access counts and access periods for the target URL meet the user screening conditions form the Internet-access preference user group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310752439.5A CN104750752B (en) | 2013-12-31 | 2013-12-31 | A kind of determining method and apparatus for the preferences user group that surfs the Internet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310752439.5A CN104750752B (en) | 2013-12-31 | 2013-12-31 | A kind of determining method and apparatus for the preferences user group that surfs the Internet |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104750752A CN104750752A (en) | 2015-07-01 |
CN104750752B true CN104750752B (en) | 2018-06-15 |
Family
ID=53590447
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310752439.5A Active CN104750752B (en) | 2013-12-31 | 2013-12-31 | A kind of determining method and apparatus for the preferences user group that surfs the Internet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104750752B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |