CN110020364A - Method and apparatus for determining the traffic source of a page access - Google Patents
- Publication number
- CN110020364A (application CN201711205737.7A)
- Authority
- CN
- China
- Prior art keywords
- record
- characteristic
- page access
- click
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and apparatus for determining the traffic source of a page access, and relates to the field of computer technology. One specific embodiment of the method includes: obtaining one or more page access records in a first time period and one or more ad click records in a second time period; extracting the features of each page access record in the first time period and of each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and obtaining a feature data set composed of all the feature data; and, in the feature data set, judging the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record. The embodiment is efficient and fast, and can accurately deduplicate the traffic log.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for determining the traffic source of a page access.
Background
Website traffic refers to the volume of visits to a website, described by indicators such as the number of users who visit the website and the number of pages they browse. To increase website traffic, e-commerce platforms usually acquire traffic through multiple channels: paid channels such as display advertising and search advertising, or free channels such as the platform's own display floors and marketing campaigns. To ensure that traffic converts well while website traffic grows, the supplier needs to evaluate each channel's overall traffic contribution and the quality of each channel's traffic. This requires a comprehensive, unified traffic channel system that reasonably divides and counts the traffic and conversions brought by each channel, and its prerequisite is analyzing the source of the traffic that accesses the website's pages.
The existing method of analyzing the source of page access traffic runs a join query over the traffic log and the ad click log in a data warehouse, and then determines the traffic source from the query result.
In the course of realizing the present invention, the inventor found at least the following problem in the prior art: when processing billions of traffic log entries and hundreds of millions of ad click log entries, the prior-art method is slow and inefficient.
Therefore, a method and apparatus that determine the traffic source of website page accesses efficiently and quickly are needed.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for determining the traffic source of a website page access that can process a large amount of log data efficiently and quickly.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for determining the traffic source of a page access is provided, comprising:
obtaining one or more page access records in a first time period and one or more ad click records in a second time period;
extracting the features of each page access record in the first time period and of each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and obtaining a feature data set composed of all the feature data;
in the feature data set, judging the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
Further, the features of a page access record include an access time and an access device number; the features of an ad click record include a click time and a click device number;
before the step of judging the traffic source type of the page access record, the feature data set is sorted to obtain one or more partitions of the set, wherein each partition contains the one or more feature data items that share a device number, and the feature data in each partition are sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the sorted feature data set, precedes the feature data of that page access record and is nearest to it.
Further, the step of judging the traffic source type of the page access record includes:
performing the following traffic source judgment process on each feature data item in the feature data set:
Free traffic judgment step: if the current feature data is the feature data of a page access record, judge whether its device number is the same as that of the feature data of its most recent ad click record; if not, determine that the traffic source type of the page access record is free traffic;
Paid traffic judgment step: if the feature data of the current page access record has the same device number as the feature data of its most recent ad click record, judge whether the click time of that ad click record falls within a preset duration before the access time of the current page access record; if so, determine that the traffic source type of the page access record is paid traffic.
Further, the paid traffic judgment step also includes:
judging whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
In the method for determining the traffic source of a page access provided by the embodiments of the present invention, before the step of extracting the features of each page access record in the first time period and of each ad click record in the second time period, the method further includes: adding a session identifier to each page access record in the first time period according to the access time and access device number of the record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
whether a page access record is the first record in its session is judged according to the session identifier in the feature data of the page access record.
Further, the traffic source judgment process is performed on each feature data item in the feature data set sequentially, and further includes:
defining a first variable and a second variable before the free traffic judgment step, the initial state of both variables being empty;
performing a selection step before the free traffic judgment step: if the current feature data is the feature data of an ad click record, replace the first variable with this feature data and clear the second variable;
in the free traffic judgment step and in the paid traffic judgment step, using the first variable to represent the feature data of the most recent ad click record of the current feature data;
in the paid traffic judgment step, judging whether the page access record is the first record in its session includes: judging whether the session identifier of the current feature data equals the second variable and whether the second variable is empty; if the session identifier of the current feature data does not equal the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session;
and when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its most recent ad click record falls within the first duration before the access time of its feature data, replacing the second variable with the session identifier of this feature data.
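The sequential pass with two variables described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Feature` type, the `classify` function, the `"free"`/`"paid"` labels and the 15-minute and 24-hour durations are all assumptions made for the example, and a page access that shares a device with a click but falls outside the window is assumed to count as free traffic.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical feature record; field names are illustrative."""
    device: str
    time: datetime
    kind: str                      # "click" or "visit"
    session: Optional[str] = None  # session identifier, visits only


def classify(sorted_features, first=timedelta(minutes=15),
             second=timedelta(hours=24)):
    """Single pass over features sorted by (device, time)."""
    last_click = None    # the "first variable": most recent ad click
    paid_session = None  # the "second variable": session already attributed
    results = []
    for f in sorted_features:
        if f.kind == "click":
            # Selection step: remember the click, clear the session marker.
            last_click = f
            paid_session = None
            continue
        # Free traffic judgment: no click seen, or a different device.
        if last_click is None or last_click.device != f.device:
            results.append((f, "free", None))
            continue
        # Paid traffic judgment: choose the window by position in session.
        is_first = paid_session is None or f.session != paid_session
        window = first if is_first else second
        if timedelta(0) <= f.time - last_click.time <= window:
            if is_first:
                paid_session = f.session
            results.append((f, "paid", last_click))
        else:
            # Assumption: outside the window counts as free traffic.
            results.append((f, "free", None))
    return results
```

Because the input is already sorted by device number and then by time, a single scan with constant extra state is enough, and no join between the two logs is needed.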
Optionally, the feature data is a four-tuple: the four-tuple of a page access record is <access device number, access time, page access record, empty>, and the four-tuple of an ad click record is <click device number, click time, empty, ad click record>;
the method further includes: after determining that the traffic source type of a page access record is free traffic, outputting the two-tuple <traffic log, empty>; after determining that the traffic source type of a page access record is paid traffic, outputting the two-tuple <traffic log, most recent ad click record>.
Further, the step of judging the traffic source type of the page access record includes:
dividing the feature data set into blocks, wherein each block contains one or more partitions;
performing the traffic source judgment process on each feature data item in the feature data set includes performing the traffic source judgment process on each feature data item in each block.
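One way to form blocks that each hold whole partitions is to route every feature data item by a hash of its device number. The sketch below is only an assumption about how such blocking could be done; the function name `assign_blocks`, the use of CRC32, and the tuple layout (device number first, time second, with times as sortable strings) are illustrative and not taken from the source.

```python
import zlib
from collections import defaultdict


def assign_blocks(features, n_blocks):
    """Group feature tuples into blocks so that all items sharing a
    device number (element 0) end up in the same block."""
    blocks = defaultdict(list)
    for f in features:
        # CRC32 is deterministic across runs, unlike the built-in hash().
        block_id = zlib.crc32(f[0].encode("utf-8")) % n_blocks
        blocks[block_id].append(f)
    for items in blocks.values():
        # Within a block, order by device number, then by time.
        items.sort(key=lambda f: (f[0], f[1]))
    return dict(blocks)
```

Since a partition never straddles two blocks, each block can be scanned independently, for example by separate workers.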
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for determining the traffic source of a page access is provided, comprising:
a record obtaining module, configured to obtain one or more page access records in a first time period and one or more ad click records in a second time period;
a feature extraction module, configured to extract the features of each page access record in the first time period and of each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and to obtain a feature data set composed of all the feature data;
a judgment module, configured to judge, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
Further, the features of a page access record extracted by the feature extraction module include an access time and an access device number, and the features of an ad click record extracted by the feature extraction module include a click time and a click device number;
the apparatus further includes a sorting module, configured to sort the feature data set before the step of judging the traffic source type of the page access record in the feature data set, to obtain one or more partitions of the set, wherein each partition contains the one or more feature data items that share a device number, and the feature data in each partition are sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the sorted feature data set, precedes the feature data of that page access record and is nearest to it.
Further, the judgment module is further configured to perform the following traffic source judgment process on each feature data item in the feature data set:
Free traffic judgment step: if the current feature data is the feature data of a page access record, judge whether its device number is the same as that of the feature data of its most recent ad click record; if not, determine that the traffic source type of the page access record is free traffic;
Paid traffic judgment step: if the feature data of the current page access record has the same device number as the feature data of its most recent ad click record, judge whether the click time of that ad click record falls within a preset duration before the access time of the current page access record; if so, determine that the traffic source type of the page access record is paid traffic.
Further, the judgment module is further configured to judge whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
The apparatus for determining the traffic source of a page access provided by the embodiments of the present invention further includes a sessionization module, configured to add, before the step of extracting the features of each page access record in the first time period and of each ad click record in the second time period, a session identifier to each page access record in the first time period according to the access time and access device number of the record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
the judgment module is further configured to judge, according to the session identifier in the feature data of a page access record, whether the page access record is the first record in its session.
Further, the judgment module is further configured to perform the traffic source judgment process sequentially on each feature data item in the feature data set;
the traffic source judgment process performed by the judgment module further includes: defining a first variable and a second variable before the free traffic judgment step, the initial state of both variables being empty; and performing a selection step before the free traffic judgment step: if the current feature data is the feature data of an ad click record, replace the first variable with this feature data and clear the second variable;
the judgment module is further configured to use the first variable to represent the feature data of the most recent ad click record of the current feature data;
the judgment module is further configured to judge whether the session identifier of the current feature data equals the second variable and whether the second variable is empty; if the session identifier of the current feature data does not equal the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session; and when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its most recent ad click record falls within the first duration before the access time of its feature data, the second variable is replaced with the session identifier of this feature data.
Further, the feature data is a four-tuple: the four-tuple of a page access record is <access device number, access time, page access record, empty>, and the four-tuple of an ad click record is <click device number, click time, empty, ad click record>;
the apparatus for determining the traffic source of a page access provided by the embodiments of the present invention further includes an output module, configured to output the two-tuple <traffic log, empty> after the traffic source type of a page access record is determined to be free traffic, and to output the two-tuple <traffic log, most recent ad click record> after the traffic source type of a page access record is determined to be paid traffic.
To achieve the above object, according to another aspect of the embodiments of the present invention, an electronic device for determining the traffic source of a page access is provided, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the traffic source of a page access provided by the embodiments of the present invention.
To achieve the above object, according to another aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when the program is executed by a processor, it implements the method for determining the traffic source of a page access provided by the embodiments of the present invention.
The method and apparatus for determining the traffic source of a page access provided by the present invention convert the page access records and ad click records into feature data and then sort the feature data set composed of both. In the sorted feature data set, the features of the most recent ad click record corresponding to each page access record are found from the features of that page access record, and the traffic source type of the page access is judged from the relationship between that most recent ad click record and the page access record. Compared with the prior art, which runs a join query over the traffic log and the ad click log and then analyzes the traffic source from the query result, the analysis method of the present invention is efficient and fast, and can accurately deduplicate the traffic log.
Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for determining the traffic source of a page access provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the application flow of the method for determining the traffic source of a page access provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the apparatus for determining the traffic source of a page access provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
An embodiment of the present invention provides a method for determining the traffic source of a page access. In the present invention, one request from a client browser for the content of an HTML web page is regarded as one page access, or PV (page view). The traffic source of a page access is the access channel through which the user arrived at that page access, possibly after several redirects along the way.
In the present invention, the traffic source of a page access has two types. One is free traffic: the page access arrives through an access channel that incurs no charge for the website's operator or for a party that uses the website platform to run a particular business. For example, on an e-commerce website, if a user reaches a particular product detail page through the site's home page or the site's search engine, the traffic source type of that access to the product detail page is free traffic. The other is paid traffic, the opposite of free traffic: the page access arrives through an access channel that incurs a charge. For example, if a user finally reaches a product detail page through a pay-per-click advertisement, the traffic source type of that access to the product detail page is paid traffic.
As shown in Fig. 1, the method for determining the traffic source of a page access provided by an embodiment of the present invention includes step S101, step S102 and step S103; through these steps the method determines whether the traffic source type of a page access is free traffic or paid traffic.
In step S101, one or more page access records in a first time period and one or more ad click records in a second time period are obtained. The page access records can be obtained from the traffic log, and the ad click records can be obtained from the ad click log. Because a page access record must be associated with ad click records that precede it in time when its traffic source type is determined in the subsequent steps, in the present invention the second time period includes not only the first time period but may also include a span of time before it. For example, in this step the full traffic log of a given day is obtained, together with the full ad click logs of that day and of the previous day.
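The relationship between the two time periods in this example can be sketched as below. The function name `log_windows` and the `lookback_days` parameter are hypothetical; only the one-day look-back for the ad click log comes from the example above.

```python
from datetime import date, datetime, time, timedelta


def log_windows(day: date, lookback_days: int = 1):
    """Return half-open [start, end) windows: the first for the traffic
    log, the second (extended backwards) for the ad click log."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    first_period = (start, end)
    second_period = (start - timedelta(days=lookback_days), end)
    return first_period, second_period
```

The second window always contains the first, so a page access early in the day can still be matched to a click from the previous evening.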
In step S102, the features of each page access record in the first time period and of each ad click record in the second time period are extracted, to obtain the feature data of the page access records and the feature data of the ad click records, and a feature data set composed of all the feature data is obtained. Then, in step S103, the traffic source type of each page access record is judged, in the feature data set, according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
The method for determining the traffic source of a page access provided by the present invention builds a feature data set from the feature data of the page access records in the first time period and the feature data of the ad click records in the second time period, uses the features of those records to learn the correspondence between page access records and ad click records, and thereby judges the traffic source type of each page access record. Compared with the prior art, which runs a join query over the traffic log and the ad click log and then analyzes the traffic source from the query result, the analysis method of the present invention is efficient and fast.
In step S102 of the present invention, when the features of the page access records and ad click records are extracted, the features of a page access record include the access time and the access device number, i.e. the time at which the user accessed the page and the number of the device used. The features of an ad click record include the click time and the click device number, i.e. the time at which the user clicked the advertisement and the number of the device used.
The method for determining the traffic source of a page access provided by the present invention further includes: before the step of judging the traffic source type of the page access records in the feature data set, i.e. after step S102 and before step S103, sorting the feature data set to obtain one or more partitions of the set. Each partition contains the one or more feature data items that share a device number, and the feature data in each partition are sorted by time from earliest to latest. In other words, the sort keys are the device number and the time: the set is first partitioned by device number, so that feature data with the same device number fall in the same partition, and each partition is then sorted by time in ascending order, with earlier feature data coming first.
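The partition-then-sort step can be sketched in a few lines. This is a minimal in-memory illustration under assumptions: feature data are tuples whose first element is the device number and whose second element is a time that sorts correctly (for example a `YYYY-MM-DD HH:MM` string), and the function name `partition_and_sort` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def partition_and_sort(features):
    """Sort by (device number, time), then group by device number."""
    ordered = sorted(features, key=itemgetter(0, 1))
    return {device: list(group)
            for device, group in groupby(ordered, key=itemgetter(0))}
```

For a data set of the size the background section describes, the same two sort keys would more plausibly be handed to a distributed sort; the in-memory version only shows the ordering that results.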
In step S103, the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the sorted feature data set, precedes the feature data of that page access record and is nearest to it. Because the feature data set has already been sorted, the features of the corresponding most recent ad click record can be found from the features of the page access record; and because that ad click record precedes the page access record in the ordering and is the nearest one, the traffic source type of the page access can be judged from the relationship between the ad click record and the page access record.
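Finding the most recent preceding ad click in a sorted partition needs only one scan that remembers the last click seen. The sketch below assumes four-element tuples of the form (device number, time, page access record or `None`, ad click record or `None`) and a single-device partition already sorted by time; the function name `latest_clicks` is hypothetical.

```python
def latest_clicks(partition):
    """Pair each page access in a time-sorted, single-device partition
    with the nearest ad click that precedes it (None if there is none)."""
    last_click = None
    pairs = []
    for _device, _time, visit, click in partition:
        if click is not None:
            last_click = click  # a newer click supersedes older ones
        else:
            pairs.append((visit, last_click))
    return pairs
```

A pairing produced this way still has to pass the time-window check before the page access counts as paid traffic.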
In the present invention, judging the traffic source type of the page access records in the feature data set in step S103 includes performing the following traffic source judgment process on each feature data item in the feature data set:
Free traffic judgment step: if the current feature data is the feature data of a page access record, judge whether its device number is the same as that of the feature data of its most recent ad click record; if not, determine that the traffic source type of the page access record is free traffic.
Paid traffic judgment step: if the feature data of the current page access record has the same device number as the feature data of its most recent ad click record, judge whether the click time of that ad click record falls within a preset duration before the access time of the current page access record; if so, determine that the traffic source type of the page access record is paid traffic.
That is, for a page access record to be judged as paid traffic, the first condition is that the feature data of the page access record has the same device number as the feature data of its most recent ad click record, which indicates that the same user clicked an advertisement before accessing the page. When this condition is met, it must also be confirmed that the time of that most recent ad click record falls within the preset duration before the page access record: if the user's page access comes too long after the user's most recent ad click, the traffic source type of the page access is not regarded as paid traffic.
In the present invention, the paid traffic judgment step further includes: judging whether the current page access record is the first record in its session (i.e. the page access record with the earliest time in that session); if so, the preset duration is the first duration; otherwise, the preset duration is the second duration, and the second duration is greater than the first duration. That is, for page access records at different positions in the same session, different criteria are used when judging whether the time separating them from the corresponding most recent ad click meets the requirement. Page accesses at different positions in one session may all originate from the same ad click; that ad click can be regarded as influencing every page access in the session, with the same weight for each, yet the page access records in one session may be separated by a fairly long time. Therefore, the first duration is used as the criterion for the interval between the first page access record in a session and the most recent ad click record, and the second duration is used for the other page access records that follow it. The first duration and the second duration can be set according to specific needs.
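The choice between the two durations can be written as a small predicate. The function name `within_attribution_window` is hypothetical, and the default values of 15 minutes and 24 hours are taken from the specific embodiment described later in this document; mapping them to the first and second durations respectively is an assumption based on the second duration being the longer one.

```python
from datetime import datetime, timedelta


def within_attribution_window(access_time, click_time, first_in_session,
                              first=timedelta(minutes=15),
                              second=timedelta(hours=24)):
    """True if the click falls within the applicable duration before
    the page access; the duration depends on the session position."""
    window = first if first_in_session else second
    elapsed = access_time - click_time
    return timedelta(0) <= elapsed <= window
```

With these defaults, an access 40 minutes after a click fails the check as the first record of a session but passes it as a later record.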
The method for determining the traffic source of a page access provided by the present invention further includes, before the features of each page access record in the first time period and of each ad click record in the second time period are extracted, i.e. after step S101 and before step S102:
adding a session identifier to each page access record in the first time period according to the access time and access device number of the record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier. The third duration is the duration used to split sessions and can be set according to specific needs; by means of this duration, the page access records with the same access device number are divided into sessions.
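Session splitting can be sketched as follows. The text does not say whether the third duration bounds the whole session or the gap between consecutive accesses; this sketch assumes the gap interpretation, and the 30-minute value, the function name `add_session_ids` and the identifier format are all hypothetical.

```python
from datetime import datetime, timedelta


def add_session_ids(visits, gap=timedelta(minutes=30)):
    """visits: iterable of (device number, access time) pairs.
    Returns (device number, access time, session id) triples, starting
    a new session when the gap to the previous access of the same
    device exceeds `gap`."""
    out = []
    prev_device, prev_time, counter = None, None, 0
    for device, ts in sorted(visits):
        if device != prev_device:
            counter = 0          # first session of a new device
        elif ts - prev_time > gap:
            counter += 1         # gap too long: open the next session
        out.append((device, ts, f"{device}-{counter}"))
        prev_device, prev_time = device, ts
    return out
```

The identifier only needs to be equal for records of the same session and different otherwise, since the later judgment compares session identifiers for equality.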
In the paid traffic judgment step of step S103, whether a page access record is the first record in its session is judged according to the session identifier in the feature data of the page access record.
The method for determining the traffic source of a page access provided by the present invention is further described below with reference to a specific embodiment.
In this embodiment, in step S101, the full traffic log of a given day is obtained, together with the full ad click logs of that day and of the previous day. In the subsequent judgment process of this embodiment, whether a page access record can be associated with an ad click record from the preceding 15 minutes or 24 hours serves as the condition for judging whether the page access record is paid traffic. The ad click log obtained in this step therefore covers one more day than the corresponding traffic log.
The device number and access time of every page access record in the traffic log are extracted as the features of the page access records, and the device number and click time of every ad click record in the ad click log are extracted as the features of the ad click records. In this embodiment the feature data is a four-tuple: in step S102, all page access records and ad click records are each converted into the joint four-tuple form <device number, record time, page access record, ad click record>.
The four-tuple of a page access record is <access device number, access time, page access record, empty>. The page access record itself also serves as a feature element and carries the session identifier; the fourth element of the four-tuple is empty. For example, suppose the four-tuples of all the page access records obtained are:
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>;
<name2,2016-06-19 11:00, visitlog3, NULL>;
<name3,2016-06-19 12:00, visitlog4, NULL>.
Here, name denotes the access device number and is followed by the access time; visitlog denotes the page access record, and NULL denotes an empty element.
The four-tuple of an ad click record is <click device number, click time, empty, ad click record>. The ad click record itself also serves as a feature element. Mirroring the four-tuple of a page access record, the third element of the four-tuple of an ad click record is empty, so that the two kinds of four-tuple share one format and their data can be merged into a unified structure.
Continuing the example, suppose the four-tuples of all the ad click records obtained are:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name3,2016-06-19 10:00, NULL, adlog2>;
<name4,2016-06-20 10:00, NULL, adlog3>;
<name5,2016-06-21 10:00, NULL, adlog4>.
Here, name denotes the click device number and is followed by the click time; NULL denotes an empty element, and adlog denotes the ad click record.
As shown in Fig. 2, the four-tuples of all the page access records above and the four-tuples of the ad click records are merged to obtain the four-tuple set:
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>;
<name2,2016-06-19 11:00, visitlog3, NULL>;
<name3,2016-06-19 12:00, visitlog4, NULL>;
<name1,2016-06-18 10:00, NULL, adlog1>;
<name3,2016-06-19 10:00, NULL, adlog2>;
<name4,2016-06-20 10:00, NULL, adlog3>;
<name5,2016-06-21 10:00, NULL, adlog4>.
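The conversion and merge above can be sketched in Python as follows. Plain tuples stand in for the four-tuples and `None` stands in for NULL; the helper names `visit_tuple` and `click_tuple` are hypothetical.

```python
def visit_tuple(device, access_time, visitlog):
    # <access device number, access time, page access record, empty>
    return (device, access_time, visitlog, None)

def click_tuple(device, click_time, adlog):
    # <click device number, click time, empty, ad click record>
    return (device, click_time, None, adlog)

visits = [
    visit_tuple("name1", "2016-06-18 10:10", "visitlog1"),
    visit_tuple("name1", "2016-06-18 10:20", "visitlog2"),
    visit_tuple("name2", "2016-06-19 11:00", "visitlog3"),
    visit_tuple("name3", "2016-06-19 12:00", "visitlog4"),
]
clicks = [
    click_tuple("name1", "2016-06-18 10:00", "adlog1"),
    click_tuple("name3", "2016-06-19 10:00", "adlog2"),
    click_tuple("name4", "2016-06-20 10:00", "adlog3"),
    click_tuple("name5", "2016-06-21 10:00", "adlog4"),
]

# Both kinds of record share one shape, so merging is a plain concatenation.
four_tuples = visits + clicks
for t in four_tuples:
    print(t)
```

Because exactly one of the last two elements is empty in every tuple, later steps can tell the two record kinds apart without a separate type field.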
The four-tuple set is partitioned and sorted so that each partition contains the one or more feature data items that share a device number, and the feature data within each partition are sorted by time from earliest to latest. Continuing the example, the four-tuple set above yields 5 partitions:
Partition 1:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>.
Partition 2:
<name2,2016-06-19 11:00, visitlog3, NULL>.
Partition 3:
<name3,2016-06-19 10:00, NULL, adlog2>;
<name3,2016-06-19 12:00, visitlog4, NULL>.
Partition 4:
<name4,2016-06-20 10:00, NULL, adlog3>.
Partition 5:
<name5,2016-06-21 10:00, NULL, adlog4>.
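The grouping by device number and the sort by time can be sketched in Python as follows. The function name `partition_and_sort` is hypothetical, and the sketch assumes timestamps in the fixed "YYYY-MM-DD HH:MM" form, which sort chronologically as plain strings.

```python
from collections import defaultdict

def partition_and_sort(four_tuples):
    """Group four-tuples by device number, then sort each group by time."""
    groups = defaultdict(list)
    for t in four_tuples:
        groups[t[0]].append(t)
    return {device: sorted(items, key=lambda t: t[1])
            for device, items in sorted(groups.items())}

four_tuples = [
    ("name1", "2016-06-18 10:10", "visitlog1", None),
    ("name1", "2016-06-18 10:20", "visitlog2", None),
    ("name2", "2016-06-19 11:00", "visitlog3", None),
    ("name3", "2016-06-19 12:00", "visitlog4", None),
    ("name1", "2016-06-18 10:00", None, "adlog1"),
    ("name3", "2016-06-19 10:00", None, "adlog2"),
    ("name4", "2016-06-20 10:00", None, "adlog3"),
    ("name5", "2016-06-21 10:00", None, "adlog4"),
]
partitions = partition_and_sort(four_tuples)
for device, items in partitions.items():
    print(device, items)
```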
In this embodiment, the four-tuple set above is divided into blocks, each of which contains one or more partitions, giving:
Block 1:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>.
Block 2:
<name2,2016-06-19 11:00, visitlog3, NULL>.
Block 3:
<name3,2016-06-19 10:00, NULL, adlog2>;
<name3,2016-06-19 12:00, visitlog4, NULL>.
Block 4:
<name4,2016-06-20 10:00, NULL, adlog3>;
<name5,2016-06-21 10:00, NULL, adlog4>.
A block may contain multiple partitions, and the blocking method can be chosen according to specific needs; for example, blocks can be formed by initial letter, which gives 26 blocks in total. In practical applications the feature data set often contains a very large number of feature data items. Once the feature data set has been divided into blocks as in this embodiment, the subsequent traffic source judgment process can run in parallel across the blocks, which speeds up the analysis method of the invention.
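The blocking and parallel processing can be sketched in Python as follows. This is an illustration only: the device names, the function names `make_blocks` and `count_tuples`, and the choice of a thread pool are assumptions, and `count_tuples` is a stand-in for the real per-block judgment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def make_blocks(partitions, key=lambda device: device[0].lower()):
    """Assign whole partitions to blocks; here the key is the device number's initial."""
    blocks = defaultdict(dict)
    for device, tuples in partitions.items():
        blocks[key(device)][device] = tuples
    return dict(blocks)

def count_tuples(block):
    # Stand-in for the per-block traffic source judgment.
    return sum(len(tuples) for tuples in block.values())

# Hypothetical partitions keyed by device number.
partitions = {
    "alpha1": [("alpha1", "2016-06-18 10:00", None, "adlog1"),
               ("alpha1", "2016-06-18 10:10", "visitlog1", None)],
    "apex2": [("apex2", "2016-06-19 11:00", "visitlog3", None)],
    "bravo3": [("bravo3", "2016-06-20 10:00", None, "adlog3")],
}
blocks = make_blocks(partitions)
with ThreadPoolExecutor() as pool:
    # map() yields results in the order of its inputs, so zip() pairs them correctly.
    counts = dict(zip(blocks, pool.map(count_tuples, blocks.values())))
print(counts)
```

A partition is never split across blocks, so each block can be judged without looking at any other. For CPU-bound work at scale, a process pool or a distributed framework would be a more likely choice than threads.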
Of course, the partitioned and sorted four-tuple set need not be divided into blocks; the subsequent traffic source judgment process can be applied to the four-tuple set directly. The principle is the same whether the process is applied to the four-tuple set or to a block. In the remainder of this embodiment, the traffic source judgment process is described in detail as applied to a block.
In this embodiment, the traffic source judgment process is executed on each four-tuple in a block. A first variable and a second variable are defined, both initially empty, and the traffic source judgment process is executed on each four-tuple in the block in order; that is, all four-tuples in the block are traversed in sequence.
Take block 1 as an example:
Block 1:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>.
First, the step of selecting the feature data of the most recent ad click record is executed:
If the ad click record adlog in the current four-tuple is not empty, the current four-tuple is the feature data of an ad click record; the first variable is then replaced with this four-tuple and the second variable is cleared.
The first four-tuple in block 1, <name1,2016-06-18 10:00, NULL, adlog1>, satisfies this condition. The first variable is therefore replaced with the four-tuple <name1,2016-06-18 10:00, NULL, adlog1>, the second variable is cleared, and the traffic source judgment process for this four-tuple ends. The next four-tuple in block 1, <name1,2016-06-18 10:10, visitlog1, NULL>, is then taken in order as the current four-tuple, and the traffic source judgment process is repeated.
In the subsequent free traffic judgment step and paid traffic judgment step, the first variable represents the feature data of the most recent ad click record of the current feature data.
The step of selecting the feature data of the most recent ad click record is executed on the current four-tuple <name1,2016-06-18 10:10, visitlog1, NULL>; the four-tuple does not satisfy the condition, so the free traffic judgment step is executed on it: if the page access record visitlog of the current four-tuple is not empty, the four-tuple is the feature data of a page access record, and it is then judged whether its device number is the same as that of the four-tuple of its most recent ad click record. If the device numbers differ, the traffic source type of the page access record is determined to be free traffic. After the traffic source type of the page access record is determined to be free traffic, the two-tuple <traffic log, empty> is output.
The device number of the current four-tuple <name1,2016-06-18 10:10, visitlog1, NULL> is the same as that of the four-tuple of its most recent ad click record (the first variable), so the condition of the free traffic judgment step is not met and the following paid traffic judgment step is executed on it.
In the paid traffic judgment step, it is judged whether the page access record corresponding to the current four-tuple is the first record in its session. Specifically, it is judged whether the session identifier of the current four-tuple equals the second variable and whether the second variable is empty. If the session identifier of the current four-tuple does not equal the second variable, or the second variable is empty, the page access record corresponding to the four-tuple is the earliest record in its session; otherwise it is not the earliest record in its session.
If the current page access record is the first record in its session, it is judged whether the click time of the feature data of the most recent ad click record (the first variable) falls within a first duration before the access time of the current four-tuple. In this embodiment the first duration is 15 minutes. If it does, the traffic source type of the page access record is determined to be paid traffic, the two-tuple <traffic log, most recent ad click record> is output, and the second variable is replaced with the session identifier of this feature data.
If the current page access record is not the first record in its session, it is judged whether the click time of the feature data of the most recent ad click record (the first variable) falls within a second duration before the access time of the current four-tuple. In this embodiment the second duration is 24 hours. If it does, the traffic source type of the page access record is determined to be paid traffic, and the two-tuple <traffic log, most recent ad click record> is output.
In the paid traffic judgment step for the current four-tuple <name1,2016-06-18 10:10, visitlog1, NULL>, the second variable is empty, so the page access record corresponding to the four-tuple is determined to be the earliest record in its session. The click time of the feature data of the most recent ad click record (the first variable) falls within the 15 minutes before the access time of the current four-tuple, so the traffic source type of the corresponding page access record is determined to be paid traffic, the two-tuple <visitlog1, adlog1> is output, and the traffic source judgment process for the current four-tuple ends.
The next four-tuple in block 1, <name1,2016-06-18 10:20, visitlog2, NULL>, is then taken in order as the current four-tuple, and the traffic source judgment process is repeated. The current four-tuple satisfies neither the condition of the step of selecting the feature data of the most recent ad click record nor that of the free traffic judgment step, so the paid traffic judgment step is executed on it. At this point the second variable holds the session identifier of the four-tuple <name1,2016-06-18 10:10, visitlog1, NULL>, and it is judged whether the session identifier of the current four-tuple equals the second variable. Assuming the two are equal, the page access record corresponding to the current four-tuple is not the earliest record in its session. The click time of the feature data of the most recent ad click record (the first variable) falls within the 24 hours before the access time of the current four-tuple, so the traffic source type of the page access record corresponding to the current four-tuple is determined to be paid traffic and the two-tuple <visitlog2, adlog1> is output. This completes the traffic source judgment process for block 1.
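The walk-through above can be sketched as a single pass over a block in Python. This is a minimal sketch, not the patented implementation: the function name `judge_block` and the `session_of` lookup are hypothetical, window boundaries are treated as inclusive, and two cases the text leaves open are filled in as labeled assumptions (a page access with no preceding ad click, and one whose most recent click falls outside the window, are both emitted as free traffic).

```python
from datetime import datetime, timedelta

FIRST_DURATION = timedelta(minutes=15)   # first record in its session
SECOND_DURATION = timedelta(hours=24)    # later records in the session

def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

def judge_block(block, session_of):
    """block: four-tuples (device, time, visitlog, adlog), sorted by device then time.
    session_of: hypothetical lookup from a visitlog to its session identifier."""
    first_var = None    # four-tuple of the most recent ad click
    second_var = None   # session identifier of the last paid first-in-session access
    output = []
    for device, when, visitlog, adlog in block:
        if adlog is not None:
            # Selection step: remember this click and clear the second variable.
            first_var, second_var = (device, when, visitlog, adlog), None
            continue
        if visitlog is None:
            continue
        # Free traffic step (assumption: no click seen yet counts as "different device").
        if first_var is None or first_var[0] != device:
            output.append((visitlog, None))
            continue
        # Paid traffic step.
        session_id = session_of[visitlog]
        is_first = second_var is None or session_id != second_var
        window = FIRST_DURATION if is_first else SECOND_DURATION
        if parse(when) - parse(first_var[1]) <= window:
            output.append((visitlog, first_var[3]))
            if is_first:
                second_var = session_id
        else:
            # Assumption: the text does not state this case; treated here as free traffic.
            output.append((visitlog, None))
    return output

block1 = [
    ("name1", "2016-06-18 10:00", None, "adlog1"),
    ("name1", "2016-06-18 10:10", "visitlog1", None),
    ("name1", "2016-06-18 10:20", "visitlog2", None),
]
sessions = {"visitlog1": "s1", "visitlog2": "s1", "visitlog3": "s2"}  # hypothetical
print(judge_block(block1, sessions))
```

For block 1 this reproduces the two-tuples derived above, <visitlog1, adlog1> and <visitlog2, adlog1>. The single pass needs no join between the two logs, because the sorted order guarantees that the most recent ad click of a device has already been seen when one of its page accesses is reached.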
The two-tuple data output by the method of the invention distinguish the traffic source type of every page access in the traffic log. Since each user has at most one access log entry per second, the access time and access device number recorded in the two-tuples can be used to deduplicate the page accesses in the traffic log accurately.
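One way to read this deduplication is to keep the first record for each (access device number, access time) pair. The Python sketch below assumes second-resolution timestamps and a hypothetical record shape of (device number, access time, log).

```python
def dedupe(records):
    """Keep the first record seen for each (device number, access time) key."""
    seen = set()
    unique = []
    for device, when, log in records:
        key = (device, when)
        if key not in seen:
            seen.add(key)
            unique.append((device, when, log))
    return unique

records = [
    ("name1", "2016-06-18 10:10:00", "visitlog1"),
    ("name1", "2016-06-18 10:10:00", "visitlog1"),  # duplicate of the line above
    ("name1", "2016-06-18 10:20:00", "visitlog2"),
]
print(dedupe(records))
```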
The method for determining the traffic source of page access provided by the invention further includes the following entry determination steps:
When the traffic source type of the page access record corresponding to the current four-tuple is determined to be free traffic, and the page access was initiated through a first-level entry or a second-level entry, the traffic entry of the page access is determined to be the name of the corresponding first-level or second-level entry. A first-level entry is the name of a column on the website home page, and a second-level entry is a landing page that can be reached with one jump from a home page column. Traffic that could be attributed to both a first-level and a second-level entry is preferentially attributed to the first-level entry. In the invention, tracking points are embedded in the proprietary home pages of the various clients (PC, APP, mobile browser, etc.) to identify the column the user clicked, and thereby determine whether the user's page access was initiated through a first-level or second-level entry.
In the invention, each session consists of pages visited in sequence. When a session was initiated by the user through a first-level or second-level entry, the traffic entry of every page access (PV) in that session is the name of that entry.
When the traffic source type of the page access record corresponding to the current four-tuple is determined to be paid traffic, the traffic entry of the page access is determined to be the name of the corresponding advertisement type, that is, the name of the advertisement type of the most recent ad click record in the output two-tuple.
The method for determining the traffic source of page access provided by the invention implements the following decision logic for the paid traffic source type. If any ad click occurred within the first duration (for example 15 minutes) before a page access in a session, the page accesses after the ad click in that session are attributed to the advertising channel the user clicked. Each page access has a unique advertising channel; if it is associated with multiple ad clicks, the nearest one prevails. An ad click that occurs within a session affects only the session in which it occurs, and an ad click that occurs outside a session affects only the one subsequent session. A page access that could be attributed to both a paid channel and a free channel is preferentially attributed to paid traffic.
In practical applications, once the traffic source type of each page access has been determined, traffic can be counted per traffic source type and combined with the corresponding order volume to calculate the conversion rate of the traffic, so that the effect of paid traffic and free traffic can be assessed reasonably.
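The conversion rate calculation can be illustrated with a short Python sketch. It assumes the conversion rate is the number of orders divided by the number of page accesses of the same source type; the function name `conversion_rates` and all figures are hypothetical.

```python
def conversion_rates(traffic, orders):
    """Orders divided by page accesses, per traffic source type."""
    return {source: orders.get(source, 0) / count
            for source, count in traffic.items() if count > 0}

traffic = {"paid": 200, "free": 1000}   # hypothetical page access counts
orders = {"paid": 10, "free": 20}       # hypothetical order counts
print(conversion_rates(traffic, orders))
```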
The order volume of each source can be counted by determining the order source, which makes the conversion rate calculation above possible. The way the order source is determined is briefly introduced below. Order sources are divided into paid sources and free sources. When an order could be attributed to both a paid source and a free source, it is preferentially attributed to the paid source.
The paid order source is determined by ad order tracking. If, within a period of time (for example 15 days) before buying a commodity, the user clicked an advertisement of a certain product line whose landing page shows goods of the third-level category and brand to which the commodity belongs, the order is recorded as belonging to that advertising product line. If advertisements of several different product lines were clicked within this period, the order is assigned to an advertising product line according to the ad click closest to the time the order was placed.
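The paid order rule above can be sketched in Python as follows. The function name `attribute_order`, the tuple layout of a click and the matching key (third-level category plus brand) are hypothetical simplifications of the description.

```python
from datetime import datetime, timedelta

def attribute_order(order_time, order_key, clicks, lookback=timedelta(days=15)):
    """clicks: list of (click_time, landing_key, product_line).
    Returns the product line of the latest matching click inside the lookback
    window before the order, or None if no click qualifies."""
    candidates = [
        c for c in clicks
        if c[1] == order_key and timedelta(0) <= order_time - c[0] <= lookback
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[2]

clicks = [
    (datetime(2016, 6, 1, 9, 0), ("phones", "brandA"), "line-search"),
    (datetime(2016, 6, 10, 9, 0), ("phones", "brandA"), "line-display"),
    (datetime(2016, 6, 11, 9, 0), ("laptops", "brandB"), "line-video"),
]
print(attribute_order(datetime(2016, 6, 12, 12, 0), ("phones", "brandA"), clicks))
```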
The free order source is determined by associating product detail page accesses with order rows. The page accesses (PV) of the product detail page of a certain SKU (stock keeping unit) are associated with the order rows, placed within a period of time thereafter (for example 24 hours), for goods of the same SPU (standard product unit, meaning goods of the same style; for example, a certain phone model is one SPU, and its gold and silver versions of the same capacity are two SKUs), of the shop where the SPU is located, and of the third-level category where the SPU is located. The order is then attributed to a traffic entry using the entry of the PV.
Determination of the off-site traffic source: the referring link of the first PV in a session brought by off-site traffic is identified to determine which website the PVs in the session came from, and the traffic source of all PVs in the session is attributed to that channel.
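A minimal Python sketch of this referrer check follows. It uses the host name of the first PV's referring link as the channel; the function name `session_channel`, the (page URL, referrer URL) record shape and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def session_channel(session_pvs):
    """session_pvs: list of (page_url, referrer_url) in visit order.
    Returns the host of the first PV's referrer, or None if there is none."""
    if not session_pvs:
        return None
    referrer = session_pvs[0][1]
    return urlsplit(referrer).hostname if referrer else None

session = [
    ("https://shop.example.com/item/1", "https://www.example.org/search?q=phone"),
    ("https://shop.example.com/cart", "https://shop.example.com/item/1"),
]
channel = session_channel(session)
# Every PV in the session inherits the channel of the first PV.
print([(page, channel) for page, _ in session])
```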
The method for determining the traffic source of page access provided by the invention converts page access records and ad click records into feature data and sorts the feature data set formed by the two. In the sorted feature data set, the feature of the corresponding most recent ad click record is found from the feature of each page access record, and the traffic source type of the page access is judged from the relationship between that most recent ad click record and the page access record. Compared with the prior art, which performs a join query on the traffic log and the advertisement click log and then analyzes the traffic source from the query result, the analysis method of the invention is efficient and fast, and can deduplicate the traffic log accurately.
An embodiment of the invention also provides a device for determining the traffic source of page access. As shown in Fig. 3, the device 300 includes a record obtaining module 301, a feature extraction module 302 and a judgment module 303.
The record obtaining module 301 is used to obtain one or more page access records in a first time period and one or more ad click records in a second time period.
The feature extraction module 302 is used to extract the features of each page access record in the first time period and of each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and to obtain the feature data set composed of all the feature data.
The judgment module 303 is used to judge, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
The features of a page access record extracted by the feature extraction module 302 include the access time and the access device number; the features of an ad click record extracted by the feature extraction module include the click time and the click device number.
The device for determining the traffic source of page access provided by the embodiment of the invention further includes a sorting module. Before the step of judging the traffic source type of the page access records in the feature data set, the sorting module sorts the feature data set to obtain one or more partitions of the set, where each partition contains the one or more feature data items with the same device number, and the feature data within a partition are sorted by time from earliest to latest.
The feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that precedes the feature data of that page access record in the sorted feature data set and is nearest to it.
The judgment module 303 is further used to execute the following traffic source judgment process on each feature data item in the feature data set:
Free traffic judgment step: if the current feature data is the feature data of a page access record, judge whether its device number is the same as that of the feature data of its most recent ad click record; if they differ, determine that the traffic source type of the page access record is free traffic.
Paid traffic judgment step: if the device number of the feature data of the current page access record is the same as that of the feature data of its most recent ad click record, judge whether the click time of the feature data of the most recent ad click record falls within a preset duration before the access time of the feature data of the current page access record; if it does, determine that the traffic source type of the page access record is paid traffic.
The judgment module 303 is further used to judge whether the current page access record is the first record in its session. If it is, the preset duration is the first duration; otherwise the preset duration is the second duration, and the second duration is longer than the first duration.
The device for determining the traffic source of page access provided by the embodiment of the invention further includes a session division module. Before the features of each page access record in the first time period and of each ad click record in the second time period are extracted, the session division module adds a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of a page access record further include the session identifier. All page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier.
The judgment module 303 is further used to judge, according to the session identifier in the feature data of a page access record, whether the page access record is the first record in its session.
The judgment module 303 is further used to define a first variable and a second variable, both of which are initially empty.
The judgment module 303 is further used to execute the traffic source judgment process on each feature data item in the feature data set in order. The traffic source judgment process further includes a step, executed before the free traffic judgment step, of selecting the feature data of the most recent ad click record: if the current feature data is the feature data of an ad click record, the first variable is replaced with this feature data and the second variable is cleared. In the free traffic judgment step and the paid traffic judgment step, the first variable represents the feature data of the most recent ad click record of the current feature data.
The judgment module 303 is further used, in the paid traffic judgment step, to judge whether the session identifier of the current feature data equals the second variable and whether the second variable is empty. If the session identifier of the current feature data does not equal the second variable, or the second variable is empty, the page access record corresponding to the feature data is the earliest record in its session; otherwise it is not the earliest record in its session. When the page access record corresponding to the feature data is the earliest record in its session, and the click time of the feature data of its most recent ad click record falls within the first duration before the access time of the feature data, the second variable is replaced with the session identifier of the feature data.
In the invention, the feature data is a four-tuple. The four-tuple of a page access record is <access device number, access time, page access record, empty>, and the four-tuple of an ad click record is <click device number, click time, empty, ad click record>.
The device for determining the traffic source of page access provided by the embodiment of the invention further includes an output module, which outputs the two-tuple <traffic log, empty> after the traffic source type of a page access record is determined to be free traffic, and outputs the two-tuple <traffic log, most recent ad click record> after the traffic source type of a page access record is determined to be paid traffic.
The judgment module 303 is further used to divide the feature data set into blocks, each of which contains one or more partitions; executing the traffic source judgment process on each feature data item in the feature data set then means executing the traffic source judgment process on each feature data item in a block.
The device for determining the traffic source of page access provided by the invention converts page access records and ad click records into feature data and sorts the feature data set formed by the two. In the sorted feature data set, the feature of the corresponding most recent ad click record is found from the feature of each page access record, and the traffic source type of the page access is judged from the relationship between that most recent ad click record and the page access record. Compared with the prior art, which performs a join query on the traffic log and the advertisement click log and then analyzes the traffic source from the query result, the analysis method of the invention is efficient and fast, and can deduplicate the traffic log accurately.
Referring now to Fig. 4, a schematic structural diagram is shown of a computer system 400 of an electronic device suitable for implementing the embodiments of the invention. The electronic device shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the invention.
As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 402 or a program loaded from a storage section 408 into random access memory (RAM) 403. The RAM 403 also stores the various programs and data required for the operation of the system 400. The CPU 401, ROM 402 and RAM 403 are connected to one another by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse and the like; an output section 407 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to the embodiments disclosed in the invention, the process described above with reference to the flow chart may be implemented as a computer software program. For example, the embodiments disclosed in the invention include a computer program product that comprises a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the functions defined above in the system of the invention are performed.
It should be noted that the computer-readable medium described in the invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flow charts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the invention. In this regard, each box in a flow chart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flow chart, and any combination of boxes in a block diagram or flow chart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may for example be described as: a processor including a record obtaining module, a feature extraction module and a judgment module. The names of these modules do not, under certain conditions, limit the modules themselves; for example, the feature extraction module may also be described as "a module for dividing the feature data set into blocks".
In another aspect, the invention also provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform the following:
obtaining one or more page access records in a first time period and one or more ad click records in a second time period;
extracting the features of each page access record in the first time period and of each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and obtaining the feature data set composed of all the feature data;
judging, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
The specific embodiments above do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions can occur depending on design requirements and other factors. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (17)
1. A method for determining the traffic source of page access, characterized by comprising:
obtaining one or more page access records in a first time period and one or more ad click records in a second time period;
extracting the features of each page access record in the first time period and of each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and obtaining the feature data set composed of all the feature data;
judging, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
2. The method according to claim 1, characterized in that:
the features of the page access record include an access time and an access device number; the features of the ad click record include a click time and a click device number;
before the step of judging the traffic source type of the page access record, the feature data set is sorted to obtain one or more partitions of the set, wherein each partition contains the one or more feature data having the same device number, and the feature data within a partition are sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the feature data set, is ordered before the feature data of that page access record and is nearest to it.
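The sorting and look-back described in this claim might look like the sketch below. It assumes each feature datum is a plain `(device, time, kind)` tuple with `kind` equal to `"visit"` or `"click"`; these names are illustrative only. Sorting by device number and then by time makes each partition a contiguous run of the list, and the most recent ad click is found by walking backwards from the page access. How records with identical timestamps are ordered is not stated in the text, so this sketch simply keeps their input order.

```python
def sort_features(features):
    """Order by device number, then by time from earliest to latest."""
    return sorted(features, key=lambda f: (f[0], f[1]))


def latest_click_before(ordered, index):
    """Return the nearest click ordered before ordered[index], or None."""
    for item in reversed(ordered[:index]):
        if item[2] == "click":
            return item
    return None
```

Note that the click returned may belong to a different device when the page access sits at the start of its partition, which is why a later step compares device numbers.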
3. The method according to claim 2, characterized in that the step of judging the traffic source type of the page access record comprises:
executing the following traffic source judgment process on each feature data in the feature data set:
a free-traffic judgment step: if the current feature data is the feature data of a page access record, judging whether its device number is the same as that of the feature data of its most recent ad click record, and if not, determining that the traffic source type of the page access record is free traffic;
a paid-traffic judgment step: if the device number of the feature data of the current page access record is the same as that of the feature data of its most recent ad click record, judging whether the click time of the feature data of the most recent ad click record falls within a preset duration before the access time of the feature data of the current page access record, and if so, determining that the traffic source type of the page access record is paid traffic.
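A minimal sketch of the two judgment steps follows, labelling a page access as free or paid (the "free flow" and "payment flow" types). The dictionary keys and the function name are assumptions. Two cases are not spelled out in the claim: a page access with no preceding click at all, which is treated here as free, and a same-device click that is older than the preset duration, for which the function returns `None` instead of guessing.

```python
def classify(visit, last_click, preset_duration):
    """Return "free", "paid", or None when the claim leaves the case open."""
    # Free-traffic step: no click (assumption) or a click from another device.
    if last_click is None or last_click["device"] != visit["device"]:
        return "free"
    # Paid-traffic step: same device, click within the window before the visit.
    gap = visit["time"] - last_click["time"]
    if 0 <= gap <= preset_duration:
        return "paid"
    return None  # same device, but the click is too old
```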
4. The method according to claim 3, characterized in that the paid-traffic judgment step further comprises:
judging whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
5. The method according to claim 4, characterized in that:
before the step of extracting the features of each page access record in the first time period and each ad click record in the second time period, the method further comprises: adding a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
whether a page access record is the first record in its session is judged according to the session identifier in the feature data of the page access record.
6. The method according to claim 5, characterized in that:
the traffic source judgment process is executed sequentially on each feature data in the feature data set, and the traffic source judgment process further comprises:
defining a first variable and a second variable before the free-traffic judgment step, the initial state of the first variable and of the second variable being empty;
executing a selection step before the free-traffic judgment step: if the current feature data is the feature data of an ad click record, replacing the first variable with this feature data and clearing the second variable;
in the free-traffic judgment step and in the paid-traffic judgment step, using the first variable to represent the feature data of the most recent ad click record of the current feature data;
in the paid-traffic judgment step, judging whether the page access record is the first record in its session comprises: judging whether the session identifier of the current feature data is equal to the second variable and whether the second variable is empty; if the session identifier of the current feature data is not equal to the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session;
and when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its corresponding most recent ad click record is within the first duration before the access time of its feature data, replacing the second variable with the session identifier of this feature data.
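The sequential scan with two variables can be sketched as a single loop over the sorted feature data. The names `last_click` (first variable) and `paid_session` (second variable), the dictionary keys, and the labels `"free"` and `"paid"` are assumptions for illustration. As in the claims, the outcome for a same-device click that falls outside the applicable duration is not specified, so the sketch records `None` for it; treating an empty first variable as free traffic is also an assumption.

```python
def scan(ordered, first_duration, second_duration):
    """Classify each page access in one pass over the sorted feature data."""
    last_click = None     # first variable: most recent ad click seen so far
    paid_session = None   # second variable: session already tied to that click
    results = []
    for item in ordered:
        if item["kind"] == "click":
            # Selection step: remember the click and clear the session.
            last_click, paid_session = item, None
            continue
        if last_click is None or last_click["device"] != item["device"]:
            results.append((item, "free"))
            continue
        first_in_session = paid_session is None or item["session"] != paid_session
        limit = first_duration if first_in_session else second_duration
        gap = item["time"] - last_click["time"]
        if 0 <= gap <= limit:
            results.append((item, "paid"))
            if first_in_session:
                paid_session = item["session"]
        else:
            results.append((item, None))  # case left open by the claims
    return results
```

Because the first page access of a session must fall within the shorter first duration, later accesses in the same session can then be attributed under the longer second duration without looking back again.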
7. The method according to any one of claims 3 to 6, characterized in that the feature data is a four-tuple, the four-tuple of the page access record being <access device number, access time, page access record, empty>, and the four-tuple of the ad click record being <click device number, click time, empty, ad click record>;
the method further comprises: after determining that the traffic source type of a page access record is free traffic, outputting the two-tuple <traffic log, empty>; after determining that the traffic source type of a page access record is paid traffic, outputting the two-tuple <traffic log, most recent ad click record>.
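The four-tuple layout and the output two-tuple can be illustrated with a named tuple. The type name `FourTuple`, the helper names, and the use of `None` for the empty slot are assumptions; the "traffic log" in the output is taken here to be the original page access record.

```python
from typing import NamedTuple, Optional


class FourTuple(NamedTuple):
    device: str
    time: int
    visit: Optional[dict]   # page access record, None for an ad click
    click: Optional[dict]   # ad click record, None for a page access


def visit_feature(record):
    return FourTuple(record["device"], record["time"], record, None)


def click_feature(record):
    return FourTuple(record["device"], record["time"], None, record)


def output_pair(visit_feat, source_type, last_click_feat):
    """Build the two-tuple emitted once the traffic source type is known."""
    if source_type == "paid":
        return (visit_feat.visit, last_click_feat.click)
    return (visit_feat.visit, None)
```

Giving both record types the same shape lets them share one sorted collection while the two trailing slots still show which kind of record each item carries.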
8. The method according to any one of claims 3 to 6, characterized in that the step of judging the traffic source type of the page access record comprises:
dividing the feature data set into blocks, wherein each block contains one or more of the partitions;
executing the traffic source judgment process on each feature data in the feature data set comprises: executing the traffic source judgment process on each feature data in each block.
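Dividing the sorted set into blocks without splitting a partition could be done as in the sketch below. It assumes feature data are `(device, time, kind)` tuples already sorted by device number and time, and the parameter `max_partitions_per_block` is a hypothetical tuning knob that does not appear in the source.

```python
from itertools import groupby


def make_blocks(ordered, max_partitions_per_block):
    """Group whole per-device partitions into blocks of bounded size."""
    partitions = [list(group) for _, group in groupby(ordered, key=lambda f: f[0])]
    return [
        [item for part in partitions[i:i + max_partitions_per_block] for item in part]
        for i in range(0, len(partitions), max_partitions_per_block)
    ]
```

Since every block holds complete partitions, all records of one device stay together, so the judgment process can presumably be run on each block independently.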
9. A device for determining the traffic source of page access, characterized by comprising:
a record acquisition module, configured to obtain one or more page access records in a first time period and one or more ad click records in a second time period;
a feature extraction module, configured to extract the features of each page access record in the first time period and each ad click record in the second time period, to obtain the feature data of the page access records and the feature data of the ad click records, and to obtain a feature data set composed of all the feature data;
a judgment module, configured to judge, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
10. The device according to claim 9, characterized in that the features of the page access record extracted by the feature extraction module include an access time and an access device number, and the features of the ad click record extracted by the feature extraction module include a click time and a click device number;
the device further comprises a sorting module, configured to sort the feature data set before the step of judging the traffic source type of the page access record in the feature data set, to obtain one or more partitions of the set, wherein each partition contains the one or more feature data having the same device number, and the feature data within a partition are sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the feature data set, is ordered before the feature data of that page access record and is nearest to it.
11. The device according to claim 10, characterized in that the judgment module is further configured to execute the following traffic source judgment process on each feature data in the feature data set:
a free-traffic judgment step: if the current feature data is the feature data of a page access record, judging whether its device number is the same as that of the feature data of its most recent ad click record, and if not, determining that the traffic source type of the page access record is free traffic;
a paid-traffic judgment step: if the device number of the feature data of the current page access record is the same as that of the feature data of its most recent ad click record, judging whether the click time of the feature data of the most recent ad click record falls within a preset duration before the access time of the feature data of the current page access record, and if so, determining that the traffic source type of the page access record is paid traffic.
12. The device according to claim 11, characterized in that the judgment module is further configured to judge whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
13. The device according to claim 12, characterized by further comprising: a sessionization module, configured to, before the step of extracting the features of each page access record in the first time period and each ad click record in the second time period, add a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
the judgment module is further configured to judge, according to the session identifier in the feature data of a page access record, whether the page access record is the first record in its session.
14. The device according to claim 13, characterized in that the judgment module is further configured to execute the traffic source judgment process sequentially on each feature data in the feature data set;
the traffic source judgment process executed by the judgment module further comprises: defining a first variable and a second variable before the free-traffic judgment step, the initial state of the first variable and of the second variable being empty; and executing a selection step before the free-traffic judgment step: if the current feature data is the feature data of an ad click record, replacing the first variable with this feature data and clearing the second variable;
the judgment module is further configured to use the first variable to represent the feature data of the most recent ad click record of the current feature data;
the judgment module is further configured to judge whether the session identifier of the current feature data is equal to the second variable and whether the second variable is empty; if the session identifier of the current feature data is not equal to the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session; and when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its corresponding most recent ad click record is within the first duration before the access time of its feature data, to replace the second variable with the session identifier of this feature data.
15. The device according to claim 11 or 14, characterized in that the feature data is a four-tuple, the four-tuple of the page access record being <access device number, access time, page access record, empty>, and the four-tuple of the ad click record being <click device number, click time, empty, ad click record>;
the device further comprises: an output module, configured to output the two-tuple <traffic log, empty> after determining that the traffic source type of a page access record is free traffic, and to output the two-tuple <traffic log, most recent ad click record> after determining that the traffic source type of a page access record is paid traffic.
16. An electronic device for determining the traffic source of page access, characterized by comprising:
one or more processors;
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 8.
17. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711205737.7A CN110020364B (en) | 2017-11-27 | 2017-11-27 | Method and device for determining flow source of page access |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110020364A (en) | 2019-07-16 |
CN110020364B (en) | 2021-11-30 |
Family
ID=67186617
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711205737.7A (Active; CN110020364B) | 2017-11-27 | 2017-11-27 | Method and device for determining flow source of page access |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110020364B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112307370A (en) * | 2020-10-26 | 2021-02-02 | 银盛通信有限公司 | Order source tracking method and system |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101799830A (en) * | 2010-03-25 | 2010-08-11 | 北京国双科技有限公司 | Flow data processing method capable of realizing multi-dimensional free analysis |
CN102411573A (en) * | 2010-09-20 | 2012-04-11 | 百度在线网络技术(北京)有限公司 | Method and system for acquiring information based on behavior of webpage visitor in webpage |
CN102684925A (en) * | 2012-05-24 | 2012-09-19 | 北京国双科技有限公司 | Method and device for acquiring internet access source information |
CN102999572A (en) * | 2012-11-09 | 2013-03-27 | 同济大学 | User behavior mode digging system and user behavior mode digging method |
CN104346374A (en) * | 2013-07-31 | 2015-02-11 | 阿里巴巴集团控股有限公司 | Data processing method and system |
CN104426713A (en) * | 2013-08-28 | 2015-03-18 | 腾讯科技(北京)有限公司 | Method and device for monitoring network site access effect data |
CN105357054A (en) * | 2015-11-26 | 2016-02-24 | 上海晶赞科技发展有限公司 | Website traffic analysis method and apparatus, and electronic equipment |
CN105450460A (en) * | 2014-06-03 | 2016-03-30 | 北京奇虎科技有限公司 | Network operation recording method and system |
US20160117297A1 (en) * | 2014-10-24 | 2016-04-28 | POWr Inc. | Systems and methods for dynamic, real time management of cross-domain web content |
CN105677866A (en) * | 2016-01-08 | 2016-06-15 | 车智互联(北京)科技有限公司 | Content conversion tracing method, device and system and conversion server |
CN105718184A (en) * | 2014-12-05 | 2016-06-29 | 北京搜狗科技发展有限公司 | Data processing method and apparatus |
CN105989002A (en) * | 2015-01-27 | 2016-10-05 | 阿里巴巴集团控股有限公司 | Webpage data query method and device, and method and device for establishing webpage jump path database |
CN106897196A (en) * | 2015-12-17 | 2017-06-27 | 北京国双科技有限公司 | The determination method and device of access path between Website page |
Non-Patent Citations (2)
Title |
---|
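YUHAO ZHU et al.: "Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing", IEEE COMPUTER ARCHITECTURE LETTERS * |
郭绍华: "Design and Implementation of the Traffic Statistics Module in an E-commerce Platform", China Master's Theses Full-text Database, Information Science and Technology * |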
Also Published As
Publication number | Publication date |
---|---|
CN110020364B (en) | 2021-11-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||