CN110020364A - Method and apparatus for determining the traffic source of page access - Google Patents

Method and apparatus for determining the traffic source of page access

Publication number
CN110020364A
Authority
CN
China
Prior art keywords
record
characteristic
page access
click
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711205737.7A
Other languages
Chinese (zh)
Other versions
CN110020364B (en)
Inventor
赵鹏程
钟雨
崔波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201711205737.7A
Publication of CN110020364A
Application granted
Publication of CN110020364B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and apparatus for determining the traffic source of a page access, and relates to the field of computer technology. One specific embodiment of the method includes: obtaining one or more page access records in a first time period and one or more ad click records in a second time period; extracting features of each page access record in the first time period and of each ad click record in the second time period, to obtain feature data of the page access records and feature data of the ad click records, and obtaining a feature data set composed of all the feature data; and, in the feature data set, determining the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record. The embodiment is efficient and fast, and can accurately deduplicate traffic logs.

Description

Method and apparatus for determining the traffic source of page access
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for determining the traffic source of a page access.
Background
Website traffic refers to the volume of visits to a website, and is described by metrics such as the number of users who visit the website and the number of web pages they browse. To increase website traffic, e-commerce platforms usually attract visitors through multiple channels: paid channels such as display advertising and search advertising, or free channels such as the platform's own display floors and marketing campaigns. To ensure that traffic converts well while it grows, a supplier needs to evaluate the overall traffic contribution and the traffic quality of each channel. This requires a comprehensive, unified traffic channel system that reasonably divides and counts the traffic and conversions brought by each channel, and its prerequisite is an analysis of the source of the traffic accessing the website's pages.
The existing method of analyzing the source of page access traffic runs a join query over the traffic logs and the ad click logs in a data warehouse, and then determines the traffic source from the query result.
In the course of realizing the present invention, the inventors found at least the following problem in the prior art: when processing billions of traffic log entries and hundreds of millions of ad click log entries, the prior-art method is slow and inefficient.
Therefore, a method and apparatus for efficiently and quickly determining the traffic source of website page accesses is needed.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for determining the traffic source of website page accesses, which can process large volumes of log data efficiently and quickly.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for determining the traffic source of a page access is provided, comprising:
obtaining one or more page access records in a first time period and one or more ad click records in a second time period;
extracting features of each page access record in the first time period and of each ad click record in the second time period, to obtain feature data of the page access records and feature data of the ad click records, and obtaining a feature data set composed of all the feature data;
in the feature data set, determining the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
Further, the features of a page access record include an access time and an access device number; the features of an ad click record include a click time and a click device number;
before the step of determining the traffic source type of the page access record, the feature data set is sorted to obtain one or more partitions of the set, wherein each partition contains the one or more feature data items that share the same device number, and the feature data items within a partition are sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the sorted feature data set, precedes the feature data of that page access record and is closest to it.
Further, the step of traffic source type of the judgement page access record includes:
Following traffic source deterministic processes are executed to each characteristic in the characteristic set:
Free flow judgment step: if current signature data are the characteristic of page access record, judge this feature Whether data are identical as the device number of the characteristic of its last ad click record, if it is different, then determining that the page is visited The traffic source type for asking record is free flow;
Payment flow judgment step: if the characteristic of current page access record and its last ad click record Characteristic device number it is identical, then judge the last time ad click record characteristic the click time whether In preset duration before the access time of the characteristic of current page access record, if, it is determined that page access note The traffic source type of record is payment flow.
Further, in the payment flow judgment step further include:
Judge whether current page access record is first in its session record, if so, the preset duration is First duration, otherwise, the preset duration are the second duration, and second duration is greater than first duration.
The method for determining the traffic source of a page access provided by the embodiments of the present invention further includes, before the step of extracting the features of each page access record in the first time period and of each ad click record in the second time period: adding a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
whether a page access record is the first record in its session is determined according to the session identifier in the feature data of the page access record.
Further, the traffic source determination process is executed on each feature data item in the feature data set in order, and the traffic source determination process further includes:
defining a first variable and a second variable before the free traffic determination step, the initial state of both the first variable and the second variable being empty;
executing a selection step before the free traffic determination step: if the current feature data is the feature data of an ad click record, replacing the first variable with this feature data and clearing the second variable;
in the free traffic determination step and in the paid traffic determination step, using the first variable to represent the feature data of the most recent ad click record of the current feature data;
in the paid traffic determination step, determining whether the page access record is the first record in its session includes: determining whether the session identifier of the current feature data equals the second variable and whether the second variable is empty; if the session identifier of the current feature data does not equal the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session;
and, when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its corresponding most recent ad click record falls within the first duration before the access time of its feature data, replacing the second variable with the session identifier of this feature data.
Optionally, the feature data is a four-tuple: the four-tuple of a page access record is <access device number, access time, page access record, empty>, and the four-tuple of an ad click record is <click device number, click time, empty, ad click record>;
the method further includes: after determining that the traffic source type of a page access record is free traffic, outputting the two-tuple <traffic log, empty>; and after determining that the traffic source type of a page access record is paid traffic, outputting the two-tuple <traffic log, most recent ad click record>.
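The sequential determination process with its two variables can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Feature` type, the `session_of` callback, and the two threshold values are assumptions made for the example (the text only requires the second duration to exceed the first), and treating a same-device view that falls outside the window as free traffic is likewise an assumption.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Feature four-tuple: <device, time, page-access record, ad-click record>.
# Exactly one of `page_view` / `ad_click` is set; the other is None ("empty").
Feature = namedtuple("Feature", "device time page_view ad_click")

# Hypothetical thresholds; the text only requires SECOND_DURATION > FIRST_DURATION.
FIRST_DURATION = timedelta(minutes=15)
SECOND_DURATION = timedelta(hours=24)


def classify(sorted_features, session_of):
    """Yield <page-access record, ad-click record or None> two-tuples.

    `sorted_features` must already be ordered by (device, time).
    `session_of` maps a page-access record to its session identifier.
    """
    last_ad = None        # "first variable": most recent ad-click feature
    paid_session = None   # "second variable": session already matched to last_ad
    for f in sorted_features:
        if f.ad_click is not None:          # selection step
            last_ad, paid_session = f, None
            continue
        # Free-traffic step: no earlier click, or a click from another device.
        if last_ad is None or last_ad.device != f.device:
            yield (f.page_view, None)
            continue
        # Paid-traffic step: choose the window by position in the session.
        session = session_of(f.page_view)
        is_first = paid_session is None or session != paid_session
        window = FIRST_DURATION if is_first else SECOND_DURATION
        if f.time - last_ad.time <= window:
            if is_first:
                paid_session = session
            yield (f.page_view, last_ad.ad_click)
        else:
            # Assumption: a view outside the window is treated as free traffic.
            yield (f.page_view, None)
```

Because the function keeps only two pieces of state, it makes a single pass over the sorted set and never needs a join between the two logs.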
Further, the step of traffic source type of the judgement page access record includes:
The characteristic set piecemeal, wherein each piecemeal includes one or more subregions,
It includes to the piecemeal that each characteristic in characteristic set, which executes traffic source deterministic process, In each characteristic execute the traffic source deterministic process.
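One way to form such blocks is to pack whole device partitions together until a size limit is reached. The sketch below assumes features are tuples whose first element is the device number; `max_block_size` is a hypothetical tuning parameter that the text does not specify.

```python
from itertools import groupby


def split_into_blocks(sorted_features, max_block_size):
    """Pack whole device partitions into blocks of roughly `max_block_size`.

    `sorted_features` is a list of (device, time, ...) tuples ordered by
    (device, time). A partition is never split across two blocks, so each
    block can be classified independently (for example, in parallel).
    """
    blocks, current = [], []
    for _, partition in groupby(sorted_features, key=lambda f: f[0]):
        partition = list(partition)
        # Start a new block if adding this partition would overflow it.
        if current and len(current) + len(partition) > max_block_size:
            blocks.append(current)
            current = []
        current.extend(partition)
    if current:
        blocks.append(current)
    return blocks
```

A partition larger than `max_block_size` still lands in a single block, since splitting it would separate page accesses from the ad clicks that precede them.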
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for determining the traffic source of a page access is provided, comprising:
a record obtaining module, configured to obtain one or more page access records in a first time period and one or more ad click records in a second time period;
a feature extraction module, configured to extract features of each page access record in the first time period and of each ad click record in the second time period, to obtain feature data of the page access records and feature data of the ad click records, and to obtain a feature data set composed of all the feature data;
a determination module, configured to determine, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
Further, the features of a page access record extracted by the feature extraction module include an access time and an access device number, and the features of an ad click record extracted by the feature extraction module include a click time and a click device number;
the apparatus further includes a sorting module, configured to sort the feature data set before the step of determining the traffic source type of the page access records in the feature data set, to obtain one or more partitions of the set, wherein each partition contains the one or more feature data items that share the same device number, and the feature data items within a partition are sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the sorted feature data set, precedes the feature data of that page access record and is closest to it.
Further, the determination module is further configured to execute the following traffic source determination process on each feature data item in the feature data set:
Free traffic determination step: if the current feature data is the feature data of a page access record, determine whether its device number is the same as that of the feature data of its most recent ad click record; if they differ, determine that the traffic source type of the page access record is free traffic;
Paid traffic determination step: if the feature data of the current page access record has the same device number as the feature data of its most recent ad click record, determine whether the click time of that ad click record falls within a preset duration before the access time of the current page access record; if so, determine that the traffic source type of the page access record is paid traffic.
Further, the determination module is further configured to determine whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
The apparatus for determining the traffic source of a page access provided by the embodiments of the present invention further includes a sessionization module, configured to add, before the step of extracting the features of each page access record in the first time period and of each ad click record in the second time period, a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
the determination module is further configured to determine, according to the session identifier in the feature data of a page access record, whether the page access record is the first record in its session.
Further, the determination module is further configured to execute the traffic source determination process on each feature data item in the feature data set in order;
the traffic source determination process executed by the determination module further includes: defining a first variable and a second variable before the free traffic determination step, the initial state of both the first variable and the second variable being empty; and executing a selection step before the free traffic determination step: if the current feature data is the feature data of an ad click record, replacing the first variable with this feature data and clearing the second variable;
the determination module is further configured to use the first variable to represent the feature data of the most recent ad click record of the current feature data;
the determination module is further configured to determine whether the session identifier of the current feature data equals the second variable and whether the second variable is empty; if the session identifier of the current feature data does not equal the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session; and, when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its corresponding most recent ad click record falls within the first duration before the access time of its feature data, to replace the second variable with the session identifier of this feature data.
Further, the feature data is a four-tuple: the four-tuple of a page access record is <access device number, access time, page access record, empty>, and the four-tuple of an ad click record is <click device number, click time, empty, ad click record>;
the apparatus for determining the traffic source of a page access provided by the embodiments of the present invention further includes an output module, configured to output the two-tuple <traffic log, empty> after the traffic source type of a page access record is determined to be free traffic, and to output the two-tuple <traffic log, most recent ad click record> after the traffic source type of a page access record is determined to be paid traffic.
To achieve the above object, according to another aspect of the embodiments of the present invention, an electronic device for determining the traffic source of a page access is provided, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining the traffic source of a page access provided by the embodiments of the present invention.
To achieve the above object, according to another aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the method for determining the traffic source of a page access provided by the embodiments of the present invention.
The method and apparatus for determining the traffic source of a page access provided by the present invention convert page access records and ad click records into feature data, sort the feature data set formed from both, find, in the sorted feature data set, the features of the most recent ad click record corresponding to the features of each page access record, and determine the traffic source type of the page access from the relationship between that ad click record and the page access record. Compared with the prior art, which runs a join query over the traffic logs and ad click logs and then analyzes the traffic source from the query result, the analysis method of the present invention is efficient and fast, and can accurately deduplicate traffic logs.
Further effects of the above non-conventional optional implementations are explained below in conjunction with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for determining the traffic source of a page access provided by an embodiment of the present invention;
Fig. 2 is a schematic application flow diagram of the method for determining the traffic source of a page access provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the apparatus for determining the traffic source of a page access provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings, including various details of the embodiments to aid understanding; these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
An embodiment of the present invention provides a method for determining the traffic source of a page access. In the present invention, one request for HTML web page content from a client browser is regarded as one page access, or PV (page view). The traffic source of a page access refers to the access channel through which the user arrived at that page access, possibly after several redirects.
In the present invention, the traffic source of a page access has two types. One is free traffic: the page access is generated through an access channel that incurs no cost for the operator of the website, or for a party that uses the website platform to run a specific business. For example, on an e-commerce website, if a user reaches a specific product detail page through the website's home page or the website's search engine, the traffic source type of that access to the product detail page is free traffic. The other is paid traffic, the opposite of free traffic: the page access is generated through an access channel that incurs a cost. For example, if a user finally reaches a product detail page through a pay-per-click advertisement, the traffic source type of that access to the product detail page is paid traffic.
As shown in Fig. 1, the method for determining the traffic source of a page access provided by an embodiment of the present invention includes step S101, step S102 and step S103. Through these steps, the method determines whether the traffic source type of a page access is free traffic or paid traffic.
In step S101, one or more page access records in a first time period and one or more ad click records in a second time period are obtained. Page access records can be obtained from traffic logs, and ad click records can be obtained from ad click logs. Because a page access record must be associated with ad click records that precede it in time when the traffic source type is determined in later steps, the second time period in the present invention includes not only the first time period but may also include a span of time before the first time period. For example, in this step, the full traffic log of a given day is obtained, together with the full ad click logs of that day and of the preceding day.
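The relationship between the two time periods can be illustrated with a small helper. The function names and the `lookback_days` parameter are hypothetical; the default of one day mirrors the "that day and the preceding day" example and is not a requirement stated by the text.

```python
from datetime import date, datetime, time, timedelta


def periods_for(day, lookback_days=1):
    """Return ((first_start, first_end), (second_start, second_end)).

    The first period covers `day`; the second period covers `day` plus
    `lookback_days` preceding days. Both intervals are half-open: [start, end).
    """
    first_start = datetime.combine(day, time.min)
    first_end = first_start + timedelta(days=1)
    second_start = first_start - timedelta(days=lookback_days)
    return (first_start, first_end), (second_start, first_end)


def in_period(timestamp, period):
    """Check whether `timestamp` falls inside a half-open period."""
    start, end = period
    return start <= timestamp < end
```

Page access records would be filtered with the first period and ad click records with the second, so that a page access early in the day can still be matched to a click from the previous evening.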
In step S102, the features of each page access record in the first time period and of each ad click record in the second time period are extracted to obtain the feature data of the page access records and the feature data of the ad click records, and a feature data set composed of all the feature data is obtained. Then, in step S103, the traffic source type of each page access record is determined, in the feature data set, according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
The method for determining the traffic source of a page access provided by the present invention works on the feature data set composed of the feature data of the page access records in the first time period and the feature data of the ad click records in the second time period. It uses the features of the page access records and ad click records to establish the correspondence between page access records and ad click records, and from that correspondence determines the traffic source type of each page access record. Compared with the prior art, which runs a join query over the traffic logs and ad click logs and then analyzes the traffic source from the query result, the analysis method of the present invention is efficient and fast.
In step S102 of the present invention, when the features of page access records and ad click records are extracted, the features of a page access record include an access time and an access device number, that is, the time at which the user accessed the page and the device number used. The features of an ad click record include a click time and a click device number, that is, the time at which the user clicked the advertisement and the device number used.
The method for determining the traffic source of a page access provided by the present invention further includes, before the step of determining the traffic source type of the page access records in the feature data set, that is, after step S102 and before step S103, sorting the feature data set to obtain one or more partitions of the set. Each partition contains the one or more feature data items that share the same device number, and the feature data items within a partition are sorted by time from earliest to latest. In other words, the sort keys are the device number and the time: the set is first partitioned by device number, so that feature data with the same device number falls into the same partition, and each partition is then sorted by time, with earlier feature data placed first.
In step S103, the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the sorted feature data set, precedes the feature data of that page access record and is closest to it. Because the feature data set has been sorted, the features of the corresponding most recent ad click record can be found from the features of the page access record. Since that ad click record precedes the page access record in the sort order and is the closest to it, the traffic source type of the page access can be determined from the relationship between that ad click record and the page access record.
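The lookup of the closest preceding ad click can be sketched as a sort followed by one linear scan. The `(device, time, kind, record)` encoding with `kind` set to `"view"` or `"click"` is a hypothetical simplification used only for this illustration.

```python
def nearest_preceding_clicks(features):
    """Pair each page view with the closest ad click sorted before it.

    `features` is an iterable of (device, time, kind, record) tuples, where
    `kind` is "view" or "click" (a hypothetical encoding for this sketch).
    Returns a list of (view_record, click_feature_or_None).
    """
    ordered = sorted(features, key=lambda f: (f[0], f[1]))
    pairs, last_click = [], None
    for f in ordered:
        if f[2] == "click":
            last_click = f
        else:
            pairs.append((f[3], last_click))
    return pairs
```

Note that the closest preceding click in the sorted set may belong to a different device, namely the last click of the previous partition. That is why the determination process compares device numbers before treating a page access as paid traffic.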
In the present invention, determining the traffic source type of the page access records in the feature data set in step S103 includes executing the following traffic source determination process on each feature data item in the feature data set:
Free traffic determination step: if the current feature data is the feature data of a page access record, determine whether its device number is the same as that of the feature data of its most recent ad click record; if they differ, determine that the traffic source type of the page access record is free traffic.
Paid traffic determination step: if the feature data of the current page access record has the same device number as the feature data of its most recent ad click record, determine whether the click time of that ad click record falls within a preset duration before the access time of the current page access record; if so, determine that the traffic source type of the page access record is paid traffic.
That is, for the traffic source type of a page access record to be judged paid traffic, the first condition is that the feature data of the page access record has the same device number as the feature data of its most recent ad click record, which shows that the same user clicked an advertisement before accessing the page. When this condition is met, it must also be confirmed that the time of the most recent ad click falls within the preset duration before the page access record. If too much time separates a user's page access from that user's most recent ad click, the traffic source type of the page access is not considered paid traffic.
In the present invention, the paid traffic determination step further includes determining whether the current page access record is the first record in its session (that is, the page access record with the earliest time in the session). If so, the preset duration is the first duration; otherwise, the preset duration is the second duration, which is greater than the first duration. In other words, page access records at different positions within the same session are held to different criteria when judging whether the time separating them from the corresponding most recent ad click is acceptable. Page accesses at different positions in a session may all originate from the same ad click; that ad click can be regarded as influencing every page access in the session, and as influencing them equally, yet the page access records in one session may be separated by a long time. Therefore, the first duration is used as the criterion for the first page access record in a session, and the second duration for the page access records that follow it, when judging whether the interval to the most recent ad click record is acceptable. The first duration and the second duration can be set according to specific requirements.
The method for determining the traffic source of a page access provided by the present invention further includes, before extracting the features of each page access record in the first time period and of each ad click record in the second time period, that is, after step S101 and before step S102:
adding a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset fifth duration have the same session identifier. The fifth duration is the duration used to split sessions and can be set according to specific requirements; page access records with the same access device number are divided into sessions according to this duration.
In the paid traffic determination step of step S103, whether a page access record is the first record in its session is determined according to the session identifier in the feature data of the page access record.
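Session identifiers could be assigned along the following lines. The sketch assumes one common reading of the session-splitting duration, in which a new session begins whenever the gap to the previous page access on the same device exceeds it; the text does not spell out the exact rule, and the function name, integer identifiers, and `gap` parameter are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import count


def add_session_ids(page_views, gap):
    """Return (device, time, record, session_id) tuples.

    `page_views` is an iterable of (device, time, record) tuples. A new
    session starts when the device changes or when the gap to the previous
    view on the same device exceeds `gap`.
    """
    next_id = count(1)
    result, prev_device, prev_time, session = [], None, None, None
    for device, ts, record in sorted(page_views, key=lambda v: (v[0], v[1])):
        if device != prev_device or ts - prev_time > gap:
            session = next(next_id)
        result.append((device, ts, record, session))
        prev_device, prev_time = device, ts
    return result
```

With a 30-minute gap, for example, two page accesses by one device at 10:00 and 10:20 share a session, while a third at 11:30 starts a new one.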
The method for determining the traffic source of a page access provided by the present invention is further described below with reference to a specific embodiment.
In this embodiment, in step S101, the full traffic log of a given day is obtained, together with the full ad click logs of that day and of the preceding day. In the subsequent determination process of this embodiment, whether a page access record can be associated with an ad click record within the 15 minutes or 24 hours before it serves as the condition for determining whether the page access record is paid traffic. The ad click logs obtained in this step therefore cover one more day than the corresponding traffic log.
The device number and access time of every page access record in the traffic log are extracted as the features of the page access records, and the device number and click time of every ad click record in the ad click logs are extracted as the features of the ad click records. In this embodiment, the feature data is a four-tuple: in step S102, all page access records and ad click records are converted into the joint four-tuple form <device number, record time, page access record, ad click record>.
The four-tuple of page access record is<access equipment number, access time, page access record, empty>, wherein the page Access record itself is also used as a characteristic element, wherein having session identification, the 4th element of four-tuple is sky.For example, if The four-tuple of all page access record obtained are as follows:
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>;
<name2,2016-06-19 11:00, visitlog3, NULL>;
<name3,2016-06-19 12:00, visitlog4, NULL>.
Here, name denotes the access device number, followed by the access time; visitlog denotes the page access record, and NULL denotes empty.
The four-tuple of an ad click record is <click device number, click time, empty, ad click record>. The ad click record itself also serves as a feature element. Mirroring the four-tuple of a page access record, the third element of the four-tuple of an ad click record is empty, so that the two kinds of four-tuple share one format and their data can be merged into a unified structure.
Continuing the example above, suppose the four-tuples of all ad click records obtained are:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name3,2016-06-19 10:00, NULL, adlog2>;
<name4,2016-06-20 10:00, NULL, adlog3>;
<name5,2016-06-21 10:00, NULL, adlog4>.
Here, name denotes the click device number, followed by the click time; NULL denotes empty, and adlog denotes the ad click record.
As shown in Fig. 2, the four-tuples of all the page access records and of all the ad click records above are merged to obtain the four-tuple set:
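The shared four-tuple format can be sketched as below. This is a minimal illustration, not the patented implementation: the helper names `page_access_tuple` and `ad_click_tuple` are hypothetical, `None` stands in for NULL, and times are kept as strings for brevity.

```python
def page_access_tuple(device, access_time, visit_log):
    """Build <access device number, access time, page access record, empty>."""
    return (device, access_time, visit_log, None)


def ad_click_tuple(device, click_time, ad_log):
    """Build <click device number, click time, empty, ad click record>."""
    return (device, click_time, None, ad_log)


visits = [
    ("name1", "2016-06-18 10:10", "visitlog1"),
    ("name1", "2016-06-18 10:20", "visitlog2"),
    ("name2", "2016-06-19 11:00", "visitlog3"),
    ("name3", "2016-06-19 12:00", "visitlog4"),
]
clicks = [
    ("name1", "2016-06-18 10:00", "adlog1"),
    ("name3", "2016-06-19 10:00", "adlog2"),
    ("name4", "2016-06-20 10:00", "adlog3"),
    ("name5", "2016-06-21 10:00", "adlog4"),
]

# Because both kinds of record share one shape, they can sit in a single list.
four_tuples = [page_access_tuple(*v) for v in visits] + [ad_click_tuple(*c) for c in clicks]
```

Exactly one of the last two slots is filled in every four-tuple, which is what later lets a single pass tell ad clicks from page accesses.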
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>;
<name2,2016-06-19 11:00, visitlog3, NULL>;
<name3,2016-06-19 12:00, visitlog4, NULL>;
<name1,2016-06-18 10:00, NULL, adlog1>;
<name3,2016-06-19 10:00, NULL, adlog2>;
<name4,2016-06-20 10:00, NULL, adlog3>;
<name5,2016-06-21 10:00, NULL, adlog4>.
The four-tuple set is partitioned and sorted so that each partition contains the one or more feature data items that share a device number, and the feature data within a partition is sorted by time from earliest to latest. Continuing the example, the four-tuple set above yields 5 partitions:
Partition 1:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>.
Partition 2:
<name2,2016-06-19 11:00, visitlog3, NULL>.
Partition 3:
<name3,2016-06-19 10:00, NULL, adlog2>;
<name3,2016-06-19 12:00, visitlog4, NULL>.
Partition 4:
<name4,2016-06-20 10:00, NULL, adlog3>.
Partition 5:
<name5,2016-06-21 10:00, NULL, adlog4>.
In the present embodiment, the four-tuple set above is divided into blocks, where each block contains one or more partitions, giving:
Block 1:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>.
Block 2:
<name2,2016-06-19 11:00, visitlog3, NULL>.
Block 3:
<name3,2016-06-19 10:00, NULL, adlog2>;
<name3,2016-06-19 12:00, visitlog4, NULL>.
Block 4:
<name4,2016-06-20 10:00, NULL, adlog3>;
<name5,2016-06-21 10:00, NULL, adlog4>.
A block may contain multiple partitions, and the blocking method can be chosen according to specific requirements; for example, blocks can be formed by initial letter, giving 26 blocks in total. In practical applications the feature data set often contains a great deal of feature data; once the set has been divided into blocks as in this embodiment, the subsequent traffic source judgment process can run in parallel across the blocks, which speeds up the analysis method of the invention.
Of course, the four-tuple set that has been partitioned and sorted need not be divided into blocks; the subsequent traffic source judgment process can be applied directly to the four-tuple set. The principle is the same whether the process is applied to the four-tuple set or to a block. In the remainder of this embodiment, the traffic source judgment process is described in detail as applied to a block.
In the present embodiment, the traffic source judgment process is executed on each four-tuple in a block. A first variable and a second variable are defined, both initially empty, and the traffic source judgment process is executed on the four-tuples in the block one by one, i.e., all four-tuples in the block are traversed in order.
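The partitioning, sorting and blocking described above might be sketched as follows. The function names are hypothetical, and the round-robin assignment of partitions to blocks is an arbitrary choice made for the sketch (the text leaves the blocking method open), so the resulting blocks differ from the listing above. The per-block function `count_visits` is only a stand-in for the real judgment process, and a thread pool is used purely to show that blocks can be handled independently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_and_sort(four_tuples):
    """Group four-tuples by device number and sort each group by time."""
    partitions = defaultdict(list)
    for item in four_tuples:
        partitions[item[0]].append(item)
    for rows in partitions.values():
        rows.sort(key=lambda item: item[1])  # earliest first
    return dict(partitions)


def make_blocks(partitions, block_count):
    """Spread whole partitions over blocks (round-robin: an assumption)."""
    blocks = [[] for _ in range(block_count)]
    for index, device in enumerate(sorted(partitions)):
        blocks[index % block_count].extend(partitions[device])
    return blocks


def count_visits(block):
    """Stand-in for the per-block judgment: count page access four-tuples."""
    return sum(1 for item in block if item[2] is not None)


four_tuples = [
    ("name1", "2016-06-18 10:10", "visitlog1", None),
    ("name1", "2016-06-18 10:20", "visitlog2", None),
    ("name2", "2016-06-19 11:00", "visitlog3", None),
    ("name3", "2016-06-19 12:00", "visitlog4", None),
    ("name1", "2016-06-18 10:00", None, "adlog1"),
    ("name3", "2016-06-19 10:00", None, "adlog2"),
    ("name4", "2016-06-20 10:00", None, "adlog3"),
    ("name5", "2016-06-21 10:00", None, "adlog4"),
]
partitions = partition_and_sort(four_tuples)
blocks = make_blocks(partitions, block_count=4)

# Each block is independent of the others, so blocks can be processed concurrently.
with ThreadPoolExecutor() as pool:
    visit_counts = list(pool.map(count_visits, blocks))
```

A partition is never split across blocks, so every page access stays in the same block as the ad clicks of its own device.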
Take block 1 as an example:
Block 1:
<name1,2016-06-18 10:00, NULL, adlog1>;
<name1,2016-06-18 10:10, visitlog1, NULL>;
<name1,2016-06-18 10:20, visitlog2, NULL>.
First, the step of selecting the feature data of the most recent ad click record is executed:
If the ad click record adlog in the current four-tuple is not empty, the current four-tuple is the feature data of an ad click record; the first variable is then replaced with this four-tuple and the second variable is cleared.
The first four-tuple in block 1, <name1,2016-06-18 10:00, NULL, adlog1>, satisfies this condition, so the first variable is replaced with the four-tuple <name1,2016-06-18 10:00, NULL, adlog1>, the second variable is cleared, and the traffic source judgment process for this four-tuple ends. The next four-tuple in block 1, <name1,2016-06-18 10:10, visitlog1, NULL>, then becomes the current four-tuple and the traffic source judgment process is repeated.
In the subsequent free-traffic judgment step and paid-traffic judgment step, the first variable represents the feature data of the most recent ad click record for the current feature data.
The step of selecting the feature data of the most recent ad click record is executed on the current four-tuple <name1,2016-06-18 10:10, visitlog1, NULL>; the judgment shows that this four-tuple does not satisfy the condition, so the free-traffic judgment step is executed on it: if the page access record visitlog of the current four-tuple is not empty, the four-tuple is the feature data of a page access record; it is then judged whether the device number of this four-tuple is the same as that of the four-tuple of its most recent ad click record, and if they differ, the traffic source type of the page access record is determined to be free traffic. After the traffic source type of a page access record is determined to be free traffic, the two-tuple <traffic log, empty> is output.
The device number of the current four-tuple <name1,2016-06-18 10:10, visitlog1, NULL> is the same as that of the four-tuple of its most recent ad click record (i.e., the first variable), so the condition of the free-traffic judgment step is not satisfied, and the following paid-traffic judgment step is carried out on it.
In the paid-traffic judgment step, it is judged whether the page access record corresponding to the current four-tuple is the first record in its session. Specifically, it is judged whether the session identifier of the current four-tuple equals the second variable and whether the second variable is empty: if the session identifier of the current four-tuple does not equal the second variable, or the second variable is empty, the page access record corresponding to the four-tuple is the earliest record in its session; otherwise, it is not the earliest record in its session.
If the current page access record is the first record in its session, it is judged whether the click time of the feature data of the most recent ad click record (the first variable) falls within the first duration before the access time of the current four-tuple; in the present embodiment the first duration is chosen as 15 minutes. If so, the traffic source type of the page access record is determined to be paid traffic, the two-tuple <traffic log, most recent ad click record> is output, and the second variable is replaced with the session identifier of this feature data.
If the current page access record is not the first record in its session, it is judged whether the click time of the feature data of the most recent ad click record (the first variable) falls within the second duration before the access time of the current four-tuple; in the present embodiment the second duration is chosen as 24 hours. If so, the traffic source type of the page access record is determined to be paid traffic, and the two-tuple <traffic log, most recent ad click record> is output.
In the paid-traffic judgment step for the current four-tuple <name1,2016-06-18 10:10, visitlog1, NULL>, the second variable is empty, so the page access record corresponding to the four-tuple is determined to be the earliest record in its session. The click time of the feature data of the most recent ad click record (the first variable) falls within the 15 minutes before the access time of the current four-tuple, so the traffic source type of the corresponding page access record is determined to be paid traffic, the two-tuple <visitlog1, adlog1> is output, the second variable is replaced with the session identifier of this four-tuple, and the traffic source judgment process for the current four-tuple ends.
The next four-tuple in block 1, <name1,2016-06-18 10:20, visitlog2, NULL>, then becomes the current four-tuple and the traffic source judgment process is repeated. This four-tuple satisfies neither the condition of the step of selecting the feature data of the most recent ad click record nor that of the free-traffic judgment step, so the paid-traffic judgment step is executed on it. At this point the second variable holds the session identifier of the four-tuple <name1,2016-06-18 10:10, visitlog1, NULL>, and it is judged whether the session identifier of the current four-tuple equals the second variable. Assuming the two are equal, the page access record corresponding to the current four-tuple is not the earliest record in its session. The click time of the feature data of the most recent ad click record (the first variable) falls within the 24 hours before the access time of the current four-tuple, so the traffic source type of the page access record corresponding to the current four-tuple is determined to be paid traffic, and the two-tuple <visitlog2, adlog1> is output. This completes the traffic source judgment process for block 1.
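The walkthrough of block 1 can be condensed into a short single-pass sketch. The names `Visit`, `FourTuple` and `judge_block` and the session identifier "s1" are hypothetical. Two behaviors are assumptions that the text does not spell out: a page access with no earlier ad click is treated as free traffic, and a same-device page access whose most recent click lies outside the applicable window is also treated as free traffic.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional

FIRST_DURATION = timedelta(minutes=15)  # window for the first record of a session
SECOND_DURATION = timedelta(hours=24)   # window for later records of that session


class Visit(NamedTuple):
    log_id: str      # stands in for the raw page access record
    session_id: str  # session identifier carried by the record


class FourTuple(NamedTuple):
    device: str
    time: datetime
    visit: Optional[Visit]  # None for ad click records
    ad: Optional[str]       # None for page access records


def judge_block(block):
    """Traverse one sorted block; return two-tuples (visit log, ad log or None)."""
    last_click = None    # "first variable": most recent ad click four-tuple
    paid_session = None  # "second variable": session already matched as paid
    output = []
    for item in block:
        if item.ad is not None:  # selection step: remember the click
            last_click, paid_session = item, None
            continue
        # Free-traffic step: no earlier click (assumption) or a different device.
        if last_click is None or last_click.device != item.device:
            output.append((item.visit.log_id, None))
            continue
        # Paid-traffic step: pick the window by position in the session.
        is_first = paid_session is None or item.visit.session_id != paid_session
        window = FIRST_DURATION if is_first else SECOND_DURATION
        if item.time - last_click.time <= window:
            output.append((item.visit.log_id, last_click.ad))
            if is_first:
                paid_session = item.visit.session_id
        else:
            # Assumption: outside the window the access counts as free traffic.
            output.append((item.visit.log_id, None))
    return output


def t(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


block1 = [
    FourTuple("name1", t("2016-06-18 10:00"), None, "adlog1"),
    FourTuple("name1", t("2016-06-18 10:10"), Visit("visitlog1", "s1"), None),
    FourTuple("name1", t("2016-06-18 10:20"), Visit("visitlog2", "s1"), None),
]
result = judge_block(block1)
```

Because each block is already sorted by device and time, one forward pass with two variables replaces a join between the traffic log and the ad click log.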
The two-tuple data output by the method of the invention distinguishes the traffic source type of each page access in the traffic log. Moreover, since each user produces at most one access log entry per second, the access time and access device number recorded in the two-tuple allow the page accesses in the traffic log to be deduplicated accurately.
The method for determining the traffic source of a page access provided by the invention further includes the following entry determination step:
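A deduplication pass keyed on device number and access time could look like the sketch below. The record layout (device, access time, two-tuple) and the function name `deduplicate` are assumptions made for illustration; the key relies on the statement above that a user produces at most one access log entry per second.

```python
def deduplicate(results):
    """Keep the first result for each (device number, access time) key."""
    seen = set()
    unique = []
    for device, access_time, pair in results:
        key = (device, access_time)
        if key in seen:
            continue  # same device and same second: treat as a duplicate
        seen.add(key)
        unique.append((device, access_time, pair))
    return unique


results = [
    ("name1", "2016-06-18 10:10:00", ("visitlog1", "adlog1")),
    ("name1", "2016-06-18 10:10:00", ("visitlog1", "adlog1")),  # repeated entry
    ("name1", "2016-06-18 10:20:00", ("visitlog2", "adlog1")),
]
unique_results = deduplicate(results)
```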
When the traffic source type of the page access record corresponding to the current four-tuple is determined to be free traffic and the page access was initiated through a level-one entry or a level-two entry, the traffic entry of the page access is determined to be the name of the corresponding level-one or level-two entry, where a level-one entry is a column on the website home page and a level-two entry is a landing page reachable from a home-page column in one jump. Traffic that could be assigned to both a level-one and a level-two entry is assigned to the level-one entry first. In the present invention, tracking points are embedded in the proprietary home pages of the various clients (PC, APP, mobile browser, etc.) to identify the column the user clicked, thereby determining whether the user's page access was initiated through a level-one or level-two entry.
In the present invention, each session consists of pages visited in order through successive jumps. When a session is initiated by the user through a level-one or level-two entry, the traffic entry of all page accesses (PV) in that session is the name of that entry.
When the traffic source type of the page access record corresponding to the current four-tuple is determined to be paid traffic, the traffic entry of the page access is determined to be the name of the corresponding advertisement type, i.e., the name of the advertisement type of the most recent ad click record in the output two-tuple.
The method for determining the traffic source of a page access provided by the invention implements the following decision logic for the paid traffic source type: if there is any ad click within a first time period (e.g., 15 minutes) before a page access in a session, the page accesses in the session after the ad click are assigned to the advertising channel the user clicked. Each page access has exactly one advertising channel; if it is associated with multiple ad clicks, the nearest one prevails. An ad click that occurs within a session only affects the session in which it occurs, and an ad click that occurs outside any session only affects the one subsequent session. A page access that could be assigned to both a paid channel and a free channel is assigned to paid traffic first.
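The entry determination rules can be written as a small function. This is a sketch: the function name `traffic_entry`, the "paid"/"free" labels and the example entry names are hypothetical, and returning `None` for free traffic that came through neither kind of entry is an assumption.

```python
def traffic_entry(source_type, level_one=None, level_two=None, ad_type=None):
    """Return the traffic entry name for one page access.

    source_type: "paid" or "free" (hypothetical labels).
    level_one / level_two: entry names identified by the tracking points, if any.
    ad_type: advertisement type name of the most recent ad click record.
    """
    if source_type == "paid":
        return ad_type      # paid traffic is named after the advertisement type
    if level_one is not None:
        return level_one    # the level-one entry takes priority
    return level_two        # may be None when no entry was identified


entry = traffic_entry("free", level_one="Flash Sale", level_two="Phones landing page")
```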
In practical applications, once the traffic source type of each page access has been determined, traffic can be counted per traffic source type and combined with the corresponding order volume to calculate the conversion rate of the traffic, so that the effect of paid traffic and free traffic can be assessed reasonably.
The order volume of each source can be obtained by judging the order source, which enables the conversion rate calculation above; the way the order source is judged is briefly introduced below. Order sources are divided into paid sources and free sources. When an order could be assigned to both a paid source and a free source, it is assigned to the paid source first.
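As a rough illustration of the statistics mentioned above, the sketch below counts page accesses and orders per source type and divides one by the other. Defining conversion rate as orders divided by page accesses is an assumption (the text does not give a formula), and the sample numbers are made up for the example.

```python
from collections import Counter


def conversion_rates(visit_sources, order_sources):
    """Orders per page access for each traffic source type (assumed definition)."""
    visits = Counter(visit_sources)
    orders = Counter(order_sources)
    return {source: orders[source] / count for source, count in visits.items()}


# Made-up sample: 4 paid and 10 free page accesses, 1 paid and 2 free orders.
visit_sources = ["paid"] * 4 + ["free"] * 10
order_sources = ["paid", "free", "free"]
rates = conversion_rates(visit_sources, order_sources)
```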
The paid order source is judged by ad order tracking: if, within a period of time (e.g., 15 days) before buying a commodity, the user clicked an advertisement of a certain product line whose landing page belongs to the commodity's third-level category plus brand, the order is recorded as an order belonging to that advertising product line. If advertisements of several different product lines were clicked in that period, the order is assigned to one advertising product line according to the ad click nearest to the time the order was placed.
The free order source is judged by associating product detail page visits with order lines. A page access (PV) of the product detail page of a certain SKU (stock keeping unit, the smallest commodity unit) is associated with the order lines placed within a period of time thereafter (e.g., 24 hours) for commodities of the same SPU (standard product unit, meaning commodities of the same model; for example, a certain mobile phone is one SPU, while its gold and silver versions of the same capacity are two SKUs), of the shop where the SPU is located, and of the third-level category where the SPU is located, and the order is assigned to a traffic entry according to the entry of the PV.
Judgment of off-site traffic sources: by identifying the previous-hop link of the first PV in a session brought by off-site traffic, it is determined which external website the PVs in the session come from, and the traffic source of all PVs in the session is assigned to that channel.
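The nearest-click rule for paid orders can be sketched as follows. The function name `attribute_order` and the product line names are hypothetical, and the sketch assumes the click list has already been filtered to advertisements whose landing page matches the commodity's third-level category plus brand.

```python
from datetime import datetime, timedelta


def attribute_order(order_time, clicks, lookback=timedelta(days=15)):
    """Return the product line of the nearest eligible ad click, or None.

    clicks: list of (click_time, product_line), assumed pre-filtered to
    advertisements that match the purchased commodity.
    """
    eligible = [
        (click_time, line)
        for click_time, line in clicks
        if timedelta(0) <= order_time - click_time <= lookback
    ]
    if not eligible:
        return None  # no qualifying click: not a paid order
    return max(eligible, key=lambda click: click[0])[1]


clicks = [
    (datetime(2016, 6, 1, 9, 0), "line-A"),    # too old: more than 15 days before
    (datetime(2016, 6, 10, 9, 0), "line-B"),
    (datetime(2016, 6, 17, 20, 0), "line-C"),  # nearest click before the order
]
product_line = attribute_order(datetime(2016, 6, 18, 10, 0), clicks)
```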
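The off-site rule can be illustrated by reading the previous-hop link of the first PV in a session and labeling every PV in that session with the host it points to. The record layout (a dict with a "referrer" key), the function name `offsite_channel` and the host names are assumptions for the sketch.

```python
from urllib.parse import urlparse


def offsite_channel(session_pvs, own_host="shop.example"):
    """Return the external host that brought the session, or None.

    session_pvs: time-ordered list of dicts with a "referrer" key (assumed layout).
    own_host: hypothetical host name of the site itself.
    """
    if not session_pvs:
        return None
    host = urlparse(session_pvs[0].get("referrer") or "").hostname
    if host is None or host == own_host:
        return None  # no previous hop, or an on-site jump
    return host


session = [
    {"page": "/item/1", "referrer": "https://search.example/results?q=phone"},
    {"page": "/cart", "referrer": "https://shop.example/item/1"},
]
channel = offsite_channel(session)
# Every PV in the session inherits the channel of the first PV.
labeled = [dict(pv, channel=channel) for pv in session]
```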
The method for determining the traffic source of a page access provided by the invention converts the page access records and ad click records into feature data and then sorts the feature data set formed from both. In the sorted feature data set, the feature of the most recent ad click record corresponding to a page access record is found from the feature of that page access record, and the traffic source type of the page access is judged from the relationship between this most recent ad click record and the page access record. Compared with the prior art, which runs a join query over the traffic log and the ad click log and then analyzes traffic sources on the query result, the analysis method of the invention is efficient and fast, and can deduplicate the traffic log accurately.
An embodiment of the invention also provides a device for determining the traffic source of a page access. As shown in Fig. 3, the device 300 includes a record acquisition module 301, a feature extraction module 302 and a judgment module 303.
The record acquisition module 301 is used to obtain one or more page access records in a first time period and one or more ad click records in a second time period.
The feature extraction module 302 is used to extract the features of each page access record in the first time period and of each ad click record in the second time period, so as to obtain the feature data of the page access records and the feature data of the ad click records, and to obtain a feature data set composed of all the feature data.
The judgment module 303 is used to judge, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
The features of the page access record extracted by the feature extraction module 302 include an access time and an access device number; the features of the ad click record extracted by the feature extraction module include a click time and a click device number.
The device for determining the traffic source of a page access provided by an embodiment of the invention further includes a sorting module, which is used to sort the feature data set before the step of judging the traffic source type of the page access record in the feature data set, so as to obtain one or more partitions of the set, wherein each partition contains the one or more feature data items that share a device number, and the feature data in a partition is sorted by time from earliest to latest.
The feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that, in the feature data set, is ordered before the feature data of that page access record and is nearest to it.
The judgment module 303 is further used to execute the following traffic source judgment process on each feature data item in the feature data set:
Free-traffic judgment step: if the current feature data is the feature data of a page access record, it is judged whether its device number is the same as that of the feature data of its most recent ad click record; if they differ, the traffic source type of the page access record is determined to be free traffic.
Paid-traffic judgment step: if the device number of the feature data of the current page access record is the same as that of the feature data of its most recent ad click record, it is judged whether the click time of the feature data of the most recent ad click record falls within a preset duration before the access time of the feature data of the current page access record; if so, the traffic source type of the page access record is determined to be paid traffic.
The judgment module 303 is further used to judge whether the current page access record is the first record in its session; if so, the preset duration is the first duration; otherwise, the preset duration is the second duration, and the second duration is greater than the first duration.
The device for determining the traffic source of a page access provided by an embodiment of the invention further includes a session division module, which is used, before the features of each page access record in the first time period and of each ad click record in the second time period are extracted, to add a session identifier to each page access record in the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier.
The judgment module 303 is further used to judge, according to the session identifier in the feature data of a page access record, whether the page access record is the first record in its session.
The judgment module 303 is further used to define a first variable and a second variable, both of which are initially empty.
The judgment module is further used to execute the traffic source judgment process on each feature data item in the feature data set in order. The traffic source judgment process further includes a step, executed before the free-traffic judgment step, of selecting the feature data of the most recent ad click record: if the current feature data is the feature data of an ad click record, the first variable is replaced with this feature data and the second variable is cleared. In the free-traffic judgment step and the paid-traffic judgment step, the first variable represents the feature data of the most recent ad click record for the current feature data.
The judgment module 303 is further used, in the paid-traffic judgment step, to judge whether the session identifier of the current feature data equals the second variable and whether the second variable is empty: if the session identifier of the current feature data does not equal the second variable, or the second variable is empty, the page access record corresponding to the feature data is the earliest record in its session; otherwise, it is not the earliest record in its session. When the page access record corresponding to the feature data is the earliest record in its session and the click time of the feature data of its corresponding most recent ad click record falls within the first duration before the access time of its feature data, the second variable is replaced with the session identifier of the feature data.
In the present invention, the feature data is a four-tuple: the four-tuple of a page access record is <access device number, access time, page access record, empty>, and the four-tuple of an ad click record is <click device number, click time, empty, ad click record>.
The device for determining the traffic source of a page access provided by an embodiment of the invention further includes an output module, which is used to output the two-tuple <traffic log, empty> after the traffic source type of a page access record is determined to be free traffic, and to output the two-tuple <traffic log, most recent ad click record> after the traffic source type of a page access record is determined to be paid traffic.
The judgment module 303 is further used to divide the feature data set into blocks, where each block contains one or more partitions; executing the traffic source judgment process on each feature data item in the feature data set then means executing the traffic source judgment process on each feature data item in a block.
The device for determining the traffic source of a page access provided by the invention converts the page access records and ad click records into feature data and then sorts the feature data set formed from both. In the sorted feature data set, the feature of the most recent ad click record corresponding to a page access record is found from the feature of that page access record, and the traffic source type of the page access is judged from the relationship between this most recent ad click record and the page access record. Compared with the prior art, which runs a join query over the traffic log and the ad click log and then analyzes traffic sources on the query result, the analysis method of the invention is efficient and fast, and can deduplicate the traffic log accurately.
Referring now to Fig. 4, it shows a schematic structural diagram of a computer system 400 of an electronic device suitable for implementing embodiments of the invention. The electronic device shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the invention.
As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores the various programs and data required for the operation of the system 400. The CPU 401, ROM 402 and RAM 403 are connected to one another by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a loudspeaker and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to the disclosed embodiments of the invention, the process described above with reference to the flow chart can be implemented as a computer software program. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 409 and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above functions defined in the system of the invention are performed.
It should be noted that the computer-readable medium in the invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device. Also in the invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The flow charts and block diagrams in the drawings illustrate the architecture, functions and operations that systems, methods and computer program products according to various embodiments of the invention may implement. In this regard, each box in a flow chart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flow chart, and any combination of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, this may be described as: a processor including a record acquisition module, a feature extraction module and a judgment module. The names of these modules do not, in some cases, limit the modules themselves; for example, the feature extraction module may also be described as "a module for dividing the feature data set into blocks".
As on the other hand, the present invention also provides a kind of computer-readable medium, which be can be Included in equipment described in above-described embodiment;It is also possible to individualism, and without in the supplying equipment.Above-mentioned calculating Machine readable medium carries one or more program, when said one or multiple programs are executed by the equipment, makes Obtaining the equipment includes:
It obtains one or more page access record in first time period and the one or more in second time period is wide It accuses and clicks record;
Extract each page access record in first time period and each ad click note in second time period The feature of record to obtain the characteristic of page access record and the characteristic of ad click record, and is obtained by all spies Levy the characteristic set of data composition;
In the characteristic set, the characteristic recorded according to each page access is corresponding the last wide Accuse the traffic source type that the characteristic clicked and recorded judges page access record.
The specific embodiments described above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (17)

1. A method for determining the traffic source of a page access, characterized by comprising:
obtaining one or more page access records within a first time period and one or more ad click records within a second time period;
extracting the features of each page access record within the first time period and of each ad click record within the second time period, to obtain feature data of the page access records and feature data of the ad click records, and obtaining a feature data set composed of all the feature data;
in the feature data set, determining the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
2. The method according to claim 1, characterized in that:
the features of a page access record include an access time and an access device number; the features of an ad click record include a click time and a click device number;
before the step of determining the traffic source type of the page access record, the feature data set is sorted to obtain one or more partitions of the set, wherein each partition contains one or more items of feature data having the same device number, and the feature data within a partition is sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that is sorted before, and is closest to, the feature data of that page access record in the feature data set.
3. The method according to claim 2, characterized in that the step of determining the traffic source type of the page access record comprises:
performing the following traffic source determination process on each item of feature data in the feature data set:
a free traffic determination step: if the current feature data is the feature data of a page access record, determining whether its device number is the same as that of the feature data of its most recent ad click record, and if they differ, determining that the traffic source type of the page access record is free traffic;
a paid traffic determination step: if the device number of the feature data of the current page access record is the same as that of the feature data of its most recent ad click record, determining whether the click time of the feature data of that most recent ad click record falls within a preset duration before the access time of the feature data of the current page access record, and if so, determining that the traffic source type of the page access record is paid traffic.
4. The method according to claim 3, characterized in that the paid traffic determination step further comprises:
determining whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
5. The method according to claim 4, characterized in that:
before the step of extracting the features of each page access record within the first time period and of each ad click record within the second time period, the method further comprises: adding a session identifier to each page access record within the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
whether a page access record is the first record in its session is determined according to the session identifier in the feature data of the page access record.
6. The method according to claim 5, characterized in that:
the traffic source determination process is performed sequentially on each item of feature data in the feature data set, and the traffic source determination process further comprises:
defining a first variable and a second variable before the free traffic determination step, the initial state of both the first variable and the second variable being empty;
performing a selection step before the free traffic determination step: if the current feature data is the feature data of an ad click record, replacing the first variable with this feature data and clearing the second variable;
in the free traffic determination step and in the paid traffic determination step, using the first variable to represent the feature data of the most recent ad click record of the current feature data;
in the paid traffic determination step, determining whether the page access record is the first record in its session comprises: determining whether the session identifier of the current feature data is equal to the second variable and whether the second variable is empty, wherein if the session identifier of the current feature data is not equal to the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session;
and when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its corresponding most recent ad click record falls within the first duration before the access time of its feature data, replacing the second variable with the session identifier of this feature data.
7. The method according to any one of claims 3 to 6, characterized in that the feature data is a four-tuple, the four-tuple of a page access record being <access device number, access time, page access record, empty>, and the four-tuple of an ad click record being <click device number, click time, empty, ad click record>;
the method further comprises: after determining that the traffic source type of a page access record is free traffic, outputting the two-tuple <traffic log, empty>; and after determining that the traffic source type of a page access record is paid traffic, outputting the two-tuple <traffic log, most recent ad click record>.
8. The method according to any one of claims 3 to 6, characterized in that the step of determining the traffic source type of the page access record comprises:
dividing the feature data set into blocks, wherein each block contains one or more partitions;
performing the traffic source determination process on each item of feature data in the feature data set comprises: performing the traffic source determination process on each item of feature data in the block.
9. A device for determining the traffic source of a page access, characterized by comprising:
a record obtaining module, configured to obtain one or more page access records within a first time period and one or more ad click records within a second time period;
a feature extraction module, configured to extract the features of each page access record within the first time period and of each ad click record within the second time period, to obtain feature data of the page access records and feature data of the ad click records, and to obtain a feature data set composed of all the feature data;
a judgment module, configured to determine, in the feature data set, the traffic source type of each page access record according to the feature data of the most recent ad click record corresponding to the feature data of that page access record.
10. The device according to claim 9, characterized in that the features of a page access record extracted by the feature extraction module include an access time and an access device number, and the features of an ad click record extracted by the feature extraction module include a click time and a click device number;
the device further comprises: a sorting module, configured to sort the feature data set before the step of determining the traffic source type of the page access record in the feature data set, to obtain one or more partitions of the set, wherein each partition contains one or more items of feature data having the same device number, and the feature data within a partition is sorted by time from earliest to latest;
the feature data of the most recent ad click record corresponding to the feature data of a page access record is the feature data of the ad click record that is sorted before, and is closest to, the feature data of that page access record in the feature data set.
11. The device according to claim 10, characterized in that the judgment module is further configured to perform the following traffic source determination process on each item of feature data in the feature data set:
a free traffic determination step: if the current feature data is the feature data of a page access record, determining whether its device number is the same as that of the feature data of its most recent ad click record, and if they differ, determining that the traffic source type of the page access record is free traffic;
a paid traffic determination step: if the device number of the feature data of the current page access record is the same as that of the feature data of its most recent ad click record, determining whether the click time of the feature data of that most recent ad click record falls within a preset duration before the access time of the feature data of the current page access record, and if so, determining that the traffic source type of the page access record is paid traffic.
12. The device according to claim 11, characterized in that the judgment module is further configured to determine whether the current page access record is the first record in its session; if so, the preset duration is a first duration; otherwise, the preset duration is a second duration, the second duration being greater than the first duration.
13. The device according to claim 12, characterized by further comprising: a session division module, configured to add, before the step of extracting the features of each page access record within the first time period and of each ad click record within the second time period, a session identifier to each page access record within the first time period according to the access time and access device number of the page access record, so that the features of the page access record further include the session identifier, wherein all page access records that have the same access device number and whose access times fall within a preset third duration have the same session identifier;
the judgment module is further configured to determine, according to the session identifier in the feature data of a page access record, whether the page access record is the first record in its session.
14. The device according to claim 13, characterized in that the judgment module is further configured to perform the traffic source determination process sequentially on each item of feature data in the feature data set;
the traffic source determination process performed by the judgment module further comprises: defining a first variable and a second variable before the free traffic determination step, the initial state of both the first variable and the second variable being empty; and performing a selection step before the free traffic determination step: if the current feature data is the feature data of an ad click record, replacing the first variable with this feature data and clearing the second variable;
the judgment module is further configured to use the first variable to represent the feature data of the most recent ad click record of the current feature data;
the judgment module is further configured to determine whether the session identifier of the current feature data is equal to the second variable and whether the second variable is empty, wherein if the session identifier of the current feature data is not equal to the second variable, or the second variable is empty, the page access record corresponding to this feature data is the first record in its session; and, when the page access record corresponding to this feature data is the first record in its session and the click time of the feature data of its corresponding most recent ad click record falls within the first duration before the access time of its feature data, to replace the second variable with the session identifier of this feature data.
15. The device according to claim 11 or 14, characterized in that the feature data is a four-tuple, the four-tuple of a page access record being <access device number, access time, page access record, empty>, and the four-tuple of an ad click record being <click device number, click time, empty, ad click record>;
the device further comprises: an output module, configured to output the two-tuple <traffic log, empty> after it is determined that the traffic source type of a page access record is free traffic, and to output the two-tuple <traffic log, most recent ad click record> after it is determined that the traffic source type of a page access record is paid traffic.
16. An electronic device for determining the traffic source of a page access, characterized by comprising:
one or more processors;
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 8.
17. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 8.
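The session-aware determination process recited in claims 4 to 6 (session identifiers, a first and a second variable, and two preset durations) can be sketched in Python as below. This is a hedged illustration under stated assumptions, not the claimed implementation: sessions are assumed to be split wherever the gap between consecutive visits of one device exceeds the third duration, a same-device visit outside the applicable duration is assumed to be free traffic, and all function names, tuple layouts, and session identifier strings are hypothetical.

```python
def assign_sessions(visits, third_duration):
    """visits: list of (device_id, access_time). Returns (device_id, access_time, session_id).

    Assumption: a new session starts when the device changes or when the gap
    to the previous visit of the same device exceeds `third_duration`.
    """
    out = []
    prev_device, prev_time, session_no = None, None, 0
    for device, t in sorted(visits):
        if device != prev_device or t - prev_time > third_duration:
            session_no += 1
        out.append((device, t, f"s{session_no}"))
        prev_device, prev_time = device, t
    return out

def classify(visits_with_session, clicks, first_duration, second_duration):
    """Scan the merged, sorted events once and label each visit 'paid' or 'free'."""
    # Event layout: (device_id, time, order, kind, session_id); order 0 puts a
    # click before a visit with the same timestamp (an assumption).
    events = [(d, t, 1, "visit", s) for d, t, s in visits_with_session]
    events += [(d, t, 0, "click", None) for d, t in clicks]
    events.sort(key=lambda e: e[:3])

    last_click = None     # "first variable": most recent ad click
    paid_session = None   # "second variable": session already matched to that click
    results = []
    for device, t, _, kind, session in events:
        if kind == "click":
            # Selection step: remember the click and clear the second variable.
            last_click = (device, t)
            paid_session = None
            continue
        if last_click is None or last_click[0] != device:
            results.append((device, t, "free"))
            continue
        # First record of its session uses the shorter first duration.
        is_first = paid_session is None or session != paid_session
        limit = first_duration if is_first else second_duration
        if 0 <= t - last_click[1] <= limit:
            results.append((device, t, "paid"))
            if is_first:
                paid_session = session
        else:
            results.append((device, t, "free"))
    return results
```

In this sketch the shorter first duration gates whether a session is attributed to an ad click at all, and the longer second duration lets later visits in an already attributed session remain paid traffic.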
CN201711205737.7A 2017-11-27 2017-11-27 Method and device for determining flow source of page access Active CN110020364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711205737.7A CN110020364B (en) 2017-11-27 2017-11-27 Method and device for determining flow source of page access

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711205737.7A CN110020364B (en) 2017-11-27 2017-11-27 Method and device for determining flow source of page access

Publications (2)

Publication Number Publication Date
CN110020364A true CN110020364A (en) 2019-07-16
CN110020364B CN110020364B (en) 2021-11-30

Family

ID=67186617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711205737.7A Active CN110020364B (en) 2017-11-27 2017-11-27 Method and device for determining flow source of page access

Country Status (1)

Country Link
CN (1) CN110020364B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307370A (en) * 2020-10-26 2021-02-02 银盛通信有限公司 Order source tracking method and system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799830A (en) * 2010-03-25 2010-08-11 北京国双科技有限公司 Flow data processing method capable of realizing multi-dimensional free analysis
CN102411573A (en) * 2010-09-20 2012-04-11 百度在线网络技术(北京)有限公司 Method and system for acquiring information based on behavior of webpage visitor in webpage
CN102684925A (en) * 2012-05-24 2012-09-19 北京国双科技有限公司 Method and device for acquiring internet access source information
CN102999572A (en) * 2012-11-09 2013-03-27 同济大学 User behavior mode digging system and user behavior mode digging method
CN104346374A (en) * 2013-07-31 2015-02-11 阿里巴巴集团控股有限公司 Data processing method and system
CN104426713A (en) * 2013-08-28 2015-03-18 腾讯科技(北京)有限公司 Method and device for monitoring network site access effect data
CN105357054A (en) * 2015-11-26 2016-02-24 上海晶赞科技发展有限公司 Website traffic analysis method and apparatus, and electronic equipment
CN105450460A (en) * 2014-06-03 2016-03-30 北京奇虎科技有限公司 Network operation recording method and system
US20160117297A1 (en) * 2014-10-24 2016-04-28 POWr Inc. Systems and methods for dynamic, real time management of cross-domain web content
CN105677866A (en) * 2016-01-08 2016-06-15 车智互联(北京)科技有限公司 Content conversion tracing method, device and system and conversion server
CN105718184A (en) * 2014-12-05 2016-06-29 北京搜狗科技发展有限公司 Data processing method and apparatus
CN105989002A (en) * 2015-01-27 2016-10-05 阿里巴巴集团控股有限公司 Webpage data query method and device, and method and device for establishing webpage jump path database
CN106897196A (en) * 2015-12-17 2017-06-27 北京国双科技有限公司 The determination method and device of access path between Website page

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUHAO ZHU 等: "Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing", 《IEEE COMPUTER ARCHITECTURE LETTERS》 *
郭绍华: "电子商务平台中流量统计模块的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN110020364B (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN107832468B (en) Demand recognition methods and device
CN108664513B (en) Method, device and equipment for pushing keywords
KR100792697B1 (en) Method and system for decideing advertising fee
CN108985809A (en) Motivate method, apparatus, electronic equipment and the storage medium of push
CN108632311A (en) Information-pushing method and device
CN106911801A (en) The method and information transmission system of association user information
CN109388548A (en) Method and apparatus for generating information
CN110298716A (en) Information-pushing method and device
CN108932625A (en) Analysis method, device, medium and the electronic equipment of user behavior data
CN110020143A (en) A kind of landing page generation method and device
CN107885784A (en) The method and apparatus for extracting user characteristic data
CN109040000A (en) IP address-based user identification method and system
CN108737486A (en) Information-pushing method and device
KR20160070282A (en) Providing system and method for shopping mall web site, program and recording medium thereof
CN109190027A (en) Multi-source recommended method, terminal, server, computer equipment, readable medium
KR20150102266A (en) Travel content providing system using big data
CN111626767A (en) Resource data distribution method, device and equipment
CN110020131B (en) Method and device for arranging commodities
CN110035053A (en) For detecting user-content provider couple method and system of fraudulent
KR101083002B1 (en) Method and Server apparatus for calculating User Conversion Rate
CN110276566A (en) Information output method and device
CN110362583A (en) A kind of data processing method and device for multi-data source
CN110020364A (en) The method and apparatus for determining the traffic source of page access
CN113327146A (en) Information tracking method and device
KR100462294B1 (en) Method and System for Providing Information on Article of Commerce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant