CN108241984A - Visitor classification method and device - Google Patents


Info

Publication number
CN108241984A
Authority
CN
China
Prior art keywords
session
user
visitor
average
sample
Legal status
Pending
Application number
CN201611208440.1A
Other languages
Chinese (zh)
Inventor
卢金金
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201611208440.1A
Publication of CN108241984A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a visitor classification method and device. First, the historical session data of a visitor to be predicted is obtained, and session features are extracted from it. The extracted session features are input into a pre-trained target classifier for analysis, yielding a classification result for the visitor. The method uses patterns in a user's historical session data to judge whether the user is a converting user. Converting users identified in this way can then be targeted precisely in the next step, effectively narrowing the audience for ad delivery and thereby saving delivery cost.

Description

Visitor classification method and device
Technical field
The present application belongs to the technical field of computer networks, and in particular relates to a visitor classification method and device.
Background
With the rapid development of network technology, the Internet has become an important part of people's daily lives: people can choose goods, take online courses, and so on over the network.
For example, while visiting a web page, a user sees the advertising content displayed on it. For advertisers, how to intuitively assess the quality of visitors' network sessions, so as to find high-quality user groups for precision marketing, has become an urgent problem.
Existing methods for assessing the quality of users' network sessions mainly rely on custom page weights. For example, the "submit order" page is given a large weight, while other pages unrelated to order conversion are given small weights. The weights of all pages a user visits are summed to score the quality of the user's session. However, such methods cannot exploit patterns in a user's historical session data to predict whether the user will place an order in the future.
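A minimal sketch of the page-weight scoring described above, for illustration only. The page paths and weight values are hypothetical; the text only says that the "submit order" page is weighted more heavily than pages unrelated to order conversion.

```python
# Hypothetical weights: only the "submit order" page is singled out in the text.
PAGE_WEIGHTS = {"/submit-order": 10.0}
DEFAULT_WEIGHT = 1.0  # assumed small weight for pages unrelated to order conversion


def session_quality_score(visited_pages, weights=PAGE_WEIGHTS, default=DEFAULT_WEIGHT):
    """Score one session by summing the weight of every page the user visited."""
    return sum(weights.get(page, default) for page in visited_pages)


score = session_quality_score(["/home", "/product/42", "/submit-order"])
print(score)  # 12.0
```

The score depends only on the pages in the session being scored; nothing in it is learned from a user's history, which is the limitation the method below addresses.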
Summary
In view of the above problems, the present application is proposed to provide a visitor classification method and device that overcome the above problems or at least partially solve them.
In a first aspect, the present application provides a visitor classification method, comprising:
obtaining historical session data of a visitor to be predicted;
extracting session features from the historical session data;
performing classification on the session features using a pre-trained target classifier to obtain a classification result for the visitor to be predicted, wherein the classification result is either converting user or non-converting user; a converting user is a user whose next session results in an order conversion, and a non-converting user is a user whose next session does not result in an order conversion.
Optionally, performing classification on the session features using the pre-trained target classifier to obtain the classification result for the visitor to be predicted comprises:
analyzing the session features using the target classifier to obtain the probability that the next session of the visitor to be predicted results in an order conversion;
when the probability of order conversion is greater than or equal to a probability threshold, determining that the visitor to be predicted is a converting user;
when the probability of order conversion is less than the probability threshold, determining that the visitor to be predicted is a non-converting user.
Optionally, before performing classification on the session features using the pre-trained target classifier to obtain the classification result for the visitor to be predicted, the method further comprises:
obtaining network session sample data;
extracting sample features from the network session sample data;
training a classifier on the sample features to obtain the target classifier.
Optionally, obtaining the network session sample data comprises:
obtaining, as positive sample data, all network sessions of every converted user within a selected time period before that user's first order conversion; and obtaining, as negative sample data, all network sessions of every unconverted user within the selected time period, where all sessions of one user within the selected time period constitute one sample.
Optionally, extracting sample features from the network session sample data comprises:
obtaining from each sample: number of sessions, average page views, average unique page views, on-site search share, average on-site search count, average on-site search clicks, average session duration, minimum session duration, maximum session duration, bounce rate, paid search share, mobile session share, average event count, average target page count, average mouse clicks, average mouse scrolls, search-referral session share, recommendation-referral session share, number of devices used, number of operating systems, average exit page views, average page refresh rate, average page exit rate, and average page load time.
Second aspect, the application provide a kind of visitor's sorter, including:
First acquisition module, for obtaining the historical session data of visitor to be predicted;
Characteristic extracting module, for from the historical session extracting data session characteristics;
Sort module for carrying out classification processing to the session characteristics using object classifiers trained in advance, obtains The classification results of the visitor to be predicted, wherein, classification results include conversion user or non-transformed user, and the conversion user is Refer to the user that order conversion occurs for next session, the non-transformed user refers to that the user of order conversion does not occur for next session.
Optionally, the sort module, including:
Submodule is analyzed, for analyzing the session characteristics using the object classifiers, obtains the visitor to be predicted The probability of order conversion occurs for session next time;
First determination sub-module, for when the probability that order conversion occurs is more than or equal to probability threshold value, determining described treat Prediction visitor is conversion user;
Second determination sub-module, for when the probability that order conversion occurs is less than probability threshold value, determining described to be predicted Visitor is non-transformed user.
Optionally, described device further includes:
Second acquisition module, for carrying out classification processing to the session characteristics using object classifiers trained in advance Before, network session sample data is obtained;
Sample characteristics extraction module, for extracting sample characteristics from the network session sample data;
Training module for being trained using grader to the sample characteristics, obtains the object classifiers.
Optionally, second acquisition module, including:
Positive sample acquisition submodule is totally converted user before the conversion of order for the first time for obtaining in seclected time period Overall network session is as positive sample data;
Negative sample acquisition submodule, for obtaining all overall network sessions of unconverted user in the seclected time period As negative sample data, all sessions of each user in the seclected time period are a sample.
Optionally, the sample characteristics extraction module, including:
Sample characteristics extracting sub-module, for obtaining session number from each sample data, the page pageview that is averaged, being averaged Duplicate removal page browsing amount, search in Website accounting, average search in Website number, average search in Website hits, session are averaged duration, most Small session duration, max-session duration jump out rate, paid search accounting, mobile terminal session accounting, average event number, average mesh Page number is marked, average mouse clicks, average mouse rollovers number, search source session accounting, recommends source session accounting, use Number of devices, operating system quantity are averaged out page browsing amount, average page refresh rate, the average page and exit rate and average The page loads duration.
With the visitor classification method provided by the above technical solution, the historical session data of a visitor to be predicted is first obtained and session features are extracted from it. The extracted session features are input into a pre-trained target classifier for analysis, yielding the classification result for the visitor. The method uses patterns in a user's historical session data to judge whether the user is a converting user, so that converting users can be selected for precise ad delivery in the next step, effectively narrowing the target audience and thereby saving delivery cost.
The above description is only an overview of the technical solution of the present application. In order to understand the technical means of the present application more clearly so that it can be implemented according to the contents of the specification, and to make the above and other objects, features and advantages of the present application more apparent, specific embodiments of the present application are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present application. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a flowchart of a visitor classification method according to an embodiment of the present application;
Fig. 2 shows a flowchart of a classification process using the target classifier according to an embodiment of the present application;
Fig. 3 shows a flowchart of another visitor classification method according to an embodiment of the present application;
Fig. 4 shows a block diagram of a visitor classification device according to an embodiment of the present application;
Fig. 5 shows a block diagram of a classification module according to an embodiment of the present application;
Fig. 6 shows a block diagram of another visitor classification device according to an embodiment of the present application;
Fig. 7 shows a block diagram of a second acquisition module according to an embodiment of the present application.
Detailed description
Exemplary embodiments of the present application are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present application, it should be understood that the present application may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
Referring to Fig. 1, a flowchart of a visitor classification method according to an embodiment of the present application is shown. The method is applied in a computer terminal or server and is used to estimate the probability that a visitor to a website who has not yet placed an order will perform an order conversion in the future.
As shown in Fig. 1, the method may include the following steps:
S110: obtain the historical session data of the visitor to be predicted.
When a user opens a page of a website and starts browsing the site, a session is recorded as started. The session ends in either of the two cases below; if the user visits the website again after the session has ended, a new session is recorded.
1) The user closes the browser and does not return within a preset time period (for example, 30 min, configurable in the tracking script); closing the browser then forcibly ends the session.
2) The user does not close the browser but visits no new page within the preset time period; this is called a session timeout.
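A minimal sessionization sketch, assuming only the timestamps of a visitor's page views are available. Under that assumption both end conditions above reduce to "no new page view within the preset period", so a gap longer than the timeout starts a new session. The function name and the 30-minute default are illustrative, not taken from any specific tracking product.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the "preset time period"; configurable


def split_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group one visitor's page-view timestamps into sessions.

    A new session starts whenever the gap since the previous page view
    exceeds `timeout` (assumption: closing the browser without returning
    and a plain timeout look identical in page-view data).
    """
    sessions = []
    for ts in sorted(hits):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])  # gap too long: start a new session
    return sessions


hits = [datetime(2016, 12, 1, 10, 0), datetime(2016, 12, 1, 10, 10),
        datetime(2016, 12, 1, 10, 45), datetime(2016, 12, 1, 10, 50)]
print([len(s) for s in split_sessions(hits)])  # [2, 2]
```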
The visitor to be predicted is a visitor who has accessed the target website and has not yet performed an order conversion; the target website is the website currently being monitored. The historical session data is all session data generated by the visitor to be predicted when accessing the website within a selected time period before the current time.
S120: extract session features from the historical session data.
Session features are then extracted from the obtained historical session data; the types of historical session features to extract are determined when the classifier is trained.
The extracted features include at least features of user access behavior, for example page views, page visit duration, average page exit rate, average exit page views, average page refresh rate, and average unique page views.
S130: perform classification on the session features using the pre-trained target classifier to obtain the classification result for the visitor to be predicted.
Analyzing the session features of the visitor to be predicted with the target classifier yields the visitor's classification result; the classes are converting user and non-converting user.
A converting user is a user who will perform an order conversion in the next session and may be a potential customer; a non-converting user is a user who will not perform an order conversion in the next session.
The target classifier outputs the probability that the user performs an order conversion in the next session: users with a higher probability are converting users, and users with a lower probability are non-converting users. In one possible implementation of the present application, as shown in Fig. 2, S130 may include the following steps:
S131: analyze the session features using the target classifier to obtain the probability that the next session of the visitor to be predicted results in an order conversion.
The session features of the visitor to be predicted are input into the target classifier, which analyzes each session feature and outputs the probability that an order conversion occurs in the visitor's next network session.
S132: when the probability is greater than or equal to the probability threshold, determine that the visitor to be predicted is a converting user.
The probability threshold can be learned when the target classifier is trained.
S133: when the probability is less than the probability threshold, determine that the visitor to be predicted is a non-converting user.
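Steps S132 and S133 amount to a single comparison. The text says only that the threshold "can be learned" during training; the sketch below assumes one simple option, choosing the candidate threshold that maximizes accuracy on labelled data. Both function names are hypothetical.

```python
def classify(probability, threshold):
    """S132/S133: compare the predicted conversion probability with the threshold."""
    return "converting" if probability >= threshold else "non-converting"


def choose_threshold(probabilities, labels):
    """Pick the threshold that maximizes accuracy on labelled data.

    Assumption: this accuracy-maximizing search is illustrative; the source
    does not say how the threshold is learned. labels holds 1 for users who
    converted in their next session and 0 otherwise (must be non-empty).
    """
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(probabilities)):  # each observed score is a candidate
        correct = sum((p >= t) == bool(y) for p, y in zip(probabilities, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


t = choose_threshold([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])
print(t, classify(0.75, t))  # 0.6 converting
```

In practice the threshold trades precision against reach: lowering it admits more visitors into the delivery audience.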
In one possible implementation of the present application, the target classifier may be a random forest classifier or an SVM (Support Vector Machine) classifier, trained with sample data obtained in advance.
In the visitor classification method provided by this embodiment, the historical session data of the visitor to be predicted is obtained and session features are extracted from it. The extracted session features are input into the pre-trained target classifier for analysis, yielding the visitor's classification result. The method uses patterns in the user's historical session data to judge whether the user is a converting user. Converting users can then be selected for precise ad delivery, effectively narrowing the target audience and saving delivery cost.
Referring to Fig. 3, a flowchart of another visitor classification method according to an embodiment of the present application is shown. This embodiment focuses on the process of training the target classifier. As shown in Fig. 3, on the basis of the method embodiment shown in Fig. 1, the method further includes the following steps:
S210: obtain network session sample data.
The historical session data of users who had multiple sessions within the selected time period are chosen as samples.
A positive sample consists of all network sessions of a converted user within the selected time period before that user's first order conversion; a negative sample consists of all sessions of an unconverted user within the selected time period. All sessions of one user within the selected time period constitute one sample. Preferably, the ratio of positive to negative samples is close to 1:1.
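A minimal sketch of step S210, assuming a hypothetical schema in which each user's sessions in the selected period are ordered by start time and carry a boolean `converted` flag. Two further assumptions are labelled in the comments: "before the first conversion" excludes the converting session itself, and the 1:1 ratio is reached by random downsampling of the larger class.

```python
import random


def build_samples(user_sessions, min_sessions=2, seed=0):
    """Build (sessions, label) samples: label 1 = positive, 0 = negative.

    user_sessions maps a user id to that user's sessions in the selected
    period, ordered by start time; each session is a dict with a boolean
    "converted" flag (hypothetical schema).
    """
    positives, negatives = [], []
    for sessions in user_sessions.values():
        if len(sessions) < min_sessions:
            continue  # keep only users with multiple sessions
        first = next((i for i, s in enumerate(sessions) if s["converted"]), None)
        if first is None:
            negatives.append((sessions, 0))  # never converted: all sessions
        elif first > 0:
            # Assumption: sessions strictly before the first converting one.
            # Users converting in their first session have no history; skipped.
            positives.append((sessions[:first], 1))
    # Assumption: randomly downsample the larger class to get roughly 1:1.
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n) + rng.sample(negatives, n)
```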
S220: extract sample features from the network session sample data.
In one possible implementation of the present application, the selected sample features include: number of sessions, average page views, average unique page views, on-site search share, average on-site search count, average on-site search clicks, average session duration, minimum session duration, maximum session duration, bounce rate, paid search share, mobile session share, average event count, average target page count, average mouse clicks, average mouse scrolls, search-referral session share, recommendation-referral session share, number of devices used, number of operating systems, average exit page views, average page refresh rate, average page exit rate, and average page load time.
The meaning of each of these features is as follows:
Number of sessions: the number of network sessions of the user within the selected time period;
Average page views: the number of pages the user viewed within the selected time period, divided by the user's number of sessions within that period;
Average unique page views: the number of unique (deduplicated) pages the user visited within the selected time period, divided by the user's number of sessions within that period;
On-site search share: the number of the user's sessions within the selected time period that include an on-site search, divided by the user's number of sessions within that period;
Average on-site search count: the number of on-site searches the user made within the selected time period, divided by the user's number of sessions within that period;
Average on-site search clicks: the number of on-site search clicks the user made within the selected time period, divided by the user's number of sessions within that period;
Average session duration: the total duration of all the user's sessions within the selected time period, divided by the user's number of sessions within that period;
Minimum session duration: the shortest duration among the user's sessions within the selected time period;
Maximum session duration: the longest duration among the user's sessions within the selected time period;
Bounce rate: the number of the user's bounced sessions within the selected time period, divided by the user's number of sessions within that period;
Paid search share: the number of the user's sessions originating from paid search within the selected time period, divided by the user's number of sessions within that period;
Mobile session share: the number of sessions in which the user visited the website from a mobile device within the selected time period, divided by the user's total number of sessions within that period;
Average event count: the number of events the user generated within the selected time period, divided by the user's number of sessions within that period;
Average target page count: the number of target pages the user visited within the selected time period, divided by the user's number of sessions within that period;
Average mouse clicks: the number of mouse clicks the user made within the selected time period, divided by the user's number of sessions within that period;
Average mouse scrolls: the number of mouse scrolls the user made within the selected time period, divided by the user's number of sessions within that period;
Search-referral session share: the number of the user's sessions referred by search within the selected time period, divided by the user's number of sessions within that period;
Recommendation-referral session share: the number of the user's sessions referred by recommendations within the selected time period, divided by the user's number of sessions within that period;
Number of devices used: the number of devices the user used within the selected time period;
Number of operating systems: the number of operating systems the user used within the selected time period;
Average exit page views: the sum of the page views of all unique pages the user visited within the selected time period, divided by the number of pages the user visited within that period;
Average page refresh rate: the number of page refreshes by the user within the selected time period, divided by the number of pages the user visited within that period;
Average page exit rate: the number of pages from which the user exited within the selected time period, divided by the number of pages the user visited within that period;
Average page load time: the total load time of all pages the user visited within the selected time period, divided by the number of pages the user visited within that period.
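The per-session averages above share one pattern: a count over the selected period divided by the number of sessions. The sketch below computes a subset of them for one user, assuming a hypothetical session record with `pages`, `duration` (seconds), `bounced`, `source`, `device` and `os` fields; unique pages are deduplicated across the whole period, following the definition above.

```python
def extract_features(sessions):
    """Compute a subset of the per-user features for one sample.

    sessions: non-empty list of dicts with hypothetical keys
    "pages" (list of URLs), "duration" (seconds), "bounced" (bool),
    "source" (str), "device" (str) and "os" (str).
    """
    n = len(sessions)
    pages = [p for s in sessions for p in s["pages"]]
    durations = [s["duration"] for s in sessions]
    return {
        "session_count": n,
        "avg_page_views": len(pages) / n,
        "avg_unique_page_views": len(set(pages)) / n,  # deduplicated over the period
        "avg_session_duration": sum(durations) / n,
        "min_session_duration": min(durations),
        "max_session_duration": max(durations),
        "bounce_rate": sum(s["bounced"] for s in sessions) / n,
        "paid_search_share": sum(s["source"] == "paid_search" for s in sessions) / n,
        "devices_used": len({s["device"] for s in sessions}),
        "operating_systems": len({s["os"] for s in sessions}),
    }
```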
S230: train a classifier on the sample features to obtain the target classifier.
Training runs in the opposite direction to classifying with a trained classifier. During training, feature data is extracted from the training samples, and the extracted feature data, together with each sample's class label (in the present application, the label indicates whether the sample is a converting or non-converting user), is input into the classifier to be trained. The classifier then determines its optimal parameters from the feature data and class label of each sample.
In one possible implementation of the present application, a random forest classifier is selected, because it has the following properties: it can output a probability score for each sample, which serves as the basis for classification; it is an ensemble (bagging) algorithm composed of multiple tree classifiers and classifies well; and it is insensitive to multicollinearity.
The parameters of the final trained random forest classifier are n_estimators=30 (the number of trees in the forest) and max_depth=6 (the maximum depth of each tree in the forest); the other parameters use their default values.
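To illustrate how a forest produces the per-sample probability score mentioned above, the sketch below averages the conversion probability predicted by each tree; scikit-learn's `RandomForestClassifier.predict_proba`, for instance, combines its trees this way. The three decision stumps are toy stand-ins for trained trees (the real forest has 30 trees of depth up to 6), and their splits and leaf probabilities are invented for illustration.

```python
from statistics import mean


def forest_probability(trees, features):
    """Average the conversion probability predicted by each tree."""
    return mean(tree(features) for tree in trees)


def make_stump(feature, split, p_low, p_high):
    """Toy one-split 'tree' returning P(convert); a stand-in for a trained tree."""
    return lambda x: p_high if x[feature] > split else p_low


# Hypothetical stand-ins; splits and leaf probabilities are invented.
trees = [
    make_stump("avg_page_views", 5, 0.1, 0.8),
    make_stump("avg_page_refresh_rate", 0.2, 0.3, 0.6),
    make_stump("bounce_rate", 0.5, 0.7, 0.2),
]
visitor = {"avg_page_views": 6, "avg_page_refresh_rate": 0.1, "bounce_rate": 0.4}
print(round(forest_probability(trees, visitor), 3))  # 0.6
```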
The effect of the trained classifier can then be tested using samples from another time period (test samples): the feature data of each test sample is input into the trained classifier, which outputs a predicted label for each test sample, and the predicted label of each test sample is compared with its true label.
The effect of a classifier is usually characterized by accuracy and recall. Accuracy reflects the classifier's ability to judge the whole sample set correctly, that is, to classify positives as positive and negatives as negative; recall is the proportion of all actual positives that the classifier correctly identifies as positive.
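The two metrics defined above can be computed directly from the predicted and true labels of the test samples. A minimal sketch, with 1 standing for converting user and 0 for non-converting user (an encoding chosen for illustration):

```python
def accuracy(y_true, y_pred):
    """Share of all test samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the classifier also labels positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


y_true = [1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0]
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.8 1.0
```

The example also shows why both numbers are reported: a classifier can reach 100% recall while still mislabelling some negatives, which only the accuracy reveals.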
The trained classifier achieves an accuracy of 98.68% and a recall of 100% on the test samples.
After training, the features can be ranked automatically by their influence on order conversion: average page exit rate, average exit page views, average page refresh rate, average page views, average unique page views, average event count, average target page count, bounce rate, average page load time, maximum session duration, minimum session duration, average mouse clicks, on-site search share, and number of sessions.
In one possible implementation of the present application, when the trained classifier is used to predict on new data, only the features in the new data that have a large influence on order conversion may be extracted for classification.
In another possible implementation of the present application, the features that training identified as most influential on order conversion can be used to train the classifier again, further optimizing its parameters.
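Both implementations above start from a ranking of feature importances. A minimal sketch of selecting the top-k features and projecting a feature record onto them; the importance values would come from the trained model (scikit-learn random forests, for example, expose them as `feature_importances_`), and the numbers used here are invented for illustration.

```python
def top_features(importances, k):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def select_features(feature_dict, names):
    """Keep only the selected features, e.g. before predicting or retraining."""
    return {name: feature_dict[name] for name in names}


# Invented importance scores, for illustration only.
importances = {"avg_page_exit_rate": 0.31, "avg_exit_page_views": 0.22,
               "session_count": 0.02, "avg_page_refresh_rate": 0.18}
keep = top_features(importances, 3)
print(keep)  # ['avg_page_exit_rate', 'avg_exit_page_views', 'avg_page_refresh_rate']
```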
Visitor's sorting technique provided in this embodiment, by being trained to obtain target classification to a large amount of sample data Then device, classifies to visitor to be predicted using object classifiers, obtains classification results, to filter out conversion in next step User carries out precision dispensing, and user group is launched in effectively reduction, so as to save dispensing cost.During training objective grader, adopt It is sample with the historical session data for the user for having a plurality of session in seclected time period, and positive sample and negative sample in sample data Ratio close to 1:1, the classifying quality of object classifiers that training obtains is preferable.
Corresponding to above-mentioned visitor's sorting technique embodiment, present invention also provides visitor's sorter embodiments.
Fig. 4 is referred to, shows a kind of block diagram of visitor's sorter of the embodiment of the present application, which is applied to computer In terminal or server, as shown in figure 4, the device includes:First acquisition module 110, characteristic extracting module 120 and sort module 130。
First acquisition module 110, for obtaining the historical session data of visitor to be predicted.
Visitor to be predicted refers to access target website and the current visitor that order conversion does not occur, and targeted website is current The website of monitoring.Historical session data are that visitor to be predicted accesses the webpage and produced in the seclected time period before current time Raw whole session datas.
Characteristic extracting module 120, for from the historical session extracting data session characteristics.
The feature of extraction includes at least the feature of user access activity, for example, page browsing amount, page access duration, flat The equal page exits rate, is averaged out page browsing amount, average page refresh rate, average duplicate removal page browsing amount etc..
Sort module 130 for carrying out classification processing to the session characteristics using object classifiers trained in advance, obtains The classification results of the visitor to be predicted.
Wherein, classification results conversion user or non-transformed user, conversion user refer to that order conversion occurs for next session User, non-transformed user refer to that the user of order conversion does not occur for next session.
The probability of order conversion occurs when what object classifiers obtained is user's session next time, the higher user of probability is User is converted, the smaller user of probability is non-transformed user.In a kind of possible realization method of the application, as shown in figure 5, point Generic module 130 includes analysis submodule 131, the first determination sub-module 132 and the second determination sub-module 133.
Analysis submodule 131 is configured to analyze the session features using the target classifier to obtain the probability that the next session of the visitor to be predicted results in an order conversion.
The session features of the visitor to be predicted are input into the target classifier, which analyzes each feature and outputs the probability that the visitor's next network session will result in an order conversion.
First determination submodule 132 is configured to determine that the visitor to be predicted is a converted user when the probability of order conversion is greater than or equal to a probability threshold.
The probability threshold can be obtained while training the target classifier.
Second determination submodule 133 is configured to determine that the visitor to be predicted is a non-converted user when the probability of order conversion is less than the probability threshold.
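The rule applied by the two determination submodules reduces to a single comparison. The sketch below assumes the classifier already returns a probability. `pick_threshold` shows one possible way of obtaining the threshold from held-out data during training, by keeping the cut-off with the highest accuracy; this tuning procedure is an assumption, as the patent does not say how the threshold is learned.

```python
from typing import Sequence


def classify_visitor(p_convert: float, threshold: float) -> str:
    """Apply the rule: p >= threshold -> converted user, otherwise non-converted."""
    if not 0.0 <= p_convert <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "converted" if p_convert >= threshold else "non-converted"


def pick_threshold(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed tuning step: try every observed probability as the cut-off on
    held-out data and keep the one with the highest accuracy (first one on ties)."""
    if not probs or len(probs) != len(labels):
        raise ValueError("need equal-length, non-empty probabilities and labels")
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(probs)):
        acc = sum((p >= t) == (y == 1) for p, y in zip(probs, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


t = pick_threshold([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])
print(t, classify_visitor(0.7, t), classify_visitor(0.3, t))  # prints: 0.6 converted non-converted
```

Because the comparison is `>=`, a visitor whose probability equals the threshold is classified as a converted user, matching the first determination submodule.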
In the visitor classification device provided in this embodiment, first acquisition module 110 obtains the historical session data of the visitor to be predicted, and feature extraction module 120 extracts session features from the historical session data. Finally, classification module 130 inputs the extracted session features into the pre-trained target classifier for analysis and obtains the classification result of the visitor to be predicted. The device uses patterns in a user's historical session data to judge whether the user is a converted user. Converted users can then be filtered out for precision ad delivery, which effectively narrows the target audience and saves delivery cost.
Referring to Fig. 6, which shows a block diagram of another visitor classification device according to an embodiment of the present application: on the basis of the embodiment shown in Fig. 4, this device further includes a second acquisition module 210, a sample feature extraction module 220, and a training module 230.
Second acquisition module 210 is configured to obtain network session sample data before the session features are classified using the pre-trained target classifier.
The historical session data of users who had multiple sessions within the selected time period are chosen as samples.
In one possible implementation of the present application, as shown in Fig. 7, second acquisition module 210 may include a positive sample acquisition submodule 211 and a negative sample acquisition submodule 212.
Positive sample acquisition submodule 211 is configured to obtain, as positive sample data, all network sessions of every converted user within the selected time period that occurred before that user's first order conversion;
Negative sample acquisition submodule 212 is configured to obtain, as negative sample data, all network sessions of every non-converted user within the selected time period, where all sessions of one user within the selected time period form one sample.
All sessions of each user within the selected time period form one sample. Preferably, the ratio of positive to negative samples is close to 1:1.
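A minimal sketch of this sample construction, assuming each user's sessions carry a start time and a flag for whether an order was placed (the `Session` type, its fields, and `build_samples` are hypothetical). Two interpretive assumptions are made: the session containing the first order is excluded from the positive sample, since the model predicts whether the *next* session converts; and the 1:1 ratio is reached by randomly downsampling the larger class.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Session:
    start: float     # session start time in seconds (hypothetical field)
    has_order: bool  # True if an order was placed in this session (hypothetical field)


Sample = Tuple[str, List[Session], int]  # (user id, sessions, label: 1 converted / 0 not)


def build_samples(history: Dict[str, List[Session]], window_start: float,
                  window_end: float, min_sessions: int = 2, seed: int = 0) -> List[Sample]:
    positives: List[Sample] = []
    negatives: List[Sample] = []
    for user, sessions in history.items():
        in_window = sorted((s for s in sessions if window_start <= s.start < window_end),
                           key=lambda s: s.start)
        if len(in_window) < min_sessions:
            continue  # keep only users with multiple sessions in the period
        first_order = next((i for i, s in enumerate(in_window) if s.has_order), None)
        if first_order is None:
            negatives.append((user, in_window, 0))  # all sessions of a non-converted user
        elif first_order > 0:
            # sessions before the first order; a user who ordered in the first session yields no sample
            positives.append((user, in_window[:first_order], 1))
    if positives and negatives:
        # randomly downsample the larger class so the ratio is 1:1
        rng = random.Random(seed)
        n = min(len(positives), len(negatives))
        positives, negatives = rng.sample(positives, n), rng.sample(negatives, n)
    return positives + negatives


history = {
    "u1": [Session(0, False), Session(10, False), Session(20, True)],
    "u2": [Session(5, False), Session(15, False)],
    "u3": [Session(1, False), Session(2, False), Session(3, False)],
}
for user, sessions, label in build_samples(history, 0, 100):
    print(user, len(sessions), label)
```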
Sample feature extraction module 220 is configured to extract sample features from the network session sample data.
The selected sample features include: number of sessions, average page views, average deduplicated page views, on-site search share, average number of on-site searches, average on-site search clicks, average session duration, minimum session duration, maximum session duration, bounce rate, paid search share, mobile session share, average number of events, average number of standard pages, average number of mouse clicks, average number of mouse scrolls, search-source session share, recommendation-source session share, number of devices used, number of operating systems, average exit-page views, average page refresh rate, average page exit rate, and average page load duration.
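Several of these features are aggregates over all sessions in one sample. The sketch below computes a subset of them, assuming a hypothetical per-session summary record (`SessionSummary`) and assumed definitions: a bounce is taken to be a single-page session, and "search-source" is taken to include paid search. The remaining features would follow the same pattern.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class SessionSummary:
    """Per-session summary record (hypothetical layout)."""
    duration_s: float
    page_views: int
    device_id: str
    os: str
    is_mobile: bool
    source: str  # assumed values: "search", "paid_search", "recommendation", "direct"


def user_sample_features(sessions: List[SessionSummary]) -> Dict[str, float]:
    """Aggregate features over all sessions of one sample (assumed definitions)."""
    n = len(sessions)
    if n == 0:
        raise ValueError("a sample must contain at least one session")
    durations = [s.duration_s for s in sessions]

    def share(pred) -> float:
        # fraction of the sample's sessions for which pred(session) is true
        return sum(1 for s in sessions if pred(s)) / n

    return {
        "session_count": n,
        "avg_page_views": mean(s.page_views for s in sessions),
        "avg_session_duration_s": mean(durations),
        "min_session_duration_s": min(durations),
        "max_session_duration_s": max(durations),
        "bounce_rate": share(lambda s: s.page_views == 1),  # single-page sessions
        "paid_search_share": share(lambda s: s.source == "paid_search"),
        "search_source_share": share(lambda s: s.source in ("search", "paid_search")),
        "recommendation_source_share": share(lambda s: s.source == "recommendation"),
        "mobile_session_share": share(lambda s: s.is_mobile),
        "device_count": len({s.device_id for s in sessions}),
        "os_count": len({s.os for s in sessions}),
    }


sample = [
    SessionSummary(60, 1, "d1", "iOS", True, "search"),
    SessionSummary(120, 4, "d1", "iOS", True, "paid_search"),
    SessionSummary(300, 5, "d2", "Windows", False, "recommendation"),
]
print(user_sample_features(sample))
```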
Training module 230 is configured to train a classifier on the sample features to obtain the target classifier.
Training is the counterpart of classifying with the trained classifier. During training, feature data are extracted from the training sample data, and the extracted feature data, together with the class label of each sample (in this application, the label indicates whether the sample belongs to a converted or a non-converted user), are input into the classifier to be trained, which determines its optimal parameters from the feature data and class labels of the samples.
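The patent does not name the classifier. Logistic regression is used below purely as an illustration, because it directly outputs the conversion probability that the analysis submodule compares with the threshold. This is a dependency-free sketch trained by batch gradient descent, not the patent's stated method; in practice the features would usually be standardized first.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the sample belongs to a converted user."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the mean log loss.

    X holds one feature vector per sample; y holds the class labels
    (1 = converted user, 0 = non-converted user)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print([round(predict_proba(w, b, x), 3) for x in X])
```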
The trained classifier can then be evaluated on samples from a different time period (test samples): the feature data of each test sample are input into the trained classifier, which outputs a predicted label for each test sample, and the predicted label of each test sample is compared with its true label.
Classifier performance is generally characterized by accuracy and recall. Accuracy reflects the classifier's ability to judge the whole sample set correctly, that is, to classify positives as positive and negatives as negative; recall reflects the proportion of all positive examples that the classifier correctly identifies as positive.
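Both metrics, as defined above, can be computed from the true and predicted labels of the test samples. The sketch assumes labels are encoded as 1 for converted and 0 for non-converted users. Note that accuracy as described here covers both classes; it differs from precision, which is the share of predicted converted users that actually converted.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all test samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Share of truly positive samples (converted users) predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("recall is undefined when there are no positive samples")
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # prints: 0.6 0.6666666666666666
```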
In the visitor classification device provided in this embodiment, a target classifier is obtained by training on a large amount of sample data and is then used to classify visitors to be predicted. The resulting classification results allow converted users to be filtered out for subsequent precision delivery, which effectively narrows the target audience and saves delivery cost. When training the target classifier, the historical session data of users with multiple sessions within the selected time period are used as samples, and the ratio of positive to negative samples is close to 1:1, which gives the trained target classifier better classification performance.
The visitor classification device includes a processor and a memory. The above first acquisition module 110, feature extraction module 120, classification module 130, second acquisition module 210, sample feature extraction module 220, training module 230, and so on are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be provided, and whether a visitor to be predicted will place an order in the future is predicted by adjusting the kernel parameters.
The memory may include, among computer-readable media, non-persistent memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
In the visitor classification device, first acquisition module 110 obtains the historical session data of the visitor to be predicted, and the feature extraction module extracts session features from the historical session data. Finally, the classification module inputs the extracted session features into the pre-trained target classifier for analysis and obtains the classification result of the visitor to be predicted. The device uses patterns in a user's historical session data to judge whether the user is a converted user. Converted users can then be filtered out for precision delivery, which effectively narrows the target audience and saves delivery cost.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps:
obtaining historical session data of a visitor to be predicted;
extracting session features from the historical session data;
classifying the session features using a pre-trained target classifier to obtain a classification result for the visitor to be predicted, where the classification result is converted user or non-converted user, a converted user being a user whose next session results in an order conversion and a non-converted user being a user whose next session does not result in an order conversion.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include, among computer-readable media, non-persistent memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, in which information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A visitor classification method, characterized by comprising:
obtaining historical session data of a visitor to be predicted;
extracting session features from the historical session data;
classifying the session features using a pre-trained target classifier to obtain a classification result for the visitor to be predicted, wherein the classification result includes converted user or non-converted user, the converted user being a user whose next session results in an order conversion and the non-converted user being a user whose next session does not result in an order conversion.
2. The method according to claim 1, characterized in that classifying the session features using the pre-trained target classifier to obtain the classification result of the visitor to be predicted comprises:
analyzing the session features using the target classifier to obtain the probability that the next session of the visitor to be predicted results in an order conversion;
when the probability of order conversion is greater than or equal to a probability threshold, determining that the visitor to be predicted is a converted user;
when the probability of order conversion is less than the probability threshold, determining that the visitor to be predicted is a non-converted user.
3. The method according to claim 1, characterized in that, before classifying the session features using the pre-trained target classifier to obtain the classification result of the visitor to be predicted, the method further comprises:
obtaining network session sample data;
extracting sample features from the network session sample data;
training a classifier on the sample features to obtain the target classifier.
4. The method according to claim 3, characterized in that obtaining the network session sample data comprises:
obtaining, as positive sample data, all network sessions of all converted users within a selected time period before their first order conversion, and obtaining, as negative sample data, all network sessions of all non-converted users within the selected time period, wherein all sessions of each user within the selected time period form one sample.
5. The method according to claim 3, characterized in that extracting sample features from the network session sample data comprises:
obtaining, from each piece of sample data, the number of sessions, average page views, average deduplicated page views, on-site search share, average number of on-site searches, average on-site search clicks, average session duration, minimum session duration, maximum session duration, bounce rate, paid search share, mobile session share, average number of events, average number of standard pages, average number of mouse clicks, average number of mouse scrolls, search-source session share, recommendation-source session share, number of devices used, number of operating systems, average exit-page views, average page refresh rate, average page exit rate, and average page load duration.
6. A visitor classification device, characterized by comprising:
a first acquisition module, configured to obtain historical session data of a visitor to be predicted;
a feature extraction module, configured to extract session features from the historical session data;
a classification module, configured to classify the session features using a pre-trained target classifier to obtain a classification result for the visitor to be predicted, wherein the classification result includes converted user or non-converted user, the converted user being a user whose next session results in an order conversion and the non-converted user being a user whose next session does not result in an order conversion.
7. The device according to claim 6, characterized in that the classification module comprises:
an analysis submodule, configured to analyze the session features using the target classifier to obtain the probability that the next session of the visitor to be predicted results in an order conversion;
a first determination submodule, configured to determine that the visitor to be predicted is a converted user when the probability of order conversion is greater than or equal to a probability threshold;
a second determination submodule, configured to determine that the visitor to be predicted is a non-converted user when the probability of order conversion is less than the probability threshold.
8. The device according to claim 6, characterized in that the device further comprises:
a second acquisition module, configured to obtain network session sample data before the session features are classified using the pre-trained target classifier;
a sample feature extraction module, configured to extract sample features from the network session sample data;
a training module, configured to train a classifier on the sample features to obtain the target classifier.
9. The device according to claim 8, characterized in that the second acquisition module comprises:
a positive sample acquisition submodule, configured to obtain, as positive sample data, all network sessions of all converted users within a selected time period before their first order conversion;
a negative sample acquisition submodule, configured to obtain, as negative sample data, all network sessions of all non-converted users within the selected time period, wherein all sessions of each user within the selected time period form one sample.
10. The device according to claim 8, characterized in that the sample feature extraction module comprises:
a sample feature extraction submodule, configured to obtain, from each piece of sample data, the number of sessions, average page views, average deduplicated page views, on-site search share, average number of on-site searches, average on-site search clicks, average session duration, minimum session duration, maximum session duration, bounce rate, paid search share, mobile session share, average number of events, average number of standard pages, average number of mouse clicks, average number of mouse scrolls, search-source session share, recommendation-source session share, number of devices used, number of operating systems, average exit-page views, average page refresh rate, average page exit rate, and average page load duration.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611208440.1A CN108241984A (en) 2016-12-23 2016-12-23 A kind of visitor's sorting technique and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611208440.1A CN108241984A (en) 2016-12-23 2016-12-23 A kind of visitor's sorting technique and device

Publications (1)

Publication Number Publication Date
CN108241984A (en) 2018-07-03

Family

ID=62704197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611208440.1A Pending CN108241984A (en) 2016-12-23 2016-12-23 A kind of visitor's sorting technique and device

Country Status (1)

Country Link
CN (1) CN108241984A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8966036B1 (en) * 2010-11-24 2015-02-24 Google Inc. Method and system for website user account management based on event transition matrixes
CN105528374A (en) * 2014-10-21 2016-04-27 苏宁云商集团股份有限公司 A commodity recommendation method in electronic commerce and a system using the same
CN106204063A (en) * 2016-06-30 2016-12-07 北京奇艺世纪科技有限公司 A kind of paying customer's method for digging and device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942358A (en) * 2018-09-21 2020-03-31 北京国双科技有限公司 Information interaction method, device, equipment and medium
CN110427234A (en) * 2019-06-27 2019-11-08 阿里巴巴集团控股有限公司 The methods of exhibiting and device of the page
CN111080355B (en) * 2019-12-10 2022-12-20 蚂蚁胜信(上海)信息技术有限公司 User set display method and device and electronic equipment
CN111080355A (en) * 2019-12-10 2020-04-28 支付宝(杭州)信息技术有限公司 User set display method and device and electronic equipment
CN111061815A (en) * 2019-12-13 2020-04-24 携程计算机技术(上海)有限公司 Conversation data classification method
CN111061815B (en) * 2019-12-13 2023-04-25 携程计算机技术(上海)有限公司 Session data classification method
CN113269577A (en) * 2020-02-17 2021-08-17 北京达佳互联信息技术有限公司 Data acquisition method, device, server and storage medium
CN113269577B (en) * 2020-02-17 2023-10-13 北京达佳互联信息技术有限公司 Data acquisition method, device, server and storage medium
CN113297461A (en) * 2020-02-24 2021-08-24 北京达佳互联信息技术有限公司 Target user identification method, target user group identification method and related product
CN113297461B (en) * 2020-02-24 2023-12-12 北京达佳互联信息技术有限公司 Target user identification method, target user group identification method and device
CN112053192A (en) * 2020-09-02 2020-12-08 北京达佳互联信息技术有限公司 User quality determination method, device, server, terminal, medium and product
CN112053192B (en) * 2020-09-02 2024-05-14 北京达佳互联信息技术有限公司 User quality determining method, device, server, terminal, medium and product
CN113177176A (en) * 2021-05-21 2021-07-27 脸萌有限公司 Feature construction method, content display method and related device
WO2022245280A1 (en) * 2021-05-21 2022-11-24 脸萌有限公司 Feature construction method, content display method, and related apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180703