CN109087145A - Target group mining method, device, server and readable storage medium - Google Patents


Info

Publication number
CN109087145A
Authority
CN
China
Prior art keywords
user
target group
sample
collection
information
Legal status
Pending
Application number
CN201810917001.0A
Other languages
Chinese (zh)
Inventor
王思萌
王盛
顾进杰
Current Assignee
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201810917001.0A
Publication of CN109087145A
Pending (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/021 Services related to particular areas, e.g. point of interest [POI] services, venue services or geofences
    • H04W8/00 Network data management
    • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of this specification provides a target group mining method that uses two-stage classification (a weak classifier followed by a strong classifier) to mine large volumes of user data more accurately. Moreover, screening a primary target user set out of the set of users to be identified narrows the scope of identification, and the target group feature vector, determined from the traits the members of the target group have in common, accurately describes the characteristics of the target group, which makes the identification results more accurate.

Description

Target group mining method, device, server and readable storage medium
Technical field
Embodiments of this specification relate to the field of machine learning, and in particular to a target group mining method, device, server and readable storage medium.
Background
With the spread of the Internet and smart terminals, people increasingly rely on mobile phones and other smart terminals to handle matters related to daily life and work. For example, compared with traditional channels, people more commonly use dedicated apps or websites to carry out financial, shopping and other operations or transactions online. To provide users with more targeted services, apps and websites need to mine specific target groups, so as to promote their business and improve the precision of their services.
Summary of the invention
Embodiments of this specification provide a target group mining method, device, server and readable storage medium.
In a first aspect, an embodiment of this specification provides a target group mining method, including:
for a set of users to be identified, screening out a primary target user set according to user natural attribute information and user social attribute information;
for the primary target user set, extracting target group feature vectors;
for the target group feature vectors, performing identification based on a pre-trained first-level weak classifier to obtain an intermediate target user set;
for the intermediate target user set, performing identification based on a pre-trained second-level strong classifier to determine a final target user set belonging to the target group.
In a second aspect, an embodiment of this specification provides a target group mining method, including:
for a set of users to be identified, screening out a primary target user set according to user natural attribute information and user social attribute information;
for the primary target user set, extracting target group feature vectors;
for the target group feature vectors, performing identification based on a pre-trained strong classifier to determine a final user set belonging to the target group.
In a third aspect, an embodiment of this specification provides a target group mining device, including:
a primary target user set screening unit, configured to screen out, from a set of users to be identified, a primary target user set according to user natural attribute information and user social attribute information;
a target group feature vector extraction unit, configured to extract target group feature vectors for the primary target user set;
a first-level identification unit, configured to perform identification on the target group feature vectors based on a pre-trained first-level weak classifier to obtain an intermediate target user set;
a second-level identification unit, configured to perform identification on the intermediate target user set based on a pre-trained second-level strong classifier to determine a final target user set belonging to the target group.
In a fourth aspect, an embodiment of this specification provides a target group mining device, including:
a primary target user set screening unit, configured to screen out, from a set of users to be identified, a primary target user set according to user natural attribute information and user social attribute information;
a target group feature vector extraction unit, configured to extract target group feature vectors for the primary target user set;
an identification unit, configured to perform identification on the target group feature vectors based on a pre-trained strong classifier to determine a final user set belonging to the target group.
In a fifth aspect, an embodiment of this specification provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any of the methods described above.
In a sixth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the methods described above.
The embodiments of this specification have the following beneficial effects:
The target group mining method provided by the embodiments of this specification uses two-stage classification (a weak classifier and a strong classifier), which enables more accurate mining of large volumes of user data. Moreover, screening a primary target user set out of the users to be identified narrows the scope of identification, and the target group feature vector determined from the common traits of the target group accurately describes the group's characteristics, making the identification results more accurate.
In addition, a two-level model training scheme is proposed. For example, when training a model to identify college students, a weak classifier (such as naive Bayes) is first used for classification, and the negative results of that classification are then used as negative samples to train a stronger classifier (such as a support vector machine) in a second round. The second round classifies the positive results of the first round, and users whose final classification result is positive are judged to belong to the college-student group.
The feature selection for candidate college-student users is another innovation. For example, the following are used as features: the number of campus Wi-Fi connections (campus Wi-Fi networks are derived from users already labeled as college students, namely the networks they connect to most often at the start of the term); the number of contacts added to the user's address book who are college students; whether the user's LBS location has changed; and whether the user's shipping addresses include a university or its surroundings. Selecting these features depicts the characteristics of college students more comprehensively, which is the basis for training an accurate model and accurately identifying college students.
Brief description of the drawings
Fig. 1 is a schematic diagram of an application scenario of the target group mining method according to an embodiment of this specification;
Fig. 2 is a flowchart of the target group mining method provided by the first aspect of the embodiments of this specification;
Fig. 3 is a schematic diagram of model training in the target group mining method provided by the first aspect of the embodiments of this specification;
Fig. 4 is a schematic diagram of model identification in the target group mining method provided by the first aspect of the embodiments of this specification;
Fig. 5 is a flowchart of the target group mining method provided by the second aspect of the embodiments of this specification;
Fig. 6 is a schematic diagram of model training in the target group mining method provided by the second aspect of the embodiments of this specification;
Fig. 7 is a schematic diagram of model identification in the target group mining method provided by the second aspect of the embodiments of this specification;
Fig. 8 is a schematic structural diagram of the target group mining device provided by the third aspect of the embodiments of this specification;
Fig. 9 is a schematic structural diagram of the target group mining device provided by the fourth aspect of the embodiments of this specification;
Fig. 10 is a schematic structural diagram of the server for target group mining provided by the fifth aspect of the embodiments of this specification.
Detailed description of the embodiments
To better understand the above technical solutions, the technical solutions of the embodiments of this specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific features in them are detailed explanations of the technical solutions of the embodiments of this specification, not limitations on the technical solutions of this specification. Where there is no conflict, the embodiments and the technical features in them may be combined with each other.
Fig. 1 is a schematic diagram of an application scenario of the target group mining method according to an embodiment of this specification. Terminal 10 is a user terminal, and server 20 is the back-end server of a website or app. Server 20 collects data on multiple users from multiple terminals 10, for example obtaining users' natural attribute information (such as age and gender) or social attribute information (such as activity area and friends) from their registration, operation and consumption information. Based on the large amount of user information obtained, server 20 mines a specific target group.
In a first aspect, an embodiment of this specification provides a target group mining method. Referring to Fig. 2, the method includes S201-S204.
S201: For a set of users to be identified, screen out a primary target user set according to user natural attribute information and user social attribute information.
In the scenario of Fig. 1, the server obtains user data from multiple terminals, and this data constitutes the set of users to be identified.
To mine the target group efficiently, a primary target user set can be screened out of the large set of user data to be identified according to users' natural attribute information and social attribute information, thereby narrowing the data range.
For example, if the target group to be mined is college students, the screening conditions can be set according to the actual situation as: aged 17-27, with a frequent activity area on a campus. Users who tentatively match the college-student profile are thus screened out of the large volume of user data as the primary target user set.
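The example screening rule can be sketched as a simple filter. This is a minimal Python sketch under stated assumptions, not the method as claimed: each user record is assumed to be a dict with hypothetical keys "age" and "frequent_loc" (the latitude and longitude of the user's most frequent activity location), and "frequent activity area on a campus" is approximated as "within 5 km of the nearest campus", borrowing the 5 km radius this description uses when selecting unlabeled samples. Distances are great-circle distances computed with the haversine formula.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres
LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def screen_primary_users(users: Iterable[Dict], campuses: Sequence[LatLon],
                         min_age: int = 17, max_age: int = 27,
                         max_km: float = 5.0) -> List[Dict]:
    """Keep users aged min_age..max_age whose frequent location is near a campus."""
    primary = []
    for user in users:
        if not min_age <= user["age"] <= max_age:
            continue
        # "frequent_loc" is a hypothetical (lat, lon) field; 5 km is an assumed radius.
        nearest = min(haversine_km(*user["frequent_loc"], *c) for c in campuses)
        if nearest <= max_km:
            primary.append(user)
    return primary


if __name__ == "__main__":
    campuses = [(39.9990, 116.3260)]  # illustrative coordinates only
    users = [
        {"id": 1, "age": 20, "frequent_loc": (39.9990, 116.3260)},  # on campus
        {"id": 2, "age": 35, "frequent_loc": (39.9990, 116.3260)},  # outside age range
        {"id": 3, "age": 19, "frequent_loc": (40.0990, 116.3260)},  # about 11 km away
    ]
    print([u["id"] for u in screen_primary_users(users, campuses)])  # [1]
```

In practice the activity-area test could equally be a lookup against a set of campus area identifiers; the distance form is used here only because it matches the radius-based example elsewhere in the description.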
It can be understood that using age as the natural attribute information and activity area as the social attribute information is only an example, and actual implementations are not limited to this. Likewise, college students are only one example of a target group to be mined; the embodiments of this specification apply equally to mining other target groups (such as employees of a particular company or workers in particular occupations).
S202: For the primary target user set, extract target group feature vectors.
It can be understood that members of a target group share certain traits, so the target group feature vector can be determined from these common traits.
Again taking college students as an example, the selected features may be: home city, online balance, age, address-book friends, shipping address, number of campus Wi-Fi connections, and LBS (location-based service) features.
Home city, online balance and age are called the user's static features. For example, whether the user's home city is the same as the city where a university is located can serve as one judgment condition; others are whether the online balance falls within a certain range and whether the age falls within a certain range (17-27 years).
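The three static judgment conditions can be encoded as 0/1 indicator features. A minimal sketch, assuming a 1.0 (condition met) / 0.0 (not met) encoding; the online-balance range of 0 to 5000 is a hypothetical placeholder, because the description only says "a certain range".

```python
from typing import List, Tuple


def static_features(age: int, home_city: str, university_city: str,
                    online_balance: float,
                    age_range: Tuple[int, int] = (17, 27),
                    balance_range: Tuple[float, float] = (0.0, 5000.0)) -> List[float]:
    """Encode the static judgment conditions as 1.0 (met) or 0.0 (not met).

    balance_range is a hypothetical placeholder; the source only says
    "a certain range".
    """
    def within(value: float, bounds: Tuple[float, float]) -> float:
        low, high = bounds
        return 1.0 if low <= value <= high else 0.0

    return [
        within(age, age_range),                        # age within 17-27
        1.0 if home_city == university_city else 0.0,  # home city equals university city
        within(online_balance, balance_range),         # balance within the assumed range
    ]


if __name__ == "__main__":
    print(static_features(19, "Hangzhou", "Hangzhou", 1200.0))  # [1.0, 1.0, 1.0]
    print(static_features(40, "Wuhan", "Hangzhou", 9000.0))     # [0.0, 0.0, 0.0]
```

Indicator features keep the vector on a common scale; raw values (age, balance) could be used instead if the downstream classifier handles scaling.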
Address-book friends, shipping address, number of campus Wi-Fi connections and LBS features can be called the user's dynamic features. For example, a shipping address such as a school dormitory, school laboratory or teaching and research office is treated as a campus address. For the address-book friend feature, the number of friends added by the user who are already labeled as college students is used as a reference; in particular, for freshmen, friend-adding behavior can be further restricted to the number of friends added in August and September (the start of the school year). Similarly, the number of times the user connects to campus Wi-Fi in August and September can also serve as a feature value. Campus Wi-Fi can be identified by taking the Wi-Fi names that labeled college students frequently connect to as campus Wi-Fi. For the user's LBS feature, the distance from the user to the nearest university is used as the criterion, and this distance is checked before the college entrance examination, during the Spring Festival, during the summer vacation and at the start of school in September.
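The campus Wi-Fi and friend-count features can be sketched as follows. This is an illustrative Python sketch, assuming hypothetical logs of (date, value) pairs and approximating "Wi-Fi names that labeled college students frequently connect to" as "SSIDs used by at least min_users distinct labeled students"; the August-September window follows the text. The shipping-address and LBS features are not shown.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Set, Tuple

Log = List[Tuple[date, str]]  # hypothetical (event date, SSID or friend id) records
TERM_START_MONTHS = (8, 9)    # August and September, the start of the school year


def infer_campus_ssids(labeled_wifi_logs: Dict[str, Log], min_users: int) -> Set[str]:
    """Treat SSIDs used by at least `min_users` labeled students as campus Wi-Fi."""
    users_per_ssid: Counter = Counter()
    for log in labeled_wifi_logs.values():
        for ssid in {ssid for _, ssid in log}:  # count each student once per SSID
            users_per_ssid[ssid] += 1
    return {ssid for ssid, n in users_per_ssid.items() if n >= min_users}


def campus_wifi_count(wifi_log: Log, campus_ssids: Set[str],
                      months: Tuple[int, ...] = TERM_START_MONTHS) -> int:
    """Number of campus Wi-Fi connections during the start-of-term months."""
    return sum(1 for d, ssid in wifi_log if d.month in months and ssid in campus_ssids)


def student_friends_added(friend_log: Log, labeled_students: Set[str],
                          months: Tuple[int, ...] = TERM_START_MONTHS) -> int:
    """Number of labeled college students added as contacts during those months."""
    return sum(1 for d, fid in friend_log if d.month in months and fid in labeled_students)


if __name__ == "__main__":
    labeled = {
        "s1": [(date(2018, 9, 1), "CampusNet"), (date(2018, 9, 3), "HomeWiFi")],
        "s2": [(date(2018, 8, 30), "CampusNet")],
    }
    ssids = infer_campus_ssids(labeled, min_users=2)
    print(ssids)  # {'CampusNet'}
    log = [(date(2018, 8, 20), "CampusNet"), (date(2018, 10, 1), "CampusNet")]
    print(campus_wifi_count(log, ssids))  # 1
```

Counting distinct students per SSID, rather than raw connections, keeps one heavy user's home network from being mistaken for campus Wi-Fi.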
S203: For the target group feature vectors, perform identification based on a pre-trained first-level weak classifier to obtain an intermediate target user set.
The target group feature vectors extracted for the primary target user set are input into the pre-trained first-level weak classifier for identification, which further determines an intermediate target user set belonging to the target group.
S204: For the intermediate target user set, perform identification based on a pre-trained second-level strong classifier to determine the final target user set belonging to the target group.
The intermediate target user set identified by the first-level weak classifier is further identified by the pre-trained second-level strong classifier to determine the final target user set belonging to the target group.
Identifying the target group with this two-level classifier scheme helps ensure identification accuracy.
The algorithm used by the first-level weak classifier includes but is not limited to any one of naive Bayes, logistic regression and boost; the algorithm used by the second-level strong classifier includes but is not limited to any one of support vector machine, deep neural network, gradient boosting decision tree and XGBoost. The training process of the first-level weak classifier and the second-level strong classifier is described below with reference to Fig. 3.
Fig. 3 is a schematic diagram of model training in the target group mining method provided by the first aspect of the embodiments of this specification.
The training process is as follows:
(1) Obtain positive samples of the target group, and obtain unlabeled samples.
Users who have already been labeled are taken as positive samples. Unlabeled samples are determined according to natural attribute information and social attribute information. For example, again taking college-student identification as an example, the target group consists of users who are relatively close to a school and whose age is within the enrollment range, so users aged 17-27 who are within 5 km of the nearest school are selected as unlabeled samples. Generally, the number of unlabeled samples significantly exceeds the number of confirmed positive samples. For example, with a positive-to-unlabeled ratio of 1:10, about 3 million users are certified positive samples, and about 30 million other, uncertified users are unlabeled.
(2) Extract target group feature vectors for the positive samples and the unlabeled samples.
For the positive and unlabeled samples, features describing the common traits of the target group are extracted as the target group feature vector. For example, for college students, static features (one or more of the user's age, home city and online balance) and dynamic features (one or more of network connection information for a specific region within a specific time period, friends added by the user within a specific time period, user addresses and LBS information) are extracted to form the target group feature vector.
(3) Train the first-level weak classifier based on the target group feature vectors extracted from the positive samples and the unlabeled samples.
The algorithm used by the first-level weak classifier includes but is not limited to any one of naive Bayes, logistic regression and boost. The first-level weak classifier makes a preliminary judgment about the target group from the target group feature vector.
(4) Classify the unlabeled samples with the first-level weak classifier to determine negative samples of the target group.
The first-level weak classifier classifies the unlabeled samples from step (1); samples with a classification result of 0 are labeled as negative samples, and these negative samples are used together with the positive samples to train the second-level strong classifier.
(5) Train the second-level strong classifier based on the positive samples and the negative samples.
The second-level strong classifier is trained on the labeled positive samples and the negative samples determined by the first-level weak classifier. The algorithm used by the second-level strong classifier includes but is not limited to any one of support vector machine, deep neural network, gradient boosting decision tree and XGBoost.
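Steps (3)-(5) amount to a positive-unlabeled (PU) learning scheme. The sketch below is a minimal, dependency-free Python illustration, not the patent's implementation. It assumes the first-level weak classifier is trained by treating unlabeled samples as class 0 (the description does not say how unlabeled samples are labeled in that step), uses a hand-written Gaussian naive Bayes as the weak classifier, and uses a linear SVM trained by subgradient descent on the hinge loss as the strong classifier. The hyperparameters lam, lr and epochs are arbitrary illustrative values.

```python
import math


class GaussianNB:
    """Minimal Gaussian naive Bayes for 0/1 labels (first-level weak classifier).

    Assumes both classes are present in the training data.
    """

    def fit(self, X, y):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # Per-feature variance with a small floor to avoid division by zero.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(c):
            log_prior, means, variances = self.stats[c]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances))
        return 1 if log_posterior(1) > log_posterior(0) else 0


class LinearSVM:
    """Linear SVM: full-batch subgradient descent on the L2-regularised hinge loss."""

    def __init__(self, lam=0.01, lr=0.1, epochs=500):
        self.lam, self.lr, self.epochs = lam, lr, epochs

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w = [self.lam * wj for wj in self.w]
            grad_b = 0.0
            for x, t in zip(X, y):
                s = 1.0 if t == 1 else -1.0
                if s * self.score(x) < 1:  # margin violated: hinge term is active
                    for j in range(d):
                        grad_w[j] -= s * x[j] / n
                    grad_b -= s / n
            self.w = [wj - self.lr * g for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self

    def score(self, x):
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.score(x) > 0 else 0


def train_two_stage(positives, unlabeled):
    """Weak classifier, then negatives mined from unlabeled, then strong classifier."""
    # (3) Weak classifier: positives as 1, unlabeled treated as 0 (an assumption).
    weak = GaussianNB().fit(positives + unlabeled,
                            [1] * len(positives) + [0] * len(unlabeled))
    # (4) Unlabeled samples the weak classifier scores 0 become negative samples.
    negatives = [x for x in unlabeled if weak.predict(x) == 0]
    # (5) Strong classifier: labeled positives vs. the mined negatives.
    strong = LinearSVM().fit(positives + negatives,
                             [1] * len(positives) + [0] * len(negatives))
    return weak, strong, negatives


if __name__ == "__main__":
    positives = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1]]
    unlabeled = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.2], [0.2, 0.1], [1.05, 1.0]]
    weak, strong, negatives = train_two_stage(positives, unlabeled)
    print(len(negatives))  # 4: the positive-looking [1.05, 1.0] is not used as a negative
    print(strong.predict([1.0, 1.0]), strong.predict([0.0, 0.0]))  # 1 0
```

The point of step (4) shows in the toy data: a hidden positive among the unlabeled samples is not fed to the strong classifier as a negative. With real data one would more likely use library implementations (for example scikit-learn's GaussianNB and SVC); the hand-written versions only keep the example self-contained.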
Once the first-level weak classifier and the second-level strong classifier have been trained, the target group can be identified within the set of users to be identified.
Fig. 4 is a schematic diagram of model identification in the target group mining method provided by the first aspect of the embodiments of this specification.
The model identification process includes:
(1) Screen out a primary target user set from the set of users to be identified.
For example, for college students, the primary target user set is screened out according to user age and activity area.
(2) Extract target group feature vectors for the primary target user set.
For example, for college students, the target group feature vector includes static features (one or more of the user's age, home city and online balance) and dynamic features (one or more of network connection information for a specific region within a specific time period, friends added by the user within a specific time period, user addresses and LBS information); these features are therefore extracted for the primary target user set as the target group feature vector.
(3) For the target group feature vectors, perform identification based on the first-level weak classifier to obtain an intermediate target user set.
The target group feature vectors are input into the first-level weak classifier for identification. The result is 1 (this level determines that the user belongs to the target group) or 0 (this level determines that the user does not belong to the target group), and the users with result 1 form the intermediate target user set.
(4) For the intermediate target user set, perform identification based on the second-level strong classifier to determine the final target user set belonging to the target group.
The intermediate target user set is further identified by the second-level strong classifier. The result is 1 (this level determines that the user belongs to the target group) or 0 (this level determines that the user does not belong to the target group), and the users with result 1 form the final target user set.
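Identification steps (3) and (4) form a cascade: the strong classifier only sees users the weak classifier accepted. A minimal sketch, assuming each classifier is any callable that maps a feature vector to 1 (belongs to the target group) or 0; the threshold rules in the usage example are hypothetical stand-ins for trained models.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# A classifier maps a feature vector to 1 (target group) or 0 (not target group).
Classifier = Callable[[Sequence[float]], int]


def cascade_identify(features: Dict[str, Sequence[float]],
                     weak: Classifier, strong: Classifier) -> Tuple[Set[str], Set[str]]:
    """Return (intermediate target user set, final target user set)."""
    # Step (3): keep the users the first-level weak classifier labels 1.
    intermediate = {uid for uid, x in features.items() if weak(x) == 1}
    # Step (4): re-check only those users with the second-level strong classifier.
    final = {uid for uid in intermediate if strong(features[uid]) == 1}
    return intermediate, final


if __name__ == "__main__":
    # Hypothetical threshold rules standing in for trained classifiers.
    weak = lambda x: int(x[0] > 0.5)
    strong = lambda x: int(x[0] + x[1] > 1.5)
    feats = {"u1": [0.9, 0.9], "u2": [0.6, 0.1], "u3": [0.1, 2.0]}
    intermediate, final = cascade_identify(feats, weak, strong)
    print(sorted(intermediate), sorted(final))  # ['u1', 'u2'] ['u1']
```

Note that "u3" never reaches the strong classifier even though the strong rule alone would accept it; this is the intended behavior of the cascade, in which the cheap first level bounds the work done by the expensive second level.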
It can be seen that the target group mining method provided by the embodiments of this specification uses two-stage classification (a weak classifier and a strong classifier), which enables more accurate mining of large volumes of user data. Moreover, screening a primary target user set out of the users to be identified narrows the scope of identification, and the target group feature vector determined from the common traits of the target group accurately describes the group's characteristics, making the identification results more accurate.
In addition, a two-level model training scheme is proposed. For example, when training a model to identify college students, a weak classifier (such as naive Bayes) is first used for classification, and the negative results of that classification are then used as negative samples to train a stronger classifier (such as a support vector machine) in a second round. The second round classifies the positive results of the first round, and users whose final classification result is positive are judged to belong to the college-student group. The feature selection for candidate college-student users is another innovation. For example, the following are used as features: the number of campus Wi-Fi connections (campus Wi-Fi networks are derived from users already labeled as college students, namely the networks they connect to most often at the start of the term); the number of contacts added to the user's address book who are college students; whether the user's LBS location has changed; and whether the user's shipping addresses include a university or its surroundings. Selecting these features depicts the characteristics of college students more comprehensively, which is the basis for training an accurate model and accurately identifying college students.
In a second aspect, based on the same inventive concept, an embodiment of this specification provides a target group mining method. For the detailed processes in Figs. 5-7, refer to Figs. 2-4; only the differences are described below.
Referring to Fig. 5, which is a flowchart of the target group mining method provided by the second aspect of the embodiments of this specification, the method includes:
S501: For a set of users to be identified, screen out a primary target user set according to user natural attribute information and user social attribute information;
S502: For the primary target user set, extract target group feature vectors;
S503: For the target group feature vectors, perform identification based on a pre-trained strong classifier to determine the final user set belonging to the target group.
The primary target user set in step S501 may be screened out as follows: matching rules that characterize the target group are set in advance according to multiple items of the user's natural attribute information and social attribute information, and the primary target user set is selected from the set of users to be identified according to these rules. For example, the primary target user set is determined according to one or more of: the user's age information, the user's activity area information, the number of network connections to a specific region within a specific time period, the number of friends added within a specific time period, and whether the user's address lies within a specific region. For instance, for mining college students, the matching rules may be: within 5 km of a school, aged 17-27, connected to campus Wi-Fi more than 5 times recently, more than 1 contact added to the mobile address book, a shipping address within the school area, and a home city different from the current city, along with similar rules.
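The rule matching in step S501 can be sketched as a list of predicates that must all hold. A minimal sketch: the Candidate fields are hypothetical names for attributes assumed to be precomputed upstream, the thresholds are the example values from the text, and the last rule (home city differs from current city) is one reading of an ambiguous passage in the source and may need to be inverted or dropped.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass
class Candidate:
    """Hypothetical per-user attributes needed by the example matching rules."""
    age: int
    nearest_school_km: float
    recent_campus_wifi_connections: int
    contacts_added: int
    shipping_address_on_campus: bool
    home_city: str
    current_city: str


Rule = Callable[[Candidate], bool]

# Example rules from the text; a user is selected only if every rule holds.
COLLEGE_STUDENT_RULES: List[Rule] = [
    lambda c: c.nearest_school_km <= 5.0,            # within 5 km of a school
    lambda c: 17 <= c.age <= 27,                     # aged 17-27
    lambda c: c.recent_campus_wifi_connections > 5,  # more than 5 recent campus Wi-Fi connections
    lambda c: c.contacts_added > 1,                  # more than 1 contact added
    lambda c: c.shipping_address_on_campus,          # shipping address in the school area
    lambda c: c.home_city != c.current_city,         # assumed reading of an ambiguous rule
]


def matches_target_group(c: Candidate, rules: List[Rule] = COLLEGE_STUDENT_RULES) -> bool:
    return all(rule(c) for rule in rules)


def select_primary_users(candidates: List[Candidate],
                         rules: List[Rule] = COLLEGE_STUDENT_RULES) -> List[Candidate]:
    return [c for c in candidates if matches_target_group(c, rules)]


if __name__ == "__main__":
    c = Candidate(age=19, nearest_school_km=1.2, recent_campus_wifi_connections=12,
                  contacts_added=3, shipping_address_on_campus=True,
                  home_city="Wuhan", current_city="Hangzhou")
    print(matches_target_group(c))                   # True
    print(matches_target_group(replace(c, age=30)))  # False
```

Keeping the rules as data makes it easy to add the "similar rules" the text alludes to without touching the matching logic.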
Fig. 6 is a schematic diagram of model training in the target group mining method provided by the second aspect of the embodiments of this specification.
(1) Obtain positive samples of the target group, and obtain unlabeled samples.
For example, unlabeled samples are determined according to one or more of: the user's age information, the user's activity area information, the number of network connections to a specific region within a specific time period, the number of friends added within a specific time period, and whether the user's address lies within a specific region.
(2) Select a preset proportion of the unlabeled samples as negative samples.
For example, the unlabeled samples are sampled, and 10% of them are selected randomly (or according to rules) as negative samples.
(3) Extract target group feature vectors for the positive samples and the negative samples.
(4) Train the strong classifier based on the target group feature vectors extracted from the positive samples and the negative samples.
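Step (2) of this training procedure, drawing a fixed proportion of unlabeled samples as negatives, can be sketched as below. A minimal sketch assuming purely random sampling without replacement (the rule-based alternative the text also allows is not shown); the 10% default mirrors the example, and the seed parameter is an added convenience for reproducibility. The resulting negatives, together with the positives, would then feed steps (3) and (4).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_negatives(unlabeled: Sequence[T], ratio: float = 0.10,
                     seed: Optional[int] = None) -> List[T]:
    """Randomly draw `ratio` of the unlabeled samples, without replacement, as negatives."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not unlabeled:
        return []
    k = max(1, round(len(unlabeled) * ratio))  # at least one negative if any exist
    return random.Random(seed).sample(list(unlabeled), k)


if __name__ == "__main__":
    unlabeled_ids = list(range(1000))
    negatives = sample_negatives(unlabeled_ids, ratio=0.10, seed=42)
    print(len(negatives))  # 100
```

Because unlabeled samples greatly outnumber positives, random sampling relies on most of them being true negatives; any hidden positives that are drawn become label noise for the strong classifier, which is the trade-off the first aspect's weak-classifier filtering is designed to reduce.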
Fig. 7 is a schematic diagram of model identification in the target group mining method provided by the second aspect of the embodiments of this specification.
The model identification process includes:
(1) Screen out a primary target user set from the set of users to be identified.
(2) Extract target group feature vectors for the primary target user set.
(3) For the target group feature vectors, perform identification based on the pre-trained strong classifier to determine the final user set belonging to the target group.
The difference from the target group mining provided by the first aspect is that the method provided by the second aspect uses only a single-level strong classifier for identification, which makes it simpler to implement. To preserve identification accuracy, specific processing is provided for screening the primary target user set during identification and, during model training, for screening unlabeled samples and determining negative samples: for example, rule matching is used to determine the primary target user set and the unlabeled samples, and a certain proportion of the unlabeled samples is taken as negative samples.
Referring to Fig. 8, for the structural schematic diagram for target group's excavating gear that this specification embodiment third aspect provides.Dress It sets and includes:
Primary objectives user collects screening unit 801, for collecting for user to be identified, according to user's natural quality information and User's social property information filters out Primary objectives user collection;
Target group's characteristic vector pickup unit 802 extracts target group for collecting for the Primary objectives user Feature vector;
First order recognition unit 803, it is weak based on the first order trained in advance for being directed to target group's feature vector Classifier is identified, intermediate target user's collection is obtained;
Second level recognition unit 804, for being divided by force based on the second level trained in advance for intermediate target user's collection Class device is identified, determines the ultimate aim user collection for belonging to target group.
In a kind of optional way, further includes: classifier training unit 805;
The classifier training unit 805 further comprises:
Sample acquisition subelement 8051 for obtaining the positive sample of target group, and obtains non-mark sample;
Target group's characteristic vector pickup subelement 8052, for mentioning for the positive sample and the non-mark sample Take out target group's feature vector;
The first order trains subelement 8053, for based on the mesh gone out according to the positive sample and the non-mark sample extraction Mark crowd characteristic vector, training first order Weak Classifier;
Negative sample determines subelement 8054, for being divided using the first order Weak Classifier the non-mark sample Class determines the negative sample of target group;
Subelement 8055 is trained in the second level, and for being based on the positive sample and the negative sample, the training second level is divided by force Class device.
In an optional implementation, the primary target user set screening unit 801 or the sample acquisition subunit 8051 is specifically configured to determine the primary target user set or the unlabeled samples according to user age information and user activity area information.
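Screening by age and activity area amounts to a plain rule filter. A minimal sketch, assuming hypothetical field names (`age`, `active_regions`) and caller-supplied bounds; the specification does not state concrete age ranges or regions.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class User:
    user_id: str
    age: int
    # Hypothetical field: regions in which the user has recently been active.
    active_regions: FrozenSet[str] = field(default_factory=frozenset)


def screen_by_age_and_area(users: Iterable[User], min_age: int, max_age: int,
                           target_regions: Iterable[str]) -> List[User]:
    """Keep users whose age lies in [min_age, max_age] and who are active in
    at least one of the target regions."""
    regions = frozenset(target_regions)
    return [u for u in users
            if min_age <= u.age <= max_age and not regions.isdisjoint(u.active_regions)]
```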
In an optional implementation, the target group feature vector extraction unit 802 or the target group feature vector extraction subunit 8052 is specifically configured to extract static features and dynamic features of users for the primary target user set, or for the positive samples and the unlabeled samples; the target group feature vector is composed of the static features and the dynamic features.
In an optional implementation, the static features include one or more of user age, the user's city, and the user's online balance; the dynamic features include one or more of network link information for a specific region within a specific time period, friends newly added by the user within a specific time period, the user's address, and LBS (location-based service) information.
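One way to turn the listed static and dynamic features into a single numeric target group feature vector is sketched below. The record layout, the one-hot encoding of the city, and the log scaling of the balance are illustrative assumptions; the specification does not fix an encoding.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class UserRecord:
    # Static features
    age: int
    city: str
    online_balance: float
    # Dynamic features (all measured within the specific time period)
    region_link_count: int   # network links to the specific region
    new_friend_count: int    # friends newly added
    address_in_region: bool  # whether the user's address is in the region
    lbs_hits_in_region: int  # LBS location reports inside the region


def build_feature_vector(u: UserRecord, city_vocab: Sequence[str]) -> List[float]:
    """Concatenate encoded static features followed by dynamic features."""
    # log1p compresses large balances; negative balances are clamped to 0.
    static = [float(u.age), math.log1p(max(u.online_balance, 0.0))]
    static += [1.0 if u.city == c else 0.0 for c in city_vocab]  # one-hot city
    dynamic = [float(u.region_link_count), float(u.new_friend_count),
               1.0 if u.address_in_region else 0.0, float(u.lbs_hits_in_region)]
    return static + dynamic
```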
In an optional implementation, the algorithm used by the first-stage weak classifier is one of naive Bayes, logistic regression, and boosting; the algorithm used by the second-stage strong classifier is one of a support vector machine, a deep neural network, a gradient boosting decision tree, and XGBoost.
Fig. 9 is a schematic structural diagram of the target group mining device provided by the fourth aspect of the embodiments of this specification. The device includes:
a primary target user set screening unit 901, configured to screen out a primary target user set from a set of users to be identified, according to user natural attribute information and user social attribute information;
a target group feature vector extraction unit 902, configured to extract target group feature vectors for the primary target user set; and
an identification unit 903, configured to perform identification on the target group feature vectors using a pre-trained strong classifier, to determine the final target user set belonging to the target group.
In an optional implementation, the device further includes a classifier training unit 904.
The classifier training unit 904 further includes:
a sample acquisition subunit 9041, configured to obtain positive samples of the target group and to obtain unlabeled samples;
a negative sample determination subunit 9042, configured to select a preset proportion of the unlabeled samples as negative samples;
a target group feature vector extraction subunit 9043, configured to extract target group feature vectors for the positive samples and the negative samples; and
a classifier training subunit 9044, configured to train the strong classifier based on the target group feature vectors extracted from the positive samples and the negative samples.
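In the fourth aspect, negatives are simply a preset proportion of the unlabeled samples, presumably relying on target users being rare among them. A minimal sketch of that selection, assuming uniform random sampling without replacement and a hypothetical `ratio` parameter; the specification does not say how the proportion is chosen or how samples are drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_negatives(unlabeled: Sequence[T], ratio: float, seed: int = 0) -> List[T]:
    """Draw round(ratio * len(unlabeled)) unlabeled samples, without
    replacement, to serve as negatives. `ratio` stands in for the preset
    proportion, whose value the specification does not give."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    k = round(len(unlabeled) * ratio)
    # A seeded generator makes the negative set reproducible across runs.
    return random.Random(seed).sample(list(unlabeled), k)


def build_training_set(positives: Sequence[T], unlabeled: Sequence[T],
                       ratio: float, seed: int = 0) -> Tuple[List[T], List[int]]:
    """Combine positives (label 1) with sampled negatives (label 0) into a
    training set for the strong classifier."""
    negatives = sample_negatives(unlabeled, ratio, seed)
    samples = list(positives) + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return samples, labels
```

Compared with the third aspect, this skips the weak classifier entirely, so any target users hidden in the unlabeled pool may end up labeled as negatives; the rule-based screening of unlabeled samples described next is one way to reduce that noise.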
In an optional implementation, the primary target user set screening unit 901 or the sample acquisition subunit 9041 is specifically configured to determine the primary target user set or the unlabeled samples according to one or more of: user age information, user activity area information, the number of network links to a specific region within a specific time period, the number of friends newly added by the user within the specific time period, and whether the user's address is in the specific region.
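Because any one or more of these conditions may be used, the screening can be modeled as a set of named rules from which a subset is enabled. The sketch below assumes hypothetical profile fields and thresholds and treats the enabled rules as a conjunction (a user must satisfy all of them); the specification does not say how multiple rules are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Profile:
    user_id: str
    age: int
    active_region: str
    region_link_count: int  # links to the specific region in the time window
    new_friend_count: int   # friends newly added in the time window
    address_region: str


Rule = Callable[[Profile], bool]


def make_rules(region: str, min_age: int, max_age: int,
               min_links: int, min_new_friends: int) -> Dict[str, Rule]:
    """Build the candidate screening rules; all thresholds are hypothetical."""
    return {
        "age": lambda p: min_age <= p.age <= max_age,
        "activity_area": lambda p: p.active_region == region,
        "region_links": lambda p: p.region_link_count >= min_links,
        "new_friends": lambda p: p.new_friend_count >= min_new_friends,
        "address": lambda p: p.address_region == region,
    }


def screen(profiles: Iterable[Profile], rules: Dict[str, Rule],
           enabled: Iterable[str]) -> List[str]:
    """Return IDs of profiles satisfying every enabled rule (all if none enabled)."""
    chosen = [rules[name] for name in enabled]
    return [p.user_id for p in profiles if all(rule(p) for rule in chosen)]
```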
In an optional implementation, the target group feature vector extraction unit 902 or the target group feature vector extraction subunit 9043 is specifically configured to extract static features and dynamic features of users for the primary target user set, or for the positive samples and the negative samples; the target group feature vector is composed of the static features and the dynamic features.
In an optional implementation, the static features include one or more of user age, the user's city, and the user's online balance; the dynamic features include one or more of network link information for a specific region within a specific time period, friends newly added by the user within a specific time period, the user's address, and LBS information.
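Claims 6 and 13 of this document describe how the region-specific network link feature is obtained: links used by users already labeled as the target group are taken as the links for the specific region, and each user's links to them within the specific time are then counted. A minimal sketch, assuming a hypothetical event format of (user ID, link, timestamp) tuples and a half-open time window; the optional `min_users` filter is an added assumption (its default of 1 matches the claim wording).

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event format: (user_id, link, timestamp), e.g. a visited host.
LinkEvent = Tuple[str, str, datetime]


def region_links_from_labeled(events: Iterable[LinkEvent], labeled_users: Set[str],
                              min_users: int = 1) -> Set[str]:
    """Treat links used by at least `min_users` labeled target-group users
    as the links for the specific region."""
    users_per_link: Dict[str, Set[str]] = {}
    for uid, link, _ in events:
        if uid in labeled_users:
            users_per_link.setdefault(link, set()).add(uid)
    return {link for link, users in users_per_link.items() if len(users) >= min_users}


def count_region_links(events: Iterable[LinkEvent], region_links: Set[str],
                       start: datetime, end: datetime) -> Dict[str, int]:
    """Count, per user, events hitting a region link within [start, end)."""
    counts = Counter(uid for uid, link, ts in events
                     if link in region_links and start <= ts < end)
    return dict(counts)
```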
In an optional implementation, the algorithm used by the strong classifier is one of a support vector machine, a deep neural network, a gradient boosting decision tree, and XGBoost.
Fifth aspect: based on the same inventive concept as the target group mining method in the foregoing embodiments, the present invention further provides a server, as shown in Fig. 10, including a memory 1004, a processor 1002, and a computer program stored in the memory 1004 and executable on the processor 1002. When executing the program, the processor 1002 implements the steps of the target group mining method described above.
In Fig. 10, the bus architecture is represented by bus 1000. Bus 1000 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by processor 1002 and memory represented by memory 1004. Bus 1000 may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further here. A bus interface 1006 provides an interface between bus 1000 and a receiver 1001 and a transmitter 1003. The receiver 1001 and the transmitter 1003 may be the same element, namely a transceiver, which provides a unit for communicating with various other devices over a transmission medium. The processor 1002 is responsible for managing bus 1000 and for general processing, and the memory 1004 may be used to store data used by the processor 1002 when performing operations.
Sixth aspect: based on the same inventive concept as the target group mining method in the foregoing embodiments, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the target group mining method described above.
This specification is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this specification have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of this specification.
Obviously, those skilled in the art can make various modifications and variations to this specification without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this specification and their technical equivalents, this specification is also intended to include them.

Claims (28)

1. A target group mining method, comprising:
for a set of users to be identified, screening out a primary target user set according to user natural attribute information and user social attribute information;
extracting target group feature vectors for the primary target user set;
performing identification on the target group feature vectors based on a pre-trained first-stage weak classifier, to obtain an intermediate target user set; and
performing identification on the intermediate target user set based on a pre-trained second-stage strong classifier, to determine a final target user set belonging to the target group.
2. The method according to claim 1, further comprising training the first-stage weak classifier and the second-stage strong classifier by:
obtaining positive samples of the target group, and obtaining unlabeled samples;
extracting target group feature vectors for the positive samples and the unlabeled samples;
training the first-stage weak classifier based on the target group feature vectors extracted from the positive samples and the unlabeled samples;
classifying the unlabeled samples using the first-stage weak classifier, to determine negative samples of the target group; and
training the second-stage strong classifier based on the positive samples and the negative samples.
3. The method according to claim 1 or 2, wherein the primary target user set or the unlabeled samples are obtained by:
determining the primary target user set or the unlabeled samples according to user age information and user activity area information.
4. The method according to claim 1 or 2, wherein extracting the target group feature vectors comprises:
extracting static features and dynamic features of users for the primary target user set, or for the positive samples and the unlabeled samples; and
composing the target group feature vectors from the static features and the dynamic features.
5. The method according to claim 4, wherein the static features comprise one or more of user age, the user's city, and the user's online balance; and the dynamic features comprise one or more of network link information of the user for a specific region within a specific time period, friends newly added by the user within the specific time period, the user's address, and LBS information.
6. The method according to claim 5, wherein the network link information of the user for the specific region within the specific time period is obtained by:
determining the network link information of users already labeled as belonging to the target group as the network link information for the specific region; and counting the network link information of the user for the specific region within the specific time period.
7. The method according to claim 1 or 2, wherein the algorithm used by the first-stage weak classifier is one of naive Bayes, logistic regression, and boosting, and the algorithm used by the second-stage strong classifier is one of a support vector machine, a deep neural network, a gradient boosting decision tree, and XGBoost.
8. A target group mining method, comprising:
for a set of users to be identified, screening out a primary target user set according to user natural attribute information and user social attribute information;
extracting target group feature vectors for the primary target user set; and
performing identification on the target group feature vectors based on a pre-trained strong classifier, to determine a final target user set belonging to the target group.
9. The method according to claim 8, further comprising training the strong classifier by:
obtaining positive samples of the target group, and obtaining unlabeled samples;
selecting a preset proportion of the unlabeled samples as negative samples;
extracting target group feature vectors for the positive samples and the negative samples; and
training the strong classifier based on the target group feature vectors extracted from the positive samples and the negative samples.
10. The method according to claim 8 or 9, wherein the primary target user set or the unlabeled samples are obtained by:
determining the primary target user set or the unlabeled samples according to one or more of: user age information, user activity area information, the number of network links of the user to a specific region within a specific time period, the number of friends newly added by the user within the specific time period, and whether the user's address is in the specific region.
11. The method according to claim 8 or 9, wherein extracting the target group feature vectors comprises:
extracting static features and dynamic features of users for the primary target user set, or for the positive samples and the negative samples; and
composing the target group feature vectors from the static features and the dynamic features.
12. The method according to claim 11, wherein the static features comprise one or more of user age, the user's city, and the user's online balance; and the dynamic features comprise one or more of network link information of the user for a specific region within a specific time period, friends newly added by the user within the specific time period, the user's address, and LBS information.
13. The method according to claim 12, wherein the network link information of the user for the specific region within the specific time period is obtained by:
determining the network link information of users already labeled as belonging to the target group as the network link information for the specific region; and counting the network link information of the user for the specific region within the specific time period.
14. The method according to claim 8 or 9, wherein the algorithm used by the strong classifier is one of a support vector machine, a deep neural network, a gradient boosting decision tree, and XGBoost.
15. A target group mining device, comprising:
a primary target user set screening unit, configured to screen out a primary target user set from a set of users to be identified, according to user natural attribute information and user social attribute information;
a target group feature vector extraction unit, configured to extract target group feature vectors for the primary target user set;
a first-stage identification unit, configured to perform identification on the target group feature vectors based on a pre-trained first-stage weak classifier, to obtain an intermediate target user set; and
a second-stage identification unit, configured to perform identification on the intermediate target user set based on a pre-trained second-stage strong classifier, to determine a final target user set belonging to the target group.
16. The device according to claim 15, further comprising a classifier training unit,
wherein the classifier training unit further comprises:
a sample acquisition subunit, configured to obtain positive samples of the target group and to obtain unlabeled samples;
a target group feature vector extraction subunit, configured to extract target group feature vectors for the positive samples and the unlabeled samples;
a first-stage training subunit, configured to train the first-stage weak classifier based on the target group feature vectors extracted from the positive samples and the unlabeled samples;
a negative sample determination subunit, configured to classify the unlabeled samples using the first-stage weak classifier, to determine negative samples of the target group; and
a second-stage training subunit, configured to train the second-stage strong classifier based on the positive samples and the negative samples.
17. The device according to claim 15 or 16, wherein the primary target user set screening unit or the sample acquisition subunit is specifically configured to determine the primary target user set or the unlabeled samples according to user age information and user activity area information.
18. The device according to claim 15 or 16, wherein the target group feature vector extraction unit or the target group feature vector extraction subunit is specifically configured to: extract static features and dynamic features of users for the primary target user set, or for the positive samples and the unlabeled samples; and compose the target group feature vectors from the static features and the dynamic features.
19. The device according to claim 18, wherein the static features comprise one or more of user age, the user's city, and the user's online balance; and the dynamic features comprise one or more of network link information of the user for a specific region within a specific time period, friends newly added by the user within the specific time period, the user's address, and LBS information.
20. The device according to claim 15 or 16, wherein the algorithm used by the first-stage weak classifier is one of naive Bayes, logistic regression, and boosting, and the algorithm used by the second-stage strong classifier is one of a support vector machine, a deep neural network, a gradient boosting decision tree, and XGBoost.
21. A target group mining device, comprising:
a primary target user set screening unit, configured to screen out a primary target user set from a set of users to be identified, according to user natural attribute information and user social attribute information;
a target group feature vector extraction unit, configured to extract target group feature vectors for the primary target user set; and
an identification unit, configured to perform identification on the target group feature vectors based on a pre-trained strong classifier, to determine a final target user set belonging to the target group.
22. The device according to claim 21, further comprising a classifier training unit,
wherein the classifier training unit further comprises:
a sample acquisition subunit, configured to obtain positive samples of the target group and to obtain unlabeled samples;
a negative sample determination subunit, configured to select a preset proportion of the unlabeled samples as negative samples;
a target group feature vector extraction subunit, configured to extract target group feature vectors for the positive samples and the negative samples; and
a classifier training subunit, configured to train the strong classifier based on the target group feature vectors extracted from the positive samples and the negative samples.
23. The device according to claim 21 or 22, wherein the primary target user set screening unit or the sample acquisition subunit is specifically configured to determine the primary target user set or the unlabeled samples according to one or more of: user age information, user activity area information, the number of network links of the user to a specific region within a specific time period, the number of friends newly added by the user within the specific time period, and whether the user's address is in the specific region.
24. The device according to claim 21 or 22, wherein the target group feature vector extraction unit or the target group feature vector extraction subunit is specifically configured to: extract static features and dynamic features of users for the primary target user set, or for the positive samples and the negative samples; and compose the target group feature vectors from the static features and the dynamic features.
25. The device according to claim 24, wherein the static features comprise one or more of user age, the user's city, and the user's online balance; and the dynamic features comprise one or more of network link information of the user for a specific region within a specific time period, friends newly added by the user within the specific time period, the user's address, and LBS information.
26. The device according to claim 21 or 22, wherein the algorithm used by the strong classifier is one of a support vector machine, a deep neural network, a gradient boosting decision tree, and XGBoost.
27. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 14.
28. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 14.
CN201810917001.0A 2018-08-13 2018-08-13 Target group mining method, device, server and readable storage medium Pending CN109087145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810917001.0A CN109087145A (en) 2018-08-13 2018-08-13 Target group mining method, device, server and readable storage medium

Publications (1)

Publication Number Publication Date
CN109087145A 2018-12-25

Family

ID=64834345

Country Status (1)

Country Link
CN (1) CN109087145A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130124298A1 (en) * 2011-11-15 2013-05-16 Huajing Li Generating clusters of similar users for advertisement targeting
CN104090888A (en) * 2013-12-10 2014-10-08 深圳市腾讯计算机***有限公司 Method and device for analyzing user behavior data
CN106062871A (en) * 2014-03-28 2016-10-26 英特尔公司 Training classifiers using selected cohort sample subsets
CN106934410A (en) * 2015-12-30 2017-07-07 阿里巴巴集团控股有限公司 The sorting technique and system of data
CN107273454A (en) * 2017-05-31 2017-10-20 北京京东尚科信息技术有限公司 User data sorting technique, device, server and computer-readable recording medium
US20180032883A1 (en) * 2016-07-27 2018-02-01 Facebook, Inc. Socioeconomic group classification based on user features
CN108073883A (en) * 2016-11-11 2018-05-25 深圳云天励飞技术有限公司 Large-scale crowd attribute recognition approach and device
CN108334647A (en) * 2018-04-12 2018-07-27 阿里巴巴集团控股有限公司 Data processing method, device, equipment and the server of Insurance Fraud identification

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267207A (en) * 2019-06-03 2019-09-20 中国建设银行股份有限公司 Intelligent position monitoring method, device and electronic equipment
CN110267207B (en) * 2019-06-03 2021-08-31 中国建设银行股份有限公司 Intelligent position monitoring method and device and electronic equipment
CN111831681A (en) * 2020-01-22 2020-10-27 浙江连信科技有限公司 Intelligent terminal-based personnel discrimination method and device
CN111831681B (en) * 2020-01-22 2022-03-25 浙江连信科技有限公司 Intelligent terminal-based personnel discrimination method and device
CN112738724A (en) * 2020-12-17 2021-04-30 福建新大陆软件工程有限公司 Method, device, equipment and medium for accurately identifying regional target crowd
CN112291713A (en) * 2020-12-25 2021-01-29 浙江口碑网络技术有限公司 Method for acquiring target potential user data
CN112291713B (en) * 2020-12-25 2021-04-09 浙江口碑网络技术有限公司 Method for acquiring target potential user data
CN112804134A (en) * 2020-12-31 2021-05-14 深圳市镜玩科技有限公司 Task initiating method based on instant messaging, related device, equipment and medium
CN114125815A (en) * 2021-11-26 2022-03-01 中国联合网络通信集团有限公司 Identity recognition method and device and computer readable storage medium
CN114125815B (en) * 2021-11-26 2023-06-30 中国联合网络通信集团有限公司 Identity recognition method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201009

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201009

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181225
