CN107609196A - AdaBoost-based method for identifying a user's place of residence from feature information in user call-detail-record big data - Google Patents

Info

Publication number
CN107609196A
Authority
CN
China
Prior art keywords
user
time
adaboost
parameter
residence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710975842.2A
Other languages
Chinese (zh)
Inventor
曹万鹏
史辉
罗云彬
徐青
李鹏
李�浩
林绍福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-10-19
Filing date
2017-10-19
Publication date
2018-01-19
Application filed by Beijing University of Technology
Priority to CN201710975842.2A
Publication of CN107609196A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses an AdaBoost-based method for identifying a user's place of residence from feature information in user call detail record (CDR) big data. Statistical methods are used to compute statistical vectors for the following three features: 1. over a period of time, the degree to which the longest no-call interval in each day's CDRs coincides across days; 2. the number of consecutive days on which the longest no-call intervals coincide in the user's multi-day CDRs; 3. the longest no-call duration in each day's CDRs. Multiple statistical vectors of the CDRs are obtained from this information; an AdaBoost classification model is trained on samples of users' residence and non-residence locations, and the trained model is then used to identify the user's place of residence. Accurate selection of the feature vectors gives the final AdaBoost-based classifier a better classification result and higher classification accuracy.

Description

AdaBoost-based method for identifying a user's place of residence from feature information in user call-detail-record big data
Technical field
The invention belongs to the field of artificial intelligence and machine learning theory, and relates in particular to an AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data.
Background technology
A classification algorithm uses a classifier model to select, for a sample under test, the optimal classification hypothesis from the available classes. Classification belongs to machine learning within artificial intelligence and has attracted great attention from researchers in the field. Considerable time and effort have been invested in classification algorithms such as C4.5, support vector machines (SVMs), Bayesian algorithms, AdaBoost and the k-nearest-neighbor algorithm, which have been applied in fields as diverse as face recognition, handwriting verification, data analysis and medical applications.
The name AdaBoost is short for Adaptive Boosting, a machine-learning meta-algorithm proposed by Yoav Freund and Robert Schapire. Its design principle is to achieve the highest classification accuracy on the current training samples. It combines several weak classifiers (a weak classifier here is one whose accuracy is only slightly better than random guessing) into a strong classifier: although each weak classifier is individually inaccurate, the final strong classifier performs far better. AdaBoost is adaptive in the sense that it raises the weights of the samples misclassified by the previous weak classifiers, so that subsequent weak classifiers pay more attention to those samples; this is how the final classifier model is built. A well-designed set of weak classifiers can therefore be combined into a strong classifier with satisfactory overall accuracy. Because the training samples are the basis on which the relevant information is judged and distinguished, they determine whether the AdaBoost classification model learns and judges successfully. Mining and constructing the key features that distinguish the relevant targets is therefore essential.
Telecom operators generate terabytes of CDR big data every day, which contain a large amount of useful information. Mining this information can bring operators significant additional economic benefit and deepen their understanding of their subscribers, allowing them to provide better and more precise services and experiences. On this basis, the present invention mines the big-data information in user CDRs, purposefully extracts and constructs user feature information, and uses the AdaBoost classification algorithm to identify the user's likely place of residence, so as to provide more precise customer service and improve the user experience.
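To make the boosting procedure described above concrete, here is a minimal from-scratch sketch of discrete AdaBoost, assuming one-feature decision stumps as the weak classifiers and labels of +1 (residence) and -1 (non-residence). The patent does not specify the weak learner; the stump learner and all function names are illustrative assumptions.

```python
import math

def stump_predict(x, j, thr, polarity):
    # Decision stump: +1 if polarity * (x[j] - thr) >= 0, otherwise -1.
    return 1 if polarity * (x[j] - thr) >= 0 else -1

def train_stump(X, y, w):
    # Exhaustively search feature, threshold and polarity for the lowest weighted error.
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, thr, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost_train(X, y, n_rounds=10):
    # Discrete AdaBoost; labels must be +1 (residence) or -1 (non-residence).
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, polarity = train_stump(X, y, w)
        if err >= 0.5:                     # no better than random guessing: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, j, thr, polarity))
        if err == 0:                       # perfect weak classifier: nothing left to fix
            break
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    # Sign of the alpha-weighted vote of all weak classifiers.
    score = sum(a * stump_predict(x, j, t, p) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1
```

The weight update is where the "adaptive" part lives: each round multiplies a misclassified sample's weight by e^α and a correctly classified one's by e^-α, so the next stump is chosen mainly to fix earlier mistakes.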
The content of the invention
For the massive volume of user CDR big data held by telecom operators, the present invention analyzes the general daily-life patterns of ordinary users and, based on the essential and invariant differences in the relevant feature information between CDRs generated at the residence and at non-residence locations, proposes an AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data.
Statistical vectors are computed for three features: 1. over a period of time, the degree to which the longest no-call interval in each day's CDRs coincides across days; 2. the number of consecutive days on which the longest no-call intervals coincide in the user's multi-day CDRs; 3. the longest no-call duration in each day's CDRs. From these, multiple statistical vectors of the CDRs are obtained; an AdaBoost classification model is trained on the user's residence and non-residence samples, and the trained model is used to identify the user's residence. Accurate feature-vector selection gives the final AdaBoost-based classifier a better classification result and higher accuracy.
The AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data comprises the following steps:
(1) Find, in the CDR big data, the base-station locations that appear most frequently in the user's calls
The CDRs are filtered by the number of calls connected through the same base station, and a threshold is set to find the user's candidate residence locations. The fields of voice and SMS CDRs generally include at least the calling and called numbers, the call time and the connected base station.
(2) Analyze and compute the CDR data for these base-station locations, extracting the essential and distinctive feature information that characterizes the residence, so as to improve the quality of the AdaBoost training samples
By comparing the relevant feature states in CDRs generated at the residence with those generated at non-residence locations, the fundamental differences between them are identified. At the same time, based on the general human daily routine and the relative continuity and invariance of living habits, the following parameters are derived from the user's CDRs: a long no-call interval coincidence parameter (if the start and end points of the no-call period in each day's CDRs agree within a certain time threshold, the periods are considered to coincide); a long no-call duration parameter; a long no-call continuity parameter (for example, the number of days on which the periods coincide); and a long-term connected base-station parameter.
A given user generally has a relatively fixed, stable daily routine; for example, at the residence the user has a relatively fixed bedtime, a roughly constant sleep duration and a regular wake-up time. This information is reflected indirectly in statistics of the relevant features in the user's CDR big data.
(3) Build statistics of the above features and use the AdaBoost classification algorithm to identify the user's residence accurately from CDR big data
The above parameters are computed for the features identified above, and statistical methods are used to obtain their mean m, variance σ and higher-order moment ξ, which serve as input samples for training the AdaBoost classifier model. Using this feature information, the learned AdaBoost classification model then distinguishes the user's likely residence from non-residence locations among the candidate places.
The constructed AdaBoost classifier is applied to extract and classify massive user CDR big data and determine each user's place of residence.
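The frequency filter in step (1) might look like the following sketch. The patent only says that "a certain threshold" is applied; expressing it as a minimum share of the user's records (`min_share`) and the `cell_id` field name are assumptions made for illustration.

```python
from collections import Counter

def candidate_cells(records, min_share=0.1):
    # records: one user's CDRs as dicts with a hypothetical "cell_id" field
    # naming the base station that carried the call or SMS.
    counts = Counter(r["cell_id"] for r in records)
    total = sum(counts.values())
    # Keep base stations whose share of the user's records reaches the threshold,
    # most frequent first.
    return [cell for cell, n in counts.most_common() if n / total >= min_share]
```

For a user with 6 records on station A, 3 on B and 1 on C, `min_share=0.2` keeps `["A", "B"]`.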
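The per-day quantity behind these parameters, the longest stretch without any call, can be sketched as follows. To keep a typical overnight silence inside one window, this sketch assumes each "day" is a 24-hour window starting at noon, with timestamps in hours from that start; the patent does not state how day boundaries are handled.

```python
def longest_silent_interval(event_times, day_length=24.0):
    # event_times: hours since the start of one day window at which the user
    # had a call or SMS. The window edges (0 and day_length) act as boundaries.
    points = [0.0] + sorted(event_times) + [day_length]
    best = (0.0, 0.0, 0.0)                 # (start, end, length)
    for a, b in zip(points, points[1:]):
        if b - a > best[2]:
            best = (a, b, b - a)
    return best
```

With calls at 13:00, 17:30, 22:00, 07:30 and 10:00 (hours 1.0, 5.5, 10.0, 19.5 and 22.0 from noon), the function returns `(10.0, 19.5, 9.5)`: a 9.5-hour silence from 22:00 to 07:30.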
Compared with the prior art, the present invention has the following significant advantages and beneficial effects:
(1) The invention proposes an AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data. Starting from the general daily-life and rest patterns of ordinary users, and relying on the fundamental differences between the CDR characteristics produced at the residence and at non-residence locations, the method mines from user CDR big data a series of highly discriminative feature parameters, providing the basis for training the AdaBoost classifier model accurately and quickly.
(2) The invention uses feature parameters from user CDR big data to construct statistics of the above essential features (mean, variance, higher-order moments, etc.) as feature parameters for judging the user's likely residence. This provides objective and stable input samples for classifier training and recognition, ensuring accurate identification of the likely residence from the CDR data of massive numbers of users.
(3) Applying the AdaBoost classification algorithm to the analysis and mining of user CDR big data improves the speed and accuracy of identifying the user's likely residence.
Brief description of the drawings
Fig. 1 is a flow chart of the AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data.
Embodiment
The present invention is further described below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, the AdaBoost-based method of the present invention for identifying a user's place of residence from feature information in user CDR big data comprises the following steps:
(1) Extract all of the user's possible candidate residence locations from the CDR big data
The CDR big data stored in an Oracle database are filtered; that is, the base-station locations that appear most frequently in a target user's calls are found in the CDR big data;
(2) Filter out the user's possible residence locations from the CDR big data
The call frequency under each base station connected by the user is counted and ranked; highly ranked base stations that are geographically close to one another are merged, and only the top 5 are kept, with the data ranked below 5th place discarded;
(3) Feature extraction from the CDR data
After the possible residence locations have been filtered out, the relevant feature parameters are obtained from the CDR data as the classification features of the AdaBoost classifier, specifically:
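A sketch of this merge-and-rank step, assuming each base station is given with its coordinates and call count. The patent does not define "geographically close"; the 0.5 km radius, the haversine distance and the greedy merge into the highest-ranked nearby group are assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km on a sphere of mean Earth radius.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def merge_and_rank(stations, radius_km=0.5, keep=5):
    # stations: list of (cell_id, lat, lon, call_count).
    # Each group is [rep_lat, rep_lon, total_count, cell_ids]; its representative
    # is the highest-count station that opened the group.
    groups = []
    for cell, lat, lon, n in sorted(stations, key=lambda s: s[3], reverse=True):
        for g in groups:
            if haversine_km(g[0], g[1], lat, lon) <= radius_km:
                g[2] += n                  # merge into the nearby higher-ranked group
                g[3].append(cell)
                break
        else:
            groups.append([lat, lon, n, [cell]])
    groups.sort(key=lambda g: g[2], reverse=True)
    return groups[:keep]                   # discard everything ranked below `keep`
```

Two stations about 140 m apart with 50 and 30 calls end up as one candidate with 80 calls, while a station 11 km away stays separate.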
Multi-day long no-call coincidence-time parameter: the coincidence time between the no-call period on a given day (start time $t_s$, end time $t_e$) and the no-call period on the reference day (start time $c_s$, end time $c_e$; here the reference is the no-call start and end time that, according to statistics, occurs most frequently in the user's CDRs over a certain period). For example, if the period counted is 30 days, the coincidence time parameter $T_c$ between a given day and the reference day is:
1) $T_c = c_e - t_s$, if $c_s < t_s < c_e < t_e$
2) $T_c = t_e - c_s$, if $t_s < c_s < t_e < c_e$
3) $T_c = c_e - c_s$, if $t_s < c_s < c_e < t_e$
4) $T_c = t_e - t_s$, if $c_s < t_s < t_e < c_e$
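The four cases above are the cases of partial or full overlap of two intervals, so they collapse into a single expression, $T_c = \max(0, \min(t_e, c_e) - \max(t_s, c_s))$, which extends them with 0 for intervals that do not overlap. A Python sketch, together with one possible way of choosing the reference interval; the rounding resolution used to decide which interval "occurs most frequently" is an assumption, as the patent does not specify it.

```python
from collections import Counter

def overlap(ts, te, cs, ce):
    # Coincidence time T_c of a day's no-call interval [ts, te] with the
    # reference interval [cs, ce]. min(te, ce) - max(ts, cs) reproduces all four
    # cases above; disjoint intervals give 0.
    return max(0.0, min(te, ce) - max(ts, cs))

def reference_interval(intervals, resolution=0.5):
    # Most frequent (start, end) no-call interval after rounding both ends to
    # `resolution` hours (the rounding rule is an assumption).
    def snap(v):
        return round(v / resolution) * resolution
    counts = Counter((snap(s), snap(e)) for s, e in intervals)
    return counts.most_common(1)[0][0]
```

For example, `overlap(11.0, 20.0, 10.0, 19.0)` falls under case 1 and returns $c_e - t_s = 8.0$.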
Long no-call duration parameter $l$, obtained directly from the CDRs;
Long no-call continuity parameter $s$ (for example, the number of days in a month on which that period contains no calls);
Long-term fixed-base-station connection parameter: the time $t$ during which the user is connected to a given fixed base station, obtained directly from the CDR data.
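One reading of the continuity parameter $s$ is sketched below: count the days whose $T_c$ covers most of the reference interval, and also report the longest run of such consecutive days (matching the "consecutive number of days" wording in the abstract). The 80% coverage threshold `min_ratio` is an assumption.

```python
def continuity(overlaps, ref_length, min_ratio=0.8):
    # overlaps: per-day T_c values in chronological order.
    # A day "coincides" when T_c covers at least min_ratio of the reference interval.
    flags = [t >= min_ratio * ref_length for t in overlaps]
    longest = run = 0
    for f in flags:
        run = run + 1 if f else 0          # extend or reset the current streak
        longest = max(longest, run)
    return sum(flags), longest             # (matching days, longest consecutive run)
```

With a 9.5-hour reference interval and daily overlaps `[9, 8, 2, 9, 9, 9, 0]`, five days qualify and the longest streak is three days.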
(4) Computation of feature statistics
The mean m, variance σ and higher-order moment ξ of each feature obtained above are computed. The mean is the sum of all values in a data set divided by the number of values; the variance is the average of the squared differences between the values forming the sample and their mean; the k-th order moment is the average of the k-th powers of the variable's deviations from zero (raw moment) or from its mean (central moment);
For example, for the multi-day long no-call coincidence-time parameter $T_c$, its mean $m(T_c)$, variance $\sigma(T_c)$ and higher-order moment $\xi(T_c)$ can be computed as above. The mean, variance and higher-order moments of the other feature parameters are obtained in the same way.
(5) Training the classification model with the AdaBoost algorithm
In the present invention, a certain number of samples of known user residence locations are first obtained and their corresponding feature parameters computed, namely the long no-call interval coincidence parameter, the long no-call duration parameter, the long no-call continuity parameter and the long-term fixed-base-station connection time parameter; their mean, variance and higher-order moments are then derived, and the classification model is trained with the AdaBoost algorithm, yielding a classifier model trained with the method of this patent, these feature parameters and the AdaBoost algorithm;
(6) Identifying the user's residence with the AdaBoost classification model
The AdaBoost classifier trained on the above features is used to analyze the target users' CDR data, extract the relevant features and output each user's likely place of residence.
Through the above steps, the user's likely residence can be identified accurately and quickly from telecom operators' CDR big data.
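The three statistics can be computed as in the sketch below. The patent does not say whether the variance uses $n$ or $n-1$ in the denominator, nor which order of moment is used; this sketch assumes the population form and the third central moment ($k = 3$). Note that the document's σ denotes the variance itself, not the standard deviation.

```python
def feature_stats(values, k=3):
    # Mean m, variance sigma (population form, dividing by n) and k-th central
    # moment xi of one feature's per-day values, e.g. the daily T_c series.
    n = len(values)
    m = sum(values) / n
    sigma = sum((v - m) ** 2 for v in values) / n
    xi = sum((v - m) ** k for v in values) / n
    return m, sigma, xi
```

For the series `[0, 0, 3]` this gives $m = 1$, $\sigma = 2$ and $\xi = 2$; the positive third moment reflects the single large value pulling the distribution to the right.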
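Assembling the training set might look like the following sketch: each candidate location contributes one 12-dimensional vector (mean, variance and third central moment of each of the four per-day feature series) and a ±1 label. The feature names, the dictionary layout and the choice of the third moment are hypothetical; the resulting `X`, `y` could then be fed to an AdaBoost trainer.

```python
import statistics

# Hypothetical names for the four per-day feature series described above.
FEATURES = ("overlap", "silent_length", "continuity", "cell_dwell")

def feature_vector(daily, k=3):
    # daily: feature name -> list of per-day values for one candidate location.
    vec = []
    for name in FEATURES:
        vals = daily[name]
        m = statistics.fmean(vals)
        var = statistics.pvariance(vals, mu=m)          # population variance
        xi = sum((v - m) ** k for v in vals) / len(vals)  # k-th central moment
        vec.extend([m, var, xi])
    return vec

def build_training_set(labeled):
    # labeled: list of (daily, is_residence) pairs for locations with known ground truth.
    X = [feature_vector(daily) for daily, _ in labeled]
    y = [1 if is_residence else -1 for _, is_residence in labeled]
    return X, y
```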
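A sketch of the final decision for one user, assuming a trained ensemble of weighted decision stumps `(alpha, feature_index, threshold, polarity)` and one feature vector per candidate location. Choosing the single highest-scoring candidate, and returning `None` when no candidate scores positive, is an assumed decision rule; the patent only says that the classifier gives each user's likely residence.

```python
def stump(x, j, thr, polarity):
    # Weak classifier: +1 if polarity * (x[j] - thr) >= 0, otherwise -1.
    return 1 if polarity * (x[j] - thr) >= 0 else -1

def pick_residence(ensemble, candidates):
    # candidates: candidate location ID -> feature vector for one user.
    scores = {cid: sum(a * stump(x, j, t, p) for a, j, t, p in ensemble)
              for cid, x in candidates.items()}
    best = max(scores, key=scores.get, default=None)
    # Report a residence only if the best candidate is classified as positive.
    return best if best is not None and scores[best] > 0 else None
```

For example, with the toy ensemble `[(1.0, 0, 7.0, 1), (0.5, 1, 3.0, -1)]`, a candidate with features `[8.5, 1.0]` scores 1.5 and one with `[2.0, 4.0]` scores -1.5, so the first is returned.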
The AdaBoost-based method of the present invention for identifying a user's place of residence from feature information in user CDR big data involves the following steps: 1. find, in the CDR big data, the base-station locations that appear most frequently in the user's calls over several consecutive days (a user-defined period, such as one month or one quarter); 2. analyze and compute statistics on the CDR data for these locations, mine their inherent patterns and derive the feature information in the CDR data; 3. using this feature information, apply the AdaBoost classification algorithm to distinguish the user's likely residence from non-residence locations among these places. Step 2, which mines, analyzes and computes the key discriminative feature information in the CDR big data, determines whether these locations can be distinguished accurately and efficiently. By analyzing the general daily-life patterns of ordinary users, and drawing on the general human daily routine and the relative continuity and invariance of living habits, the invention exploits the essential and invariant differences in the relevant feature information between CDRs at the residence and at non-residence locations. It uses statistical methods to compute statistical vectors for three features: 1. over a period of time, the degree to which the longest no-call interval in each day's CDRs coincides across days; 2. the number of consecutive days on which the longest no-call intervals coincide in the user's multi-day CDRs; 3. the longest no-call duration in each day's CDRs. Multiple statistical vectors are derived from the CDRs on this basis, an AdaBoost classification model is trained on the user's residence and non-residence locations, and the trained model is used to identify the user's residence, so that the final AdaBoost-based classifier achieves a better classification result and higher accuracy.

Claims (2)

  1. An AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data, characterized by comprising the following steps:
    Step (1): extract all of the user's possible candidate residence locations from the CDR big data;
    Step (2): filter out the user's possible residence locations from the CDR big data;
    Step (3): feature extraction from the CDR data;
    After the possible residence locations have been filtered out, the relevant feature parameters are obtained from the CDR data as the classification features of the AdaBoost classifier, specifically:
    Multi-day long no-call coincidence-time parameter: the coincidence time between the no-call period on a given day (start time $t_s$, end time $t_e$) and the no-call period on the reference day (start time $c_s$, end time $c_e$; here the reference is the no-call start and end time that, according to statistics, occurs most frequently in the user's CDRs over a certain period). For example, if the period counted is 30 days, the coincidence time parameter $T_c$ between a given day and the reference day is:
    1) $T_c = c_e - t_s$, if $c_s < t_s < c_e < t_e$
    2) $T_c = t_e - c_s$, if $t_s < c_s < t_e < c_e$
    3) $T_c = c_e - c_s$, if $t_s < c_s < c_e < t_e$
    4) $T_c = t_e - t_s$, if $c_s < t_s < t_e < c_e$
    Long no-call duration parameter $l$, obtained directly from the CDRs;
    Long no-call continuity parameter $s$;
    Time parameter $t$ of long-term connection to a fixed base station, obtained directly from the CDR data.
    Step (4): computation of feature statistics
    The mean m, variance σ and higher-order moment ξ of each feature obtained above are computed: the mean is the sum of all values in a data set divided by the number of values; the variance is the average of the squared differences between the values forming the sample and their mean; the k-th order moment is the average of the k-th powers of the variable's deviations from zero (raw moment) or from its mean (central moment);
    Step (5): training the classification model with the AdaBoost algorithm
    First, a certain number of samples of known user residence locations are obtained and their corresponding feature parameters are computed, including: the long no-call interval coincidence parameter, the long no-call duration parameter, the long no-call continuity parameter and the time parameter of long-term connection to a fixed base station; their mean, variance and higher-order moment parameters are then derived, and the classification model is trained with the AdaBoost classification algorithm to obtain the AdaBoost classifier model;
    Step (6): identifying the user's residence with the AdaBoost classifier model
    The AdaBoost classifier trained on the above features is used to analyze the target users' CDR data, extract the relevant features and output each user's likely place of residence.
  2. The AdaBoost-based method for identifying a user's place of residence from feature information in user CDR big data according to claim 1, characterized in that, in step (2), the call frequency under each base station connected by the user is counted and ranked, highly ranked base stations that are geographically close to one another are merged, and the data ranked below 5th place are discarded.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710975842.2A CN107609196A (en) 2017-10-19 2017-10-19 AdaBoost-based method for identifying a user's place of residence from feature information in user call-detail-record big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710975842.2A CN107609196A (en) 2017-10-19 2017-10-19 AdaBoost-based method for identifying a user's place of residence from feature information in user call-detail-record big data

Publications (1)

Publication Number Publication Date
CN107609196A (en) 2018-01-19

Family

ID=61078613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710975842.2A Pending CN107609196A (en) 2017-10-19 2017-10-19 AdaBoost-based method for identifying a user's place of residence from feature information in user call-detail-record big data

Country Status (1)

Country Link
CN (1) CN107609196A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103052022A (en) * 2011-10-17 2013-04-17 ***通信集团公司 User stable point discovery method and system based on mobile behavior
CN105513351A (en) * 2015-12-17 2016-04-20 北京亚信蓝涛科技有限公司 Traffic travel characteristic data extraction method based on big data
CN107133265A (en) * 2017-03-31 2017-09-05 咪咕动漫有限公司 Method and device for identifying users with abnormal behavior
WO2017168490A1 (en) * 2016-03-28 2017-10-05 アイホン株式会社 Intercom system

Similar Documents

Publication Publication Date Title
CN106550155B (en) Method and system for screening suspicious numbers as fraud samples, classifying them and intercepting them
CN107798032B (en) Method and device for processing response message in self-service voice conversation
CN107222865A (en) Real-time communication fraud detection method and system based on suspicious-behavior recognition
CN109819126B (en) Abnormal number identification method and device
CN109615116A (en) Telecommunication fraud event detection method and detection system
CN111159387B (en) Recommendation method based on multi-dimensional alarm information text similarity analysis
CN108924333A (en) Fraudulent call recognition methods, device and system
CN107729940A (en) Method for estimating customer relationships from base-station connection information in user call-detail-record big data
WO2017186090A1 (en) Communication number processing method and apparatus
CN109800600A (en) Ocean big data susceptibility assessment system and prevention method towards privacy requirements
CN106385693A (en) Telecommunication fraud method for virtual number segments
CN109474756B (en) Telecommunication anomaly detection method based on collaborative network representation learning
CN112866486B (en) Multi-source feature-based fraud telephone identification method, system and equipment
CN108550050A (en) User profiling method based on call center data
CN112001170A (en) Method and system for recognizing deformed sensitive words
CN110381218A (en) Method and device for identifying telephone fraud rings
CN114841705B (en) Anti-fraud monitoring method based on scene recognition
CN111510368A (en) Family group identification method, device, equipment and computer readable storage medium
CN111105064A (en) Method and device for determining suspected information of fraud event
CN105930430B (en) Real-time fraud detection method and device based on non-accumulative attribute
CN110717068B (en) Video retrieval method based on deep learning
CN110362828B (en) Network information risk identification method and system
CN107609196A (en) AdaBoost-based method for identifying a user's place of residence from feature information in user call-detail-record big data
CN106407878A (en) Face detection method and device based on multiple classifiers
CN114048294B (en) Similar population extension model training method, similar population extension method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180119