CN107609196A - A kind of AdaBoost user residence method of discrimination based on user bill big data characteristic information - Google Patents
- Publication number
- CN107609196A (application CN201710975842.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- time
- adaboost
- parameter
- residence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention discloses an AdaBoost method for identifying a user's residence based on feature information from call detail record (CDR) big data. Statistical vectors are computed for the following three features: 1. the degree of overlap, over a period of time, of the longest no-call time interval in each day's CDRs; 2. the number of consecutive days on which the longest no-call intervals overlap across the user's multi-day CDRs; 3. the duration of the longest no-call interval in each day's CDRs. Multiple statistical vectors of the CDRs are derived from this information and, together with samples of users' residences and non-residences, are used to train a model based on the AdaBoost classification algorithm. The trained model is then used to identify the user's residence. Accurate selection of the feature vectors gives the final AdaBoost-based classifier a better classification effect and higher classification accuracy.
Description
Technical field
The invention belongs to the field of artificial intelligence and machine learning theory, and in particular relates to an AdaBoost method for identifying user residence based on feature information from user CDR big data.
Background technology
A classification algorithm uses a classifier model to choose, for a sample under test, the best class hypothesis from the available classes. Classification belongs to machine learning within artificial intelligence and has attracted great attention from researchers in the field. Much time and effort has gone into classification algorithms such as C4.5, support vector machines, Bayesian algorithms, AdaBoost, and K-nearest-neighbor classification, which have been applied in fields as varied as face recognition, handwriting verification, data analysis, and medicine.
The name AdaBoost is short for Adaptive Boosting, a machine learning meta-algorithm proposed by Yoav Freund and Robert Schapire. Its design aims to achieve the highest classification accuracy on the current training samples. It forms a strong classifier by reasonably combining different weak classifiers (a weak classifier here is one whose classification accuracy is only slightly better than random guessing). Although the accuracy of each weak classifier is not high, the final strong classifier performs far better. AdaBoost is adaptive in the sense that it adjusts the weights of the samples misclassified by earlier weak classifiers, so that later weak classifiers pay more attention to those samples, and in this way arrives at the final classifier model. On this basis, a well-designed set of weak classifiers can be combined into a strong classifier with satisfactory overall classification accuracy. The training samples, as the basis for judging and distinguishing the relevant information, determine whether the AdaBoost classification model learns and classifies successfully. Mining and constructing the key features that distinguish the targets is therefore crucial.
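For reference, the reweighting described above is usually written as follows in the standard binary AdaBoost formulation. The notation is conventional and is not taken from the patent text: labels $y_i \in \{-1, +1\}$, sample weights $w_i^{(t)}$, weak classifier $h_t$, weighted error $\epsilon_t$, classifier weight $\alpha_t$, and normalizer $Z_t$ are all assumed symbols.

```latex
\begin{aligned}
\epsilon_t &= \sum_{i=1}^{N} w_i^{(t)} \, \mathbf{1}\left[ h_t(x_i) \neq y_i \right], \\
\alpha_t &= \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}, \\
w_i^{(t+1)} &= \frac{w_i^{(t)} \exp\left( -\alpha_t \, y_i \, h_t(x_i) \right)}{Z_t},
\qquad Z_t = \sum_{j=1}^{N} w_j^{(t)} \exp\left( -\alpha_t \, y_j \, h_t(x_j) \right), \\
H(x) &= \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right).
\end{aligned}
```

A misclassified sample has $y_i h_t(x_i) = -1$, so its weight is multiplied by $e^{\alpha_t} > 1$ whenever $\epsilon_t < 1/2$, which is the "increased attention" the paragraph refers to.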
Telecom operators generate terabytes of CDR big data every day, and it contains a great deal of useful information. Mining this information can bring telecom operators additional economic benefit and deepen their understanding of their users, so that they can provide users with better, more precise services and experiences. On this basis, the present invention mines the big data in user CDRs, extracts and constructs targeted user feature information, and uses the AdaBoost classification algorithm to identify a user's possible residence, so as to provide more precise customer service and improve the user experience.
Summary of the invention
For the massive CDR big data held by telecom operators, the present invention analyzes the common living patterns of ordinary users and, based on the essential and invariant differences between the feature information in residence and non-residence CDRs, proposes an AdaBoost method for identifying user residence based on feature information from user CDR big data.
Statistical vectors are computed for the following three features: 1. the degree of overlap, over a period of time, of the longest no-call time interval in each day's CDRs; 2. the number of consecutive days on which the longest no-call intervals overlap across the user's multi-day CDRs; 3. the duration of the longest no-call interval in each day's CDRs. Multiple statistical vectors of the CDRs are derived from this information and, together with samples of users' residences and non-residences, are used to train a model based on the AdaBoost classification algorithm. The trained model is then used to identify the user's residence. Accurate selection of the feature vectors gives the final AdaBoost-based classifier a better classification effect and higher classification accuracy.
An AdaBoost method for identifying user residence based on feature information from user CDR big data comprises the following steps:
(1) Find, in the CDR big data, the base station locations with the highest call frequency in the user's CDRs
The CDRs are filtered by the number of calls connected through the same base station in the user's records. By setting a threshold, the user's possible candidate residence locations are found. The fields of a voice or short-message CDR typically include at least the calling and called phone numbers, the call time, and the connected base station.
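The threshold filter in step (1) can be sketched as below. This is a minimal illustration, not the patent's implementation: the record field names (`caller`, `callee`, `time`, `station`), the function name `candidate_stations`, and the threshold value are all assumptions made for the example.

```python
from collections import Counter

def candidate_stations(records, min_calls=2):
    """Return (station, call_count) pairs with at least min_calls calls, most frequent first."""
    counts = Counter(r["station"] for r in records)
    return [(s, n) for s, n in counts.most_common() if n >= min_calls]

# Hypothetical records for one user; real CDRs carry more fields.
records = [
    {"caller": "A", "callee": "B", "time": "2017-10-01 08:10", "station": "BS1"},
    {"caller": "A", "callee": "C", "time": "2017-10-01 21:40", "station": "BS1"},
    {"caller": "D", "callee": "A", "time": "2017-10-02 12:05", "station": "BS2"},
    {"caller": "A", "callee": "B", "time": "2017-10-02 22:15", "station": "BS1"},
    {"caller": "A", "callee": "E", "time": "2017-10-03 12:30", "station": "BS2"},
    {"caller": "A", "callee": "B", "time": "2017-10-03 17:45", "station": "BS3"},
]
candidates = candidate_stations(records, min_calls=2)
```

Here `BS3` is dropped because it appears only once; the stations that remain are the candidate residence locations passed to the later steps.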
(2) Analyze and compute the CDR data for these base station locations, and derive the essential, distinctive feature information that a residence exhibits in the CDR data, to improve the accuracy of the AdaBoost training samples
By comparing the state of the relevant features in residence CDRs with their state in non-residence CDRs, the essential differences between them are found. Meanwhile, based on the relative continuity and invariance of people's general daily routines and habits, the following parameters are identified in the user's CDRs: the long no-call interval overlap parameter (the start and end points of the no-call period in each day's CDRs are treated as coinciding when they agree within a time threshold), the long no-call duration parameter, the long no-call continuity parameter (for example, the number of days that overlap), and the long-term connected base station parameter.
A given user generally has a relatively fixed, stable routine, for example a relatively fixed time of going to rest at the residence, a roughly constant sleep duration, and a regular wake-up time. This information is reflected indirectly in the statistics of the relevant features in the user's CDR big data.
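As an illustration of how the daily no-call period might be extracted, the sketch below finds the longest gap between calls in one day. It assumes calls are given as `(start, end)` pairs in minutes since midnight and that the day window runs from 0 to 1440; both are assumptions of this example, and a real system might shift the window (for instance noon to noon) so that an overnight rest period is not split at midnight.

```python
def longest_no_call_interval(calls, day_start=0, day_end=1440):
    """Return (gap_start, gap_end) of the longest interval with no call in the day window.

    calls is a list of (start, end) pairs in minutes since midnight.
    """
    best = (day_start, day_start)  # zero-length gap as the initial best
    cursor = day_start             # end of the latest call seen so far
    for start, end in sorted(calls):
        if start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = max(cursor, end)
    # The gap between the last call and the end of the day also counts.
    if day_end - cursor > best[1] - best[0]:
        best = (cursor, day_end)
    return best

# Hypothetical day: calls at 07:00, 09:00, 13:00, 17:00 and 23:00.
calls = [(420, 425), (540, 550), (780, 790), (1020, 1030), (1380, 1385)]
gap_start, gap_end = longest_no_call_interval(calls)
no_call_duration = gap_end - gap_start
```

The start and end of this gap correspond to the no-call start and end points, and its length to the long no-call duration parameter.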
(3) Build statistics from the above features and use the AdaBoost classification algorithm to identify the user's residence accurately from CDR big data
The above parameters are computed from the features found, and their mean m, variance σ, and higher-order moment ξ are obtained by statistical methods and used as input samples to train the AdaBoost classifier model. Then, using this feature information, the learned AdaBoost classification model distinguishes the user's possible residence from non-residences among the above locations.
The constructed AdaBoost classifier is then applied to the CDR big data of large numbers of users to extract features, classify, and determine each user's residence information.
Compared with the prior art, the present invention has the following clear advantages and beneficial effects:
(1) The invention proposes an AdaBoost method for identifying user residence based on feature information from user CDR big data. Starting from the common living patterns and daily routines of ordinary users, and from the essential differences between the feature data of CDRs generated at a user's residence and at non-residence locations, the algorithm mines from user CDR big data a set of strongly discriminative feature parameters, providing the basis for training the AdaBoost classifier model accurately and quickly.
(2) The invention uses feature parameters from user CDR big data and builds statistics of these essential features, such as the mean, variance, and higher-order moments, as the feature parameters for judging a user's possible residence. This gives classifier training and identification objective, stable input samples and supports accurate identification of the possible residence from the CDR data of large numbers of users.
(3) By applying the AdaBoost classification algorithm to the analysis and mining of user CDR big data, the invention improves the speed and accuracy with which a user's possible residence is judged.
Brief description of the drawings
Fig. 1 is a flow chart of the AdaBoost method for identifying user residence based on feature information from user CDR big data.
Embodiment
The present invention is further described below with reference to the accompanying drawing and specific embodiments.
As shown in Fig. 1, the AdaBoost method for identifying user residence based on feature information from user CDR big data according to the present invention comprises the following steps:
(1) Extract all of the user's possible candidate residence locations from the CDR big data
The CDR big data stored in an Oracle database is filtered, that is, all base station locations with a high call frequency in a given target user's CDRs are found in the CDR big data;
(2) Filter out the user's possible residence locations from the CDR big data
The call frequency under each base station the user connects to is counted and ranked. Highly ranked base stations that are geographically close to one another are merged, and entries ranked below 5th are discarded;
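A minimal sketch of the rank, merge, and top-5 cut in step (2) follows. The patent does not say how proximity is measured or how stations are merged, so the planar kilometre coordinates, the `merge_radius` value, and the greedy "attach to the first nearby higher-ranked group" rule are assumptions made for illustration.

```python
import math

def merge_and_rank(stations, merge_radius=0.5, keep=5):
    """Merge nearby stations and keep the `keep` groups with the most calls.

    stations is a list of (station_id, x_km, y_km, call_count) tuples.
    """
    merged = []
    # Visit stations from most to least frequent so groups form around busy stations.
    for sid, x, y, n in sorted(stations, key=lambda s: -s[3]):
        for group in merged:
            if math.hypot(x - group["x"], y - group["y"]) <= merge_radius:
                group["count"] += n
                group["members"].append(sid)
                break
        else:
            merged.append({"x": x, "y": y, "count": n, "members": [sid]})
    merged.sort(key=lambda g: -g["count"])
    return merged[:keep]

# Hypothetical stations: BS2 lies 0.3 km from BS1 and is merged into it.
stations = [
    ("BS1", 0.0, 0.0, 40), ("BS2", 0.3, 0.0, 10), ("BS3", 5.0, 5.0, 30),
    ("BS4", 9.0, 1.0, 8), ("BS5", 2.0, 8.0, 6), ("BS6", 7.0, 7.0, 4),
    ("BS7", 12.0, 3.0, 2),
]
top = merge_and_rank(stations)
```

With this data six groups remain after merging, so the lowest-ranked one (`BS7`) is discarded by the top-5 cut.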
(3) Feature extraction from the CDR data
After the possible residences are filtered out, the relevant feature parameters are obtained from the above CDR data as the classification features of the AdaBoost classifier, specifically:
The multi-day long no-call overlap time parameter: the overlap between the no-call period on a given day (start time $t_s$, end time $t_e$) and the no-call period of the reference day (start time $c_s$, end time $c_e$; the reference day is defined by the most frequently occurring no-call start and end times in the user's CDRs over the statistical period). For example, with a statistical period of 30 days, the overlap time parameter $T_c$ between a given day and the reference day is:
1) $T_c = c_e - t_s$, if $c_s < t_s < c_e < t_e$
2) $T_c = t_e - c_s$, if $t_s < c_s < t_e < c_e$
3) $T_c = c_e - c_s$, if $t_s < c_s < c_e < t_e$
4) $T_c = t_e - t_s$, if $c_s < t_s < t_e < c_e$
The long no-call duration parameter l, obtained directly from the CDRs;
The long no-call continuity parameter s (for example, the number of days in a month on which this period has no calls);
The time parameter t for long-term connection to a base station in a fixed area, obtained directly from the CDR data.
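The four overlap cases listed above collapse into a single interval-intersection expression, sketched below together with one possible reading of the continuity parameter s. Times are assumed to be in minutes on a common axis; returning 0 for disjoint intervals, the `min_overlap` threshold, and interpreting s as the longest run of consecutive overlapping days are assumptions of this example rather than details given in the text.

```python
def overlap_minutes(ts, te, cs, ce):
    """Overlap T_c between a day's no-call interval [ts, te] and the reference [cs, ce]."""
    return max(0, min(te, ce) - max(ts, cs))

def longest_overlap_run(daily_overlaps, min_overlap=240):
    """Longest run of consecutive days whose overlap reaches min_overlap minutes."""
    best = run = 0
    for t in daily_overlaps:
        run = run + 1 if t >= min_overlap else 0
        best = max(best, run)
    return best

# Reference no-call interval: 00:00 to 07:00 (minutes 0 to 420).
cs, ce = 0, 420
# Hypothetical (ts, te) no-call intervals for six consecutive days.
days = [(30, 400), (60, 480), (500, 600), (10, 415), (20, 410), (5, 419)]
overlaps = [overlap_minutes(ts, te, cs, ce) for ts, te in days]
s = longest_overlap_run(overlaps)
```

The third day does not overlap the reference interval at all, which breaks the run; the last three days form the longest run.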
(4) Computation of feature statistics
The mean m, variance σ, and higher-order moment ξ of the features obtained above are computed. The mean is the sum of all values in a data set divided by the number of values; the variance is the average of the squared differences between each sample value and the mean; a higher-order moment is the average of a higher power of the variable's deviation from zero or from the mean.
For example, taking the multi-day long no-call overlap time parameter $T_c$, its mean $m(T_c)$, variance $\sigma(T_c)$, and higher-order moment $\xi(T_c)$ can each be computed as described above. The mean, variance, and higher-order moments of the other feature parameters are obtained in the same way.
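The statistics in step (4) can be computed as in the following sketch. The patent does not fix the order of the higher-order moment or the variance normalization, so the choice of the third central moment and of dividing by n (population variance) are assumptions of this example.

```python
def feature_statistics(values, order=3):
    """Return (mean, variance, central moment of the given order) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    moment = sum((v - mean) ** order for v in values) / n
    return mean, variance, moment

# Hypothetical per-day values of one feature.
values = [2, 4, 4, 4, 5, 5, 7, 9]
m, sigma, xi = feature_statistics(values)
```

For these values the mean is 5.0, the variance is 4.0, and the third central moment is 5.25. Applying the same function to each feature (overlap time, no-call duration, and so on) yields the statistical vector for one candidate location.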
(5) Classification model training based on the AdaBoost algorithm
In the present invention, a number of samples with known user residence are first obtained and their feature parameters are computed, including the long no-call interval overlap parameter, the long no-call duration parameter, the long no-call continuity parameter, and the time parameter for long-term connection to a base station in a fixed area. Their mean, variance, and higher-order moments are then derived, and a classification model is trained with the AdaBoost classification algorithm, giving the classifier model produced by this scheme's feature parameters and the AdaBoost algorithm;
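To make the training step concrete, here is a minimal AdaBoost sketch that uses single-feature threshold rules (decision stumps) as weak classifiers. The patent does not specify the weak classifier, the number of rounds, or the feature layout, so the stumps, the `rounds` value, and the two-feature sample vectors (mean overlap in minutes, longest run of overlapping days) are assumptions for illustration. Labels are +1 for residence and -1 for non-residence.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity

def adaboost_train(X, y, rounds=5):
    """Train a list of (alpha, stump) pairs with the standard AdaBoost reweighting."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so alpha stays finite when a stump is perfect.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest, then normalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Hypothetical labeled samples: [mean overlap (min), longest overlap run (days)].
X = [[380, 25], [400, 28], [350, 20], [90, 3], [120, 5], [60, 2]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost_train(X, y)
```

Calling `adaboost_predict(model, [370, 22])` classifies a new candidate location with the trained model. In practice a library implementation such as scikit-learn's `AdaBoostClassifier` could be substituted for this hand-written version.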
(6) Identification of user residence information with the AdaBoost classification model
Using the AdaBoost classifier trained on the above features, the target user's CDR data is analyzed, the relevant features are extracted, and each user's possible residence is given.
Through the above steps, the user's possible residence can be identified accurately and quickly from a telecom operator's CDR big data.
The AdaBoost method for identifying user residence based on feature information from user CDR big data according to the present invention involves the following steps: 1. find, in the CDR big data, the base station locations with the highest call frequency in the user's CDRs over a run of consecutive days (user-defined, for example one month or one quarter); 2. analyze the CDR data for these locations, compute statistics, mine the underlying regularities, and derive the feature information in the CDR data; 3. using this feature information and the AdaBoost classification algorithm, distinguish the user's possible residence from non-residences among these locations. Step 2 involves mining, analyzing, and computing the key distinguishing feature information in the CDR big data, which determines whether the locations can be distinguished accurately and efficiently. The present invention analyzes the common living patterns of ordinary users and, relying on the relative continuity and invariance of people's general daily routines and habits, and on the essential and invariant differences between the feature information in residence and non-residence CDRs, proposes an AdaBoost method for identifying user residence based on feature information from user CDR big data. Statistical vectors are computed for the following three features: 1. the degree of overlap, over a period of time, of the longest no-call time interval in each day's CDRs; 2. the number of consecutive days on which the longest no-call intervals overlap across the user's multi-day CDRs; 3. the duration of the longest no-call interval in each day's CDRs. Multiple statistical vectors are derived from the CDRs on this basis, an AdaBoost classification model is trained on the user's residences and non-residences, and the trained model is used to identify the user's residence, so that the final AdaBoost-based classifier achieves a better classification effect and higher classification accuracy.
Claims (2)
- 1. An AdaBoost method for identifying user residence based on feature information from user CDR big data, characterized by comprising the following steps:
Step (1): extract all of the user's possible candidate residence locations from the CDR big data;
Step (2): filter out the user's possible residence locations from the CDR big data;
Step (3): feature extraction from the CDR data. After the possible residences are filtered out, the relevant feature parameters are obtained from the above CDR data as the classification features of the AdaBoost classifier, specifically:
the multi-day long no-call overlap time parameter, being the overlap between the no-call period on a given day (start time $t_s$, end time $t_e$) and the no-call period of the reference day (start time $c_s$, end time $c_e$; the reference day is defined by the most frequently occurring no-call start and end times in the user's CDRs over the statistical period); for example, with a statistical period of 30 days, the overlap time parameter $T_c$ between a given day and the reference day is:
1) $T_c = c_e - t_s$, if $c_s < t_s < c_e < t_e$
2) $T_c = t_e - c_s$, if $t_s < c_s < t_e < c_e$
3) $T_c = c_e - c_s$, if $t_s < c_s < c_e < t_e$
4) $T_c = t_e - t_s$, if $c_s < t_s < t_e < c_e$
the long no-call duration parameter l, obtained directly from the CDRs;
the long no-call continuity parameter s;
the time parameter t for long-term connection to a base station in a fixed area, obtained directly from the CDR data;
Step (4): computation of feature statistics. The mean m, variance σ, and higher-order moment ξ of the features obtained above are computed: the mean is the sum of all values in a data set divided by the number of values; the variance is the average of the squared differences between each sample value and the mean; a higher-order moment is the average of a higher power of the variable's deviation from zero or from the mean;
Step (5): classification model training based on the AdaBoost algorithm. A number of samples with known user residence are first obtained and their feature parameters are computed, including the long no-call interval overlap parameter, the long no-call duration parameter, the long no-call continuity parameter, and the time parameter for long-term connection to a base station in a fixed area; their mean, variance, and higher-order moment are then derived, and a classification model is trained with the AdaBoost classification algorithm to obtain the AdaBoost classifier model;
Step (6): identification of user residence information with the AdaBoost classifier model. Using the AdaBoost classifier trained on the above features, the target user's CDR data is analyzed, the relevant features are extracted, and each user's possible residence is given.
- 2. The AdaBoost method for identifying user residence based on feature information from user CDR big data according to claim 1, characterized in that, in step (2), the call frequency under each base station the user connects to is counted and ranked, highly ranked base stations that are geographically close to one another are merged, and entries ranked below 5th are discarded.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710975842.2A CN107609196A (en) | 2017-10-19 | 2017-10-19 | A kind of AdaBoost user residence method of discrimination based on user bill big data characteristic information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710975842.2A CN107609196A (en) | 2017-10-19 | 2017-10-19 | A kind of AdaBoost user residence method of discrimination based on user bill big data characteristic information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107609196A true CN107609196A (en) | 2018-01-19 |
Family
ID=61078613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710975842.2A Pending CN107609196A (en) | 2017-10-19 | 2017-10-19 | A kind of AdaBoost user residence method of discrimination based on user bill big data characteristic information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609196A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103052022A (en) * | 2011-10-17 | 2013-04-17 | ***通信集团公司 | User stabile point discovering method and system based on mobile behaviors |
CN105513351A (en) * | 2015-12-17 | 2016-04-20 | 北京亚信蓝涛科技有限公司 | Traffic travel characteristic data extraction method based on big data |
CN107133265A (en) * | 2017-03-31 | 2017-09-05 | 咪咕动漫有限公司 | A kind of method and device of identification behavior abnormal user |
WO2017168490A1 (en) * | 2016-03-28 | 2017-10-05 | アイホン株式会社 | Intercom system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106550155B (en) | Swindle sample is carried out to suspicious number and screens the method and system sorted out and intercepted | |
CN107798032B (en) | Method and device for processing response message in self-service voice conversation | |
CN107222865A (en) | The communication swindle real-time detection method and system recognized based on suspicious actions | |
CN109819126B (en) | Abnormal number identification method and device | |
CN109615116A (en) | A kind of telecommunication fraud event detecting method and detection system | |
CN111159387B (en) | Recommendation method based on multi-dimensional alarm information text similarity analysis | |
CN108924333A (en) | Fraudulent call recognition methods, device and system | |
CN107729940A (en) | A kind of user bill big data base station connection information customer relationship estimates method | |
WO2017186090A1 (en) | Communication number processing method and apparatus | |
CN109800600A (en) | Ocean big data susceptibility assessment system and prevention method towards privacy requirements | |
CN106385693A (en) | Telecommunication fraud method for virtual number segments | |
CN109474756B (en) | Telecommunication anomaly detection method based on collaborative network representation learning | |
CN112866486B (en) | Multi-source feature-based fraud telephone identification method, system and equipment | |
CN108550050A (en) | A kind of user's portrait method based on call center data | |
CN112001170A (en) | Method and system for recognizing deformed sensitive words | |
CN110381218A (en) | A kind of method and device identifying telephone fraud clique | |
CN114841705B (en) | Anti-fraud monitoring method based on scene recognition | |
CN111510368A (en) | Family group identification method, device, equipment and computer readable storage medium | |
CN111105064A (en) | Method and device for determining suspected information of fraud event | |
CN105930430B (en) | Real-time fraud detection method and device based on non-accumulative attribute | |
CN110717068B (en) | Video retrieval method based on deep learning | |
CN110362828B (en) | Network information risk identification method and system | |
CN107609196A (en) | A kind of AdaBoost user residence method of discrimination based on user bill big data characteristic information | |
CN106407878A (en) | Face detection method and device based on multiple classifiers | |
CN114048294B (en) | Similar population extension model training method, similar population extension method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180119 |